Infrastructure Drift: Definition, Detection, and Management
Infrastructure as Code (IaC) is the preferred way to manage infrastructure, and it also makes cloud provisioning safer and more repeatable. Identifying the cloud resources that are not managed by IaC is a challenge. Verifying that the managed resources still match what is defined in the code is yet another task.
Frequent changes in cloud workloads increase the number of running workloads, which in turn increases the number of services interacting with the infrastructure and the number of people working on it. All of these changes make the codebase larger and changes harder to track. This is where drift detection comes in. This article explains infrastructure drift detection, why you need it, and the tools that help with it.
What is Infrastructure Drift?
Infrastructure drift occurs when the actual state of the infrastructure does not match the state defined in the Infrastructure as Code (IaC) configuration. Drift makes IaC codebases harder to track, especially when there are many of them. Drift has many causes, but higher IaC coverage of cloud resources tends to reduce it.
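To make the definition concrete, here is a minimal sketch of drift as a difference between a declared state and a live state. The resource name, attributes, and the find_drift helper are hypothetical and only illustrate the idea; real tools read these values from state files and cloud provider APIs.

```python
from typing import Any


def find_drift(desired: dict[str, dict[str, Any]],
               actual: dict[str, dict[str, Any]]) -> dict[str, dict[str, tuple]]:
    """Return, per resource, the attributes whose live value differs from the declared one."""
    drift = {}
    for name, declared in desired.items():
        live = actual.get(name)
        if live is None:
            # The resource is declared in code but no longer exists.
            drift[name] = {"<resource>": ("declared", "missing")}
            continue
        changed = {
            attr: (value, live.get(attr))
            for attr, value in declared.items()
            if live.get(attr) != value
        }
        if changed:
            drift[name] = changed
    return drift


# Hypothetical example: the code declares 2 instances, but someone scaled to 4 by hand.
desired = {"web_server": {"instance_type": "t3.small", "count": 2}}
actual = {"web_server": {"instance_type": "t3.small", "count": 4}}
print(find_drift(desired, actual))  # {'web_server': {'count': (2, 4)}}
```

An empty result means the live infrastructure matches the code for every declared attribute.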
What Causes Infrastructure Drift?
There are several factors that cause infrastructure drift in the cloud. Given below are some of the actions that can lead to infrastructure drift.
- Manual Changes: One of the most common reasons for infrastructure drift is manual changes made by personnel. There are many instances when an engineer has to increase the number of resources to handle a rising load. When the load is high, such a modification is acceptable because the focus is on adapting to changing conditions. However, the issue arises when the change is neither reverted nor reflected in the IaC definition files. That mismatch is infrastructure drift. This is just one way a manual change can cause drift. Poor communication, poor IaC practices, and overly broad permissions, among others, can cause the same issue.
- Conflicting IaC Code: Sometimes several sets of IaC definition files manage the same resources. In these cases, applying one set can revert the changes made by another. This leads to conflicting IaC code, which is not the direct result of a manual change. Similarly, when different teams manage the infrastructure over time, their responsibilities can overlap, resulting in infrastructure drift.
- Inappropriate Functioning of Microservices: Another major cause of infrastructure drift in the cloud is microservices that do not behave as intended. A malfunctioning microservice can push the application out of its desired configuration and operation.
Repercussions of Unmanaged Infrastructure Drift:
Without a doubt, minimizing or managing infrastructure drift is crucial. However, the question is what will happen if infrastructure drift is left unchecked. The answer to that question is explained below:
- Data Breaches: Because drifted resources run with configurations that differ from the reviewed code, they can expose the infrastructure to cyberattacks. In simpler terms, the biggest risk of leaving infrastructure drift unmanaged is a potential data breach. Such breaches not only put data at risk but can also lead to ransomware attacks, resulting in hefty losses to the organization.
- Deployment Failure: IT deployments are necessary for a cloud environment, and the failure of any deployment can affect the entire infrastructure. Whenever a deployment fails, the designated team has to identify the cause, and one of the major causes of such failures is infrastructure drift. A live configuration that differs from what the code expects can create unexpected issues in the infrastructure, resulting in failure.
- Downtime: For an organization, there is little worse than facing downtime. Leaving infrastructure drift unmanaged for a prolonged period can cause application downtime or frequent crashes. Such downtime can ruin the user experience, damage the organization’s image, and cause major losses as well.
What is Drift Detection?
Drift detection is the process of identifying whether your cloud infrastructure has drifted from its defined state. Drift detection tools produce a detailed report that helps developers diagnose and fix the issue in the cloud infrastructure.
Drift detection also helps in finding misconfigurations in the IaC, making the infrastructure more secure. Managing infrastructure with IaC through automation can make you believe that your infrastructure is highly secure, which might not be entirely true. With drift detection, you can monitor automated actions and build security into the infrastructure lifecycle.
Infrastructure Drift Detection Tools:
Performing infrastructure drift detection is a challenge without the right tools. With that in mind, here are the top tools that can help you detect drift on both managed and unmanaged resources.
- Terraform: Developed and maintained by HashiCorp, Terraform is one of the most widely used tools for detecting infrastructure drift in the cloud. It offers a consistent way to detect drift on managed resources across the resource types its providers support.
Terraform detects drift by refreshing its state file against the real infrastructure through the provider APIs and comparing the result with the configuration.
Running the terraform plan command shows any differences. From there, you can either run terraform apply to bring the infrastructure back in line with the code, or update the code to keep the changes that were made outside Terraform.
However, Terraform can only detect drift on resources that it manages, not on resources managed by another tool or created manually.
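A plan-based check like the one described above can be scripted. The sketch below assumes the terraform binary is on the PATH and that the working directory has already been initialized; the function names and status strings are hypothetical. It relies on the -detailed-exitcode flag of terraform plan, which exits with 0 when there is nothing to change, 1 on error, and 2 when the plan is non-empty. Note that a non-empty plan can also come from code changes that have not been applied yet, not only from drift.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {
    0: "in sync",
    1: "error",
    2: "drift or unapplied changes",
}


def interpret_plan_exit_code(code: int) -> str:
    """Translate the plan exit code into a short status string."""
    return EXIT_MEANINGS.get(code, "unknown")


def check_drift(workdir: str) -> str:
    """Run `terraform plan` in an initialized working directory and report the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Calling check_drift("path/to/stack") from a scheduled job, with a hypothetical path, is one way to turn the manual plan into a recurring drift check.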
- driftctl: driftctl is an open-source command line interface designed specifically to detect infrastructure drift in a DevSecOps environment. With this tool, you can detect changes made not only through your IaC workflow but outside of it as well.
One of the best things about this tool is that you can schedule checks, where the tool automatically scans the infrastructure and provides an in-depth report on any drift detected. Moreover, it can track and detect drift on both managed and unmanaged resources.
On the downside, the biggest drawback of this open-source tool is that it does not support all Terraform resources. Moreover, users have reported API throttling errors during scans.
- CloudQuery: CloudQuery is another open-source cloud asset inventory tool that is capable of infrastructure drift detection. This tool fetches resources from the relevant cloud providers and loads them into PostgreSQL. It then builds a drift detection command on top of that database, turning drift into a data problem. It supports scanning multiple state files and can detect unmanaged resources with a simple command. However, it requires a SQL database to run, and it does not support all Terraform resources either.
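Detecting unmanaged resources, as the tools above do, boils down to comparing an inventory of what exists in the cloud with what the state file knows about. The following is a minimal sketch of that idea, not how any of these tools is implemented. It assumes a simplified state document in the layout of a Terraform state file (a resources list whose instances carry an attributes.id) and a hypothetical set of inventory IDs that would normally come from a cloud provider API or an asset inventory database.

```python
import json


def managed_ids(state: dict) -> set[str]:
    """Collect the `id` attribute of every managed resource instance in the state."""
    ids = set()
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # skip data sources, which are read-only lookups
        for instance in resource.get("instances", []):
            resource_id = instance.get("attributes", {}).get("id")
            if resource_id:
                ids.add(resource_id)
    return ids


def unmanaged(inventory_ids: set[str], state: dict) -> set[str]:
    """Return the IDs that exist in the cloud but are absent from the state."""
    return inventory_ids - managed_ids(state)


# Hand-written, trimmed state document used only for illustration.
state = json.loads("""
{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "logs",
      "instances": [{"attributes": {"id": "logs-bucket"}}]
    }
  ]
}
""")

# Hypothetical inventory: one bucket was created by hand outside IaC.
inventory = {"logs-bucket", "tmp-bucket"}
print(sorted(unmanaged(inventory, state)))  # ['tmp-bucket']
```

Storing the inventory in a database, as CloudQuery does, turns the same set difference into a query.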
How to Fix Infrastructure Drift through Tools?
Tools used for detecting infrastructure drift can also help in fixing it. Drift shows up as a non-empty list of proposed changes even though nothing has changed in the definition files. In that case, the drift can be fixed by applying those proposed changes to the infrastructure. The right tools can remove the drift and bring the infrastructure back to its required state.
#1:Terraform– Terraform is among the most popular tools for removing infrastructure drift. Running the terraform apply command brings the real infrastructure back in line with the definition files.
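Before applying, it can help to list exactly which resources the plan proposes to change. The sketch below assumes a plan has been saved and exported with terraform show -json, whose output includes a resource_changes list with an address and a change.actions field per resource. The sample document is hand-written and trimmed for illustration, and pending_changes is a hypothetical helper.

```python
import json


def pending_changes(plan: dict) -> list[tuple[str, list[str]]]:
    """Return (address, actions) for every resource the plan would modify."""
    changes = []
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change.get("change", {}).get("actions", [])
        # "no-op" marks resources that already match the configuration.
        if actions and actions != ["no-op"]:
            changes.append((resource_change["address"], actions))
    return changes


sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}}
  ]
}
""")
print(pending_changes(sample_plan))  # [('aws_instance.web', ['update'])]
```

If the definition files have not changed since the last apply, every entry in this list points at drift.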
#2:AWS CloudFormation– AWS CloudFormation is capable of removing drift, but only in limited instances. For instance, it can fix drift by recreating a resource when it is missing. On the other hand, an altered property of a resource might go unnoticed. In that case, you need to reconcile the resource manually through the following process.
- First, add a DeletionPolicy attribute set to Retain to the resource. Doing so keeps the existing resource when it is removed from the stack.
- Afterward, remove the resource from the template and run a stack update operation.
- Finally, add the same resource back to the stack to fix the drift. This is done by describing the actual state of the resource in the stack template and importing it back into the stack.
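The three steps above can be sketched as edits to a template held as a Python dictionary. This only illustrates how the template changes at each step; the stack update and the import itself are performed through CloudFormation and are not shown. The helper names, the logical IDs, and the property values are hypothetical.

```python
import copy


def retain_resource(template: dict, logical_id: str) -> dict:
    """Step 1: mark the resource so it is kept when it leaves the stack."""
    updated = copy.deepcopy(template)
    updated["Resources"][logical_id]["DeletionPolicy"] = "Retain"
    return updated


def remove_resource(template: dict, logical_id: str) -> tuple[dict, dict]:
    """Step 2: drop the resource from the template and return its old definition."""
    updated = copy.deepcopy(template)
    removed = updated["Resources"].pop(logical_id)
    return updated, removed


def readd_resource(template: dict, logical_id: str,
                   definition: dict, actual_properties: dict) -> dict:
    """Step 3: add the resource back with properties that match its actual state."""
    updated = copy.deepcopy(template)
    updated["Resources"][logical_id] = {**definition, "Properties": actual_properties}
    return updated


# Hypothetical template: the queue's timeout was changed by hand from 30 to 60.
template = {
    "Resources": {
        "JobsQueue": {"Type": "AWS::SQS::Queue", "Properties": {"VisibilityTimeout": 30}},
        "AlertsTopic": {"Type": "AWS::SNS::Topic"},
    }
}

step1 = retain_resource(template, "JobsQueue")      # then update the stack
step2, old_definition = remove_resource(step1, "JobsQueue")  # then update the stack again
step3 = readd_resource(step2, "JobsQueue", old_definition, {"VisibilityTimeout": 60})
print(step3["Resources"]["JobsQueue"])
```

After the last step, the template describes the resource as it actually exists, so importing it leaves the stack and the live resource in agreement.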
#3:Spacelift– Spacelift is also a tool for detecting and fixing drift. Once infrastructure drift is detected, it offers an option to revert the identified changes by following the same workflow used for IaC code changes, including any configured safety checks.
How to Avoid Infrastructure Drift?
Infrastructure drift is practically inevitable and will affect every IaC setup at some point. Still, the best approach is to take every measure to avoid it as much as possible. Here are the primary actions that will help you minimize the possibility of infrastructure drift.
- Principle of Least Privilege: One of the biggest causes of infrastructure drift is manual changes made to the infrastructure, and controlling this factor will reduce the possibility of drift substantially. The solution is to apply the principle of least privilege, which means giving an engineer or any other professional permission for only the tasks they need to perform. Doing so ensures that fewer people can modify the infrastructure manually, thus reducing the possibility of drift due to manual changes.
- Know about Backport/Reversion: Implementing the principle of least privilege is a step toward this goal, but sometimes engineers with infrastructure access still end up causing drift. Everyone with infrastructure access should know how to backport a manual change into the code or revert it, and should do so promptly to minimize drift.
What to Consider while Picking an Infrastructure Drift Tool?
As mentioned above, there are multiple tools available for detecting and removing infrastructure drift from the cloud. However, the question is how to pick the right one. Before making a decision, you need to consider the biggest factor: the level of access you are ready to give to the tool. Access levels include read-only, least-privileged, and full access, among others.
For instance, tools like Terraform need broad access because they create and modify resources. On the other hand, tools like driftctl only need least-privileged, read-only access to run. Pick the tool according to your requirements, and consider the access level that you can provide to it for the best drift detection and mitigation.
Infrastructure Drift Detection and Management with ThinkSys:
If you want professional assistance in infrastructure drift detection and management, then ThinkSys is the name you can rely upon. With years of experience combined with the expertise of our professionals, ThinkSys can help you remove drift from your cloud. ThinkSys will take all measures to secure your infrastructure lifecycle and minimize drift significantly.
Detecting drift in IaC infrastructure is the first step toward drift management. ThinkSys can provide you with a drift detection service so your code can remain secure.
- Detect drift within the IaC infrastructure.
- Run different scans on the cloud.
- Use suitable tools as per the requirements to detect infrastructure drift.
- Secure the cloud from the vulnerabilities caused by drift.
Unmanaged resources can hamper the overall IaC. ThinkSys will perform a systematic test on all the resources to expand existing IaC coverage.
- Identify manually created and unmanaged resources.
- Follow the right practices to fix all the resources.
- Analyze and understand the issue within IaC.
Unmanaged resources should be handled with caution as the state or configuration file does not have any such resource defined. With the right approach, ThinkSys can ensure that all your unmanaged resources are taken care of.
- Find unmanaged resources in the IaC.
- Report drifted resources to developers.
- Determine the components that should be deleted or fixed.
- Provide an evaluation report on the output.
What are the causes of infrastructure drift?
Infrastructure drift makes the infrastructure vulnerable to attacks and can cause significant downtime, which is why managing it is essential. The primary causes of infrastructure drift are:
- Manual changes.
- Conflicting IaC code.
- Inappropriate functioning of microservices.
What is the difference between managed and unmanaged resources?
Drift can affect two kinds of resources, managed and unmanaged, and the two are quite different from each other.
- Managed resources – These are the resources applied and deployed to the cloud through IaC, so they are included in the state and configuration files. As a result, the IaC tool can detect any change made to them outside the code.
- Unmanaged resources – Here, neither the configuration nor the state file defines the resource, which makes it much harder to detect a change that causes drift.
What are the top tools for drift detection?
Currently, many tools are available that can help in detecting drift. Here is a list of the top tools that you can consider for your infrastructure.