
Terraform Code Quality

Key steps to good quality for your infrastructure code.


This article is a transcript of a talk given at the London Cloud Native Computing Foundation Meetup.

The original talk is available here

Terraform code quality is important, and there are a lot of tools to improve it, but many of them are quite difficult to use. Here are a few tools that we find really useful and that can be set up in minutes.

Terraform code quality starts with the basics: terraform validate

Terraform works with a provider for each cloud and describes infrastructure as resources. Basically, you can see a resource as, for example, an instance to launch, in which you describe what you want. Let's see how Terraform's built-in tooling can help you improve your code quality.

Terraform validate is a subcommand of Terraform that only checks structure and coherence, which means that obviously bad code like this one will look perfectly fine in the eyes of Terraform:

provider "aws" {
  region = "BOGUS"
}

resource "aws_bogus_resource" "vm" {
  ami                    = "BOGUS"
  instance_type          = "BOGUS_TOO"
  vpc_security_group_ids = ["123456789"]
  key_name               = "BOGUS"
  tags = {
    Name      = "CNCF London Meetup"
  }
} 
[Screenshot: terraform validate accepts the bogus configuration]

Terraform validate will only help you if you request a resource type that does not exist, make a typo in a variable reference, or that kind of thing. It is not really useful for improving the quality or the security of your infrastructure code.
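
For reference, running it takes two commands (a minimal sketch; init comes first so that provider schemas are available to check against):

terraform init      # download providers so validate has schemas to work with
terraform validate  # checks structure and coherence only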

[Screenshot: terraform validate output]

Terraform code quality has to be tested using the right Terraform version

The second basic thing we wanted to cover is the Terraform version: as we know, developers build their code against specific versions. In this case, Terraform 0.12.21 added features like support for Tencent Cloud storage and the trim functions.
[Screenshot: Terraform changelog showing Tencent Cloud support]
terraform {
  required_version = "= 0.12.21"
  required_providers {
    aws    = "= 2.62"
    random = "~> 2.2"
  }
}

Running tests in a CI/CD system without the correct version of Terraform will end in failure even though your code is perfectly correct.
 
tfenv is a very useful tool to have here. It is inspired by rbenv (for all the Ruby users who know that tool). Basically, it ensures that you are testing your code with the correct version of Terraform, which is very handy because managing dozens of versions in your CI/CD system is a real pain, and managing them by hand or with Docker is just a waste of time.

But keep in mind that this tool does not improve the quality of your Terraform code by itself. It just gives you a standard way to test it.

It will allow you to start building, step by step, a matrix of Terraform versions for your infrastructure code. We all have customers using older versions than ours, or production environments slightly different from our dev or staging environments.

[Screenshot: a testing matrix of Terraform versions]

You can find tfenv at https://github.com/tfutils/tfenv
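
As a quick sketch, pinning and switching versions with tfenv looks like this (the version is just an example; .terraform-version is the file tfenv reads to pick a version automatically):

tfenv install 0.12.21                  # install the version this project targets
echo "0.12.21" > .terraform-version    # pin it for teammates and for CI
tfenv use 0.12.21                      # switch the active terraform binary
terraform version                      # confirm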

How static checks can help you improve your Terraform code quality

Drilling down into other tools, with static checks this time, TFLint is the first one really worth mentioning. Let's go back to our first example, where we requested a bogus AMI with a bogus instance type and an obviously unlikely security group ID:
provider "aws" {
  region = "BOGUS"
}

resource "aws_bogus_resource" "vm" {
  ami                    = "BOGUS"
  instance_type          = "BOGUS_TOO"
  vpc_security_group_ids = ["123456789"]
  key_name               = "BOGUS"
  tags = {
    Name      = "CNCF London Meetup"
  }
} 
[Screenshot: TFLint flags the invalid instance type]

By executing TFLint, you statically analyze your code and see that this instance type is not valid. This tool might not cover everything, but a simple run already brings improvement and instantly catches typos and mismatches.

But TFLint does a lot more and goes way beyond static tests: it also has a deep check mode which, using your credentials, asks the cloud provider's API whether the AMI you requested actually exists.

Even better: it does the same for the security group ID and also checks whether you are allowed to use it. Indeed, maybe the security group ID you requested does exist, but not in your account; in that case your code is perfectly correct, but you do not have the right to use it.
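
Running it is a one-liner; deep checking is opt-in (a sketch based on the TFLint CLI of that era, where deep checks were enabled with a flag and your AWS credentials):

tflint          # static checks only: catches the invalid instance type
tflint --deep   # also query the AWS API: catches the non-existent AMI and security group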

[Screenshot: TFLint deep check reports the invalid security group ID]

Using that kind of tool, you start improving the quality of your code because you will no longer push and apply things that look correct but are in fact unusable or painfully wrong in production.

TFLint also helps with reviews. If you work in a team, you probably review infrastructure code through pull requests. Imagine someone on your team requested a t1.micro. As a reviewer, you would probably tell your coworker: "t1 is a bit old. The performance is terrible and it costs more than a t3.micro today. You probably don't want to use that. Please change your instance type."

resource "aws_instance" "cncf-example" {
  instance_type = "t1.micro"
} 
[Screenshot: TFLint warns about the previous-generation instance type]

TFLint will do this automatically for you if you plug it directly into the pull request workflow in your CI/CD system. It will save you a lot of time.

It can enforce best practices as well. As you know, you can type your variables in Terraform, defining them as strings, booleans, lists, etc., and add descriptions. It is probably your role as a reviewer to check for variable types and descriptions and say: "Hello, you forgot to type your variable. You should type it."

variable "foo" {
  default = "bar"
  # type    = string
  # description = "i'm a major variable"
} 

So, same thing here: you can enforce that best practice directly from the pull request using TFLint. It takes seconds and already brings value from something that was quite easy to deploy.
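
For instance, a minimal .tflint.hcl could enable the relevant rules (a sketch; these rule names come from TFLint's Terraform ruleset):

# .tflint.hcl
rule "terraform_typed_variables" {
  enabled = true
}

rule "terraform_documented_variables" {
  enabled = true
}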

TFLint understands a lot of Terraform itself: configuration, requirements, pinned versions... It supports AWS, and Azure through a plugin; GCP support is coming soon. You can find it at https://github.com/terraform-linters/tflint

Let's now move on to a second tool: TFSec. It performs static checks like TFLint, but the focus is on security issues, and it sits very early in the process.

Put yourself in a situation where someone on your team makes a pull request that effectively says "here's my secret key" or "I have a variable named password, and this is its default value". You would probably comment on that pull request with something like: "Hmmm... are you sure?"

provider "aws" {
  region         = "eu-central-1"
  aws_access_key = "12345"
  aws_secret_key = "s3cr3t"
}

variable "password" {
  default = "sup3rs3cr3t"
} 

Problem solved with TFSec: the tool does the job for you, and the author gets feedback immediately, directly on the pull request. You save a lot of time, and the code improves by itself. Your developers learn on the job just by getting feedback, not from you, but from tools that enforce what you wanted in the end anyway.
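
Running it on a directory is all it takes (a minimal sketch; the non-zero exit code on findings makes it easy to gate a CI pipeline):

tfsec .   # scan every Terraform file under the current directory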

[Screenshot: TFSec flags the hard-coded credentials]
Let's take a more complex example: imagine I don't know how to create an AWS S3 bucket, so I ask on Reddit, and Reddit gives me four lines that work perfectly. My next step is to put them in a pull request, and my boss might read it and say: "OMG! You just created a bucket with public read/write. Where did you find this code? On Reddit???" TFSec catches that directly and gives you immediate feedback.
resource "aws_s3_bucket" "example_from_reddit" {
  bucket = "my-secret-bucket"
  acl    = "public-read-write"
  Tags = {
      Name = "my secret S3 bucket"
  } 
[Screenshots: TFSec findings for the bucket]

Again, very obvious mistakes can be corrected right at the pull request stage. Some would even like to put this earlier in the process, directly on the developers' laptops, but you can hardly force developers to configure pre-commit hooks on their machines...

But that’s the easy part….

What about logging? Following my previous example, TFSec just taught me that S3 buckets should have logging enabled. Until then, I did not even know I could encrypt objects in my S3 buckets. So I started with only three or four lines of code, and I end up with a properly secured bucket with encryption and logging configured. Once again, in just a couple of seconds this tool adds a lot of value, and your developers learn things they will not get wrong again: their next bucket will be created securely.

resource "aws_s3_bucket" "example_from_reddit" {
  bucket = "my-secret-bucket"
  acl    = "private"
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        kms_master_key_id = aws_kms_key.mykey.arn
        sse_algorithm     = "aws:kms"
      }
    }
  }
  logging {
    target_bucket = "my-logging-bucket"
    target_prefix = "log/"
  }
  tags = {
    Name = "My Secret S3 Bucket"
  }
} 

TFSec works for Terraform on AWS, Azure and GCP and you can find it at https://github.com/liamg/tfsec

Terraform code quality: checking for compliance

So far we have talked about the initial steps, with purely static analysis, and we went a bit further with TFLint by asking the API for real IDs, checking against reality, etc.

If you're familiar with Terraform, you know that after this analysis step comes the planning phase. The planning phase takes all your code, computes a diff against the cloud provider's API, and works out what needs to be created. If you want a new S3 bucket and you don't have it yet, Terraform will make a plan to create it. Your next steps are to review that plan and then apply it to create the resource.
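
In practice the sequence looks like this (a sketch; the plan file name is arbitrary):

terraform plan -out=tfplan   # compute the diff and save it
terraform show tfplan        # review what would change
terraform apply tfplan       # apply exactly the plan you reviewed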

At this stage you might want to enforce some compliance rules, with a tool creatively named terraform-compliance.

A quick digression if you're not familiar with the Cucumber BDD (behavior-driven development) format: it consists of a feature name, a scenario, and conditions:

  • Given “blah blah blah” you are expected to have “this result”

Behind the scenes, this is executed by code, which then gives you results. The BDD approach is very useful for testing code, but in this case it is used for compliance, in natural language.

In our case, going back to the previous example, we could have written something like this, and terraform-compliance would have checked it against the plan.

  • Given I have AWS S3 Bucket defined
  • Then it must contain server_side_encryption_configuration

Let’s be more specific : 

resource "aws_s3_bucket" "example_from_reddit" {
  bucket = "my-secret-bucket"
  acl    = "private" 
[Screenshot: terraform-compliance output for the bucket]

In this case we created the bucket, but we didn't create a tag. We might have a process where it is very important that all our resources are tagged. So we can write a very simple scenario like this:

Feature: All Resources

    Scenario: Ensure all resources have tags
        Given I have resource that supports tags defined
        Then it must contain tags
        And its value must not be null 

Why is that? 

You can populate a tag from a variable, and it might work all the way through the previous steps but fail at just this one, because the variable went wrong. Maybe it was supposed to be a number and it ended up being null?

During the planning phase there is computation, and that computation can go wrong. This is why compliance checks matter at this point: values are now computed, not just statically analyzed.

[Screenshot: terraform-compliance failing the missing-tags scenario]

Integrating those checks and their feedback into your CI/CD system (in a pull request) gives you a much better view of what your code is really doing, compliance-wise.
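
A typical run points the tool at a saved plan and a directory of feature files (a sketch; the paths are illustrative):

terraform plan -out=plan.out
terraform-compliance -p plan.out -f ./features   # -p: the plan to check, -f: the BDD features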

terraform-compliance is provider-agnostic, including your own custom providers. There are a lot of ready-to-use examples, and you really can get started in minutes just by using the examples served directly in the documentation. It is obviously security-oriented, with all the usual suspects like KMS, etc.

This tool also has cool features like letting you enforce naming conventions. You probably want to enforce naming for various items (your country, your continent, resources...), so prefixes can be self-documented, and forbidden resources can be enforced as well.

If you need to be PCI DSS compliant, you might simply be forbidden from using a list of AWS resources that are not PCI compliant. This can be enforced at the compliance level, well before asking the IAM user on AWS whether you have the rights to use those resources.

You can find terraform-compliance at https://terraform-compliance.com

Don't forget to perform integration tests to improve your Terraform code quality

Between the planning and the verification phases come all the integration tools. We won't spend much time reviewing them, as this is not our focus here, but Terratest from Gruntwork is one of them. Terratest is basically a Go library in which you include your code, manipulating Terraform programmatically in pure Go. It is very powerful and will let you do absolutely anything you want, but it is also quite complicated to use.

Maybe you know other systems like Goss, or even ServerSpec, ChefSpec, etc. Those tools are quite compact and take care of a lot of things for you, so they are easy to use. With Terratest, you need to do everything yourself: the framework is there to provide you tools, using a full programming language.

Verification checks: the last step towards Terraform code quality

And now we can skip directly to verification and focus on a tool named InSpec. 

Although it originally comes from the Chef folks, it is not related to Chef, but rather to ServerSpec, RSpec and all the "whatever-Spec" tools that have been testing server things with RSpec over the past years, and it is really complete, as you will see.

The first question we need to answer is why would we need to validate what we just launched?

Because a lot of things can go wrong while everything looks correct: let's say we launched a simple EC2 instance in a VPC, using a security group, with the usual mix of dynamic names, tags, etc. Everything went right: it was tested right, planned right, and applied correctly as well.

As you can see, the code that produced it is very simple: there is a data source (a data source is simply a query against the cloud provider's API), a global variable that can be overridden, a simple internal lookup of another resource in your Terraform code, and a local variable.

resource "aws_instance" "vm" {
  ami                    = data.aws_ami.amazon-linux.id
  instance_type          = var.instance_type
  vpc_security_group_ids = [aws_security_group.allow_ssh.id]
  key_name               = aws_key_pair.admin.key_name


  tags = {
    Name      = local.vm_name,
    Terraform = "true"
  }
} 

There are a lot of ways this whole mix can look right in the static analysis, and in the compliance checks as well, yet turn out to be not what you expected. For example, maybe the AMI returned exists and is correct, but it is not the one you expected; or the instance type defaulted to something like a t3.small and you thought your value overrode it, but it didn't... This is why checking against reality is very important, and why validation exists.

So how does it work?

It is based on simple expectations, and you can inject dynamic content straight from the Terraform state, mixing dynamic information with static information like a "t3.nano" instance type, strings, booleans, or whatever:

describe aws_ec2_instance(EC2_INSTANCE_ID) do
    it { should be_running }
    its('instance_type') { should eq 't3.nano' }
    its('security_groups') { should include(id: EC2_SG_ID, name: 'allow_ssh') }
    its('tags') { should include(key: 'Terraform', value: 'true') }
end

describe aws_security_group(group_name: 'allow_ssh') do
    it { should exist }
    its('group_name') { should eq 'allow_ssh' }
    its('vpc_id') { should eq VPC_ID }  
end 
As it is really readable, it doesn't require a high level of coding skill. You can get started very easily and immediately get a lot of value out of it. There are a lot of places in the code where things could have gone wrong, and now you can check them directly from the Terraform state.
[Screenshot: injecting dynamic values from the Terraform state into InSpec]

If you're not familiar with what a state is in Terraform: it is the last thing you get, after the apply phase. Once you apply your code on your cloud provider, you get back a list of values that you could not have known beforehand, which you can expose as outputs:

output "public_ip" {
  value = aws_instance.vm.public_ip
}

output "vpc_id" {
  value = data.aws_vpc.default.id
}

output "ec2_instance_id" {
  value = aws_instance.vm.id
}

output "aws_security_group_id" {
  value = aws_security_group.allow_ssh.id
}
 

For example, you might not know what your VPC ID is, but what you want to know is whether you are using the default one. You can get this straight from the state with InSpec.
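
One simple way to wire the two together is to turn the outputs into InSpec attributes (a sketch; the profile path, attribute names and the --attrs flag match the InSpec CLI of that era, and jq is only one way to do the conversion):

terraform output -json > outputs.json
# build a YAML attributes file from the outputs (names are illustrative)
jq -r '"EC2_INSTANCE_ID: \(.ec2_instance_id.value)"' outputs.json > attributes.yml
inspec exec ./test -t aws://eu-central-1 --attrs attributes.yml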

InSpec is a really powerful tool for that kind of check, for AWS, GCP, Azure, DigitalOcean and much more. It has hundreds of resources, and within each one a multitude of sub-checks for every component of your cloud systems, and even for operating systems: there are hundreds of checks for every Linux flavor you can imagine, and even for Windows.

You can find InSpec at inspec.io

Wrap up

Whether you are just getting started with Terraform or you are an experienced infrastructure-as-code user, it is very easy to get more value by at least using the correct version of Terraform and staying up to date.

Linting your code and checking for security, compliance and validity will help you improve your Terraform code quality very easily, across hundreds of different resources and cloud providers.


About us

CloudSkiff is an Infrastructure as code platform that provides Terraform automation and collaboration. We help growing teams safely ship infrastructure in short cycles and make their code better.