
Terraform Code Quality

Key steps to good quality for your infrastructure code.


This article is a transcript of a talk given at the London Cloud Native Computing Foundation Meetup.

The original talk is available here

Terraform code quality is important, and there are a lot of tools to improve it, though many of them are quite difficult to use. Here are a few tools that we find really useful and that can be set up in minutes.

Terraform code quality starts with the basics: Terraform validate

Terraform works with a provider for each cloud and exposes resources: essentially, you describe what you want, such as an instance to launch. Let's see how built-in tools can help you improve your Terraform code quality.

Terraform validate is a subcommand of Terraform that only checks structure and internal coherence, which means that obviously bad code like this will be perfectly valid in the eyes of Terraform:

provider "aws" {
  region = "BOGUS"
}

resource "aws_bogus_resource" "vm" {
  ami                    = "BOGUS"
  instance_type          = "BOGUS_TOO"
  vpc_security_group_ids = ["123456789"]
  key_name               = "BOGUS"
  tags = {
    Name = "CNCF London Meetup"
  }
}

Terraform validate is only useful when you request a non-existent resource type, make a typo in a variable name, or that kind of thing. It does not do much to improve the quality or the security of your infrastructure code.
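As a sketch of how this fits into a pipeline (the init step is needed before validating in recent Terraform versions, and the exact output wording varies between releases):

```shell
# Run from the directory containing the .tf files; validate exits 0
# when the configuration is syntactically valid, even if every value
# is bogus, as in the example above
terraform init -backend=false
terraform validate
```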


Terraform code quality has to be tested using the right Terraform version

The second basic thing we wanted to cover is the Terraform version: as we know, developers build their code against specific versions. In this case, Terraform 0.12.21 added features like support for Tencent Cloud storage and the trim functions.
terraform {
  required_version = "= 0.12.21"
  required_providers {
    aws    = "= 2.62"
    random = "~> 2.2"
  }
}

Running tests in a CI/CD system without the correct version of Terraform will end in failure even though your code is perfectly correct.
TFEnv is a very useful tool to have here. It is inspired by rbenv (for all the Ruby users who know that tool). Basically, it ensures that you are testing your code with the correct version of Terraform, which is very handy because managing dozens of versions in your CI/CD system is a real pain, and managing them by hand or with Docker is just a waste of time.

But keep in mind that this tool does not improve the quality of your Terraform code by itself. It just gives you a standard way to test it.

It will allow you to start building step by step a matrix of Terraform versions for your infrastructure code. We all have customers using older versions than ours or production environments slightly differing from our dev or staging environments.
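A typical TFEnv workflow might look like this (the version numbers are just examples):

```shell
# Install a specific Terraform version and pin it per project;
# tfenv picks up the .terraform-version file automatically when
# you run terraform from that directory
tfenv install 0.12.21
echo "0.12.21" > .terraform-version
terraform version
```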


You can find TFEnv at

How static checks can help you improve your Terraform code quality

Drilling down to other tools with static checks, TFLint is the first one really worth mentioning. Let's go back to our first example, where we requested a bogus AMI with a bogus instance type and an obviously unlikely security group ID, etc.
provider "aws" {
  region = "BOGUS"
}

resource "aws_bogus_resource" "vm" {
  ami                    = "BOGUS"
  instance_type          = "BOGUS_TOO"
  vpc_security_group_ids = ["123456789"]
  key_name               = "BOGUS"
  tags = {
    Name = "CNCF London Meetup"
  }
}

By executing TFLint, you statically analyze your code and see that this instance type is not valid. This tool might not cover everything, but a simple run will already be an improvement and instantly catch typos and mismatches.

But TFLint does a lot more and goes way beyond a static test. TFLint also has a deep check ability: using your credentials, it will ask the cloud provider's API whether the AMI you requested exists.

Even better, it will do the same for the security group ID and also check your ability to use it. Indeed, maybe the security group ID you requested does exist, but not in your account; in that case your code is perfectly correct but you do not have the right to use it.
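As an illustration, this is roughly how the two modes were invoked in the TFLint releases current at the time of the talk (newer versions configure deep checking in `.tflint.hcl` instead of a flag, so check the documentation for your release):

```shell
# Plain static analysis in the module directory
tflint

# Deep checking: TFLint queries the cloud provider's API using
# your credentials to verify AMIs, security group IDs, and so on
tflint --deep --aws-region=eu-west-1
```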


Using that kind of tool, you can start improving the quality of your code, because you will no longer push and apply things that look correct but are in fact unusable or painfully wrong in production.

TFLint also helps with reviews. If you are working in a team, you probably use pull requests to review your infrastructure code. Let's imagine that someone on your team requested a t1.micro. As a reviewer, you would probably tell your coworker: "t1 is a bit old. The performance is terrible and it costs more than a t3.micro today. You probably do not want to use that. Please change your instance type."

resource "aws_instance" "cncf-example" {
  instance_type = "t1.micro"
}

The tool will do this automatically for you if you plug it into the pull requests in your CI/CD system. It will save you a lot of time.

It can enforce best practices as well. As you know, you can type your variables in Terraform, defining them as strings, booleans, lists, etc., and add descriptions. It is probably your role as a reviewer to check for variable types and descriptions and say: "Hello, you forgot to type your variable. You should type it."

variable "foo" {
  default = "bar"
  # type        = string
  # description = "i'm a major variable"
}

So, same here: you can enforce that best practice directly from the pull request using TFLint. It takes only seconds and already brings you value from something that was quite easy to deploy.
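TFLint ships rules for exactly this kind of check. A minimal `.tflint.hcl` could enable them (rule names as they appear in TFLint's ruleset documentation; verify them against your TFLint version):

```hcl
# .tflint.hcl
rule "terraform_typed_variables" {
  enabled = true
}

rule "terraform_documented_variables" {
  enabled = true
}
```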

TFLint covers a lot of Terraform itself directly: configuration, requirements, pinned versions, etc. It supports AWS, and Azure through a plugin; GCP support is coming soon. You can find it at

Let's move on to a second tool now: TFSec. It performs static checks like TFLint, but the focus is more on security issues, positioned very early in the process.

Put yourself in the situation where someone on your team makes a pull request like this: "here's my secret key" or "I have a variable named password and here is its default value". You would probably comment on this pull request with something like: "Hmmm… are you sure?"

provider "aws" {
  region     = "eu-central-1"
  access_key = "12345"
  secret_key = "s3cr3t"
}

variable "password" {
  default = "sup3rs3cr3t"
}

Problem solved with TFSec: the tool does the job for you, and the author gets immediate feedback directly on the pull request. You gain a lot of time, and the code improves by itself: your developers learn on the job, getting feedback not from you but from tools that enforce what you wanted anyway.
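Running it is as simple as pointing it at a directory; it exits non-zero when it finds problems, which is what makes it easy to wire into CI (flag names may vary between TFSec releases):

```shell
# Scan the Terraform code in the current directory for security
# issues such as hard-coded credentials or public ACLs
tfsec .
```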

Let's take a more complex example: imagine that I don't know how to create an AWS S3 bucket, so I ask on Reddit, and Reddit gives me four lines that work perfectly. My next step would be to put them in a pull request, and my boss might read it and say: "OMG! You just created a bucket with public read/write. Where did you find this code? On Reddit???" TFSec would catch that and give you direct feedback.
resource "aws_s3_bucket" "example_from_reddit" {
  bucket = "my-secret-bucket"
  acl    = "public-read-write"
  tags = {
    Name = "my secret S3 bucket"
  }
}

Again, very obvious mistakes get corrected right at the pull request stage. Some would even like to put this earlier in the process, directly on the developers' laptops, but you can hardly force developers to configure pre-commit hooks on their machines…

But that’s the easy part….

What about logging? Following my previous example, TFSec just taught me that S3 buckets can have logging enabled. Before that, I did not even know that I could encrypt objects in my S3 buckets. So I started with only three or four lines of code and I end up with a properly secured bucket with an encrypted configuration. Once again, in just a couple of seconds this tool adds a lot of value, and your developer learned things and will not make the same mistake again: his or her next bucket will be created securely.

resource "aws_s3_bucket" "example_from_reddit" {
  bucket = "my-secret-bucket"
  acl    = "private"

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        kms_master_key_id = aws_kms_key.mykey.arn
        sse_algorithm     = "aws:kms"
      }
    }
  }

  logging {
    target_bucket = "my-logging-bucket"
    target_prefix = "log/"
  }

  tags = {
    Name = "My Secret S3 Bucket"
  }
}

TFSec works for Terraform on AWS, Azure and GCP and you can find it at

Terraform code quality: checking for compliance

So far, we have talked about the initial steps, with very static analysis, and we got a bit further with TFLint by asking the API for real IDs, checking against reality, etc.

If you're familiar with Terraform: after this analysis step comes the planning phase. The planning phase evaluates all your code and builds a diff against the cloud provider's API of what you are about to create. If you want a new S3 bucket and you don't have it, Terraform will make a plan to create it. Your next steps are to review the plan and then apply it to create that resource.

At this stage you might want to enforce some compliance with a tool creatively named Terraform compliance. 

A quick digression here, if you're not familiar with the Cucumber BDD (behavior-driven development) structure: it involves a feature name, a scenario, and conditions:

  • Given “blah blah blah” you are expected to have “this result”

This is executed in the background by some code, which then provides you with results. The BDD system is very useful for code in general, but here it is used for compliance, using natural language.

In our case, taking the previous example, we could have written something like this, and it would be perfectly supported by Terraform compliance against the plan.

  • Given I have AWS S3 Bucket defined
  • Then it must contain server_side_encryption_configuration

Let’s be more specific : 

resource "aws_s3_bucket" "example_from_reddit" {
  bucket = "my-secret-bucket"
  acl    = "private"
}

In this case we created this bucket but we didn't create a tag. We might have a process where it is very important that all our resources are tagged. So we can write a very simple scenario like this:

Feature: All Resources

    Scenario: Ensure all resources have tags
        Given I have resource that supports tags defined
        Then it must contain tags
        And its value must not be null 
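Terraform compliance runs against a plan file, so a sketch of the workflow looks like this (the `-p` and `-f` flags are the ones documented by the tool; the file and directory names are just examples):

```shell
# Produce a plan file, then run the BDD feature files against it
terraform plan -out=plan.out
terraform-compliance -p plan.out -f features/
```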

Why is that?

You can populate a tag from a variable, and it might pass all the previous steps but fail on just this one, because the variable went wrong. Maybe it was supposed to be a number and it ended up being null?

During the planning phase there is some computation, and this computation can go wrong. This is why compliance checks are very important once things are computed and no longer just statically analyzed.


Integrating those checks and their feedback in your CI/CD system (in a pull request) will give you a much better view of what your code is really doing, compliance-wise.

Terraform compliance is a provider-agnostic tool, and it even works with your own custom providers. There are a lot of ready-to-use examples, and you really can get started in minutes just by using the examples served directly in the documentation. It is obviously security-oriented, covering all the usual suspects, like KMS, etc.

This tool also has cool features, like letting you enforce naming conventions. You probably want to enforce naming for various items, such as your country, your continent, or resources; prefixes can be made self-documenting, and resources can be forbidden as well.

If you need to be PCI DSS compliant, you might simply be forbidden from using the list of AWS resources that are not PCI compliant. This can be done at the compliance level, well before asking the IAM user on AWS whether you have the rights to use those resources.

You can find Terraform compliance at

Don’t forget to perform integration tests to improve your Terraform code quality.

Between the planning and the verification phases come all the integration tools. We won't spend much time reviewing them, as this is not our focus here, but Terratest from Gruntwork is one of them. It is basically a Go library in which you include your code and manipulate Terraform programmatically using pure Go. It is highly powerful and will let you do absolutely anything you want, but it is also quite complicated to use.

Maybe you know other systems like Goss, or even Serverspec, ChefSpec, etc. Those tools are quite compact and take care of a lot of things for you, so they are easy to use. With Terratest, you need to do everything yourself: the framework is there to provide you tools within a language.

Verification checks: the last step towards Terraform code quality.

And now we can skip directly to verification and focus on a tool named InSpec. 

Although it originally comes from the Chef team, it is not related to Chef but rather to Serverspec, RSpec, and every "whatever-Spec" that has been around for the past years testing server things with RSpec, and it is really complete, as you will see.

The first question we need to answer is why would we need to validate what we just launched?

Because a lot of things can go wrong while looking correct: let's say we launched a simple EC2 instance in a VPC using a security group, with a mix of the usual things like dynamic names and tags. Everything went right: it tested right, it planned right, and it applied correctly as well.

As you can see, the code that created it is very simple: there is a data source (a data source is simply a request to the cloud provider's API), a global variable that can be overridden, a simple internal lookup of another resource in your Terraform code, and a local variable.

resource "aws_instance" "vm" {
  ami                    = data.aws_ami.ubuntu.id            # data source lookup (name illustrative)
  instance_type          = var.instance_type                 # overridable global variable
  vpc_security_group_ids = [aws_security_group.allow_ssh.id] # internal lookup of another resource
  key_name               = aws_key_pair.admin.key_name

  tags = {
    Name      = local.vm_name,
    Terraform = "true"
  }
}

There are a lot of ways this whole mix can look right in static analysis, and in compliance as well, but turn out not to be what you expected: maybe the AMI returned exists and is correct, but it is not the one you expected; or the default instance type was something like a t3.small and you thought your value overrode it but it didn't… This is why checking against reality is very important and why validation exists.

So how does it work?

It is based on simple expectations, and you can inject dynamic content straight from the Terraform state, mixing dynamic information with static information like a "t3.nano" instance type, strings, booleans, or whatever…

describe aws_ec2_instance(EC2_INSTANCE_ID) do
  it { should be_running }
  its('instance_type') { should eq 't3.nano' }
  its('security_groups') { should include(id: EC2_SG_ID, name: 'allow_ssh') }
  its('tags') { should include(key: 'Terraform', value: 'true') }
end

describe aws_security_group(group_name: 'allow_ssh') do
  it { should exist }
  its('group_name') { should eq 'allow_ssh' }
  its('vpc_id') { should eq VPC_ID }
end
As it is really readable, it doesn't require a high level of coding skill. You can get started very easily and immediately get a lot of value from it. There are a lot of places in the code where things could have gone wrong, and now you can check them directly from the Terraform state.

If you're not familiar with what a state is in Terraform: it is what you get after the apply phase. Once you apply your code on your cloud provider, you get back a list of values that you could not have known beforehand.

output "public_ip" {
  value = aws_instance.vm.public_ip
}

output "vpc_id" {
  value = aws_security_group.allow_ssh.vpc_id # illustrative: any resource exposing the VPC ID works
}

output "ec2_instance_id" {
  value = aws_instance.vm.id
}

output "aws_security_group_id" {
  value = aws_security_group.allow_ssh.id
}

For example, you might not know what your VPC ID is, but you want to know whether you are using the default one. You can get this straight from the state with InSpec.
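A sketch of wiring the two together: read an output from the state with `terraform output` and pass it to an InSpec profile as an input (the profile path and input name are illustrative, and the exact flags depend on your InSpec version):

```shell
# Feed a freshly applied Terraform output value into an InSpec run
inspec exec ./verify -t aws:// \
  --input EC2_INSTANCE_ID="$(terraform output ec2_instance_id)"
```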

InSpec is a really powerful tool for that kind of check, for AWS, GCP, Azure, DigitalOcean, and much more. It has hundreds of checks, and inside each check a multitude of sub-checks for every component of your cloud systems, and even for operating systems: there are hundreds of checks for every Linux flavor you can imagine, and even Windows.

You can find InSpec at

Wrap up

Whether you are just getting started with Terraform or are an experienced infrastructure-as-code user, it is very easy to get more value by at least using the correct version of Terraform and staying up to date.

Linting your code and checking for security, compliance, and validity will help you improve your Terraform code quality very easily, across hundreds of different resources and cloud providers.
