How we structure our Terraform code


The difficult part about structuring a Terraform project is that several solutions exist and there is no universal guide. The right code structure depends heavily on the size and complexity of the project, the number of environments, how often the infrastructure changes, and so on.
For simple projects that contain just a few resources, we could put everything inside a single main.tf file and it would work out fine. However, when resources start to add up, we need a proper structure to keep our infrastructure scalable and maintainable.

Keywords

Before talking about structure, there are some key Terraform concepts we need to know.

  • Resources: A resource is a cloud component that is created based on a specified configuration (e.g. virtual networks, compute instances, DNS records).
  • Data sources: Allow Terraform to use information defined outside of Terraform. In contrast to resources, data sources only give us information and do not create anything.
  • Providers: Plugins that Terraform uses to interact with cloud providers. A configuration must declare the providers it requires so that Terraform can install and use them.
  • Variables and outputs: Variables serve as parameters for a Terraform module, so users can customize behavior without editing the source. Outputs are like return values for a Terraform module.
  • Modules: Containers for multiple resources that are used together.
  • State: Mechanism used by Terraform to keep track of all the resources that are deployed in the cloud.
  • Workspace: Each Terraform configuration has an associated backend that defines how operations are executed and where persistent data such as the Terraform state is stored. The persistent data stored in the backend belongs to a workspace, so each workspace keeps its own state.
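
To make these concepts concrete, here is a minimal sketch that puts a provider, two variables, a data source, a resource, and an output side by side. It is not taken from the sample project: the version constraint, the default region, and the bucket naming are assumptions chosen only for illustration.

```hcl
# Declares which provider plugins Terraform has to install.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed version constraint
    }
  }
}

# Provider: configures how Terraform talks to AWS.
provider "aws" {
  region = var.region
}

# Variables: input parameters of the configuration.
variable "region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "eu-west-1" # assumed default
}

variable "app_name" {
  type        = string
  description = "Name of the application, used as a prefix for resource names"
}

# Data source: reads information about the current AWS account, creates nothing.
data "aws_caller_identity" "current" {}

# Resource: a component that Terraform creates and tracks in its state.
resource "aws_s3_bucket" "files" {
  # Hypothetical naming scheme: app name plus account ID.
  bucket = "${var.app_name}-files-${data.aws_caller_identity.current.account_id}"
}

# Output: a value returned to whoever runs or calls this configuration.
output "files_bucket_arn" {
  value = aws_s3_bucket.files.arn
}
```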

Recommendations

Here are a few quick tips that could help us build a better-structured Terraform project (more information can be found in this article):

  • Don't put your entire infrastructure code in a single file. It is easier and faster to work with a smaller number of resources.
  • When breaking down components, keep in mind the blast radius, rate of change, the scope of responsibility, and ease of management. Insulating unrelated resources reduces the risk in case something goes wrong.
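  • Use a remote state. Keeping the Terraform state on a local machine is not an option for real-life projects, and managing it in Git could easily turn into a nightmare: the state can contain sensitive values in plain text, and Git offers no locking to stop two people from applying changes at the same time.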
  • Use a consistent structure and naming convention to make the code readable and maintainable.
  • Don't hardcode values if they can be passed as variables or defined as data sources.
  • Separate environments into their own directories.
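
As an illustration of the tip about hardcoded values, the sketch below passes the environment and the VPC CIDR range as variables and looks up the availability zones through a data source instead of spelling them out. The variable names, the default CIDR range, and the choice of two subnets are assumptions made for this example, not part of the sample project.

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment"

  # Reject anything other than the three environments used in this article.
  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "The environment must be development, staging, or production."
  }
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR range of the VPC"
  default     = "10.0.0.0/16" # assumed default
}

# Data source: asks AWS which availability zones exist in the current region,
# so zone names never have to be hardcoded.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr

  tags = {
    Name = "${var.environment}-vpc"
  }
}

# Two public subnets, each in a different availability zone.
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  # Carves a /24 out of the VPC range for each subnet.
  cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)

  tags = {
    Name = "${var.environment}-public-${count.index}"
  }
}
```

The same code then works unchanged in any region or environment; only the variable values differ.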

Proposed structure

The following structure is best suited to mid-sized and large projects, but it could also be used for smaller ones. We will use AWS as the cloud provider, with a single AWS account and three different environments (development, staging, production).

The sample project contains the following components:

  • A Node.js Express API
  • Two React.js clients
  • A Postgres database

The following AWS services will be used:

  • IAM for managing users, roles, and permissions.
  • Security groups for controlling access to instances.
  • VPC, Subnets, Internet Gateway, Route Tables, and Route 53 records for networking.
  • RDS for the database.
  • ECR and ECS for managing our containerized API.
  • EC2, ALB, and Auto Scaling groups for scaling and load balancing.
  • S3 for static website hosting and for storing files.
  • CloudWatch for monitoring.
  • SNS for notifications.

Let's now see how the code is structured to manage all the above-mentioned resources.

infrastructure
├── modules
│   ├── containers
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── templates
│   │   │   └── app.json.tpl
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── database
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── management
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── network
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── notifications
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── templates
│   │   │   └── email-sns-stack.json.tpl
│   │   ├── variables.tf
│   │   └── versions.tf
│   ├── scaling
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   ├── security
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── templates
│   │   │   ├── ecs-ec2-role-policy.json.tpl
│   │   │   ├── ecs-ec2-role.json.tpl
│   │   │   └── ecs-service-role.json.tpl
│   │   ├── variables.tf
│   │   └── versions.tf
│   └── storage
│       ├── main.tf
│       ├── outputs.tf
│       ├── variables.tf
│       └── versions.tf
├── development
│   ├── main.tf
│   ├── outputs.tf
│   ├── provider.tf
│   ├── variables.tf
│   └── versions.tf
├── staging
│   ├── main.tf
│   ├── outputs.tf
│   ├── provider.tf
│   ├── variables.tf
│   └── versions.tf
└── production
    ├── main.tf
    ├── outputs.tf
    ├── provider.tf
    ├── variables.tf
    └── versions.tf

As we can see from the directory tree above, there are four directories at the root level of the project:

  • Three separate directories, one per environment (development, staging, production): Each of these directories contains the infrastructure definition for the specific environment. Although the structure is the same, it does not mean that all environments have the same infrastructure definition. They could differ in their configuration options (e.g. in a production environment the autoscaling group would have more than one EC2 instance running, whereas in a development or staging environment this wouldn't be necessary). They could also differ in the components they contain (e.g. we could omit CloudFront or certain monitoring tools for development or staging, to save costs).
  • A directory for modules: We defined what modules are in the first section. In each module, we have grouped resources based on their functionality and made them reusable. These modules are then called from the main.tf file of each environment definition.
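
To illustrate how two environments can share a module yet differ in configuration, here is a hedged sketch of how the scaling module might be called from the development and production directories. The input names (environment, min_size, max_size, desired_capacity) and the values are hypothetical; the actual module may expose different variables.

```hcl
# development/main.tf (hypothetical excerpt)
# A single instance is enough while developing.
module "scaling" {
  source = "../modules/scaling"

  environment      = "development"
  min_size         = 1
  max_size         = 1
  desired_capacity = 1
}

# production/main.tf (hypothetical excerpt, lives in a separate file)
# Same module, but with room to scale out under load.
module "scaling" {
  source = "../modules/scaling"

  environment      = "production"
  min_size         = 2
  max_size         = 6
  desired_capacity = 2
}
```

Because each environment has its own directory and state, the two calls never conflict even though they share the same module name.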

Let's now see how a module is structured. We will study the storage module for simplicity, as it has fewer components than other modules. Here are the contents of the "storage" directory:

  • The main.tf file: This is the most important file within a module. Here we define all the necessary resources and their configurations. To see what configuration parameters each resource accepts, we can check the official documentation in the Terraform Registry. In our example, we define three S3 bucket resources (two buckets for static website hosting and one bucket for file storage). In addition, we define three S3 bucket policy resources to control access to each bucket.
  • The variables.tf file: In this file, we define the external variables that this module expects. Variables help us make our modules reusable. In our example, we define three variables that are needed for the bucket names (the app name, the environment, and the domain).
  • The outputs.tf file: In this file, we define the return values of this module. In our example, we have defined some outputs that are needed by the network module and by the CI/CD scripts.
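
The three files described above could look roughly like the following sketch. It covers only one of the website buckets and its policy, and it is not the code from the sample project: the bucket naming scheme, the resource labels, and the output names are assumptions, and it presumes version 4 or later of the AWS provider, where website hosting is configured through a separate resource.

```hcl
# modules/storage/variables.tf
variable "app_name" {
  type        = string
  description = "Name of the application"
}

variable "environment" {
  type        = string
  description = "Environment the bucket belongs to"
}

variable "domain" {
  type        = string
  description = "Domain under which the client is served"
}

# modules/storage/main.tf
resource "aws_s3_bucket" "client" {
  # Hypothetical naming scheme built from the three variables.
  bucket = "${var.app_name}-${var.environment}.${var.domain}"
}

# Enables static website hosting on the bucket.
resource "aws_s3_bucket_website_configuration" "client" {
  bucket = aws_s3_bucket.client.id

  index_document {
    suffix = "index.html"
  }
}

# Policy document that lets anyone read the objects in the bucket.
data "aws_iam_policy_document" "client_read" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.client.arn}/*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }
  }
}

resource "aws_s3_bucket_policy" "client" {
  bucket = aws_s3_bucket.client.id
  policy = data.aws_iam_policy_document.client_read.json
}

# modules/storage/outputs.tf
output "client_bucket_name" {
  value = aws_s3_bucket.client.id
}

output "client_website_endpoint" {
  value = aws_s3_bucket_website_configuration.client.website_endpoint
}
```

Depending on the account's S3 Block Public Access settings, a public-read policy like this one may be rejected until those settings are relaxed for the bucket.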

Finally, let's see how the modules are used. We will check the development directory and see what files it contains.

  • The provider.tf file: Here we configure the provider and the backend (we use Terraform Cloud as a remote backend).
  • The versions.tf file: Here we define the minimum required Terraform version.
  • The main.tf file: This is the place where we call all the necessary modules and pass them the required configurations. As an example, to use our storage components we would define the "storage" module block and pass it the following parameters: the source (the relative path to the module if it is defined locally, or the remote URL) and the expected variables that we mentioned above (the app name, the environment, and the domain).
  • The variables.tf file: Here we define the variables that the whole development infrastructure stack requires. Non-sensitive variables can be given default values, whereas sensitive ones must be supplied through a terraform.tfvars file kept out of version control (if we run Terraform locally) or, in our case, through the Terraform Cloud workspace configuration.
  • The outputs.tf file: Here we define a few return values that are needed by the CI/CD scripts (e.g. the ALB domain name, which is needed when building the React client applications).
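
Put together, the files of the development directory might look like the sketch below. The organization and workspace names, the version constraints, the variable names, and the storage module's output name are all hypothetical placeholders, and the cloud block assumes Terraform 1.1 or later.

```hcl
# development/versions.tf
terraform {
  required_version = ">= 1.1.0" # assumed minimum version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed version constraint
    }
  }
}

# development/provider.tf
terraform {
  # Terraform Cloud as the remote backend; both names are placeholders.
  cloud {
    organization = "example-org"

    workspaces {
      name = "example-app-development"
    }
  }
}

provider "aws" {
  region = var.region
}

# development/variables.tf
variable "region" {
  type    = string
  default = "eu-west-1" # assumed default
}

variable "app_name" {
  type = string
}

variable "domain" {
  type = string
}

# Sensitive values get no default; they come from the workspace configuration.
variable "db_password" {
  type      = string
  sensitive = true
}

# development/main.tf
module "storage" {
  source = "../modules/storage"

  app_name    = var.app_name
  environment = "development"
  domain      = var.domain
}

# development/outputs.tf
output "client_bucket_name" {
  # Assumes the storage module exposes an output with this (hypothetical) name.
  value = module.storage.client_bucket_name
}
```

The staging and production directories would follow the same layout, each pointing at its own workspace so that every environment keeps a separate state.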

Final words

To conclude, the above structure can be useful for the majority of project types and sizes. Splitting code makes the infrastructure more readable and maintainable. We should, however, pay extra attention to how we define modules, how dynamic and reusable we make them, and how much functionality they encapsulate.

For the full implementation (including some sample applications), please check this repository.
