Scott's Blog

Why You Should Use IaC

Infrastructure as Code (IaC) isn’t a new concept, but it’s one a lot of folks are missing out on. I’m a huge believer that cloud resources should be in a stable and predictable state. Using tools such as Terraform, Azure ARM templates, or any other IaC tool creates that predictable state. I’m going to talk a little about why it’s important, and share my opinion on how best to implement it.

Before we get started, I do want to say this isn’t a guide on any specific technology. Rather, it’s a discussion on why it’s important for scalable systems to use IaC technology.

Predictability

Keeping infrastructure predictable is one of the hardest battles a SysAdmin will fight. As a cloud-based team grows, people build new resources the way they know how, not the way the team has decided to standardize its structure. Templates used for deployment are the best way to give every resource a reusable, standard structure.

Say your team’s standard is that all Azure Resource Groups are created in the eastus2 region, because that’s where all your customers are. Let’s also say that you are part of a global team supporting this one region. Developers in Europe would prefer the resources they build and support to be closer to them, to reduce latency. When they create a new resource group in Azure, they select a European region and deploy their resources accordingly. Now your customers are frustrated because your web app’s latency is higher for them, and after some investigation, you discover that the resource group is in the wrong region.

By creating a Terraform template, we can consistently build a set of resources in a desired and standard way. Let’s look at some code:

# Configure the Azure provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.65"
    }
  }

  required_version = ">= 0.14.9"
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = "eastus2"
}

Here, our main.tf creates a resource group in eastus2 and takes the name of the resource group as a variable. Next, we need a variables.tf file to declare which variables we expect to pass when we run terraform apply.

variable "resource_group_name" {
  description = "Name of the resource group to create"
  type        = string
  default     = "myTFResourceGroup"
}

Since we’ve provided a default value, running terraform apply will create a resource group called myTFResourceGroup in eastus2. We can also override the name with the -var flag. Here’s what that might look like: terraform apply -var "resource_group_name=my_new_resource"

With resource_group_name set this way, a resource group named my_new_resource is predictably created in eastus2.
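Passing -var on every run gets tedious as the number of variables grows. A common alternative is a variable definitions file: Terraform loads terraform.tfvars and any *.auto.tfvars file automatically, and other files can be passed with -var-file. The file name prod.tfvars below is a hypothetical example.

```hcl
# prod.tfvars (hypothetical file name): one assignment per declared variable
resource_group_name = "my_new_resource"
```

Running terraform apply -var-file="prod.tfvars" should produce the same resource group as the -var example, and because the file can live in the repository next to main.tf, every deployment of that environment uses the same values.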

Stability

Stability is our biggest goal as DevOps Engineers, Site Reliability Engineers, or even Application Developers. We want our service to be available to our user base without disrupting their day-to-day operations. The misplaced resource group from earlier is a good illustration of why IaC matters for stability, too: if the bulk of our customers or users are located in a specific region, it doesn’t make sense to place our resources outside of that region.
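If you want to let people choose a region but still keep them inside the approved ones, Terraform variable validation can enforce that at plan time. The sketch below is a minimal example assuming a hypothetical location variable that replaces the hardcoded "eastus2" in the resource group from main.tf; the approved list is also an assumption.

```hcl
# Hypothetical variable: callers may pick a region, but only from an approved list
variable "location" {
  description = "Azure region for the resource group"
  type        = string
  default     = "eastus2"

  validation {
    # Extend this list only once the team agrees to support another region
    condition     = contains(["eastus2"], var.location)
    error_message = "The location must be an approved region (currently only eastus2)."
  }
}

# Same resource group as in main.tf, now reading the validated variable
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = var.location
}
```

With this in place, a run such as terraform plan -var "location=westeurope" stops with the validation error before anything is created, instead of quietly deploying to Europe.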

Another great way to achieve stability is proper monitoring and telemetry on your resources. Taking an Azure Virtual Machine as an example, we want to collect performance metrics from every VM we deploy. That can be achieved by adding a Log Analytics extension in the VM’s extension settings. Great, so each time we create a new VM, we just go add the extension and select the correct Log Analytics workspace. But, oh no! A production VM just went offline, and we forgot to add it to Log Analytics.

It’s never a good feeling when you don’t have the data you need during a root cause investigation. IaC fixes that: using our templates, we can ensure each VM we create reports to a specific Log Analytics workspace. That gives us a predictable deployment pattern, so we always have the data we need to analyze live system performance or to diagnose an issue. Here’s one way to achieve that using Terraform:

In our main.tf, we can add:

resource "azurerm_virtual_machine_extension" "mmaagent" {
  name                       = "mmaagent"
  virtual_machine_id         = azurerm_virtual_machine.vm.id
  publisher                  = "Microsoft.EnterpriseCloud.Monitoring"
  type                       = "MicrosoftMonitoringAgent"
  type_handler_version       = "1.0"
  auto_upgrade_minor_version = true

  # Public settings: the Log Analytics workspace the agent reports to
  settings = jsonencode({
    workspaceId = var.la_workspace_id
  })

  # Protected settings are encrypted; use them for secrets such as the workspace key
  protected_settings = jsonencode({
    workspaceKey = var.la_workspace_key
  })
}

And in our variables.tf:

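variable "la_workspace_id" {
  description = "ID of the Log Analytics workspace that VMs report to"
  type        = string
  default     = "[WORKSPACE_ID]"
}

variable "la_workspace_key" {
  description = "Key for the Log Analytics workspace"
  type        = string
  default     = "[WORKSPACE_KEY]"
  sensitive   = true # keeps the key out of plan and apply output
}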

Above, our main.tf defines a virtual machine extension named mmaagent, which installs the Microsoft Monitoring Agent on the VM referenced as azurerm_virtual_machine.vm (assumed to be defined elsewhere in main.tf). The la_workspace_id and la_workspace_key variables have default values, so machines are enrolled into our desired workspace by default, but we can override them if we’d like to.
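If the Log Analytics workspace is also managed by Terraform, the ID and key don’t need to be typed into variables.tf at all. The sketch below assumes a hypothetical workspace named example-law living in the resource group from the first example (azurerm_resource_group.rg); workspace_id and primary_shared_key are attributes the azurerm_log_analytics_workspace resource exports.

```hcl
# Hypothetical workspace managed in the same configuration
resource "azurerm_log_analytics_workspace" "law" {
  name                = "example-law"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Values the mmaagent extension needs, read from the workspace itself
locals {
  la_workspace_id  = azurerm_log_analytics_workspace.law.workspace_id
  la_workspace_key = azurerm_log_analytics_workspace.law.primary_shared_key
}
```

The extension’s settings could then reference local.la_workspace_id and local.la_workspace_key instead of the variables, so the workspace key never has to be stored as a default value in the repository.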

Security

Now the fun part: security. Security is usually the most critical part of your infrastructure: ensuring your data stays within your walls and that no one who shouldn’t have it can get it. Usually, this is achieved with a complex firewall architecture and proper permission systems. In Azure, a complex microservice architecture might accomplish this with Network Security Groups (NSGs). Since your architecture is highly complex, you might want to deploy multiple NSGs depending on the application you’re running. A web service would get an NSG that allows inbound traffic on port 443 while blocking all remaining traffic. An MSSQL database would allow traffic from that application on port 1433, but block all other traffic.

IaC adds predictability and stability to these NSGs when resources are deployed. Let’s check back in with our pal Terraform to look at an example.

Our main.tf will look something like:

resource "azurerm_resource_group" "example" {
  name     = var.resource_group_name
  location = "eastus2"
}

resource "azurerm_network_security_group" "app_traffic" {
  name                = "app_traffic_nsg"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  security_rule {
    name                       = "https"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"   # clients connect from any ephemeral port
    destination_port_range     = "443" # HTTPS on the application
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "denyAll"
    priority                   = 1000
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

resource "azurerm_network_security_group" "db_traffic" {
  name                = "db_traffic_nsg"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  security_rule {
    name                       = "MSSQL"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"    # clients connect from any ephemeral port
    destination_port_range     = "1433" # SQL Server listener on the database
    source_address_prefix      = "10.0.0.0/24"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "denyAll"
    priority                   = 1000
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

And our variables.tf:

variable "resource_group_name" {
  default = "my_new_resource"
}

In the above example, we allow inbound traffic to our application on port 443 and deny all other inbound traffic. Our database only accepts incoming connections on port 1433 from our private VNet address range of 10.0.0.0/24. Each time we use Terraform to deploy a new resource group, we predictably and reliably create the appropriate network segregation between our database and applications, without having to worry about whether we’ve missed a step.
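One step this example leaves implicit: an NSG has no effect until it is associated with a subnet or a network interface. A minimal sketch of the subnet approach follows; the virtual network, subnet names, and address ranges are hypothetical, chosen so the app subnet matches the 10.0.0.0/24 source range the database NSG allows.

```hcl
# Hypothetical network: one subnet for the app tier, one for the database
resource "azurerm_virtual_network" "example" {
  name                = "example-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "app" {
  name                 = "app-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.0.0/24"] # matches the db NSG's allowed source
}

resource "azurerm_subnet" "db" {
  name                 = "db-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.1.0/24"]
}

# Attach each NSG to its subnet so the rules actually apply
resource "azurerm_subnet_network_security_group_association" "app" {
  subnet_id                 = azurerm_subnet.app.id
  network_security_group_id = azurerm_network_security_group.app_traffic.id
}

resource "azurerm_subnet_network_security_group_association" "db" {
  subnet_id                 = azurerm_subnet.db.id
  network_security_group_id = azurerm_network_security_group.db_traffic.id
}
```

Because the associations live in the same template as the NSGs, a new resource group can’t end up with NSGs that exist but protect nothing.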

Compliance

Here’s where IaC adds some complexity. So far, I’ve talked about writing these templates and deploying them from a local machine. But that means anyone with access to the templates could modify them before a deployment. To mitigate that risk, we should store our Terraform templates in a version control system such as GitHub and use a CI/CD provider, such as GitHub Actions, to deploy the resources only after a pull request has been created and approved by another member of the team, or maybe by a change control board.

IaC goes hand in hand with automated deployments and approval processes. With this deployment mechanism, you can ensure that no one changes your Terraform templates without approval, and you get a validation step that catches errors before they reach production. Ideally, branch protection rules restrict deployments to your default branch (usually main), while other branches run terraform plan to show the proposed changes before they are approved and deployed.
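Deploying from a pipeline also means Terraform’s state file can no longer live on one person’s machine. One option is the azurerm backend, which keeps state in an Azure Storage blob container and locks it during runs. The sketch below uses hypothetical resource group, storage account, and container names, and assumes those already exist, since the backend does not create them.

```hcl
terraform {
  # Shared remote state so CI/CD runs and teammates all see the same state
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"                 # hypothetical
    storage_account_name = "tfstatestorage"             # hypothetical
    container_name       = "tfstate"                    # hypothetical
    key                  = "iac-demo.terraform.tfstate" # blob name for this configuration
  }
}
```

The values can also be left out of the block and supplied at terraform init time with -backend-config, which keeps environment-specific names out of the template.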

Conclusion

We talked about why IaC is important for four of the major concerns of a DevOps Engineer: Predictability, Stability, Security, and Compliance. If you work in this space, there’s a solid chance these four items are on your list of things that are irritating to deal with. You can make them less irritating by implementing an IaC solution with automated deployments.

If you have any questions, comments, or wanna talk about how you use IaC and pipelines to deploy your infrastructure, please reach out to me on Twitter @scwheele

Until next time, go learn something new! :)