Infrastructure as Code

IaC Security

Security controls that keep infrastructure code from turning small mistakes into exposed resources.

IaC security is the discipline of preventing unsafe infrastructure from being provisioned in the first place, by catching misconfigurations in code before they become live cloud resources. The core insight is that the same code-review and automation discipline applied to application code can be applied to infrastructure definitions, making security checks automatic, consistent, and fast. In DevSecOps, IaC security shifts the cost of finding a misconfiguration from an incident response exercise to a pull-request comment, which is orders of magnitude cheaper and faster to resolve.

Learning objectives

What you should be able to do after reading.
  • Catch misconfigurations before deployment.
  • Explain how policy and review reduce unsafe infrastructure changes.
  • Protect state, credentials, and execution paths from unnecessary exposure.

At a glance

Fast mental model before you dive in.
Shift left
  • Misconfiguration scanning
  • Policy as code
  • Review gates
Protect the control plane
  • Least privilege
  • State security
  • Trusted automation
Reduce blast radius
  • Secure defaults
  • Narrow exceptions
  • Guardrails

Core idea

The safest IaC workflow prevents bad infrastructure from being created in the first place. Security checks should run where engineers already work, so risky changes are visible before they become live resources. A misconfiguration detected in a pull request is fixed in minutes. The same misconfiguration discovered after deployment may require incident response, customer notification, and compliance reporting.

That means scanning configuration for known patterns, enforcing policy automatically, and protecting the machinery that runs the automation. Each of these layers addresses a different risk. Scanning finds known-bad patterns, policy enforcement blocks classes of decisions that should not be made without explicit review, and protecting the automation prevents an attacker from using the IaC pipeline itself as an entry point.

Prevention

  • Run misconfiguration scans on every pull request so findings appear before merge, not after deployment.
  • Use policy-as-code checks to automatically block obviously unsafe patterns, such as publicly accessible storage and open security groups.
  • Keep review requirements in the same workflow as the code change so security decisions are not a separate process that gets skipped under time pressure.

Baseline

  • Store state in a backend with tight access control, encryption at rest, and versioning so the state is protected like a production credential.
  • Give CI/CD automation only the cloud permissions it needs for the current scope, scoped to environment and purpose.
  • Apply default-deny thinking when a resource or setting can introduce public exposure or broad access.

Signals to watch for

Patterns worth investigating further.
  • Repeated policy exceptions are becoming normal.
  • Public resources appear without a clear reason or owner.
  • Too many people can read or modify state and credentials.

DEEP DIVE

Misconfiguration scanning

Misconfiguration scanning analyses IaC files for patterns that are known to create security risks when deployed. The scan runs against Terraform HCL, CloudFormation YAML, Kubernetes manifests, and other infrastructure definition formats without executing any deployment. Findings represent misconfigurations in the code that would become live security weaknesses if applied, such as an S3 bucket with public access enabled, a security group with ingress open to 0.0.0.0/0, or a database without encryption at rest.
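
As a minimal sketch of what one such check does, the Python below walks a list of already-parsed resource definitions and reports security groups whose ingress is open to 0.0.0.0/0. The dictionary shape and the names Finding and find_open_ingress are hypothetical, chosen for illustration. Real scanners parse HCL or plan output themselves and ship far more rules.

```python
from dataclasses import dataclass

OPEN_CIDR = "0.0.0.0/0"


@dataclass(frozen=True)
class Finding:
    resource: str
    message: str


def find_open_ingress(resources):
    """Return a Finding for every security group rule open to the whole internet.

    `resources` is a list of dicts with hypothetical keys "type", "name", and
    "ingress" (a list of rules with "from_port", "to_port", "cidr_blocks").
    Nothing is deployed or contacted; the check only reads the definitions.
    """
    findings = []
    for res in resources:
        if res.get("type") != "aws_security_group":
            continue
        for rule in res.get("ingress", []):
            if OPEN_CIDR in rule.get("cidr_blocks", []):
                findings.append(
                    Finding(
                        resource=f'{res["type"]}.{res["name"]}',
                        message=(
                            f'ports {rule["from_port"]}-{rule["to_port"]} '
                            f"open to {OPEN_CIDR}"
                        ),
                    )
                )
    return findings
```

The check is purely static, which is why it can run on every pull request without cloud credentials.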

The major open-source IaC scanning tools are Checkov, tfsec, and KICS. Checkov is written in Python and supports the broadest range of IaC formats, including Terraform, CloudFormation, Kubernetes, Helm, and ARM templates, with over 1,000 built-in checks. It also supports custom policies written in Python or YAML. tfsec is a Terraform-focused scanner written in Go that is fast enough to run on every save in a developer's editor, providing near-instant feedback; its maintainers at Aqua Security now steer new users towards Trivy, which has absorbed tfsec's checks. KICS (Keeping Infrastructure as Code Secure) is an open-source project by Checkmarx that covers a broad range of formats and is designed for CI integration.

Integrating scanners into the CI pipeline ensures that no IaC change can be merged without passing a baseline security check. The scanner runs as a pipeline step on every pull request, and the results are posted as PR comments or pipeline status checks. Setting a failure threshold, for example failing the pipeline on any critical or high severity finding, turns the scanner from an advisory tool into an enforcement gate. Teams should tune their rulesets to the specific cloud and IaC combination in use, disabling checks that do not apply and enabling additional checks for their specific risk profile.
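
The failure threshold can be sketched as a small function that turns a list of findings into a pipeline exit code. The finding shape (a dict with a severity key) and the name gate_exit_code are assumptions for illustration, not any scanner's actual report format.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def gate_exit_code(findings, fail_on="high"):
    """Return 1 if any finding is at or above `fail_on`, otherwise 0.

    `findings` is a list of dicts with a "severity" key (hypothetical shape).
    """
    threshold = SEVERITY_ORDER.index(fail_on)
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"].lower()) >= threshold
    ]
    return 1 if blocking else 0
```

A CI step could pass the result to sys.exit so that a non-zero code fails the job, while lower-severity findings are still reported without blocking the merge.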

A common mistake is running IaC scanning only in CI and not in the developer's local environment. Developers who only see scan results after pushing a commit get slower feedback and are more likely to have to rework code. IDE plugins and pre-commit hooks that run the scanner locally give immediate feedback at the earliest possible point. The same rules should apply in both places so there are no surprises when the CI pipeline runs.

Policy as code

Policy as code expresses security and governance rules as machine-executable logic that can run automatically in the pipeline. Instead of relying on human reviewers to remember every security requirement every time, the rules are encoded once and enforced consistently. A policy that says all S3 buckets must have server-side encryption enabled will catch that misconfiguration on every PR, regardless of which engineer wrote it or how experienced they are with cloud security.

Open Policy Agent (OPA) is the most widely used policy engine for infrastructure policy. OPA uses the Rego language to express policies as logical rules. Conftest is a tool that wraps OPA and makes it straightforward to apply Rego policies to any structured data format including Terraform plan output, Kubernetes YAML, and Dockerfile content. HashiCorp Sentinel is a similar policy-as-code framework built into Terraform Cloud and Terraform Enterprise, designed specifically for Terraform workflows and capable of enforcing policies on plan output before apply is permitted.

Good policy as code is precise, readable, and directly tied to a concrete security or compliance requirement. A policy should specify exactly what it is checking, why the check matters, and what a passing configuration looks like. Vague policies that are difficult to satisfy or that fire on benign configurations erode trust in the policy system and push teams to request exemptions or bypass the checks. A policy library should be maintained like a codebase: reviewed, tested, versioned, and retired when no longer relevant.
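
One way to keep the what, the why, and the check together is to store them in a single record. The sketch below is plain Python rather than Rego or Sentinel, and the Policy class, the S3-001 identifier, and the bucket dictionary shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    policy_id: str                  # stable identifier used in PR comments
    requirement: str                # what the policy checks
    rationale: str                  # why the check matters
    check: Callable[[dict], bool]   # returns True when the resource passes


def bucket_is_encrypted(bucket: dict) -> bool:
    """Pass only when the (hypothetical) bucket dict declares an SSE algorithm."""
    return bucket.get("sse_algorithm") in {"AES256", "aws:kms"}


S3_ENCRYPTION = Policy(
    policy_id="S3-001",
    requirement="Every S3 bucket enables server-side encryption.",
    rationale="Unencrypted buckets fail the data-at-rest requirement.",
    check=bucket_is_encrypted,
)


def evaluate(policy: Policy, resource: dict) -> str:
    """Return a verdict line a reviewer can act on without opening the policy."""
    if policy.check(resource):
        return f"PASS {policy.policy_id}"
    return f"FAIL {policy.policy_id}: {policy.requirement} ({policy.rationale})"
```

Because the policy is ordinary code, it can carry unit tests with known-good and known-bad fixtures, which is what makes the review, version, and retire cycle practical.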

Policy as code is most effective in IaC workflows when it is enforced both during PR review and at plan time. Running policies against the plan output catches issues that emerge from resource interactions and are not visible in the code alone. For example, a policy that checks whether any IAM policy grants wildcard permissions is more reliable when run against the plan-time representation of the IAM policy JSON than against the Terraform source that generates it.
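
The plan-time IAM check can be sketched against the JSON that terraform show -json produces for a saved plan. The function name wildcard_iam_policies is hypothetical, and this version only flags statements that allow every action on every resource; a production policy would normally be written in Rego or Sentinel and cover more cases.

```python
import json


def _as_list(value):
    """IAM accepts either a single value or a list for these elements."""
    return value if isinstance(value, list) else [value]


def wildcard_iam_policies(plan: dict):
    """Return addresses of aws_iam_policy resources that allow "*" on "*".

    `plan` is the parsed JSON representation of a saved Terraform plan.
    """
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_iam_policy":
            continue
        after = rc.get("change", {}).get("after") or {}
        raw = after.get("policy")
        if not raw:
            continue  # resource is being deleted, or the policy is unknown until apply
        document = json.loads(raw)
        for stmt in _as_list(document.get("Statement", [])):
            if stmt.get("Effect") != "Allow":
                continue
            actions = _as_list(stmt.get("Action", []))
            resources = _as_list(stmt.get("Resource", []))
            if "*" in actions and "*" in resources:
                offenders.append(rc["address"])
                break  # one offending statement is enough to flag the policy
    return offenders
```

Working from the rendered policy document means the check sees the final JSON regardless of whether the source built it from a data source, a template, or string interpolation.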

State security

The Terraform state file contains more sensitive information than most practitioners realise. It stores every resource ID, ARN, IP address, and attribute value for every resource Terraform manages. This includes database endpoint addresses, load balancer DNS names, and, depending on how the configuration is written, actual secret values passed as resource attributes. An attacker with read access to the state file has a detailed map of the entire infrastructure and potentially the credentials needed to access it.

State must be stored in a remote backend with encryption at rest and strict access controls. For AWS, the standard pattern is an S3 bucket with server-side encryption, versioning enabled, public access blocked, and a bucket policy that allows only the CI/CD role and designated operators to read or write the state. The DynamoDB table used for state locking should be equally protected. Beyond those named identities, the state backend should not be reachable from developer laptops or from workloads running in the environment it manages.
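
The backend baseline above can be expressed as a checklist that runs against a description of the backend. In this sketch the dictionary keys, the principal names, and the function backend_gaps are all hypothetical; a real audit would read the settings from the cloud provider's API.

```python
def backend_gaps(backend, allowed_principals):
    """Return human-readable gaps between a backend description and the baseline.

    `backend` is a dict with hypothetical keys "encrypted", "versioning",
    "public_access_blocked", and "principals" (identities with any access).
    """
    gaps = []
    if not backend.get("encrypted"):
        gaps.append("encryption at rest is off")
    if not backend.get("versioning"):
        gaps.append("versioning is off")
    if not backend.get("public_access_blocked"):
        gaps.append("public access is not blocked")
    extra = sorted(set(backend.get("principals", [])) - set(allowed_principals))
    if extra:
        gaps.append("unexpected principals: " + ", ".join(extra))
    return gaps
```

An empty result means the described backend meets the baseline; anything else is a list that can be posted directly into a review or ticket.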

Access to state should follow least privilege and be audited. The CI/CD pipeline that runs terraform plan and apply needs read and write access to state. A developer reviewing plan output may need read-only access. A security audit role may need read access for compliance purposes. No one should have access to state that they do not need. State access events should be logged in CloudTrail or equivalent so that unexpected reads or writes are detectable.

State file exports and downloads are a common source of accidental exposure. Engineers who run terraform state pull to inspect or debug state are creating a local copy of a sensitive file. These local copies should be treated with the same care as production credentials and deleted immediately after use. Committing a state file to a git repository, even temporarily or accidentally, should be treated as a security incident requiring immediate investigation and rotation of any credentials that appear in the state.
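
A small pre-commit style check can stop state files from reaching the repository at all. The sketch below assumes the caller supplies the list of staged paths, for example from git diff --cached --name-only; the function name staged_state_files is hypothetical.

```python
from pathlib import PurePosixPath


def staged_state_files(paths):
    """Return the paths that look like Terraform state files or state backups."""
    hits = []
    for path in paths:
        name = PurePosixPath(path).name
        # Matches terraform.tfstate as well as terraform.tfstate.backup and similar.
        if name.endswith(".tfstate") or ".tfstate." in name:
            hits.append(path)
    return hits
```

A hook would refuse the commit when the returned list is non-empty. This complements, rather than replaces, an ignore rule for state files, because ignore rules can be overridden with a forced add.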

Secure defaults

Secure defaults reduce the chance that an engineer provisioning infrastructure has to remember every security-relevant setting on every resource type. When the platform, modules, and templates provide safe defaults, the engineer focuses on the business requirement and the security properties are handled automatically. If every team member must independently remember to enable encryption, block public access, and enable logging on every resource they create, some percentage will forget some percentage of the time.

In practice, secure defaults are implemented at the module level. A shared module for creating an S3 bucket should have encryption enabled, public access blocked, and logging configured by default. If a team needs to create a public bucket for static website hosting, they pass an explicit override to the module that enables public access. This pattern makes the secure choice the default and the insecure choice require deliberate action, which makes misconfigurations easier to catch in code review.
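
The same pattern can be illustrated outside Terraform. In the Python sketch below, BucketConfig stands in for the inputs of a hypothetical shared bucket module: every default is the safe choice, and any departure from it has to be written out by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketConfig:
    """Inputs to a hypothetical shared bucket module; defaults are the safe choice."""
    name: str
    encrypted: bool = True
    block_public_access: bool = True
    access_logging: bool = True


def insecure_overrides(config: BucketConfig):
    """List the settings a caller has explicitly moved away from the secure default."""
    overrides = []
    if not config.encrypted:
        overrides.append("encrypted=False")
    if not config.block_public_access:
        overrides.append("block_public_access=False")
    if not config.access_logging:
        overrides.append("access_logging=False")
    return overrides


# The common case needs no security decisions from the caller.
internal = BucketConfig(name="team-reports")
# The exception is visible in the diff as a single explicit argument.
website = BucketConfig(name="marketing-site", block_public_access=False)
```

A reviewer, or an automated check calling insecure_overrides, only has to look at the explicit overrides instead of auditing every setting on every bucket.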

Organisation-level policies from the cloud provider (service control policies, or SCPs, in AWS Organizations; organisation policies in GCP) can enforce defaults at the account or project level, independent of any IaC configuration. An SCP that denies any action to create unencrypted storage volumes acts as a safety net even when IaC scanning is bypassed, misconfigured, or missing. Combining module-level secure defaults with account-level policy enforcement creates defence in depth for infrastructure security.

A common mistake is treating secure defaults as a one-time configuration decision. Cloud providers regularly introduce new services, new resource types, and new default behaviours. A module library that was current two years ago may not include controls for services introduced since then. Regular review of the module library against current security guidance, and a process for updating defaults when best practices evolve, are necessary to keep the defaults actually secure.

Least privilege

Least privilege in IaC security means that every identity involved in the IaC workflow has only the permissions it needs to do its specific job. This applies to human operators, CI/CD pipelines, and the infrastructure resources themselves. A CI pipeline that runs terraform plan in a pull request check needs read access to the state backend and the ability to call cloud APIs to read existing resource configurations. It does not need write access to any cloud resource. The pipeline that runs terraform apply on merge needs write access scoped to the specific resources and environments it manages.

Over-permissive IAM roles for CI/CD pipelines are one of the most common and impactful IaC security weaknesses. A pipeline with AdministratorAccess or equivalent can, if compromised, create arbitrary resources, exfiltrate data, delete production infrastructure, and modify access policies. Scoping CI/CD permissions to the minimum required for the specific configuration under management sharply limits what a compromised pipeline can do. The permissions required for a specific Terraform configuration can be derived by analysing the provider calls it makes and creating an IAM policy that allows exactly those calls on exactly those resource types.
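
Once the required calls are known, the scoped policy itself is mechanical to produce. The sketch below assumes the list of actions and resource ARNs has already been gathered by some other means; build_scoped_policy is a hypothetical helper, and the actions and ARN in the usage lines are examples only.

```python
def build_scoped_policy(actions, resource_arns):
    """Build an IAM policy document allowing only the given actions on the given ARNs."""
    if "*" in actions or "*" in resource_arns:
        raise ValueError("bare wildcards defeat the purpose of a scoped policy")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Sorting and de-duplicating keeps diffs of the policy stable.
                "Action": sorted(set(actions)),
                "Resource": sorted(set(resource_arns)),
            }
        ],
    }


pipeline_policy = build_scoped_policy(
    ["s3:PutObject", "s3:GetObject", "s3:GetObject"],
    ["arn:aws:s3:::example-artifacts/*"],
)
```

Rejecting bare wildcards at construction time keeps the shortcut of granting everything out of the generated policy, so any broader grant has to be made deliberately elsewhere.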

The infrastructure resources themselves should also follow least privilege. IAM roles assigned to application workloads should allow only the specific API calls the application makes against specific resources. Security groups should allow only the ports and protocols the application uses from the sources it accepts traffic from. S3 bucket policies should allow only the specific operations that the workload performs. These controls reduce the blast radius of a compromised workload by limiting what it can do even with full control of its own credentials.

Least privilege enforcement in IaC is enabled by policy as code. A policy that detects and rejects IAM policies with Action: '*' or Resource: '*' catches the most egregious over-permission patterns automatically. Tools like IAM Access Analyzer, Cloudsplaining, and Parliament can analyse IAM policies in IaC code for excessive permissions and suggest more specific replacements. These tools are most useful when integrated into the PR review pipeline so that over-permissive policies are flagged before they are deployed.

Guardrails

Guardrails are the controls that keep infrastructure from drifting into unsafe territory even when teams are moving quickly and not focused on security. They differ from policies and scanning rules in that they operate at a higher level of abstraction: they define the boundaries within which teams can operate freely, rather than checking individual resource configurations. A guardrail might state that no resource in this account may have a public IP unless it is in the designated public subnet, rather than checking whether a specific resource has a public IP.
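
The public IP guardrail can be sketched as a boundary check rather than a per-resource rule. The instance dictionary shape, the subnet identifiers, and the function public_ip_violations are hypothetical.

```python
def public_ip_violations(instances, public_subnet_ids):
    """Return names of instances that request a public IP outside the approved subnets.

    `instances` is a list of dicts with hypothetical keys "name", "subnet_id",
    and "associate_public_ip".
    """
    allowed = set(public_subnet_ids)
    return [
        inst["name"]
        for inst in instances
        if inst.get("associate_public_ip") and inst.get("subnet_id") not in allowed
    ]
```

Inside the approved subnets a public IP is accepted without comment, which is the freedom within boundaries that distinguishes a guardrail from a blanket ban.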

Guardrails can be implemented at multiple levels. At the module level, a module can enforce guardrails by refusing to provision resources without specific required inputs, such as tags for ownership and data classification. At the pipeline level, a policy step can block plans that include resource types or configurations outside the approved envelope. At the cloud account level, service control policies can prevent entire categories of unsafe configuration regardless of what IaC attempts to do. The combination of all three levels creates a layered safety system.
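
The module-level case, refusing to provision without ownership and data classification tags, might look like the following. The tag keys, the provision function, and its return value are illustrative assumptions rather than any real module's interface.

```python
REQUIRED_TAGS = ("owner", "data_classification")


def missing_required_tags(tags):
    """Return the required tag keys that are absent or blank."""
    return [key for key in REQUIRED_TAGS if not str(tags.get(key, "")).strip()]


def provision(name, tags):
    """Refuse to proceed unless every required tag is supplied."""
    missing = missing_required_tags(tags)
    if missing:
        raise ValueError(f"{name}: missing required tags: {', '.join(missing)}")
    # A real module would create the resource here; the sketch just echoes the input.
    return {"name": name, "tags": dict(tags)}
```

The error message names the missing tags, so the engineer can see what the guardrail requires and how to satisfy it.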

Effective guardrails are designed to be firm but transparent. Engineers should be able to understand why a guardrail exists, what it prevents, and what the approved path forward is when their legitimate requirement falls outside the default boundary. A guardrail that blocks legitimate work without providing a clear path to resolution pushes teams to find workarounds, request broad policy exceptions, or abandon IaC discipline in favour of direct console access. Clear documentation of each guardrail and a lightweight exception process for genuinely unusual requirements make the system trustworthy.

Guardrail coverage should grow as the organisation's cloud footprint grows. Starting with the most impactful controls, such as preventing public storage and enforcing encryption, and adding controls progressively as new services are adopted is more sustainable than attempting to address every possible misconfiguration at once. A guardrail that is not well understood or that fires on many legitimate configurations will be disabled or ignored. Starting narrow and expanding coverage over time keeps guardrails relevant and respected.