Strategies to optimize the costs of your builds on AWS CodeBuild

AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces ready-to-deploy software packages. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. You just specify the location of your source code and choose your build settings, and CodeBuild will run your build scripts for compiling, testing, and packaging your code.

CodeBuild uses simple pay-as-you-go pricing. There are no upfront costs or minimum fees. You pay only for the resources you use. You are charged for compute resources based on the duration it takes for your build to execute.

There are three main factors that contribute to build costs with CodeBuild:

Build duration
Compute types
Additional services

Understanding how to balance these factors is key to optimizing costs on AWS and this blog post will take a look at each.

Compute Types

CodeBuild offers three compute instance types with different amounts of memory and CPU, for example the Linux GPU Large compute type has 255GB of memory and 32 vCPUs and enables you to execute CI/CD workflow for deep learning purpose (ML/AI) with AWS CodePipeline. Incremental changes in your code, data, and ML models can now be tested for accuracy, before the changes are released through your pipeline.

The Linux 2XLarge instance type is another instance type with 145GB of memory and 72 vCPUs and is suitable for building large and complex applications that require high memory and CPU resources. It can help reduce build time, speed up delivery, and support multiple build environments.

The GPU and 2XLarge compute types are powerful but are also the most expensive compute types per minute. For most build tasks the small, medium or large instance compute types are more than adequate. Using the pricing listed in US East (Ohio) we can see the price variance between the small, medium and large Linux instance types in Figure 1 below.

Figure 1. AWS CodeBuild small, medium and large compute types vs cost per minute

Analyzing the CodeBuild compute costs leads us to a number of cost optimization considerations.

Right Sizing AWS CodeBuild Compute Types to Match Workloads

Right sizing is the process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost. It’s also the process of looking at deployed instances and identifying opportunities to eliminate or downsize without compromising capacity or other requirements, which results in lower costs.

Right sizing is a key mechanism for optimizing AWS costs, but it is often ignored by organizations when they first move to the AWS Cloud. They lift and shift their environments and expect to right size later. Speed and performance are often prioritized over cost, which results in oversized instances and a lot of wasted spend on unused resources.

CodeBuild monitors build resource utilization on your behalf and reports metrics through Amazon CloudWatch. These include metrics such as

CPU
Memory
Disk I/O

These metrics can be seen within the CodeBuild console, for an example see Figure 2 below:

Figure 2. Resource utilization metrics

Leveraging observability to measuring build resource usage is key to understanding how to rightsize and CodeBuild makes this easy with CloudWatch metrics readily available through the CodeBuild console.

Consider ARM / Graviton

If we compare the costs of arm1.small and general1.small over a ten minute period we can see that the arm based compute type is 32% less expensive.

Figure 3. Comparison of small arm and general compute types

But cost per minute while building is not the only benefit here, ARM processors are known for their energy efficiency and high performance. Compiling code directly on an ARM processor can potentially lead to faster execution times and improved overall system performance.

The ideal workload to migrate to ARM is one that doesn’t use architecture-specific dependencies or binaries, already runs on Linux and uses open-source components, for example: Migrating AWS Lambda functions to Arm-based AWS Graviton2 processors.

AWS Graviton processors are custom built by Amazon Web Services using 64-bit Arm Neoverse cores to deliver the best price performance for your cloud workloads. The AWS Graviton Fast Start program helps you quickly and easily move your workloads to AWS Graviton in as little as four hours for applications such as serverless, containerized, database, and caching.

Consider migrating Windows workloads to Linux

If we compare the cost of a general1.medium Windows vs Linux compute type we can see that the Linux Compute type is 43% less expensive over ten minutes:

Figure 4. Build times on Windows compared to Linux

Migrating to Linux is one strategy to not only reduce the costs of building and testing code in CodeBuild but also the cost of running in production.

The effort required to re-platform from Windows to Linux varies depending on how the application was implemented. The key is to identify and target workloads with the right characteristics, balancing strategic importance and implementation effort.

For example, older .Net applications may be able to be migrated to later versions of .NET (previously named .Net Core) first before deploying to Linux. AWS have a Porting Assistant for .NET that is an analysis tool that scans .NET Framework applications and generates a cross-platform compatibility assessment, helping you port your applications to Linux faster.

See our guide on how to escape unfriendly licensing practices by migrating Windows workloads to Linux.

Build duration

One of the dimensions of the CodeBuild pricing is the duration of each build. This is calculated in minutes, from the time you submit your build until your build is terminated, rounded up to the nearest minute. For example: if your build takes a total of 35 seconds using one arm1.small Linux instance on US East (Ohio), each build will cost the price of the full minute, which is $0.0034 in that case. Similarly, if your build takes a total of 5 minutes and 20 seconds, you’ll be charged for 6 minutes.

When you define your CodeBuild project, within a buildspec file, you can specify some of the phases of your builds. The phases you can specify are install, pre-build, build, and post-build. See the documentation to learn more about what each of those phases represent. Besides that, you can define how and where to upload reports and artifacts each build generates. It means that on each of those steps, you should do only what is necessary for the task you want to achieve. Installing dependencies that you won’t need, running commands that aren’t related to your task, or performing tests that aren’t necessary will affect your build time and unnecessarily increase your costs. Packaging and uploading target artifacts with unnecessary large files would cause a similar result.

On top of the CodeBuild phases and steps that you are directly in control, each time you start a build, it takes additional time to queue the task, provision the environment, download the source code (if applicable), and finalize. See Figure 5 below a breakdown of a succeeded build:

Figure 5. AWS CodeBuild Phase details

In the above example, for each build, it takes approximately 42 seconds on top of what is specified in the buildspec file. Considering this overhead, having several smaller builds instead of fewer larger builds can potentially increase your costs. With this in mind, you have to keep your builds as short as possible, by doing only what is necessary, so that you can minimize the costs. Furthermore, you have to find a good balance between the duration and the frequency of your builds, so that the overhead doesn’t take a large proportion of your build time. Let’s explore some approaches you can factor in to find this balance.

Build caching

A common way to save time and cost on your CodeBuild builds is with build caching. With build caching, you are able to store reusable pieces of your build environment, so that you can save time next time you start a new build. There are two types of caching:

Amazon S3 — Stores the cache in an Amazon S3 bucket that is available across multiple build hosts. If you have build artifacts that are more expensive to build than to download, this is a good option for you. For large build artifacts, this may not be the best option, because it can take longer to transfer over your network.

Local caching — Stores a cache locally on a build host that is available to that build host only. When you choose this option, the cache is immediately available on the build host, making it a good option for large build artifacts that would take long network transfer time. If you choose local caching, there are multiple cache modes you can choose including source cache mode, docker layer cache mode and custom cache mode.

Docker specific optimizations

Another strategy to optimize your build time and reduce your costs is using custom Docker images. When you specify your CodeBuild project, you can either use one of the default Docker images provided by CodeBuild, or use your own build environment packaged as a Docker image. When you create your own build environment as a Docker image, you can pre-package it with all the tools, test assets, and required dependencies. This can potentially save a significant amount of time, because on the install phase you won’t need to download packages from the internet, and on the build phase, when applicable, you won’t need to download e.g., large static test datasets.

To achieve that, you must specify the image value on the environment configuration when creating or updating your CodeBuild project. See Docker in custom image sample for CodeBuild to learn more about how to configure that. Keep in mind that larger Docker images can negatively affect your build time, therefore you should aim to keep your custom build environment as lean as possible, with only the mandatory contents. Another aspect to consider is to use Amazon Elastic Container Registry (ECR) to store your Docker images. Downloading the image from within the AWS network will be, in most of the cases, faster than downloading it from the public internet and can avoid bottlenecks from public repositories.

Consider which tests to run on the feature branch

If you are using a feature-branch approach, consider carefully which build steps and tests you are going to run on your branches. Running unit tests is a good example of what you should run on the feature branches, but unless you have very specific requirements, you probably don’t need penetration or integration tests at this point. Usually the feature branch changes often, hence running all types of tests all the time is a potential waste. Prefer to have your complex, long-running tests at a later stage of your CI/CD pipeline, as you build confidence on the version that you are to release.

Build once, deploy everywhere

It’s widely considered a best practice to avoid environment-specific code builds, therefore consider a build once, deploy everywhere strategy. There are many benefits to separating environment configuration from the build including reducing build costs, improve maintainability, scalability, and reduce the risk of errors.

Build once, deploy everywhere can be seen in the AWS Deployment Pipeline Reference Architecture where the Beta, Gamma and Prod stages are created from a single artifact created in the Build Stage:

Figure 6. Application Pipeline reference architecture

Additional Services

CloudWatch Logs and Metrics

Amazon CloudWatch can be used to monitor your builds, report when something goes wrong, take automatic actions when appropriate or simply keep logs of your builds.

CloudWatch metrics show the behavior of your builds over time. For example, you can monitor:

How many builds were attempted in a build project or an AWS account over time.
How many builds were successful in a build project or an AWS account over time.
How many builds failed in a build project or an AWS account over time.
How much time CodeBuild spent running builds in a build project or an AWS account over time.
Build resource utilization for a build or an entire build project. Build resource utilization metrics include metrics such as CPU, memory, and storage utilization.

However, you may incur charges from Amazon CloudWatch Logs for build log streams. For more information, see Monitoring AWS Codebuild in the CodeBuild User Guide and the CloudWatch pricing page.

Storage Costs

You can create an CodeBuild build project with a set of output artifacts and publish then to S3 buckets. Using S3 as a repository for your artifacts, you only pay for what you use. Check the S3 pricing page.

Encryption

Cloud security at AWS is the highest priority and encryption is an important part of CodeBuild security. Some encryption, such as for data in-transit, is provided by default and does not require you to do anything. Other encryption, such as for data at-rest, you can configure when you create your project or build. Codebuild uses Amazon KMS to encrypt the data at-rest.

Build artifacts, such as a cache, logs, exported raw test report data files, and build results, are encrypted by default using AWS managed keys and are free of charge. Consider using these keys if you don’t need to create your own key.

If you do not want to use these KMS keys, you can create and configure a customer managed key. For more information, see the documentation on creating KMS Keys and AWS Key Management Service concepts in the AWS Key Management Service User Guide.

Check the KMS pricing page.

Data transfer costs

You may incur additional charges if your builds transfer data, for example:

Avoid routing traffic over the internet when connecting to AWS services from within AWS by using VPC endpoints

Traffic that crosses an Availability Zone boundary typically incurs a data transfer charge. Use resources from the local Availability Zone whenever possible.
Traffic that crosses a Regional boundary will typically incur a data transfer charge. Avoid cross-Region data transfer unless your business case requires it
Use the AWS Pricing Calculator to help estimate the data transfer costs for your solution.
Use a dashboard to better visualize data transfer charges – this workshop will show how.

Here’s an Overview of Data Transfer Costs for Common Architectures on AWS.

Conclusion

In this blog post we discussed how compute types; build duration and use of additional services contribute to build costs with AWS CodeBuild.

We highlighted how right sizing compute types is an important practice for teams that want to reduce their build costs while still achieving optimal performance. The key to optimizing is by measuring and observing the workload and selecting the most appropriate compute instance based on requirements.

Further compute type cost optimizations can be found by targeting AWS Graviton processors and Linux environments. AWS Graviton Processors in particular offer several advantages over traditional x86-based instances and are designed by AWS to deliver the best price performance for your cloud workloads.

For further reading, see the summary of CI/CD best practices from the Practicing Continuous Integration and Continuous Delivery on AWS whitepaper,  my CI/CD pipeline is my release captain and also the cost optimization pillar from the AWS Well-Architected Framework which focuses on avoiding unnecessary costs.

About the authors:

Leandro Cavalcante Damascena

Leandro Damascena is a Solutions Architect at AWS, focused on Application Modernization and Serverless. Prior to that, he spent 20 years in various roles across the software delivery lifecycle, from development to architecture and operations. Outside of work, he loves spending time with his family and enjoying water sports like kitesurfing and wakeboarding.

Rafael Ramos

Rafael is a Solutions Architect at AWS, where he helps ISVs on their journey to the cloud. He spent over 13 years working as a software developer, and is passionate about DevOps and serverless. Outside of work, he enjoys playing tabletop RPG, cooking and running marathons.

Matt Laver

Matt Laver is a Solutions Architect at AWS working with SMB customers in the UK. He is passionate about DevOps and loves helping customers find simple solutions to difficult problems.

Extending CloudFormation and CDK with Third-Party Extensions

Did you know you can use CloudFormation to manage third-party resources? The AWS CloudFormation Public Registry provides a searchable collection of CloudFormation extensions and makes it easy to discover and provision them in CloudFormation templates and AWS Cloud Development Kit (CDK) applications. In the past three months, we’ve added a number of new, exciting partners to the Public Registry, including GitLab, Okta, and PagerDuty.

The extensions available on the registry are wide-ranging and include third-party resources from partners such as MongoDB; hooks, which are preventative controls that add safeguards to provisioning; and modules, which are re-usable components that take into account best practices and opinionated definitions of resources. AWS Partner Network (APN), third parties, and the developer community contribute these extensions to the Public Registry. Using extensions, customers no longer need to create and maintain custom provisioning logic for resource types from third-party vendors.

Over last few months, AWS collaborated with partners to develop and publish over 80 new resources across 14 providers to Public Registry for CloudFormation. Below is a summary of the new resource type additions.

Recently Updated Third-Party Providers

Provider
Use case

MongoDB Atlas

Manage components in MongoDB Atlas. Add, edit, or delete administrative objects within Atlas, including projects, users, and database deployments

Note: You cannot read or write data to Atlas Clusters with Atlas Admin APIs and AWS CloudFormation resources. To read and write data in Atlas, you must use the Atlas Data API

GitLab
Manage the users and groups in an organization, set up a new project with the right users, groups, and access token, tag a project automatically for every active CI/CD deployment

New Relic
Create a new Dashboard with custom Pages, Widgets and Layout, add tags to your data to help improve data organization and findability, workloads-related tasks

GitHub
Manage the users and groups in an organization, set up a new project with the right users, groups, and access token, Add a webhook to a repo

Dynatrace
Set up a new project with service level objective, locations, monitors and metrics

Okta
Onboard a new application into Okta with the right users and groups

PagerDuty
Set up monitoring of a new or existing application

Databricks
Set up a Databricks cluster and jobs

Fastly
Configure Fastly as a CDN for your web app

BigID
Connect S3 and DynamoDB data sources into your BigID application

Rollbar
Set up a new Rollbar project and manage rules, teams, and users

Cloudflare
Configure a DNS record and load-balancing using Cloudflare

Lacework
Configure Lacework alert profiles, rules, channels and manage queries

Snowflake
Create databases, users, and manage privileges

Key Benefits

Here are some of the benefits for extension builders and consumers when publishing extensions to the public registry:

Discoverability – Publishing your extensions in the public registry will make them discoverable by 1M+ active CloudFormation and CDK customers.

CDK Support – We’re seeing rapid growth in the adoption of the CDK amongst the developer population. Upon publishing to the registry, L1 CDK Constructs will automatically be created for your third party resources making them compatible with the CDK with no added work required. These constructs will also be listed on Construct Hub and aids discoverability discoverable by customers. Note: Automated L1 CDK construct generation is currently an experimental feature.

Drift detection – Third-party resource types in the public registry also integrate with drift detection. After creating a resource from a third-party resource type, CloudFormation will detect changes to the third-party resource from its template configuration, known as configuration drift, just as it would with AWS resources.

AWS Config – You can also use AWS Config to manage compliance for third-party resources consumed from the registry. The resource types are automatically tracked as Configuration Items when you have configured AWS Config to record them, and used CloudFormation to create, update, and delete them. Whether the resource types you use are third-party or AWS resources, you can view configuration history for them, in addition to being able to write AWS Config rules to verify configuration best practices.

Abstraction of Best Practices with Modules – Browse and use modules from the registry when creating your CloudFormation templates to ensure you’re provisioning resources while adhering to best practices.

AWS Cloud Control API – The AWS Cloud Control API allows AWS partners and customers to interface with your resource type through API calls using Create, Read, Update, Delete, and List (CRUD-L) operations. Resources in the registry will be automatically integrated with our AWS Cloud Control API and expands your third party resource compatibility to even more AWS services and IaC tools.

We’ve seen great momentum from our partners and developer community over the past year. We are looking forward to continued investment and innovation in the Public Registry.

How to Get Started

For Resource Type Users: Explore and Activate Third Party Resource Types

Third party resource types must first be activated before they can be used. You do this by logging into your AWS Console > Navigate to CloudFormation > Registry > Public extensions > Set the Publisher to Third Party. This will show you a list of available third-party resources in your region (note that different regions may have a different set of third-party resource types). Select the radio box next to the resource types you want to activate and click the activate button at the top of the list.

Figure 1:

Don’t see the extension you need in the registry?

You can submit requests for new third-party extensions through our Community Registry Extensions Github repo issue tracker! Click the New Issue button and describe the third-party extension along with information about your use case.

For Developers and Publishers: Join the CloudFormation Developer Community and Start Building

You can see several of the community-built registry extensions in the AWS CloudFormation Community Registry Extensions repository and even contribute yourself. You can also read about the experiences and lessons learned from publishing to the Registry through this blog written by Cloudsoft.

For developers looking to create new resource types to add to the public Registry, follow this creating resource types walkthrough help you get started. If you need assistance creating, publishing resources, or just want to join the discussion, you can join the conversation today in our CloudFormation Discord Channel. We’d love to hear about your experiences and use cases in developing innovations with registry extensions.

About the authors:

Anuj Sharma

Anuj Sharma is a Sr Container Partner Solution Architect with Amazon Web Services. He works with ISV partners and drives Partner-AWS product development and integrations.

Lucas Chen

Lucas is a Senior Product Manager at Amazon Web Services. He leads the CloudFormation Registry and its integrations with third-party products. Prior to AWS, he spent 9 years at VMware working on its end user computing product, Workspace ONE.

Rahul Sharma

Rahul is a Senior Product Manager-Technical at Amazon Web Services with over two years of product management spanning AWS CloudFormation and AWS Cloud Control API.

Improve collaboration between teams by using AWS CDK constructs

There are different ways to organize teams to deliver great software products. There are companies that give the end-to-end responsibility for a product to a single team, like Amazon’s Two-Pizza teams, and there are companies where multiple teams split the responsibility between infrastructure (or platform) teams and application development teams. This post provides guidance on how collaboration efficiency can be improved in the case of a split-team approach with the help of the AWS Cloud Development Kit (CDK).

The AWS CDK is an open-source software development framework to define your cloud application resources. You do this by using familiar programming languages like TypeScript, Python, Java, C# or Go. It allows you to mix code to define your application’s infrastructure, traditionally expressed through infrastructure as code tools like AWS CloudFormation or HashiCorp Terraform, with code to bundle, compile, and package your application.

This is great for autonomous teams with end-to-end responsibility, as it helps them to keep all code related to that product in a single place and single programming language. There is no need to separate application code into a different repository than infrastructure code with a single team, but what about the split-team model?

Larger enterprises commonly split the responsibility between infrastructure (or platform) teams and application development teams. We’ll see how to use the AWS CDK to ensure team independence and agility even with multiple teams involved. We’ll have a look at the different responsibilities of the participating teams and their produced artifacts, and we’ll also discuss how to make the teams work together in a frictionless way.

This blog post assumes a basic level of knowledge on the AWS CDK and its concepts. Additionally, a very high level understanding of event driven architectures is required.

Team Topologies

Let’s first have a quick look at the different team topologies and each team’s responsibilities.

One-Team Approach

In this blog post we will focus on the split-team approach described below. However, it’s still helpful to understand what we mean by “One-Team” Approach: A single team owns an application from end-to-end. This cross-functional team decides on its own on the features to implement next, which technologies to use and how to build and deploy the resulting infrastructure and application code. The team’s responsibility is infrastructure, application code, its deployment and operations of the developed service.

If you’re interested in how to structure your AWS CDK application in a such an environment have a look at our colleague Alex Pulver’s blog post Recommended AWS CDK project structure for Python applications.

Split-Team Approach

In reality we see many customers who have separate teams for application development and infrastructure development and deployment.

Infrastructure Team

What I call the infrastructure team is also known as the platform or operations team. It configures, deploys, and operates the shared infrastructure which other teams consume to run their applications on. This can be things like an Amazon SQS queue, an Amazon Elastic Container Service (Amazon ECS) cluster as well as the CI/CD pipelines used to bring new versions of the applications into production.
It is the infrastructure team’s responsibility to get the application package developed by the Application Team deployed and running on AWS, as well as provide operational support for the application.

Application Team

Traditionally the application team just provides the application’s package (for example, a JAR file or an npm package) and it’s the infrastructure team’s responsibility to figure out how to deploy, configure, and run it on AWS. However, this traditional setup often leads to bottlenecks, as the infrastructure team will have to support many different applications developed by multiple teams. Additionally, the infrastructure team often has little knowledge of the internals of those applications. This often leads to solutions which are not optimized for the problem at hand: If the infrastructure team only offers a handful of options to run services on, the application team can’t use options optimized for their workload.

This is why we extend the traditional responsibilities of the application team in this blog post. The team provides the application and additionally the description of the infrastructure required to run the application. With “infrastructure required” we mean the AWS services used to run the application. This infrastructure description needs to be written in a format which can be consumed by the infrastructure team.

While we understand that this shift of responsibility adds additional tasks to the application team, we think that in the long term it is worth the effort. This can be the starting point to introduce DevOps concepts into the organization. However, the concepts described in this blog post are still valid even if you decide that you don’t want to add this responsibility to your application teams. The boundary of who is delivering what would then just move more into the direction of the infrastructure team.

To be successful with the given approach, the two teams need to agree on a common format on how to hand over the application, its infrastructure definition, and how to bring it to production. The AWS CDK with its concept of Constructs provides a perfect means for that.

Primer: AWS CDK Constructs

In this section we take a look at the concepts the AWS CDK provides for structuring our code base and how these concepts can be used to fit a CDK project into your team topology.

Constructs

Constructs are the basic building block of an AWS CDK application. An AWS CDK application is composed of multiple constructs which in the end define how and what is deployed by AWS CloudFormation.

The AWS CDK ships with constructs created to deploy AWS services. However, it is important to understand that you are not limited to the out-of-the-box constructs provided by the AWS CDK. The true power of AWS CDK is the possibility to create your own abstractions on top of the default constructs to create solutions for your specific requirement. To achieve this you write, publish, and consume your own, custom constructs. They codify your specific requirements, create an additional level of abstraction and allow other teams to consume and use your construct.

We will use a custom construct to separate the responsibilities between the the application and the infrastructure team. The application team will release a construct which describes the infrastructure along with its configuration required to run the application code. The infrastructure team will consume this construct to deploy and operate the workload on AWS.

How to use the AWS CDK in a Split-Team Setup

Let’s now have a look at how we can use the AWS CDK to split the responsibilities between the application and infrastructure team. I’ll introduce a sample scenario and then illustrate what each team’s responsibility is within this scenario.

Scenario

Our fictitious application development team writes an AWS Lambda function which gets deployed to AWS. Messages in an Amazon SQS queue will invoke the function. Let’s say the function will process orders (whatever this means in detail is irrelevant for the example) and each order is represented by a message in the queue.

The application development team has full flexibility when it comes to creating the AWS Lambda function. They can decide which runtime to use or how much memory to configure. The SQS queue which the function will act upon is created by the infrastructure team. The application team does not have to know how the messages end up in the queue.

With that we can have a look at a sample implementation split between the teams.

Application Team

The application team is responsible for two distinct artifacts: the application code (for example, a Java jar file or an npm module) and the AWS CDK construct used to deploy the required infrastructure on AWS to run the application (an AWS Lambda Function along with its configuration).

The lifecycles of these artifacts differ: the application code changes more frequently than the infrastructure it runs in. That’s why we want to keep the artifacts separate. With that each of the artifacts can be released at its own pace and only if it was changed.

In order to achieve these separate lifecycles, it is important to notice that a release of the application artifact needs to be completely independent from the release of the CDK construct. This fits our approach of separate teams compared to the standard CDK way of building and packaging application code within the CDK construct.

But how will this be done in our example solution? The team will build and publish an application artifact which does not contain anything related to CDK.
When a CDK Stack with this construct is synthesized it will download the pre-built artifact with a given version number from AWS CodeArtifact and use it to create the input zip file for a Lambda function. There is no build of the application package happening during the CDK synth.

With the separation of construct and application code, we need to find a way to tell the CDK construct which specific version of the application code it should fetch from CodeArtifact. We will pass this information to the construct via a property of its constructor.

For dependencies on infrastructure outside of the responsibility of the application team, I follow the pattern of dependency injection. Those dependencies, for example a shared VPC or an Amazon SQS queue, are passed into the construct from the infrastructure team.

Let’s have a look at an example. We pass in the external dependency on an SQS Queue, along with details on the desired appPackageVersion and its CodeArtifact details:

export interface OrderProcessingAppConstructProps {
    queue: aws_sqs.Queue,
    appPackageVersion: string,
    codeArtifactDetails: {
        account: string,
        repository: string,
        domain: string
    }
}

export class OrderProcessingAppConstruct extends Construct {

    constructor(scope: Construct, id: string, props: OrderProcessingAppConstructProps) {
        super(scope, id);

        const lambdaFunction = new lambda.Function(this, ‘OrderProcessingLambda’, {
            code: lambda.Code.fromDockerBuild(path.join(__dirname, ‘..’, ‘bundling’), {
                buildArgs: {
                    ‘PACKAGE_VERSION’ : props.appPackageVersion,
                    ‘CODE_ARTIFACT_ACCOUNT’ : props.codeArtifactDetails.account,
                    ‘CODE_ARTIFACT_REPOSITORY’ : props.codeArtifactDetails.repository,
                    ‘CODE_ARTIFACT_DOMAIN’ : props.codeArtifactDetails.domain
                }
            }),
            runtime: lambda.Runtime.NODEJS_16_X,
            handler: ‘node_modules/order-processing-app/dist/index.lambdaHandler’
        });
        const eventSource = new SqsEventSource(props.queue);
        lambdaFunction.addEventSource(eventSource);
    }
}

Note the code lambda.Code.fromDockerBuild(…): We use AWS CDK’s functionality to bundle the code of our Lambda function via a Docker build. The only things which happen inside of the provided Dockerfile are:

the login into the AWS CodeArtifact repository which holds the pre-built application code’s package
the download and installation of the application code’s artifact from AWS CodeArtifact (in this case via npm)

If you are interested in more details on how you can build, bundle and deploy your AWS CDK assets I highly recommend a blog post by my colleague Cory Hall: Building, bundling, and deploying applications with the AWS CDK. It goes into much more detail than what we are covering here.

Looking at the example Dockerfile we can see the two steps described above:

FROM public.ecr.aws/sam/build-nodejs16.x:latest

ARG PACKAGE_VERSION
ARG CODE_ARTIFACT_AWS_REGION
ARG CODE_ARTIFACT_ACCOUNT
ARG CODE_ARTIFACT_REPOSITORY

RUN aws codeartifact login –tool npm –repository $CODE_ARTIFACT_REPOSITORY –domain $CODE_ARTIFACT_DOMAIN –domain-owner $CODE_ARTIFACT_ACCOUNT –region $CODE_ARTIFACT_AWS_REGION
RUN npm install order-processing-app@$PACKAGE_VERSION –prefix /asset

Please note the following:

we use –prefix /asset with our npm install command. This tells npm to install the dependencies into the folder which CDK will mount into the container. All files which should go into the output of the docker build need to be placed here.
the aws codeartifact login command requires credentials with the appropriate permissions to proceed. In case you run this on for example AWS CodeBuild or inside of a CDK Pipeline you need to make sure that the used role has the appropriate policies attached.

Infrastructure Team

The infrastructure team consumes the AWS CDK construct published by the application team. They own the AWS CDK Stack which composes the whole application. Possibly this will only be one of several Stacks owned by the Infrastructure team. Other Stacks might create shared infrastructure (like VPCs, networking) and other applications.

Within the stack for our application the infrastructure team consumes and instantiates the application team’s construct, passes any dependencies into it and then deploys the stack by whatever means they see fit (e.g. through AWS CodePipeline, GitHub Actions or any other form of continuous delivery/deployment).

The dependency on the application team’s construct is manifested in the package.json of the infrastructure team’s CDK app:

{
  “name”: “order-processing-infra-app”,
  …
  “dependencies”: {
    …
    “order-app-construct” : “1.1.0”,
    …
  }
  …
}

Within the created CDK Stack we see the dependency version for the application package as well as how the infrastructure team passes in additional information (like e.g. the queue to use):

export class OrderProcessingInfraStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);   

    const orderProcessingQueue = new Queue(this, ‘order-processing-queue’);

    new OrderProcessingAppConstruct(this, ‘order-processing-app’, {
       appPackageVersion: “2.0.36”,
       queue: orderProcessingQueue,
       codeArtifactDetails: { … }
     });
  }
}

Propagating New Releases

We now have the responsibilities of each team sorted out along with the artifacts owned by each team. But how do we propagate a change done by the application team all the way to production? Or asked differently: how can we invoke the infrastructure team’s CI/CD pipeline with the updated artifact versions of the application team?

We will need to update the infrastructure team’s dependencies on the application teams artifacts whenever a new version of either the application package or the AWS CDK construct is published. With the dependencies updated we can then start the release pipeline.

One approach is to listen and react to events published by AWS CodeArtifact via Amazon EventBridge. On each release AWS CodeArtifact will publish an event to Amazon EventBridge. We can listen to that event, extract the version number of the new release from its payload and start a workflow to update either our dependency on the CDK construct (e.g. in the package.json of our CDK application) or a update the appPackageVersion which the infrastructure team passes into the consumed construct.

Here’s how a release of a new app version flows through the system:

Figure 1 – A release of the application package triggers a change and deployment of the infrastructure team’s CDK Stack

The application team publishes a new app version into AWS CodeArtifact
CodeArtifact triggers an event on Amazon EventBridge
The infrastructure team listens to this event
The infrastructure team updates its CDK stack to include the latest appPackageVersion

The infrastructure team’s CDK Stack gets deployed

And very similar the release of a new version of the CDK Construct:

Figure 2 – A release of the application team’s CDK construct triggers a change and deployment of the infrastructure team’s CDK Stack

The application team publishes a new CDK construct version into AWS CodeArtifact
CodeArtifact triggers an event on Amazon EventBridge
The infrastructure team listens to this event
The infrastructure team updates its dependency to the latest CDK construct
The infrastructure team’s CDK Stack gets deployed

We will not go into the details on how such a workflow could look like, because it’s most likely highly custom for each team (think of different tools used for code repositories, CI/CD). However, here are some ideas on how it can be accomplished:

Updating the CDK Construct dependency

To update the dependency version of the CDK construct the infrastructure team’s package.json (or other files used for dependency tracking like pom.xml) needs to be updated. You can build automation to checkout the source code and issue a command like npm install [email protected]_VERSION (where NEW_VERSION is the value read from the EventBridge event payload). You then automatically create a pull request to incorporate this change into your main branch. For a sample on what this looks like see the blog post Keeping up with your dependencies: building a feedback loop for shared librares.

Updating the appPackageVersion

To update the appPackageVersion used inside of the infrastructure team’s CDK Stack you can either follow the same approach outlined above, or you can use CDK’s capability to read from an AWS Systems Manager (SSM) Parameter Store parameter. With that you wouldn’t put the value for appPackageVersion into source control, but rather read it from SSM Parameter Store. There is a how-to for this in the AWS CDK documentation: Get a value from the Systems Manager Parameter Store. You then start the infrastructure team’s pipeline based on the event of a change in the parameter.

To have a clear understanding of what is deployed at any given time and in order to see the used parameter value in CloudFormation I’d recommend using the option described at Reading Systems Manager values at synthesis time.

Conclusion

You’ve seen how the AWS Cloud Development Kit and its Construct concept can help to ensure team independence and agility even though multiple teams (in our case an application development team and an infrastructure team) work together to bring a new version of an application into production. To do so you have put the application team in charge of not only their application code, but also of the parts of the infrastructure they use to run their application on. This is still in line with the discussed split-team approach as all shared infrastructure as well as the final deployment is in control of the infrastructure team and is only consumed by the application team’s construct.

About the Authors

As a Solutions Architect Jörg works with manufacturing customers in Germany. Before he joined AWS in 2019 he held various roles like Developer, DevOps Engineer and SRE. With that Jörg enjoys building and automating things and fell in love with the AWS Cloud Development Kit.

Mo joined AWS in 2020 as a Technical Account Manager, bringing with him 7 years of hands-on AWS DevOps experience and 6 year as System operation admin. He is a member of two Technical Field Communities in AWS (Cloud Operation and Builder Experience), focusing on supporting customers with CI/CD pipelines and AI for DevOps to ensure they have the right solutions that fit their business needs.

Secure CDK deployments with IAM permission boundaries

The AWS Cloud Development Kit (CDK) accelerates cloud development by allowing developers to use common programming languages when modelling their applications. To take advantage of this speed, developers need to operate in an environment where permissions and security controls don’t slow things down, and in a tightly controlled environment this is not always the case. Of particular concern is the scenario where a developer has permission to create AWS Identity and Access Management (IAM) entities (such as users or roles), as these could have permissions beyond that of the developer who created them, allowing for an escalation of privileges. This approach is typically controlled through the use of permission boundaries for IAM entities, and in this post you will learn how these boundaries can now be applied more effectively to CDK development – allowing developers to stay secure and move fast.

Time to read
10 minutes

Learning level
Advanced (300)

Services used

AWS Cloud Development Kit (CDK)

AWS Identity and Access Management (IAM)

Applying custom permission boundaries to CDK deployments

When the CDK deploys a solution, it assumes a AWS CloudFormation execution role to perform operations on the user’s behalf. This role is created during the bootstrapping phase by the AWS CDK Command Line Interface (CLI). This role should be configured to represent the maximum set of actions that CloudFormation can perform on the developers behalf, while not compromising any compliance or security goals of the organisation. This can become complicated when developers need to create IAM entities (such as IAM users or roles) and assign permissions to them, as those permissions could be escalated beyond their existing access levels. Taking away the ability to create these entities is one way to solve the problem. However, doing this would be a significant impediment to developers, as they would have to ask an administrator to create them every time. This is made more challenging when you consider that security conscious practices will create individual IAM roles for every individual use case, such as each AWS Lambda Function in a stack. Rather than taking this approach, IAM permission boundaries can help in two ways – first, by ensuring that all actions are within the overlap of the users permissions and the boundary, and second by ensuring that any IAM entities that are created also have the same boundary applied. This blocks the path to privilege escalation without restricting the developer’s ability to create IAM identities. With the latest version of the AWS CLI these boundaries can be applied to the execution role automatically when running the bootstrap command, as well as being added to IAM entities that are created in a CDK stack.

To use a permission boundary in the CDK, first create an IAM policy that will act as the boundary. This should define the maximum set of actions that the CDK application will be able to perform on the developer’s behalf, both during deployment and operation. This step would usually be performed by an administrator who is responsible for the security of the account, ensuring that the appropriate boundaries and controls are enforced. Once created, the name of this policy is provided to the bootstrap command. In the example below, an IAM policy called “developer-policy” is used to demonstrate the command.

cdk bootstrap –custom-permissions-boundary developer-policy

Once this command runs, a new bootstrap stack will be created (or an existing stack will be updated) so that the execution role has this boundary applied to it. Next, you can ensure that any IAM entities that are created will have the same boundaries applied to them. This is done by either using a CDK context variable, or the permissionBoundary attribute on those resources. To explain this in some detail, let’s use a real world scenario and step through an example that shows how this feature can be used to restrict developers from using the AWS Config service.

Installing or upgrading the AWS CDK CLI

Before beginning, ensure that you have the latest version of the AWS CDK CLI tool installed. Follow the instructions in the documentation to complete this. You will need version 2.54.0 or higher to make use of this new feature. To check the version you have installed, run the following command.

cdk –version

Creating the policy

First, let’s begin by creating a new IAM policy. Below is a CloudFormation template that creates a permission policy for use in this example. In this case the AWS CLI can deploy it directly, but this could also be done at scale through a mechanism such as CloudFormation Stack Sets. This template has the following policy statements:

Allow all actions by default – this allows you to deny the specific actions that you choose. You should carefully consider your approach to allow/deny actions when creating your own policies though.
Deny the creation of users or roles unless the “developer-policy” permission boundary is used. Additionally limit the attachment of permissions boundaries on existing entities to only allow “developer-policy” to be used. This prevents the creation or change of an entity that can escalate outside of the policy.
Deny the ability to change the policy itself so that a developer can’t modify the boundary they will operate within.
Deny the ability to remove the boundary from any user or role
Deny any actions against the AWS Config service

Here items 2, 3 and 4 all ensure that the permission boundary works correctly – they are controls that prevent the boundary being removed, tampered with, or bypassed. The real focus of this policy in terms of the example are items 1 and 5 – where you allow everything, except the specific actions that are denied (creating a deny list of actions, rather than an allow list approach).

Resources:
PermissionsBoundary:
Type: AWS::IAM::ManagedPolicy
Properties:
PolicyDocument:
Statement:
# —– Begin base policy —————
# If permission boundaries do not have an explicit allow
# then the effect is deny
– Sid: ExplicitAllowAll
Action: “*”
Effect: Allow
Resource: “*”
# Default permissions to prevent privilege escalation
– Sid: DenyAccessIfRequiredPermBoundaryIsNotBeingApplied
Action:
– iam:CreateUser
– iam:CreateRole
– iam:PutRolePermissionsBoundary
– iam:PutUserPermissionsBoundary
Condition:
StringNotEquals:
iam:PermissionsBoundary:
Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
Effect: Deny
Resource: “*”
– Sid: DenyPermBoundaryIAMPolicyAlteration
Action:
– iam:CreatePolicyVersion
– iam:DeletePolicy
– iam:DeletePolicyVersion
– iam:SetDefaultPolicyVersion
Effect: Deny
Resource:
Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
– Sid: DenyRemovalOfPermBoundaryFromAnyUserOrRole
Action:
– iam:DeleteUserPermissionsBoundary
– iam:DeleteRolePermissionsBoundary
Effect: Deny
Resource: “*”
# —– End base policy —————
# — Begin Custom Organization Policy —
– Sid: DenyModifyingConfig
Effect: Deny
Action: config:*
Resource: “*”
# — End Custom Organization Policy —
Version: “2012-10-17”
Description: “Bootstrap Permission Boundary”
ManagedPolicyName: developer-policy
Path: /

Save the above locally as developer-policy.yaml and then you can deploy it with a CloudFormation command in the AWS CLI:

aws cloudformation create-stack –stack-name DeveloperPolicy
–template-body file://developer-policy.yaml
–capabilities CAPABILITY_NAMED_IAM

Creating a stack to test the policy

To begin, create a new CDK application that you will use to test and observe the behaviour of the permission boundary. Create a new directory with a TypeScript CDK application in it by executing these commands.

mkdir DevUsers && cd DevUsers
cdk init –language typescript

Once this is done, you should also make sure that your account has a CDK bootstrap stack deployed with the cdk bootstrap command – to start with, do not apply a permission boundary to it, you can add that later an observe how it changes the behaviour of your deployment. Because the bootstrap command is not using the –cloudformation-execution-policies argument, it will default to arn:aws:iam::aws:policy/AdministratorAccess which means that CloudFormation will have full access to the account until the boundary is applied.

cdk bootstrap

Once the command has run, create an AWS Config Rule in your application to be sure that this works without issue before the permission boundary is applied. Open the file lib/dev_users-stack.ts and edit its contents to reflect the sample below.

import * as cdk from ‘aws-cdk-lib’;
import { ManagedRule, ManagedRuleIdentifiers } from ‘aws-cdk-lib/aws-config’;
import { Construct } from “constructs”;

export class DevUsersStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);

new ManagedRule(this, ‘AccessKeysRotated’, {
configRuleName: ‘access-keys-policy’,
identifier: ManagedRuleIdentifiers.ACCESS_KEYS_ROTATED,
inputParameters: {
maxAccessKeyAge: 60, // default is 90 days
},
});
}
}

Next you can deploy with the CDK CLI using the cdk deploy command, which will succeed (the output below has been truncated to show a summary of the important elements).

❯ cdk deploy
Synthesis time: 3.05s
DevUsersStack
Deployment time: 23.17s

Stack ARN:
arn:aws:cloudformation:ap-southeast-2:123456789012:stack/DevUsersStack/704a7710-7c11-11ed-b606-06d79634f8d4

Total time: 26.21s

Before you deploy the permission boundary, remove this stack again with the cdk destroy command.

❯ cdk destroy
Are you sure you want to delete: DevUsersStack (y/n)? y
DevUsersStack: destroying… [1/1]
DevUsersStack: destroyed

Using a permission boundary with the CDK test application

Now apply the permission boundary that you created above and observe the impact it has on the same deployment. To update your booststrap with the permission boundary, re-run the cdk bootstrap command with the new custom-permissions-boundary parameter.

cdk bootstrap –custom-permissions-boundary developer-policy

After this command executes, the CloudFormation execution role will be updated to use that policy as a permission boundary, which based on the deny rule for config:* will cause this same application deployment to fail. Run cdk deploy again to confirm this and observe the error message.

Deployment failed: Error: Stack Deployments Failed: Error: The stack
named DevUsersStack failed creation, it may need to be manually deleted
from the AWS console:
ROLLBACK_COMPLETE:
User: arn:aws:sts::123456789012:assumed-role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
is not authorized to perform: config:PutConfigRule on resource: access-keys-policy with an explicit deny in a
permissions boundary

This shows you that the action was denied specifically due to the use of a permissions boundary, which is what was expected.

Applying permission boundaries to IAM entities automatically

Next let’s explore how the permission boundary can be extended to IAM entities that are created by a CDK application. The concern here is that a developer who is creating a new IAM entity could assign it more permissions than they have themselves – the permission boundary manages this by ensuring that entities can only be created that also have the boundary attached. You can validate this by modifying the stack to deploy a Lambda function that uses a role that doesn’t include the boundary. Open the file lib/dev_users-stack.ts again and edit its contents to reflect the sample below.

import * as cdk from ‘aws-cdk-lib’;
import { PolicyStatement } from “aws-cdk-lib/aws-iam”;
import {
AwsCustomResource,
AwsCustomResourcePolicy,
PhysicalResourceId,
} from “aws-cdk-lib/custom-resources”;
import { Construct } from “constructs”;

export class DevUsersStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);

new AwsCustomResource(this, “Resource”, {
onUpdate: {
service: “ConfigService”,
action: “putConfigRule”,
parameters: {
ConfigRule: {
ConfigRuleName: “SampleRule”,
Source: {
Owner: “AWS”,
SourceIdentifier: “ACCESS_KEYS_ROTATED”,
},
InputParameters: ‘{“maxAccessKeyAge”:”60″}’,
},
},
physicalResourceId: PhysicalResourceId.of(“SampleConfigRule”),
},
policy: AwsCustomResourcePolicy.fromStatements([
new PolicyStatement({
actions: [“config:*”],
resources: [“*”],
}),
]),
});
}
}

Here the AwsCustomResource is used to provision a Lambda function that will attempt to create a new config rule. This is the same result as the previous stack but in this case the creation of the rule is done by a new IAM role that is created by the CDK construct for you. Attempting to deploy this will result in a failure – run cdk deploy to observe this.

Deployment failed: Error: Stack Deployments Failed: Error: The stack named
DevUsersStack failed creation, it may need to be manually deleted from the AWS
console:
ROLLBACK_COMPLETE:
API: iam:CreateRole User: arn:aws:sts::123456789012:assumed-
role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
is not authorized to perform: iam:CreateRole on resource:
arn:aws:iam::123456789012:role/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd2287S-1EAD7M62914OZ
with an explicit deny in a permissions boundary

The error message here details that the stack was unable to deploy because the call to iam:CreateRole failed because the boundary wasn’t applied. The CDK now offers a straightforward way to set a default permission boundary on all IAM entities that are created, via the CDK context variable core:permissionsBoundary in the cdk.json file.

{
“context”: {
“@aws-cdk/core:permissionsBoundary”: {
“name”: “developer-policy”
}
}
}

This approach is useful because now you can import constructs that create IAM entities (such as those found on Construct Hub or out of the box constructs that create default IAM roles) and have the boundary apply to them as well. There are alternative ways to achieve this, such as setting a boundary on specific roles, which can be used in scenarios where this approach does not fit. Make the change to your cdk.json file and run the CDK deploy again. This time the custom resource will attempt to create the config rule using its IAM role instead of the CloudFormation execution role. It is expected that the boundary will also protect this Lambda function in the same way – run cdk deploy again to confirm this. Note that the deployment updates from CloudFormation show that this time the role creation succeeds this time, and a new error message is generated.

Deployment failed: Error: Stack Deployments Failed: Error: The stack named
DevUsersStack failed creation, it may need to be manually deleted from the AWS
console:
ROLLBACK_COMPLETE:
Received response status [FAILED] from custom resource. Message returned: User:
arn:aws:sts::123456789012:assumed-role/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd22872-MBnArBmaaLJp
is not authorized to perform: config:PutConfigRule on resource: SampleRule with an explicit deny in a permissions boundary

In this error message you can see that the user it refers to is DevUsersStack-AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N rather than the CloudFormation execution role. This is the role being used by the custom Lambda function resource, and when it attempts to create the Config rule it is rejected because of the permissions boundary in the same way. Here you can see how the boundary is being applied consistently to all IAM entities that are created in your CDK app, which ensures the administrative controls can be applied consistently to everything a developer does with a minimal amount of overhead.

Cleanup

At this point you can either choose to remove the CDK bootstrap stack if you no longer require it, or remove the permission boundary from the stack. To remove it, delete the CDKToolkit stack from CloudFormation with this AWS CLI command.

aws cloudformation delete-stack –stack-name CDKToolkit

If you want to keep the bootstrap stack, you can remove the boundary by following these steps:

Browse to the CloudFormation page in the AWS console, and select the CDKToolit stack.
Select the ‘Update’ button. Choose “Use Current Template” and then press ‘Next’
On the parameters page, find the value InputPermissionsBoundary which will have developer-policy as the value, and delete the text in this input to leave it blank. Press ‘Next’ and the on the following page, press ‘Next’ again
On the final page, scroll to the bottom and check the box acknowledging that CloudFormation might create IAM resources with custom names, and choose ‘Submit’

With the permission boundary no longer being used, you can now remove the stack that created it as the final step.

aws cloudformation delete-stack –stack-name DeveloperPolicy

Conclusion

Now you can see how IAM permission boundaries can easily be integrated in to CDK development, helping ensure developers have the control they need while administrators can ensure that security is managed in a way that meets the needs of the organisation as well.

With this being understood, there are next steps you can take to further expand on the use of permission boundaries. The CDK Security and Safety Developer Guide document on GitHub outlines these approaches, as well as ways to think about your approach to permissions on deployment. It’s recommended that developers and administrators review this, and work to develop and appropriate approach to permission policies that suit your security goals.

Additionally, the permission boundary concept can be applied in a multi-account model where each Stage has a unique boundary name applied. This can allow for scenarios where a lower-level environment (such as a development or beta environment) has more relaxed permission boundaries that suit troubleshooting and other developer specific actions, but then the higher level environments (such as gamma or production) could have the more restricted permission boundaries to ensure that security risks are more appropriately managed. The mechanism for implement this is defined in the security and safety developer guide also.

About the authors:

Brian Farnhill

Brian Farnhill is a Software Development Engineer at AWS, helping public sector customers in APAC create impactful solutions running in the cloud. His background is in building solutions and helping customers improve DevOps tools and processes. When he isn’t working, you’ll find him either coding for fun or playing online games.

David Turnbull

David Turnbull is a Software Development Engineer at AWS, helping public sector customers in APAC create impactful solutions running in the cloud. He likes to comprehend new programming languages and has used this to stray out of his line. David writes computer simulations for fun.