Behind the Scenes on AWS Contributions to Cloud Native Open Source Projects

Amazon Elastic Kubernetes Service (Amazon EKS) is well known in the Kubernetes community. But few realize that AWS engineers are closely involved and contributing upstream to Kubernetes and to many more cloud native open source projects.

In the past year alone, AWS contributed significantly to containerd, Cortex, etcd, Fluentd, nerdctl, Notary, OpenTelemetry, Thanos, and Tinkerbell. We employ maintainers and contributors on these projects and we will contribute more to these and other projects in the coming year. Here’s a behind-the-scenes look at our contributions and why we’re investing in the open source projects we support. You can also meet many of our contributors in the AWS booth at KubeCon Europe in Amsterdam, April 18-21, 2023 and hear from them in our virtual Container Day event 9 a.m. – 4 p.m. CEST on April 18.

“Amazon EKS is committed to open source and we are spending a lot of our cycles now focused on contributing back to the community. Kubernetes is part of a community that’s bigger than AWS and so we’re continuing to be committed to maintaining and helping that community to be successful because without it, we wouldn’t exist, either,” said Barry Cooks, Vice President, Kubernetes, at AWS and a Cloud Native Computing Foundation (CNCF) governing board member.

AWS contributes to Kubernetes and etcd

Today, AWS is heavily involved in open source, cloud native projects. Consider, for example, some of our recent key contributions to Kubernetes and etcd, the underlying data store for Kubernetes.

“We’re building the AWS cloud provider, contributing to CAPI (cluster API), and serve as part of the security response committee. We helped implement gzip optimization which improves the performance of Kubernetes clients,” said Nathan Taber, who leads the product team for Kubernetes at AWS, in a keynote at KubeCon North America 2022. “With etcd we’re bringing our operational learnings from running just so much etcd at scale, back into the community.”

The AWS cloud provider for Kubernetes is the open source interface between a Kubernetes cluster and AWS service APIs. This project allows a Kubernetes cluster to provision, monitor, and remove AWS resources necessary for operation of the cluster.

As of Kubernetes 1.27, AWS has just finished a multi-year effort to migrate our legacy in-tree cloud provider to an external, out-of-tree cloud provider. The cloud provider migration reduces binary bloat in the main kubernetes/kubernetes (k/k) repository, as well as dependency complexity and the surface area for security vulnerabilities.

AWS has also built a webhook framework that allows cloud providers to host webhooks in their cloud-controller-managers, which makes certain migration tasks easier. One use case for this is helping other cloud providers to migrate the persistent volume labeller admission controllers from the API server code, which is one of the last areas of cloud provider specific code that needs to be migrated out of core Kubernetes.

“We’ve included a lot of space in our planning for upstream open source work this year,” said Nick Turner, software developer on the AWS Kubernetes team and a chair in Kubernetes SIG-cloud-provider. “Expect us to keep up our contributions to the cloud provider and the load balancer controller as well as increase our investments in the AWS IAM authenticator for Kubernetes and KMS encryption provider.”

These and other Kubernetes contributions bring value to the entire Kubernetes community as well as to the EKS service and its customers.

Since KubeCon Detroit last fall, the EKS-etcd team has contributed numerous improvements to etcd. Chao Chen contributed to the effort to improve testing mechanisms for etcd by unifying the test frameworks used by etcd tests. Baoming Wang contributed an important metric to the Kubernetes API server code base which will help catch data corruption issues early. We’ve also worked on building a linearizability test suite, made various improvements to the core etcd database and etcd’s backend database, BoltDB, contributed to documentation, made Helm more resilient to transient etcd-side errors, and fixed an issue with the installation script for argo-cd-helmfile.

What’s driving AWS to contribute more to cloud native open source

Like most modern companies, AWS builds many of its services with open source components. There are several business and technical reasons we do this, which we’ve outlined in an article on The New Stack about why we invest in sustainable open source. We recognize that the success of our services depends on the success of those underlying open source projects.

Given that most of the open source projects that AWS supports underpin specific services, AWS tasks all engineers working on those services, regardless of their assigned sub-service teams, with contributing in any way that they can to the upstream projects.

The result is a virtuous cycle that promotes mutually beneficial growth. As AWS services grow, so too do the open source projects upon which they are based because of AWS contributions and support. Conversely, as these open source projects grow from the contributions of other companies and developers, so do the benefits to the AWS services that depend upon them.

AWS contributions focus on performance and scale

AWS contributions to open source typically come as a practical matter in the form of bug fixes, code reviews, documentation, new features, or security enhancements. Like many developers working in the open source space, AWS engineers often work to address issues that arise in the course of their day jobs and then share the fixes with the rest of the open source community. Similarly, AWS engineers develop new features for an open source project to expand its scale or performance, which in turn increases the project’s usability, stability, and overall appeal.

Because AWS manages a large number of Kubernetes clusters, it has a unique opportunity to test the limits of open source software and harden it well beyond its initial core. As a result, many of the contributions our team members make to upstream Kubernetes, etcd, containerd, and other projects center on providing the upstream community with insights into where things break down in scaling, production, and operational readiness.

The resulting insights provide value for the entire open source community as well as our own customers.

Take, for example, the optimization that curiously acted as a latency expander. AWS engineer Shyam Jeedigunta was looking at the logs and metrics collected from thousands of production EKS clusters. Gzip compression is enabled inside the Kubernetes API server to reduce demand on network bandwidth and to decrease latency. However, he determined that the compression was actually increasing the latency of large list requests made by clients to the Kubernetes API server. Shyam, who is also co-chair of the Kubernetes scalability special interest group (SIG), took a deep dive into the issue to investigate whether a particular compression level created the problem and, if so, whether the compression level could be reduced. Could gzip compression be disabled entirely? What impact would that have on latency and network bandwidth?

Answers to questions like these lead to upstream contributions in etcd and core Kubernetes from AWS service teams. Customers and others often report these kinds of issues to the project as well, but the nature of the problem isn’t clear until it’s viewed on 1,000 nodes and 200,000 objects of a certain kind. AWS engineers diagnose what’s going on, put together troubleshooting information, and collate it into proposals on how to fix the problem(s) upstream in Kubernetes. AWS likes to spearhead fixing issues that arise from running the projects at scale.

Key AWS contributions

AWS contributes to many Kubernetes sub-projects and SIGs. For example, Micah Hausler and Sri Saran Balaji Vellore Rajakumar serve on the Kubernetes Security Response Committee (SRC), Davanum Srinivas (Dims) chairs SIG-Architecture and SIG-k8s-infra, and Nick Turner is a chair in SIG-cloud-provider. Key contributions have gone into projects including containerd, Cortex, cdk8s, CNI, nerdctl, and Prometheus. Innovations have also been substantial and include TorchServe, improved ARM support through AWS Graviton, and the Virtual GPU plugin. This is not, however, an exhaustive list of AWS contributions and innovations in the cloud native community.

On containerd, for example, AWS employs two maintainers who contribute features and help ensure the project’s general health and security. Key contributions from AWS engineers to the containerd project include OpenTelemetry integration in the 1.7.0 release, improved tracing, and improved fuzzing integration.

“It’s been awesome to see the growth on the container runtime team here at AWS these past few years. I love to see the eagerness to learn not just *how* to contribute, but how to do it well and really benefit the broader community,” said Phil Estes, a principal engineer at AWS and a containerd maintainer.

Nerdctl, a Docker-compatible CLI for containerd and a containerd sub-project, is used by other open source projects such as Lima, Finch, and Rancher Desktop. AWS engineers significantly improved nerdctl’s compose support by adding 11 of the 13 missing compose commands. We enhanced nerdctl’s image signing and verification support by contributing cosign support for nerdctl compose and notation support for nerdctl. And engineer Jin Dong recently became the first reviewer on the project from AWS.
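For instance, the compose and signing support contributed by AWS engineers can be exercised directly from the nerdctl CLI. The commands below are only a minimal sketch: the image name and key files are placeholders, and the cosign flags are experimental, so check the documentation for your nerdctl version before relying on them.

nerdctl compose up -d
nerdctl push --sign=cosign --cosign-key cosign.key registry.example.com/myapp:latest
nerdctl pull --verify=cosign --cosign-key cosign.pub registry.example.com/myapp:latest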

AWS services are also standardizing on OpenTelemetry, a set of open source tools and standards for collecting metrics, logs, and traces to measure application performance. AWS Distro for OpenTelemetry (ADOT), OpenSearch, and CloudWatch are all building on OpenTelemetry and contribute back to the upstream project. All ADOT code is 100% open source and contributed upstream. Key contributions include adding functionality to upstream observability components such as OpenTelemetry language SDKs, collectors, and agents.

“Amazon is the fourth largest contributor to OpenTelemetry with a dedicated maintainer and many contributors working on the project. A key contribution has been improving collector and metric stability, including improved Prometheus interoperability with OpenTelemetry,” said Taber.

A fourth example is Cortex, where AWS is the top supporter of the project and employs three maintainers. Because AWS runs this project at scale, engineers have the opportunity to identify and fix scaling cliffs before they become a problem for the rest of the community. Some of the key contributions are new features and performance improvements. Examples include the partition compactor, Ring DynamoDB multikey KV, out-of-order samples ingestion, snappy-block gRPC compression, ARM images, and Thanos PromQL engine integration.

We have also contributed bug fixes to Thanos, a tool for setting up highly available Prometheus instances with long-term storage. Thanos is a CNCF incubating project on which Cortex depends. We participated in the development of the new Thanos PromQL engine and open sourced a tool that uses fuzzing for correctness testing, which has already caught a few bugs.

AWS employs four maintainers on Tinkerbell, a cloud native open source bare metal provisioning engine for EKS Anywhere and a CNCF Sandbox project. Key contributions include organizing the project roadmap, VLAN support, a Kubernetes native backend, out-of-band management Kubernetes controller, Helm Chart deployment, and Cluster API provider updates.

“Our team has done a lot of work to update the Tinkerbell backend from Postgres to native Kubernetes,” said Taber.

AWS employs three maintainers on Notation, a sub-project of Notary under the CNCF, and is the third largest code contributor to Notary. Notation enables the generation of cryptographic signatures for container images so users can verify that they come from a trusted source or process. AWS founded the sub-project with other contributors to come up with specifications for signature format, generation, verification, and revocation. As part of this work we also defined a process for evaluating signature envelope formats like COSE, ensuring that they meet a high security bar before they are used in Notation.

AWS employees have either written or reviewed the majority of code contributions for the core Notation libraries and CLI. AWS also employs a maintainer on Ratify so Kubernetes users can easily enable policies for signature verification with their existing admission controllers. Similarly, we employ a maintainer on ORAS so signatures can easily be pushed to OCI registries. Notation lets users define granular trust policies specifying which sources they want to trust, balance deployment safety and security needs, and retain flexibility in how signing keys are stored securely.

We have contributed to many other open source projects as well, including Crossplane, for which AWS added support for EKS IRSA in the China region and fixed Amazon Route 53 wildcard support, and Backstage, where we contributed support for AWS Proton and AWS Code Suite (AWS CodeBuild, AWS CodePipeline, and AWS CodeDeploy).

“We’re very excited about doing more development in the open, sharing that with our customers, and working directly in some cases with customers on their needs in open source projects and working together to make the community stronger in the Kubernetes space,” Cooks said.

AWS is open

We want to hear from you. AWS engineers are open to helping community members through collaboration and contribution opportunities. Tell us how we can help meet your needs.

AWS engineers, solutions architects, and product managers are hanging out on the Kubernetes community and the CNCF community Slack channels. You can reach us in the provider AWS channel, the Karpenter channel, and the AWS Controllers for Kubernetes channel on the Kubernetes Slack.

Find us and tell us what you’d like us to work on, or let us know if there is a particular issue you have found in one of these upstream projects that you think our engineers can help move the needle on. Come find us and talk to us in the CNCF’s AWS Slack channel and join us for our virtual Container Day on April 18, before KubeCon EU.


Building Automation for Fraud Detection Using OpenSearch and Terraform

Organizations that interface with online payments are continuously monitoring and guarding against fraudulent activity. Transactional fraud usually presents itself as discrete data points, making it challenging to identify multiple actors involved in the same group of transactions. Even a single actor operating over a period of time can be hard to detect. Visibility is key to preventing fraud incidents and to giving data, security, and operations engineers meaningful knowledge of the activities within your environment.

Understanding the connections between individual data points can reduce the time for customers to detect and prevent fraud. You can use a graph database to store transaction information along with the relationships between individual data points. Analyzing those relationships through a graph database can uncover patterns difficult to identify with relational tables. Fraud graphs enable customers to find common patterns between transactions, such as phone numbers, locations, and origin and destination accounts. Additionally, combining fraud graphs with full text search provides additional benefits as it can simplify analysis and integration with existing applications.
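To make this concrete, once transactions and shared attributes such as phone numbers are stored as vertices and edges, a single Gremlin traversal can surface attributes reused across multiple transactions. The query below is a hypothetical sketch against Neptune’s HTTP Gremlin endpoint: the 'phone' label is an illustrative placeholder, not the schema used by the solution described in this post, and the endpoint value is yours to fill in.

curl -s -X POST "https://<neptune-endpoint>:8182/gremlin" \
  -d "{\"gremlin\": \"g.V().hasLabel('phone').where(both().count().is(gt(1))).valueMap(true)\"}"

The traversal returns phone-number vertices connected to more than one other vertex, which is exactly the kind of shared data point that links otherwise unrelated transactions.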

In our solution, financial analysts can upload graph data, which gets automatically ingested into the Amazon Neptune graph database service and replicated into Amazon OpenSearch Service for analysis. Data ingestion is automated with Amazon Simple Storage Service (Amazon S3) and Amazon Simple Queue Service (Amazon SQS) integration. We handle data replication through AWS Lambda functions, with AWS Step Functions for orchestration. The design uses open source tools and AWS managed services to build resources and is available in the https://github.com/aws-samples/neptune-fraud-detection-with-opensearch GitHub repository under an MIT-0 license. You will use Terraform and Docker to deploy the architecture, and will be able to send search requests to the system to explore the dataset.

Solution overview

This solution takes advantage of native integration between AWS services for scalability and performance, as well as the Neptune-to-OpenSearch Service replication pattern described in Neptune’s official documentation.

Figure 1: An architectural diagram that illustrates the infrastructure state and workflow as defined in the Terraform templates.

The process for this solution consists of the following steps, also shown in the architecture diagram here:

Financial analyst uploads graph data files to an Amazon S3 bucket.

Note: The data files are in a Gremlin load data format (CSV) and can include vertex files and edge files.

The upload triggers a PUT object event notification whose destination is an Amazon SQS queue.
The SQS queue is configured as an AWS Lambda event source, which invokes a Lambda function.
This Lambda function sends an HTTP request to the Amazon Neptune bulk loader endpoint to load the data stored in the S3 bucket (a sketch of this request appears after this list).
The Neptune database reads data from the S3 endpoint defined in the Lambda request and loads the data into the graph database.
An Amazon EventBridge rule is scheduled to run every 5 minutes. This rule targets an AWS Step Functions state machine to create a new execution.
The Neptune Poller step function (state machine) replicates the data in the Neptune database to an OpenSearch Service cluster.
Note: The Neptune Poller step function is responsible for continually syncing new data after the initial data upload using Neptune Streams.

Users can access the data replicated from the Neptune database through Amazon OpenSearch Service.
Note: A Lambda function is invoked to send a search request or query to an OpenSearch Service endpoint to get results.
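For reference, the bulk load request mentioned in the steps above is a plain HTTP call to Neptune’s loader endpoint, which must be made from inside the VPC. The sketch below uses curl with placeholder values for the endpoint, bucket, and IAM role; the Lambda function in this solution builds an equivalent request, though its exact parameters may differ.

curl -X POST "https://<neptune-endpoint>:8182/loader" \
  -H 'Content-Type: application/json' \
  -d '{
        "source": "s3://<data-bucket>/",
        "format": "csv",
        "iamRoleArn": "arn:aws:iam::<account-id>:role/<neptune-load-role>",
        "region": "us-west-2",
        "failOnError": "FALSE",
        "parallelism": "MEDIUM"
      }'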

Prerequisites

To implement this solution, you must have the following prerequisites:

An AWS account with local credentials configured. For more information, check the documentation on configuration and credential file settings.
The latest version of the AWS Command Line Interface (AWS CLI).
An IAM user with Git credentials.
A Git client to clone the source code provided.
A Bash shell.

Docker installed on your localhost.

Terraform installed on your localhost.

Deploying the Terraform templates

The solution is available in this GitHub repository with the following structure:

data: Contains a sample dataset to be used with the solution for demonstration purposes. Information on fictional transactions, identities and devices is represented in files within the nodes/ folder, and relationships between them are represented in files in the edges/ folder.
terraform: This folder contains the Terraform modules to deploy the solution.
documents: This folder contains the architecture diagram image file of the solution.

Create a local directory called NeptuneOpenSearchDemo and clone the source code repository:

mkdir -p $HOME/NeptuneOpenSearchDemo

cd $HOME/NeptuneOpenSearchDemo

git clone https://github.com/aws-samples/neptune-fraud-detection-with-opensearch.git

Change directory into the Terraform directory:

cd $HOME/NeptuneOpenSearchDemo/neptune-fraud-detection-with-opensearch/terraform

Make sure that the Docker daemon is running:

docker info

If the previous command outputs an error that is unable to connect to the Docker daemon, start Docker and run the command again.

Initialize the Terraform folder to install required providers:

terraform init

The solution is deployed in us-west-2 by default. You can change this behavior by modifying the "region" variable in the variables.tf file.

Deploy the AWS services:

terraform apply -auto-approve
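If you prefer not to edit variables.tf, you can also override the region at apply time (assuming the variable is named region, as described above):

terraform apply -var="region=eu-west-1" -auto-approve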

Note: Deployment will take around 30 minutes due to the time necessary to provision the Neptune and OpenSearch Service clusters.

To retrieve the name of the S3 bucket to upload data to:

aws s3 ls | grep "neptunestream-loader.*d$"

Upload node data to the S3 bucket obtained in the previous step:

aws s3 cp $HOME/NeptuneOpenSearchDemo/neptune-fraud-detection-with-opensearch/data s3://neptunestream-loader-us-west-2-123456789012 --recursive

Note: This is a sample dataset, created from the IEEE-CIS Fraud Detection dataset, for demonstration purposes only.

Test the solution

After the solution is deployed and the dataset is uploaded to S3, the dataset can be retrieved and explored through a Lambda function that sends a search request to the OpenSearch Service cluster.

Confirm the Lambda function that sends a request to OpenSearch was deployed correctly:

aws lambda get-function --function-name NeptuneStreamOpenSearchRequestLambda --query 'Configuration.[FunctionName, State]'

Invoke the Lambda function to see all records present in OpenSearch that are added from Neptune:

aws lambda invoke --function-name NeptuneStreamOpenSearchRequestLambda response.json

The results of the Lambda invocation are stored in the response.json file. This file contains the total number of records in the cluster and all records ingested up to that point. The solution stores records in the index amazon_neptune. An example of a node with device information looks like this:

{
  "_index": "amazon_neptune",
  "_type": "_doc",
  "_id": "1fb6d4d2936d6f590dc615142a61059e",
  "_score": 1.0,
  "_source": {
    "entity_id": "d3",
    "document_type": "vertex",
    "entity_type": [
      "vertex"
    ],
    "predicates": {
      "deviceType": [
        {
          "value": "desktop"
        }
      ],
      "deviceInfo": [
        {
          "value": "Windows"
        }
      ]
    }
  }
}
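If the domain’s access policy and your network configuration allow direct access to the OpenSearch Service endpoint (you may need to call it from inside the VPC with signed requests), you can also query the amazon_neptune index with the standard _search API. The field path below is inferred from the sample document above and is only illustrative:

curl -s "https://<opensearch-domain-endpoint>/amazon_neptune/_search?q=predicates.deviceType.value:desktop&pretty"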

Cleaning up

To avoid incurring future charges, clean up the resources deployed in the solution:

terraform destroy --auto-approve

The command will output information on resources being destroyed.

Destroy complete! Resources: 101 destroyed.

Conclusion

Fraud graphs are complementary to other techniques organizations can use to detect and prevent fraud. The solution presented in this blog post reduces the time financial analysts would take to access transactional data by automating data ingestion and replication. It also improves performance for systems with growing volumes of data when compared to executing a large number of insert statements or other API calls.


Dive Deeper into Data Lake for Nonprofits, a New Open Source Solution from AWS for Salesforce for Nonprofits

Nonprofits are using cloud-based solutions for fundraising, donor and member management, and communications. With this move online, they have access to more data than ever. This data has the potential to transform their missions and increase their impact. However, sharing, connecting, and interpreting data from many different sources can be a challenge. To address this challenge, Amazon Web Services (AWS) and AWS Partner Salesforce for Nonprofits announced the general availability of Data Lake for Nonprofits – Powered by AWS.

Data Lake for Nonprofits is an open source application that helps nonprofit organizations set up a data lake in their AWS account and populate it with the data that they have in the Salesforce Nonprofit Success Pack (NPSP) schema. This data resides in Amazon Relational Database Service (Amazon RDS), where it can be accessed by other AWS services like Amazon Redshift and Amazon QuickSight, as well as business intelligence products such as Tableau.

This post is written for developers and solution integrator partners who want to get a closer look at the architecture and implementation of Data Lake for Nonprofits to better understand the solution.  We’ll walk you through the architecture and how to set up a data lake using AWS Amplify.

Prerequisites

To follow along with this walkthrough, you must have the following prerequisites:

An AWS Account

An AWS Identity and Access Management (IAM) user with Administrator permissions that enables you to interact with your AWS account
A Salesforce account that has the Nonprofit Success Pack managed packages installed
A Salesforce user that has API permissions to interact with your Salesforce org

Git command line interface installed on your computer for cloning the repository

Node.js and Yarn Package Manager installed on your computer for cleanup

Solution Overview

The following diagram shows the solution’s high-level architecture.

Salesforce and AWS have released the source code in GitHub under a BSD 3-Clause license so nonprofits and their cloud partners can use, customize, and innovate on top of it at no cost.

Use your command-line shell to clone the GitHub repository for your own development.

git clone https://github.com/salesforce-misc/Data-Lake-for-Nonprofits

The solution consists of two tiers: a frontend application and a backend application. In the following sections, we’ll walk you through both architectures and show you how to install a data lake.

Frontend Walkthrough

We have developed the frontend application using React with Typescript, and it has three layers.

The view layer comprises React components.
The model layer is where most of the application logic is maintained. The models are Mobx State Tree (MST) types with properties, actions, and computed values. Relationships between models are expressed as MST trees.
The API layer is used to communicate with AWS Services using AWS SDK for JavaScript v3.

The frontend application is built and zipped for AWS Amplify using GitHub workflows; this happens for every change in the repository. The latest build can be found in the GitHub repository.

Backend Walkthrough

We have developed the backend application using AWS CloudFormation templates that can be found under the infra/cf folder in the GitHub repository. These templates provision the resources for your data lake in your AWS account, as described below.

vpc.yaml template is used to provision the Amazon Virtual Private Cloud (Amazon VPC) that the application will be running in.

buckets.yaml template is used to provision several Amazon Simple Storage Service (Amazon S3) buckets.

datastore.yaml is used to provision an Amazon Relational Database Service (Amazon RDS) PostgreSQL database.

athena.yaml is used to provision Amazon Athena with a custom workgroup.

step_function.yaml is used to provision AWS Step Functions and related AWS Lambda functions. Step Functions is used to orchestrate the Lambda functions that are going to import your data from your Salesforce account into Amazon RDS for PostgreSQL using an Amazon AppFlow connection.

Lambda functions can be found in the infra/lambdas folder.

src/cleanupSQL.ts performs the movement of data from the data-loading database schema to the public database schema.

src/filterSchemaListing.ts filters out S3’s “schema/” key and queries Salesforce for deleted objects.

src/finalizeSQL.ts drops tables that are no longer needed by the application, to save cost.

src/listEntities.ts calls APIs since the output is too large for Step Functions.

src/processImport.ts is the import Lambda function, which writes the data sent to the Amazon SQS queue into RDS.

src/pullNewSchema.ts queries Amazon AppFlow and then Salesforce to gather any new or updated fields.

src/setupSQL.ts sets up the data-loading database schema and creates the tables based on the schema file.

src/statusReport.ts performs a status update to S3 based on where it is in the Step Functions State Machine.

src/updateFlowSchema.ts uses the updated schema file on S3 to create or update the Amazon AppFlow flow.

Amazon Simple Queue Service (Amazon SQS) is used to queue the import data so that the Lambda function can use it.

Amazon EventBridge is used to set up a job that syncs your data based on your choice of frequency.

Amazon CloudWatch is used to keep logs during installation as well as synchronization. The application creates a CloudWatch dashboard to track the usage of the data lake.

Deploy the Frontend Application using AWS Amplify

The latest release of the frontend application can be downloaded from the GitHub repository and deployed in AWS Amplify in your AWS account as explained in the User Guide. It typically takes a few minutes to deploy the frontend application, after which a URL to the frontend application is presented. The URL should look like the link here:

https://abc.xyz….amplifyapp.com

When you open the URL, you will see the frontend application, which provides a wizard-like guide to the steps. Each step will guide you through the instructions on how to move forward.

Step 1 will ask for an Access Key ID and Secret Access Key for an IAM user of your AWS account. The application shows guidance on how to log in to the AWS Management Console and use Identity and Access Management (IAM) to create the IAM user with admin permissions.

This step also requires you to select the AWS region where you would like to create the Amazon AppFlow connection and deploy the data lake.

Step 2 establishes the connection to your Salesforce account. The application guides the user to leverage Amazon AppFlow, which allows AWS to connect to your Salesforce account.

At the end of the page, use the drop-down menu to select the connection name and click Next.
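If you want to confirm from the command line that the connection was created before moving on, the Amazon AppFlow CLI can list Salesforce connector profiles; the connection name you selected in this step should appear in the output:

aws appflow describe-connector-profiles --connector-type Salesforce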

Step 3 will help you choose the data objects and set the frequency of data synchronization for your data lake. The data import option can be set to any date and time, and you can choose the frequency from the given options: daily/weekly/monthly.

This page further displays the complete set of objects from your Salesforce account. Choose the necessary objects that you want to import into your data lake.

In Step 4, you can review the data lake configuration and confirm. This step is where you are allowed to go back to the previous step to make any changes if needed.

Step 5 is where the data lake is provisioned, and then your data is imported. This can take half an hour to several hours, depending on the size of the data in your Salesforce account.

Once the data lake is ready, in Step 6, you can find the instructions and information needed to connect to your data lake using business intelligence applications such as Tableau Cloud and Tableau Desktop that will help you visualize your data and analyze it per your business needs.

Cleanup

Keeping the data lake in your AWS account will incur charges due to the resources provisioned. To avoid incurring future charges, run these commands on your terminal where you cloned the GitHub repository and follow the instructions:

cd Data-Lake-for-Nonprofits/app

yarn delete-datalake

Conclusion

This post showed you how AWS services are used to transform your Salesforce data into a data lake. It uses AWS Amplify to host the frontend application and provisions several AWS services for the data lake backend. The architecture is based on the successful collaboration between AWS and Salesforce to build an open source data lake solution using a simple and easy-to-use, wizard-like application.

We invite you to clone the GitHub repository and develop your own solution for your needs, provide feedback, and contribute to the project.


Optimized Video Encoding with FFmpeg on AWS Graviton Processors

If you have not tried video encoding on Graviton lately, now is the time to give it another look. Recent FFmpeg improvements, contributed by AWS and others in the open source community, have increased the performance of fully loaded video workloads on Graviton processors.

Measured on Amazon Elastic Compute Cloud (Amazon EC2) C7g instances, for offline video encoding we saw a 63% performance boost for H.264 and 60% for H.265. Encoding video on C7g cost 29% less for H.264 and 18% less for H.265 compared to C6i, the latest x86-based Amazon EC2 instance (both using on-demand instance pricing). This makes C7g the fastest compute-optimized cloud instance for video encoding, as well as the most cost-effective and energy-efficient.

When AWS Graviton2 instances were introduced, they provided 40% better price-performance for many workloads compared to similar x86-based Amazon EC2 instances. Graviton3 delivers an additional 25% performance improvement over Graviton2. Video processing and transcoding has been growing in importance, and Graviton is well suited for this workload. AWS engineers and the open source community have worked on video encoding tools, such as FFmpeg and the codec libraries, to further optimize for Graviton. You can get these improvements from a build of the FFmpeg development branch on GitHub, or use FFmpeg version 5.2 when it is released.
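Until a release that contains these patches is available, building from the development branch is straightforward. The following is a minimal sketch; it assumes the x264 and x265 development packages and the usual build tools are already installed, and you may need additional configure flags for your environment.

git clone https://git.ffmpeg.org/ffmpeg.git && cd ffmpeg
./configure --enable-gpl --enable-libx264 --enable-libx265
make -j$(nproc)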

Use cases

One of the common use cases for video in the cloud is batch transcoding multiple videos concurrently on the same instance. This optimizes for the best throughput and price. Another popular use case is transcoding a single input stream to multiple output formats optimized for different viewing resolutions. Both of these cases require optimizing performance for concurrent processing. For the following benchmarks we scale down the incoming 4k stream and encode multiple target resolutions for each input. Each different target resolution can be used to support different device and network capabilities at their native resolution: 1080p, 720p, 480p, 360p, and 160p.

Figure 1: Encoding multiple streams in parallel on a single instance.
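Each transcoding job in this pattern takes one input and produces several scaled outputs. A representative ffmpeg invocation is sketched below; the file names, preset, and subset of target resolutions are illustrative rather than the exact benchmark settings, and audio handling is omitted for brevity.

ffmpeg -i input_4k.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3];[v1]scale=-2:1080[v1080];[v2]scale=-2:720[v720];[v3]scale=-2:480[v480]" \
  -map "[v1080]" -c:v libx264 -preset ultrafast out_1080p.mp4 \
  -map "[v720]" -c:v libx264 -preset ultrafast out_720p.mp4 \
  -map "[v480]" -c:v libx264 -preset ultrafast out_480p.mp4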

We tested encoding the target videos into H.264 and H.265 using the x264 and x265 open source libraries. The H.264 or AVC (Advanced Video Coding) standard was first published in 2004 and enjoys broad compatibility. Devices including mobile phones, tablets, personal computers, smart TVs, and others generally have support for hardware accelerated H.264 decoding. The H.265 or HEVC (High Efficiency Video Coding) standard was first published in 2013 and has better compression at a given level of quality than H.264, but hardware accelerated decoding is not as widely deployed and patents and licensing restrictions have prevented some companies from adopting it in their software. For most video use cases, having more than one video format will be necessary in order to provide the best quality for devices which can play H.265 and also H.264 for devices without H.265 decoding support.

Offline (batch) encoding

Speed: The following diagram shows the encoding speed in frames per second (FPS) for a sample workload. It was tested comparing FFmpeg 4.2 with the development branches of FFmpeg and x265 that include the latest optimizations.

Figure 2: Speed results are the mean frames per second (FPS) for different input samples.
Higher is better.

Cost: The cost of encoding on the latest Graviton instance, C7g, is compared with the latest Amazon EC2 x86 based instances, C6i and C6a, showing better performance and a reduction of 18-29% in cost compared to C6i.

Figure 3: Comparing cost for the latest generations of Amazon EC2 compute instances.

Lower is better. Normalized so that the cost of x264, preset ultrafast, on C6i is equal to one.

The results show the total cost to transcode 1 million input frames in parallel jobs to five output sizes. Each value is a mean of results for three different input files tested. 1 million frames is about 4 hours and 37 minutes at 60 frames per second.

Live stream encoding

For a live streaming use case, we can measure the maximum number of streams for which an instance can maintain full frame rate while transcoding to 3 output sizes. The results below are the number of streams the instance was able to sustain divided by the cost per hour, resulting in 15-35% lower overall cost on C7g vs. C6i. This makes the C7g instance the most cost effective AWS compute instance type for transcoding streaming video.

Figure 5: Results show the hourly cost per video stream at 24FPS, using -preset ultrafast with x264 and x265.
Lower is better.

The changes

The aarch64 version of the scaling functions initially used the reference implementations written in C. After rewriting these C functions in aarch64 assembly, the performance improved significantly. Video scaling is a component of FFmpeg that consistently takes a high percentage of compute time; most encode jobs include a scaling step, since it is necessary to create multiple outputs to support different device resolutions, both for offline and live streams. All of these changes have been contributed upstream to FFmpeg. The table below lists some of the changes AWS contributed since the 2019 release of FFmpeg version 4.2, along with the effect of each change on encoding performance on Graviton.

Function name | Speed up | Commit
ff_yuv2planeX_8_neon | 1.08x | https://github.com/FFmpeg/FFmpeg/commit/c3a17ffff6b
ff_hscale_8_to_15_neon | 1.39x | https://github.com/FFmpeg/FFmpeg/commit/bd831912712
ff_hscale8to15_4_neon | 1.38x | https://github.com/FFmpeg/FFmpeg/commit/0ea61725b1b
ff_pix_abs16_neon | 7.00x | https://github.com/FFmpeg/FFmpeg/commit/c471cc74747
ff_hscale8to15_X4_neon | 4.00x | https://github.com/FFmpeg/FFmpeg/commit/75ffca7eef5
ff_yuv2planeX_8_neon | 1.13x | https://github.com/FFmpeg/FFmpeg/commit/3e708722a2d
ff_yuv2planeX_8_neon | 2.00x | https://github.com/FFmpeg/FFmpeg/commit/0d7caa5b09b

Through a series of optimizations to the horizontal and vertical scaling functions, detailed in the commits listed above, AWS engineers were able to improve performance for a variety of input cases. After these optimizations and others were applied to FFmpeg and to x265, Graviton instances perform better than comparable Amazon EC2 x86-based instances. Comparing C7g instances to C6i instances on the mainline branch of FFmpeg, C7g shows higher performance in every category.

Benchmarking method

To benchmark FFmpeg we used three different test files, each 10 seconds long. One was a high-bitrate test with complex motion and lots of high-frequency detail changes, another was mostly a still scene at a low bitrate, and a third was a moderate-bitrate scene from the open source Tears of Steel film. We transcoded each clip into the five target sizes using multiple parallel jobs intended to simulate a service transcoding many sources in parallel. To increase the stability of the measurements, we also executed multiple iterations of these parallel jobs sequentially. The total time to execute these jobs is then used to calculate frames per second and cost per frame. Results are measured in frames per second and use the number of source frames transcoded, rather than the output frames, since the output consists of many different sizes. All input files were 4K and encoded in H.264. We tested with the following software versions: FFmpeg, 2022-08-23; x264, 2022-06-01; x265, 2022-09-12.

Conclusion

Graviton2 and Graviton3 processors are cost efficient and fast for running video transcoding. With the latest improvements to FFmpeg and the codecs, the advantage has only improved. To achieve these results for yourself, the first step is to ensure you are running an optimized build from the latest code. There is a pre-built binary on https://github.com/BtbN/FFmpeg-Builds/releases, a third-party project that maintains builds from the latest source code. VT1 and GPU instances can also be a compelling option, especially for live video, but they offer less flexibility than software encoders for getting the best quality at a given bitrate. If a software encoder is right for your workload, Graviton is a great option.

There is still more work to do for FFmpeg, especially if you are using HDR content with 10 or 12 bit color depth. If you are, and even if you are not, be sure to keep up to date with FFmpeg and codec releases. If you find use cases where FFmpeg on Graviton does not meet expectations, please open an issue on the Graviton Technical Guide to let us know about it. We will continue to add more performance improvements to make Graviton the most cost effective and efficient general purpose processor for video encoding.


Introducing Open Source AWS CardDemo for Mainframe Modernization  

We are excited to share that AWS has open sourced its AWS Mainframe Modernization CardDemo application for use by the community modernizing mainframe applications.

CardDemo is a sample mainframe application. It is designed and developed to test AWS Mainframe Modernization and partner technology for many modernization use cases, such as discovery, migration, modernization, performance testing, augmentation, service enablement, service extraction, test creation, and test harnesses. It can be used to showcase modernization using patterns such as automated refactor, replatform, and augmentation. Since it is a relatively small, standalone application that uses synthetic data, CardDemo enables our customers and partners to test, evaluate, and show solutions without the risk of exposing customer business logic and data to the public.

By open sourcing CardDemo under the Apache 2.0 license, we want to empower builders and foster innovation around the modernization of mainframe applications. With the CardDemo mainframe application, you can quickly experiment with new solutions and integrations. We also want to grow knowledge of mainframe modernization solutions through workshops, tutorials, and demonstrations that leverage CardDemo.

In this post, we describe the functions, the technical components and the structure of the CardDemo application. Then we show how to install the application on a mainframe. Finally, we highlight some additional resources that help understand how you can modernize this application using cloud technology and contribute to this open source application.

Overview of CardDemo

CardDemo is a mainframe credit card management application that allows you to view and update account information, credit card data and transactions for a given date range. You interact with the frontend through typical mainframe green screens using one of two personas:

A back-office user who can view and update a restricted set of account and credit card attributes, create transactions manually, and view reports on transactions.
An administrator who can view, add, update, and delete other users.

A series of batch jobs is included in CardDemo. You can run this batch cycle to process transaction data received from a simulated external transaction feed file. The batch posts transactions, performs interest calculations on outstanding balances and produces reports and extracts that it stores for later reference.

To help you get started with CardDemo, we have provided a starter database populated with synthetic data. The key entities in this system are Account, Customers associated with the account and Cards owned by the account. Visit this link to see the entity relationship diagram.

Though it is possible to have many-to-many relationships between these three entities, we have constrained ourselves to a 1:1:1 relationship between them in the first release. This means that one account can be associated with at most one card and one person.

Architecture

The CardDemo application is composed of the following components:

Basic Mapping Support (BMS) maps, written using assembler-language macro instructions (macros), to create the screens.

Virtual Storage Access Method (VSAM) datasets to store the data. In addition to VSAM, the application also makes use of physical sequential (PS) datasets and generation data groups (GDGs).

Common Business Oriented Language (COBOL) for the logic to perform edits and execute business processes.

Application Flow

The application has two flows based on the type of user logging in to the CardDemo login screen.

Back-office user flow
Administrator flow

When you log in to the application as a back-office user, it displays the main menu, which lets you choose from several options to perform various application functions. As a back-office user you can manage accounts, credit cards, transactions, and bill payments.

When you log in as an administrator, the admin menu provides a set of options to list, add, update, and delete users.

Installing CardDemo

Prerequisites

The README file in the CardDemo GitHub repository has detailed step-by-step instructions about how to install CardDemo on a mainframe.

Here is a high-level overview of the installation process.

Retrieve the repository from GitHub and upload the supplied code and data to the mainframe.
Compile screens and batch programs.
Define CICS resources used by CardDemo.
Start the application and begin testing CardDemo.

Step 1: Uploading and organizing the GitHub artifacts on the mainframe

Take the code under the app folder of the repository and upload it to the mainframe.
Upload the provided sample data using a binary mode of transfer.
Run the jobs listed in Step 3 of the installation process in the README file to restructure the raw data provided so that it fits the formats that CardDemo expects (VSAM, physical sequential files, and GDG groups).

Step 2: Compiling the programs

Obtain JCLs to compile maps, CICS programs, and batch COBOL programs from your mainframe administrator. We have provided sample JCLs for reference.

Execute the following steps:

Compile the maps in the BMS folder.
Compile programs starting with CO using your CICS COBOL compile job.
Compile batch programs starting with CB using the COBOL only compiler.

Step 3: Creating CICS Resource Definitions

Once you finish compiling the programs, you need to make CICS aware of the resources created. Follow the instructions under bullet 5 of the README file to do this.

You can either:

Use the batch system definition utility (DFHCSDUP) to import the CardDemo CSD file

Or use the online resource definition program CEDA to define all the transactions, files, and transient storage queues required by CICS.

Step 4: Start using CardDemo

Clear the CICS screen and type CC00 to invoke the CardDemo main menu screen.
If you see a menu screen prompting you for a User ID and Password, you have successfully installed CardDemo.
To use the application as a back-office user, log in with ID USER0001 and password PASSWORD.
To use the application as an administrator, log in with ID ADMIN001 and password PASSWORD.

Overview of programs provided with CardDemo

The README file documents the purpose of each program provided with CardDemo.

Here is a brief overview of the functions provided:

Initial Screen: From the frontend perspective, the first transaction that you encounter in CardDemo is the sign-on screen shown previously. This is where you choose whether you are logging in as a back office or admin user.
Main Menu: If you log in as a back-office user, you see a user function menu. You can configure which programs appear in each slot of this menu by editing a copybook.

Admin Menu: If, on the other hand, you log in as an administrator, you will see a list of administration functions. This too is a configurable menu, which is initially set up to show four functions for managing CardDemo users.

Batch: As is typical in mainframe applications, CardDemo has a batch process to perform calculations. CardDemo expects you to stop all online activity and close the files opened to CICS while batch is in progress.

The batch refreshes data in the application files and then posts transactions received to accounts before calculating interest on outstanding balances.

You can find instructions to set up and run this process under the heading Running Full Batch in the README file, and you can review the impact of the batch by comparing the account balances before and after running it.

Congratulations! You have set up CardDemo and are ready to start on your own use cases.

Conclusion

By open sourcing CardDemo, we want to make it easier for our customers, partners, and for the mainframe community to learn and experiment with mainframe code and to understand solutions for modernizing mainframe applications.

We also invite builders everywhere to add their own extensions and features to this CardDemo code base. If you would like to add features and contribute to the further development of CardDemo, you are welcome to submit your code for inclusion in CardDemo following the guidelines we have provided for you in this contribution guide.

You can learn more about the AWS Mainframe Modernization service and see CardDemo in action in the following webinars:

AWS Mainframe Modernization Blu Age Refactor with CardDemo demonstration
AWS Mainframe Modernization Micro Focus Replatform with CardDemo demonstration


How Contino improved collaboration with Amazon CodeCatalyst

Amazon CodeCatalyst is a modern software development service that empowers teams to deliver software on AWS easily and quickly. CodeCatalyst provides one place where you can plan, code, build, test, and deploy applications with continuous integration/continuous delivery (CI/CD) tools. It also helps streamline team collaboration. Developers on modern software teams are usually distributed, work independently, and use disparate tools, and ad hoc collaboration is often necessary to resolve problems. Today, developers are forced to collaborate across many tools, which distracts them from their primary task: adding business-critical features and enhancing their quality and completeness.

In this post, we explain how Contino uses CodeCatalyst to onboard their engineering team onto new projects, eliminate the overhead of managing disparate tools, and streamline collaboration among different stakeholders.

The Problem

Contino helps customers migrate their applications to the cloud, and then improves their architecture by taking full advantage of cloud-native features to improve agility, performance, and scalability. This usually involves the build-out of a central landing zone platform. A landing zone is a set of standard building blocks that allows customers to automatically create accounts, infrastructure, and environments that are pre-configured in line with security policies, compliance guidelines, and cloud native best practices. Some features are common to most landing zones, for example creating secure container images, AMIs, and environment setup boilerplate. In order to provide maximum value to customers, Contino develops in-house versions of such features, incorporating AWS best practices, and later rolls them out to the customer’s environment with some customization. Contino’s technical consultants who are not currently assigned to customer work, collectively known as ‘Squad 0’, work on these features. Squad 0 builds the foundation for work that will be re-used by other squads that work directly with Contino’s customers. As the technical consultants are typically on Squad 0 for a short period, it is critical that they can be productive in this short time, without spending too much time getting set up.

To build these foundational services, Contino was looking for something more integrated that would allow them to quickly set up development environments, enable collaboration between Squad 0 members, invite other squads to validate foundational services for use with their respective customers, and provide access to different AWS accounts and Git repos centrally, from one place. Historically, Contino has used disparate tools to achieve this, which meant having to grant and revoke access to the various AWS accounts individually on a continual basis. With these disparate tools, granting access to everything the squads needed to be productive was non-trivial.

The Solution

It was at this point that Contino participated in the private beta for CodeCatalyst, prior to the public preview. CodeCatalyst has allowed Contino to move to the structure shown in Figure 1 below. A project manager at Contino creates a different project for each foundational service and invites Squad 0 members to join the relevant project. With CodeCatalyst, Squad 0 technical consultants use features like CI/CD, source repositories, and issue trackers to build the foundational services. This helps eliminate the overhead of managing and integrating developer tools and provides more time to focus on developing code. Once Squad 0 is ready with the foundational services, they invite customer squads, using their email addresses, to validate the readiness of the project for use with their customers. Finally, members of Squad 0 use Cloud9 Dev Environments from within CodeCatalyst to rapidly create consistent cloud development environments, without manual configuration, so they can work on new or multiple projects simultaneously, without conflict.

Figure 1: CodeCatalyst with multiple account connections

Contino uses CI/CD to conduct multi-account deployments. Contino typically does one of two types of deployments: (1) traditional sequential application deployments that are promoted from one environment to another, for example dev -> test -> prod, and (2) parallel deployments, for example a security control that must be deployed to multiple AWS accounts at the same time. CodeCatalyst solves this problem by making it easier to construct workflows using a workflow definition file that can deploy either sequentially or in parallel to multiple AWS accounts. Figure 2 shows a parallel deployment.

Figure 2: CI/CD with CodeCatalyst

The Value

CodeCatalyst has reduced the time it takes for members of Squad 0 to complete the necessary on-boarding to work on foundational services from 1.5 days to about 1 hour. These tasks include setting up connections to source repositories, setting up development environments, configuring IAM roles and trust relationships, etc. With support for integrated tools and better collaboration, CodeCatalyst minimized overhead for ad hoc collaboration. Squad 0 could spend more time on writing code to build foundation services. This has led to tasks being completed, on average, 20% faster. This increased productivity led to increased value delivered to Contino’s customers. As Squad 0 is more productive, more foundation services are available for other squads to reuse for their respective customers. Now, Contino’s teams on the ground working directly with customers can re-use these services with some customization for the specific needs of the customer.

Conclusion

Amazon CodeCatalyst brings together everything software development teams need to plan, code, build, test, and deploy applications on AWS into a streamlined, integrated experience. With CodeCatalyst, developers can spend more time developing application features and less time setting up project tools, creating and managing CI/CD pipelines, provisioning and configuring various development environments or coordinating with team members. With CodeCatalyst, the Contino engineers can improve productivity and focus on rapidly developing application code which captures business value for their customers.

About the authors:

Mark Faiers

Mark Faiers started out as a software engineer and later transitioned into DevOps, and Cloud. He has worked across numerous technology stacks and industries, including Healthcare, FinTech, and Logistics. Mark is currently working as an AWS consultant to some of the biggest Financial and Insurance firms in the U.K., as well as running the AWS Practice at Contino. He is especially passionate about serverless, and sustainability.

Chetan Makvana

Chetan Makvana is a senior solutions architect working with global systems integrators at AWS. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture and execute strategies to drive adoption of AWS services. He is a technology enthusiast and a builder with a core area of interest on serverless and DevOps. Outside of work, he enjoys binge-watching, traveling and music.