Right-size your Kubernetes Applications Using Open Source Goldilocks for Cost Optimization

In the last few years, as companies have modernized their business applications, many have moved to microservices-based architectures using containers on Kubernetes. Much of the initial focus was on designing and building new cloud native architectures to support the applications. As environments have grown, we’ve seen a shift in focus to optimizing resource allocation and right-sizing workloads to reduce costs.

In this blog post we will share guidance on how to optimize resource allocation and right-size applications in Kubernetes environments using Goldilocks. We’ll walk through how to install Goldilocks along with a sample application, then view the suggested resource recommendations. This applies to all Kubernetes applications, including those running on Amazon Elastic Kubernetes Service (Amazon EKS) with managed node groups, self-managed node groups, and AWS Fargate.

Right-sizing applications on Kubernetes

In Kubernetes, resource right-sizing is done through setting resource specifications in the application manifest. These settings directly impact:

Performance — Kubernetes applications running on the same node will arbitrarily compete for resources without proper resource specifications. This can adversely impact application performance.
Cost Optimization — Applications deployed with oversized resource specifications will result in increased costs and underutilized infrastructure.
Autoscaling — The Kubernetes Cluster Autoscaler and Horizontal Pod Autoscaling require resource specifications to function.

The most common resource specifications in Kubernetes are for CPU and memory requests and limits.

Requests and Limits

Containerized applications are deployed on Kubernetes as Pods. CPU and memory requests and limits are an optional part of the Pod definition. CPU is specified in units of Kubernetes CPUs while memory is specified in bytes, usually as mebibytes (Mi).
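For example, a Pod manifest (or the Pod template in a Deployment) declares requests and limits per container. Here is a minimal sketch; the name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # placeholder image
    resources:
      requests:
        cpu: 250m       # 0.25 Kubernetes CPUs
        memory: 256Mi   # mebibytes
      limits:
        cpu: 500m
        memory: 512Mi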

Requests and limits each serve different functions in Kubernetes and affect scheduling and resource enforcement differently.

Scheduling

The Kubernetes scheduler only considers requests when determining where to place Pods in your cluster. Acceptable nodes are those that have enough available resources to satisfy the Pod’s resource requests.  Limits are not considered by the scheduler.

Resource Enforcement

The container runtime on the node where your Pods are running is responsible for resource enforcement.  Both requests and limits are factors in ensuring applications have access to their required compute resources. Their effect on CPU and memory is different:

CPU — If no limits are specified, then each Pod on a node can use all the available CPU on the host. As soon as available CPU is exhausted, Pods are throttled using a Linux primitive called cgroups. This is a resource sharing primitive that ensures each Pod gets its fair share of CPU time. CPU requests determine that fair share and are weighted to give more CPU time to Pods with larger CPU requests. If a limit is specified, then the Pod’s CPU time will not exceed the specified limit.
Memory — Just like CPU, if no memory limits are specified, then each Pod can use all the available memory on the host. Unlike CPU, when memory is exhausted, there is no sharing mechanism. The Pod will either be terminated by the Linux Out-of-memory (OOM) killer or the kubelet will evict the Pod. The same process will happen if a Pod’s memory usage exceeds its limit.

Vertical Pod Autoscaler

So how do application owners choose the “right” values for their CPU and memory resource requests? An ideal solution is to load test the application in a development environment and measure resource usage using observability tooling. While that might make sense for your organization’s most critical applications, it’s likely not feasible for every containerized application deployed in your cluster.

Fortunately, there is a Kubernetes project that has a feature specifically designed to help provide resource recommendations — the Vertical Pod Autoscaler (VPA). VPA is a Kubernetes sub-project owned by the Autoscaling special interest group (SIG). It’s designed to automatically set Pod requests based on observed application performance. VPA collects resource usage from the Kubernetes Metrics Server by default but can be optionally configured to use Prometheus as a data source.

VPA has a recommendation engine that measures application performance and makes sizing recommendations. The VPA recommendation engine can be deployed stand-alone so VPA will not perform any autoscaling actions. It’s configured by creating a VerticalPodAutoscaler custom resource for each application and VPA updates the object’s status field with resource sizing recommendations.
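For example, a recommendation-only VerticalPodAutoscaler resource looks like the following sketch, where the target Deployment name my-app is illustrative:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # recommend only; never evict or resize Pods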

Creating VerticalPodAutoscaler objects for every application in your cluster and trying to read and interpret the JSON results is challenging at scale. Goldilocks is an open source project that makes this easy.

Goldilocks

Goldilocks is an open source project from Fairwinds that is designed to help organizations get their Kubernetes application resource requests “just right”. It takes its name, very appropriately, from the well-known fairy tale Goldilocks and the Three Bears. Goldilocks builds on top of the Kubernetes Vertical Pod Autoscaler and provides:

A controller that automates the creation of VerticalPodAutoscaler objects for workloads in your cluster.
A dashboard that displays resource recommendations for all the monitored workloads.

The default configuration of Goldilocks is an opt-in model. You choose which workloads are monitored by adding the goldilocks.fairwinds.com/enabled: true label to a namespace.

Solution Overview

Let’s walk through how to install Goldilocks, including its dependencies Metrics Server and Vertical Pod Autoscaler. Then we’ll install a sample application to view the suggested resource recommendations. The diagram shown here illustrates all of the components on an Amazon EKS cluster and their interactions.

The Metrics Server collects resource metrics from the Kubelet running on worker nodes and exposes them through Metrics API for use by the Vertical Pod Autoscaler. The Goldilocks controller watches for namespaces with the goldilocks.fairwinds.com/enabled: true label and creates VerticalPodAutoscaler objects for each workload in those namespaces.

In this blog post, we will create a namespace called javajmx-sample and a Tomcat deployment. We will label this namespace in order to get recommendations from Goldilocks. As soon as we label the namespace, we will see a VPA object called goldilocks-tomcat-example created.

Prerequisites

You will need the following to complete the steps in this post:

AWS Command Line Interface (AWS CLI) version 2
kubectl
helm
If you don’t have an Amazon EKS cluster, you can create one using eksctl

Step 1: Deploying the Metrics Server

In this step, we will deploy the Metrics Server, which provides the resource metrics used by the Vertical Pod Autoscaler.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server

helm upgrade --install metrics-server metrics-server/metrics-server

Let’s verify the status of the metrics-server. Once it is successfully deployed, you should be able to see the resource utilization of pods within a minute or so:

kubectl top pods -n kube-system

NAME                     CPU(cores)   MEMORY(bytes)  
aws-node-czlb8           2m           35Mi            
aws-node-fs22v           3m           35Mi            
aws-node-nl4js           2m           60Mi            
aws-node-vth4m           2m           59Mi            
coredns-d5b9bfc4-lbhb7   4m           13Mi            
coredns-d5b9bfc4-ngtf9   4m           14Mi            
kube-proxy-5gq76         1m           12Mi            
kube-proxy-mvp6g         1m           12Mi            
kube-proxy-vxpw9         1m           33Mi            
kube-proxy-zsfs4         1m           34Mi  

Step 2: Enable namespaces that need resource recommendations from Goldilocks

We will deploy sample workloads in the javajmx-sample namespace and get resource recommendations for the applications running in it. Let’s go ahead and create the namespace and label it.

kubectl create ns javajmx-sample
kubectl label ns javajmx-sample goldilocks.fairwinds.com/enabled=true

To ensure the label was applied successfully, run describe on the javajmx-sample namespace:

kubectl describe ns javajmx-sample

Name:         javajmx-sample
Labels:       goldilocks.fairwinds.com/enabled=true
              kubernetes.io/metadata.name=javajmx-sample
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.

Step 3: Deploy Goldilocks

We will be using a helm chart to deploy Goldilocks. The deployment creates three objects:

goldilocks-controller: responsible for creating the VPA objects for workloads in namespaces enabled for Goldilocks recommendations

goldilocks-vpa-recommender: responsible for providing the resource recommendations for the workloads

goldilocks-dashboard: summarizes the resource recommendations for the workloads and provides the YAML manifest for implementing each recommendation

To deploy Goldilocks, run the following helm commands:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace --set vpa.enabled=true

Now, we will use kubectl to verify if the deployment was successful:
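kubectl get pods -n goldilocks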

NAME                                          READY   STATUS    RESTARTS   AGE
goldilocks-controller-7bc5788596-q752s        1/1     Running   0          18h
goldilocks-dashboard-7ffff8966b-dphmj         1/1     Running   0          18h
goldilocks-dashboard-7ffff8966b-s2dgf         1/1     Running   0          18h
goldilocks-vpa-recommender-5ddf6dcd66-njgt4   1/1     Running   0          18h

Step 4: Deploy the sample application

In this step, we will deploy a sample application in the javajmx-sample namespace to get recommendations from Goldilocks. The application tomcat-example is initially provisioned with CPU and memory requests of 100m and 180Mi respectively, and limits of 300m CPU and 300Mi memory.

kubectl apply -f https://raw.githubusercontent.com/aws-observability/aws-o11y-recipes/main/sandbox/javajmx/example/sample-javajmx-app.yaml

kubectl get pods -n javajmx-sample
NAME                              READY   STATUS    RESTARTS   AGE
tomcat-bad-traffic-generator      1/1     Running   0          127m
tomcat-example-5c874c8b8b-zt2tv   1/1     Running   0          127m
tomcat-traffic-generator          1/1     Running   0          127m

As mentioned earlier, Goldilocks creates a VPA for each deployment in a Goldilocks-enabled namespace. Using the kubectl command, we can verify that a VPA named goldilocks-tomcat-example was created in the javajmx-sample namespace:

kubectl get vpa -n javajmx-sample
NAME                        MODE   CPU   MEM         PROVIDED   AGE
goldilocks-tomcat-example   Off    15m   109814751   True       127m
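The full recommendation details that VPA writes to the object’s status field (lower bound, target, and upper bound values) can be viewed by describing the VPA object:

kubectl describe vpa goldilocks-tomcat-example -n javajmx-sample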

Step 5: Review the Goldilocks recommendation dashboard

The goldilocks-dashboard service exposes the dashboard on port 8080, and we can access it to view the resource recommendations. Run this kubectl command to access the dashboard:

kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

We can now open a browser to http://localhost:8080 to display the Goldilocks dashboard.

Let’s analyze the javajmx-sample namespace to see the recommendations provided by Goldilocks. We should be able to see the recommendations for the tomcat-example deployment.

Here the screen shows the request and limit recommendations for the javajmx-sample workload. The Current column under each Quality of Service (QoS) class shows the currently configured CPU and memory requests and limits. The Guaranteed and Burstable columns show the recommended CPU and memory requests and limits for the respective QoS class.

We can clearly see that we have overprovisioned the resources, and Goldilocks has made recommendations to optimize the CPU and memory requests. For the Guaranteed QoS class, the recommended CPU request and limit are both 15m, compared to the current settings of 100m and 300m. The recommended memory request and limit are both 105M, compared to the current settings of 180Mi and 300Mi.

Notice that recommendations are provided for two different Quality of Service (QoS) classes: Guaranteed and Burstable. Kubernetes assigns a QoS class to each pod based on the requests and limits set for it. Pods that must stay up and perform consistently can request guaranteed resources, while pods with less exacting requirements can use resources with less or no guarantee.

Guaranteed QoS pods are considered top priority and are guaranteed not to be killed until they exceed their limits. If limits and optionally requests (not equal to 0) are set for all resources across all containers, and the limits and requests are equal, then the pod is classified as Guaranteed.

Burstable QoS pods have some form of minimal resource guarantee, but can use more resources when available. Under system memory pressure, these containers are more likely to be killed once they exceed their requests and no Best-Effort pods exist. If requests and optionally limits are set (not equal to 0) for one or more resources across one or more containers, and they are not equal, then the pod is classified as Burstable.

To follow the recommended resource specification, customers can simply copy the manifest for the QoS class they are interested in and redeploy the workloads, which will then be right-sized and optimized.

For example, if we decide to apply the recommendations for the Guaranteed QoS class, we could copy the YAML from the dashboard and apply it to the deployment object.
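For reference, the copied manifest boils down to a resources block like the following, using the Guaranteed values recommended above:

resources:
  requests:
    cpu: 15m
    memory: 105Mi
  limits:
    cpu: 15m
    memory: 105Mi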

Let’s run the kubectl edit command on the deployment to apply the recommendations:

kubectl edit deployment tomcat-example -n javajmx-sample

The resources section in the container spec shows that we have successfully applied the recommended requests and limits for CPU and memory.

Once we apply the recommendations, the pods restart and come online with the updated resource configuration. Let’s verify by running the kubectl describe command on the tomcat-example deployment:

kubectl describe deployment tomcat-example -n javajmx-sample

The output should look like the following:

Name:                   tomcat-example
Namespace:              javajmx-sample
CreationTimestamp:      Mon, 06 Feb 2023 17:41:38 +0000
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=tomcat-example-pods
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=tomcat-example-pods
  Containers:
   tomcat-example-pod:
    Image:       public.ecr.aws/u6p4l7a1/sample-java-jmx-app:latest
    Ports:       8080/TCP, 9404/TCP
    Host Ports:  0/TCP, 0/TCP
    Limits:
      cpu:     15m
      memory:  105Mi
    Requests:
      cpu:        15m
      memory:     105Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Cleanup

To delete the deployments and sample workloads we created in the blog, execute the following commands:

helm delete metrics-server
helm delete goldilocks -n goldilocks
kubectl delete -f https://raw.githubusercontent.com/aws-observability/aws-o11y-recipes/main/sandbox/javajmx/example/sample-javajmx-app.yaml
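The javajmx-sample namespace was created for this walkthrough, so it can be removed as well:

kubectl delete ns javajmx-sample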

Conclusion

This post demonstrated how Goldilocks can be used to efficiently rightsize the resource requests for Kubernetes applications. Customers in modernization efforts often have minimal time to decide the resource requirements for their applications, which usually involves a complex process of reviewing monitoring dashboards. By adopting the recommendations from Goldilocks, customers can shorten the time to market for their applications and optimize their Amazon EKS costs.

Further reading

EKS Best practices
Blog: Using Prometheus to Avoid Disasters with Kubernetes CPU Limits

Goldilocks project


Run Open Source FFmpeg at Lower Cost and Better Performance on a VT1 Instance for VOD Encoding Workloads

FFmpeg is an open source tool commonly used by media technology companies to encode and transcode video and audio formats. FFmpeg users can leverage a cost efficient Amazon Web Services (AWS) instance for their video on demand (VOD) encoding workloads now that AWS offers VT1 support on Amazon Elastic Compute Cloud (Amazon EC2).

VT1 offers improved visual quality for 4K video, support for a newer version of FFmpeg (4.4), expanded OS/kernel support, and bug fixes. These instances are powered by the AMD-Xilinx Alveo U30 media accelerator. Xilinx makes it possible to change a single line of an FFmpeg command to have the Alveo U30 do the transcoding work. The Xilinx Video SDK includes an enhanced version of FFmpeg that can communicate with the hardware accelerated transcode pipeline in Xilinx devices to deliver up to 30% lower cost per stream than Amazon EC2 GPU-based instances and up to 60% lower cost per stream than Amazon EC2 CPU-based instances.
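For example, a CPU-based x264 encode can be redirected to the Alveo U30 by swapping the encoder name. This is a sketch using the mpsoc_vcu_h264 codec from the Xilinx-enabled FFmpeg build; the file names and bitrate are illustrative:

# CPU software encode with x264
ffmpeg -i input.mp4 -c:v libx264 -b:v 8M output.mp4

# Hardware encode on the Alveo U30 (Xilinx-enabled FFmpeg build)
ffmpeg -i input.mp4 -c:v mpsoc_vcu_h264 -b:v 8M output.mp4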

Companies typically use EC2 CPU instances such as the C5 and C6 coupled with FFmpeg for their VOD encoding workloads. These workloads can be costly in cases where companies encode thousands of VOD assets. The cost of an EC2 workload is influenced by the number of concurrent encoding jobs that an instance can support and this subsequently affects the time it takes to encode targeted outputs. As VOD libraries expand, companies typically auto scale to increase the size or number of C5 and C6 instances or allow the instances to operate longer. In both cases, these workloads experience an increase in cost. Important note: There is no additional charge for AWS Auto Scaling. You pay only for the AWS resources needed to run your applications and Amazon CloudWatch monitoring fees.

Amazon EC2 VT1 instances are designed to accelerate real-time video transcoding and deliver low-cost transcoding for live video streams. VT1 is also a cost effective and performance-enhancing alternative for VOD encoding workloads. Using FFmpeg as the transcoding tool, AWS performed an evaluation of VT1, C5, and C6 instances to compare price performance and speed of encode for VOD assets. When compared to C5 and C6 instances, VT1 instances can achieve up to 75% cost savings. The results show that you could operate two VT1 instances for the price of one C5 or C6 instance.

Benchmarking Method

First let’s determine the best instance type to use for our VOD workload. C5 and C6 instances are commonly used for transcoding. We used C5.9xl and C6i.8xl instances and compared them against VT1.3xl instances to transcode 4K and 1080p VOD assets. The two assets were encoded into the output targets described in the evaluation ABR targets section below. For each instance type, we measured the time it took to complete the encodes to those targets.

As shown here in the screenshot from the AWS console for various instance types, VT1.3xl is the smallest instance type in the VT1 family. Even though VT1.6xl compares closely to C5 and C6 in terms of CPU/memory, we chose VT1.3xl for a closer price/performance comparison.

VT1 family instance type comparison to C5 in terms of CPU/memory.

Input data points

Sample input content

The following table summarizes the key parameters of the source content video files used in measuring the encoding performance for the benchmarking:

Clip Name   Frame Count   Duration   Frame Rate   Codec                 Resolution   Chroma Sampling
1080p       43092         12 mins    60           H.264, High Profile   1920×1080    4:2:0 YUV
4K          776           13 secs    60           H.264, High Profile   3840×2160    4:2:0 YUV

Evaluation Adaptive Bit Rate (ABR) targets

Adaptive bitrate streaming (ABR or ABS) is technology designed to stream files efficiently over HTTP networks. Multiple renditions of the same content, at different bitrates and resolutions, are offered to a user’s video player, and the client chooses the most suitable file to play back on the device. This involves transcoding a single input stream to multiple output formats optimized for different viewing resolutions.

For the benchmarking tests, the input 4K and 1080p files were transcoded to various target resolutions that can be used to support different device and network capabilities at their native resolution: 1080p, 720p, 540p, and 360p. The bitrate (br) in the graphic shown here indicates the bitrate associated with each output resolution. For example, the 4K input file was transcoded to 360p resolution at a 640 Kbps bitrate.

Figure 1: Adaptive bitrate ladder used to transcode output video files. Source: https://developer.att.com/video-optimizer/docs/best-practices/adaptive-bitrate-video-streaming
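As a sketch of what such an ABR transcode looks like with the CPU-based x264 settings used in these tests (the resolutions follow the ladder in Figure 1; the bitrates other than the 640 Kbps 360p rung are illustrative):

ffmpeg -i input_4k.mp4 \
  -c:v libx264 -preset faster -s 1920x1080 -b:v 4800k out_1080p.mp4 \
  -c:v libx264 -preset faster -s 1280x720 -b:v 2400k out_720p.mp4 \
  -c:v libx264 -preset faster -s 960x540 -b:v 1600k out_540p.mp4 \
  -c:v libx264 -preset faster -s 640x360 -b:v 640k out_360p.mp4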

Output results

Target duration analysis

For the 4K clip, the VT1.3xl instance completed the targeted encodes 15.709 seconds faster than the C5.9xl instance and 12.58 seconds faster than the C6i.8xl instance. The results in the following tables detail that the VT1.3xl instance has better speed and price performance when compared to the C5.9xl and C6i.8xl instances.

% Price Performance = {(C5/C6 Price Performance - VT1 Price Performance) / C5/C6 Price Performance} * 100

% Speed = {(C5/C6 Duration - VT1 Duration) / C5/C6 Duration} * 100
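For example, applying the speed formula to the 4K clip results for the C5.9xl gives {(30.186221 - 14.47632) / 30.186221} * 100 ≈ 52.04%, which matches the value in the first table.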

H.264 4K Clip (13 seconds duration)

Instance Type                                VT1.3xl          C5.9xl        C6i.8xl
Codec                                        mpsoc_vcu_h264   x264          x264
Duration to complete ABR targets (seconds)   14.47632         30.186221     27.05856
Speed compared to VT1.3xl (%)                N/A              52.043284     46.500035
Instance Cost ($/hour)                       $0.65            $1.53         $1.22
Instance Cost ($/second)                     $0.00018         $0.00043      $0.000338
Price Performance ($/clip transcoded)        $0.002605        $0.01298      $0.009145
Price Performance compared to VT1.3xl (%)    N/A              79.930662     71.514488

H.264 1080p Clip (12 minutes duration)

Instance Type                                VT1.3xl          C5.9xl        C6i.8xl
Codec                                        mpsoc_vcu_h264   x264          x264
Duration to complete ABR targets (seconds)   490.82074        837.20001     762.63252
Speed compared to VT1.3xl (%)                N/A              41.373538     35.641252
Instance Cost ($/hour)                       $0.65            $1.53         $1.22
Instance Cost ($/second)                     $0.00018         $0.00043      $0.000338
Price Performance ($/clip transcoded)        $0.0883          $0.3599       $0.2577
Price Performance compared to VT1.3xl (%)    N/A              75.465407     65.735351

The following section explains the encoding parameters used for testing. FFmpeg was installed on 1 x C5.9xl, 1 x C6i.8xl, and 1 x VT1.3xl instances. The two input files described in the sample input content table were run in parallel on each instance type, and the total duration to complete the transcoding to the various output target resolutions was calculated.

Technical specifications

EC2 Instances: 1 x C5.9xl, 1 x C6i.8xl, 1 x VT1.3xl
Video framework: FFmpeg
Video codecs: x264 (CPU), XMA (Xilinx U30)
Quality objective: x264 faster preset
Operating system:

For C5 and C6 (x264): Amazon Linux 2 (Linux kernel 4.14)
For VT1 (Xilinx): Amazon Linux 2 (Linux kernel 5.4.0-1038-aws)

Video codec settings for encoding performance tests

Setting                C5, C6                         VT1
Codec                  x264                           mpsoc_vcu_h264
Preset                 Faster                         n/a
Output Bitrate (CBR)   See evaluation targets above   See evaluation targets above
Chroma Subsampling     YUV 4:2:0                      YUV 4:2:0
Color Bit Depth        8 bits                         8 bits
Profile                High                           n/a

Conclusion

Amazon EC2 VT1 instances are typically used for live real-time encoding; however, this blog post demonstrates the VOD encoding speed and price performance advantages of VT1 when compared to C5 and C6 EC2 instances. VT1 instances can encode VOD assets up to 52% faster, and achieve up to a 75% reduction in cost, when compared to C5 and C6 instances. VT1 is best utilized for VOD encoding jobs where the time to complete outputs and the cost per clip matter. Please visit the Amazon EC2 VT1 instances page for more details.
