Behind the Scenes on AWS Contributions to Cloud Native Open Source Projects

Amazon Elastic Kubernetes Service (Amazon EKS) is well known in the Kubernetes community. But few realize how closely AWS engineers are involved upstream, contributing to Kubernetes and to many other cloud native open source projects.

In the past year alone, AWS contributed significantly to containerd, Cortex, etcd, Fluentd, nerdctl, Notary, OpenTelemetry, Thanos, and Tinkerbell. We employ maintainers and contributors on these projects, and we will contribute more to these and other projects in the coming year. Here’s a behind-the-scenes look at our contributions and why we’re investing in the open source projects we support. You can also meet many of our contributors in the AWS booth at KubeCon Europe in Amsterdam, April 18-21, 2023, and hear from them in our virtual Container Day event, 9 a.m. – 4 p.m. CEST on April 18.

“Amazon EKS is committed to open source and we are spending a lot of our cycles now focused on contributing back to the community. Kubernetes is part of a community that’s bigger than AWS and so we’re continuing to be committed to maintaining and helping that community to be successful because without it, we wouldn’t exist, either,” said Barry Cooks, Vice President, Kubernetes, at AWS and a Cloud Native Computing Foundation (CNCF) governing board member.

AWS contributes to Kubernetes and etcd

Today, AWS is heavily involved in cloud native open source projects. Consider, for example, some of our recent key contributions to Kubernetes and etcd, the underlying data store for Kubernetes.

“We’re building the AWS cloud provider, contributing to CAPI (cluster API), and serve as part of the security response committee. We helped implement gzip optimization which improves the performance of Kubernetes clients,” said Nathan Taber, who leads the product team for Kubernetes at AWS, in a keynote at KubeCon North America 2022. “With etcd we’re bringing our operational learnings from running just so much etcd at scale, back into the community.”

The AWS cloud provider for Kubernetes is the open source interface between a Kubernetes cluster and AWS service APIs. This project allows a Kubernetes cluster to provision, monitor, and remove AWS resources necessary for operation of the cluster.

As of Kubernetes 1.27, AWS has finished a multi-year effort to migrate our legacy in-tree cloud provider to an external, out-of-tree cloud provider. The migration reduces binary bloat in the main kubernetes/kubernetes (k/k) repository and shrinks both dependency complexity and the surface area for security vulnerabilities.

AWS has also built a webhook framework that allows cloud providers to host webhooks in their cloud-controller-managers, which makes certain migration tasks easier. One use case is helping other cloud providers migrate the persistent volume labeller admission controllers out of the API server code, one of the last areas of cloud-provider-specific code that needs to move out of core Kubernetes.

“We’ve included a lot of space in our planning for upstream open source work this year,” said Nick Turner, software developer on the AWS Kubernetes team and a chair in Kubernetes SIG-cloud-provider. “Expect us to keep up our contributions to the cloud provider and the load balancer controller as well as increase our investments in the AWS IAM authenticator for Kubernetes and KMS encryption provider.”

These and other Kubernetes contributions bring value to the entire Kubernetes community as well as to the EKS service and its customers.

Since KubeCon Detroit last fall, the EKS-etcd team has contributed numerous improvements to etcd. Chao Chen contributed to the effort to improve testing mechanisms for etcd by unifying the frameworks used by etcd tests. Baoming Wang contributed an important metric to the Kubernetes API server code base, which will help catch data corruption issues early. We’ve also worked on building a linearizability test suite, made various improvements to the core etcd database and its backend key-value store, BoltDB, contributed to documentation, made Helm more resilient to transient etcd-side errors, and fixed an issue with the installation script for argo-cd-helmfile.

What’s driving AWS to contribute more to cloud native open source

Like most modern companies, AWS builds many of its services with open source components. There are several business and technical reasons we do this, which we’ve outlined in an article on The New Stack about why we invest in sustainable open source. We recognize that the success of our services depends on the success of those underlying open source projects.

Given that most of the open source projects AWS supports underpin specific services, AWS asks all engineers working on those services, regardless of their team, to contribute however they can to the upstream projects.

The result is a virtuous cycle that promotes mutually beneficial growth. As AWS services grow, so too do the open source projects upon which they are based because of AWS contributions and support. Conversely, as these open source projects grow from the contributions of other companies and developers, so do the benefits to the AWS services that depend upon them.

AWS contributions focus on performance and scale

AWS contributions to open source are typically practical: bug fixes, code reviews, documentation, new features, and security enhancements. Like many developers working in open source, AWS engineers often address issues that arise in the course of their day jobs and then share the fixes with the rest of the community. Similarly, AWS engineers develop new features to expand a project’s scale or performance, which in turn improves the project’s usability, stability, and overall appeal.

Because AWS has a large number of Kubernetes clusters under management, it has a unique opportunity to test the limits of open source software and to strengthen its edges well beyond the initial core. Many of the contributions our team members make to upstream Kubernetes, etcd, containerd, and other projects therefore center on giving the upstream community insight into where things break down in scaling, production, and operational readiness.

The resulting insights provide value for the entire open source community as well as our own customers.

Take, for example, the optimization that curiously performed as a latency expander. AWS engineer Shyam Jeedigunta was looking at logs and metrics collected from thousands of production EKS clusters. Gzip compression is enabled inside the Kubernetes API server to reduce demand on network bandwidth and decrease latency, but Shyam determined that the compression was actually increasing latency for large list requests made by clients. Shyam, who is also co-chair of the Kubernetes scalability special interest group (SIG), took a deep dive into the issue: did a particular compression level create the problem, and if so, could the level be reduced? Could gzip compression be disabled entirely? What impact would that have on latency and network bandwidth?
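To make the cost concrete, here is a minimal Node/TypeScript sketch (illustrative only, not AWS’s actual benchmark) that times gzip at a few compression levels on a large synthetic list payload, the kind of question Shyam was digging into:

import { gzipSync } from "node:zlib";

// Simulate a large list response (hypothetical data, not a real API server payload).
const items = Array.from({ length: 200_000 }, (_, i) => ({
  name: `object-${i}`,
  labels: { app: "demo", index: String(i) },
}));
const body = Buffer.from(JSON.stringify({ kind: "List", items }));

// Time gzip at several levels to compare CPU cost against bytes saved.
for (const level of [1, 4, 9]) {
  const start = process.hrtime.bigint();
  const compressed = gzipSync(body, { level });
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`level=${level}: ${body.length} -> ${compressed.length} bytes in ${ms.toFixed(0)} ms`);
}

Higher levels save more bandwidth but burn more CPU per request, and on multi-megabyte list responses that CPU time can dominate end-to-end latency.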

Answers to questions like these lead to upstream contributions in etcd and core Kubernetes from AWS service teams. Customers and others often report these kinds of issues to the project as well, but the nature of the problem isn’t clear until it’s viewed on 1,000 nodes and 200,000 objects of a certain kind. AWS engineers diagnose what’s going on, put together troubleshooting information, and collate it into proposals for fixing the problem upstream in Kubernetes. AWS likes to spearhead fixes for issues that arise from running the projects at scale.

Key AWS contributions

AWS contributes to many Kubernetes subprojects and SIGs. For example, Micah Hausler and Sri Saran Balaji Vellore Rajakumar serve on the Kubernetes Security Response Committee (SRC), Davanum Srinivas (Dims) chairs SIG-Architecture and SIG-k8s-infra, and Nick Turner is a chair in SIG-cloud-provider. Key contributions have gone into projects including containerd, Cortex, cdk8s, CNI, nerdctl, and Prometheus. Innovations have also been substantial, including TorchServe, improved ARM support through AWS Graviton, and the Virtual GPU plugin. This is by no means an exhaustive list of AWS contributions and innovations in the cloud native community.

On containerd, for example, AWS employs two maintainers who contribute features and help ensure the project’s general health and security. Key contributions from AWS engineers to the containerd project include OpenTelemetry integration in the 1.7.0 release, improved tracing, and improved fuzzing integration.

“It’s been awesome to see the growth on the container runtime team here at AWS these past few years. I love to see the eagerness to learn not just *how* to contribute, but how to do it well and really benefit the broader community,” said Phil Estes, a principal engineer at AWS and a containerd maintainer.

Nerdctl, a Docker-compatible CLI for containerd and a containerd sub-project, is used by other open source projects including Lima, Finch, and Rancher Desktop. AWS engineers significantly improved nerdctl’s compose support by adding 11 of the 13 missing compose commands. We enhanced nerdctl’s image signing and verification support by contributing cosign support for nerdctl compose and notation support for nerdctl. And engineer Jin Dong recently became the first reviewer on the project from AWS.

AWS services are also standardizing on OpenTelemetry, a set of open source tools and standards for collecting the metrics, logs, and traces used to measure application performance. AWS Distro for OpenTelemetry (ADOT), OpenSearch, and CloudWatch all build on OpenTelemetry and contribute back to the upstream project. All ADOT code is 100% open source and contributed upstream. Key contributions include adding functionality to upstream observability components such as the OpenTelemetry language SDKs, collectors, and agents.
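As a rough illustration of what building on these upstream components looks like, here is a minimal tracing setup sketched with the OpenTelemetry JavaScript SDK; the OTLP endpoint is an assumption and could point at any collector, such as a locally running ADOT Collector:

import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// Export traces over OTLP/HTTP to a collector listening on the default port 4318.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4318/v1/traces", // assumed local collector endpoint
  }),
});

sdk.start();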

“Amazon is the fourth largest contributor to OpenTelemetry with a dedicated maintainer and many contributors working on the project. A key contribution has been improving collector and metric stability, including improved Prometheus interoperability with OpenTelemetry,” said Taber.

A fourth example is Cortex, where AWS is the project’s top supporter and employs three maintainers. Because AWS runs the project at scale, engineers have the opportunity to identify and fix scaling cliffs before they become a problem for the rest of the community. Key contributions include new features and performance improvements such as the partition compactor, Ring DynamoDB multi-key KV, out-of-order sample ingestion, snappy-block gRPC compression, ARM images, and Thanos PromQL engine integration.

We have also contributed bug fixes to Thanos, a tool for setting up highly available Prometheus instances with long-term storage and a CNCF incubating project that Cortex depends on. We participated in the development of the new Thanos PromQL engine and open sourced a fuzzing-based correctness-testing tool that has already caught a few bugs.

AWS employs four maintainers on Tinkerbell, a cloud native open source bare metal provisioning engine for EKS Anywhere and a CNCF Sandbox project. Key contributions include organizing the project roadmap, VLAN support, a Kubernetes native backend, out-of-band management Kubernetes controller, Helm Chart deployment, and Cluster API provider updates.

“Our team has done a lot of work to update the Tinkerbell backend from Postgres to native Kubernetes,” said Taber.

AWS employs three maintainers on Notation, a subproject of Notary under the CNCF, and is the third-largest code contributor to Notary. Notation enables the generation of cryptographic signatures for container images so users can verify that they come from a trusted source or process. AWS founded the subproject with other contributors to develop specifications for signature format, generation, verification, and revocation. As part of this work, we also defined a process for evaluating signature envelope formats such as COSE, ensuring they met a high security bar before they were used in Notation.

AWS employees have written or reviewed the majority of code contributions for the core Notation libraries and CLI. AWS also employs a maintainer on Ratify, so Kubernetes users can easily enable signature-verification policies with their existing admission controllers, and a maintainer on ORAS, so signatures can easily be pushed to OCI registries. Notation lets users define granular trust policies specifying which sources they trust, balance deployment safety against security needs, and choose flexibly among secure signing-key storage options.

We have contributed to many other open source projects as well, including Crossplane, for which AWS added support for EKS IRSA in the China region and fixed Amazon Route 53 wildcard support, and Backstage, where we contributed support for AWS Proton and AWS Code Suite (AWS CodeBuild, AWS CodePipeline, and AWS CodeDeploy).

“We’re very excited about doing more development in the open, sharing that with our customers, and working directly in some cases with customers on their needs in open source projects and working together to make the community stronger in the Kubernetes space,” Cooks said.

AWS is open

We want to hear from you. AWS engineers are open to helping community members through collaboration and contribution opportunities. Tell us how we can help meet your needs.

AWS engineers, solutions architects, and product managers hang out in the Kubernetes and CNCF community Slack workspaces. Channels where you can reach us on the Kubernetes Slack include the provider AWS channel, the Karpenter channel, and the AWS controllers for Kubernetes channel.

Find us and tell us what you’d like us to work on, or flag a particular issue you’ve found in one of these upstream projects that you think our engineers can help move the needle on. Come talk to us in the CNCF’s AWS Slack channel and join us for our virtual Container Day on April 18, before KubeCon EU.


React Labs: What We’ve Been Working On – June 2022

React 18 was years in the making, and with it brought valuable lessons for the React team. Its release was the result of many years of research and exploring many paths. Some of those paths were successful; many more were dead-ends that led to new insights. One lesson we’ve learned is that it’s frustrating for the community to wait for new features without having insight into these paths that we’re exploring.

We typically have a number of projects being worked on at any time, ranging from the more experimental to the clearly defined. Looking ahead, we’d like to start regularly sharing more about what we’ve been working on with the community across these projects.

To set expectations, this is not a roadmap with clear timelines. Many of these projects are under active research and are difficult to put concrete ship dates on. They may possibly never even ship in their current iteration depending on what we learn. Instead, we want to share with you the problem spaces we’re actively thinking about, and what we’ve learned so far.

Server Components

We announced an experimental demo of React Server Components (RSC) in December 2020. Since then we’ve been finishing up its dependencies in React 18, and working on changes inspired by experimental feedback.

In particular, we’re abandoning the idea of having forked I/O libraries (eg react-fetch), and instead adopting an async/await model for better compatibility. This doesn’t technically block RSC’s release because you can also use routers for data fetching. Another change is that we’re also moving away from the file extension approach in favor of annotating boundaries.
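As a rough sketch of that direction, a data-fetching server component under the async/await model might look like the following; this assumes an RSC-enabled framework and a hypothetical notes API, and the exact conventions were still in flux at the time:

// Note.tsx runs only on the server in an RSC setup.
export default async function Note({ id }: { id: string }) {
  // Plain async/await replaces a forked I/O library like react-fetch.
  const res = await fetch(`https://api.example.com/notes/${id}`); // hypothetical endpoint
  const note: { title: string; body: string } = await res.json();
  return (
    <article>
      <h1>{note.title}</h1>
      <p>{note.body}</p>
    </article>
  );
}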

We’re working together with Vercel and Shopify to unify bundler support for shared semantics in both Webpack and Vite. Before launch, we want to make sure that the semantics of RSCs are the same across the whole React ecosystem. This is the major blocker for reaching stable.

Asset Loading

Currently, assets like scripts, external styles, fonts, and images are typically preloaded and loaded using external systems. This can make it tricky to coordinate across new environments like streaming, server components, and more.

We’re looking at adding APIs to preload and load deduplicated external assets through React APIs that work in all React environments.

We’re also looking at having these support Suspense so you can have images, CSS, and fonts that block display until they’re loaded but don’t block streaming and concurrent rendering. This can help avoid “popcorning” as the visuals pop and layout shifts.

Static Server Rendering Optimizations

Static Site Generation (SSG) and Incremental Static Regeneration (ISR) are great ways to get performance for cacheable pages, but we think we can add features to improve performance of dynamic Server Side Rendering (SSR) – especially when most but not all of the content is cacheable. We’re exploring ways to optimize server rendering utilizing compilation and static passes.

React Optimizing Compiler

We gave an early preview of React Forget at React Conf 2021. It’s a compiler that automatically generates the equivalent of useMemo and useCallback calls to minimize the cost of re-rendering, while retaining React’s programming model.
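For a sense of what the compiler automates, here is the kind of hand-written memoization it aims to generate for you (an illustrative component, not actual compiler output):

import { useCallback, useMemo, useState, type ChangeEvent } from "react";

function FilteredList({ items }: { items: string[] }) {
  const [query, setQuery] = useState("");

  // Recompute the filtered list only when items or query change.
  const visible = useMemo(
    () => items.filter((item) => item.includes(query)),
    [items, query],
  );

  // Keep a stable function identity across re-renders.
  const onChange = useCallback(
    (e: ChangeEvent<HTMLInputElement>) => setQuery(e.target.value),
    [],
  );

  return (
    <>
      <input value={query} onChange={onChange} />
      <ul>
        {visible.map((item) => (
          <li key={item}>{item}</li>
        ))}
      </ul>
    </>
  );
}

With Forget, the idea is that you could write the same component without the useMemo and useCallback wrappers and get equivalent memoization automatically.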

Recently, we finished a rewrite of the compiler to make it more reliable and capable. This new architecture allows us to analyze and memoize more complex patterns such as the use of local mutations, and opens up many new compile-time optimization opportunities beyond just being on par with memoization hooks.

We’re also working on a playground for exploring many aspects of the compiler. While the goal of the playground is to make development of the compiler easier, we think that it will make it easier to try it out and build intuition for what the compiler does. It reveals various insights into how it works under the hood, and live renders the compiler’s outputs as you type. This will be shipped together with the compiler when it’s released.

Offscreen

Today, if you want to hide and show a component, you have two options. One is to add or remove it from the tree completely. The problem with this approach is that the state of your UI is lost each time you unmount, including state stored in the DOM, like scroll position.

The other option is to keep the component mounted and toggle the appearance visually using CSS. This preserves the state of your UI, but it comes at a performance cost, because React must keep rendering the hidden component and all of its children whenever it receives new updates.
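Here is a minimal sketch of those two options side by side; ChatPane is a stand-in for any component with state worth preserving:

import { useState } from "react";

// A stand-in component whose internal state (the draft) we care about.
function ChatPane() {
  const [draft, setDraft] = useState("");
  return <textarea value={draft} onChange={(e) => setDraft(e.target.value)} />;
}

function Panel() {
  const [show, setShow] = useState(true);
  return (
    <>
      <button onClick={() => setShow((s) => !s)}>Toggle</button>

      {/* Option 1: unmount it. The draft (and DOM state like scroll) is lost. */}
      {show && <ChatPane />}

      {/* Option 2: hide it with CSS. State survives, but React keeps
          rendering the hidden subtree whenever it receives updates. */}
      <div style={{ display: show ? "block" : "none" }}>
        <ChatPane />
      </div>
    </>
  );
}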

Offscreen introduces a third option: hide the UI visually, but deprioritize its content. The idea is similar in spirit to the content-visibility CSS property: when content is hidden, it doesn’t need to stay in sync with the rest of the UI. React can defer the rendering work until the rest of the app is idle, or until the content becomes visible again.

Offscreen is a low level capability that unlocks high level features. Similar to React’s other concurrent features like startTransition, in most cases you won’t interact with the Offscreen API directly, but instead via an opinionated framework to implement patterns like:

Instant transitions. Some routing frameworks already prefetch data to speed up subsequent navigations, like when hovering over a link. With Offscreen, they’ll also be able to prerender the next screen in the background.

Reusable state. Similarly, when navigating between routes or tabs, you can use Offscreen to preserve the state of the previous screen so you can switch back and pick up where you left off.

Virtualized list rendering. When displaying large lists of items, virtualized list frameworks will prerender more rows than are currently visible. You can use Offscreen to prerender the hidden rows at a lower priority than the visible items in the list.

Backgrounded content. We’re also exploring a related feature for deprioritizing content in the background without hiding it, like when displaying a modal overlay.

Transition Tracing

Currently, React has two profiling tools. The original Profiler shows an overview of all the commits in a profiling session. For each commit, it also shows all components that rendered and the amount of time it took for them to render. We also have a beta version of a Timeline Profiler introduced in React 18 that shows when components schedule updates and when React works on these updates. Both of these profilers help developers identify performance problems in their code.

We’ve realized that developers don’t find knowing about individual slow commits or components out of context that useful. It’s more useful to know about what actually causes the slow commits. And that developers want to be able to track specific interactions (eg a button click, an initial load, or a page navigation) to watch for performance regressions and to understand why an interaction was slow and how to fix it.

We previously tried to solve this issue by creating an Interaction Tracing API, but it had some fundamental design flaws that reduced the accuracy of tracking why an interaction was slow and sometimes resulted in interactions never ending. We ended up removing this API because of these issues.

We are working on a new version for the Interaction Tracing API (tentatively called Transition Tracing because it is initiated via startTransition) that solves these problems.

New React Docs

Last year, we announced the beta version of the new React documentation website. The new learning materials teach Hooks first and include new diagrams and illustrations, as well as many interactive examples and challenges. We took a break from that work to focus on the React 18 release, but now that React 18 is out, we’re actively working to finish and ship the new documentation.

We are currently writing a detailed section about effects, as we’ve heard that it is one of the more challenging topics for both new and experienced React users. Synchronizing with Effects is the first published page in the series, and there are more to come in the following weeks. When we first started writing this section, we realized that many common effect patterns can be simplified by adding a new primitive to React. We’ve shared some initial thoughts on that in the useEvent RFC (sketched below). It is currently in early research, and we are still iterating on the idea. We appreciate the community’s comments on the RFC so far, as well as the feedback and contributions to the ongoing documentation rewrite. We’d specifically like to thank Harish Kumar for submitting and reviewing many improvements to the new website implementation.
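As a rough sketch of the idea in the useEvent RFC (illustrative only: useEvent has not shipped, and createConnection and logVisit are hypothetical helpers):

import { useEffect, useState } from "react";

// Stand-in declaration for the proposed hook; it does not exist in React yet.
declare function useEvent<T extends (...args: unknown[]) => unknown>(fn: T): T;

// Hypothetical helpers for the example.
declare function createConnection(roomId: string): {
  on(event: "connected", cb: () => void): void;
  disconnect(): void;
};
declare function logVisit(roomId: string, draft: string): void;

function Chat({ roomId }: { roomId: string }) {
  const [draft, setDraft] = useState("");

  // Always sees the latest draft, yet keeps a stable identity, so the effect
  // below does not need to re-run (or list it as a dependency) when draft changes.
  const onConnected = useEvent(() => {
    logVisit(roomId, draft);
  });

  useEffect(() => {
    const conn = createConnection(roomId);
    conn.on("connected", onConnected);
    return () => conn.disconnect();
  }, [roomId]); // onConnected intentionally omitted, per the RFC

  return <textarea value={draft} onChange={(e) => setDraft(e.target.value)} />;
}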

Thanks to Sophie Alpert for reviewing this blog post!

367: With Micah Godbolt

I got to talk with Micah Godbolt this week! Micah is a long-hauler at Microsoft working on Design Systems and such. His CodePen account looks a lot like mine: steady, consistent usage of “just trying to figure something out” Pens sprinkled with some ideas that somehow click with the wider front-end world. I found it fascinating that putting the words “Design Systems” into his book title “Front-end Architecture for Design Systems” was suggested by the publisher, and they were right! The term Design Systems has clicked a lot harder since the 2016 publication, and I’m sure that hasn’t hurt sales!

Time Jumps

00:35 Guest introduction

02:46 Front end architecture for design systems

06:31 Building design systems

10:23 Sponsor: Linode

11:08 You don’t need a UI framework

17:49 Responsive multi-level nav pen

19:30 Multi height equal column pen

23:01 How do you ship components?

27:00 Testing for bugs

28:41 Consistently making pens

35:13 Creating a stripped down use case

40:06 Where can people find out more

Sponsor: Linode

Visit linode.com/codepen and see why over a million developers trust Linode for their infrastructure. From their award-winning support (offered 24/7/365 to every level of user) to ease of use and setup, it’s clear why developers have been trusting Linode for projects both big and small since 2003. Linode offers the industry’s best price-to-performance value for all compute instances, including shared, dedicated, high-memory, and GPU instances. Linode makes cloud computing simple, affordable, and accessible, allowing you to focus on your customers, not your infrastructure. Visit linode.com/codepen, create a free account, and you’ll get $100 in credit.
