Dive Deeper into Data Lake for Nonprofits, a New Open Source Solution from AWS for Salesforce for Nonprofits

Nonprofits are using cloud-based solutions for fundraising, donor and member management, and communications. With this move online, they have access to more data than ever. This data has the potential to transform their missions and increase their impact. However, sharing, connecting, and interpreting data from many different sources can be a challenge. To address this challenge, Amazon Web Services (AWS) and AWS Partner Salesforce for Nonprofits announced the general availability of Data Lake for Nonprofits – Powered by AWS.

Data Lake for Nonprofits is an open source application that helps nonprofit organizations set up a data lake in their AWS account and populate it with the data that they have in the Salesforce Nonprofit Success Pack (NPSP) schema. This data resides in Amazon Relational Database Service (Amazon RDS), where it can be accessed by other AWS services like Amazon Redshift and Amazon QuickSight, as well as business intelligence products such as Tableau.

This post is written for developers and solution integrator partners who want a closer look at the architecture and implementation of Data Lake for Nonprofits to better understand the solution. We’ll walk you through the architecture and show you how to set up a data lake using AWS Amplify.

Prerequisites

To follow along with this walkthrough, you must have the following prerequisites:

An AWS account

An AWS Identity and Access Management (IAM) user with Administrator permissions that enables you to interact with your AWS account
A Salesforce account that has the Nonprofit Success Pack managed packages installed
A Salesforce user that has API permissions to interact with your Salesforce org

Git command line interface installed on your computer for cloning the repository

Node.js and Yarn Package Manager installed on your computer for cleanup

Solution Overview

The following diagram shows the solution’s high-level architecture.

Salesforce and AWS have released the source code in GitHub under a BSD 3-Clause license so nonprofits and their cloud partners can use, customize, and innovate on top of it at no cost.

Use your command-line shell to clone the GitHub repository for your own development.

git clone https://github.com/salesforce-misc/Data-Lake-for-Nonprofits

The solution consists of two tiers: a frontend application and a backend application. In the following sections, we’ll walk you through both architectures and show you how to install a data lake.

Frontend Walkthrough

We developed the frontend application using React with TypeScript, and it has three layers.

The view layer comprises React components.
The model layer is where most of the application logic is maintained. The models are MobX-State-Tree (MST) types with properties, actions, and computed values. Relationships between models are expressed as MST trees.
The API layer communicates with AWS services using the AWS SDK for JavaScript v3.
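
To make the model layer more concrete, here is a minimal sketch of what an MST type might look like. The Step and Wizard models and their fields are hypothetical illustrations, not the application’s actual models.

import { types, Instance } from "mobx-state-tree";

// Hypothetical model for one step of the provisioning wizard.
const Step = types
  .model("Step", {
    id: types.identifier,
    title: types.string,
    completed: false, // a primitive default makes this an optional boolean
  })
  .actions((self) => ({
    markCompleted() {
      self.completed = true;
    },
  }));

// Hypothetical root model relating the steps into an MST tree.
const Wizard = types
  .model("Wizard", {
    steps: types.array(Step),
    currentIndex: 0,
  })
  .views((self) => ({
    get currentStep(): Instance<typeof Step> {
      return self.steps[self.currentIndex];
    },
  }))
  .actions((self) => ({
    next() {
      self.currentStep.markCompleted();
      self.currentIndex += 1;
    },
  }));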

The frontend application is built and zipped for AWS Amplify using GitHub workflows on every change to the repository. The latest build can be found in the GitHub repository.

Backend Walkthrough

We developed the backend application using AWS CloudFormation templates, which can be found under the infra/cf folder in the GitHub repository. These templates provision the following resources for your data lake in your AWS account.

vpc.yaml template is used to provision the Amazon Virtual Private Cloud (Amazon VPC) that the application will be running in.

buckets.yaml template is used to provision several Amazon Simple Storage Service (Amazon S3) buckets.

datastore.yaml is used to provision an Amazon Relational Database Service (Amazon RDS) PostgreSQL database.

athena.yaml is used to provision Amazon Athena with a custom workgroup.

step_function.yaml is used to provision AWS Step Functions and related AWS Lambda functions. Step Functions is used to orchestrate the Lambda functions that are going to import your data from your Salesforce account into Amazon RDS for PostgreSQL using an Amazon AppFlow connection.
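
As a rough illustration of the orchestration, the sketch below starts an execution of the provisioned state machine with the AWS SDK for JavaScript v3. The ARN and input shape are placeholders; in the deployed solution, executions are triggered for you.

import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

const sfn = new SFNClient({ region: "us-east-1" });

// Placeholder ARN; the real value comes from the deployed stack's outputs.
const stateMachineArn =
  "arn:aws:states:us-east-1:123456789012:stateMachine:ImportStateMachine";

// Kick off one import run; the Lambda functions do the actual work.
await sfn.send(
  new StartExecutionCommand({
    stateMachineArn,
    input: JSON.stringify({ syncType: "full" }), // hypothetical input shape
  })
);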

Lambda functions can be found in the infra/lambdas folder.

src/cleanupSQL.ts moves data from the data-loading database schema to the public database schema.

src/filterSchemaListing.ts filters out the “schema/” key from the S3 listing and queries Salesforce for deleted objects.

src/finalizeSQL.ts drops tables that are no longer needed by the application to save cost.

src/listEntities.ts calls the APIs from Lambda because the output is too large to pass directly through Step Functions.

src/processImport.ts is the Lambda function that writes the data sent to the Amazon SQS queue into Amazon RDS.

src/pullNewSchema.ts queries Amazon AppFlow and then Salesforce to gather any new or updated fields.

src/setupSQL.ts sets up the data-loading database schema and creates the tables based on the schema file.

src/statusReport.ts writes a status update to S3 based on where the Step Functions state machine is in its execution.

src/updateFlowSchema.ts uses the updated schema file on S3 to create or update the Amazon AppFlow flow.
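
For a sense of what the import path looks like, here is a minimal sketch of an SQS-triggered import function in the spirit of src/processImport.ts. The message shape, schema name, table layout, and environment variables are assumptions for illustration, not the solution’s actual code.

import { SQSEvent } from "aws-lambda";
import { Client } from "pg";

export const handler = async (event: SQSEvent): Promise<void> => {
  // Assumed connection settings; the real stack supplies its own configuration.
  const client = new Client({
    host: process.env.DB_HOST,
    database: process.env.DB_NAME,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
  });
  await client.connect();
  try {
    for (const record of event.Records) {
      // Assume each message carries rows for a single NPSP object.
      const { table, rows } = JSON.parse(record.body) as {
        table: string;
        rows: { id: string; payload: unknown }[];
      };
      for (const row of rows) {
        // Upsert into the data-loading schema; production code would batch
        // these writes and validate the table name against an allow-list.
        await client.query(
          `INSERT INTO data_loading.${table} (id, payload)
           VALUES ($1, $2)
           ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload`,
          [row.id, JSON.stringify(row.payload)]
        );
      }
    }
  } finally {
    await client.end();
  }
};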

Amazon Simple Queue Service (Amazon SQS) is used to queue the import data so that the Lambda function can use it.

Amazon EventBridge is used to set up a job that syncs your data based on your choice of frequency.
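
The snippet below sketches how such a schedule could be expressed with the EventBridge API from the AWS SDK for JavaScript v3. The rule name and target ARNs are placeholders; the application creates its own rule based on the frequency you choose.

import {
  EventBridgeClient,
  PutRuleCommand,
  PutTargetsCommand,
} from "@aws-sdk/client-eventbridge";

const events = new EventBridgeClient({ region: "us-east-1" });

// Hypothetical rule: run the sync once a day.
await events.send(
  new PutRuleCommand({
    Name: "datalake-daily-sync",
    ScheduleExpression: "rate(1 day)", // specific dates/times would use cron(...)
  })
);

// Point the rule at the import state machine (placeholder ARNs).
await events.send(
  new PutTargetsCommand({
    Rule: "datalake-daily-sync",
    Targets: [
      {
        Id: "import-state-machine",
        Arn: "arn:aws:states:us-east-1:123456789012:stateMachine:ImportStateMachine",
        RoleArn: "arn:aws:iam::123456789012:role/EventBridgeInvokeStepFunctions",
      },
    ],
  })
);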

Amazon CloudWatch is used to keep logs during installation as well as synchronization. The application creates a CloudWatch dashboard to track the usage of the data lake.

Deploy the Frontend Application using AWS Amplify

The latest release of the frontend application can be downloaded from the GitHub repository and deployed in AWS Amplify in your AWS account as explained in the User Guide. It typically takes a few minutes to deploy the frontend application, after which a URL to the frontend application is presented. The URL should look like this:

https://abc.xyz….amplifyapp.com

When you open the URL, you will see the frontend application, which provides a wizard-like flow. Each step guides you through the instructions on how to move forward.

Step 1 will ask for an Access Key ID and Secret Access Key for an IAM user of your AWS account. The application shows guidance on how to log in to the AWS Management Console and use AWS Identity and Access Management (IAM) to create the IAM user with admin permissions.

This step also requires you to select the AWS region where you would like to create the Amazon AppFlow connection and deploy the data lake.

Step 2 establishes the connection to your Salesforce account. The application guides the user to leverage Amazon AppFlow, which allows AWS to connect to your Salesforce account.

At the end of the page, use the drop-down menu to select the connection name, and then choose Next.

Step 3 will help you choose the data objects and set the frequency of data synchronization for your data lake. The data import can be scheduled for any date and time, and you can choose the frequency from the given options: daily, weekly, or monthly.

This page also displays the complete set of objects from your Salesforce account. Choose the objects that you want to import into your data lake.

In Step 4, you can review the data lake configuration and confirm it. At this point, you can still go back to previous steps to make changes if needed.

Step 5 is where the data lake is provisioned, and then your data is imported. This can take half an hour to several hours, depending on the size of the data in your Salesforce account.

Once the data lake is ready, Step 6 provides the instructions and information needed to connect to your data lake from business intelligence applications such as Tableau Cloud and Tableau Desktop, which will help you visualize and analyze your data per your business needs.
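
Before wiring up a BI tool, you can sanity-check the imported data with any PostgreSQL client. The following sketch uses the pg package with placeholder connection details; the actual endpoint, database name, and credentials come from Step 6.

import { Client } from "pg";

// Placeholder connection details; substitute the values shown in Step 6.
const client = new Client({
  host: "your-datalake-endpoint.rds.amazonaws.com",
  port: 5432,
  database: "datalake",
  user: "datalake_user",
  password: process.env.DB_PASSWORD,
  ssl: true,
});

await client.connect();
// Table names depend on the NPSP objects you selected in Step 3.
const result = await client.query("SELECT COUNT(*) AS n FROM contact");
console.log(`Imported contacts: ${result.rows[0].n}`);
await client.end();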

Cleanup

Keeping the data lake in your AWS account will incur charges for the provisioned resources. To avoid incurring future charges, run these commands in the terminal where you cloned the GitHub repository and follow the instructions:

cd Data-Lake-for-Nonprofits/app

yarn delete-datalake

Conclusion

This post showed you how AWS services are used to transform your Salesforce data into a data lake. The solution uses AWS Amplify to host the frontend application and provisions several AWS services for the data lake backend. The architecture is the result of a collaboration between AWS and Salesforce to build an open source data lake solution behind a simple, wizard-like application.

We invite you to clone the GitHub repository and develop your own solution for your needs, provide feedback, and contribute to the project.

Bringing JavaScript to WebAssembly

#625 — February 10, 2023

Read on the Web

It looked quiet at first but wow, what an epic week this turned out to be. There’s a lot to chew on here, and we even have a variety of bonus items at the very end of the issue. Enjoy!
Your editor, Peter Cooper

JavaScript Weekly

Speeding Up the JS Ecosystem: It’s ESLint’s Turn — Last year we featured an article from the same author about how he was finding, and fixing, low-hanging performance fruit in popular JavaScript projects. He’s back, and he’s found a lot of potential for savings in ESLint this time.

Marvin Hagemeister

The Future (and the Past) of the Web is Server Side Rendering — It’s fair to say the Deno folks have some skin in this game, but nonetheless this is a neat brief history of server-side rendering and why they feel it’s the right approach for modern web development.

Andy Jiang (Deno)

Monitoring Your NestJS Application with AppSignal — With AppSignal, you can monitor your NestJS app with ease and rely on OpenTelemetry to handle third-party instrumentations. AppSignal even provides helper functions to help you build comprehensive custom instrumentation.

AppSignal sponsor

Ten Web Development Trends in 2023 — Following the State of JS survey results Robin takes a considered look at new web dev trends that we should be paying attention to this year, and why they matter.

Robin Wieruch

Bringing JavaScript to WebAssembly for Shopify Functions — As much as this is focused on a specific use case at Shopify, this is a fascinating look at how they’re integrating JavaScript and WebAssembly under tight constraints. They also talk about Javy, a JS to WebAssembly toolchain being built at Shopify that lets you run JS code on a WASM-embedded JS runtime.

Surma (Shopify)

Google Touts Web-Based Machine Learning with TensorFlow.js

Richard MacManus (The New Stack)

IN BRIEF:

Time to celebrate — a recent survey allegedly found that JavaScript applications ‘have fewer flaws’ than Java and .NET ones. So there you go.

Honeypot’s highly anticipated ▶️ React.js documentary drops later today – it’ll probably be out by the time you read this.

Vanilla List is a directory of ‘vanilla’ JavaScript controls and plugins.

▶️ Evan You tells us what to expect in 2023 from Vue.js.

The Scala.js project is celebrating its ten year anniversary – it’s now a mature way to build Web projects using Scala, if you prefer.

Vue.js Live is a JavaScript event taking place both in London and online on May 12 & 15. From the same folks as the also forthcoming JSNation conference.

A history of criticisms levelled at React.

RELEASES:

Eleventy / 11ty 2.0
↳ Popular Node.js static site generator.

pnpm 7.27 – The efficient package manager.

RxDB 14.0 – Offline-first, reactive database.

Articles & Tutorials

Design Patterns in TypeScript — OO-inspired patterns aren’t for everyone or every use case, but this is a fantastic catalog of examples, complete with diagrams and explanations, if you need to learn to tell apart factory methods from decorators, facades, or proxies.

Refactoring Guru

Resumable React: How To Use React Inside Qwik — Building React apps without ever loading React in the user’s browser? “Sounds too good to be true? Let’s see how this works.”

Yoav Ganbar

Did You Know That You’re Already a Distributed Systems Developer?

Temporal Technologies sponsor

Build a Hacker News Client using Alpine.js — Alpine.js is a thin and elegant reactivity library that lets you add dynamic functionality to your site directly in markup. This is a short and sweet practical example of what you can quickly do with it.

Salai Vedha Viradhan

▶  TypeScript Speedrun: A Crash Course for Beginners — If you want to pick up TypeScript and would find a video guide useful, this is for you. Matt has become well known recently for his educational TypeScript tweets and videos, and this is another good one that flies through the basics. (23 minutes.)

Matt Pocock

Using Notion as a Headless CMS with Nuxt

Trent Brew

The Options API vs Composition API in Vue.js

Charles Allotey

Code & Tools

Bookmarklet Editor: Easily Work on JavaScript Bookmarklets — Useful because who can remember the exact syntax for a bookmarklet? It can also instantly convert code to and from bookmarklet form and includes some examples in the help section (click the big ? to get all the details).

Marek Gibney

Breakpoints and console.log Is the Past, Time Travel Is the Future — 15x faster JavaScript debugging than with breakpoints and console.log, now with support for Vitest.

Wallaby.js sponsor

Yup 1.0: Super Simple Object Schema Validation — Define a schema, transform a value to match, assert the shape of an existing value, or both. Very extensive docs here.

Jason Quense

Material React Table: A Full-Featured React Table Component — Built upon Material UI 5 and TanStack Table 8. The docs include lots of interactive examples.

Kevin Van Cott

BlockNote: Notion-Style Block-Based Text Editor — Built on top of Prosemirror and Tiptap, this is for you if you like the way the Notion note-taking service’s text editor feels. There’s a live demo.

Yousef

TresJS: Build 3D Experiences with Vue.js — Create 3D scenes with Vue components and Three.js. Think React-three-fiber but Vue flavored.

Alvaro Sabu

depngn: Find Out if Dependencies Support a Given Node.js Version — A CLI tool that establishes whether or not the dependencies in your package.json will work against a specified version of Node.

OmbuLabs

Open-Source JS Form Libraries to Automate Your Form Workflow — Self-host SurveyJS to configure and modify multiple forms, convert them to fillable PDF files, and analyze collected data in interactive dashboards.

SurveyJS sponsor

Lawnmower: Build VR Scenes with Custom HTML Tags — A web component library that leans on Three.js and aims “to make building a basic VR website as easy to make as your first HTML site”.

Gareth Marland

Electron 23.0 Released — The popular cross platform JavaScript, HTML + CSS desktop app framework gets bumped up to Node 18.12.1, Chromium 110, and V8 11.0. Windows 7/8/8.1 support has also been dropped, so we might start to see those versions of Windows lose the support of a lot of Electron based apps soon.

Electron Core Team

Run: Run User-Provided Code in a Web Worker

SLASHD Analytics

Jobs

Software Engineer (Backend) — Join our “kick ass” team. Our software team operates from 17 countries and we’re always looking for more exceptional engineers.

Sticker Mule

Find JavaScript Jobs with Hired — Hired makes job hunting easy: instead of chasing recruiters, companies approach you with salary details up front. Create a free profile now.

Hired

QUICK RELEASES:

vue-easytable 2.23
↳ A data table/grid control for Vue.js. (Demo.)

React-Custom-Scroll 5.0
↳ Customize the browser scroll bar. (Demo.)

react-jsonschema-form 5.1
↳ Component to build Web forms from JSON Schema.

AlaSQL.js 3.1
↳ JavaScript-based SQL database.

jest-puppeteer 7.0
↳ Run tests using Jest & Puppeteer.

MDX 2.3
↳ Markdown for the component era.

The Bonus Round

✈️ Watching someone wrestle with Python and JavaScript to fly (virtual) planes with Microsoft Flight Simulator tickled me a lot.

A beautiful WebGL2-based fluid simulation. It’s even happy on mobile. Pretty!

Go-like channels in 10 lines of TypeScript..?

Misko Hevery: “useSignal() is the future of web frameworks and is a better abstraction than useState(), which is showing its age.” (source)

Mike Pennisi asks: when is an object property not a property?

Do you use Postgres at all? Check out Postgres Weekly – one of our sister newsletters. So much is going on in the Postgres space lately and it’s a great way to keep up.

Using GitHub Actions with Amazon CodeCatalyst

An Amazon CodeCatalyst workflow is an automated procedure that describes how to build, test, and deploy your code as part of a continuous integration and continuous delivery (CI/CD) system. You can use GitHub Actions alongside native CodeCatalyst actions in a CodeCatalyst workflow.

Introduction

In a prior post in this series, Using Workflows to Build, Test, and Deploy with Amazon CodeCatalyst, I discussed creating CI/CD pipelines in CodeCatalyst and how that relates to The Unicorn Project’s main protagonist, Maxine. CodeCatalyst workflows help you reliably deliver high-quality application updates frequently, quickly, and securely. CodeCatalyst allows you to quickly assemble and configure actions to compose workflows that automate your CI/CD pipeline, test reporting, and other manual processes. Workflows use provisioned compute, Lambda compute, custom container images, and a managed build infrastructure to scale execution easily without sacrificing flexibility. In this post, I will return to workflows and discuss running GitHub Actions alongside native CodeCatalyst actions.

Prerequisites

If you would like to follow along with this walkthrough, you will need to:

Have an AWS Builder ID for signing in to CodeCatalyst.
Belong to a space and have the space administrator role assigned to you in that space. For more information, see Creating a space in CodeCatalyst, Managing members of your space, and Space administrator role.
Have an AWS account associated with your space, with the IAM role created in that account. For more information about the role and role policy, see Creating a CodeCatalyst service role.

Walkthrough

As with the previous posts in the CodeCatalyst series, I am going to use the Modern Three-tier Web Application blueprint. Blueprints provide sample code and CI/CD workflows to help you get started easily across different combinations of programming languages and architectures. To follow along, you can re-use a project you created previously, or you can refer to a previous post that walks through creating a project using the Three-tier blueprint.

As the team has grown, I have noticed that code quality has decreased. Therefore, I would like to add a few additional tools to validate code quality when a new pull request is submitted. In addition, I would like to create a Software Bill of Materials (SBOM) for each pull request so I know what components are used by the code. In the previous post on workflows, I focused on the deployment workflow. In this post, I will focus on the OnPullRequest workflow. You can view the OnPullRequest pipeline by expanding CI/CD from the left navigation, and choosing Workflows. Next, choose OnPullRequest and you will be presented with the workflow shown in the following screenshot. This workflow runs when a new pull request is submitted and currently uses Amazon CodeGuru to perform an automated code review.

Figure 1. OnPullRequest Workflow with CodeGuru code review

While CodeGuru provides intelligent recommendations to improve code quality, it does not check style. I would like to add a linter to ensure developers follow our coding standards. While CodeCatalyst supports a rich collection of native actions, this does not currently include a linter. Fortunately, CodeCatalyst also supports GitHub Actions. Let’s use a GitHub Action to add a linter to the workflow.

Select Edit in the top right corner of the Workflow screen. If the editor opens in YAML mode, switch to Visual mode using the toggle above the code. Next, select “+ Actions” to show the list of actions. Then, change from Amazon CodeCatalyst to GitHub using the dropdown. At the time this blog was published, CodeCatalyst includes about a dozen curated GitHub Actions. Note that you are not limited to the list of curated actions. I’ll show you how to add GitHub Actions that are not on the list later in this post. For now, I am going to use Super-Linter to check coding style in pull requests. Find Super-Linter in the curated list and click the plus icon to add it to the workflow.

Figure 2. Super-Linter action with add icon

This will add a new action to the workflow and open the configuration dialog box. There is no further configuration needed, so you can simply close the configuration dialog box. The workflow should now look like this.

Figure 3. Workflow with the new Super-Linter action

Notice that the actions are configured to run in parallel. In the previous post, when I discussed the deployment workflow, the steps were sequential. This made sense since each step built on the previous step. For the pull request workflow, the actions are independent, and I will allow them to run in parallel so they complete faster. I select Validate, and assuming there are no issues, I select Commit to save my changes to the repository.

While CodeCatalyst will start the workflow when a pull request is submitted, I do not have a pull request to submit. Therefore, I select Run to test the workflow. A notification at the top of the screen includes a link to view the run. As expected, Super-Linter fails because it has found issues in the application code. I click on the Super-Linter action and review the logs. Here are a few issues that Super-Linter reported regarding app.py, which is used by the backend application. Note that the log has been modified slightly to fit on a single line.

/app.py:2:1: F401 'os' imported but unused
/app.py:2:1: F401 'time' imported but unused
/app.py:2:1: F401 'json' imported but unused
/app.py:2:10: E401 multiple imports on one line
/app.py:4:1: F401 'boto3' imported but unused
/app.py:6:9: E225 missing whitespace around operator
/app.py:8:1: E402 module level import not at top of file
/app.py:10:1: E402 module level import not at top of file
/app.py:15:35: W291 trailing whitespace
/app.py:16:5: E128 continuation line under-indented for visual indent
/app.py:17:5: E128 continuation line under-indented for visual indent
/app.py:25:5: E128 continuation line under-indented for visual indent
/app.py:26:5: E128 continuation line under-indented for visual indent
/app.py:33:12: W292 no newline at end of file

With Super-Linter working, I turn my attention to creating a Software Bill of Materials (SBOM). I am going to use OWASP CycloneDX to create the SBOM. While there is a GitHub Action for CycloneDX, at the time I am writing this post, it is not available from the list of curated GitHub Actions in CodeCatalyst. Fortunately, CodeCatalyst is not limited to the curated list. I can use almost any GitHub Action in CodeCatalyst. To add a GitHub Action that is not in the curated list, I return to edit mode, find GitHub Actions in the list of curated actions, and click the plus icon to add it to the workflow.

Figure 4. GitHub Action with add icon

CodeCatalyst will add a new action to the workflow and open the configuration dialog box. I choose the Configuration tab and use the pencil icon to change the Action Name to Software-Bill-of-Materials. Then, I scroll down to the configuration section, and change the GitHub Action YAML. Note that you can copy the YAML from the GitHub Actions Marketplace, including the latest version number. In addition, the CycloneDX action expects you to pass the path to the Python requirements file as an input parameter.

Figure 5. GitHub Action YAML configuration

Since I am using the generic GitHub Action, I must tell CodeCatalyst which artifacts are produced by the action and should be collected after execution. CycloneDX creates an XML file called bom.xml which I configure as an artifact. Note that a CodeCatalyst artifact is the output of a workflow action, and typically consists of a folder or archive of files. You can share artifacts with subsequent actions.

Figure 6. Artifact configuration with the path to bom.xml

Once again, I select Validate, and assuming there are no issues, I select Commit to save my changes to the repository. I now have three actions that run in parallel when a pull request is submitted: CodeGuru, Super-Linter, and Software Bill of Materials.

Figure 7. Workflow including the software bill of materials

As before, I select Run to test my workflow and click the view link in the notification. As expected, the workflow fails because Super-Linter is still reporting issues. However, the new Software Bill of Materials has completed successfully. From the artifacts tab I can download the SBOM.

Figure 8. Artifacts tab listing code review and SBOM

The artifact is a zip archive that includes the bom.xml created by CycloneDX. This includes, among other information, a list of components used in the backend application.

<components>
<component type="library" bom-ref="7474f0f6-8aa2-46db-bebf-a7648cff84e1">
<name>Jinja2</name>
<version>3.1.2</version>
<purl>pkg:pypi/jinja2@3.1.2</purl>
</component>
<component type="library" bom-ref="fad0708b-d007-4f98-a80c-056b136015df">
<name>aws-cdk-lib</name>
<version>2.43.0</version>
<purl>pkg:pypi/aws-cdk-lib@2.43.0</purl>
</component>
<component type="library" bom-ref="23e3aaae-b4e1-4f3b-b026-fcd298c9cb9b">
<name>aws-cdk.aws-apigatewayv2-alpha</name>
<version>2.43.0a0</version>
<purl>pkg:pypi/aws-cdk.aws-apigatewayv2-alpha@2.43.0a0</purl>
</component>
<component type="library" bom-ref="d283cf17-9125-422c-b55c-cabb64d18f79">
<name>aws-cdk.aws-apigatewayv2-integrations-alpha</name>
<version>2.43.0a0</version>
<purl>pkg:pypi/aws-cdk.aws-apigatewayv2-integrations-alpha@2.43.0a0</purl>
</component>
<component type="library" bom-ref="0f095c84-c9e9-4d6c-a4ed-c4a6c7605426">
<name>aws-cdk.aws-lambda-python-alpha</name>
<version>2.43.0a0</version>
<purl>pkg:pypi/aws-cdk.aws-lambda-python-alpha@2.43.0a0</purl>
</component>
<component type="library" bom-ref="b248b85b-ba27-4796-bcdf-6bd82ad47295">
<name>constructs</name>
<version>&gt;=10.0.0,&lt;11.0.0</version>
<purl>pkg:pypi/constructs@%3E%3D10.0.0%2C%3C11.0.0</purl>
</component>
<component type="library" bom-ref="72b1da33-19c2-4b5c-bd58-7f719dafc28a">
<name>simplejson</name>
<version>3.17.6</version>
<purl>pkg:pypi/simplejson@3.17.6</purl>
</component>
</components>

The workflow is now enforcing code quality and generating an SBOM like I wanted. Note that while this is a great start, there is still room for improvement. First, I could collect reports generated by the actions in my workflow, and define success criteria for code quality. Second, I could scan the SBOM for known security vulnerabilities using a Software Composition Analysis (SCA) solution. I will be covering this in a future post in this series.

Cleanup

If you have been following along with this workflow, you should delete the resources you deployed so you do not continue to incur charges. First, delete the two stacks that CDK deployed using the AWS CloudFormation console in the AWS account you associated when you launched the blueprint. These stacks will have names like mysfitsXXXXXWebStack and mysfitsXXXXXAppStack. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project.

Conclusion

In this post, you learned how to add GitHub Actions to a CodeCatalyst workflow. I used GitHub Actions alongside native CodeCatalyst actions in my workflow. I also discussed adding actions from both the curated list of actions and others not in the curated list. Read the documentation to learn more about using GitHub Actions in CodeCatalyst.

About the authors:

Dr. Rahul Gaikwad

Dr. Rahul is a DevOps Lead Consultant at AWS. He helps customers migrate and modernize workloads to the AWS Cloud with a special focus on DevOps and IaC. He is passionate about building innovative solutions using technology and enjoys collaborating with customers and peers. He contributes to open-source community projects. Outside of work, Rahul has completed a Ph.D. in AIOps, and he enjoys travelling and spending time with his family.

Anirudh Sharma

Anirudh is a Cloud Support Engineer 2 with an extensive background in DevOps offerings at AWS, and he is a Subject Matter Expert in the AWS Elastic Beanstalk and AWS CodeDeploy services. He loves helping customers and learning new services and technologies. He also loves travelling, has a goal to visit Japan someday, is a Golden State Warriors fan, and loves spending time with his family.

Navdeep Pareek

Navdeep is a Lead Migration Consultant at AWS. He helps customers migrate and modernize workloads to the AWS Cloud and specializes in automation and DevOps. In his spare time, he enjoys travelling, cooking, and spending time with family and friends.

Driving Action and Communication in AWS Amplify Open Source Projects

AWS Amplify is a complete solution that lets frontend web and mobile developers easily build, ship, and host full-stack applications on AWS, with the flexibility to leverage additional AWS services as use cases evolve. To build Amplify applications, customers typically use one of the open source Amplify libraries. At Amplify, we manage and build these open source projects on GitHub.

In the last year, the number of contributions across the Amplify projects has increased and the teams have scaled to meet customer needs while building across programming languages and frameworks. This requires a constant balance of collaboration with contributors and flexible processes to continually move the projects forward. The goal is to provide a delightful experience for front-end developers building on AWS.

Open source is as much about relationships as it is about code. These relationships are built whenever the team collaborates with external contributors, developers, and customers at different touch points. A core element of open source is building relationships through consistent and transparent communication. There isn’t a clear, well-defined playbook for this. It requires iteration and continuous feedback from contributors to fine-tune and improve. How we communicate shapes how the projects are planned, structured, and received by the developer community, and all of this may change over time.

In this post, we’ll cover some processes and tools that we’ve built and use at AWS Amplify to help build a vibrant and responsive open source community.

Organization

For context, an average of 35 external contributor pull requests (PRs) and 350 issues are opened each month on GitHub across the Amplify project repositories. In 2022, this equated to an 80% increase in issues and 66% increase in external contributor PRs from the previous year. An external contributor is any active contributor that is not a member of the Amplify GitHub organization. These projects range from the Amplify CLI to the client libraries like Amplify JS and Amplify UI. We also have a very active Discord server with over 19,000 members.

Along with the growth, it can be a cultural shift for both AWS engineers and customers to work in the open. Not every team member is used to working in public, and not all customers are used to working with AWS through GitHub or Discord.

Our ownership model is that each individual team manages their own repositories and is responsible for the project health and operations. This includes issue and Pull Request (PR) management, communication, and releases. This level of autonomy helps the projects move fast and remain flexible. As the service has grown, so has the need for standardization and operational tooling within each team.

Communication and transparency – reducing the time to response

The starting point for community is consistent communication and transparency. There are different elements to this, and they fluctuate over time as projects grow or slow in velocity. There is an expectation in open source software development of an ongoing dialog with the community. This may take the form of contributors helping with issue triage and workarounds, providing feedback on Requests for Comments (RFCs), and submitting PRs. It could also be developers using the libraries and services within their applications.

In each of these scenarios, open communication is what helps to oil that machine. In open source, the goal is to determine an action for each issue, pull request, feature request, or question that is created. In most cases, proactive communication and quick responses in issues and PRs helps to determine this action faster. This also shapes the open source contributor experience. An external contributor is more likely to stay engaged in an issue thread or PR review if the maintainers are quick to respond.

Knowing that communication plays such a large role, we identified ways to reduce friction and make the experience better. These ways are:

Standardize the structure of the operational processes (GitHub issues labels, etc.)
Communicate as early and often as possible in response to open items (issues, PRs, etc.)
Proactively track updates on these items in order to follow up quickly

With these goals in mind, we had to operationalize processes to allow us to deliver on each. Thinking through this, two core themes surfaced:

Develop a consistent approach and cadence for communication
Reduce overall time-to-response (or action)

Tackling these first would improve the contributor experience and begin to strengthen our own internal culture. Once standardized, we could then proactively track open items and metrics.

Communications processes

How do we improve communication across our projects? This is where that cultural shift comes in. Sometimes this requires changes to normal processes and team communication to fully embrace working in the open. The project is always evolving and updating, including nights and weekends. Without a clear process to drive action, things can quickly be missed.

The first task was to find the root cause of the communication friction points in a project repository. What are the communication touch points? At those points, where can automation help reduce friction and time to next action? The initial entry points to communication in a GitHub repository are:

A contributor is using the project and opens an item in the form of an issue, or
A contributor opens a PR to submit code for inclusion into the project

The team on-call engineers were initially shadowed to observe how they identified, and interacted with, new and updated issues and PRs. A common theme was that the context switching and frequent back and forth on these items was very time consuming and hard to track. After observing this same pattern across multiple projects, it was clear that the conversation should be streamlined.

We needed an efficient way to both remind the original posters of what is needed to help reproduce an issue and encourage them to also provide details. For each of these items, we had to determine the best way to expedite the path to action. Maintainers need to triage an issue to determine the next action to take. Maintainers also need to triage each pull request to determine if it aligns with the project and identify what the next steps are with respect to review and merge.

GitHub issue template forms to the rescue

We initially standardized the GitHub issue template forms across all of the Amplify projects after getting access to the Beta feature in early 2021. These forms are used each time a contributor opens an issue, pull request, or feature request in a repository. This allowed for more actionable conversations by collecting all of the required information up front. The goal of the form was to collect just the required information without introducing unnecessary friction for contributors. This is different from the original GitHub issue templates that allowed for more free-form data entry. With this approach, we were able to require standard information that had been found to be helpful in triaging issues while shadowing the maintainers. Here is a screenshot of a portion of the current GitHub issue form template in the Amplify CLI repository.

This is the form that is used to open new issues in the repository. It asks the user to check some boxes before opening an issue, such as whether they have installed the latest version of the Amplify CLI, searched for duplicate or closed issues, and read the guide for submitting bug reports. It also asks some open ended questions about the user’s environment.

As using the new form to file an issue became the standard, the starting point for communication became clearer, allowing future interactions on issues and PRs to be more direct. Through the structured nature and standardized collection of data fields, we were able to significantly improve the quality of our search results. This has helped to reduce duplicate issues (although they may still exist in some forms in older issues) and increase visibility and traction on current open issues and feature requests.

Standardization

As the Amplify project quickly grew, the operational processes needed to grow with it and remain consistent across teams. Repository standardization tools are typically templates used to structure a project when it is first created. The original templates are helpful, but there is an ongoing challenge of maintaining, or changing, the structure as the project evolves. To account for this, we created an open source repository audit tool that provides a declarative approach to keep certain items such as labels, project topic tags, and descriptions up to date.

The audit tool also includes a dashboard to provide insight into other items, such as GitHub actions, required links, and the number of good first issues. The required items are defined in a configuration file and each repository is checked against that structure. This is a simple but effective way to quickly check across all the projects without manually checking each project.

Repositories will change over time, but we need to make sure that there is consistency in the core structure. For instance, new labels are added and some are removed, or the CODEOWNERs.md file needs to be updated. The audit tool queries the repository metadata and provides a self-service way for each team to check whether the repo is in good standing as part of their standard operating processes or has fallen out of date and needs attention. This screenshot shows a portion of the audit tool for the Amplify JS repository. The green check mark next to the links indicates that these items are present in the repository.

We want customers to have a consistent entry point when landing in any AWS Amplify repository. This includes the same nomenclature and core labels to correctly communicate the lifecycle of issues and PRs. This screenshot shows the required labels section of the tool. The set of required labels is compared against the labels in the repository for any discrepancies.

Data and tooling – separating out external contributor events

To accurately gauge activity on issues and opened items, it was critical to determine whether events were initiated by someone on the Amplify team or by an external contributor in order to prioritize communication and the triaging process. Our triaging process involves troubleshooting or reproducing an issue by a project maintainer. We needed to isolate GitHub issue (and pull request) event data created by external contributors in a separate tool to support this effort.

To track this event stream from GitHub, we built serverless data processes using AWS Lambda to capture issues and PRs that have been updated, and then capture any new events (comments, labels, etc.) on the issues. The data allows for ad-hoc querying against events to isolate if activity is increasing in a certain area. This also helps teams to quickly spot items when there are lags in timezones or as team on-call rotations change. All of these queries take into account the active members of the Amplify organization to identify those events only triggered by external contributors.
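
The essence of that separation can be sketched in a few lines of TypeScript. The event shape and member list here are simplified assumptions for illustration, not the actual Amplify tooling.

// Hypothetical shape of a captured GitHub event.
interface CapturedEvent {
  actor: string;                          // GitHub login that triggered the event
  kind: "comment" | "label" | "reaction";
  issueNumber: number;
  createdAt: string;                      // ISO timestamp
}

// Active members of the Amplify organization, fetched elsewhere.
const orgMembers = new Set(["maintainer-a", "maintainer-b"]);

// Keep only the activity that came from external contributors.
export function externalEvents(events: CapturedEvent[]): CapturedEvent[] {
  return events.filter((event) => !orgMembers.has(event.actor));
}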

Capturing this data has allowed us to build tools to proactively engage in open and closed issues. The following sections outline these tools.

Trending issues

With the external contributor data separated from Amplify team events, we are able to track, in near real-time, issues that have an increased amount of activity within a given time period. One way has been to create a dashboard that highlights trending issues that are receiving increased activity from external contributors. This is used at the individual project level and also across all of the projects. It helps to identify cross-project themes that may be emerging, without constantly and manually checking issues to see what has changed.

The trending dashboard ranks issues based only on the activity that they receive from external contributors within a given time period. It takes into account recent comments and the aggregate reactions on comments that are not created by Amplify maintainers. This provides a holistic view of the themes that contributors and developers may be experiencing across the entirety of Amplify, without the noise of Amplify team comments.

This includes both open issues and issues closed after triage, where there has already been communication. It’s important to track closed issues (that are not locked) so as not to miss any new comments or activity. Since an issue is closed, those comments may not be seen unless a team member happens to view the issue. This screenshot shows the trending list of open issues in the Amplify CLI repository. The list of issues and aggregated metadata (number of comments and reactions) only includes data from external contributors.

Closed issues with increased activity

As previously mentioned, closed issue visibility is a challenge since it’s difficult to track which closed issues were updated and what changed – a new comment, or an increase in the number of reactions. The trending dashboard also tracks this activity.

The dashboard is especially useful for issues that are already closed but continue to receive events. Without automation, it’s very time consuming (and manual) to identify a spike in reactions or external contributor comments on issues across one project, let alone over 20. Often, these comments go unnoticed unless an Amplify team member is tagged or subscribed to a specific item and happens to see the notification.

Having a UI that highlights these items also helps to reduce notification fatigue. The Amplify team (engineers, product, and developer experience) receive many notifications across items for many of the Amplify repositories. It is helpful to have a single dashboard to spot check if activity has increased on an older, closed issue.

Metrics to track

So how do we know that all of these processes and tools are making a positive impact? A few key metrics helped:

Mean Time to Respond (MTTR) – This measures how responsive we are to customers and encourages communicating as soon as possible once an issue or Pull Request is opened.

Mean Time to Close (MTTC) – This measures the overall timeframe that it takes to close an issue or Pull Request.

The definition of these metrics used with the constant stream of event data has allowed us to track the MTTC and MTTR of items at the repository and organization level.

MTTR is primarily about response times. How quickly do we respond back throughout the lifecycle of an issue (or PR)? A lower MTTC indicates that issues are closed faster. This may mean a few things: Maintainers have answered questions, reproduced issues, and actioned PRs. There are a lot of factors that contribute to an item being closed, which may have varying timelines. One example is feature requests that may remain open longer than a normal issue. There are caveats to each metric, but this helps to isolate issues that need further investigation.
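
As a rough sketch, the two metrics could be computed from the captured event stream like this; the record shape is an assumption for illustration.

interface IssueRecord {
  openedAt: Date;
  firstMaintainerResponseAt?: Date; // first reply from a maintainer, if any
  closedAt?: Date;                  // close time, if the issue is closed
}

// Elapsed hours between two timestamps (36e5 ms per hour).
const hours = (from: Date, to: Date) => (to.getTime() - from.getTime()) / 36e5;

const mean = (xs: number[]) =>
  xs.length === 0 ? 0 : xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Mean Time to Respond: opened -> first maintainer response.
export function mttr(issues: IssueRecord[]): number {
  return mean(
    issues
      .filter((i) => i.firstMaintainerResponseAt)
      .map((i) => hours(i.openedAt, i.firstMaintainerResponseAt!))
  );
}

// Mean Time to Close: opened -> closed.
export function mttc(issues: IssueRecord[]): number {
  return mean(
    issues
      .filter((i) => i.closedAt)
      .map((i) => hours(i.openedAt, i.closedAt!))
  );
}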

Even with the progress in tracking, a few consistent challenges presented themselves. The back and forth communication on issues can be very sporadic. The next step was to isolate which issues and PRs needed a response.

Pending response

It’s difficult and time consuming to keep track of comment responses in issues and PRs using only the notifications view within GitHub. One mechanism is to identify items that were awaiting a response and where the original poster has now followed up (that is, responded).

The ideal flow is something like this:

Issue is opened
Team responds with follow up or questions
Issue is labeled pending-response

A dashboard displays issues that have the pending-response labels AND an external contributor has responded

This helps reduce active response times and highlights when issues have received a reply. Additionally, this ensures that issues are actioned and don’t remain in an unknown state (i.e. not triaged) while awaiting a reply. Similar to the trending issues above, the team created a dashboard to surface any issue that fit this criteria. This screenshot shows the list of issues that need a response from the maintainers. These issues all have the pending-response label and an external contributor has been the most recent to comment.
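
A sketch of that check with Octokit might look like the following; the pending-response label comes from the process described above, while the repository details and member list are placeholders.

import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Active org members, fetched elsewhere; placeholders here.
const orgMembers = new Set(["maintainer-a", "maintainer-b"]);

// Find open pending-response issues where an external contributor replied last.
export async function needsResponse(owner: string, repo: string) {
  const issues = await octokit.rest.issues.listForRepo({
    owner,
    repo,
    labels: "pending-response",
    state: "open",
  });

  const flagged: number[] = [];
  for (const issue of issues.data) {
    const comments = await octokit.rest.issues.listComments({
      owner,
      repo,
      issue_number: issue.number,
      per_page: 100,
    });
    const last = comments.data.at(-1);
    if (last && !orgMembers.has(last.user?.login ?? "")) {
      flagged.push(issue.number);
    }
  }
  return flagged;
}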

Conclusion

Open source is not static. Tools, teams, and community are always evolving and what is working today will certainly change over the next year. Working backwards from consistency and communication has allowed Amplify to focus attention on prioritizing proactive action to make it a pleasant experience for community members to contribute to the projects.

We continue to identify more efficient ways to strengthen the developer relationships and facilitate more open communication across the Amplify repositories. As the community and project evolve, it’s important to remain flexible, communicate early and often, and continue to improve what the team can control.

Interested in learning more about open source at AWS Amplify? Follow @AWSAmplify on Twitter to get the latest updates about the Amplify Contributor Program, explore the open source Amplify GitHub repositories, or join the Amplify Discord server.

AWS Teams with OSTIF on Open Source Security Audits

We are excited to announce that AWS is sponsoring open source software security audits by the Open Source Technology Improvement Fund (OSTIF), a non-profit dedicated to securing open source. This funding is part of a broader initiative at Amazon Web Services (AWS) to support open source software supply chain security.

Last year, AWS committed to investing $10 million over three years alongside the Open Source Security Foundation (OpenSSF) to fund supply chain security. AWS will be directly funding $500,000 to OSTIF as a portion of our ongoing initiative with OpenSSF. OSTIF has played a critical role in open source supply chain security by providing security audits and reviews to projects through their work as a pre-existing partner of the OpenSSF. Their broad experience with auditing open source projects has already provided significant benefits. This month the group completed a significant security audit of Git that uncovered 35 issues, including two critical and one high-severity finding. In July, the group helped find and fix a critical vulnerability in sigstore, a new open source technology for signing and verifying software.

Many of the tools and services provided by AWS are built on open source software. Through our OSTIF sponsorship, we can proactively mitigate software supply chain risk further up the supply chain by improving the health and security of the foundational open source libraries that AWS and our customers rely on. Our investment helps support upstream security and provides customers and the broader open source community with more secure open source software.

Supporting open source supply chain security is akin to supporting electrical grid maintenance. We all need the grid to continue working, and to be in good repair, because nothing gets powered without it. The same is true of open source software. Virtually everything of importance in the modern IT world is built atop open source. We need open source software to be well maintained and secure.

We look forward to working with OSTIF and continuing to make investments in open source supply chain security.

Ways to remove event listeners

#624 — February 3, 2023

Read on the Web

JavaScript Weekly

You’ve Got Options for Removing Event Listeners — Unnecessary event listeners can cause all sorts of odd problems so it’s good to clean them up when you don’t need them anymore. How? There are several approaches and Alex looks at their pros and cons. (once is a good one to consider if your use case supports it as it’s ‘set and forget.’)

Alex MacArthur

Updates from the 94th TC39 meeting — The TC39 committee that works on the ECMAScript standard met last week and progressed a few language proposals with Change Array by Copy, Intl.NumberFormat v3 and Symbols as WeakMap Keys making it to stage 4. There’s also an interesting example of a downgrade to stage 2 for import assertions.

Hemanth HM

Fast and Flexible Gantt Chart Components for Your Web App — Bryntum’s suite of web components includes powerful Gantt and resource scheduling widgets used by thousands of businesses. The API is very flexible and allows you to configure everything from colors to details of the scheduling logic. Free 45-day trial.

Bryntum sponsor

Netlify Acquires Gatsby — The company behind the Gatsby React-based framework is joining Netlify with many Gatsby Cloud features expected to be integrated into Netlify’s own platform. This places Netlify more directly against Vercel who are behind Next.js (and don’t forget Shopify with Remix too).

Kyle Mathews (Gatsby)

You May Not Need Lodash or Underscore — Inspired by the popular You Might Not Need jQuery, this extensive document provides plain JavaScript alternatives to almost 100 different functions you’d find in popular utility libraries like Lodash and Underscore.

You Don’t Need

The Future of Create React App and Why It Exists — An extensive write up from Dan Abramov on the state of Create React App, a route to take it forwards, and how he sees React as a library working within an ecosystem of frameworks.

Dan Abramov

RELEASES:

Node.js v19.6.0 (Current)

Node.js v18.14.0 (LTS)

Electron 22

TestCafe 2.3
↳ End-to-end Web testing.

Docusaurus 2.3
↳ Popular documentation site generator.

Jotai 2.0

Articles & Tutorials

How To Lose Functional Programming at Work — An amusing piece you might recognize parts of. It’s certainly possible to have too much of a good thing. “If you’re looking to lose functional programming at work, here are a bunch of mistakes I’ve made on JS-heavy web teams over the years that can help you do the same.”

Robert Pearce

How Node & SWC Can Make a Lightning Fast TypeScript Runtime — If the added compilation time for TypeScript has irritated you so far, Artem has found a way to get things as fast as possible.

Artem Avetisyan

▶  Tailwind CSS, Headless UI, and Powerlifting with Adam Wathan — We talk to Adam about what motivated him to create Tailwind and why it is creating polarizing discourse among developers.

Whiskey Web and Whatnot sponsor podcast

The Road from Ember Classic to Glimmer Components — If you’ve got a mature Ember.js project you want to modernize, this is for you.

Ignace Maes

Using JavaScript in a Swift App — One for iOS app developers. It’s not perfect but at least it’s an option.

Douglas Hill

Cleaner Unit Tests with Custom Matchers — Using custom matchers to avoid repetitive and ambiguous assertions in Jest.

Jamie King (American Express)

The Yaml Document from Hell: JS Edition — The titular problematic document was spawned in this blog post focusing on Python, but Phil looks to see if JS YAML parsers have the same problems with the much maligned format.

Phil Nash

Too Much Tech Debt and Outdated Packages? Don’t Have Time to Upgrade?

UpgradeJS by OmbuLabs sponsor

On Using Playwright in GitHub Actions

Radosław Miernik

How I Made My App 2.4x Faster Switching to Svelte

Flotes Tech Blog

Code & Tools

FeedbackPlus: Add Screenshot Tools to Your Feedback Forms — Say you’ve got a form on your app to let users submit bugs or feedback and you’d like to encourage them to send a screenshot too – this makes it easier to do. Live demo.

ColonelParrot

▶  ScrollyVideo.js: Responsive ‘Scrollable’ Videos — It’s an interesting effect and well demonstrated here. Compatible with React, Svelte, Vue, or just plain ole’ HTML.

Daniel Kao

Stop Sweating Over Supply Chain Security with Snyk — Get the security practices and tooling to bolster your build pipeline from Snyk’s article on npm security and preventing supply chain attacks.

Snyk sponsor

depngn: Find Out if Dependencies Support a Given Node Version — A CLI tool that establishes whether dependencies in your package.json will work against a specified Node version. May be helpful during upgrades.

OmbuLabs

Eta 2.0: Embedded JS Template Engine for Node, Deno, and Browser — Boasts being lighter and faster than EJS but with many of the same features (it looks a lot like Ruby’s ERB). GitHub repo.

Ben Gubler

Swiper 9.0: Mobile Touch Slider with Accelerated Transitions — Tree shakable, library agnostic, and focused entirely on modern browsers and web APIs. RTL support too. GitHub repo.

Vladimir Kharlampidi

UUID.js: RFC-Compliant UUID Generator — Supports v1 and v4 UUIDs.

LiosK

ReScript 10.1
↳ OCaml-inspired compile-to-JS language.

OrgChart 3.4
↳ Render org charts. (Lots of demos.)

clipboard-polyfill 4.0
↳ ‘Copy to clipboard’ for older browsers and edge cases.

morphdom 2.7
↳ DOM diffing/patching – no VDOM needed.

relative-time-element 4.2
↳ GitHub’s extensions for <time>

js-bson 5.0
↳ Binary JSON parser and serialization.

React Date Picker 4.10
↳ Simple date picker React component.

JustValidate 4.1
↳ Lightweight form validation library.

Jobs

Platform Engineer – Dev Ops — Come help Qwire modernize how studios, composers, artists, publishers, labels, and the rest of the industry manage music rights.

Qwire

Find JavaScript Jobs with Hired — Hired makes job hunting easy: instead of chasing recruiters, companies approach you with salary details up front. Create a free profile now.

Hired

Last But Not Least

Madge 6.0: Create Graphs From Your Module Dependencies — A developer tool for generating a visual graph of your module dependencies (works with CommonJS, AMD and ES modules), finding circular dependencies, and discovering other useful info.

Patrik Henningsson