Dive Deeper into Data Lake for Nonprofits, a New Open Source Solution from AWS for Salesforce for Nonprofits

Nonprofits are using cloud-based solutions for fundraising, donor and member management, and communications. With this move online, they have access to more data than ever. This data has the potential to transform their missions and increase their impact. However, sharing, connecting, and interpreting data from many different sources can be a challenge. To address this challenge, Amazon Web Services (AWS) and AWS Partner Salesforce for Nonprofits announced the general availability of Data Lake for Nonprofits – Powered by AWS.

Data Lake for Nonprofits is an open source application that helps nonprofit organizations set up a data lake in their AWS account and populate it with the data that they have in the Salesforce Nonprofit Success Pack (NPSP) schema. This data resides in Amazon Relational Database Service (Amazon RDS), where it can be accessed by other AWS services like Amazon Redshift and Amazon QuickSight, as well as business intelligence products such as Tableau.

This post is written for developers and solution integrator partners who want to get a closer look at the architecture and implementation of Data Lake for Nonprofits to better understand the solution. We’ll walk you through the architecture and show how to set up a data lake using AWS Amplify.

Prerequisites

To follow along with this walkthrough, you must have the following prerequisites:

An AWS Account

An AWS Identity and Access Management (IAM) user with Administrator permissions that enables you to interact with your AWS account
A Salesforce account that has the Nonprofit Success Pack managed packages installed
A Salesforce user that has API permissions to interact with your Salesforce org

Git command line interface installed on your computer for cloning the repository

Node.js and Yarn Package Manager installed on your computer for cleanup

Solution Overview

The following diagram shows the solution’s high-level architecture.

Salesforce and AWS have released the source code in GitHub under a BSD 3-Clause license so nonprofits and their cloud partners can use, customize, and innovate on top of it at no cost.

Use your command-line shell to clone the GitHub repository for your own development.

git clone https://github.com/salesforce-misc/Data-Lake-for-Nonprofits

The solution consists of two tiers: a frontend application and a backend application. In the following sections, we’ll walk you through both architectures and show you how to install a data lake.

Frontend Walkthrough

We have developed the frontend application using React with TypeScript, and it has three layers.

The view layer comprises React components.
The model layer is where most of the application logic is maintained. The models are Mobx State Tree (MST) types with properties, actions, and computed values. Relationships between models are expressed as MST trees.
The API layer is used to communicate with AWS Services using AWS SDK for JavaScript v3.

The frontend application is built and zipped for AWS Amplify using GitHub workflows; this is done for every change in the repository. The latest build can be found in the GitHub repository.

Backend Walkthrough

We have developed the backend application using AWS CloudFormation templates that can be found under the infra/cf folder in the GitHub repository. These templates provision the resources for your data lake in your AWS account, as described below.

vpc.yaml template is used to provision the Amazon Virtual Private Cloud (Amazon VPC) that the application will be running in.

buckets.yaml template is used to provision several Amazon Simple Storage Service (Amazon S3) buckets.

datastore.yaml is used to provision an Amazon Relational Database Service (Amazon RDS) PostgreSQL database.

athena.yaml is used to provision Amazon Athena with a custom workgroup.

step_function.yaml is used to provision AWS Step Functions and related AWS Lambda functions. Step Functions is used to orchestrate the Lambda functions that are going to import your data from your Salesforce account into Amazon RDS for PostgreSQL using an Amazon AppFlow connection.

Lambda functions can be found in the infra/lambdas folder.

src/cleanupSQL.ts performs the movement of data from the data-loading database schema to the public database schema.

src/filterSchemaListing.ts filters out S3’s “schema/” key as well as queries Salesforce for deleted objects.

src/finalizeSQL.ts drops tables that are no longer needed by the application to save cost.

src/listEntities.ts calls the APIs because the output is too large to pass through Step Functions.

src/processImport.ts is the Lambda function that writes the data sent to the Amazon SQS queue into Amazon RDS.

src/pullNewSchema.ts queries Amazon AppFlow and then Salesforce to gather any new or updated fields.

src/setupSQL.ts sets up the data-loading database schema and creates the tables based on the schema file.

src/statusReport.ts performs a status update to S3 based on where it is in the Step Functions State Machine.

src/updateFlowSchema.ts uses the updated schema file on S3 to create or update the Amazon AppFlow flow.

Amazon Simple Queue Service (Amazon SQS) is used to queue the import data so that the Lambda function can use it.
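As a rough illustration of this pattern (not the actual processImport.ts implementation), an SQS-triggered Lambda handler that writes queued records into the PostgreSQL database might look like the sketch below; the environment variable, table name, and message shape are assumptions.

import { SQSHandler } from 'aws-lambda';
import { Client } from 'pg';

// Hypothetical handler: each SQS record carries rows to insert into a staging table
export const handler: SQSHandler = async (event) => {
  const client = new Client({ connectionString: process.env.DATABASE_URL }); // assumed env var
  await client.connect();
  try {
    for (const record of event.Records) {
      const rows: { id: string; payload: unknown }[] = JSON.parse(record.body);
      for (const row of rows) {
        // Upsert into an assumed staging table in the data-loading schema
        await client.query(
          'INSERT INTO staging.contact (id, payload) VALUES ($1, $2) ON CONFLICT (id) DO UPDATE SET payload = $2',
          [row.id, JSON.stringify(row.payload)]
        );
      }
    }
  } finally {
    await client.end();
  }
};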

Amazon EventBridge is used to set up a job that syncs your data based on your choice of frequency.

Amazon CloudWatch is used to keep logs during installation as well as synchronization. The application creates a CloudWatch dashboard to track the usage of the data lake.
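Because the frontend’s API layer talks to AWS through the AWS SDK for JavaScript v3, provisioning one of these templates ultimately comes down to a CreateStack call. Here is a minimal, hypothetical sketch of that idea; the stack name, template path, region, and capabilities are placeholders, not the application’s actual code.

import {
  CloudFormationClient,
  CreateStackCommand,
  waitUntilStackCreateComplete,
} from '@aws-sdk/client-cloudformation';
import { readFileSync } from 'fs';

const cfn = new CloudFormationClient({ region: 'us-east-1' }); // region selected in Step 1

export const deployVpcStack = async (): Promise<void> => {
  // Read one of the backend templates from the cloned repository (path assumed)
  const templateBody = readFileSync('infra/cf/vpc.yaml', 'utf8');

  await cfn.send(new CreateStackCommand({
    StackName: 'datalake-vpc',               // placeholder stack name
    TemplateBody: templateBody,
    Capabilities: ['CAPABILITY_NAMED_IAM'],  // assumed, since the stacks create IAM resources
  }));

  // Wait until CloudFormation reports the stack as created
  await waitUntilStackCreateComplete(
    { client: cfn, maxWaitTime: 1800 },
    { StackName: 'datalake-vpc' }
  );
};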

Deploy the Frontend Application using AWS Amplify

The latest release of the frontend application can be downloaded from the GitHub repository and deployed in AWS Amplify in your AWS account as explained in the User Guide. It typically takes a few minutes to deploy the frontend application, after which a URL to the frontend application is presented. The URL should look like the link here:

https://abc.xyz….amplifyapp.com

When you open the URL, you will see the frontend application, which provides a wizard-like guide to the steps. Each step will guide you through the instructions on how to move forward.

Step 1 will ask for an Access Key ID and Secret Access Key for an IAM user of your AWS account. The application shows guidance on how to log in to the AWS Management Console and use AWS Identity and Access Management (IAM) to create the IAM user with admin permissions.

This step also requires you to select the AWS region where you would like to create the Amazon AppFlow connection and deploy the data lake.

Step 2 establishes the connection to your Salesforce account. The application guides the user to leverage Amazon AppFlow, which allows AWS to connect to your Salesforce account.

At the end of the page, use the drop-down menu to select the connection name, and then click Next.

Step 3 will help you choose the data objects and set the frequency of data synchronization for your data lake. The data import option can be set to any date and time, and you can choose the frequency from the given options: daily/weekly/monthly.

This page further displays the complete set of objects from your Salesforce account. Choose the necessary objects that you want to import into your data lake.

In Step 4, you can review the data lake configuration and confirm it. You can still go back to the previous steps to make any changes if needed.

Step 5 is where the data lake is provisioned, and then your data is imported. This can take half an hour to several hours, depending on the size of the data in your Salesforce account.

Once the data lake is ready, in Step 6, you can find the instructions and information needed to connect to your data lake using business intelligence applications such as Tableau Cloud and Tableau Desktop, which will help you visualize your data and analyze it per your business needs.

Cleanup

Keeping the data lake in your AWS account will incur charges due to the resources provisioned. To avoid incurring future charges, run these commands on your terminal where you cloned the GitHub repository and follow the instructions:

cd Data-Lake-for-Nonprofits/app

yarn delete-datalake

Conclusion

This post showed you how AWS services are used to transform your Salesforce data into a data lake. It uses AWS Amplify to host the frontend application and provisions several AWS services for the data lake backend. The architecture is based on the successful collaboration between AWS and Salesforce to build an open source data lake solution using a simple and easy-to-use, wizard-like application.

We invite you to clone the GitHub repository and develop your own solution for your needs, provide feedback, and contribute to the project.


Driving Action and Communication in AWS Amplify Open Source Projects

AWS Amplify is a complete solution that lets frontend web and mobile developers easily build, ship, and host full-stack applications on AWS, with the flexibility to leverage additional AWS services as use cases evolve. To build Amplify applications, customers typically use one of the open source Amplify libraries. At Amplify, we manage and build these open source projects on GitHub.

In the last year, the number of contributions across the Amplify projects has increased and the teams have scaled to meet customer needs while building across programming languages and frameworks. This necessitates a constant balance of collaboration with contributors and flexible processes to continually move the projects forward. The goal is to provide a delightful experience for front-end developers building on AWS.

Open source is as much about relationships as it is about code. These relationships are built anytime the team collaborates with external contributors, developers, and customers at different touch points. A core element of open source is facilitating relationships through consistent and transparent communication. There isn’t a clear, well-defined playbook for this; it requires iteration and continuous feedback from contributors to fine-tune and improve. How we communicate shapes how the projects are planned, structured, and received by the developer community, and all of this may change over time.

In this post, we’ll cover some processes and tools that we’ve built and use at AWS Amplify to help build a vibrant and responsive open source community.

Organization

For context, an average of 35 external contributor pull requests (PRs) and 350 issues are opened each month on GitHub across the Amplify project repositories. In 2022, this equated to an 80% increase in issues and 66% increase in external contributor PRs from the previous year. An external contributor is any active contributor that is not a member of the Amplify GitHub organization. These projects range from the Amplify CLI to the client libraries like Amplify JS and Amplify UI. We also have a very active Discord server with over 19,000 members.

Along with the growth, it can be a cultural shift for both AWS engineers and customers to work in the open. Not every team member is used to working in public, and not all customers are used to working with AWS through GitHub or Discord.

Our ownership model is that each individual team manages their own repositories and is responsible for the project health and operations. This includes issue and Pull Request (PR) management, communication, and releases. This level of autonomy helps the projects move fast and remain flexible. As the service has grown, so has the need for standardization and operational tooling within each team.

Communication and transparency – reducing the time to response

The starting point for community is consistent communication and transparency. There are different elements to this and it fluctuates over time and as projects grow or slow in velocity. There is an expectation in open source software development that there is an ongoing dialog with the community. This may take the form of contributors helping on issue triage and workarounds, providing feedback on Request for Comments (RFCs), and submitting PRs. It could also be developers using the libraries and services within their applications.

In each of these scenarios, open communication is what helps to oil that machine. In open source, the goal is to determine an action for each issue, pull request, feature request, or question that is created. In most cases, proactive communication and quick responses in issues and PRs helps to determine this action faster. This also shapes the open source contributor experience. An external contributor is more likely to stay engaged in an issue thread or PR review if the maintainers are quick to respond.

Knowing that communication plays such a large role, we identified ways to reduce friction and make the experience better. These ways are:

Standardize the structure of the operational processes (GitHub issues labels, etc.)
Communicate as early and often as possible in response to open items (issues, PRs, etc.)
Proactively track updates on these items in order to follow up quickly

With these goals in mind, we had to operationalize processes to allow us to deliver on each. Thinking through this, two core themes surfaced:

Develop a consistent approach and cadence for communication
Reduce overall time-to-response (or action)

Tackling these first would improve the contributor experience and begin to strengthen our own internal culture. Once standardized, we could then proactively track open items and metrics.

Communications processes

How do we improve communication across our projects? This is where that cultural shift comes in. Sometimes this requires changes to normal processes and team communication to fully embrace working in the open. The projects are always evolving and updating, including nights and weekends. Without a clear process to drive action, things can quickly be missed.

The first task was to find the root cause of the communication friction points in a project repository. What are the communication touch points? At those points, where can automation help reduce friction and time to next action? The initial entry points to communication in a GitHub repository are:

A contributor is using the project and opens an item in the form of an issue, or
A contributor opens a PR to submit code for inclusion into the project

The team on-call engineers were initially shadowed to observe how they identified, and interacted with, new and updated issues and PRs. A common theme was that the context switching and frequent back and forth on these items was very time consuming and hard to track. After observing this same pattern across multiple projects, it was clear that the conversation should be streamlined.

We needed an efficient way to both remind the original posters of what is needed to help reproduce an issue and encourage them to also provide details. For each of these items, we had to determine the best way to expedite the path to action. Maintainers need to triage an issue to determine the next action to take. Maintainers also need to triage each pull request to determine if it aligns with the project and identify what the next steps are with respect to review and merge.

GitHub issue template forms to the rescue

We initially standardized the GitHub issue template forms across all of the Amplify projects after getting access to the Beta feature in early 2021. These forms are used each time a contributor opens an issue, pull request, or feature request in a repository. This allowed for more actionable conversations by collecting all of the required information up front. The goal of the form was to collect just the required information without introducing unnecessary friction for contributors. This is different from the original GitHub issue templates that allowed for more free-form data entry. With this approach, we were able to require standard information that had been found to be helpful in triaging issues while shadowing the maintainers. Here is a screenshot of a portion of the current GitHub issue form template in the Amplify CLI repository.

This is the form that is used to open new issues in the repository. It asks the user to check some boxes before opening an issue, such as whether they have installed the latest version of the Amplify CLI, searched for duplicate or closed issues, and read the guide for submitting bug reports. It also asks some open ended questions about the user’s environment.

As using the new form to file an issue became the standard, the starting point for communication became clearer, allowing future interactions on issues and PRs to be more direct. Through the structured nature and standardized collection of data fields, we were able to significantly improve the quality of our search results. This has helped to reduce duplicate issues (although they may still exist in some forms in older issues) and increase visibility and traction on current open issues and feature requests.

Standardization

As the Amplify project quickly grew, the operational processes needed to grow with it and remain consistent across teams. Repository standardization tools are typically templates used to structure a project when it is first created. The original templates are helpful, but there is an ongoing challenge of maintaining, or changing, the structure over time as requirements change. To account for this, we created an open source repository audit tool that provides a declarative approach to keep certain items such as labels, project topic tags, and descriptions up to date.

The audit tool also includes a dashboard to provide insight into other items, such as GitHub actions, required links, and the number of good first issues. The required items are defined in a configuration file and each repository is checked against that structure. This is a simple but effective way to quickly check across all the projects without manually checking each project.

Repositories will change over time, but we need to make sure that there is consistency in the core structure. For instance, new labels are added and some are removed, or the CODEOWNERS file needs to be updated. The audit tool queries the repository metadata and provides a self-service way for each team to check whether the repo is in good standing as part of their standard operating processes or has fallen out of date and needs attention. This screenshot shows a portion of the audit tool for the Amplify JS repository. The green check mark next to the links indicates that these items are present in the repository.

We want customers to have a consistent entry point when landing in any AWS Amplify repository. This includes the same nomenclature and core labels to correctly communicate the lifecycle of issues and PRs. This screenshot shows the required labels section of the tool. The set of required labels is compared against the labels in the repository for any discrepancies.
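Conceptually, that comparison can be as simple as diffing a declared list against the repository’s labels. Here is a hedged sketch using Octokit; the label names and config shape are invented for illustration, not the audit tool’s actual code.

import { Octokit } from '@octokit/rest';

// Hypothetical declarative config: the labels every repository is expected to have
const requiredLabels = ['pending-response', 'bug', 'feature-request', 'good first issue'];

export const findMissingLabels = async (owner: string, repo: string): Promise<string[]> => {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

  // Fetch every label currently defined on the repository
  const labels = await octokit.paginate(octokit.rest.issues.listLabelsForRepo, { owner, repo });
  const existing = new Set(labels.map((label) => label.name));

  // Any required label not present is a discrepancy to surface on the audit dashboard
  return requiredLabels.filter((name) => !existing.has(name));
};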

Data and tooling – separating out external contributor events

To accurately gauge activity on issues and opened items, it was critical to determine whether events were initiated by someone on the Amplify team or an external contributor in order to prioritize communication and the triaging process. Our triaging process involves troubleshooting or reproducing an issue by a project maintainer. We needed to isolate GitHub issue (and pull request) event data created by external contributors in a separate tool to support this effort.

To track this event stream from GitHub, we built serverless data processes using AWS Lambda to capture issues and PRs that have been updated, and then capture any new events (comments, labels, etc.) on the issues. The data allows for ad-hoc querying against events to isolate if activity is increasing in a certain area. This also helps teams to quickly spot items when there are lags in timezones or as team on-call rotations change. All of these queries take into account the active members of the Amplify organization to identify those events only triggered by external contributors.
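At its core, that classification is just checking an event author against the organization’s membership. Below is a simplified, hypothetical sketch of such a Lambda; the payload fields follow GitHub’s issue_comment webhook, and the downstream storage is omitted.

import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Returns true when the author is not a member of the GitHub org,
// i.e. the event was triggered by an external contributor.
export const isExternalContributorEvent = async (org: string, login: string): Promise<boolean> => {
  try {
    // A successful call means the user is an org member
    await octokit.rest.orgs.checkMembershipForUser({ org, username: login });
    return false;
  } catch (error: any) {
    if (error.status === 404) return true; // not a member, so external contributor
    throw error;
  }
};

// Hypothetical Lambda handler invoked with a GitHub issue_comment webhook payload
export const handler = async (event: { comment: { user: { login: string } } }) => {
  const external = await isExternalContributorEvent('aws-amplify', event.comment.user.login);
  if (external) {
    // e.g. persist the event for the trending and pending-response dashboards
    console.log(`External contributor activity from ${event.comment.user.login}`);
  }
};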

Capturing this data has allowed us to build tools to proactively engage in open and closed issues. The following sections outline these tools.

Trending issues

With the external contributor data separated from Amplify team events, we are able to start tracking, in near real-time, issues that have an increased amount of activity within a given time period. One way has been to create a dashboard that highlights trending issues that are receiving increased activity from external contributors. This is used at the individual project level and also across all of the projects. This helps to identify cross-project themes that may be starting, without constant, manual checking of issues to see what has changed.

The trending dashboard ranks issues based only on the activity that they are receiving from external contributors within a given time period. It takes into account recent comments and the aggregate reactions on comments that are not created by Amplify maintainers. This provides a holistic view of the themes that contributors and developers may be experiencing across the entirety of Amplify without the noise of Amplify team comments.
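As a sketch of that ranking logic (the scoring weights and data shape here are invented for illustration, not the dashboard’s actual algorithm):

interface ExternalComment {
  createdAt: Date;
  reactionCount: number; // reactions on comments not created by maintainers
}

interface IssueActivity {
  issueNumber: number;
  externalComments: ExternalComment[];
}

// Score an issue by external activity within a recent time window
const trendingScore = (issue: IssueActivity, windowDays = 7): number => {
  const cutoff = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  return issue.externalComments
    .filter((c) => c.createdAt.getTime() >= cutoff)
    .reduce((score, c) => score + 1 + c.reactionCount, 0);
};

// Rank issues for the trending dashboard, highest score first
export const rankTrendingIssues = (issues: IssueActivity[]): IssueActivity[] =>
  [...issues].sort((a, b) => trendingScore(b) - trendingScore(a));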

This includes both open issues and issues closed after triage, where there has already been communication. It’s important to track closed issues (that are not locked) so as not to miss any new comments or activity. Since the issue is closed, those comments may not be seen unless a team member happens to view the issue. This screenshot shows the trending list of open issues in the Amplify CLI repository. The list of issues and aggregated metadata (number of comments and reactions) only includes data from external contributors.

Closed issues with increased activity

As previously mentioned, closed issue visibility is a challenge since it’s difficult to track which closed issues were updated and what specifically changed, such as a new comment or an increase in the number of reactions. The trending dashboard also tracks this activity.

The dashboard is especially useful for issues that are already closed but continue to receive events. Without automation, it’s very time consuming (and manual) to identify a spike in reactions or external contributor comments on issues across one project, let alone over 20. Oftentimes, these comments go unnoticed unless an Amplify team member is tagged or subscribed to a specific item and happens to see the notification.

Having a UI that highlights these items also helps to reduce notification fatigue. The Amplify team (engineers, product, and developer experience) receive many notifications across items for many of the Amplify repositories. It is helpful to have a single dashboard to spot check if activity has increased on an older, closed issue.

Metrics to track

So how do we know that all of these processes and tools are making a positive impact? A few key metrics helped:

Mean Time to Respond (MTTR) – This measures how responsive we are to customers and encourages communicating as soon as possible once an issue or Pull Request is opened.

Mean Time to Close (MTTC) – This measures the overall timeframe that it takes to close an issue or Pull Request.

The definition of these metrics used with the constant stream of event data has allowed us to track the MTTC and MTTR of items at the repository and organization level.

MTTR is primarily about response times: how quickly do we respond throughout the lifecycle of an issue (or PR)? A lower MTTC indicates that issues are closed faster. This may mean a few things: maintainers have answered questions, reproduced issues, and actioned PRs. There are a lot of factors that contribute to an item being closed, which may have varying timelines. One example is feature requests, which may remain open longer than a normal issue. There are caveats to each metric, but this helps to isolate issues that need further investigation.
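In code form, both metrics reduce to averages over timestamps. Here is a hedged sketch of how they could be computed from the captured event data; the record shape is an assumption.

interface IssueRecord {
  openedAt: Date;
  firstMaintainerResponseAt?: Date; // undefined if no maintainer has responded yet
  closedAt?: Date;                  // undefined if still open
}

// Convert a list of durations in milliseconds to a mean expressed in hours
const meanHours = (durationsMs: number[]): number =>
  durationsMs.length === 0
    ? 0
    : durationsMs.reduce((a, b) => a + b, 0) / durationsMs.length / 3_600_000;

// MTTR: mean time from open to first maintainer response
export const meanTimeToRespond = (issues: IssueRecord[]): number =>
  meanHours(
    issues
      .filter((i) => i.firstMaintainerResponseAt)
      .map((i) => i.firstMaintainerResponseAt!.getTime() - i.openedAt.getTime())
  );

// MTTC: mean time from open to close
export const meanTimeToClose = (issues: IssueRecord[]): number =>
  meanHours(
    issues
      .filter((i) => i.closedAt)
      .map((i) => i.closedAt!.getTime() - i.openedAt.getTime())
  );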

Even with the progress in tracking, a few consistent challenges presented themselves. The back and forth communication on issues can be very sporadic. The next step was to isolate which issues and PRs needed a response.

Pending response

It’s difficult and time consuming to keep track of comment responses in issues and PRs using only the notifications view within GitHub. One mechanism is to identify items that were awaiting a response and where the original poster has now followed up (that is, responded).

The ideal flow is something like this:

Issue is opened
Team responds with follow up or questions
Issue is labeled pending-response

A dashboard displays issues that have the pending-response label AND an external contributor has responded

This helps reduce active response times and highlights when issues have received a reply. Additionally, this ensures that issues are actioned and don’t remain in an unknown state (i.e. not triaged) while awaiting a reply. Similar to the trending issues above, the team created a dashboard to surface any issue that fit this criteria. This screenshot shows the list of issues that need a response from the maintainers. These issues all have the pending-response label and an external contributor has been the most recent to comment.
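The query behind that dashboard can be approximated with Octokit: list the issues carrying the pending-response label and keep those whose latest comment came from outside the maintainer team. This is a hypothetical sketch; the maintainer check is simplified to a hard-coded set rather than a real membership lookup.

import { Octokit } from '@octokit/rest';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Simplified stand-in for a real org-membership lookup
const maintainers = new Set(['amplify-bot', 'some-maintainer']);

export const issuesNeedingResponse = async (owner: string, repo: string): Promise<number[]> => {
  // All open issues currently labeled pending-response
  const issues = await octokit.paginate(octokit.rest.issues.listForRepo, {
    owner,
    repo,
    labels: 'pending-response',
    state: 'open',
  });

  const needsReply: number[] = [];
  for (const issue of issues) {
    const comments = await octokit.paginate(octokit.rest.issues.listComments, {
      owner,
      repo,
      issue_number: issue.number,
    });
    const last = comments[comments.length - 1];
    // If the most recent comment is from an external contributor, the team owes a reply
    if (last && !maintainers.has(last.user?.login ?? '')) {
      needsReply.push(issue.number);
    }
  }
  return needsReply;
};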

Conclusion

Open source is not static. Tools, teams, and community are always evolving and what is working today will certainly change over the next year. Working backwards from consistency and communication has allowed Amplify to focus attention on prioritizing proactive action to make it a pleasant experience for community members to contribute to the projects.

We continue to identify more efficient ways to strengthen the developer relationships and facilitate more open communication across the Amplify repositories. As the community and project evolve, it’s important to remain flexible, communicate early and often, and continue to improve what the team can control.

Interested in learning more about open source at AWS Amplify? Follow @AWSAmplify on Twitter to get the latest updates about the Amplify Contributor Program, explore the open source Amplify GitHub repositories, or join the Amplify Discord server.


The AWS Modern Applications and Open Source Zone: Learn, Play, and Relax at AWS re:Invent 2022

AWS re:Invent is filled with fantastic opportunities, but I wanted to tell you about a space that lets you dive deep with some fantastic open source projects and contributors: the AWS Modern Applications and Open Source Zone! Located in the east alcove on the third floor of the Venetian Conference Center, this space exists so that re:Invent attendees can be introduced to some of the amazing projects that power and enhance the AWS solutions you know and use. We’ve divided the space up into three areas: Demos, Experts, and Fun.

Demos: Learn and be curious

We have two dedicated demo stations in the Zone and a deep list of projects that we are excited to show you from Amazonians, AWS Heroes, and AWS Community Builders. Please keep in mind this schedule may be subject to change, and we have some last minute surprises that we can’t share here, so be sure to drop by.

Monday, November 28, 2022

Kiosk 1
9 AM – 11 AM: Continuous Deployment and GitOps delivery with Amazon EKS Blueprints and ArgoCD (Tsahi Duek, Dima Breydo)
11 AM – 1 PM: StackGres: An Advanced PostgreSQL Platform on EKS (Alvaro Hernandez)
1 PM – 3 PM: TBD
3 PM – 5 PM: Step Functions templates and prebuilt Lambda Packages for deploying scalable serverless applications in seconds (Rustem Feyzkhanov)

Kiosk 2
9 AM – 11 AM: Data on EKS (Vara Bonthu, Brian Hammons)
11 AM – 1 PM: TBD
1 PM – 3 PM: How to use Amazon Keyspaces (for Apache Cassandra) and Apache Spark to build applications (Meet Bhagdev)
3 PM – 5 PM: Scale your applications beyond IPv4 limits (Sheetal Joshi)

Tuesday, November 29, 2022

Kiosk 1
9 AM – 11 AM: Let’s build a self service developer portal with AWS Proton (Adam Keller)
11 AM – 1 PM: Doing serverless on AWS with Terraform (serverless.tf + terraform-aws-modules) (Anton Babenko)
1 PM – 3 PM: Steampipe (Chris Farris, Bob Tordella)
3 PM – 5 PM: Using Lambda Powertools for better observability in IoT Applications (Alina Dima)

Kiosk 2
9 AM – 11 AM: Build and run containers on AWS with AWS Copilot (Sergey Generalov)
11 AM – 1 PM: Fargate Surprise
1 PM – 3 PM: Fargate Surprise
3 PM – 5 PM: Amplify Libraries Demo (Matt Auerbach)

Wednesday, November 30, 2022

Kiosk 1
9 AM – 11 AM: Building Embedded Devices with FreeRTOS SMP and the Raspberry Pi Pico (Dan Gross)
11 AM – 1 PM: Quantum computing in the cloud with Amazon Braket (Michael Brett, Katharine Hyatt)
1 PM – 3 PM: Leapp (Andrea Cavagna)
3 PM – 5 PM: EKS multicluster management and applications delivery (Nicholas Thomson, Sourav Paul)

Kiosk 2
9 AM – 11 AM: Using SAM CLI and Terraform for local testing (Praneeta Prakash, Suresh Poopandi)
11 AM – 1 PM: How to use Terraform AWS and AWSCC provider in your project (Tyler Lynch, Drew Mullen)
1 PM – 3 PM: How to use Terraform AWS and AWSCC provider in your project (Glenn Chia, Welly Siau)
3 PM – 5 PM: Smart City Monitoring Using AWS IoT and Digital Twin (Syed Rehan)

Thursday, December 1, 2022

Kiosk 1
9 AM – 11 AM: Modern data exchange using AWS data streaming (Ali Alemi)
11 AM – 1 PM: Learn how to leverage your Amazon EKS cluster as a substrate for execution of distributed Ray programs for Machine Learning (Apoorva Kulkarni)
1 PM – 3 PM: TBD

Kiosk 2
9 AM – 11 AM: Spreading apps, controlling traffic, and optimizing costs in Kubernetes (Lukonde Mwila)
11 AM – 1 PM: TBD
1 PM – 3 PM: Terraform IAM policy validator (Bohan Li)

Experts

Pull up a chair, grab a drink and a snack, charge your devices, and have a conversation with some of our experts. We’ll have people visiting the zone all throughout re:Invent, with expertise in a variety of open source technologies and AWS services including (but not limited to):

Amazon Athena
Amazon DocumentDB (with MongoDB compatibility)
Amazon DynamoDB
Amazon Elastic Container Service (Amazon ECS)
Amazon Elastic Kubernetes Service (Amazon EKS)
Amazon EventBridge
Amazon Keyspaces (for Apache Cassandra)
Amazon Kinesis
Amazon Linux
Amazon Managed Grafana
Amazon Managed Service for Prometheus
Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Amazon MQ
Amazon Redshift
Amazon Simple Notification Service (Amazon SNS)
Amazon Simple Queue Service (Amazon SQS)
Amazon Simple Storage Service (Amazon S3)
Apache Flink, Hadoop, Hudi, Iceberg, Kafka, and Spark
Automotive Grade Linux
AWS Amplify
AWS App Mesh
AWS App Runner
AWS CDK
AWS Copilot
AWS Distro for OpenTelemetry
AWS Fargate
AWS Glue
AWS IoT Greengrass
AWS Lambda
AWS Proton
AWS SDKs
AWS Serverless Application Model (AWS SAM)
AWS Step Functions
Bottlerocket
Cloudscape Design System
Embedded Linux
Flutter
FreeRTOS
JavaScript
Karpenter
Lambda Powertools
OpenSearch
Red Hat OpenShift Service on AWS (ROSA)
Rust
Terraform

Fun

Want swag? We’ve got it, but it is protected by THE CLAW. That’s right, we brought back the claw machine, and this year, we might have some extra special items in there for you to catch. No spoilers, but we’ve heard there have been some Rustaceans sighted. You might want to bring an extra (empty) suitcase.

But we’re not done. By popular request, we also brought back Dance Dance Revolution. Warm up your dancing shoes or just cheer on the crowd. You never know who will be showing off their best moves.

Conclusion

The AWS Modern Applications and Open Source Zone is a must-visit destination for your re:Invent journey. With demos, experts, food, drinks, swag, games, and mystery surprises, how can you not stop by?


Adding CDK Constructs to the AWS Analytics Reference Architecture

In 2021, we released the AWS Analytics Reference Architecture, a new AWS Cloud Development Kit (AWS CDK) application end-to-end example, as open source (docs are CC-BY-SA 4.0 International, sample code is MIT-0). It shows how our customers can use the available AWS products and features to implement well-architected analytics solutions. It also brings together AWS best practices for designing, implementing, and operating analytics solutions through different purpose-built patterns. Altogether, the AWS Analytics Reference Architecture answers common requirements and solves customer challenges.

In 2022, we extended the scope of this project with AWS CDK constructs to provide more granular and reusable examples. This project is now composed of:

Reusable core components exposed in an AWS CDK library currently available in TypeScript and Python. This library contains the AWS CDK constructs that can be used to quickly provision prepackaged analytics solutions.
Reference architectures consuming the reusable components in AWS CDK applications, and demonstrating end-to-end examples in a business context. Currently, only the AWS native reference architecture is available but others will follow.

In this blog post, we will first show how to consume the core library to quickly provision analytics solutions using CDK Constructs and experiment with AWS analytics products.

Building solutions with the Core Library

To illustrate how to use the core components,  let’s see how we can quickly build a Data Lake, a central piece for most analytics projects. The storage layer is implemented with the DataLakeStorage CDK construct relying on Amazon Simple Storage Service (Amazon S3), a durable, scalable and cost-effective object storage service. The query layer is implemented with the AthenaDemoSetup construct using Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. With regard to the data catalog, it‘s implemented with the DataLakeCatalog construct using AWS Glue Data Catalog.

Before getting started, please make sure to follow the instructions available here for setting up the prerequisites:

Install the necessary build dependencies
Bootstrap the AWS account
Initialize the CDK application.

This architecture diagram depicts the data lake building blocks we are going to deploy using the AWS Analytics Reference Architecture library. These are higher level constructs (commonly called L3 constructs) as they integrate several AWS services together in patterns.

To assemble these components, you can add this code snippet in your app.py file:

import aws_analytics_reference_architecture as ara
from aws_cdk import aws_glue as glue  # AWS Glue L1 constructs (CfnCrawler, CfnTrigger) used below

# Create a new DataLakeStorage with Raw, Clean and Transform buckets
storage = ara.DataLakeStorage(scope=self, id="storage")

# Create a new DataLakeCatalog with Raw, Clean and Transform databases
catalog = ara.DataLakeCatalog(scope=self, id="catalog")

# Configure a new Athena Workgroup
athena_defaults = ara.AthenaDemoSetup(scope=self, id="demo_setup")

# Generate data from Customer TPC dataset
data_generator = ara.BatchReplayer(
    scope=self,
    id="customer-data",
    dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,
    sink_object_key="customer",
    sink_bucket=storage.raw_bucket,
)

# Role with default permissions for any Glue service
glue_role = ara.GlueDemoRole.get_or_create(self)

# Crawler to create tables automatically
crawler = glue.CfnCrawler(
    self, id="ara-crawler", name="ara-crawler",
    role=glue_role.iam_role.role_arn, database_name="raw",
    targets={"s3Targets": [{"path": f"s3://{storage.raw_bucket.bucket_name}/{data_generator.sink_object_key}/"}]},
)

# Trigger to kick off the crawler
cfn_trigger = glue.CfnTrigger(
    self, id="MyCfnTrigger",
    actions=[{"crawlerName": crawler.name}],
    type="SCHEDULED", description="ara_crawler_trigger",
    name="min_based_trigger", schedule="cron(0/5 * * * ? *)", start_on_creation=True,
)

In addition to this library construct, the example also includes lower level constructs (commonly called L1 constructs) from the AWS CDK standard library. This shows that you can combine constructs from any CDK library interchangeably.
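As noted earlier, the core library is also published for TypeScript, so the same building blocks can be provisioned from a TypeScript CDK app. The following is only a minimal sketch: the npm package name and the exact construct signatures are assumptions and may differ from the published library.

import { App, Stack } from 'aws-cdk-lib';
import * as ara from 'aws-analytics-reference-architecture'; // package name assumed

const app = new App();
const stack = new Stack(app, 'AraDemoStack');

// Data lake storage with Raw, Clean and Transform buckets (mirrors the Python example)
const storage = new ara.DataLakeStorage(stack, 'storage');

// Data lake catalog with Raw, Clean and Transform databases
const catalog = new ara.DataLakeCatalog(stack, 'catalog');

// Athena workgroup preconfigured for the demo
const athenaDefaults = new ara.AthenaDemoSetup(stack, 'demo_setup');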

For use cases where customers need to adjust the default configurations to align with their organization-specific requirements (for example, data retention rules), the constructs can be configured through class parameters, as shown in this example:

storage = ara.DataLakeStorage(scope=self, id="storage", raw_archive_delay=180, clean_archive_delay=1095)

Finally, you can deploy the solution using the AWS CDK CLI from the root of the application with this command: cdk deploy. Once you deploy the solution, AWS CDK provisions the AWS resources included in the Constructs and you can log into your AWS account.

Go to the Athena console and start querying the data. The AthenaDemoSetup provides an Athena workgroup called “demo” that you can select to start querying the BatchReplayer data very quickly. Data is stored in the DataLakeStorage and registered in the DataLakeCatalog. Here is an example of an Athena query accessing the customer data from the BatchReplayer:

Accelerate the implementation

Earlier in the post we pointed out that the library simplifies and accelerates the development process. First, writing Python code is more appealing than writing CloudFormation markup, whether in JSON or YAML. Second, the CloudFormation template generated by the AWS CDK for the data lake example is 16 times more verbose than the Python script.

❯ cdk synth | wc -w
2483

❯ wc -w ara_demo/ara_demo_stack.py
154

Demonstrating end-to-end examples with reference architectures

The AWS native reference architecture is the first reference architecture available. It explains the journey of a fictional company, MyStore Inc., as it implements its data platform solution with AWS products and services. Deploying the AWS native reference architecture demonstrates a fully working example of a data platform from data ingestion to business analysis. AWS customers can learn from it, see analytics solutions in action, and play with the retail dataset and business analysis.

More reference architectures will be added to this project on GitHub later.

Business Story

The AWS native reference architecture simulates a retail company called MyStore Inc. that is building a new analytics platform on top of AWS products. This example shows how retail data can be ingested, processed, and analyzed in streaming and batch processes to provide business insights like sales analysis. The platform is built on top of the CDK Constructs from the core library to minimize development effort and inherit from AWS best practices.

Here is the architecture deployed by the AWS native reference architecture:

The platform is implemented in purpose-built modules. They are decoupled and can be provisioned independently but still integrate with each other. MyStore’s analytics platform is composed of the following modules:

Data Lake foundations: This mandatory module (based on DataLakeCatalog and DataLakeStorage core constructs) is the core of the analytics platform. It contains the data lake storage and associated metadata for both batch and streaming data. The data lake is organized in multiple Amazon S3 buckets representing different versions of the data. (a) The raw layer contains the data coming from the data sources in the raw format. (b) The cleaned layer contains the raw data that has been cleaned and parsed to a consumable schema. (c) And the curated layer contains refactored data based on business requirements.

Batch analytics: This module is in charge of ingesting and processing data from a Stores channel generated by the legacy systems in batch mode. Data is then exposed to other modules for downstream consumption. The data preparation process leverages various features of AWS Glue, a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development via the Apache Spark framework. The orchestration of the preparation is handled using AWS Glue Workflows that allows managing and monitoring executions of Extract, Transform, and Load (ETL) activities involving multiple crawlers, jobs, and triggers. The metadata management is implemented via AWS Glue Crawlers, a serverless process that crawls data sources and sinks to extract the metadata including schemas, statistics and partitions. It saves them in the AWS Glue Data Catalog.

Streaming analytics: This module ingests and processes real-time data from the Web channel generated by cloud-native systems. The solution minimizes data analysis latency while also feeding the data lake for downstream consumption.

Data Warehouse: This module ingests data from the data lake to support reporting, dashboarding, and ad hoc querying capabilities. The module uses an Extract, Load, and Transform (ELT) process to transform the data from the Data Lake foundations module. Here are the steps that outline the data pipeline from the data lake into the data warehouse. 1. AWS Glue Workflow reads CSV files from the Raw layer of the data lake and writes them to the Clean layer as Parquet files. 2. Stored procedures in Amazon Redshift’s stg_mystore schema extract data from the Clean layer of the data lake using Amazon Redshift Spectrum. 3. The stored procedures then transform and load the data into a star schema model.

Data Visualization: This module provides dashboarding capabilities to business users like data analysts on top of the Data Warehouse module, and also provides data exploration on top of the Data Lake module. It is implemented with Amazon QuickSight, a scalable, serverless, embeddable, and machine learning-powered business intelligence tool. Amazon QuickSight is connected to the data lake via Amazon Athena and to the data warehouse via Amazon Redshift using direct query mode, as opposed to the caching mode with SPICE.

Project Materials

The AWS native reference architecture provides both code and documentation about MyStore’s analytics platform:

Documentation is available on GitHub and comes in two different parts:

The high level design describes the overall data platform implemented by MyStore, and the different components involved. This is the recommended entry point to discover the solution.
The analytics solutions provide fine-grained solutions to the challenges MyStore met during the project. These technical patterns can help you choose the right solution for common challenges in analytics.

The code is publicly available here and can be reused as an example for other analytics platform implementations. The code can be deployed in an AWS account by following the getting started guide.

Conclusion

In this blog post, we introduced new AWS CDK content available for customers and partners to easily implement AWS analytics solutions with the AWS Analytics Reference Architecture. The core library provides reusable building blocks with best practices to accelerate the development life cycle on AWS and the reference architecture demonstrates running examples with end-to-end integration in a business context.

Because of its reusable nature, this project will be the foundation for lots of additional content. We plan to extend the technical scope of it with Constructs and reference architectures for a data mesh. We’ll also expand the business scope with industry focused examples. In a future blog post, we will go deeper into the constructs related to Amazon EMR Studio and Amazon EMR on EKS to demonstrate how customers can easily bootstrap an efficient data platform based on Amazon EMR Spark and notebooks.


Caching NextJS Apps with Serverless Redis using Upstash

The modern applications we build today are sophisticated. Every time a user loads a webpage, their browser needs to download a bulk of data in order to display that page. A website may consist of millions of records and serve hundreds of API calls. For the data to move smoothly with minimal delay between server and client, we can follow many strategies. As developers, we want our apps to deliver the best user experience possible, and to achieve this we can employ a variety of techniques.

There are a number of ways we can address this situation. The best optimization is to apply techniques that reduce the latency of read/write operations on the database. One of the most popular ways to optimize our API calls is by implementing a caching mechanism.

What is Caching?

Caching is the process of storing copies of files in a cache, or temporary storage location so that they can be accessed more quickly. Technically, a cache is any temporary storage location for copies of files or data, but the term is often used in reference to Internet technologies.

By Cloudflare.com

The most common example of caching we can see is the browser cache, which stores frequently accessed website resources locally so that it does not have to retrieve them over the network each time they are needed. Caching can relieve the performance bottlenecks of our web applications. When dealing with heavy network traffic and large API calls, this technique can be one of the best options for performance optimization.

Redis: Caching in Server-side

When we talk about caching on the server side, one of the pioneering databases with caching built in is Redis. Redis (for REmote DIctionary Server) is an open-source NoSQL in-memory key-value data store. One of the best things about Redis is that we can persist data in a database that continuously stores it unless we delete or flush it manually. Because it is an in-memory database, its data access operations are faster than those of any disk-based database, which makes Redis a great choice for caching.

Redis can also be used as a primary database if needed. With the help of Redis, cached data can be accessed and re-accessed as many times as needed without running the database query again. Depending on the Redis cache setup, this data can stay in memory for a few minutes, a few hours, or longer. We can even set an expiration time for our cache, which we will implement in our demo application.

Redis is able to handle huge amounts of data in real time, making use of its in-memory data storage capabilities to help support highly responsive database constructs. Caching with Redis allows for fewer database accesses, which helps to reduce the amount of traffic and the number of instances required, while achieving sub-millisecond latency.

We will implement Redis in our Next application and see the performance gain we can achieve.

Let’s dive into it.

Initializing our Project

Before we begin I assume you have Node installed on your machine so that you can follow along with the steps involved. We will use Next for our project because it helps us write front-end and back-end logic with no configuration needed. We will create a starter project with the following command:

$ npx create-next-app@latest --typescript

After running the command, give the project the desired name. Once everything is done and the project has been created for us, we can add the dependencies we need for this demo application.

$ npm i ioredis @chakra-ui/core @emotion/core @emotion/styled emotion-theming
$ npm i --save-dev @types/node @types/ioredis

The commands above install all the dependencies we will deal with in this project. We will make use of ioredis to communicate with our Redis database and style things up with Chakra UI.

As we are using TypeScript for our project, we also need to install the type definitions for Node and ioredis, which we did in the second command as local dev dependencies.

Setting up Redis with Upstash

We definitely need to connect our application with Redis. You can use Redis locally and connect to it from your application or use a Redis cloud instance. For this project demo, we will be using Upstash Redis.

Upstash is a serverless database for Redis. With servers/instances, you pay per hour or a fixed price; with serverless, you pay per request. This means we are not charged when the database is not in use. Upstash configures and manages the database for you.

Head over to the official Upstash website and start with the free plan. For our demo purposes, we don’t need to pay. Visit the Upstash console after creating your new account and create a new serverless Redis database with Upstash.

You can find an example of the connection string for ioredis in the Upstash dashboard. Copy the blue overlay URL. We will use this connection string to connect to the serverless Redis instance provided in the free tier by Upstash.

import Redis from "ioredis";
export const redisConnect = new Redis(process.env.REDIS_URL);

In the snippet above we connected our app with the database. We can now use the Redis server instance provided by Upstash inside of our app.

Populating static data

The application we are building might not be an exact real-world use case, but we want to see the caching performance improvement Redis can bring to our application and understand how it’s done.

Here we are making a Pokemon application where users can select a Pokemon from a list and choose to see its details. We will implement caching for the visited Pokemon. In other words, if users visit the same Pokemon twice, they will receive the cached result.

Let’s populate some data inside of our Pokemon options.

export const getStaticProps: GetStaticProps = async () => {
  const res = await fetch(
    'https://pokeapi.co/api/v2/pokemon?limit=200&offset=200'
  );
  const { results }: GetPokemonResults = await res.json();

  return {
    props: {
      pokemons: results,
    },
  };
};

We are making a call to our endpoint to fetch all the names of Pokemon. getStaticProps helps us fetch data at build time. The getStaticProps() function provides the props needed for the Home component to render pages that are generated at build time, not at runtime, and are static.

const Home: NextPage<{ pokemons: Pokemons[] }> = ({ pokemons }) => {
  const [selectedPokemon, setSelectedPokemon] = useState<string>('');
  const toast = useToast();
  const router = useRouter();

  const handelSelect = (e: any) => {
    setSelectedPokemon(e.target.value);
  };

  const searchPokemon = () => {
    if (selectedPokemon === '')
      return toast({
        title: 'No pokemon selected',
        description: 'You need to select a pokemon to search.',
        status: 'error',
        duration: 3000,
        isClosable: true,
      });
    router.push(`/details/${selectedPokemon}`);
  };

  return (
    <div className={styles.container}>
      <main className={styles.main}>
        <Box my="10">
          <FormControl>
            <Select
              id="country"
              placeholder={
                selectedPokemon ? selectedPokemon : 'Select a pokemon'
              }
              onChange={handelSelect}
            >
              {pokemons.map((pokemon, index) => {
                return <option key={index}>{pokemon.name}</option>;
              })}
            </Select>
            <Button
              colorScheme="teal"
              size="md"
              ml="3"
              onClick={searchPokemon}
            >
              Search
            </Button>
          </FormControl>
        </Box>
      </main>
    </div>
  );
};

We have successfully populated some static data inside our dropdown to select some Pokemon. Let’s implement a page redirect to a dynamic route when we select a Pokemon name and click the search button.

Adding dynamic page

Creating a dynamic page inside of Next is simple, as it provides a file-system-based folder structure we can leverage to add our dynamic routes. Let’s create a detail page for our Pokemon.

const PokemonDetail: NextPage<{ info: PokemonDetailResults }> = ({ info }) => {
  return (
    <div>
      {/* map our data here */}
    </div>
  );
};

export const getServerSideProps: GetServerSideProps = async (context) => {
  const { id } = context.query;
  const name = id as string;
  const data = await fetch(`https://pokeapi.co/api/v2/pokemon/${name}`);
  const response: PokemonDetailResults = await data.json();

  return {
    props: {
      info: response,
    },
  };
};

We made use of getServerSideProps, the server-side rendering mechanism provided by Next, which pre-renders the page on each request using the data returned by getServerSideProps. This comes in handy when we want to fetch data that changes often and have the page updated to show the most current data. After receiving the data, we map over it to display it on the screen.

Until now, we have not really implemented a caching mechanism in our project. Each time the user visits the page, we hit the API endpoint and send them back the data they requested. Let’s move ahead and implement caching in our application.

Caching data

To implement caching, we first want to read our Redis database. As discussed, Redis stores its data as key-value pairs. We will check whether the key is stored in Redis or not and feed the client the respective data. To achieve this, we will create a function that reads Redis for the key the client is requesting.

export const fetchCache = async <T>(key: string, fetchData: () => Promise<T>) => {
  const cachedData = await getKey(key);
  if (cachedData) return cachedData;
  return setValue(key, fetchData);
};

When we know the client is requesting data they have not visited yet, we will provide them a copy of the data from the server and, behind the scenes, also make a copy inside our Redis database, so that we can serve the data fast through Redis on the next request.

We will write a function that takes in a key parameter; if the key exists in the database, it returns the parsed value to the client.

const getKey = async <T>(key: string): Promise<T | null> => {
  const result = await redisConnect.get(key);
  if (result) return JSON.parse(result);
  return null;
};

We also need a function that takes in a key and sets the new value alongside the key in our database, but only if we don’t already have that key stored in Redis.

const setValue = async <T>(key: string, fetchData: () => Promise<T>): Promise<T> => {
  const setValue = await fetchData();
  await redisConnect.set(key, JSON.stringify(setValue));
  return setValue;
};

Up to now, we have written everything we need to implement caching. All we need to do is invoke the function in our dynamic pages. Inside of our [id].tsx, we will make a minor tweak so that we invoke an API call only if we don’t have the requested key in Redis.

For this to happen, we will need to pass a function as an argument to our fetchCache function.

export const getServerSideProps: GetServerSideProps = async (context) => {
  const { id } = context.query;
  const name = id as string;

  const fetchData = async () => {
    const data = await fetch(`https://pokeapi.co/api/v2/pokemon/${name}`);
    const response: PokemonDetailResults = await data.json();
    return response;
  };

  const cachedData = await fetchCache(name, fetchData);

  return {
    props: {
      info: cachedData,
    },
  };
};

We made some tweaks to the code we wrote before. We imported and used the fetchCache function inside of the dynamic page. This function takes in a function as an argument and performs the key check accordingly.

Adding expiry

The expiration policy employed by a cache is another factor that helps determine how long a cached item is retained. The expiration policy is usually assigned to the object when it is added to the cache. This can also be customized according to the type of object that’s being cached. A common strategy involves assigning an absolute time of expiration to each object when it is added to the cache. Once that time passes, the item is removed from the cache accordingly.

Let’s also use the cache expiration feature of Redis in our application. To implement this, we just need to add a parameter to our fetchCache function.

const cachedData = await fetchCache(name, fetchData, 60 * 60 * 24);

return {
  props: {
    info: cachedData,
  },
};

export const fetchCache = async (key: string, fetchData: () => Promise<unknown>, expiresIn: number) => {
  const cachedData = await getKey(key);
  if (cachedData) return cachedData;
  return setValue(key, fetchData, expiresIn);
};

const setValue = async <T>(key: string, fetchData: () => Promise<T>, expiresIn: number): Promise<T> => {
  const setValue = await fetchData();
  await redisConnect.set(key, JSON.stringify(setValue), "EX", expiresIn);
  return setValue;
};

For each key that is stored in our Redis database, we have added an expiry time of one day. When the set amount of time elapses, Redis automatically removes the object from the cache so that it can be refreshed by calling the API again. This really helps when we want to feed the client with fresh, updated data every time they call the API.
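If you want to confirm that the expiration was applied, a quick check using the same redisConnect instance might look like the snippet below; the import path is an assumption (wherever redisConnect is exported from in your project).

import { redisConnect } from './redis'; // path assumed

const checkExpiry = async (key: string): Promise<void> => {
  // ioredis exposes the Redis TTL command, which reports seconds until expiry (-1 means no expiry)
  const secondsLeft = await redisConnect.ttl(key);
  console.log(`Key "${key}" expires in ${secondsLeft} seconds`);
};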

Performance testing

All of this effort was in service of our app’s performance and optimization, so let’s take a look at our application’s performance.

This might not be a meaningful performance test for a small application, but an app serving thousands of API calls with a big dataset can see a big advantage.

I will make use of the perf_hooks module to measure the time our Next lambda takes to complete an invocation. This is not provided by Next; instead, it’s imported from Node. With these APIs, you can measure the time it takes individual dependencies to load, how long your app takes to initially start, and even how long individual web service API calls take. This allows you to make more informed decisions on the efficiency of specific code blocks or even algorithms.

import { performance } from "perf_hooks";

// Both helpers simply return a high-resolution timestamp in milliseconds
const startPerfTimer = (): number => {
  return performance.now();
};

const endPerfTimer = (): number => {
  return performance.now();
};

const calculatePerformance = (startTime: number, endTime: number): void => {
  console.log(`Response took ${endTime - startTime} milliseconds`);
};

Creating a function for a single line of code may be overkill, but it means we can reuse these helpers anywhere in the application. We will add these calls around our data fetching and log the latency in milliseconds (ms), since even small delays add up in the app’s overall performance.
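As a minimal sketch (using the helpers above and the fetchCache call from the dynamic page), the timing could wrap the cached fetch inside getServerSideProps like this:

// Measure how long the cached (or uncached) fetch takes
const startTime = startPerfTimer();
const cachedData = await fetchCache(name, fetchData, 60 * 60 * 24);
const endTime = endPerfTimer();
calculatePerformance(startTime, endTime); // prints something like "Response took X milliseconds"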

Comparing the logged timings for a cache miss and a cache hit shows the response time improving by milliseconds. That is a small gain in the small application we have built, but it can be a huge time and performance boost when working with large datasets.

Conclusion

Data-heavy applications need caching to improve response times and to reduce data volume and bandwidth costs. With the help of Redis, we can cut down on expensive database operations, third-party API calls, and server-to-server requests by keeping a copy of previous responses in our Redis instance.

In some cases we may need to delegate caching to other applications, microservices, or any key-value storage system that lets us store data and retrieve it when we need it. We chose Redis because it is open source and very popular in the industry. Redis also offers data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, and many more.

I highly recommend visiting the Redis documentation to gain an in-depth understanding of the other features it provides out of the box. Now we can go forth and use Redis to cache frequently queried data in our applications and gain a considerable performance boost.

Please find the code repository here.

Happy coding!

The post Caching NextJS Apps with Serverless Redis using Upstash appeared first on Flatlogic Blog.

What is MySQL

MySQL: What is it?

MySQL is an open-source relational database management system that lets you manage relational databases using Structured Query Language (SQL) queries. It runs on multiple platforms such as Windows, Linux (including Ubuntu), and macOS. Development began in 1994 at the Swedish company MySQL AB, which was acquired in 2008 by the American tech company Sun Microsystems. In 2010 Oracle acquired Sun Microsystems, and MySQL has effectively belonged to Oracle ever since.

A database is a structured set of data, anything from a straightforward shopping list to a store of massive quantities of information across a business network. A relational database is a digital repository that collects data and organizes it according to the relational model, where tables consist of rows and columns and the relationships between data items follow a strict logical structure. MySQL is the software toolkit we use to create, manage, and query such a database.

Why use MySQL?

MySQL is one of the most popular RDBMSs; let’s look at why that is:

Easy to use. The only thing you need before working with MySQL is a basic knowledge of SQL (Structured Query Language).

Open-source. Virtually anyone can install, modify, and use MySQL free of charge. The source code is released under the GPL, that is, the GNU General Public License, which specifies what you can and cannot do with the software.

High performance. MySQL ranks among the fastest relational database systems in a multitude of standard benchmarks.

Scalability. MySQL supports multi-threading and handles both small and large quantities of data, and it can be scaled out across clusters of machines.

Security. MySQL provides a strong data security layer that protects confidential data from attackers. Access is also secure thanks to a flexible password system and host-based verification before a connection to the database is allowed.

Multiple data types. MySQL supports many data types, including INT, FLOAT, DOUBLE, CHAR, VARCHAR, TEXT, BLOB, DATE, TIME, DATETIME, TIMESTAMP, and more.

Community. The community is very large, so if you get stuck you can easily find help.

Client-server architecture. MySQL uses a client-server architecture and is accessible over the network: each client machine communicates with the server, which handles requests and returns the results. Clients can query remote servers over the Internet; the only requirement is that the server is running and reachable (see the sketch below).
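To make the client-server model concrete, here is a minimal sketch of connecting to a MySQL server from Node.js using the mysql2 package; the host, credentials, and table are placeholder assumptions for illustration only:

import mysql from "mysql2/promise";

// Connect to a (hypothetical) MySQL server and run a simple query
const connection = await mysql.createConnection({
  host: "localhost",
  user: "app_user",          // placeholder credentials
  password: "app_password",
  database: "shop",
});

const [rows] = await connection.query("SELECT id, name FROM customers LIMIT 10");
console.log(rows);

await connection.end();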

Who uses MySQL

MySQL is one of the most popular and useful database systems; some of the largest and best-known companies use it, such as:

Uber
Twitter
Slack
Airbnb
Pinterest
Amazon

Using the Flatlogic Platform you can also generate an application with a MySQL database system.

How to create your app with Flatlogic Platform

Step 1. Choosing the Tech Stack

In this step, you’re setting the name of your application and choosing the stack: Frontend, Backend, and Database.

Step 2. Choosing the Starter Template

In this step, you’re choosing the design of the web app.

Step 3. Schema Editor

In this step you need to know which kind of application you want to build, for example a CRM or an e-commerce app, because this is where you design the database schema, i.e. the tables and the relationships between them.

If you are not familiar with database design and find it difficult to understand what the tables should look like, we have prepared several ready-made example schemas of real-world apps that you can modify and build upon:

E-commerce app;
Time tracking app;
Books store;
Chat (messaging) app;
Blog.

As with any relational database, table relationships in MySQL come in two forms in the schema editor: relation_one and relation_many. You can enforce them by defining the right foreign key constraints on the columns; a small sketch of the difference follows below.

Relation (one) – a one-sided relation that stores a single related entity, for example, Employee: [{ name: 'John' }].
Relation (many) – a two-sided relation that can store any number of related entities, for example, Employee: [{ name: 'John' }, { name: 'Joe' }].
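To illustrate the difference, here is a hypothetical TypeScript sketch of how the two relation types might appear in application code; the Employee, Department, and Skill shapes are made-up examples, not the models the platform actually generates:

interface Department { id: number; name: string; }
interface Skill { id: number; name: string; }

interface Employee {
  id: number;
  name: string;
  department: Department; // relation_one: each employee references a single department
  skills: Skill[];        // relation_many: an employee can be linked to any number of skills
}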

Afterwards, you can deploy your application and in a few minutes, you will get a fully functional CMS application with a MySQL database system.

The post What is MySQL appeared first on Flatlogic Blog.

What is PostgreSQL?

Introduction

Postgres (or PostgreSQL) is a powerful open-source relational database that supports both SQL (relational) and JSON (non-relational) querying. It was created by researchers at the University of California, Berkeley, and is a very stable object-relational database management system. The PostgreSQL community has been developing it for more than 20 years, which contributes to its high stability, consistency, and correctness.

PostgreSQL grew out of the earlier Ingres project at Berkeley; the successor project was named POSTGRES ("post-Ingres"). As the creators expanded its functionality and added SQL support, it was renamed Postgres95 and finally PostgreSQL.

Postgres became a first choice for corporations performing complex, high-volume data operations because of its powerful core technology, most notably MVCC (Multi-Version Concurrency Control), which lets multiple readers and writers work on the system simultaneously. Postgres handles many concurrent workloads effectively, which is why business giants like Yahoo!, Apple, and Meta, as well as major telecommunication companies and financial and government institutions, keep using PostgreSQL.

Postgres has client libraries for multiple programming languages, including Ruby, Python, .NET, C/C++, Go, and Java, as well as interfaces such as ODBC.

Why use Postgres

Atomicity, Consistency, Isolation, and Durability (ACID) support. Postgres is fully ACID compliant: it verifies and maintains data integrity even in the face of errors or network failures. This ACID compliance makes it a solid option for corporate and e-commerce applications, and for any application that requires resiliency.

MVCC. Multi-Version Concurrency Control is a core feature of Postgres that allows users to read and write data simultaneously without blocking each other. Other SQL databases can support MVCC, but often only with additional technology or configuration.

Queries. Postgres gives you room to be creative with custom queries. If your model is complex, you can extend the database with custom functions, which lets you easily query the data in ways that fit your application's model.

Community support. Postgres has pretty strong support and extensive documentation. If you have any questions or problems, you can always reach out to the Postgres community.

Extensive support for data types. Postgres is object-relational and offers read and write capabilities for all kinds of data structures. Custom, structured, and non-relational data types are supported, including JSON/JSONB, primitive types, and geometric types, and PostgreSQL scales well as data grows (see the sketch after this list).

Security. Postgres offers a variety of security mechanisms, including user authentication and secure TCP/IP connections, which protect data without sacrificing performance.
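As a small sketch of the JSON support mentioned above, here is how querying a JSONB column might look from Node.js with the pg package; the connection string, table, and column are placeholder assumptions for illustration only:

import { Client } from "pg";

// Connect to a (hypothetical) PostgreSQL database and filter rows by a JSONB field
const client = new Client({ connectionString: "postgres://app_user:app_password@localhost:5432/shop" });
await client.connect();

const result = await client.query(
  "SELECT id, payload->>'status' AS status FROM orders WHERE payload @> $1::jsonb",
  [JSON.stringify({ status: "shipped" })]
);
console.log(result.rows);

await client.end();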

Who uses Postgres

Postgres is widely used in a variety of industries like the financial sector, Big Data for R&D, web applications, and logistics.

Because the database system is so capable, many of the largest and best-known companies choose it, such as:

Apple
IMDb
Instagram
Reddit
Skype
Spotify
Twitch

Using the Flatlogic Platform you can also generate an application with a PostgreSQL database.

How to create your app with Flatlogic Platform

Step 1. Choosing the Tech Stack

In this step, you’re setting the name of your application and choosing the stack: Frontend, Backend, and Database.

Step 2. Choosing the Starter Template

In this step, you’re choosing the design of the web app.

Step 3. Schema Editor

In this step you need to know which kind of application you want to build, for example a CRM or an e-commerce app, because this is where you design the database schema, i.e. the tables and the relationships between them.

If you are not familiar with database design and find it difficult to understand what the tables should look like, we have prepared several ready-made example schemas of real-world apps that you can modify and build upon:

E-commerce app;
Time tracking app;
Books store;
Chat (messaging) app;
Blog.

As with any relational database, table relationships in PostgreSQL come in two forms in the schema editor: relation_one and relation_many. You can enforce them by defining the right foreign key constraints on the columns.

Relation (one) – a one-sided relation that stores a single related entity, for example, Employee: [{ name: 'John' }].
Relation (many) – a two-sided relation that can store any number of related entities, for example, Employee: [{ name: 'John' }, { name: 'Joe' }].

Afterwards, you can deploy your application and in a few minutes, you will get a fully functional CMS application with PostgreSQL.

The post What is PostgreSQL? appeared first on Flatlogic Blog.

Auto Updating Created, Updated and Deleted Timestamps In Entity Framework

In any database schema, it’s extremely common to have the fields “DateCreated, DateUpdated and DateDeleted” on almost every entity. At the very least, they provide helpful debugging information, but further, the DateDeleted affords a way to “soft delete” entities without actually deleting them.

That being said, over the years I’ve seen some pretty interesting ways in which these have been implemented. The worst, in my view, is writing C# code that explicitly sets the timestamp on every create or update. While simple, one clumsy developer later and you aren’t recording any timestamps at all; it relies entirely on remembering to update the timestamp. Other times, I’ve seen database triggers used, which… works… but then you have another problem in that you’re using database triggers!

There’s a fairly simple method I’ve been using for years and it involves utilizing the ability to override the save behaviour of Entity Framework.

Auditable Base Model

The first thing we want to do is define a “base model” that all entities can inherit from. In my case, I use a base class called “Auditable” that looks like so:

public abstract class Auditable
{
    public DateTimeOffset DateCreated { get; set; }
    public DateTimeOffset? DateUpdated { get; set; }
    public DateTimeOffset? DateDeleted { get; set; }
}

And a couple of notes here :

It’s an abstract class because it should only ever be inherited from
We use DateTimeOffset because we will then store the timezone along with the timestamp. This is a personal preference but it just removes all ambiguity around “Is this UTC?”
DateCreated is not null (Since anything created will have a timestamp), but the other two dates are! Note that if this is an existing database, you will need to allow nullables (And work out a migration strategy) as your existing records will not have a DateCreated.

To use the class, we just need to inherit from it with any Entity Framework model. For example, let’s say we have a Customer object :

public class Customer : Auditable
{
    public int Id { get; set; }
    public string Name { get; set; }
}

So all the base class does is save us from copying and pasting the same three date fields everywhere, while enforcing their presence. Nice and simple!

Overriding Context SaveChanges

The next thing is maybe controversial, and I know there are a few different ways to do this. Essentially we are looking for a way to say to Entity Framework “Hey, if you insert a new record, can you set the DateCreated please?”. There are things like Entity Framework hooks and a few NuGet packages that do similar things, but I’ve found the absolute easiest way is to simply override the save method of your database context.

The full code looks something like:

public class MyContext : DbContext
{
    public override Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
    {
        var insertedEntries = this.ChangeTracker.Entries()
                                  .Where(x => x.State == EntityState.Added)
                                  .Select(x => x.Entity);

        foreach (var insertedEntry in insertedEntries)
        {
            var auditableEntity = insertedEntry as Auditable;
            // If the inserted object is an Auditable, stamp the creation date.
            if (auditableEntity != null)
            {
                auditableEntity.DateCreated = DateTimeOffset.UtcNow;
            }
        }

        var modifiedEntries = this.ChangeTracker.Entries()
                                  .Where(x => x.State == EntityState.Modified)
                                  .Select(x => x.Entity);

        foreach (var modifiedEntry in modifiedEntries)
        {
            // If the modified object is an Auditable, stamp the update date.
            var auditableEntity = modifiedEntry as Auditable;
            if (auditableEntity != null)
            {
                auditableEntity.DateUpdated = DateTimeOffset.UtcNow;
            }
        }

        return base.SaveChangesAsync(cancellationToken);
    }
}

Now, your context may have additional code, but this is the bare minimum to get things working. What this does is:

Gets all entities that are being inserted, checks if they inherit from Auditable, and if so sets the DateCreated.
Gets all entities that are being updated, checks if they inherit from Auditable, and if so sets the DateUpdated.
Finally, calls the base SaveChangesAsync method that actually does the saving.

Using this, we are essentially intercepting when Entity Framework would normally save all changes, and updating all timestamps at once with whatever is in the batch.

Handling Soft Deletes

Deletes are a special case for one big reason. If we actually try and call delete on an entity in Entity Framework, it gets added to the ChangeTracker as… well… a delete. And to unwind this at the point of saving and change it to an update would be complex.

What I tend to do instead is, in my BaseRepository (because… you’re using one of those, right?), check if an entity is Auditable and, if so, do an update instead. The relevant code from my BaseRepository looks like so:

public async Task<T> Delete(T entity)
{
    // If the type we are trying to delete is auditable, then we don't actually delete it,
    // but instead mark it as updated with a deletion date.
    if (typeof(Auditable).IsAssignableFrom(typeof(T)))
    {
        (entity as Auditable).DateDeleted = DateTimeOffset.UtcNow;
        _dbSet.Attach(entity);
        _context.Entry(entity).State = EntityState.Modified;
    }
    else
    {
        _dbSet.Remove(entity);
    }

    return entity;
}

Now your mileage may vary, especially if you are not using the Repository Pattern (Which you should be!). But in short, you must handle soft deletes as updates *instead* of simply calling Remove on the DbSet.

Taking This Further

What’s not shown here is that we can use this same methodology to update many other “automated” fields. We use this same system to track the last user to Create, Update and Delete entities. Once this is up and running, it’s often just a couple more lines to instantly gain traceability across every entity in your database!

The post Auto Updating Created, Updated and Deleted Timestamps In Entity Framework appeared first on .NET Core Tutorials.