Building Automation for Fraud Detection Using OpenSearch and Terraform

Organizations that interface with online payments are continuously monitoring and guarding against fraudulent activity. Transactional fraud usually presents itself as discrete data points, making it challenging to identify multiple actors involved in the same group of transactions. Even a single actor operating over a period of time can be hard to detect. Visibility is key to prevent fraud incidents from occurring and to give meaningful knowledge of the activities within your environment to data, security, and operations engineers.

Understanding the connections between individual data points can reduce the time for customers to detect and prevent fraud. You can use a graph database to store transaction information along with the relationships between individual data points. Analyzing those relationships through a graph database can uncover patterns difficult to identify with relational tables. Fraud graphs enable customers to find common patterns between transactions, such as phone numbers, locations, and origin and destination accounts. Additionally, combining fraud graphs with full text search provides additional benefits as it can simplify analysis and integration with existing applications.

In our solution, financial analysts can upload graph data, which gets automatically ingested into the Amazon Neptune graph database service and replicated into Amazon OpenSearch Service for analysis. Data ingestion is automated with Amazon Simple Storage Service (Amazon S3) and Amazon Simple Queue Service (Amazon SQS) integration. We do data replication through AWS Lambda functions and AWS Step Functions for orchestration. The design is using open source tools and AWS Managed Services to build resources and is available in this https://github.com/aws-samples/neptune-fraud-detection-with-opensearch GitHub repository under an MIT-0 license. You will use Terraform and Docker to deploy the architecture, and will be able to send search requests to the system to explore the dataset.

Solution overview

This solution takes advantage of native integration between AWS services for scalability and performance, as well as the Neptune-to-OpenSearch Service replication pattern described in Neptune’s official documentation.

Figure 1 An architectural diagram that illustrates the infrastructure state and workflow as defined in the Terraform templates.

The process for this solution consists of the following steps, also shown in the architecture diagram here:

Financial analyst uploads graph data files to an Amazon S3 bucket.

Note: The data files are in a Gremlin load data format (CSV) and can include vertex files and edge files.

The action of the upload invokes a PUT object event notification with a destination set to an Amazon SQS queue.
The SQS queue is configured as an AWS Lambda event source, which invokes a Lambda function.
This Lambda function sends an HTTP request to an Amazon Neptune database to load data stored in an S3 bucket.
The Neptune database reads data from the S3 endpoint defined in the Lambda request and loads the data into the graph database.
An Amazon EventBridge rule is scheduled to run every 5 minutes. This rule targets an AWS Step Functions state machine to create a new execution.
The Neptune Poller step function (state machine) replicates the data in the Neptune database to an OpenSearch Service cluster.
Note: The Neptune Poller step function is responsible for continually syncing new data after the initial data upload using Neptune Streams.

User can access the replicated data from the Neptune database with Amazon OpenSearch Service.
Note: A Lambda function is invoked to send a search request or query to an OpenSearch Service endpoint to get results.

Prerequisites

To implement this solution, you must have the following prerequisites:

An AWS account with local credentials is configured. For more information, check the documentation on configuration and credential file settings.
The latest version of the AWS Command Line Interface (AWS CLI).
An IAM user with Git credentials.
A Git client to clone the source code provided.
A Bash shell.

Docker installed on your localhost.

Terraform installed on your localhost.

Deploying the Terraform templates

The solution is available in this GitHub repository with the following structure:

data: Contains a sample dataset to be used with the solution for demonstration purposes. Information on fictional transactions, identities and devices is represented in files within the nodes/ folder, and relationships between them are represented in files in the edges/ folder.
terraform: This folder contains the Terraform modules to deploy the solution.
documents: This folder contains the architecture diagram image file of the solution.

Create a local directory called NeptuneOpenSearchDemo and clone the source code repository:

mkdir -p $HOME/NeptuneOpenSearchDemo

cd $HOME/NeptuneOpenSearchDemo

git clone https://github.com/aws-samples/neptune-fraud-detection-with-opensearch.git

Change directory into the Terraform directory:

cd $HOME/NeptuneOpenSearchDemo neptune-fraud-detection-with-opensearch /terraform

Make sure that the Docker daemon is running:

docker info

If the previous command outputs an error that is unable to connect to the Docker daemon, start Docker and run the command again.

Initialize the Terraform folder to install required providers:

terraform init

The solution is deployed on us-west-2 by default. The user can change this behavior by modifying the variable “region” in variables.tf file.

Deploy the AWS services:

terraform apply -auto-approve

Note: Deployment will take around 30 minutes due to the time necessary to provision the Neptune and OpenSearch Service clusters.

To retrieve the name of the S3 bucket to upload data to:

aws s3 ls | grep “neptunestream-loader.*d$”

Upload node data to the S3 bucket obtained in the previous step:

aws s3 cp $HOME/NeptuneOpenSearchDemo/neptune-fraud-detection-with-opensearch /data s3:// neptunestream-loader-us-west-2-123456789012 –recursive

Note: This is a sample dataset for demonstration purposes only created from the IEEE-CIS Fraud Detection dataset.

Test the solution

After the solution is deployed and the dataset is uploaded to S3, the dataset can be retrieved and explored through a Lambda function that sends a search request to the OpenSearch Service cluster.

Confirm the Lambda function that sends a request to OpenSearch was deployed correctly:

aws lambda get-function –function-name NeptuneStreamOpenSearchRequestLambda –-query ‘Configuration.[FunctionName, State]’

Invoke the Lambda function to see all records present in OpenSearch that are added from Neptune:

aws lambda invoke –function-name NeptuneStreamOpenSearchRequestLambda response.json

The results of the Lambda invocation are stored in the response.json file. This file contains the total number of records in the cluster and all records ingested up to that point. The solution stores records in the index amazon_neptune. An example of a node with device information looks like this:

{
“_index”: “amazon_neptune”,
“_type”: “_doc”,
“_id”: “1fb6d4d2936d6f590dc615142a61059e”,
“_score”: 1.0,
“_source”: {
“entity_id”: “d3”,
“document_type”: “vertex”,
“entity_type”: [
“vertex”
],
“predicates”: {
“deviceType”: [
{
“value”: “desktop”
}
],
“deviceInfo”: [
{
“value”: “Windows”
}
]
}
}
}

Cleaning up

To avoid incurring future charges, clean up the resources deployed in the solution:

terraform destroy –auto-approve

The command will output information on resources being destroyed.

Destroy complete! Resources: 101 destroyed.

Conclusion

Fraud graphs are complementary to other techniques organizations can use to detect and prevent fraud. The solution presented in this blog post reduces the time financial analysts would take to access transactional data by automating data ingestion and replication. It also improves performance for systems with growing volumes of data when compared to executing a large number of insert statements or other API calls.

Flatlogic Admin Templates banner

The AWS Modern Applications and Open Source Zone: Learn, Play, and Relax at AWS re:Invent 2022

AWS re:Invent is filled with fantastic opportunities, but I wanted to tell you about a space that lets you dive deep with some fantastic open source projects and contributors: the AWS Modern Applications and Open Source Zone! Located in the east alcove on the third floor of the Venetian Conference Center, this space exists so that re:Invent attendees can be introduced to some of the amazing projects that power and enhance the AWS solutions you know and use. We’ve divided the space up into three areas: Demos, Experts, and Fun.

Demos: Learn and be curious

We have two dedicated demo stations in the Zone and a deep list of projects that we are excited to show you from Amazonians, AWS Heroes, and AWS Community Builders. Please keep in mind this schedule may be subject to change, and we have some last minute surprises that we can’t share here, so be sure to drop by.

Monday, November 28, 2022

Kiosk #
9 AM – 11 AM
11 AM – 1 PM
1 PM – 3 PM
3 PM – 5 PM

1

Continuous Deployment
and GitOps delivery with Amazon EKS Blueprints and ArgoCD

Tsahi Duek, Dima Breydo

StackGres: An Advanced
PostgreSQL Platform on EKSAlvaro Hernandez

 TBD

Step Functions templates and
prebuilt Lambda Packages for
deploying scalable serverless applications in seconds

Rustem Feyzkhanov

2

Data on EKS

Vara Bonthu, Brian Hammons

 TBD
How to use Amazon Keyspaces
(for Apache Cassandra) and Apache Spark
to build applicationsMeet Bhagdev

Scale your applications beyond IPv4 limits

Sheetal Joshi

Tuesday, November 29, 2022

Kiosk #
9 AM – 11 AM
11 AM – 1 PM
1 PM – 3 PM
3 PM – 5 PM

1

Let’s build a self service developer portal with AWS Proton

Adam Keller

Doing serverless on AWS with Terraform (serverless.tf + terraform-aws-modules)

Anton Babenko

 Steampipe

Chris Farris, Bob Tordella

Using Lambda Powertools for better observability in IoT Applications

Alina Dima

2

Build and run containers on AWS with AWS Copilot

Sergey Generalov

Fargate Surprise
Fargate Surprise

Amplify Libraries Demo

Matt Auerbach

Wednesday, November 30, 2022

Kiosk #
9 AM – 11 AM
11 AM – 1 PM
1 PM – 3 PM
3 PM – 5 PM

1

Building Embedded Devices with FreeRTOS SMP and the Raspberry Pi Pico

Dan Gross

Quantum computing in the cloud with Amazon Braket

Michael Brett, Katharine Hyatt

Leapp

Andrea Cavagna

EKS multicluster management and applications delivery

Nicholas Thomson, Sourav Paul

2

Using SAM CLI and Terraform for local testing

Praneeta Prakash, Suresh Poopandi

How to use Terraform AWS and AWSCC provider in your project

Tyler Lynch, Drew Mullen

How to use Terraform AWS and AWSCC provider in your project

Glenn Chia, Welly Siau

Smart City Monitoring Using AWS IoT and Digital Twin

Syed Rehan

Thursday, December 1, 2022

Kiosk #
9 AM – 11 AM
11 AM – 1 PM
1 PM – 3 PM

1

Modern data exchange using AWS data streaming

Ali Alemi

Learn how to leverage your Amazon EKS cluster as a substrate for execution of distributed Ray programs for Machine Learning.

Apoorva Kulkarni

 TBD

2

Spreading apps, controlling traffic, and optimizing costs in Kubernetes

Lukonde Mwila

 TBD

Terraform IAM policy validator

Bohan Li

Experts

Pull up a chair, grab a drink and a snack, charge your devices, and have a conversation with some of our experts. We’ll have people visiting the zone all throughout re:Invent, with expertise in a variety of open source technologies and AWS services including (but not limited to):

Amazon Athena
Amazon DocumentDB (with MongoDB compatibility)
Amazon DynamoDB
Amazon Elastic Container Service (Amazon ECS)
Amazon Elastic Kubernetes Service (Amazon EKS)
Amazon Eventbridge
Amazon Keyspaces (for Apache Cassandra)
Amazon Kinesis
Amazon Linux
Amazon Managed Grafana
Amazon Managed Service for Prometheus
Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Amazon MQ
Amazon Redshift
Amazon Simple Notification Service (Amazon SNS)
Amazon Simple Queue Service (Amazon SQS)
Amazon Simple Storage Service (Amazon S3)
Apache Flink, Hadoop, Hudi, Iceberg, Kafka, and Spark
Automotive Grade Linux
AWS Amplify
AWS App Mesh
AWS App Runner
AWS CDK
AWS Copilot
AWS Distro for OpenTelemetry
AWS Fargate
AWS Glue
AWS IoT Greengrass
AWS Lambda
AWS Proton
AWS SDKs
AWS Serverless Application Model (AWS SAM)
AWS Step Functions
Bottlerocket
Cloudscape Design System
Embedded Linux
Flutter
FreeRTOS
Javascript
Karpenter
Lambda Powertools
OpenSearch
Red Hat OpenShift Service on AWS (ROSA)
Rust
Terraform

Fun

Want swag? We’ve got it, but it is protected by THE CLAW. That’s right, we brought back the claw machine, and this year, we might have some extra special items in there for you to catch. No spoilers, but we’ve heard there have been some Rustaceans sighted. You might want to bring an extra (empty) suitcase.

But we’re not done. By popular request, we also brought back Dance Dance Revolution. Warm up your dancing shoes or just cheer on the crowd. You never know who will be showing off their best moves.

Conclusion

The AWS Modern Applications and Open Source Zone is a must-visit destination for your re:Invent journey. With demos, experts, food, drinks, swag, games, and mystery surprises, how can you not stop by?

Flatlogic Admin Templates banner