Building Automation for Fraud Detection Using OpenSearch and Terraform

Organizations that process online payments must continuously monitor and guard against fraudulent activity. Transactional fraud usually presents itself as discrete data points, which makes it challenging to identify multiple actors involved in the same group of transactions; even a single actor operating over a period of time can be hard to detect. Visibility is key to preventing fraud incidents and to giving data, security, and operations engineers meaningful insight into the activity within your environment.

Understanding the connections between individual data points can reduce the time it takes customers to detect and prevent fraud. You can use a graph database to store transaction information along with the relationships between individual data points, and analyzing those relationships can uncover patterns that are difficult to identify with relational tables. Fraud graphs let you find attributes common to multiple transactions, such as phone numbers, locations, and origin and destination accounts. Combining fraud graphs with full-text search adds further value, because it simplifies analysis and integration with existing applications.
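For example, using the Gremlin query language that Neptune supports, an analyst could look for devices shared by multiple transactions. The following is a minimal sketch using the gremlinpython client; the endpoint, the device vertex label, and the used_device edge label are assumptions for illustration, not the exact schema used later in this post.

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Hypothetical Neptune endpoint; 'device' and 'used_device' are assumed names.
conn = DriverRemoteConnection("wss://my-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Devices referenced by more than one transaction are candidate fraud links.
suspicious_devices = (
    g.V().hasLabel("device")
     .where(__.in_("used_device").count().is_(P.gt(1)))
     .valueMap()
     .toList()
)
print(suspicious_devices)
conn.close()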

In our solution, financial analysts can upload graph data, which is automatically ingested into the Amazon Neptune graph database and replicated into Amazon OpenSearch Service for analysis. Data ingestion is automated through Amazon Simple Storage Service (Amazon S3) and Amazon Simple Queue Service (Amazon SQS) integration, and data replication is handled by AWS Lambda functions orchestrated with AWS Step Functions. The design uses open-source tools and AWS managed services, and is available in the https://github.com/aws-samples/neptune-fraud-detection-with-opensearch GitHub repository under the MIT-0 license. You will use Terraform and Docker to deploy the architecture, and you will be able to send search requests to the system to explore the dataset.

Solution overview

This solution takes advantage of native integration between AWS services for scalability and performance, as well as the Neptune-to-OpenSearch Service replication pattern described in the official Neptune documentation.

Figure 1: Architecture diagram illustrating the infrastructure and workflow defined in the Terraform templates.

The process for this solution consists of the following steps, shown in the preceding architecture diagram:

A financial analyst uploads graph data files to an Amazon S3 bucket.

Note: The data files are in the Gremlin load data format (CSV) and can include vertex files and edge files, as illustrated below.
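As an illustration, a minimal vertex file and edge file in this format might look like the following. The ~id, ~label, ~from, and ~to columns are the format's required system columns; the file names, the t100 transaction ID, and the used_device label are hypothetical, while the d3 device values match the sample record shown later in this post.

nodes/devices.csv (vertex file):

~id,~label,deviceType:String,deviceInfo:String
d3,device,desktop,Windows

edges/used_device.csv (edge file):

~id,~from,~to,~label
e1,t100,d3,used_device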

The upload triggers a PUT object event notification whose destination is an Amazon SQS queue.
The SQS queue is configured as an AWS Lambda event source, which invokes a Lambda function.
This Lambda function sends an HTTP request to an Amazon Neptune database asking it to load the data stored in the S3 bucket (see the sketch after this list).
The Neptune database reads the data from the S3 endpoint defined in the Lambda request and loads it into the graph database.
An Amazon EventBridge rule is scheduled to run every 5 minutes. This rule targets an AWS Step Functions state machine and creates a new execution.
The Neptune Poller step function (state machine) replicates the data in the Neptune database to an OpenSearch Service cluster.
Note: After the initial data upload, the Neptune Poller step function continually syncs new data using Neptune Streams (also sketched after this list).

Users can access the data replicated from the Neptune database through Amazon OpenSearch Service.
Note: A Lambda function is invoked to send a search request to an OpenSearch Service endpoint and return the results.
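To make the Lambda-to-Neptune load request concrete, the following is a minimal sketch of a handler that calls the Neptune bulk loader HTTP API for each uploaded object. The environment variable names are assumptions, and the repository's actual function may differ.

import json
import os
import urllib.request

NEPTUNE_ENDPOINT = os.environ["NEPTUNE_ENDPOINT"]  # cluster DNS name (assumed variable)
LOADER_ROLE_ARN = os.environ["LOADER_ROLE_ARN"]    # IAM role Neptune assumes to read from S3
REGION = os.environ.get("AWS_REGION", "us-west-2")

def handler(event, context):
    # Each SQS record wraps an S3 event notification for an uploaded object.
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            # Ask the Neptune bulk loader to ingest the object directly from S3.
            payload = json.dumps({
                "source": f"s3://{bucket}/{key}",
                "format": "csv",  # Gremlin load data format
                "iamRoleArn": LOADER_ROLE_ARN,
                "region": REGION,
                "failOnError": "FALSE",
            }).encode("utf-8")
            req = urllib.request.Request(
                f"https://{NEPTUNE_ENDPOINT}:8182/loader",
                data=payload,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            with urllib.request.urlopen(req) as resp:
                # The response contains a loadId you can use to check load status.
                print(resp.read().decode("utf-8"))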
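Similarly, the Neptune Streams feed that the poller consumes is a plain HTTP endpoint on the cluster. The following is a minimal sketch of reading the property graph change stream directly; the endpoint name is hypothetical, and the deployed solution uses the Step Functions poller rather than code like this.

import json
import urllib.request

NEPTUNE_ENDPOINT = "my-cluster.cluster-xxxx.us-west-2.neptune.amazonaws.com"  # hypothetical

def read_stream(commit_num=1, op_num=1, limit=100):
    # AFTER_SEQUENCE_NUMBER resumes reading just past the last processed record.
    url = (f"https://{NEPTUNE_ENDPOINT}:8182/propertygraph/stream"
           f"?iteratorType=AFTER_SEQUENCE_NUMBER&commitNum={commit_num}"
           f"&opNum={op_num}&limit={limit}")
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

for record in read_stream().get("records", []):
    print(record["op"], record["data"])  # e.g. ADD {'id': 'd3', 'key': 'deviceType', ...}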

Prerequisites

To implement this solution, you must have the following prerequisites:

An AWS account with local credentials configured. For more information, check the documentation on configuration and credential file settings.
The latest version of the AWS Command Line Interface (AWS CLI).
An IAM user with Git credentials.
A Git client to clone the source code provided.
A Bash shell.
Docker installed on your local machine.
Terraform installed on your local machine.

Deploying the Terraform templates

The solution is available in this GitHub repository with the following structure:

data: Contains a sample dataset to be used with the solution for demonstration purposes. Information on fictional transactions, identities, and devices is represented in files within the nodes/ folder, and relationships between them are represented in files within the edges/ folder.
terraform: This folder contains the Terraform modules to deploy the solution.
documents: This folder contains the architecture diagram image file of the solution.

Create a local directory called NeptuneOpenSearchDemo and clone the source code repository:

mkdir -p $HOME/NeptuneOpenSearchDemo

cd $HOME/NeptuneOpenSearchDemo

git clone https://github.com/aws-samples/neptune-fraud-detection-with-opensearch.git

Change directory into the Terraform directory:

cd $HOME/NeptuneOpenSearchDemo/neptune-fraud-detection-with-opensearch/terraform

Make sure that the Docker daemon is running:

docker info

If the previous command outputs an error saying it cannot connect to the Docker daemon, start Docker and run the command again.

Initialize the Terraform folder to install required providers:

terraform init

The solution is deployed in us-west-2 by default. You can change this by modifying the "region" variable in the variables.tf file (or by passing -var="region=<your-region>" to terraform apply).

Deploy the AWS services:

terraform apply -auto-approve

Note: Deployment will take around 30 minutes due to the time necessary to provision the Neptune and OpenSearch Service clusters.

Retrieve the name of the S3 bucket to upload data to:

aws s3 ls | grep "neptunestream-loader.*d$"

Upload the node and edge data to the S3 bucket obtained in the previous step (substitute your own bucket name):

aws s3 cp $HOME/NeptuneOpenSearchDemo/neptune-fraud-detection-with-opensearch/data s3://neptunestream-loader-us-west-2-123456789012 --recursive

Note: This is a sample dataset, created from the IEEE-CIS Fraud Detection dataset for demonstration purposes only.

Test the solution

After the solution is deployed and the dataset is uploaded to S3, the dataset can be retrieved and explored through a Lambda function that sends a search request to the OpenSearch Service cluster.

Confirm that the Lambda function that sends a request to OpenSearch was deployed correctly:

aws lambda get-function --function-name NeptuneStreamOpenSearchRequestLambda --query 'Configuration.[FunctionName, State]'

Invoke the Lambda function to see all records present in OpenSearch that are added from Neptune:

aws lambda invoke --function-name NeptuneStreamOpenSearchRequestLambda response.json

The results of the Lambda invocation are stored in the response.json file. This file contains the total number of records in the cluster and all records ingested up to that point. The solution stores records in the amazon_neptune index. An example of a node with device information looks like this:

{
  "_index": "amazon_neptune",
  "_type": "_doc",
  "_id": "1fb6d4d2936d6f590dc615142a61059e",
  "_score": 1.0,
  "_source": {
    "entity_id": "d3",
    "document_type": "vertex",
    "entity_type": [
      "vertex"
    ],
    "predicates": {
      "deviceType": [
        {
          "value": "desktop"
        }
      ],
      "deviceInfo": [
        {
          "value": "Windows"
        }
      ]
    }
  }
}
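For reference, the kind of search the Lambda function performs can be reproduced with a simple match_all query against the amazon_neptune index. The following is a minimal sketch; the domain endpoint is hypothetical, and the deployed function additionally signs its requests and runs inside the VPC where the cluster is reachable.

import json
import urllib.request

OPENSEARCH_ENDPOINT = "vpc-neptunestream-xxxx.us-west-2.es.amazonaws.com"  # hypothetical

# Return up to 10 documents from the amazon_neptune index.
query = json.dumps({"query": {"match_all": {}}, "size": 10}).encode("utf-8")
req = urllib.request.Request(
    f"https://{OPENSEARCH_ENDPOINT}/amazon_neptune/_search",
    data=query,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    hits = json.loads(resp.read())["hits"]
    print(hits["total"]["value"], "documents in the index")
    for hit in hits["hits"]:
        print(hit["_source"]["entity_id"], hit["_source"]["predicates"])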

Cleaning up

To avoid incurring future charges, clean up the resources deployed in the solution:

terraform destroy -auto-approve

The command will output information on resources being destroyed.

Destroy complete! Resources: 101 destroyed.

Conclusion

Fraud graphs complement other techniques organizations can use to detect and prevent fraud. The solution presented in this post reduces the time financial analysts need to access transactional data by automating data ingestion and replication. It also improves performance for systems with growing data volumes compared to executing a large number of individual insert statements or other API calls.
