This guidance outlines an approach for extracting, transforming, and loading (ETL) blockchain data into a column-oriented storage format so that it can be accessed and analyzed efficiently. The framework is an open-source architecture for cross-chain analytics over public blockchain data from Bitcoin and Ethereum, and the resulting datasets are available through AWS Open Data. The process normalizes data from the Bitcoin and Ethereum blockchains into tabular structures for blocks, transactions, and additional block-specific data.
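To make the normalization step concrete, the sketch below flattens one Bitcoin block into one row for a blocks table and one row per transaction. It assumes the input is shaped like the JSON that Bitcoin Core's `getblock` RPC returns at verbosity 2. The output column names (`block_hash`, `output_value_sat`, and so on) are hypothetical and are not the schema used by this guidance.

```python
from typing import Any

SATS_PER_BTC = 100_000_000


def normalize_btc_block(block: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Flatten a Bitcoin Core getblock result (verbosity 2) into table rows.

    Output column names are hypothetical, not the guidance's actual schema.
    """
    block_row = {
        "block_hash": block["hash"],
        "height": block["height"],
        "time": block["time"],  # Unix epoch seconds from the block header
        "tx_count": len(block["tx"]),
    }
    tx_rows = []
    for tx in block["tx"]:
        tx_rows.append({
            "txid": tx["txid"],
            "block_hash": block["hash"],
            "height": block["height"],
            "size": tx["size"],
            # Coinbase inputs carry a "coinbase" field instead of a previous output.
            "is_coinbase": any("coinbase" in vin for vin in tx["vin"]),
            "input_count": len(tx["vin"]),
            "output_count": len(tx["vout"]),
            # The RPC reports output values in BTC; store integer satoshis to avoid float drift.
            "output_value_sat": sum(round(v["value"] * SATS_PER_BTC) for v in tx["vout"]),
        })
    return block_row, tx_rows
```

Keeping blocks and transactions in separate row lists mirrors the separate tables described above, and the shared `block_hash` and `height` columns let the two be joined later.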
Architecture Overview
Step 1
Access Ethereum and Bitcoin data through Amazon Managed Blockchain for Ethereum, Erigon Ethereum nodes, and a self-hosted Bitcoin Core node. The node infrastructure also uses Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic File System (Amazon EFS), and Amazon DynamoDB.
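Both kinds of node are queried over JSON-RPC. The sketch below only builds request bodies and makes no network call; it assumes you POST them yourself to an endpoint you control, and authentication is not shown. `getblockhash`, `getblock`, and `eth_getBlockByNumber` are standard methods of Bitcoin Core and Ethereum clients respectively, while the helper functions are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request IDs


def btc_request(method: str, *params) -> str:
    """Build a Bitcoin Core JSON-RPC request body (e.g. getblockhash, getblock)."""
    return json.dumps({"jsonrpc": "1.0", "id": next(_ids), "method": method, "params": list(params)})


def eth_request(method: str, *params) -> str:
    """Build an Ethereum JSON-RPC 2.0 request body (e.g. eth_blockNumber)."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": list(params)})


def eth_get_block(number: int, full_transactions: bool = True) -> str:
    # Ethereum encodes block numbers as hex quantities, e.g. 16 -> "0x10".
    return eth_request("eth_getBlockByNumber", hex(number), full_transactions)
```

For Bitcoin, a feed would typically call `btc_request("getblockhash", height)` and then `btc_request("getblock", block_hash, 2)` to get the block with full transaction data.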
Step 2
Deploy the Bitcoin feed and worker services with AWS Copilot on AWS Fargate and Amazon Elastic Container Service (Amazon ECS). The services subscribe to the Bitcoin Core node to fetch historical and live data.
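The guidance does not spell out how work is divided between the feed and worker services. One common pattern, assumed here purely for illustration, is for the feed to split the historical range into fixed-size batches of block heights for workers, and then to follow the chain tip while staying a few blocks behind it so that short reorganizations are less likely to reach storage. `BATCH_SIZE` and `CONFIRMATIONS` are hypothetical values.

```python
from typing import Iterator

BATCH_SIZE = 100    # hypothetical number of blocks per worker task
CONFIRMATIONS = 6   # hypothetical number of confirmations required before processing


def historical_batches(start: int, end: int, size: int = BATCH_SIZE) -> Iterator[range]:
    """Yield consecutive ranges of block heights covering [start, end)."""
    for lo in range(start, end, size):
        yield range(lo, min(lo + size, end))


def live_heights(last_processed: int, tip: int, confirmations: int = CONFIRMATIONS) -> range:
    """Return heights that now have enough confirmations to process.

    A block at height h has (tip - h + 1) confirmations, so every height up to
    tip - confirmations + 1 qualifies. Returns an empty range if none do.
    """
    newest_safe = tip - confirmations + 1
    return range(last_processed + 1, max(last_processed + 1, newest_safe + 1))
```

In a design like this, the feed would hand each historical batch to a worker and periodically call `live_heights` with the current tip, for example as reported by Bitcoin Core's `getblockcount`.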
Step 3
Deploy the Ethereum feed and worker services with AWS Copilot on Fargate and Amazon ECS. The services subscribe to Managed Blockchain for Ethereum and the Erigon Ethereum node to fetch historical and live data.
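Unlike Bitcoin Core, Ethereum's JSON-RPC API returns numeric fields as hex-encoded quantities, so the Ethereum feed has to decode them before writing tables. The sketch below assumes a block fetched with `eth_getBlockByNumber` and full transaction objects; the output column names are hypothetical.

```python
from typing import Any, Optional


def hex_to_int(value: Optional[str]) -> Optional[int]:
    """Decode an Ethereum hex quantity such as "0x1b4"; None stays None."""
    return None if value is None else int(value, 16)


def normalize_eth_block(block: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Flatten an eth_getBlockByNumber result into hypothetical table rows."""
    number = hex_to_int(block["number"])
    block_row = {
        "number": number,
        "hash": block["hash"],
        "timestamp": hex_to_int(block["timestamp"]),
        "gas_used": hex_to_int(block["gasUsed"]),
        # baseFeePerGas is absent on blocks before the London upgrade.
        "base_fee_per_gas": hex_to_int(block.get("baseFeePerGas")),
        "tx_count": len(block["transactions"]),
    }
    tx_rows = [
        {
            "hash": tx["hash"],
            "block_number": number,
            "transaction_index": hex_to_int(tx["transactionIndex"]),
            "from_address": tx["from"],
            "to_address": tx.get("to"),  # None for contract-creation transactions
            "value_wei": hex_to_int(tx["value"]),  # may exceed 64 bits
        }
        for tx in block["transactions"]
    ]
    return block_row, tx_rows
```

Because wei amounts can exceed the range of a 64-bit integer, a columnar schema would need a wide decimal or string type for a column like `value_wei`.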
Step 4
Store the data from the feeds as Parquet files in Amazon Simple Storage Service (Amazon S3). New data is written to Amazon S3 shortly after each block is created.
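A common way to make Parquet files on Amazon S3 queryable by date is Hive-style partitioning, where a `date=YYYY-MM-DD` path segment becomes a partition column in AWS Glue and Athena. The helper below builds such an object key. The chain and table names and the file-naming scheme are assumptions for illustration, not this guidance's actual layout.

```python
from datetime import datetime, timezone


def parquet_key(chain: str, table: str, block_time: int, first_block: int, last_block: int) -> str:
    """Build a Hive-style partitioned S3 key for a batch of blocks.

    block_time is a Unix timestamp in seconds; the date partition uses UTC.
    The layout and file naming are illustrative assumptions.
    """
    day = datetime.fromtimestamp(block_time, tz=timezone.utc).date().isoformat()
    return f"{chain}/{table}/date={day}/part-{first_block}-{last_block}.parquet"
```

A writer would serialize each batch to Parquet (for example with a library such as PyArrow) and upload it under this key. Writing one small file per batch keeps new blocks visible quickly, at the cost of many small files per day.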
Step 5
Aggregate daily and intraday Parquet files using AWS Glue.
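The guidance does not describe how the AWS Glue job decides what to aggregate. One plausible approach, sketched here under that assumption, is to group small intraday files by their table and `date=` partition, and to compact only days that have ended (in UTC) while leaving the current day's files in place for live queries. The key format is a hypothetical `<chain>/<table>/date=YYYY-MM-DD/<file>` layout, and the actual read-and-rewrite of Parquet data inside the Glue job is not shown.

```python
import re
from collections import defaultdict
from datetime import date

# Captures the partition path (everything up to the date segment) and the date itself.
_PARTITION = re.compile(r"^(.*/date=(\d{4}-\d{2}-\d{2}))/[^/]+$")


def plan_daily_compaction(keys: list[str], today: date) -> dict[str, list[str]]:
    """Group object keys by partition, keeping only completed days (before `today`).

    Partitions with a single file are skipped because there is nothing to merge.
    """
    by_partition: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        match = _PARTITION.match(key)
        if match and date.fromisoformat(match.group(2)) < today:
            by_partition[match.group(1)].append(key)
    return {part: sorted(files) for part, files in by_partition.items() if len(files) > 1}
```

Grouping by the full partition path, rather than by date alone, keeps files from different tables (for example blocks and transactions) from being merged together.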
Step 6
Catalog the data in the AWS Glue Data Catalog so that historical and live data can be queried with Amazon Athena and Amazon Redshift.
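Once the tables are registered in the AWS Glue Data Catalog, filtering on the date partition column lets Athena skip files outside the requested range. The sketch below only assembles the keyword arguments for the boto3 Athena client's `start_query_execution` call, so it runs without AWS credentials. The database name `blockchain`, the table `btc_transactions`, the `date` column, and the results location are hypothetical.

```python
from datetime import date


def daily_tx_count_query(start: date, end: date, output_location: str,
                         database: str = "blockchain", table: str = "btc_transactions") -> dict:
    """Return kwargs for boto3's athena.start_query_execution.

    Dates are rendered from date objects, so they cannot inject arbitrary SQL;
    the database and table names are trusted, hypothetical defaults.
    """
    if end < start:
        raise ValueError("end must not be before start")
    query = (
        f'SELECT "date", COUNT(*) AS tx_count FROM {table} '
        f"WHERE \"date\" BETWEEN '{start.isoformat()}' AND '{end.isoformat()}' "
        'GROUP BY "date" ORDER BY "date"'
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

With credentials configured, the request could be submitted as `boto3.client("athena").start_query_execution(**kwargs)`, which returns a `QueryExecutionId` for polling the results.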
Step 7
Business analysts can visualize the data with Amazon QuickSight.
Step 8
For cross-chain analytics, researchers and data scientists can use Jupyter notebooks in Amazon SageMaker.
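Cross-chain work in a notebook usually starts by aligning per-chain metrics on a shared key such as the UTC date. The sketch below does that outer join in plain Python so it has no dependencies; in a SageMaker notebook you would more likely use a dataframe library. The input mappings are hypothetical stand-ins for per-day query results, such as daily transaction counts for each chain.

```python
from typing import Optional


def align_daily(btc: dict[str, int], eth: dict[str, int]) -> list[tuple[str, Optional[int], Optional[int]]]:
    """Outer-join two {ISO date: value} mappings on the date key.

    A day missing for one chain yields None rather than 0, so "no data"
    is not confused with "zero activity".
    """
    days = sorted(set(btc) | set(eth))  # ISO dates sort chronologically as strings
    return [(day, btc.get(day), eth.get(day)) for day in days]
```

For example, `align_daily({"2023-01-01": 10}, {"2023-01-02": 100})` yields one row per day with `None` where a chain has no value for that day.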
Well-Architected Principles
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. Its six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) guide the design of reliable, secure, efficient, cost-effective, and sustainable systems. With the AWS Well-Architected Tool, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram is an example of a solution built with Well-Architected best practices in mind. To be fully Well-Architected, follow as many Well-Architected best practices as possible.
For a comprehensive understanding of the solution and its alignment with AWS Well-Architected Framework, please refer to the detailed guidance and accompanying architecture diagram.
The following disclaimer applies to the sample code, software libraries, command line tools, proofs of concept, templates, and other related technology provided by AWS. This material is AWS Content under the AWS Customer Agreement or the relevant written agreement between you and AWS, whichever applies.
You should not use this AWS Content in your production accounts or with production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, including sample code, as appropriate for production-grade use, based on your specific quality control practices and standards.
Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
In essence, exercise caution and due diligence when working with AWS Content, making certain that you understand the potential implications, risks, and charges involved.