573 billion rows | 300+ tables across 10 blockchains | 30K messages/second ingestion
Coinbase is a cryptocurrency exchange platform that serves over 100 million users worldwide and is often considered the de facto cryptocurrency exchange. Coinbase operates at the intersection of cryptocurrency and compliance, managing infrastructure that must handle the unique demands of blockchain data.
Blockchain data is inherently open, creating massive analytical appetite. Teams need to trace fund flows, detect suspicious activities, and ensure regulatory compliance across multiple chains simultaneously. This requires a system that can balance fast data ingestion with complex join capabilities while providing near real-time data serving.
The nature of crypto data creates extraordinary challenges: blockchain data exploded in volume after 2017, and between 2021 and 2025, data volumes continued increasing at astronomical rates. Ethereum alone processed over 30 billion transactions, with Bitcoin handling 100 times more and Solana managing 10,000 times more.
Around 2022, Coinbase faced a critical decision about their analytics infrastructure as blockchain data volumes scaled exponentially.
The company needed to support fraud detection and compliance workflows that demanded both speed and complexity. Analysts investigating suspicious activities needed to start from a blockchain address and quickly view all transactions to or from that address across multiple blockchains. They needed to generate summaries, highlight potential links to drug trafficking, terrorism, scams, or sanctioned entities, and automatically assess risk levels.
Another critical requirement involved tracing the flow of funds back to their source. When investigating illegal activities, analysts needed to follow money trails with speed and clarity, uncovering networks like fake ID vendor rings. This required combining graph analysis with fast analytical queries on massive datasets.
The technical requirements were demanding. Coinbase needed to handle hundreds of billions of rows across hundreds of tables and support complex joins across normalized data while maintaining fast query performance. Data freshness mattered because suspicious activity detection loses value when the data is stale.
Coinbase evaluated multiple candidates including TiDB and ClickHouse. Some teams at Coinbase were already using ClickHouse, making it a natural comparison point. The company needed to find which system best fit their blockchain analytics use cases while balancing fast ingestion, join capability, and near real-time serving.
After rigorous evaluation, Coinbase selected StarRocks as their analytics engine for blockchain data and established a close partnership to optimize the platform for their use cases.
The team conducted TPC-H 1TB benchmarking comparing StarRocks and ClickHouse on identical AWS EKS Kubernetes infrastructure. The results were decisive. StarRocks completed all 22 benchmark queries while ClickHouse failed 12 queries with out-of-memory errors. For queries ClickHouse completed, StarRocks was consistently faster from P50 to P99 latency. ClickHouse performed especially poorly on queries involving joins, a critical requirement for blockchain analytics where transactions link addresses across complex networks.
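To make the comparison method concrete, here is a minimal sketch of how per-query latencies from two engines could be compared at P50, P90, and P99 when one engine fails some queries. The function names and the sample numbers are hypothetical and are not Coinbase's benchmark harness or results; the sketch assumes a failed query is recorded as `None` and that percentiles use the nearest-rank definition.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, clamped to at least rank 1.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def compare(results_a, results_b, percentiles=(50, 90, 99)):
    """Compare two engines only on queries both completed (None marks a failed run)."""
    common = [
        q for q in results_a
        if results_a[q] is not None and results_b.get(q) is not None
    ]
    a = [results_a[q] for q in common]
    b = [results_b[q] for q in common]
    return {f"p{p}": (percentile(a, p), percentile(b, p)) for p in percentiles}


# Hypothetical latencies in seconds, NOT measurements from the benchmark.
engine_a = {"q1": 1.0, "q2": 2.0, "q3": 4.0, "q4": 3.0}
engine_b = {"q1": 2.0, "q2": None, "q3": 9.0, "q4": 5.0}  # q2 failed (e.g. out of memory)
summary = compare(engine_a, engine_b)
```

Restricting the comparison to queries both engines finished keeps the percentiles comparable, and the count of failed queries is best reported separately, as the results above do.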
Both systems share architectural similarities as MPP columnar databases, but StarRocks stood out for its superior join performance and memory management. Those strengths, together with its extensive SQL support, materialized views, and seamless integration with Kafka, Spark, and Flink for analytics on CDC data, metrics, logs, and event streams, made StarRocks the clear choice.
With StarRocks as the foundation, Coinbase built their blockchain transaction explorer: a UI tool helping analysts detect suspicious activities across multiple blockchains. Starting from a blockchain address, analysts can quickly search and view transactions to or from that address. The explorer generates summaries and highlights potential suspicious activities, automatically assessing risk levels and applying labels to transactions associated with each address.
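The address lookup described above can be sketched as a query that unions per-chain transfer tables and joins the counterparty against a label table. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for StarRocks, and the table names, columns, sample rows, and the one-line risk rule are all hypothetical rather than Coinbase's schema or scoring logic.

```python
import sqlite3

LOOKUP_SQL = """
SELECT t.chain, t.tx_hash, t.from_addr, t.to_addr, t.amount, l.label
FROM (
    SELECT 'ethereum' AS chain, tx_hash, from_addr, to_addr, amount FROM eth_transfers
    UNION ALL
    SELECT 'polygon' AS chain, tx_hash, from_addr, to_addr, amount FROM polygon_transfers
) AS t
LEFT JOIN address_labels AS l
  ON l.addr = CASE WHEN t.from_addr = :addr THEN t.to_addr ELSE t.from_addr END
WHERE t.from_addr = :addr OR t.to_addr = :addr
ORDER BY t.chain, t.tx_hash
"""


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE eth_transfers (tx_hash TEXT, from_addr TEXT, to_addr TEXT, amount REAL);
        CREATE TABLE polygon_transfers (tx_hash TEXT, from_addr TEXT, to_addr TEXT, amount REAL);
        CREATE TABLE address_labels (addr TEXT PRIMARY KEY, label TEXT);
        INSERT INTO eth_transfers VALUES
            ('e1', 'alice', 'bob', 1.5), ('e2', 'carol', 'alice', 0.2), ('e3', 'bob', 'carol', 3.0);
        INSERT INTO polygon_transfers VALUES ('p1', 'mixer', 'alice', 0.7);
        INSERT INTO address_labels VALUES ('mixer', 'sanctioned');
    """)
    return conn


def lookup(conn, address):
    """Return every transfer to or from `address`, with the counterparty's label if any."""
    return conn.execute(LOOKUP_SQL, {"addr": address}).fetchall()


def risk_level(rows):
    """Hypothetical rule: any labeled counterparty marks the address as high risk."""
    return "high" if any(row[5] is not None for row in rows) else "low"


conn = build_demo_db()
alice_rows = lookup(conn, "alice")
```

The join against the label table is the step that makes this workload join-heavy: every transfer touching the address has to be matched against labeled counterparties before a summary or risk level can be produced.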
Architecturally, the explorer integrates multiple components to deliver blazing-fast streaming analytics. StarRocks tables are periodically synchronized with Delta tables in the Databricks platform, with complex aggregations being performed through asynchronous materialized views in StarRocks. With change data feed events flowing into StarRocks via a Spark connector, Coinbase achieves strict data freshness and low-latency queries, with results returning in just seconds.
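As a rough illustration of what applying change data feed events to a keyed table means, the sketch below replays a list of events into an in-memory dictionary. The `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` column follow Delta Lake's change data feed; everything else, including the `apply_change_feed` function, the `tx_hash` key, and the sample events, is hypothetical and does not describe how the Spark connector or StarRocks works internally.

```python
def apply_change_feed(table, events, key="tx_hash"):
    """Apply change events to a dict keyed by primary key, in commit order."""
    # sorted() is stable, so events within one commit keep their original order.
    for event in sorted(events, key=lambda e: e["_commit_version"]):
        kind = event["_change_type"]
        pk = event[key]
        if kind in ("insert", "update_postimage"):
            # Keep only data columns; metadata columns start with an underscore.
            table[pk] = {k: v for k, v in event.items() if not k.startswith("_")}
        elif kind == "delete":
            table.pop(pk, None)
        # update_preimage rows carry the old values and need no action here.
    return table


# Hypothetical events, listed out of order on purpose.
events = [
    {"tx_hash": "t2", "risk": "low", "_change_type": "delete", "_commit_version": 3},
    {"tx_hash": "t1", "risk": "low", "_change_type": "insert", "_commit_version": 1},
    {"tx_hash": "t1", "risk": "low", "_change_type": "update_preimage", "_commit_version": 2},
    {"tx_hash": "t1", "risk": "high", "_change_type": "update_postimage", "_commit_version": 2},
    {"tx_hash": "t2", "risk": "low", "_change_type": "insert", "_commit_version": 2},
]
serving_table = apply_change_feed({}, events)
```

Because each event is applied by key, the serving copy converges to the latest state of every row without reloading the whole table, which is what keeps freshness high between periodic synchronizations.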
Today, this explorer runs on StarRocks supporting 10 blockchains with over 300 tables and 573 billion rows of data. The system protects customers and ensures regulatory compliance by enabling fast investigation of suspicious patterns.
Coinbase also built a tracer service that helps analysts uncover illegal activities by tracing the flow of funds back to their source. The service is powered by StarRocks, which handles data storage and query execution across blockchains, and by PuppyGraph, a graph data warehouse that handles graph analysis. Together, they power a visual interface that enables analysts to follow money trails with speed and clarity, revealing networks such as fake ID vendor rings through graph visualization.
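Tracing funds back to their source is, at its core, a backward graph traversal over transfers. The sketch below shows the idea with a breadth-first search in plain Python. It is an assumption-laden illustration and not PuppyGraph's API or Coinbase's tracer: the `trace_sources` function, the `max_hops` limit, and the sample addresses are all hypothetical, and transfers are reduced to simple (sender, receiver) pairs.

```python
from collections import defaultdict, deque


def trace_sources(transfers, target, max_hops=5):
    """Walk transfers backwards from `target` and return each path that ends at a source.

    A path stops when an address has no unvisited senders or when max_hops is reached.
    Each returned path is ordered from the earliest sender to the target.
    """
    inbound = defaultdict(list)
    for sender, receiver in transfers:
        inbound[receiver].append(sender)

    paths = []
    queue = deque([[target]])
    while queue:
        path = queue.popleft()
        current = path[0]
        # Skip senders already on this path so cycles cannot loop forever.
        senders = [s for s in inbound.get(current, []) if s not in path]
        if not senders or len(path) - 1 >= max_hops:
            if len(path) > 1:
                paths.append(path)
            continue
        for sender in senders:
            queue.append([sender] + path)
    return paths


# Hypothetical transfers: (sender, receiver).
transfers = [
    ("vendor", "mule1"),
    ("vendor", "mule2"),
    ("mule1", "cashout"),
    ("mule2", "cashout"),
    ("exchange", "mule2"),
]
money_trails = trace_sources(transfers, "cashout")
```

In this toy data, two of the three trails converge on the same `vendor` address, which is the kind of shared upstream node a graph visualization makes easy to spot.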
Real-time ingestion proved critical for fraud detection. The Coinbase data platform team partnered with CelerData engineers to optimize StarRocks’ Kafka sink connector, enabling events to be delivered from dozens of microservices. This collaboration allowed for the ingestion of 30,000 Kafka messages per second into primary key tables, despite upsert operations traditionally being more expensive than append-only writes.
Coinbase adopted this approach for its rich data format support (JSON, CSV, Protobuf, Avro), built-in transformations via Kafka Connect SMT, and strong observability through connector metrics. It now ingests both frontend user events and backend service events in real time, powering low-latency analytic queries across the platform.
The StarRocks-powered infrastructure now delivers production-grade blockchain analytics supporting Coinbase's most critical fraud detection and compliance workflows.
Coinbase continues expanding its StarRocks deployment and capabilities.
Want the story straight from the source? At the first-ever StarRocks Global Summit, Coinbase data leaders Eric Sun (Head of Data Platform) and Xinyu Liu (Senior Staff Software Engineer) walk through how they run analytics at scale with StarRocks. They dive into real-world data modeling patterns for crypto transaction pipelines and graph-based retrieval, share how they built a high-performance Kafka ingestion stack with CelerData that hits 30K records/sec with just-seconds freshness, and explain how they combine hot data in StarRocks with cold datasets in Unity Catalog for seamless hybrid querying. It’s hands-on, production-tested, and full of ideas you can adapt to your own platform.