Celonis is the global leader in process mining and process intelligence, serving over 1,400 organizations worldwide. The platform uncovers inefficiencies hidden within business operations by analyzing digital footprints across systems like ERP and CRM. Celonis automatically maps how processes actually run, identifying issues like duplicate invoice payments, delayed purchase orders, and bottlenecks in order-to-cash and procure-to-pay workflows.
The Process Intelligence Platform extracts data from enterprise systems like SAP and Oracle, transforms it through intermediate objects and events, and surfaces insights via dashboards and analytics views. As query demands grew, particularly for multi-billion-row datasets, Celonis needed to transition to a modern OLAP engine capable of handling process mining's unique analytical requirements at scale.
Celonis needed a query engine that could handle the demands of process mining at massive scale, but building one from scratch wasn't feasible.
The core problem stemmed from PQL, Celonis' domain-specific query language. Unlike SQL, which produces tables, PQL produces columns that automatically join across different tables without explicit join specifications. Users simply define the columns they need, and the system determines the join path. This composability makes PQL dashboard-friendly and intuitive, as filters propagate automatically across visualizations, but it generates queries that regularly join 10 or more tables, with some exceeding 100 joins. On top of that, almost 100% of PQL queries involve aggregation.
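To make the expansion concrete, here is a rough sketch of the kind of SQL a single PQL-style column request can translate into. The schema and activity names (purchase_orders, events, 'Create Purchase Order') are illustrative assumptions, not Celonis' actual model:

```sql
-- Hypothetical expansion of one PQL-style request ("average throughput days
-- per purchasing group") into SQL. The user never writes the joins; the
-- engine infers the path across the order and event tables.
SELECT
    po.purchasing_group,
    AVG(DATEDIFF(ev_done.event_time, ev_create.event_time)) AS avg_throughput_days
FROM purchase_orders AS po
JOIN events AS ev_create
    ON ev_create.object_id = po.order_id
   AND ev_create.activity = 'Create Purchase Order'
JOIN events AS ev_done
    ON ev_done.object_id = po.order_id
   AND ev_done.activity = 'Goods Receipt'
GROUP BY po.purchasing_group;
```

In a real dashboard, every active filter adds further tables to this inferred join graph, which is how queries grow to dozens of joins.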
The platform's data model added another layer of complexity. Process mining works with objects (entities such as purchase orders) and events (actions that happen to those objects). The central concept is the event log: an ordered list of events per object over time. These acyclic join graphs must be processed efficiently at scale.
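As a rough sketch, the object/event model can be pictured as two tables, with an event log reconstructed by ordering each object's events over time. Names and types below are illustrative assumptions, not the Celonis schema:

```sql
-- Illustrative object and event tables.
CREATE TABLE objects (
    object_id    BIGINT,
    object_type  VARCHAR(64),     -- e.g. 'PurchaseOrder'
    attributes   JSON
)
DUPLICATE KEY(object_id)
DISTRIBUTED BY HASH(object_id);

CREATE TABLE events (
    object_id    BIGINT,
    event_time   DATETIME,
    activity     VARCHAR(128)     -- e.g. 'Create Purchase Order'
)
DUPLICATE KEY(object_id, event_time)
DISTRIBUTED BY HASH(object_id);

-- One object's event log: its events in time order.
SELECT event_time, activity
FROM events
WHERE object_id = 4711
ORDER BY event_time;
```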
Together, the query language and the data model created several critical challenges: Celonis needed a solution that could handle extreme join complexity, scale to tens of billions of rows, deliver sub-second query performance on dashboards, and remain extensible enough to support specialized process mining operations, all without requiring a complete architectural overhaul.
After evaluating alternatives, Celonis selected StarRocks as their low-latency analytics engine. The team, led by well-known database researcher Jeff Naughton and seasoned database engineers, recognized StarRocks as a state-of-the-art, cloud-ready query engine that could not only meet SLAs out of the box but also serve as an extensible foundation that would otherwise take hundreds of engineer-years to build from scratch.
StarRocks offered the right foundation at its core, from its extensible cost-based optimizer and distributed execution to its native support for complex data types.
The biggest initial challenge was query complexity. PQL's automatic joins meant queries regularly exceeded 50 tables, and without intelligent optimization, performance would collapse. With StarRocks' easily extensible cost-based optimizer, the Celonis team made targeted modifications to better handle their workloads.
These optimizer extensions worked in concert with StarRocks' native statistics framework, including full-table statistics, sampled statistics collection, and column-level histograms on foreign-key columns, to ensure efficient query planning and cost estimation.
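In StarRocks these statistics can also be collected explicitly. The statements below are a generic sketch with illustrative table and column names; exact syntax should be checked against the StarRocks version in use:

```sql
-- Full statistics on a moderately sized table.
ANALYZE FULL TABLE purchase_orders;

-- Sampled statistics on a very large table.
ANALYZE SAMPLE TABLE events;

-- Column-level histogram on a foreign-key column used in many joins.
ANALYZE TABLE events UPDATE HISTOGRAM ON object_id WITH 64 BUCKETS;
```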
Delivering sub-2-second dashboard responsiveness across datasets exceeding 20 billion rows required more than intelligent optimization, however. StarRocks’ predicate pushdown and runtime filters were essential in keeping query costs proportional to result size rather than total table size, ensuring interactive performance even on massive models.
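The effect is easiest to see on a typical dashboard query shape: the selective filter sits on a small dimension table, and the engine builds a runtime filter from it to prune the large event scan before the join executes. The tables, columns, and filter values below are illustrative:

```sql
-- The predicate on the small vendors table is pushed down, and a runtime
-- filter built from the matching vendor_ids prunes the scan of the
-- multi-billion-row events table, so query cost tracks the result size.
SELECT v.vendor_name,
       COUNT(*) AS receipts
FROM events  AS e
JOIN vendors AS v ON e.vendor_id = v.vendor_id
WHERE v.country  = 'DE'
  AND e.activity = 'Goods Receipt'
GROUP BY v.vendor_name;
```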
To sustain this speed at scale, Celonis relied heavily on colocated joins. StarRocks automatically analyzes the join graph, identifies tables sharing common keys such as order IDs, and co-partitions them to eliminate shuffle and cross-node communication entirely, achieving true linear scaling across distributed workloads.
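In StarRocks, colocation is typically expressed by placing the joined tables in the same colocation group with matching distribution keys and bucket counts. A minimal sketch with illustrative tables and bucket counts:

```sql
-- Both tables hash-distribute on order_id with the same bucket count and
-- share a colocation group, so joins on order_id need no data shuffle.
CREATE TABLE purchase_orders (
    order_id          BIGINT,
    purchasing_group  VARCHAR(64)
)
DUPLICATE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 32
PROPERTIES ("colocate_with" = "order_group");

CREATE TABLE purchase_order_items (
    order_id   BIGINT,
    item_id    BIGINT,
    net_value  DECIMAL(18, 2)
)
DUPLICATE KEY(order_id, item_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 32
PROPERTIES ("colocate_with" = "order_group");
```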
Process mining’s event-sequence based data model introduced another challenge. Traditional relational operations would require costly unnesting and renesting, but Celonis adopted an array-first approach, leveraging functions like array_filter, array_map, and array_agg. StarRocks’ sorted streaming aggregation complemented this design, enabling non-blocking, memory-efficient event log creation when data arrived sorted by object ID and timestamp.
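A sketch of the array-first pattern, reusing the illustrative events table from earlier and StarRocks' lambda-style array functions; enable_sort_aggregate is the session variable documented for sorted streaming aggregation, and exact function signatures should be verified for the version in use:

```sql
-- Sorted streaming aggregation stays non-blocking when the input is
-- already sorted by object_id and event_time.
SET enable_sort_aggregate = true;

-- Build each object's event log as an array and analyze it in place,
-- e.g. count a change activity without unnesting and re-joining.
SELECT
    object_id,
    array_agg(activity ORDER BY event_time) AS activity_sequence,
    array_length(
        array_filter(a -> a = 'Change Price',
                     array_agg(activity ORDER BY event_time))
    ) AS price_changes
FROM events
GROUP BY object_id;
```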
Finally, process mining’s multi-dimensional outputs demanded flexibility beyond primitive data types. StarRocks’ native JSON support allowed Celonis to generate and consume JSON structures directly, representing process models and analytical outputs seamlessly. Combined with StarRocks’ extensible framework, this enabled Celonis to implement hundreds of custom functions optimized for process-mining operations. For multi-tenant deployments, resource groups ensured predictable performance by isolating CPU, memory, and parallelism across workloads.
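Two small sketches of what that looks like in practice; the JSON document, group name, classifier, and limits below are illustrative assumptions rather than Celonis' actual configuration:

```sql
-- JSON can be produced and queried natively (-> is StarRocks' JSON path operator).
SELECT parse_json('{"case_id": 42, "variant": ["Create PO", "Approve", "Pay Invoice"]}')
       -> 'variant' AS variant;

-- A resource group isolating one tenant's dashboard queries.
CREATE RESOURCE GROUP dashboard_tenant_a
TO (user = 'tenant_a_app', query_type IN ('select'))
WITH (
    "cpu_core_limit"    = "8",
    "mem_limit"         = "30%",
    "concurrency_limit" = "20"
);
```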
StarRocks enabled Celonis to deliver production-grade process mining analytics at enterprise scale, supporting complex workloads with exceptional performance and flexibility.
With this flexible architecture, Celonis achieves the best of both worlds: real-time process visibility and high-performance analytics at scale. The ability to tune workloads without heavy ETL unlocks faster decisions, simpler operations, and a foundation built for continuous optimization.
Looking ahead, Celonis plans to continue refining and expanding its StarRocks-powered architecture to further enhance performance and scalability across its process mining platform.
To hear the story directly from the Celonis team, watch Hector Gonzalez’s session from StarRocks Summit 2025, where he explains how StarRocks drives process mining at scale.