Celonis is the global leader in process mining and process intelligence, serving over 1,400 organizations worldwide. The platform uncovers inefficiencies hidden within business operations by analyzing digital footprints across systems like ERP and CRM. Celonis automatically maps how processes actually run, identifying issues like duplicate invoice payments, delayed purchase orders, and bottlenecks in order-to-cash and procure-to-pay workflows.
The Process Intelligence Platform extracts data from enterprise systems like SAP and Oracle, transforms it through intermediate objects and events, and surfaces insights via dashboards and analytics views. As query demands grew, particularly for multi-billion-row datasets, Celonis needed to transition to a modern OLAP engine capable of handling process mining's unique analytical requirements at scale.
Celonis needed a query engine that could handle the demands of process mining at massive scale, but building one from scratch wasn't feasible.
The core problem stemmed from PQL, Celonis' domain-specific query language. Unlike SQL, which produces tables, PQL produces columns that automatically join across different tables without explicit join specifications. Users simply define the columns they need, and the system determines the join path. This composability makes PQL dashboard-friendly and intuitive, as filters propagate automatically across visualizations, but it generates queries that regularly join 10 or more tables, with some exceeding 100 joins. On top of that, almost 100% of PQL queries involve aggregation.
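To make the expansion concrete, here is a rough sketch of the kind of SQL a single PQL-style column request can translate into. The schema and activity names (purchase_orders, events, 'Create Purchase Order') are illustrative assumptions, not Celonis' actual model:

```sql
-- Hypothetical expansion of one PQL-style request ("average throughput days
-- per purchasing group") into SQL. The user never writes the joins; the
-- engine infers the path across the order and event tables.
SELECT
    po.purchasing_group,
    AVG(DATEDIFF(ev_done.event_time, ev_create.event_time)) AS avg_throughput_days
FROM purchase_orders AS po
JOIN events AS ev_create
    ON ev_create.object_id = po.order_id
   AND ev_create.activity = 'Create Purchase Order'
JOIN events AS ev_done
    ON ev_done.object_id = po.order_id
   AND ev_done.activity = 'Goods Receipt'
GROUP BY po.purchasing_group;
```

In a real dashboard, every active filter adds further tables to this inferred join graph, which is how queries grow to dozens of joins.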
The platform's data model added another layer of complexity. Process mining works with objects (entities such as purchase orders) and events (actions that happen to those objects). The central concept is the event log: an ordered list of events per object over time. These acyclic join graphs must be processed efficiently at scale.
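As a rough sketch, the object/event model can be pictured as two tables, with an event log reconstructed by ordering each object's events over time. Names and types below are illustrative assumptions, not the Celonis schema:

```sql
-- Illustrative object and event tables.
CREATE TABLE objects (
    object_id    BIGINT,
    object_type  VARCHAR(64),     -- e.g. 'PurchaseOrder'
    attributes   JSON
)
DUPLICATE KEY(object_id)
DISTRIBUTED BY HASH(object_id);

CREATE TABLE events (
    object_id    BIGINT,
    event_time   DATETIME,
    activity     VARCHAR(128)     -- e.g. 'Create Purchase Order'
)
DUPLICATE KEY(object_id, event_time)
DISTRIBUTED BY HASH(object_id);

-- One object's event log: its events in time order.
SELECT event_time, activity
FROM events
WHERE object_id = 4711
ORDER BY event_time;
```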
Together, the query language and the data model created several critical challenges: Celonis needed a solution that could handle extreme join complexity, scale to tens of billions of rows, deliver sub-second query performance on dashboards, and remain extensible enough to support specialized process mining operations, all without requiring a complete architectural overhaul.
After evaluating alternatives, Celonis selected StarRocks as their low-latency analytics engine. The team, led by well-known database researcher Jeff Naughton and seasoned database engineers, recognized StarRocks as a state-of-the-art, cloud-ready query engine that could not only meet SLAs out of the box but also serve as an extensible foundation that would otherwise take hundreds of engineer-years to build from scratch.
StarRocks offered the right foundation at its core, from its extensible cost-based optimizer and distributed execution to its native support for complex data types.
The biggest initial challenge was query complexity. PQL's automatic joins meant queries regularly exceeded 50 tables, and without intelligent optimization, performance would collapse. With StarRocks' easily extensible cost-based optimizer, the Celonis team made targeted modifications to better handle their workloads.
These optimizer extensions worked in concert with StarRocks' native statistics framework, including full-table statistics, sampled statistics collection, and column-level histograms on foreign-key columns, to ensure efficient query planning and cost estimation.
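In StarRocks these statistics can also be collected explicitly. The statements below are a generic sketch with illustrative table and column names; exact syntax should be checked against the StarRocks version in use:

```sql
-- Full statistics on a moderately sized table.
ANALYZE FULL TABLE purchase_orders;

-- Sampled statistics on a very large table.
ANALYZE SAMPLE TABLE events;

-- Column-level histogram on a foreign-key column used in many joins.
ANALYZE TABLE events UPDATE HISTOGRAM ON object_id WITH 64 BUCKETS;
```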
Delivering sub-2-second dashboard responsiveness across datasets exceeding 20 billion rows required more than intelligent optimization, however. StarRocks’ predicate pushdown and runtime filters were essential in keeping query costs proportional to result size rather than total table size, ensuring interactive performance even on massive models.
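The effect is easiest to see on a typical dashboard query shape: the selective filter sits on a small dimension table, and the engine builds a runtime filter from it to prune the large event scan before the join executes. The tables, columns, and filter values below are illustrative:

```sql
-- The predicate on the small vendors table is pushed down, and a runtime
-- filter built from the matching vendor_ids prunes the scan of the
-- multi-billion-row events table, so query cost tracks the result size.
SELECT v.vendor_name,
       COUNT(*) AS receipts
FROM events  AS e
JOIN vendors AS v ON e.vendor_id = v.vendor_id
WHERE v.country  = 'DE'
  AND e.activity = 'Goods Receipt'
GROUP BY v.vendor_name;
```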
To sustain this speed at scale, Celonis relied heavily on colocated joins. StarRocks automatically analyzes the join graph, identifies tables sharing common keys such as order IDs, and co-partitions them to eliminate shuffle and cross-node communication entirely, achieving true linear scaling across distributed workloads.
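In StarRocks, colocation is typically expressed by placing the joined tables in the same colocation group with matching distribution keys and bucket counts. A minimal sketch with illustrative tables and bucket counts:

```sql
-- Both tables hash-distribute on order_id with the same bucket count and
-- share a colocation group, so joins on order_id need no data shuffle.
CREATE TABLE purchase_orders (
    order_id          BIGINT,
    purchasing_group  VARCHAR(64)
)
DUPLICATE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 32
PROPERTIES ("colocate_with" = "order_group");

CREATE TABLE purchase_order_items (
    order_id   BIGINT,
    item_id    BIGINT,
    net_value  DECIMAL(18, 2)
)
DUPLICATE KEY(order_id, item_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 32
PROPERTIES ("colocate_with" = "order_group");
```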
Process mining’s event-sequence based data model introduced another challenge. Traditional relational operations would require costly unnesting and renesting, but Celonis adopted an array-first approach, leveraging functions like array_filter, array_map, and array_agg. StarRocks’ sorted streaming aggregation complemented this design, enabling non-blocking, memory-efficient event log creation when data arrived sorted by object ID and timestamp.
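A sketch of the array-first pattern, reusing the illustrative events table from earlier and StarRocks' lambda-style array functions; enable_sort_aggregate is the session variable documented for sorted streaming aggregation, and exact function signatures should be verified for the version in use:

```sql
-- Sorted streaming aggregation stays non-blocking when the input is
-- already sorted by object_id and event_time.
SET enable_sort_aggregate = true;

-- Build each object's event log as an array and analyze it in place,
-- e.g. count a change activity without unnesting and re-joining.
SELECT
    object_id,
    array_agg(activity ORDER BY event_time) AS activity_sequence,
    array_length(
        array_filter(a -> a = 'Change Price',
                     array_agg(activity ORDER BY event_time))
    ) AS price_changes
FROM events
GROUP BY object_id;
```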
Finally, process mining’s multi-dimensional outputs demanded flexibility beyond primitive data types. StarRocks’ native JSON support allowed Celonis to generate and consume JSON structures directly, representing process models and analytical outputs seamlessly. Combined with StarRocks’ extensible framework, this enabled Celonis to implement hundreds of custom functions optimized for process-mining operations. For multi-tenant deployments, resource groups ensured predictable performance by isolating CPU, memory, and parallelism across workloads.
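Two small sketches of what that looks like in practice; the JSON document, group name, classifier, and limits below are illustrative assumptions rather than Celonis' actual configuration:

```sql
-- JSON can be produced and queried natively (-> is StarRocks' JSON path operator).
SELECT parse_json('{"case_id": 42, "variant": ["Create PO", "Approve", "Pay Invoice"]}')
       -> 'variant' AS variant;

-- A resource group isolating one tenant's dashboard queries.
CREATE RESOURCE GROUP dashboard_tenant_a
TO (user = 'tenant_a_app', query_type IN ('select'))
WITH (
    "cpu_core_limit"    = "8",
    "mem_limit"         = "30%",
    "concurrency_limit" = "20"
);
```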
StarRocks enabled Celonis to deliver production-grade process mining analytics at enterprise scale, supporting complex workloads with exceptional performance and flexibility.
With this flexible architecture, Celonis achieves the best of both worlds: real-time process visibility and high-performance analytics at scale. The ability to tune workloads without heavy ETL unlocks faster decisions, simpler operations, and a foundation built for continuous optimization.
Looking ahead, Celonis plans to continue refining and expanding its StarRocks-powered architecture to further enhance performance and scalability across its process mining platform.
To hear the story directly from the Celonis team, watch Hector Gonzalez’s session from StarRocks Summit 2025, where he explains how StarRocks drives process mining at scale.