
Why users are migrating from Apache Druid to StarRocks

Launched in 2011, Apache Druid® was once the leader in real-time analytics. As analytics use cases have grown more demanding and sophisticated, however, Druid now struggles to meet the performance needs of modern data users. Its limitations include:
Not ANSI SQL Compatible
Druid provides Druid SQL, a "SQL-like" query interface that does not support standard ANSI SQL. Consuming applications are limited to the functions and syntax of Druid SQL.
No Joined Table Support
Druid can deliver great performance for queries against a single table, but it struggles with queries that join tables.
No Real-Time Updates
In Druid, once data is written into a segment, it cannot be updated or deleted (segments are immutable). This limits Druid's usefulness in many use cases.
Dated Architecture
Built on a scatter-gather architecture, Druid is inherently challenged by operations like high-cardinality aggregations and precise count distinct.
Because of this, many Apache Druid users have started migrating to the open-source project StarRocks. With StarRocks, these former Apache Druid users are able to enjoy superior performance, concurrency, and ease of use. 

StarRocks vs. Apache Druid

2.2x Greater Performance In Single-Table Queries
3.2x Greater Performance In Low-Cardinality Scenarios
3.0x+ Greater Performance Compared To Other Leading Solutions

Keep using the tools and languages you love

SQL is the de-facto standard for any analytics application. Any database query engine should natively support SQL.

Unlike Druid, where SQL is an afterthought layered onto a native query language, StarRocks supports SQL natively as its sole query language. StarRocks supports industry-standard ANSI SQL syntax, so you are not locked into a proprietary SQL-like language with limited SQL functions.

StarRocks is also compatible with the MySQL protocol, which means all your existing BI tools and applications work with StarRocks out of the box using MySQL drivers.
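In practice, that means connecting to StarRocks looks like connecting to any MySQL server. As an illustrative fragment (the host, user, and database names are placeholders; 9030 is StarRocks' default FE query port):

```shell
# Connect with the stock mysql client -- no special driver needed.
mysql -h starrocks-fe.example.com -P 9030 -u analyst -p analytics_db
```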


Free yourself from denormalized tables

Join relationships are the foundation of modern analytics, but they also pose a challenge to query performance.
Apache Druid has tried to sidestep this challenge by focusing on single-table query performance. As a result, users have to flatten joined tables into a single table before ingesting into Apache Druid. This extra step adds pipeline delay and requires extra resources.
StarRocks delivers excellent performance on both single-table queries and joined queries. With StarRocks, users can simplify their data ingestion pipeline, improve data freshness, and cut down on ETL costs.
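To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for any ANSI SQL engine (the tables and values are invented for illustration). With join support, normalized tables can be queried directly at query time; this is exactly the kind of query Druid users have to pre-flatten into one wide table:

```python
import sqlite3

# Two normalized tables -- the shape Druid users would have to flatten
# into a single denormalized table before ingestion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# Revenue per region computed via a join at query time -- no
# denormalization step, no extra ETL pipeline.
rows = conn.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('APAC', 75.0), ('EMEA', 150.0)]
```

If the source tables change, the join reflects it on the next query, whereas a flattened table would need to be rebuilt and re-ingested.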

Embrace mutable data

Mutable data is a common byproduct of business activities. It can be caused by glitches in the underlying data pipeline or it can simply be a part of normal business logic.
Apache Druid, like many other analytical databases, doesn't support UPDATE and DELETE operations natively. Because segments are immutable, changing data that has already been ingested requires re-ingesting and overwriting entire segments in batch.
With StarRocks, mutable data is handled natively, and updated analytics results are calculated immediately.
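A toy in-memory sketch (illustrative only, not either system's actual storage format) of the difference between append-only segments and a primary-key table with native upserts:

```python
# Druid-style: segments are immutable, so a correction can only be
# appended as new data; both versions remain visible until the
# affected segments are rebuilt and re-ingested.
segments = [[("order-1", 100)], [("order-1", 120)]]  # second segment is the "fix"
visible = [row for seg in segments for row in seg]
print(visible)  # [('order-1', 100), ('order-1', 120)] -- stale row still visible

# Primary-key style: updates and deletes take effect immediately,
# and every subsequent read sees the corrected state.
table = {"order-1": 100}
table["order-1"] = 120   # UPDATE: replaces the old value in place
del table["order-1"]     # DELETE: the row is gone for all readers
print(table)  # {}
```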


Scale analytics with ease

StarRocks has a Massively Parallel Processing (MPP) architecture. With this architecture, a query request is split into different logical execution units and runs simultaneously on multiple nodes. Each node has its own exclusive resources (CPU, memory) that the MPP architecture can make efficient use of, which enables better horizontal scalability.

In contrast, Druid is built on a scatter-gather architecture, in which the gather component inevitably becomes the bottleneck. That's why Druid struggles with analytics operations such as high-cardinality aggregations and precise count distinct.
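A toy Python sketch (not StarRocks code; the node count and data are invented) of why the gather step hurts exact count distinct, and how an MPP-style shuffle avoids it:

```python
from collections import defaultdict

# 10,000 rows over 1,000 distinct users, spread across 4 nodes.
data = ["u%d" % (i % 1000) for i in range(10_000)]
nodes = [data[i::4] for i in range(4)]  # arbitrary row placement per node

# Scatter-gather: every node ships its full set of distinct values to
# one gather node, which must merge them all -- the bottleneck grows
# with cardinality, not with the number of nodes.
partial_sets = [set(shard) for shard in nodes]
gathered = set().union(*partial_sets)   # all values funnel through one place
print(len(gathered))  # 1000

# MPP-style: shuffle rows by hash so each distinct value lives on
# exactly one node; each node reports a single integer, and the
# coordinator merely sums small numbers.
shuffled = defaultdict(set)
for v in data:
    shuffled[hash(v) % 4].add(v)
mpp_count = sum(len(s) for s in shuffled.values())
print(mpp_count)  # 1000
```

Both approaches return the same answer, but only the second keeps the per-node work and the coordinator's merge cost bounded as cardinality grows.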

StarRocks has also built a native vectorized query engine. It makes full use of the CPU's SIMD instructions, processing multiple data values with each instruction, and improves overall operator performance by 3 to 10 times.
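As a rough illustration of the idea (plain Python standing in for SIMD kernels; the columns are invented), batched columnar execution applies one bulk operation per column chunk instead of dispatching work row by row:

```python
from array import array
import operator

# Columnar batch: each column's values are stored contiguously and
# processed with one bulk operation over the whole chunk -- the shape
# of work a vectorized engine maps onto SIMD instructions.
price_col = array("d", [10.0, 20.0, 30.0, 40.0])
qty_col   = array("d", [1.0, 2.0, 3.0, 4.0])
revenue_batch = list(map(operator.mul, price_col, qty_col))
print(revenue_batch)  # [10.0, 40.0, 90.0, 160.0]

# Row-at-a-time: the same arithmetic driven through per-row records,
# paying dispatch and lookup overhead on every single value.
rows = [{"price": p, "qty": q} for p, q in zip(price_col, qty_col)]
revenue_rows = [r["price"] * r["qty"] for r in rows]
assert revenue_rows == revenue_batch
```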

Simplify Operations

The StarRocks architecture consists of a group of frontend (FE) nodes and backend (BE) nodes. With no dependencies on external components, StarRocks is easy to deploy and maintain. Meanwhile, the entire system eliminates single points of failure through replication of metadata and data.

FE and BE nodes can automatically scale out to support larger data volumes or stricter query performance requirements. Data redistribution is handled automatically behind the scenes without impacting end users' query experience.

Druid users will appreciate StarRocks' streamlined architecture: there are no legacy Hadoop-style components such as HDFS or ZooKeeper to manage.

Compare Apache Druid to StarRocks

Designed for the analytics needs of modern enterprises, StarRocks delivers the capabilities and performance those enterprises require. Apache Druid can't say the same.

Apache Druid

Legacy scatter-gather architecture
Only partial SQL syntax support
Poor high-cardinality aggregation performance
ZooKeeper-based operations
No real-time updates
No distributed joins
No data lake query support
No support for federated queries

StarRocks

Modern MPP architecture
Full SQL syntax support
Great performance for high-cardinality dimensions
No 3rd-party dependencies
Real-time updates and deletes
Distributed joins
Query support for Hive, Hudi, Iceberg, and Delta Lake
Federated queries with Hive, MySQL, ES, and JDBC sources

Talk to an engineer

Have questions about CelerData and StarRocks? You can connect with our team of solutions architects and experienced engineers who can answer all of your questions and even offer a personalized demo aligned with your specific needs and analytics scenarios.