Why users are migrating from Apache Druid to StarRocks

Initially launched in 2011, Apache Druid® was once the leader in real-time analytics. Unfortunately, as analytics use cases have grown more demanding and sophisticated, Druid now struggles to meet the performance needs of modern data users. Its limitations include:

NOT ANSI SQL COMPATIBLE

Druid provides Druid SQL, a 'SQL-like' query interface. It does not support standard ANSI SQL, so any consuming applications are limited by the functionality and syntax of Druid SQL.

NO JOINED TABLE SUPPORT

Druid may have great performance for queries running against a single table, but it struggles with querying joined tables.

NO REAL-TIME UPDATES

In Druid, once data is written into a segment, it is immutable: it cannot be updated or deleted. This limits Druid's usefulness in many use cases.

DATED ARCHITECTURE

Built on a scatter-gather architecture, Druid is naturally challenged by operations like high-cardinality aggregations and precise count distinct.

StarRocks vs. Apache Druid

8.9x greater performance in wide-table scenarios out of the box

4.05x greater performance in wide-table scenarios with a bitmap index

3x+ greater performance compared to other leading solutions

Keep using the tools and languages you love

SQL is the de facto standard for analytics applications. Any database query engine should natively support SQL.

 

Unlike Druid, where SQL is an afterthought layered on top of its native query language, StarRocks supports SQL natively as its sole query language. StarRocks supports industry-standard ANSI SQL syntax, so you are not locked into a proprietary SQL-like language with limited SQL functions.

 

StarRocks is also compatible with the MySQL protocol, which means all your existing BI tools and applications work with StarRocks out of the box using MySQL drivers.
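
To illustrate, here is a minimal sketch of querying StarRocks through a standard MySQL driver (pymysql); the host, credentials, and table names are placeholder assumptions, and 9030 is typically the FE node's MySQL-protocol query port.

```python
# Minimal sketch: connecting to StarRocks through a standard MySQL driver.
# The host, port, credentials, and table names are placeholder assumptions;
# 9030 is typically the FE node's MySQL-protocol query port.
import pymysql

conn = pymysql.connect(
    host="starrocks-fe.example.com",
    port=9030,
    user="analyst",
    password="secret",
    database="demo",
)

with conn.cursor() as cur:
    # Plain ANSI SQL; no Druid-specific dialect required.
    cur.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region")
    for region, revenue in cur.fetchall():
        print(region, revenue)

conn.close()
```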

Why more Druid users are switching to StarRocks

The benefits of StarRocks reach beyond SQL and MySQL compatibility. Here are some additional great reasons to switch:

Free yourself from denormalized tables

Join relationships are the foundation of modern analytics, but they also pose a challenge to query performance.
 
Apache Druid tries to sidestep this challenge by focusing on single-table query performance. As a result, users have to flatten joined tables into a single table before ingesting data into Druid. This extra step adds pipeline delay and requires additional resources.
 
StarRocks delivers excellent performance on both single-table queries and joined queries. With StarRocks, users can simplify their data ingestion pipeline, improve data freshness, and cut down on ETL costs.
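
As a hedged illustration of the difference, the same report that would require a pre-flattened table in Druid can be expressed as an ordinary join in StarRocks; the orders and customers tables below are hypothetical.

```python
# Illustrative only: running a join directly in StarRocks instead of
# maintaining a pre-flattened (denormalized) table upstream.
# The orders and customers tables and their columns are assumptions.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.com", port=9030,
                       user="analyst", password="secret", database="demo")

JOIN_QUERY = """
    SELECT c.segment,
           SUM(o.amount) AS total_amount
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE o.order_date >= '2024-01-01'
    GROUP BY c.segment
"""

with conn.cursor() as cur:
    cur.execute(JOIN_QUERY)  # no upstream flattening step required
    print(cur.fetchall())

conn.close()
```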

Embrace mutable data

Mutable data is a common byproduct of business activities. It can be caused by glitches in the underlying data pipeline, or it can simply be part of normal business logic.
 
Apache Druid, like many other analytical databases, doesn't support row-level UPDATE and DELETE operations natively. Instead, data can only be changed by reindexing or overwriting entire segments.
 
With StarRocks, mutable data is handled natively, and updated analytics results are calculated immediately.
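
Here is a minimal sketch of row-level mutations over the MySQL protocol, assuming a hypothetical Primary Key table named user_profiles; the column names and values are illustrative.

```python
# Hedged sketch: row-level mutations in StarRocks over the MySQL protocol.
# Assumes a hypothetical Primary Key table named user_profiles; StarRocks
# applies UPDATE to Primary Key tables, and the column names here are made up.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.com", port=9030,
                       user="analyst", password="secret", database="demo",
                       autocommit=True)

with conn.cursor() as cur:
    # Correct a record produced by an upstream pipeline glitch.
    cur.execute("UPDATE user_profiles SET country = 'DE' WHERE user_id = 42")
    # Drop rows that should no longer appear in reports.
    cur.execute("DELETE FROM user_profiles WHERE is_test_account = true")

conn.close()
```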

Scale analytics with ease

StarRocks has a Massively Parallel Processing (MPP) architecture. With this architecture, a query request is split into different logical execution units that run simultaneously on multiple nodes. Each node has its own exclusive resources (CPU, memory), which the MPP architecture uses efficiently, enabling better horizontal scalability.

In contrast, Druid is built on a scatter-gather architecture, in which the gather component inevitably becomes the bottleneck. That's why Druid struggles with analytics operations such as high-cardinality aggregations and precise count distinct.

StarRocks has also built its own native vectorized query engine, which makes full use of the CPU's SIMD instructions to process multiple data elements with each instruction. This improves the overall performance of operators by 3 to 10 times.
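
For a concrete sense of the workload in question, the sketch below runs a precise count distinct over a high-cardinality column, the kind of query that strains a scatter-gather engine but spreads naturally across MPP executors; the page_views table and its columns are assumptions.

```python
# Illustrative sketch: a precise count-distinct over a high-cardinality column.
# The page_views table and its columns are assumptions.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.com", port=9030,
                       user="analyst", password="secret", database="demo")

with conn.cursor() as cur:
    cur.execute("""
        SELECT page_url,
               COUNT(DISTINCT user_id) AS exact_unique_visitors
        FROM page_views
        GROUP BY page_url
        ORDER BY exact_unique_visitors DESC
        LIMIT 20
    """)
    print(cur.fetchall())

conn.close()
```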

Simplify Operations

StarRocks' architecture consists of a group of Frontend (FE) nodes and Backend (BE) nodes. With no dependencies on external components, StarRocks is easy to deploy and maintain. Meanwhile, the entire system eliminates single points of failure by replicating both metadata and data.

FE and BE nodes can automatically scale out to support larger data volumes or stricter query performance requirements. Data redistribution is handled automatically behind the scenes without impacting end users' query experience.

Druid users will appreciate StarRocks' streamlined architecture, since they no longer have to manage legacy Hadoop-style components such as HDFS and ZooKeeper.
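
As a hedged example of how scale-out looks in practice, a new BE node can be registered with a single SQL statement sent over the MySQL protocol; the host names and heartbeat port below are assumptions, so verify the exact syntax against the StarRocks documentation for your version.

```python
# Hedged sketch: registering an additional BE node over the MySQL protocol.
# Host names and the heartbeat port (9050) are assumptions; verify the exact
# ALTER SYSTEM syntax against the StarRocks documentation for your version.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.com", port=9030,
                       user="root", password="secret", autocommit=True)

with conn.cursor() as cur:
    # Once the BE joins, StarRocks redistributes data across nodes on its own.
    cur.execute('ALTER SYSTEM ADD BACKEND "starrocks-be-4.example.com:9050"')

conn.close()
```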

Compare Apache Druid to StarRocks

Designed for the analytics needs of modern enterprises, StarRocks delivers capabilities and performance that Apache Druid can't match.

Architecture
  Apache Druid: Legacy scatter-gather architecture
  StarRocks: Modern MPP architecture

SQL syntax support
  Apache Druid: Only partial SQL syntax support
  StarRocks: Full SQL syntax support

High-cardinality aggregation performance
  Apache Druid: Poor high-cardinality aggregation performance
  StarRocks: Great performance for high-cardinality dimensions

3rd-party dependencies
  Apache Druid: ZooKeeper-based operations
  StarRocks: No 3rd-party dependencies

Real-time updates
  Apache Druid: No real-time updates
  StarRocks: Real-time updates and deletes

Distributed joins
  Apache Druid: No distributed joins
  StarRocks: Distributed joins

Data lake query support
  Apache Druid: No data lake query support
  StarRocks: Query support for Hive, Hudi, Iceberg, and Delta

Support for federated queries
  Apache Druid: No support for federated queries
  StarRocks: Federated queries with Hive, MySQL, ES, and JDBC sources
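
As a rough sketch of the data lake and federated query rows above, StarRocks can attach an external catalog and query lake tables in place; the metastore URI, catalog, database, and table names below are assumptions, and the exact properties may differ between StarRocks versions.

```python
# Rough sketch: attaching a Hive external catalog and querying a lake table in
# place. The metastore URI, catalog, database, and table names are assumptions,
# and the exact PROPERTIES keys may differ between StarRocks versions.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.com", port=9030,
                       user="analyst", password="secret", autocommit=True)

with conn.cursor() as cur:
    # One-time setup: register an external catalog backed by a Hive metastore.
    cur.execute("""
        CREATE EXTERNAL CATALOG hive_catalog
        PROPERTIES (
            "type" = "hive",
            "hive.metastore.uris" = "thrift://metastore.example.com:9083"
        )
    """)
    # Query the lake table directly, with no ingestion step.
    cur.execute("SELECT COUNT(*) FROM hive_catalog.sales_db.orders")
    print(cur.fetchone())

conn.close()
```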