When it came time to replace our Citus (Postgres) database, here's why we went with StarRocks and how its performance, concurrency, and open source community impressed us.
I think about databases a lot. As the head of business intelligence for a fast-growing midsize company, I need to keep up with what's new in the analytics infrastructure space. It's one of my favorite parts of the job too, because it gives me a chance to discover cool new technology that I can share with my friends in the industry and the wider data engineering community.
Today I'd like to tell you about StarRocks, an analytical database that really impressed me and my team and a project that more people should be aware of.
Searching for a Better Database
A recent project had me working on a new data platform for our business that would serve internal users across all of our departments. This platform was needed to support a wider set of initiatives rolling out across all of our business units, like marketing and logistics.
We'd been using Citus up until this point, but I was concerned it wouldn't be able to handle the speed and scale of the projects our teams would be working on. We needed a replacement that could run live queries directly against the data warehouse itself. We were using Power BI at the time and wanted to avoid importing data into it, since that approach just wasn't something we could scale.
Speed and concurrency were our biggest priorities, but maintenance was a concern too. My team is responsible for everything related to data, data infrastructure, and reporting, so if we could eliminate the extra maintenance work Citus required, it would help a lot.
Database Solutions We Evaluated
Cost was also a factor (isn't it always?). So right from the start I knew we'd be looking at open source solutions. When I wrote up my initial list, it looked something like this:
- ClickHouse
- Apache Pinot
- Apache Doris
I cut ClickHouse pretty early on. Its writes were just way too slow for the work we were doing. We also needed to handle updates in the data, but this is another area where ClickHouse is just too difficult to work with.
I looked at Pinot next, but it really wasn't designed for the data warehousing work we were trying to do. That's when I found Doris. Doris turned out to be exactly what we'd been looking for, primarily because its distributed, columnar design was ideal for our data warehousing application.
The story could have ended here, but as I learned more about Doris, I came across another project that had spawned from it: StarRocks.
Why StarRocks Impressed Us
Doris may have had everything we needed, but StarRocks gave us everything we wanted. It offered the distributed database structure that attracted us to Doris, but it eclipsed Doris (and our other options) in terms of speed, scalability, and community support.
Our initial tests with StarRocks sealed the deal.
First, the setup was easy, which is always nice. StarRocks was ready to test right out of the box, and the documentation and Slack community were great at filling in any remaining gaps. All of our tests were run against real-world queries and reports so we would have an accurate picture of what we could expect from the solution. The results were shocking to say the least.
- Even without optimization, we were seeing a minimum 5x performance improvement over Citus. This was under our typical BI workloads on about 500 million rows of data.
- StarRocks was superior in terms of scalability and cost too. Compared to Citus' Postgres database, StarRocks' column-based database could grow without taking up nearly as much storage. Managing the cluster as it scaled was also much easier than it was with Citus.
- Concurrency was also something I looked at. While Citus' concurrency was sufficient to meet our needs, StarRocks not only matched it, but could deliver the high concurrency we wanted with significantly fewer resources.
- As mentioned above, StarRocks' community really won us over. We were evaluating open source solutions to control costs, but the community an open project comes with is also worth considering. StarRocks' community was great. It's big, active, and global. Even on the weekends, there were people around on Slack who could answer our questions and help us get the most from our evaluation.
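For anyone who wants to run a similar side-by-side evaluation, here is a minimal sketch of a timing harness in Python. It is not the tooling we used. The function names (`time_query`, `benchmark`, `speedup`) and the stand-in runners are hypothetical, and the sketch assumes you supply your own `run_query` callable that executes one SQL string against the database under test. It replays a list of real report queries across several worker threads and compares median latencies.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def time_query(run_query: Callable[[str], object], sql: str) -> float:
    """Return the wall-clock seconds that one call to run_query(sql) takes."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def benchmark(
    run_query: Callable[[str], object],
    queries: List[str],
    concurrency: int = 8,
    rounds: int = 5,
) -> Dict[str, float]:
    """Replay every query `rounds` times across `concurrency` worker threads."""
    workload = [sql for _ in range(rounds) for sql in queries]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda sql: time_query(run_query, sql), workload))
    return {
        "runs": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Ratio of median latencies; a value above 1.0 means the candidate is faster."""
    return baseline["median_s"] / candidate["median_s"]


if __name__ == "__main__":
    # Hypothetical stand-in runners that only sleep. Replace them with
    # functions that send the SQL to each real database.
    def baseline_runner(sql: str) -> None:
        time.sleep(0.010)

    def candidate_runner(sql: str) -> None:
        time.sleep(0.002)

    report_queries = ["SELECT 1", "SELECT 2"]  # placeholder queries
    old = benchmark(baseline_runner, report_queries)
    new = benchmark(candidate_runner, report_queries)
    print(f"median speedup: {speedup(old, new):.1f}x")
```

Medians are used rather than means so that a single slow outlier does not dominate the comparison. Raising `concurrency` gives a rough view of how each system behaves as more users hit it at once.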
All things considered, when accounting for speed, concurrency, scalability, and maintenance, nothing beats StarRocks. Nothing.
Should You Try StarRocks?
As I said, one of the best parts of my job is that I get to test new technology and spread the word when I find something worth sharing. StarRocks is a great example of that.
When it comes to open source databases, I can tell you from experience that StarRocks' query performance is among the best I've ever seen. It's absolutely worth a look if you need better query speeds.
But even if speed isn't a top priority, keeping maintenance costs in line almost always is. StarRocks shines here too. This goes beyond just being an open source solution. I've had the pleasure of speaking with the core team behind the project, and they all come from a data engineering background. They know all about the grunt work and waste that comes with scaling platforms and keeping query performance up, and they specifically built StarRocks to eliminate as much of that as possible. It's an actual joy to work with.
So give StarRocks a look. You can thank me later.