January always starts with big intentions. End of January is when those intentions meet reality, so we're here with the practical stuff: what shipped, what worked, and what's next.
This edition brings together a StarRocks 2025 recap, a draft 2026 roadmap (we’d love your feedback), an Inside StarRocks deep dive on fast joins, webinars on production lessons and agentic analytics, and community write-ups on funnel analytics, CDC orchestration, query tuning, and repeatable deployment—plus a quick tooling spotlight on the StarRocks AI Assistant (@Rocky in Slack and the docs-site assistant) for faster answers when you’re heads-down.
2026 Roadmap & 2025 Recap
👉 StarRocks Roadmap 2026: Help Us Build What’s Next
The draft 2026 Roadmap is out, and we want to pressure-test it with people actually building and running systems in production. We’d love your thoughts on what should come first, what gaps would matter most over the next 6–12 months, and where stability matters more than speed.
Key themes we are exploring:
- Shared-data native storage: Multi-warehouse, indexing, incremental MVs, and time travel.
- Deeper Apache Iceberg support: Full DELETE/UPDATE/MERGE support and Iceberg v3.
- Stronger execution engine: Result caching, optimizer improvements, and better scan parallelism.
- Better observability: Tools for operating StarRocks at scale.
Join the discussion on GitHub!
👉 StarRocks: 2025 Year in Review
2025 was a milestone year for StarRocks. From major performance breakthroughs and real-time analytics advancements to deeper lakehouse integration and growing production adoption, the project continued to evolve alongside a fast-growing global community. If you’re curious what actually moved (and why), this recap is the fastest way to catch up.
Webinars
👉 On-Demand: StarRocks at Fresha — Carving Streams into Rock
We sat down with Anton Borisov (Principal Data Architect) to walk through Fresha’s real-time analytics platform—one of the first StarRocks production deployments in the UK. He discusses the constraints they faced (data freshness, concurrency, and join-heavy queries) and the trade-offs that mattered most.
👉 Upcoming: How Conductor Builds Sub-Second Agentic Analytics at Scale
Uday Rajanna (Principal Engineer, Data Platform) and Dominik Lange (Director, Data Platform) are joining us on Feb 26 for a behind-the-scenes look at how Conductor powers sub-second agentic analytics at scale.
Conductor is an end-to-end, enterprise AEO platform that combines AEO/SEO intelligence, AI content generation, and real-time website monitoring to help teams grow visibility in AI and traditional search.
They’ll break down how they used CelerData Cloud to deliver sub-second interactive analytics, how they’re enabling agentic workflows with MCP, and what their MCP server + OpenAI app integration looks like in practice.
We’ll cover:
- Platform architecture + scale (and where other approaches fell short)
- How they achieved sub-second query performance with CelerData Cloud
- How they structure reliable, LLM-ready context in production
Latest Reads: How It Works, How It Scales, And Where It’s Going
👉 Inside StarRocks: Why Joins Are Faster Than You’d Expect
Denormalization in OLAP isn’t usually a design preference—it’s a workaround for slow joins. StarRocks approaches the problem from the engine up. In this deep dive, we break down how StarRocks keeps data normalized while still delivering fast joins at query time by using a cost-based optimizer and efficient distributed execution. It details the join planning and reordering strategies behind StarRocks’ join speed, and why they remain effective in production.
👉 Escaping the Small-File Trap: How StarRocks Optimizes Bulk Ingestion
Bulk backfills in shared-data setups can turn into a small-file factory: frequent flushes to object storage create thousands of tiny writes, waste CPU, and hurt query performance later. This post explains how StarRocks (3.5+) minimizes small-file churn and remote write overhead by changing the ingestion pipeline (local spill → centralized merge → object storage), improving throughput and keeping post-load queries predictable.
👉 From Snowflake To StarRocks + Apache Iceberg: How Fanatics Cut 90% Of Analytics Cost At 6PB Scale
Analytics stacks rarely start complicated—they become complicated one “new engine for a new workload” at a time. Fanatics consolidated a fragmented stack into an open lakehouse powered by Iceberg + StarRocks, simplifying architecture while keeping performance fast for self-serve use cases. Highlights include major cost reduction, dramatically less Snowflake usage, and sub-second dashboards.
👉 2026 Is When Open Data, Real-Time Analytics and AI Agents Converge
The industry has been talking about “unified data” for years, but it usually lived on roadmaps rather than in production. 2026 feels different. Sida Shen explains why, pointing to a convergence driven by data engineering agents moving into scoped production work, Apache Iceberg becoming operationally stable, and customer-facing applications making real-time, governed analytics a core product requirement.
From the Community: Real Scale, Real Stories
👉 Northstar ⭐️ (open-source): Optimising StarRocks Queries in Practice: Scans, Joins, and What to Look For
If you’re tuning StarRocks queries, Jesús Gómez-Escalonilla Guijarro (Fresha) shares a clean, repeatable approach: start with scans and joins, use plans + profiles to confirm what’s really happening, and iterate from there. Plus a huge shoutout to Jesús and the Fresha Data Engineering team for building Northstar (open-source), a plan/profile visualizer that makes bottlenecks easier to spot and improvements easier to validate.
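If you want to start the same way before reaching for a visualizer, here’s a minimal sketch of pulling a plan straight from StarRocks over its MySQL protocol. The connection details, database, and the `orders`/`users` join are hypothetical placeholders, not from the post:

```python
# Minimal sketch: fetch a StarRocks query plan over the MySQL protocol.
# Host, port, credentials, and table names are hypothetical; adjust for your cluster.
import pymysql

QUERY = """
SELECT u.country, COUNT(*) AS orders
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= '2026-01-01'
GROUP BY u.country
"""

conn = pymysql.connect(host="127.0.0.1", port=9030, user="root",
                       password="", database="demo")
try:
    with conn.cursor() as cur:
        # Optionally collect runtime profiles for this session's queries.
        cur.execute("SET enable_profile = true")
        # EXPLAIN returns the physical plan line by line: check scans first
        # (predicates, rows), then join order and join type (broadcast vs. shuffle).
        cur.execute("EXPLAIN " + QUERY)
        for (line,) in cur.fetchall():
            print(line)
finally:
    conn.close()
```

From there the post’s advice applies: confirm what the plan is actually doing, change one thing, and re-check before moving on.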
👉 PlaySimple Games: Querying Billions of User Events in Seconds: Our Journey with StarRocks
Sukhjot Singh from PlaySimple Games shares how complex joins and deep timeline reconstruction pushed their Trino + Druid stack to the limit. By switching to StarRocks, the team cut hours of manual SQL recomputation and sped up end-to-end funnel workflows by ~30×.
👉 Fresha: The Real-Time Data Journey: Connecting Flink + Airflow + StarRocks — Part 2
Following up on their first post, Nicoleta Lazar shares how the team built and scaled CDC pipelines with Apache Flink, Airflow, and StarRocks—covering what worked, key lessons learned, and the real-world decisions behind streaming data from PostgreSQL → Kafka → StarRocks for real-time analytics.
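Their final hop into StarRocks goes through Flink’s StarRocks connector; as a smaller illustration of what the Kafka → StarRocks leg can look like without Flink, here is a hedged sketch that submits a StarRocks Routine Load job over the MySQL protocol. It is not Fresha’s setup, and the broker, topic, database, and table names are made up:

```python
# Minimal sketch (not Fresha's pipeline): a Routine Load job that continuously
# pulls JSON events from Kafka into a StarRocks table.
# All names (cluster address, topic, database, table, columns) are hypothetical.
import pymysql

ROUTINE_LOAD = """
CREATE ROUTINE LOAD demo.load_user_events ON user_events
COLUMNS (user_id, event_type, event_time)
PROPERTIES (
    "format" = "json",
    "desired_concurrent_number" = "1"
)
FROM KAFKA (
    "kafka_broker_list" = "kafka:9092",
    "kafka_topic" = "pg_cdc.user_events",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
)
"""

conn = pymysql.connect(host="127.0.0.1", port=9030, user="root",
                       password="", database="demo")
try:
    with conn.cursor() as cur:
        cur.execute(ROUTINE_LOAD)  # submit the continuous load job
        cur.execute("SHOW ROUTINE LOAD FOR demo.load_user_events")
        print(cur.fetchall())      # inspect job state, lag, and error rows
finally:
    conn.close()
```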
👉 What Is StarRocks? A Complete Guide Covering All the Key Concepts
Muaaz Muzammil from Devrolls put together a clear, hands-on guide to StarRocks—what it is, how the FE/BE/CN architecture works, and how it fits into modern real-time analytics stacks. He also shares a practical comparison with Apache Iceberg based on his own testing, including where StarRocks internal tables tend to shine for latency-sensitive workloads.
🤖 Tooling Spotlight: Meet @Rocky
Need help finding a doc or debugging an error message? We’ve rolled out the StarRocks AI Assistant.
- In Slack: Simply tag @Rocky in the #questions-and-troubleshooting channel (or use #ask-ai for longer threads).
- In Docs: You’ll find the chat icon in the bottom-right corner of the documentation site. Give it a try and let us know if it helps you ship faster.
Give it a spin and tell us what you love (or what you’d change)!
January Events Recap: Six Cities, Lots Of Stories
January weather didn’t make it easy, but the CelerData team hit the road—escaping the inbox for a bit to dig into real architectures, real constraints, and what teams are actually shipping in production. We were on the ground at Open Lakehouse + AI in New York City (Jan 20) and Chicago (Jan 22) with The Open Source Analytics Community, where Simo/Chelsea Wang shared a practical playbook for low-latency, high-concurrency analytics on Apache Iceberg. We also made it to Austin for Data Day Texas (Jan 24), and kept the momentum going in Tokyo with Apache Iceberg Meetup Japan #4 (Jan 21) and Open Data Circle — Lakehouse Meetup #2 (Jan 27).

We’ve loved seeing our users and customers take StarRocks beyond their own teams and into local data communities.
At Data Engineers London, Anton Borisov, Nicoleta Lazar, and Emiliano Mancuso from Fresha shared how they deliver exceptional insights from real-time data—walking through different approaches to joining streams with Apache Flink SQL, from traditional stream joins to delta joins, and how those pipelines feed into StarRocks as their next-generation analytical database.
And in Tel Aviv, Yoav Nordmann from Tikal spoke at The Modern Data Stack: Managing Iceberg Lakehouse meetup, sharing how StarRocks enables high-performance analytics on Apache Iceberg within a modern, open data stack.
And Finally…
If you made it this far, you’re officially past the “January planning” phase and into the part of the year where the best wins are the ones that keep paying off.
A quick ask before you go: if you’re running StarRocks (or evaluating it seriously), we’d love your input on the 2026 roadmap—especially the “must-not-break” areas and the gaps that matter most over the next 6–12 months. The best roadmap feedback usually comes from real workloads and real constraints.
And if you’ve got a story, tuning trick, deployment playbook, or “this took us way too long to figure out” lesson, send it our way—we’re always looking to highlight the most useful production learnings from the community.
Here’s to a 2026 with fewer workarounds, faster dashboards, and a lot more time spent building than debugging. 💙
Prefer updates in your feed? Subscribe to our LinkedIn newsletter.