CelerData Blog

February 2026 Highlights: Sub-Second Agentic Analytics, CelerData Cloud & Engine Internals

Written by CelerData | Mar 2, 2026 11:11:14 PM

Our February edition is a mix of production wins, engine internals, and community deep dives—with a look at where things are heading: sub-second analytics serving both end users and AI agents at scale—from the same engine.

This month: one team shares how they rebuilt their data platform on CelerData Cloud to power both interactive dashboards and LLM-driven workloads—including their MCP server + OpenAI integration in production. Others walk through unifying fragmented Trino and ClickHouse stacks, hitting sub-second on Iceberg with materialized views, and building governed analytics with dbt. Plus a deep dive on vectorized execution, a new Snowflake Horizon integration, and community posts on federation and fast joins.

 

Agentic Analytics, AI & the Cloud Data Warehouse


👉 On-Demand: How Conductor Builds Sub-Second Agentic Analytics at Scale

Whether you're rethinking your data infrastructure or exploring what it takes to make analytics agent-ready, this session from Dominik Lange and Uday Rajanna at Conductor is worth your time — a genuine thank you to both for the depth and honesty they brought to it.

Dominik covered the full data infrastructure rebuild that got Conductor to sub-second query performance on large datasets with CelerData Cloud. Uday broke down the agentic layer — MCP Server, OpenAI integration, and the split reasoning architecture that keeps LLMs focused on intent while the data API handles the precision work.

And don't miss the live demo of the Claude integration at the end — an autonomous agent running a full AI search audit in real time!

No shortcuts, no hand-waving — just the full picture!


👉 On-Demand: Customer-Facing Analytics — How to Choose a Real-Time Cloud Data Warehouse

If you're building analytics into your product—or evaluating real-time cloud data warehouses for customer-facing workloads—this session breaks down what actually matters: interactive performance, multi-table analysis, data freshness, and operational simplicity. Includes a side-by-side comparison of ClickHouse Cloud and CelerData Cloud with a live demo and a real-world production use case.

 

CelerData Cloud: What Shipped & What It Unlocks

 

👉 Sub-Second Analytics at Scale with Snowflake Horizon Catalog and CelerData Cloud

For teams governing data in Snowflake, CelerData Cloud now integrates with Snowflake Horizon Catalog via the standard Iceberg REST Catalog protocol. You can run sub-second, high-concurrency analytics directly on your Snowflake-governed Iceberg data—no data copying, no separate serving layer. The post walks through the architecture, the multi-layer caching strategy, and what setup looks like in practice.

 

👉 SmartNews: Replacing Trino + ClickHouse with a Single Engine — CelerData Cloud

SmartNews ran ClickHouse for customer-facing advertiser analytics (p95 latency ~100ms, 3TB ingested daily, ~20TB managed) and Trino for ad-hoc queries and ML feature ETL. Maintaining both was costly and complex. After benchmarking against production workloads, they chose CelerData Cloud as a single replacement—achieving 3.6x faster ad-hoc query performance, stable sub-second latency at 800+ QPS, and efficient real-time joins without denormalization. Dennis Zhao walks through the evaluation, the results, and what's next: migrating storage from Hive to Apache Iceberg.

Want to see CelerData Cloud in action? Try it free for 30 days, no commitment required. Spin up your own environment and put it to the test against your real workloads!

Latest Reads: How It Works, How It Scales, and Where It's Going

 

👉 Deep Dive: How StarRocks Built a High-Performance Vectorized Engine

Vectorization gets talked about a lot, but the implementation details matter. Kaisen Kang (StarRocks TSC Member, Query Engine & AI Agent Team Lead) walks through how StarRocks uses CPU SIMD instructions to process multiple data elements in parallel, and why true database vectorization goes well beyond enabling a hardware feature. A good read if you want to understand what's actually happening under the hood when queries return in under a second.
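The SIMD instructions themselves live at the C++/compiler level, but the execution-model shift that makes them usable — operators working on column chunks instead of one row at a time — can be sketched in plain Python. This is a conceptual illustration only, not StarRocks code; the function and column names are made up for the example.

```python
# Conceptual sketch (not StarRocks internals): the same filter + sum query
# executed row-at-a-time vs. batch-at-a-time. Vectorized engines use the
# batch shape -- tight loops over flat columnar arrays -- which is what
# compilers can map onto SIMD registers that process many values at once.

def row_at_a_time(rows):
    # Tuple-at-a-time: each value flows through every operator individually.
    total = 0
    for price, qty in rows:
        if qty > 10:          # filter operator, one row per iteration
            total += price    # aggregate operator, one row per iteration
    return total

def batch_at_a_time(price_col, qty_col):
    # Whole column chunks move between operators. The filter produces a
    # selection bitmap; the aggregate consumes the surviving values.
    mask = [q > 10 for q in qty_col]
    return sum(p for p, keep in zip(price_col, mask) if keep)

rows = [(100, 5), (200, 20), (50, 15)]
price_col = [r[0] for r in rows]
qty_col = [r[1] for r in rows]
assert row_at_a_time(rows) == batch_at_a_time(price_col, qty_col) == 250
```

The point of the sketch: identical results, very different memory access patterns. The batch version is the one hardware rewards, and Kaisen's post covers what it takes to keep every operator in the engine in that shape.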

 

👉 DataOps-Driven Governance and Analytics with dbt and StarRocks

Jacky Wu (dbt-starrocks contributor, Senior Enterprise Solution Manager at SJM Resorts) lays out how to unify data modeling, automation, and analytics into a single framework using dbt, StarRocks, and DataOps practices. The post covers dbt's role in governance automation, how DataOps improves iteration speed and control, and includes real-world case studies showing how the approach works in production for both real-time and batch scenarios.

 

From the Community: Real Scale, Real Stories

 

👉 Why Coinbase and Pinterest Chose StarRocks: Lakehouse-Native Design and Fast Joins at Terabyte Scale

Simon Späti put together an in-depth look at why StarRocks is gaining traction in the real-time analytics space—with interviews from Eric Sun and Anton Borisov, and production details from Coinbase, Pinterest, and Fresha. Key takeaways: joins are consistently the differentiator (Coinbase's TPC-H 1TB benchmark saw ClickHouse fail 12 of 22 queries), colocated joins are surprisingly simple in concept, Pinterest cut p90 query latency by 50% on 32% of their previous Druid infrastructure, and cold S3 data still returns in 3–5 seconds when Iceberg metadata is well-sorted. A practical, balanced deep dive—including the trade-offs and when other tools might still fit.

 

👉 Naver: Iceberg Low-Latency Queries with Materialized Views

NAVER Corp Commerce, one of South Korea's leading e-commerce platforms, needed sub-second analytics on real-time transactional data—~15 analytical dimensions, ~13 metrics, dynamic and unpredictable query patterns, and 7 weeks of historical comparison. Namchun Hong (홍남춘) shares how the team built a low-latency platform using Apache Iceberg for storage, a StarRocks external catalog with aggressive metadata caching, and StarRocks Materialized Views for pre-aggregated queries. The results speak for themselves: Trino on Iceberg took ~1 minute; StarRocks MVs returned in under 1 second. 90% of production dashboard queries now return sub-second.
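The materialized-view pattern behind those numbers — aggregate raw events once, then answer dashboard queries from the small pre-aggregated table instead of rescanning raw data — can be sketched in a few lines. Purely illustrative: real StarRocks MVs are defined in SQL and refreshed by the engine, and all names below are invented.

```python
from collections import defaultdict

# Illustrative sketch of the materialized-view idea. refresh_mv() plays the
# role of the engine's MV refresh: one pass over raw events, keyed by the
# dashboard's grouping dimensions. Queries then read the tiny MV.

raw_events = [
    {"day": "2026-02-01", "category": "shoes", "sales": 120},
    {"day": "2026-02-01", "category": "bags",  "sales": 80},
    {"day": "2026-02-02", "category": "shoes", "sales": 200},
]

def refresh_mv(events):
    mv = defaultdict(int)
    for e in events:
        mv[(e["day"], e["category"])] += e["sales"]
    return dict(mv)

mv = refresh_mv(raw_events)

def sales_by_day(mv, day):
    # Dashboard query: touches a handful of pre-aggregated rows, not the
    # raw event log -- the source of the minute-to-sub-second speedup.
    return sum(v for (d, _), v in mv.items() if d == day)

assert sales_by_day(mv, "2026-02-01") == 200
assert mv[("2026-02-02", "shoes")] == 200
```

The trade-off is the classic one: refresh cost and storage for the MV in exchange for queries whose work is proportional to the number of dimension combinations, not the number of raw rows.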

 

👉 Fresha: Jack of All Trades — Query Federation in Modern OLAP Databases

Nicoleta Lazar from Fresha digs into query federation: what it is, why it matters for modern OLAP workloads, and how StarRocks approaches it differently from Trino. The post covers the vectorized execution engine, native connectors, deep Apache Iceberg integration, and real-world challenges like schema evolution, file fragmentation, and object-storage latency. She also walks through Fresha's hot/cold data separation strategy and federating additional sources like Elasticsearch, PostgreSQL, and Apache Paimon into a single analytical layer.
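The hot/cold separation Nicoleta describes can be thought of as a router: queries over recent data hit low-latency native storage, while older ranges federate out to the lake. The sketch below is a toy illustration of that routing decision only — the 7-day threshold and store names are invented, not Fresha's actual configuration.

```python
from datetime import date, timedelta

# Toy illustration of hot/cold query routing in a federated setup.
# Recent ranges go to fast native storage; older ranges go to an
# Iceberg external catalog over object storage. Threshold is made up.

HOT_WINDOW = timedelta(days=7)

def route(query_start: date, today: date) -> str:
    if today - query_start <= HOT_WINDOW:
        return "native_hot_table"      # local, sub-second storage
    return "iceberg_external_catalog"  # colder, object-storage-backed

assert route(date(2026, 2, 27), date(2026, 3, 1)) == "native_hot_table"
assert route(date(2025, 12, 1), date(2026, 3, 1)) == "iceberg_external_catalog"
```

In a real federated engine this split is usually transparent to the query author; the sketch just makes the underlying decision explicit.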

 

👉 Fresha: Optimising StarRocks Queries in Practice: Scans, Joins, and What to Look For

Still getting traction from last month—Jesús Gómez-Escalonilla Guijarro (Fresha) shares a clean, repeatable approach to tuning StarRocks queries: start with scans and joins, use plans + profiles to confirm what's really happening, and iterate from there. Plus a shoutout to the Fresha Data Engineering team for building Northstar (open-source)—a plan/profile visualizer that makes bottlenecks easier to spot and improvements easier to validate.

 

Upcoming Events

 

Open Lakehouse and AI Meetups — Austin (Mar 10) & San Francisco (Mar 12)

Kaisen Kang (Head of Query & Agent Team, CelerData) shares the 10 core engine capabilities needed to power AI data agents in production—with real examples from StarRocks. Plus talks from Altinity, Grafana Labs, Fivetran, and PostHog on Iceberg, data lake visualization, and AI-ready context.

📍 Austin (Mar 10): Register here 📍 San Francisco (Mar 12): Register here

 

Data Streaming World Tour — Seattle AI Day (Mar 17) & Jersey City (Mar 26)

Confluent's Data Streaming World Tour hits two cities. Seattle (Mar 17) is an AI-focused day with sessions on agentic AI use cases, context engineering, MCP, Agent2Agent, streaming agents on Flink, and a hands-on multi-agent workshop. Jersey City (Mar 26) covers production streaming architectures, lakehouse patterns, and how to build AI agents that ingest, process, and act on streaming data in real time. Both are free, with breakfast and lunch included. Space is limited.

📍 Seattle (Mar 17): Register here 📍 Jersey City (Mar 26): Register here

 

Iceberg Summit 2026 — San Francisco (Apr 8–9)

As a Gold Sponsor of the Iceberg Summit this year, we'll be at the Marriott Marquis in San Francisco on April 8–9. If you're building on Apache Iceberg or working in modern data analytics, this is a must-attend event. We'd love to meet you on the expo floor to talk architecture, swap tuning tricks, or just say hi!

👉 Register here

🏆 Rocky Is Out in the Wild

StarRocks Award recipients have been receiving their trophies, and we've already spotted a few on social media. We're here for it. Tag StarRocks or CelerData in your trophy photo — we want to see where Rocky is living now!

And Finally…

If you made it this far, you deserve a trophy as well! 🏆

A quick ask before you go: with so much of the industry shifting toward agentic workflows and real-time context, we want to know what you are building. If you're spinning up a CelerData Cloud trial or pushing our products to their limits with LLMs, let us know what's working, what's surprising you, and where you need more horsepower. The best engine upgrades always come from your toughest production constraints.

And if you've got a story, a tuning trick for feeding context to agents, or a "this took us way too long to figure out" lesson, send it our way. We're always looking to highlight the most useful production learnings from the community.

Here's to a month of sub-second queries, seamless joins, and analytics that actually keep up with your users and your agents. 💙