Petabyte-Scale Data | Blazing Fast Ingestion | Open and Resilient
Demandbase is the leading account-based go-to-market platform for B2B enterprises. The company helps businesses identify and target the right customers at the right time with the right message through unified intent data, AI-powered insights, and prescriptive actions.
Thousands of companies rely on Demandbase to maximize revenue and consolidate their data and tech stacks into one platform. The platform combines and processes marketing data from numerous sources, handling both bulk imports and streaming events. Core to the product is processing data at scale and delivering flexible, fast reporting and insights. To support this, Demandbase needed a high-performance data infrastructure that could handle growing and complex workloads while maintaining near real-time responsiveness.
Demandbase faced a critical architectural decision as their data volumes scaled into the petabyte range. The company needed a solution that could handle massive data growth while supporting both batch and streaming workloads without compromising query performance.
The platform required several capabilities that traditional data warehouses struggled to deliver together: concurrent writes from multiple systems, separation of storage from compute for cost efficiency and flexibility, and strong isolation between batch processing and real-time analytics. The company evaluated multiple architectures, but traditional data warehouses either couldn't scale cost-effectively or required complex workarounds to run their streaming and batch workflows side by side.
After evaluating their architecture needs, Demandbase built a data lakehouse combining Apache Iceberg as the storage layer with StarRocks as the analytics engine.
Apache Iceberg provided the storage foundation, while StarRocks served as the low-latency query engine.
The architecture connects the two through batch and streaming patterns.
For batch operations, StarRocks' INSERT OVERWRITE accepts a SELECT statement, which lets the Demandbase team query Iceberg directly and refresh native tables at the partition level on demand. StarRocks' built-in task queue manages these operations with configurable parallelism, as sketched below.
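As a rough sketch of what this pattern can look like (the catalog, database, table, and partition names here are hypothetical, and the exact catalog properties depend on the Iceberg deployment):

```sql
-- Expose Iceberg tables to StarRocks through an external catalog
-- (hypothetical names; properties vary with the Iceberg catalog type).
CREATE EXTERNAL CATALOG iceberg_cat
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Queue an asynchronous partition-level refresh of a native table.
-- SUBMIT TASK hands the INSERT OVERWRITE to StarRocks' task queue,
-- which executes queued tasks with configurable concurrency.
SUBMIT TASK refresh_accounts_20240101 AS
INSERT OVERWRITE analytics.accounts PARTITION (p20240101)
SELECT * FROM iceberg_cat.marketing.accounts
WHERE dt = '2024-01-01';
```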
On the streaming side, Spark jobs write change events to Kafka for StarRocks to consume. StarRocks routine loads pull from Kafka topics and update designated tables on the fly, maintaining load state for tracking and supporting full-table, partial, and conditional update patterns. Landing the data in primary key tables enables create, update, and delete operations.
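A minimal sketch of such a routine load, assuming a hypothetical table, topic, and JSON event layout, might look like this:

```sql
-- Primary key table so that streamed rows can be upserted or deleted.
CREATE TABLE analytics.accounts (
    account_id BIGINT,
    name VARCHAR(256),
    created_at DATETIME,
    score INT
)
PRIMARY KEY (account_id)
DISTRIBUTED BY HASH (account_id);

-- Continuously pull JSON events from Kafka. StarRocks tracks its own
-- load progress, and the __op column (0 = upsert, 1 = delete) drives
-- create/update/delete semantics on the primary key table.
CREATE ROUTINE LOAD analytics.accounts_stream ON accounts
COLUMNS (account_id, name, created_at, __op)
PROPERTIES (
    "format" = "json",
    "jsonpaths" = "[\"$.account_id\",\"$.name\",\"$.created_at\",\"$.op\"]"
)
FROM KAFKA (
    "kafka_broker_list" = "broker-1:9092",
    "kafka_topic" = "account-events",
    "property.group.id" = "starrocks-prod"
);
```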
A critical use case for Demandbase was surfacing net-new records in their data pipeline. Using partial routine loads, StarRocks streams create and delete changes for the columns populated at creation, providing immediate visibility. After the nightly Iceberg enrichment completes, StarRocks loads the fully enriched data via INSERT INTO, which acts as an upsert for existing records.
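The nightly backfill half of this dual-loading pattern might look like the sketch below (names hypothetical); on a primary key table, INSERT INTO writes rows whose keys already exist as in-place updates:

```sql
-- After the nightly Iceberg enrichment, load the fully enriched rows.
-- Because analytics.accounts is a primary key table, rows whose
-- account_id already exists are updated in place (upsert semantics).
INSERT INTO analytics.accounts
SELECT account_id, name, created_at, score
FROM iceberg_cat.marketing.accounts_enriched
WHERE dt = '2024-01-01';
```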
Architecturally, Iceberg now acts as the source of truth, enabling enterprise disaster recovery strategies: teams can stand up a new StarRocks cluster in another region and load it directly from Iceberg. For upgrades, they employ blue-green deployments, standing up a second cluster on the newer version, loading it from Iceberg, mirroring production traffic, and then starting routine loads with different consumer groups. Traffic then cuts over via Route 53.
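On the green (new-version) cluster, the cutover could follow a sketch like this, under the same hypothetical names as above: backfill from Iceberg, then start a routine load with a separate consumer group so the two clusters consume the topic independently.

```sql
-- 1. Backfill the new cluster directly from the Iceberg source of truth.
INSERT OVERWRITE analytics.accounts
SELECT account_id, name, created_at, score
FROM iceberg_cat.marketing.accounts_enriched;

-- 2. Start streaming with a different consumer group than the blue
--    (production) cluster, so each cluster consumes the topic on its own.
CREATE ROUTINE LOAD analytics.accounts_stream_green ON accounts
COLUMNS (account_id, name, created_at, __op)
PROPERTIES ("format" = "json")
FROM KAFKA (
    "kafka_broker_list" = "broker-1:9092",
    "kafka_topic" = "account-events",
    "property.group.id" = "starrocks-green"
);
```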
On top of the core pipeline, Demandbase built a lightweight load service on top of StarRocks to streamline user requests and resource delegation. Internal teams submit requests specifying tables and partitions, and the service handles task submission, completion tracking, and retries using StarRocks' task management features. A similar service manages routine loads, automatically handling schema changes by stopping loads, validating upstream data, and creating updated routine loads.
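A service like this can lean on StarRocks' own task metadata for completion tracking. For example (task and job names hypothetical), it might poll information_schema.task_runs and manage routine load jobs directly:

```sql
-- Track an asynchronous load task submitted on a team's behalf; state
-- moves through PENDING/RUNNING to SUCCESS or FAILED, which drives
-- completion tracking and retry decisions.
SELECT task_name, state, error_message
FROM information_schema.task_runs
WHERE task_name = 'team_a_accounts_refresh';

-- On an upstream schema change: stop the old streaming job, validate
-- the new data shape, then re-create the routine load with updated
-- column mappings.
STOP ROUTINE LOAD FOR analytics.accounts_stream;
```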
The StarRocks and Iceberg architecture delivered significant operational and cost benefits.
The architecture proved particularly valuable for concurrent access patterns. Multiple systems write to Iceberg simultaneously while StarRocks queries the same data for analytics. Partitioned workloads during enrichment avoid conflicts, and teams report minimal issues with concurrent writes to the same Iceberg tables.
Performance comparisons during proof-of-concept testing showed significantly better query latency from StarRocks native tables than from querying Iceberg directly via external catalogs. This validated the approach of loading frequently accessed data into StarRocks while keeping rarely used tables as external Iceberg references.
Demandbase continues expanding their StarRocks and Iceberg integration.
Want the story straight from the source? In this StarRocks Summit 2025 session, Connor Clark, Staff Software Engineer at Demandbase, breaks down how his team runs Iceberg and StarRocks together as a unified analytics platform. From nightly INSERT OVERWRITE batch loads with StarRocks Submit Task to Spark-driven CDC streams and a dual-loading strategy across Iceberg and StarRocks, he shows exactly how they keep customer-facing analytics both fast and reliable. Watch now!