For years, denormalization has been the go-to workaround when query engines couldn’t handle complex joins at scale. Flatten your data, run a nightly pipeline, and pray it doesn’t break. Sound familiar? This approach works — until it doesn’t. As customer-facing analytics becomes more interactive and real-time, the cracks start to show. In this article, we’ll look at why teams denormalize in the first place, what it really costs, and what it takes to avoid those costs.
Teams don’t denormalize because they want to; they do it because they have to. Traditional analytical engines struggle to execute complex, multi-table joins at scale, especially at the latency and concurrency that customer-facing workloads demand.
To make dashboards responsive, engineers precompute as much as possible.
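As a rough sketch of what that precomputation looks like, a nightly job might flatten several normalized tables into one wide table. The orders/customers/products schema below is hypothetical, not taken from any specific system:

```sql
-- Hypothetical nightly job: flatten normalized tables into one wide table
-- so dashboards never have to join at query time.
CREATE TABLE flat_orders AS
SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,   -- copied onto every matching order row
    c.region,
    p.product_name,
    p.category
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;
```

Every customer and product attribute gets duplicated onto every order row, and the wide table has to be rebuilt or carefully patched whenever any of the source tables change.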
Flattening data may save the cost of running joins on the fly, but it introduces deep architectural trade-offs that become painful at scale.
In internal BI workflows, these compromises might be tolerable. But when analytics is embedded in the product — visible to customers, partners, or sellers — any delay, mismatch, or failure becomes a user experience issue.
Skipping denormalization sounds appealing, but it only works if the engine can handle joins and aggregations reliably at scale. That requires more than just SQL syntax.
Architectural capabilities matter: the engine has to be built to execute joins on the fly, distribute that work across the cluster, and keep doing so efficiently under real-world concurrency.
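With an engine built that way, a dashboard can query the normalized tables directly and let the engine join them at request time. Continuing the same hypothetical schema from the earlier sketch:

```sql
-- Hypothetical dashboard query: the engine joins normalized tables on the fly,
-- so there is no wide table to precompute or keep fresh.
SELECT
    c.region,
    p.category,
    SUM(o.amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY c.region, p.category
ORDER BY revenue DESC;
```

Whether a query like this stays fast under production concurrency is exactly what those architectural capabilities determine.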
With the right foundation, normalized workloads can scale in production. Demandbase shows what that looks like in practice.
Demandbase — a leading B2B go-to-market platform — initially relied on ClickHouse, where limited JOIN performance forced them to denormalize everything, and they ran into exactly the costs described above.
After switching to CelerData (powered by StarRocks), they were able to run JOINs at query time over normalized data, eliminating denormalization in most cases, along with the costs that came with it.
Our latest white paper walks through the full system design behind scalable customer-facing analytics.
Learn how to ditch denormalization. Read the full white paper here.