3 Data Lakehouse Predictions for 2024


Throughout 2023, data lakehouses played a pivotal role in how businesses stored, accessed, and analyzed their vast amounts of information. Yet, as we approach 2024, a shift is on the horizon towards a more capable data lakehouse.

This move towards keeping data and workloads within the data lakehouse, and abandoning data in motion, suggests that opportunities abound for organizations looking for better performance in the new year, but only if they're willing to embrace change. Let's take a look at what's coming in 2024.

Prediction 1: The Diminishing Role of Data Warehouses In Query Acceleration

Expect a shift in the functionality of data warehouses, driven by advancements in query engines. As we look towards 2024, the need for separate data warehouses to accelerate queries is expected to decline significantly. This shift addresses key pain points associated with maintaining a separate data warehouse:

Addressing cost and complexity: Data warehouses are infrastructure-level software that requires significant maintenance effort. Ingesting data into them also places major demands on hardware resources, in terms of both computing and storage.
Improving data governance: Replicating data to other systems poses risks to data integrity and governance, and it takes ongoing effort to keep data consistent and reliable between systems.

Prediction 2: On-Demand Pre-Computation On The Rise

2024 will see a move towards on-demand pre-computation, driven by performance leaps in modern query engines and their integration with techniques such as query rewrite for materialized views.

Traditionally, pre-computation involved extensive upfront planning, and most queries ended up relying on pre-computed tables. This was because existing ETL tools were rigid and query engines could not handle modern workloads on the fly. The approach often wasted labor and resources, as many pre-computed tables never got used.

The move to on-demand pre-computation promises significant labor savings by eliminating extensive planning for pre-computation. It also reduces the need for building and maintaining unnecessary pre-computation pipelines and offers greater querying flexibility.
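To make the idea concrete, here is a minimal Python sketch of on-demand pre-computation. It is a toy model, not how any particular query engine implements materialized views: the class `OnDemandPrecompute`, its `threshold` parameter, and the `query_sum` method are all hypothetical names introduced for illustration. The sketch answers queries directly from the base data at first, and only materializes an aggregate once that query pattern has been requested often enough. After that, the same query is transparently served from the materialized result, which loosely mirrors what query rewrite does.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, object]


class OnDemandPrecompute:
    """Toy model: materialize an aggregate only after it proves popular.

    Hypothetical illustration only. It ignores data refresh, so a
    materialized result would go stale if base_rows changed.
    """

    def __init__(self, base_rows: List[Row], threshold: int = 3) -> None:
        self.base_rows = base_rows
        self.threshold = threshold          # requests needed before materializing
        self.hits: Counter = Counter()      # request count per query pattern
        self.materialized: Dict[Tuple[str, str], Dict[object, float]] = {}

    def _aggregate(self, group_key: str, value_key: str) -> Dict[object, float]:
        # Full scan of the base data: the "expensive" path.
        totals: Dict[object, float] = {}
        for row in self.base_rows:
            group = row[group_key]
            totals[group] = totals.get(group, 0) + row[value_key]
        return totals

    def query_sum(self, group_key: str, value_key: str):
        """Return (result, source), where source is 'base' or 'materialized'."""
        pattern = (group_key, value_key)
        # "Query rewrite": the caller issues the same query either way.
        if pattern in self.materialized:
            return self.materialized[pattern], "materialized"
        self.hits[pattern] += 1
        result = self._aggregate(group_key, value_key)
        # Pre-compute on demand, once the pattern has been asked for enough.
        if self.hits[pattern] >= self.threshold:
            self.materialized[pattern] = result
        return result, "base"


orders = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
]
engine = OnDemandPrecompute(orders, threshold=2)
for _ in range(3):
    totals, source = engine.query_sum("region", "amount")
    print(source, totals)
```

In this sketch nothing is planned upfront: a query pattern that is never repeated never costs a pipeline, while a popular one gets accelerated without the caller changing the query.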

Prediction 3: Open, Versatile, Broader Application Scope

In 2024, data lakehouse architectures are expected to reach new heights in versatility. This will enable them to support a wider range of dynamic workloads, significantly broadening their application scope.

The key factor driving this development is the data lakehouse's openness. Open systems foster innovation and flexibility, allowing integrations with diverse tools and technologies. This adaptability invites a broader community of developers to contribute, accelerating the introduction of features and functionalities. Such an environment not only nurtures technological advancements but also enables data lakehouses to adapt rapidly to evolving business needs.

Databricks' UniForm and Onehouse's OneTable are prime examples. They exemplify the benefits of open architectures that allow for the seamless interpretation of data across different lake formats. Such advancements will pave the way for more flexible and capable data lakehouse systems.

Be Prepared for the Year Ahead

2024 is not just about incremental improvements but a significant shift in how businesses store, access, and analyze information. It's time to embrace a more capable data lakehouse architecture with a broader application scope, one that reduces the need for data movement and enables a more efficient, 'data-in-place' approach to analytics.

If you'd like to get a head start on things, you can try CelerData Cloud for free right now. It's the best gift you can give your engineers, your analysts, and yourself for this new year. Get started here.

Sida Shen

Sida Shen is a contributor to the StarRocks project and a product manager at CelerData. As an engineer with a background in building machine learning and big data infrastructures, he oversees the company’s market research while working closely with engineers and developers across the analytics industry to tackle challenges related to big data and AI.
