Throughout 2023, data lakehouses played a pivotal role in how businesses stored, accessed, and analyzed their vast amounts of information. Yet, as we approach 2024, a shift is on the horizon towards a more capable data lakehouse.

This move toward keeping data and workloads within the data lakehouse, rather than moving data between systems, suggests that opportunities abound for organizations looking for better performance in the new year, but only if they're willing to embrace change. Let's take a look at what's coming in 2024.


Prediction 1: The Diminishing Role of Data Warehouses In Query Acceleration

Expect a shift in the functionality of data warehouses, driven by advancements in query engines. As we look towards 2024, the need for separate data warehouses to accelerate queries is expected to decline significantly. This shift addresses key pain points associated with maintaining a separate data warehouse:

  • Addressing cost and complexity: Data warehouses are infrastructure-level software that requires significant maintenance effort. Data ingestion also places major demands on hardware resources, in terms of both compute and storage.

  • Improving data governance: Replicating data to other systems puts data integrity and governance at risk, and keeping the copies consistent and reliable takes ongoing effort.


Prediction 2: On-Demand Pre-Computation On The Rise

In 2024, expect a move toward on-demand pre-computation, driven by performance leaps in modern query engines and by their integration with newer techniques such as query rewrite for materialized views.

Traditionally, pre-computation involved extensive upfront planning, and most queries ended up relying on pre-computed tables. This was a consequence of rigid ETL tools and of query engines that could not handle modern workloads on the fly. The approach often wasted labor and resources, because many pre-computed tables were never used.

The move to on-demand pre-computation promises significant labor savings by eliminating extensive planning for pre-computation. It also reduces the need for building and maintaining unnecessary pre-computation pipelines and offers greater querying flexibility.
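To make the idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is a toy model rather than how any particular query engine works: the class name OnDemandPrecompute, the sales table, the threshold parameter, and the summary-table naming are all assumptions made for illustration. An aggregate is computed directly from the base table until it has been requested often enough, at which point a summary table is built and later requests are rewritten to read from it. Keeping the summary fresh as the base table changes is deliberately left out.

```python
import sqlite3
from collections import Counter


class OnDemandPrecompute:
    """Toy model: build a summary table only after an aggregate is requested often enough."""

    ALLOWED_COLUMNS = ("region", "product")  # whitelist, because names are interpolated into SQL

    def __init__(self, conn, threshold=2):
        self.conn = conn
        self.threshold = threshold   # requests needed before we pre-compute
        self.hits = Counter()        # how often each group-by column was requested
        self.materialized = set()    # columns that already have a summary table

    def total_by(self, column):
        """Return ({value: total amount}, name of the table that served the query)."""
        if column not in self.ALLOWED_COLUMNS:
            raise ValueError(f"unsupported column: {column}")
        self.hits[column] += 1
        summary = f"sales_by_{column}"

        # Pre-compute on demand: only once the query pattern has proven popular.
        if column not in self.materialized and self.hits[column] >= self.threshold:
            self.conn.execute(
                f"CREATE TABLE {summary} AS "
                f"SELECT {column}, SUM(amount) AS total FROM sales GROUP BY {column}"
            )
            self.materialized.add(column)

        # "Query rewrite": serve from the summary table when one exists.
        if column in self.materialized:
            rows = self.conn.execute(f"SELECT {column}, total FROM {summary}").fetchall()
            return dict(rows), summary

        # Otherwise aggregate the base table on the fly.
        rows = self.conn.execute(
            f"SELECT {column}, SUM(amount) FROM sales GROUP BY {column}"
        ).fetchall()
        return dict(rows), "sales"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EU", "a", 10), ("EU", "b", 5), ("US", "a", 7)],
    )
    engine = OnDemandPrecompute(conn, threshold=2)
    first = engine.total_by("region")    # served from the base table
    second = engine.total_by("region")   # triggers and uses the summary table
    return first, second
```

Calling demo() returns the same totals twice, but the second result is served from the summary table instead of the base table. The caller's request does not change, which is the point: nobody had to plan the summary table in advance.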


Prediction 3: Open, Versatile, Broader Application Scope

In 2024, data lakehouse architectures are expected to reach new heights in versatility. This will enable them to support a wider range of dynamic workloads, significantly broadening their application scope.

The key factor driving this development is the open nature of the data lakehouse. Open systems foster innovation and flexibility, allowing integrations with diverse tools and technologies. This adaptability invites a broader community of developers to contribute, accelerating the introduction of new features. Such an environment not only nurtures technological advancement but also enables data lakehouses to adapt rapidly to evolving business needs.

Databricks' UniForm and Onehouse's OneTable are prime examples. They exemplify the benefits of open architectures that allow for the seamless interpretation of data across different lake formats. Such advancements will pave the way for more flexible and capable data lakehouse systems.


Be Prepared for the Year Ahead

2024 is not just about incremental improvements but about a significant shift in how businesses store, access, and analyze information. It's time to embrace a more capable data lakehouse architecture with a broader application scope, one that reduces the need for data movement and enables a more efficient, 'data-in-place' approach to analytics.

If you'd like to get a head start on things, you can try CelerData Cloud for free right now. It's the best gift you can give your engineers, your analysts, and yourself for this new year. Get started here.
