Materialized View

 

What is a Materialized View?

A materialized view is a database object that contains the results of a query. It may be a local copy of data located remotely, or a subset of the rows and/or columns of a table, or a summary using an aggregate function. The name materialized views can be broken down into two parts - materialization and views:

Materialization involves caching intermediate results of a query using additional storage space. This process accelerates the computation by reusing precomputed materialized results.

Views emphasize logical semantics, providing a unified schema for the upper layer, shielding complexity, and simplifying permission management.

For example, let's say you run an online store, and you have a database with millions of rows containing sales data. You often need to know the total sales for each product category. Instead of executing a complex query every time, you can create a materialized view that calculates the total sales for each category in advance. When you need the information, the database quickly reads the pre-aggregated data from the materialized view, saving time and resources.

Basic Concepts

  • Base Table: The original table specified when creating a materialized view (e.g., sales data table).

  • Materialized Table: A new table that is created based on the base table (e.g., total sales per product category).

  • Materialized View: The user-visible materialized view, which may include both the materialized table and incremental data.

  • Materialized View Task: A task that defines the processing from the base table to the materialized view table.

Materialized View

Materialized views can serve two distinct scenarios:
  • Query Acceleration: materialized views can speed up queries by using precomputed data instead of scanning the base table and performing calculations on the fly (e.g., calculating total sales per product category).
Materialized View_starrocks_celerdata_1
  • Data Modeling: Materialized views can generate a wide table by joining two or more tables and scheduling periodic updates, allowing direct querying on the generated materialized view (e.g., joining sales data with customer information).
Materialized View_starrocks_celerdata

 

Benefits of Materialized Views

  • Improved query performance: Materialized views simplify the process of optimizing performance in applications by relying on them for the intermediate results you use most frequently (e.g., faster access to total sales per product category).
  • Efficient handling of large datasets: Materialized views speed up queries by shifting the computational work to write time instead of query time (e.g., pre-calculating total sales per product category).
  • Greater consistency: Materialized views promote consistency by embedding business logic within data structures, ensuring its constant reuse (e.g., always calculating sales data in the same way).

 

Understand Materialized Views in StarRocks

StarRocks offers intelligent materialized views to accelerate queries and simplify real-time data pipelines. Compared to non-real-time materialized views that need to be refreshed manually in other databases, StarRocks’ real-time materialized views are automatically updated when changes are made to the base tables.
StarRocks automatically selects suitable materialized views to further accelerate queries during query planning. This allows it to rewrite queries to fetch results from the appropriate materialized views in order to increase the speed of queries.
What's more, with the help of multi-table materialized views, StarRocks significantly simplifies the data pipeline by cutting off complex data processes before the database.
In StarRocks v2.3 and earlier versions, synchronized materialized views, also known as Rollup, were available for single-table optimization. These synchronized materialized views offered higher data freshness and lower refresh costs, making them an attractive option for enhancing query performance. However, they also had several limitations compared to the asynchronous materialized views introduced in v2.4 and later.
Synchronized materialized views in earlier StarRocks versions were constrained by the available aggregation operators, which limited their ability to accelerate or rewrite queries effectively. As a result, users had fewer options for tailoring materialized views to meet their specific analytical needs.
With the introduction of asynchronous materialized views in StarRocks v2.4 and beyond, users now have more flexibility in choosing aggregation functions, filters, and join conditions when creating materialized views. This allows for greater customization and optimization of query patterns, leading to improved performance and efficiency for a broader range of analytical workloads.

Materialized View Use Cases in StarRocks

  • Transparent Acceleration: acceleration scenarios typically involve improving query performance without modifying user SQL (e.g., making sales data queries faster without changing the underlying SQL code).
  • Single-table Acceleration: pre-aggregation, cube optimization, and adjusting sorting columns (e.g., optimizing sales data for faster access to most popular products).
  • Multi-table Acceleration: de-normalization, multi-stream join, and ETL processing (e.g., joining sales data with customer and product tables for more comprehensive analysis).
  • ETL Processing:
    • Data Warehouse Layering: StarRocks supports ETL processing through its materialized views, enhancing data analysis across four data warehouse layers: ODS, DWD, DWS, and ADS (e.g., organizing data from raw inputs to aggregated summaries).
    • Scheduled Tasks: StarRocks schedules tasks related to materialized views through manual, periodic automatic, and stream-triggered scheduling (e.g., refreshing materialized views daily, weekly, or on-demand).
    • Data Lineage: by defining relationships between base tables and materialized views (MV), it's possible to organize multiple tables into a data map. Tasks, organized through the materialized view definitions, form a Directed Acyclic Graph (DAG) that reveals the flow of data updates (e.g., tracking how data moves through various tables and transformations).
    • Data Import: materialized views can encapsulate all data imports by converting all imports from external tables to internal tables. This approach simplifies and streamlines the data import process, making it more efficient and easier to manage (e.g., seamlessly integrating data from external sources).