Adaptive Query Execution (AQE)
Join StarRocks Community on Slack
Connect on SlackWhat is Adaptive Query Execution (AQE)?
Adaptive Query Execution (AQE) refers to a dynamic approach in query execution that moves away from the traditional "plan once, execute once" model. Instead, AQE collects additional statistics and runtime data, including information on data distribution, characteristics, and cluster resource usage during query execution. With this information, it continuously re-optimizes the execution plan and adjusts strategies in real-time. AQE offers several key advantages:
- From static to dynamic decisions: Execution plans are adjusted based on real-time information gathered during the query, allowing for more accurate decisions.
- From one-time planning to multiple planning steps: Instead of planning the query execution only once, the plan can be revised multiple times throughout the execution.
- From a single execution strategy to multiple strategies: AQE enables the system to switch between different execution strategies depending on the runtime data characteristics.
- Execution plans are driven by real data: The query execution adapts based on the actual data being processed, with different data sets potentially leading to different execution strategies.
In essence, AQE ensures that the execution plan and strategy are not fixed but instead evolve dynamically to maximize performance as the query runs. This adaptability is particularly beneficial for complex queries or environments with unpredictable data patterns.
Why Do We Need Adaptive Query Execution (AQE)?
Traditional databases typically use SQL as their user interface, which is a declarative language. SQL describes what the user wants to do but not how to do it. Therefore, the database first needs to pass the SQL query through an optimizer, which translates it into an executable physical plan (how), and then the query executor runs the plan.
However, the challenge with this approach is that a single SQL query can have thousands of possible execution plans, and the efficiency of each plan can vary significantly. In some cases, the difference in performance between execution plans can be enormous—by hundreds or even thousands of times. If the optimizer selects a suboptimal plan, no matter how powerful the execution engine is, the query performance will suffer. This is similar to a military strategy: if the overall strategy is flawed, no matter how capable the individual soldiers are, they won’t change the outcome of the battle.
The effectiveness of the optimizer heavily depends on the accuracy of statistics, selectivity estimates, and cardinality estimations. However, there are several harsh realities:
- Inaccurate or unavailable statistics: Often, the database lacks accurate statistics, or the available statistics are outdated or incomplete.
- Unreliable assumptions: When estimating selectivity and cardinality, many assumptions are made about data distribution and characteristics. Most of the time, these assumptions don't hold true in real-world scenarios.
- Changing data characteristics in streaming environments: In streaming scenarios, data is constantly flowing in, and its characteristics can change over time. It’s unrealistic to rely on a single static execution plan for the entire task.
- Exponential growth in query complexity: As queries grow to involve dozens or hundreds of tables, the number of possible execution plans increases exponentially. The query optimizer simply cannot explore all possible plans in a reasonable amount of time.
Given these challenges, the traditional "plan once, execute once" model is not suitable for all query scenarios. This is why we need Adaptive Query Execution (AQE)—an approach that allows the system to dynamically adjust execution plans during query runtime, improving query performance by reacting to real-time data and conditions.
How to Implement Adaptive Query Execution (AQE)
Adaptive Query Execution (AQE) aims to optimize queries dynamically, based on real-time data characteristics and execution feedback. This approach provides better query performance by avoiding the limitations of static, pre-generated plans. Let's explore several strategies for implementing AQE.
Top-Level Decisions: Multi-Step Planning
One of the simplest and most effective ways to achieve AQE is through multi-step planning. Instead of relying on a single execution plan, the system collects statistics at various points during execution (known as materialization points) and re-optimizes the plan accordingly.
- Materialization points refer to moments when the entire result set of a query step is available. In a Stage-by-Stage execution model, the end of each stage is a materialization point. In a Pipeline execution model, the completion of a block operator's execution marks a materialization point.
- A typical AQE strategy in this approach is to adapt the join execution strategy. For example, the query may start with a Shuffle Join, but if runtime statistics show that the right-hand table is small, the system may switch to a Broadcast Join for improved performance.
For a better understanding of joins, check out this video.
Examples of this strategy can be found in Spark, Oracle, and BigQuery.
Top-Level Decisions: Late Binding and Precomputing Multiple Plans
A related strategy is Late Binding, where the system prepares multiple execution plans ahead of time based on possible data conditions. For example, consider a query table_a JOIN table_b ON table_b.column1 = x
. If the statistics for table_b.column1
are missing, the system cannot accurately estimate the number of rows filtered in table_b
. However, if the system knows that table_a
has 10 million rows, it can prepare three different plans based on the possible size of table_b
after filtering.
This precomputation of multiple plans allows the system to select the best execution path once more accurate data becomes available during runtime.
Top-Level Decisions: Feedback Mechanism
Another effective strategy involves using historical query information to correct inaccurate optimizer statistics. Often, multiple consecutive queries are issued on the same columns of a table, allowing the system to gather data such as max
, min
, and cardinality for those columns. With this feedback loop, the optimizer can refine its statistics for future queries, leading to better execution plans.
While widely implemented in industry, this feedback mechanism has some limitations:
- It cannot resolve the lack of statistics for the first query on a column or table.
- It is ineffective in streaming environments where statistics are constantly changing.
Top-Level Decisions: Running Multiple Plans Simultaneously
An approach similar to speculative execution in systems like Spark or MapReduce is to execute multiple plans in parallel. Instead of selecting a single "best" plan at the outset, the system may choose two or three plans to run concurrently. After some execution time, it selects the fastest plan and cancels the others.
However, this method has drawbacks:
- It consumes more resources, as multiple plans are executed at once.
- The system can only execute a limited number of plans simultaneously, which may not be sufficient when dealing with highly complex queries.
Top-Level Decisions: Different Operators for Different Data Groups
Another advanced strategy is to group data based on specific characteristics and apply different execution plans to each group. This approach can integrate machine learning techniques to analyze data features and determine the most suitable plan for each data segment.
In such cases, the execution engine routes data to different operators based on the data's specific characteristics, enabling highly specialized and efficient processing paths for each group.
Mid-Level Decisions: Specialized Decision Operators
One way to enable more precise AQE is to introduce specialized decision operators into the query execution pipeline. A notable example is the Eddy operator, which acts as a decision-making control point in the pipeline.
- The Eddy operator manages multiple operators, adjusting the order in which tuples flow through them dynamically.
- For example, if a query involves a 3-table join, the Eddy operator controls the order in which rows are processed by each join operator. It can adjust dynamically to balance the workload, ensuring efficient use of system resources.
Mid-Level Decisions: Dynamic Plan Switching
Similar to global re-planning at runtime, dynamic plan switching allows the system to switch between different plans locally. For instance, consider a 3-table join followed by a GROUP BY
. The system may change the join order at runtime based on real-time data characteristics, adding correction plans to ensure the correctness of results.
This strategy requires the Pipeline engine to support dynamic topology modification.
Low-Level Decisions: Intra-Operator Adaptation
The finest level of AQE occurs within individual operators themselves. For example, the query optimizer can defer index selection and let the storage layer make adaptive decisions based on runtime data. In systems like StarRocks, the optimizer does not decide on the index to use; instead, the storage layer dynamically chooses the best index (e.g., Bitmap index) based on the query's characteristics.
-
Predicate ordering within an operator is another common adaptation strategy. Similar to the Eddy operator's dynamic predicate ordering, individual operators can reorder predicates dynamically to maximize efficiency.
-
Symmetric Hash Joins are another example. Traditional hash joins use a fixed strategy where the right table builds the hash table, and the left table probes it. In Symmetric Hash Joins, both tables build hash tables, allowing for more flexibility in streaming environments. This method consumes more memory but provides greater adaptability when used with Eddy operators for join reordering.
-
N-ary Symmetric Hash Joins expand on this concept by allowing multiple input streams for a join operator, enabling flexible join order decisions for each row of data. This is especially useful in streaming scenarios.
To get a clearer picture of how query execution works in StarRocks, watch this video.
In summary, AQE involves a range of strategies, from high-level multi-plan execution to fine-grained operator-level adaptations. These methods work together to improve query performance by dynamically responding to runtime conditions and data characteristics, providing a more intelligent and efficient execution model.
Challenges of Adaptive Query Execution (AQE)
-
Handling Stateful Operators: Managing stateful operators during plan adaptation is complex because these operators store data or context. Switching plans can lead to inefficiencies, and re-initializing these operators without losing data requires precision.
-
Cost of Plan Switching: Determining the optimal time to switch execution plans is tricky. Frequent switching can introduce overhead, slowing down execution instead of optimizing it.
-
Ensuring Correctness: One of the biggest challenges is maintaining data integrity during plan transitions. Adaptive execution must ensure that no data is processed twice and that partial results are handled correctly.
-
Lack of Predictability: Since adaptive execution plans are determined at runtime, predicting query performance is difficult. This unpredictability can complicate resource management and performance expectations.
-
Increased Testing Complexity: Testing adaptive execution paths requires more effort than static plans. Every possible adaptation path needs to be tested, increasing the complexity and cost of testing.
-
Avoiding Query Degradation: Adaptive execution can degrade performance for queries that don't need adaptation. Proper thresholds must be set to ensure that only the right queries are re-planned.
Conclusion
Adaptive execution strategies can greatly improve performance by dynamically adjusting to runtime conditions. The finer the granularity at which information is collected and acted upon, the more accurate and timely the adaptations. This means that an adaptive execution strategy can be applied at various levels—from the entire query down to individual data batches or even rows. However, increased granularity comes at the cost of additional overhead in terms of computation and data collection.
The complexity of operators also affects how easily they can be adapted. For instance, operators like Nested Loops Joins are easier to adapt because they can reassess after each iteration, while Hash Joins are more rigid, making real-time adaptation harder. Advanced adaptive strategies often resemble reinforcement learning, where the system continuously learns and adjusts based on feedback from the data and query execution process.
In the future, adaptive execution will likely become a standard feature in high-performing databases. As more databases adopt adaptive strategies, query performance will improve, even when accurate statistics or prior knowledge about the data is lacking. Databases will focus heavily on making real-time decisions about execution plans based on live data and runtime statistics, making them smarter and more efficient.