Adaptive Query Execution (AQE) is a dynamic approach to query execution that moves away from the traditional "plan once, execute once" model. Instead, AQE collects additional statistics and runtime feedback during execution, including data distribution, data characteristics, and cluster resource usage. With this information, it continuously re-optimizes the execution plan and adjusts strategies in real time, which gives it several key advantages over static planning.
In essence, AQE ensures that the execution plan and strategy are not fixed but instead evolve dynamically to maximize performance as the query runs. This adaptability is particularly beneficial for complex queries or environments with unpredictable data patterns.
Traditional databases typically use SQL as their user interface, which is a declarative language. SQL describes what the user wants to do but not how to do it. Therefore, the database first needs to pass the SQL query through an optimizer, which translates it into an executable physical plan (how), and then the query executor runs the plan.
However, the challenge with this approach is that a single SQL query can have thousands of possible execution plans, and the efficiency of each plan can vary significantly. In some cases, the difference in performance between execution plans can be enormous—by hundreds or even thousands of times. If the optimizer selects a suboptimal plan, no matter how powerful the execution engine is, the query performance will suffer. This is similar to a military strategy: if the overall strategy is flawed, no matter how capable the individual soldiers are, they won’t change the outcome of the battle.
The effectiveness of the optimizer depends heavily on the accuracy of its statistics, selectivity estimates, and cardinality estimates. In practice, however, these inputs are often unreliable: statistics may be stale or missing entirely, selectivity estimates typically assume independence between predicates, and cardinality errors compound as they propagate up through joins.
Given these challenges, the traditional "plan once, execute once" model is not suitable for all query scenarios. This is why we need Adaptive Query Execution (AQE)—an approach that allows the system to dynamically adjust execution plans during query runtime, improving query performance by reacting to real-time data and conditions.
Adaptive Query Execution (AQE) aims to optimize queries dynamically, based on real-time data characteristics and execution feedback. This approach provides better query performance by avoiding the limitations of static, pre-generated plans. Let's explore several strategies for implementing AQE.
One of the simplest and most effective ways to achieve AQE is through multi-step planning. Instead of relying on a single execution plan, the system collects statistics at various points during execution (known as materialization points) and re-optimizes the plan accordingly.
Examples of this strategy can be found in Spark, Oracle, and BigQuery.
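The mechanics of multi-step planning can be sketched as a loop that plans only up to the next materialization point, then re-plans using the real statistics of the materialized result. The function and strategy names below are hypothetical, simplified illustrations, not any particular engine's API.

```python
def collect_stats(result):
    # Real engines track row counts, sizes, and value distributions; this
    # sketch only records the row count of the materialized intermediate.
    return {"rows": 0 if result is None else len(result)}

def choose_join_strategy(stats, threshold=1000):
    # A re-optimization decision made at a materialization point: pick a
    # broadcast join when the materialized side turned out to be small.
    return "broadcast_join" if stats["rows"] < threshold else "shuffle_join"

def run_with_replanning(stages):
    """Each stage is a function: (input_rows, chosen_strategy) -> rows.
    Stage boundaries play the role of materialization points."""
    intermediate = None
    for stage in stages:
        stats = collect_stats(intermediate)      # runtime statistics
        strategy = choose_join_strategy(stats)   # re-plan before each stage
        intermediate = stage(intermediate, strategy)
    return intermediate
```

Spark's AQE makes exactly this kind of decision (e.g., converting a sort-merge join to a broadcast join) at shuffle-stage boundaries, where accurate sizes of intermediate results are available.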
A related strategy is Late Binding, where the system prepares multiple execution plans ahead of time based on possible data conditions. For example, consider the query `table_a JOIN table_b ON table_b.column1 = x`. If statistics for `table_b.column1` are missing, the system cannot accurately estimate how many rows of `table_b` survive the filter. However, if the system knows that `table_a` has 10 million rows, it can prepare three different plans based on the possible size of `table_b` after filtering.
This precomputation of multiple plans allows the system to select the best execution path once more accurate data becomes available during runtime.
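Late binding can be sketched as a lookup table of prepared plans keyed by size buckets, with the final choice deferred until the filtered row count is observed. The bucket boundaries and plan descriptions below are made-up illustrations.

```python
# Three plans prepared ahead of time for different possible sizes of
# table_b after filtering; the choice is bound only at runtime, once the
# actual filtered row count is known.
PREPARED_PLANS = [
    (10_000,       "broadcast table_b, hash join against table_a"),
    (1_000_000,    "shuffle both sides, partitioned hash join"),
    (float("inf"), "sort-merge join"),
]

def bind_plan(filtered_rows_b):
    # Select the first prepared plan whose size bucket covers the
    # observed row count of the filtered table_b.
    for upper_bound, plan in PREPARED_PLANS:
        if filtered_rows_b <= upper_bound:
            return plan
```

The key point is that no planning work happens at bind time; the expensive optimization was done up front, and runtime only performs a cheap selection.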
Another effective strategy involves using historical query information to correct inaccurate optimizer statistics. Often, multiple consecutive queries touch the same columns of a table, allowing the system to gather data such as `max`, `min`, and cardinality for those columns. With this feedback loop, the optimizer can refine its statistics for future queries, leading to better execution plans.
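A minimal sketch of such a feedback loop, with a hypothetical API: after each query, the observed column statistics overwrite the optimizer's stale estimates, so later queries on the same column plan with corrected numbers.

```python
class StatsFeedback:
    def __init__(self):
        self.column_stats = {}  # (table, column) -> observed statistics

    def record(self, table, column, values):
        # Feedback from execution: the real min, max, and cardinality
        # of the column as actually seen by the query.
        self.column_stats[(table, column)] = {
            "min": min(values),
            "max": max(values),
            "ndv": len(set(values)),  # number of distinct values
        }

    def estimate_ndv(self, table, column, default):
        # The optimizer prefers observed cardinality over its own guess;
        # the default is used only before any feedback exists.
        stats = self.column_stats.get((table, column))
        return stats["ndv"] if stats else default
```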
While widely implemented in industry, this feedback mechanism has limitations: it cannot help the first execution of a query, since no history exists yet, and the recorded feedback goes stale when the underlying data changes.
An approach similar to speculative execution in systems like Spark or MapReduce is to execute multiple plans in parallel. Instead of selecting a single "best" plan at the outset, the system may choose two or three plans to run concurrently. After some execution time, it selects the fastest plan and cancels the others.
However, this method has an obvious drawback: all work done by the losing plans is pure overhead, so the extra CPU, memory, and I/O consumed by the redundant plans must be weighed against the benefit of picking the winner.
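The race can be sketched with ordinary threads and a shared cancel flag; this is an illustrative toy, not how a real engine would schedule competing plans. Each candidate plan polls the flag so it can abandon work once a winner exists.

```python
import threading
import time

def race_plans(plans):
    """plans: list of (name, fn) pairs, where fn takes a cancel Event."""
    cancel = threading.Event()
    lock = threading.Lock()
    results = {}

    def run(name, plan):
        out = plan(cancel)
        with lock:
            if not cancel.is_set():  # first plan to finish wins
                results[name] = out
                cancel.set()         # tell the losing plans to stop

    threads = [threading.Thread(target=run, args=(n, p)) for n, p in plans]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A cooperative cancel flag is the simplest mechanism, but it also illustrates the second cost of this approach: every operator in every candidate plan must be written to check for cancellation and clean up its partial state.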
Another advanced strategy is to group data based on specific characteristics and apply different execution plans to each group. This approach can integrate machine learning techniques to analyze data features and determine the most suitable plan for each data segment.
In such cases, the execution engine routes data to different operators based on the data's specific characteristics, enabling highly specialized and efficient processing paths for each group.
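A minimal sketch of this routing idea, with made-up names: rows whose join key is known to be heavily skewed take a replicated/broadcast path to avoid overloading one partition, while the rest take the normal shuffle path. Real systems may learn such routing rules with ML models rather than a fixed set.

```python
# Hypothetical set of skewed ("hot") keys detected at runtime.
HOT_KEYS = {"user_0"}

def route(rows):
    broadcast_path, shuffle_path = [], []
    for row in rows:
        # Route each row to a specialized path based on a data
        # characteristic -- here, join-key skew.
        if row["key"] in HOT_KEYS:
            broadcast_path.append(row)
        else:
            shuffle_path.append(row)
    return broadcast_path, shuffle_path
```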
One way to enable more precise AQE is to introduce specialized decision operators into the query execution pipeline. A notable example is the Eddy operator, which acts as a decision-making control point in the pipeline.
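The classic Eddy use case is adaptive predicate ordering: each tuple is routed through the filters one at a time, and the routing policy keeps the most selective filters first based on observed pass rates. The sketch below is a simplified, single-threaded illustration of that idea.

```python
class Eddy:
    """Routes each row through predicates, most selective first."""

    def __init__(self, predicates):
        self.predicates = list(predicates)
        self.stats = {p: {"seen": 0, "passed": 0} for p in predicates}

    def pass_rate(self, pred):
        s = self.stats[pred]
        # Unseen predicates default to 1.0 so they are tried last.
        return s["passed"] / s["seen"] if s["seen"] else 1.0

    def process(self, row):
        # Order predicates by observed pass rate: the most selective
        # (lowest pass rate) predicate runs first, eliminating rows early.
        for pred in sorted(self.predicates, key=self.pass_rate):
            s = self.stats[pred]
            s["seen"] += 1
            if pred(row):
                s["passed"] += 1
            else:
                return False  # row eliminated; later predicates skipped
        return True
```

Because the ordering is recomputed from live statistics, the pipeline automatically adapts if the selectivity of a predicate drifts partway through the input, with no re-planning step.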
Similar to global re-planning at runtime, dynamic plan switching allows the system to switch between different plans locally. For instance, consider a three-table join followed by a `GROUP BY`. The system may change the join order at runtime based on real-time data characteristics, adding correction plans to ensure the correctness of the results.
This strategy requires the Pipeline engine to support dynamic topology modification.
The finest level of AQE occurs within individual operators themselves. For example, the query optimizer can defer index selection and let the storage layer make adaptive decisions based on runtime data. In systems like StarRocks, the optimizer does not decide on the index to use; instead, the storage layer dynamically chooses the best index (e.g., Bitmap index) based on the query's characteristics.
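The decision the storage layer makes can be sketched as follows. This is a hypothetical simplification for illustration, not StarRocks' actual implementation; the segment structure and thresholds are invented.

```python
def choose_access_path(segment, predicate_column, predicate_selectivity):
    """Pick an access path at scan time, when the predicate is known.

    segment: dict describing which auxiliary structures this data
    segment has (hypothetical layout)."""
    # A selective predicate on a column with a bitmap index favors the
    # index; otherwise try zone-map pruning, and fall back to a scan.
    if (predicate_column in segment["bitmap_indexes"]
            and predicate_selectivity < 0.1):
        return "bitmap_index"
    if predicate_column in segment["zone_maps"]:
        return "zone_map_prune"
    return "full_scan"
```

The point is that the optimizer's plan only says "scan with this predicate"; the access-path decision is made per segment, with full knowledge of that segment's indexes and the predicate's actual selectivity.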
Predicate ordering within an operator is another common adaptation strategy. Similar to the Eddy operator's dynamic predicate ordering, individual operators can reorder predicates dynamically to maximize efficiency.
Symmetric Hash Joins are another example. Traditional hash joins use a fixed strategy where the right table builds the hash table, and the left table probes it. In Symmetric Hash Joins, both tables build hash tables, allowing for more flexibility in streaming environments. This method consumes more memory but provides greater adaptability when used with Eddy operators for join reordering.
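The core of a symmetric hash join fits in a few lines: both inputs build hash tables, so a row arriving on either side inserts into its own table and immediately probes the other side's. This sketch is stream-oriented and omits memory management.

```python
from collections import defaultdict

class SymmetricHashJoin:
    def __init__(self):
        self.left_table = defaultdict(list)
        self.right_table = defaultdict(list)

    def on_left(self, key, row):
        # Build own side, then probe the other side for matches so far.
        self.left_table[key].append(row)
        return [(row, r) for r in self.right_table[key]]

    def on_right(self, key, row):
        self.right_table[key].append(row)
        return [(l, row) for l in self.left_table[key]]
```

Because either side can drive the join at any moment, an Eddy can route each incoming row to whichever join it judges best at that instant, which is exactly the flexibility the fixed build/probe roles of a classic hash join rule out.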
N-ary Symmetric Hash Joins expand on this concept by allowing multiple input streams for a join operator, enabling flexible join order decisions for each row of data. This is especially useful in streaming scenarios.
In summary, AQE involves a range of strategies, from high-level multi-plan execution to fine-grained operator-level adaptations. These methods work together to improve query performance by dynamically responding to runtime conditions and data characteristics, providing a more intelligent and efficient execution model.
Handling Stateful Operators: Managing stateful operators during plan adaptation is complex because these operators store data or context. Switching plans can lead to inefficiencies, and re-initializing these operators without losing data requires precision.
Cost of Plan Switching: Determining the optimal time to switch execution plans is tricky. Frequent switching can introduce overhead, slowing down execution instead of optimizing it.
Ensuring Correctness: One of the biggest challenges is maintaining data integrity during plan transitions. Adaptive execution must ensure that no data is processed twice and that partial results are handled correctly.
Lack of Predictability: Since adaptive execution plans are determined at runtime, predicting query performance is difficult. This unpredictability can complicate resource management and performance expectations.
Increased Testing Complexity: Testing adaptive execution paths requires more effort than static plans. Every possible adaptation path needs to be tested, increasing the complexity and cost of testing.
Avoiding Query Degradation: Adaptive execution can degrade performance for queries that don't need adaptation. Proper thresholds must be set to ensure that only the right queries are re-planned.
Adaptive execution strategies can greatly improve performance by dynamically adjusting to runtime conditions. The finer the granularity at which information is collected and acted upon, the more accurate and timely the adaptations. This means that an adaptive execution strategy can be applied at various levels—from the entire query down to individual data batches or even rows. However, increased granularity comes at the cost of additional overhead in terms of computation and data collection.
The complexity of operators also affects how easily they can be adapted. For instance, operators like Nested Loops Joins are easier to adapt because they can reassess after each iteration, while Hash Joins are more rigid, making real-time adaptation harder. Advanced adaptive strategies often resemble reinforcement learning, where the system continuously learns and adjusts based on feedback from the data and query execution process.
In the future, adaptive execution will likely become a standard feature in high-performing databases. As more databases adopt adaptive strategies, query performance will improve, even when accurate statistics or prior knowledge about the data is lacking. Databases will focus heavily on making real-time decisions about execution plans based on live data and runtime statistics, making them smarter and more efficient.