Query execution is the process by which a database management system (DBMS) processes a SQL query and retrieves the requested data. The DBMS performs several internal operations, including parsing, optimization, and the execution of a physical plan, which is ultimately responsible for accessing, filtering, and returning the correct dataset.
Efficient query execution ensures that data retrieval is fast and resource usage is optimized, which is especially important for large-scale analytical queries. Modern database systems, such as StarRocks, handle complex queries by breaking the execution process into several key components. These components work together to ensure that each query is parsed, analyzed, optimized, and executed in a way that minimizes resource consumption and delivers results quickly.
Several components are involved in this process, from parsing and analysis through optimization to distributed execution.
To better illustrate the query execution process, we will use StarRocks, an MPP (Massively Parallel Processing) database system, as an example. StarRocks handles complex SQL queries by breaking them down into multiple stages, including query parsing, optimization, scheduling, and execution across distributed compute nodes. Each stage involves key components working together to ensure efficient and optimized execution.
Below is a step-by-step explanation of how StarRocks handles query execution for a SQL query, illustrating its high-performance capabilities.
In StarRocks, the first step in query execution is SQL Parsing. The input to the parsing process is the SQL query string, and the output is an Abstract Syntax Tree (AST). The AST is a hierarchical structure where each node is a ParseNode. For example, when you submit a query, the system generates an object known as a QueryStmt, which includes elements such as SelectList, FromClause, WherePredicate, GroupByClause, HavingPredicate, OrderByElement, and LimitElement. Each element corresponds closely to a component of the original SQL text.

Currently, StarRocks uses ANTLR4 as its parser generator, with the grammar rules defined in the StarRocks.g4 file.
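The shape of such an AST can be sketched with plain data classes. The class and field names below mirror the names mentioned above (QueryStmt, SelectList, FromClause, and so on) but are purely illustrative; StarRocks' real ParseNode hierarchy is written in Java and is far richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch of the parse-tree shape described above;
# not StarRocks' actual classes.
@dataclass
class SelectList:
    items: List[str]

@dataclass
class FromClause:
    tables: List[str]

@dataclass
class QueryStmt:
    select_list: SelectList
    from_clause: FromClause
    where_predicate: Optional[str] = None
    group_by: List[str] = field(default_factory=list)
    limit: Optional[int] = None

# A hand-built AST for:
#   SELECT city, SUM(amount) FROM orders
#   WHERE amount > 0 GROUP BY city LIMIT 10
stmt = QueryStmt(
    select_list=SelectList(["city", "SUM(amount)"]),
    from_clause=FromClause(["orders"]),
    where_predicate="amount > 0",
    group_by=["city"],
    limit=10,
)
print(stmt.select_list.items)
```

Each field of the statement object maps directly back to a clause of the SQL text, which is exactly the property the parser is after.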
Once StarRocks has obtained the AST, it proceeds to SQL Analysis, where both syntactic and semantic checks are performed. During this phase, StarRocks performs the following tasks:

- Legality checks: expressions such as GROUPING are not allowed in the WHERE clause, and columns using HLL and Bitmap types cannot be summed.
- Type checks: the parameter of the SUM function must be numeric, and the second and third parameters of the LEAD and LAG window functions must be constants.
- Implicit casts: when comparing BIGINT with DECIMAL, the system automatically casts the BIGINT type to DECIMAL for the comparison to be valid.

The result of this analysis is a hierarchical structure known as a Relation Tree. For example, a FROM clause will correspond to a TableRelation, while a subquery will generate a SubqueryRelation.
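The implicit-cast step can be sketched as a small coercion table consulted during analysis. The rule table below is hypothetical and covers only the BIGINT/DECIMAL case mentioned above; a real analyzer maintains a full cast matrix across all types.

```python
from decimal import Decimal

# Toy semantic-analysis step: when comparing a BIGINT with a DECIMAL,
# widen the integer side to DECIMAL before comparing. The rule table
# is invented for illustration; StarRocks' real cast rules are richer.
COERCIONS = {("BIGINT", "DECIMAL"): "DECIMAL",
             ("DECIMAL", "BIGINT"): "DECIMAL"}

def coerce(left_type, right_type):
    """Return the common type for a comparison, or raise."""
    if left_type == right_type:
        return left_type
    try:
        return COERCIONS[(left_type, right_type)]
    except KeyError:
        raise TypeError(f"no implicit cast between {left_type} and {right_type}")

def compare(left, left_type, right, right_type):
    target = coerce(left_type, right_type)
    if target == "DECIMAL":
        left, right = Decimal(left), Decimal(right)
    return left == right

print(coerce("BIGINT", "DECIMAL"))             # DECIMAL
print(compare(3, "BIGINT", "3.0", "DECIMAL"))  # True
```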
After the SQL analysis, StarRocks converts the Relation Tree into a Logical Plan Tree. Each set of operations in the query corresponds to a logical node within this tree. For example, a query that performs a join and then filters data will have corresponding logical nodes for these operations. The Logical Plan provides an abstract view of the data processing operations but does not yet consider how these operations will be executed physically.
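A logical plan for a join-then-filter query can be pictured as a small tree of nodes. The node and helper names below are illustrative, not StarRocks' internal operator names.

```python
from dataclasses import dataclass
from typing import List

# Minimal sketch of a logical plan tree; names are illustrative.
@dataclass
class LogicalNode:
    name: str
    children: List["LogicalNode"]

def scan(table):       return LogicalNode(f"Scan({table})", [])
def join(l, r, cond):  return LogicalNode(f"Join({cond})", [l, r])
def filt(child, pred): return LogicalNode(f"Filter({pred})", [child])

# Logical plan for:
#   SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.v > 10
plan = filt(join(scan("t1"), scan("t2"), "t1.id = t2.id"), "t1.v > 10")

def render(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(plan)))
```

Note that the tree says nothing about how the join is executed or where the data lives; that is decided later by the optimizer and scheduler.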
The StarRocks Optimizer takes the Logical Plan Tree as input and produces a Distributed Physical Execution Plan with the lowest possible cost. The importance of the optimizer grows with the complexity of the query, especially when multiple tables are involved, and large datasets need to be processed. The performance of different execution strategies can vary by orders of magnitude.
StarRocks’ optimizer is entirely custom-built, incorporating ideas from the Cascades and ORCA optimization papers. It has been deeply customized for StarRocks’ execution engine and scheduler, resulting in several performance enhancements and innovations. StarRocks fully supports all 99 TPC-DS benchmark queries and implements optimizations such as Common Expression Reuse, Subquery Rewrites, Lateral Joins, CTE Reuse, Join Reordering, Distributed Join Strategy Selection, Global Runtime Filter Pushdown, and Low Cardinality Dictionary Optimization.
Before entering the Cost-Based Optimizer (CBO) phase, StarRocks applies a series of rewrite rules to the logical plan to produce a more efficient logical structure. Key rewrite rules include:
- Pushing down LIMIT clauses to reduce the number of rows processed.
- UNION, INTERSECT, and EXCEPT pruning: removing unnecessary set operations.
- Rewriting COUNT(DISTINCT) aggregations.

After the logical plan rewrite, StarRocks applies the Cost-Based Optimizer (CBO), which is based on the Columbia research paper. The CBO performs further optimizations such as:
- Multi-phase aggregation: for aggregate functions such as COUNT, SUM, MAX, and MIN, StarRocks can split the aggregation into two phases; COUNT(DISTINCT) queries can be split into three or four phases.

During the CBO phase, the logical plan is transformed into a Memo data structure, which records both logical and physical execution plans. The Memo serves as the search space for the optimizer to evaluate multiple strategies, expanding this space with various transformation rules. Finally, StarRocks selects the execution plan with the lowest estimated cost based on statistics and cost models.
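The two-phase split can be sketched for SUM: phase one computes partial sums on each node over its local rows, and phase two merges the partials. This is a minimal sketch of the general technique, not StarRocks' implementation.

```python
from collections import defaultdict

# Illustrative two-phase SUM: phase 1 computes partial sums per
# (simulated) BE node; phase 2 merges the partials on one node.
def partial_sum(rows):
    """Phase 1: local pre-aggregation on a single node."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)

def merge_sums(partials):
    """Phase 2: merge the per-node partial results."""
    final = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            final[key] += value
    return dict(final)

node1 = [("us", 3), ("eu", 5), ("us", 2)]   # rows held by BE node 1
node2 = [("eu", 1), ("us", 4)]              # rows held by BE node 2
result = merge_sums([partial_sum(node1), partial_sum(node2)])
print(result)  # {'us': 9, 'eu': 6}
```

The benefit is that each node ships only one row per group key across the network instead of every input row, which is why the split pays off for high-cardinality aggregations.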
The quality of the Cost-Based Optimizer's plan choices largely depends on accurate cost estimation, which in turn relies on timely and accurate statistics collection. StarRocks supports both table-level and column-level statistics and provides options for either automatic or manual collection. Users can choose between full and sampling-based statistics collection methods.
With these statistics, StarRocks estimates the cost of a query by factoring in CPU usage, memory requirements, network bandwidth, and disk I/O. Different operators in the query have different cost formulas, which are adjusted based on the query’s specific characteristics.
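As a toy illustration of cost-based choice, the sketch below scores two join strategies with a weighted sum over resource estimates. The weights and numbers are invented for illustration; StarRocks' real cost model is operator-specific and statistics-driven.

```python
# Hypothetical cost model: a weighted sum of per-resource estimates.
# All weights and usage figures here are invented for illustration.
WEIGHTS = {"cpu": 1.0, "memory": 0.5, "network": 2.0, "io": 1.5}

def operator_cost(cpu, memory, network, io):
    usage = {"cpu": cpu, "memory": memory, "network": network, "io": io}
    return sum(WEIGHTS[r] * usage[r] for r in WEIGHTS)

# A broadcast join ships the build side to every node (network heavy),
# while a shuffle join repartitions both sides.
broadcast = operator_cost(cpu=100, memory=50, network=400, io=0)
shuffle   = operator_cost(cpu=120, memory=30, network=150, io=0)
best = "shuffle join" if shuffle < broadcast else "broadcast join"
print(broadcast, shuffle, best)  # 925.0 435.0 shuffle join
```

With different cardinality estimates (for example, a tiny build side), the same formula would flip the decision, which is why stale statistics can lead directly to bad join strategies.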
When query performance issues arise—such as inefficient join orders or incorrect join distribution strategies—users can refer to StarRocks’ CBO documentation to guide them through collecting and using statistics to improve query performance.
The Plan Fragment Generator takes the optimized physical plan and breaks it down into Plan Fragments. These fragments represent smaller, self-contained units of the plan that can be executed independently and in parallel across different compute nodes in the cluster.
In StarRocks, the system is divided into two main components: Frontend (FE) and Backend (BE). The FE is responsible for query parsing, optimization, task scheduling, and metadata management. It receives SQL queries, generates optimized execution plans, and distributes query fragments to the BE nodes. The BE nodes are responsible for executing these fragments, handling data processing tasks such as scanning, filtering, joining, and aggregating data in a distributed and parallelized manner.
Once the distributed plan has been generated, the Frontend (FE) is responsible for scheduling the execution. This involves generating execution instances for the plan fragments, assigning fragments to nodes, managing the execution state on each BE, and receiving the final query results.
Key challenges at this stage include deciding which BE nodes each Plan Fragment instance should run on and balancing the scan load evenly across the cluster.
StarRocks first confirms where the Scan Operator will execute by assigning it to specific BE nodes based on the tablet list it needs to scan. For each tablet, StarRocks selects a healthy and available replica that matches the desired version. When multiple replicas are available, StarRocks uses a randomized selection process but strives to balance requests across BE nodes. For example, if there are 10 BE nodes and 10 tablets, each BE is expected to scan one tablet.
Once the Scan fragments are assigned to BE nodes, other Plan Fragment instances are also scheduled on the same or other randomly selected BE nodes, depending on parameters set by the user.
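The balancing behaviour described above can be approximated with a greedy least-loaded choice among each tablet's healthy replicas. The function and data below are illustrative, not the FE's actual algorithm (which also randomizes among candidates).

```python
# Illustrative replica selection: for each tablet, pick the replica
# whose BE node currently has the fewest assigned scans. A sketch of
# the balancing goal described above, not StarRocks' actual policy.
def assign_tablets(tablet_replicas, be_nodes):
    load = {be: 0 for be in be_nodes}   # scans assigned per BE
    assignment = {}
    for tablet, replicas in tablet_replicas.items():
        healthy = [be for be in replicas if be in load]
        chosen = min(healthy, key=lambda be: load[be])
        assignment[tablet] = chosen
        load[chosen] += 1
    return assignment, load

replicas = {
    "t1": ["be1", "be2"],
    "t2": ["be2", "be3"],
    "t3": ["be1", "be3"],
    "t4": ["be1", "be2"],
}
assignment, load = assign_tablets(replicas, ["be1", "be2", "be3"])
print(assignment, load)
```

With 4 tablets and 3 BE nodes, no node ends up with more than 2 scan tasks, which mirrors the "10 BE nodes, 10 tablets, one each" intuition from the text.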
After the FE determines which Plan Fragments will run on which BE nodes, it sends execution parameters to the BE using Thrift.
Currently, the FE schedules multiple Plan Fragments all at once, traversing the Plan Fragment tree from top to bottom and sending each Plan Fragment’s execution information to the corresponding BE nodes.
StarRocks employs a Massively Parallel Processing (MPP) architecture to utilize multiple machines in parallel for maximum query performance. Additionally, it uses Pipeline Execution for efficient use of multicore processors and vectorized execution to fully leverage individual CPU cores.
MPP, or Massively Parallel Processing, is a technique where the query plan is divided into many smaller instances that can be executed on individual nodes. None of the nodes share CPU, memory, or disk resources, and the performance of MPP queries can scale linearly as the cluster expands.
As shown in the diagram, StarRocks logically splits a query into multiple Query Fragments, each of which can have one or more execution instances. These fragments are dispatched to different BE nodes for parallel execution. Each fragment may include multiple operators, such as Scan, Filter, and Aggregate. Fragments are executed with varying degrees of parallelism, depending on the query complexity and available resources.
Multiple fragments in StarRocks are executed in parallel in memory through a Pipeline mechanism, rather than being processed stage by stage as in batch processing engines. The Shuffle operation, which redistributes data across nodes, is crucial for scaling query performance horizontally with the cluster. It is also key to efficiently handling high-cardinality aggregations and large table joins.
StarRocks introduces the concept of Pipeline Execution between fragments and operators. Within a pipeline, data is processed continuously without the need for materialization at each step. However, if an operator requires materialization (e.g., an AGG, SORT, or JOIN operator), a new pipeline is created. Therefore, a single fragment may correspond to multiple pipelines.
Each pipeline consists of several operators. The first operator is typically a Source Operator (e.g., a Scan or Exchange node), which generates the data. The last operator is a Sink Operator, which materializes or consumes the data. Operators between the Source and Sink nodes are responsible for transforming the data.
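A pipeline of this shape can be sketched with Python generators: the source yields chunks, a transform operator processes them lazily, and the sink consumes them. The laziness mimics "no materialization between operators"; the operator names are illustrative.

```python
# Sketch of a source -> transform -> sink pipeline using generators.
# Chunks flow through lazily, with nothing materialized between
# operators until the sink; names are illustrative.
def scan_source(chunks):
    """Source operator: produces chunks of rows."""
    for chunk in chunks:
        yield chunk

def filter_op(upstream, pred):
    """Transform operator: filters each chunk as it passes through."""
    for chunk in upstream:
        yield [row for row in chunk if pred(row)]

def sum_sink(upstream):
    """Sink operator: consumes the stream and materializes a result."""
    total = 0
    for chunk in upstream:
        total += sum(chunk)
    return total

chunks = [[1, -2, 3], [4, -5], [6]]
pipeline = filter_op(scan_source(chunks), lambda x: x > 0)
print(sum_sink(pipeline))  # 14
```

A blocking operator such as a sort would have to consume its entire input before emitting anything, which is exactly why such operators terminate one pipeline and start another.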
The core of StarRocks' pipeline parallel execution framework is based on user-space coroutine scheduling, which minimizes the overhead associated with creating, destroying, and switching threads in the operating system. In this framework, StarRocks runs a number of execution threads equivalent to the number of CPU cores, with each thread retrieving ready pipelines from a multi-level feedback queue. Additionally, a global Poller thread constantly checks blocked pipelines to see if they can be moved back to the ready queue for execution once the blockage is resolved.
As the performance bottleneck in modern databases shifts from I/O to CPU, StarRocks has implemented a fully vectorized execution engine to take full advantage of modern CPU architectures. Vectorization in StarRocks refers to executing operators and expressions in batches of columns rather than row-by-row. This method significantly improves performance by minimizing function calls, reducing branch mispredictions, and improving CPU cache utilization.
By processing columns in batches, StarRocks achieves greater efficiency, particularly in CPU-bound workloads. It also enables the use of SIMD (Single Instruction, Multiple Data) instructions, allowing a single CPU instruction to operate on multiple data points simultaneously.
Vectorized execution is more than just processing operators and expressions in a columnar fashion—it represents a comprehensive performance optimization effort that includes redesigning data structures and algorithms, optimizing memory management, leveraging SIMD instructions, and refining CPU cache usage. As a result, StarRocks has achieved performance improvements of 5 to 10 times compared to the previous row-based execution model.
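The columnar idea can be illustrated in miniature: evaluate an expression over whole columns at a time rather than row by row. This is only a sketch of the batching pattern; a real engine does this in C++ over fixed-size column chunks so the compiler can emit SIMD instructions.

```python
from array import array

# Columnar evaluation of: SELECT SUM(price * qty) WHERE qty > 0.
# Each step touches one whole column (a batch) at a time, mirroring
# vectorized execution; the data and query are invented examples.
price = array("d", [9.5, 3.0, 7.25, 1.0])
qty   = array("l", [2, 0, 4, 3])

def vectorized_revenue(price_col, qty_col):
    mask = [q > 0 for q in qty_col]                    # one pass: filter column
    products = [p * q for p, q in zip(price_col, qty_col)]  # one pass: multiply
    return sum(v for v, keep in zip(products, mask) if keep)

print(vectorized_revenue(price, qty))  # 51.0
```

Because each loop body is a single simple operation over contiguous data, branches are predictable and caches are used well, which is the source of the speedups described above.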
Query execution is a fundamental process in database management, responsible for translating SQL queries into actionable steps that retrieve and process data efficiently. In modern systems like StarRocks, this process is highly optimized and involves several key stages: parsing, analysis, optimization, and execution. StarRocks exemplifies how a database system can handle complex, large-scale queries through its use of advanced techniques such as Cost-Based Optimization (CBO), distributed execution, pipeline parallelism, and vectorized execution. These methods work together to ensure that queries are executed with minimal resource consumption and maximum performance, particularly in multi-node, high-performance environments. By breaking queries into smaller, parallelizable tasks, leveraging modern CPU capabilities, and efficiently managing resources across nodes, StarRocks demonstrates the importance of well-executed query processing in enabling fast, scalable, and accurate data retrieval.