Amazon Redshift
 
 

What is Amazon Redshift?

 

Definition and Overview

Amazon Redshift is a fully managed, petabyte-scale data warehouse service offered by Amazon Web Services (AWS). It enables organizations to efficiently store and analyze large volumes of structured and semi-structured data using standard SQL. Redshift is designed for complex analytical workloads, offering high performance through features like columnar storage, parallel processing, and integration with various AWS services.

Key Features

Amazon Redshift offers several key features that enhance its functionality and performance:

Columnar Storage

Redshift stores data in a columnar format, which is optimal for analytical queries that typically access a subset of columns. This storage model reduces I/O and enhances query performance.

Massively Parallel Processing (MPP)

Redshift's MPP architecture allows it to distribute and process data across multiple nodes simultaneously, significantly speeding up query execution.

Concurrency Scaling

To handle unpredictable workloads and spikes in query demand, Redshift can automatically add transient clusters to process queries, ensuring consistent performance.

Automatic Workload Management (WLM)

Redshift's automatic WLM dynamically manages query concurrency and resource allocation based on workload characteristics, optimizing throughput and performance.

Machine Learning Integration

Redshift integrates with Amazon SageMaker, allowing users to create, train, and deploy machine learning models directly within the Redshift environment using SQL commands.
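As a rough illustration, Redshift ML expresses both training and inference in SQL. The table, columns, IAM role, and bucket below are hypothetical placeholders, not a recommended setup:

```sql
-- Train a model from a SQL query; Redshift ML hands training off to
-- SageMaker behind the scenes and registers a SQL inference function.
CREATE MODEL churn_model
FROM (SELECT age, tenure_months, monthly_spend, churned
      FROM customer_activity)
TARGET churned                 -- label column to predict
FUNCTION predict_churn         -- name of the generated inference function
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'my-ml-staging-bucket');

-- Score new rows with the generated function, directly in SQL.
SELECT customer_id, predict_churn(age, tenure_months, monthly_spend)
FROM customer_activity;
```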


 

How Amazon Redshift Works

 


Architecture

Amazon Redshift follows a distributed architecture designed for scalability and parallelism. It is composed of two main types of components: a Leader Node and one or more Compute Nodes.

  • Leader Node:
    The leader node acts as the orchestrator. It accepts incoming SQL client connections (JDBC, ODBC, Redshift-specific drivers), parses the queries, generates an optimized query execution plan, and dispatches work to the compute nodes. It doesn’t perform heavy data processing itself but manages metadata and coordination.

  • Compute Nodes:
    These nodes are responsible for the heavy lifting — storing the actual data and executing query operations (scans, joins, aggregations). Each compute node runs multiple slices, where each slice handles a portion of the data. This parallelism ensures that large datasets can be processed efficiently.

  • Communication Between Nodes:
    Compute nodes talk directly to each other when necessary (e.g., for performing joins across nodes), and results are sent back to the leader node, which aggregates them and returns the final result to the client.

  • Storage-Compute Decoupling (in RA3 instances):
    In newer Redshift RA3 instance types, storage and compute are separated: compute nodes access data in Amazon Redshift Managed Storage (backed by S3). This separation allows scaling compute capacity independently of storage size.

Example:
If you're running a dashboard that aggregates sales from millions of rows, the leader node translates your SQL query, the compute nodes each process their portion of the data simultaneously, and the leader node stitches the results together for you.

Data Storage and Management

Redshift uses columnar storage to store data, which differs from traditional row-based storage:

  • Columnar Format:
    Data is stored column-by-column rather than row-by-row. This dramatically reduces I/O because analytical queries often need only a few columns.
    Example: A report pulling sales_amount and sale_date doesn’t need to scan customer addresses or product descriptions.

  • Data Distribution Styles:
    Redshift uses different strategies to distribute data across compute node slices (see the DDL sketch after this list):

    • KEY Distribution: Rows are distributed based on the value of a specific column (e.g., customer ID), useful for collocating joins.

    • EVEN Distribution: Rows are distributed round-robin, ideal when no clear distribution key exists.

    • ALL Distribution: A full copy of a small table is replicated to all nodes, reducing join overhead.

  • Compression and Encoding:
    Redshift automatically applies compression (encoding) schemes to optimize storage and speed up scans, choosing algorithms based on sample data (though you can manually fine-tune it if needed).

  • Semi-Structured Data Support:
    Using Redshift’s SUPER data type and PartiQL query syntax, you can store and query semi-structured data like JSON or Avro without flattening it first — useful for modern workloads like IoT logs or application telemetry.
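The sketch below makes the distribution styles and the SUPER type concrete. Table and column names are hypothetical:

```sql
-- Large fact table: KEY distribution collocates rows sharing a
-- customer_id on the same slice, so joins on that key avoid shuffling.
CREATE TABLE sales (
    sale_id      BIGINT,
    customer_id  BIGINT,
    region_id    INT,
    sale_date    DATE,
    sales_amount DECIMAL(12,2),
    event_detail SUPER           -- semi-structured payload (e.g., raw JSON)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (sale_date);

-- Small dimension table: ALL distribution replicates it to every node,
-- eliminating data movement when it participates in joins.
CREATE TABLE region (
    region_id   INT,
    region_name VARCHAR(64)
)
DISTSTYLE ALL;

-- PartiQL-style navigation into the SUPER column.
SELECT sale_id, event_detail.device.os
FROM sales
WHERE sale_date >= '2024-01-01';
```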

Query Processing

When a query is issued:

  1. The leader node parses and optimizes the SQL query.

  2. It generates a query execution plan (which includes steps like table scans, aggregations, joins).

  3. The query is broken into segments and distributed to the compute node slices.

  4. Each compute slice operates on its local data, performing operations in parallel.

  5. Intermediate results are shuffled between compute nodes if needed (for joins, aggregations).

  6. The leader node assembles the final result set and returns it to the client.
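You can observe this pipeline from the client side with EXPLAIN, which prints the leader node's plan without executing the query. Reusing the hypothetical tables from the earlier DDL sketch:

```sql
-- The plan lists scan, join, and aggregate steps, with annotations
-- such as DS_DIST_NONE or DS_BCAST_INNER describing how rows are
-- redistributed between compute nodes for the join.
EXPLAIN
SELECT r.region_name, SUM(s.sales_amount) AS total_sales
FROM sales s
JOIN region r ON r.region_id = s.region_id
GROUP BY r.region_name;
```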

Performance Enhancements:

  • Vectorized Execution: Processes blocks of values at a time (SIMD-like behavior), dramatically speeding up CPU-bound operations.

  • Short Query Acceleration (SQA): Detects simple, quick-running queries and prioritizes them on dedicated slots for faster turnaround.

  • Materialized Views: Supports automatic refresh of precomputed results, improving response time for repeated complex queries.
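As a minimal sketch of the materialized-view feature (names are illustrative), Redshift can keep a precomputed aggregate fresh automatically:

```sql
-- Precompute a daily aggregate once; repeated dashboard queries read
-- the stored result instead of rescanning the base table.
CREATE MATERIALIZED VIEW mv_daily_sales
AUTO REFRESH YES
AS
SELECT sale_date, region_id, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY sale_date, region_id;
```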

Example Use Case:
A product team might use Redshift to rapidly analyze A/B test results. With millions of events, Redshift's parallel query execution lets them run aggregations (e.g., average session time by variant) in seconds rather than hours.
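A query of that shape might look like the following (table and experiment names are hypothetical):

```sql
-- Parallel aggregation over millions of events: each slice scans its
-- portion of ab_test_events, and partial results are combined.
SELECT variant,
       AVG(session_seconds)    AS avg_session_time,
       COUNT(DISTINCT user_id) AS users
FROM ab_test_events
WHERE experiment_id = 'checkout_flow_v2'
GROUP BY variant;
```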

 

Benefits of Using Amazon Redshift

 

Scalability

Amazon Redshift is designed to scale from gigabyte-sized workloads up to 16 petabytes (PB) or more:

  • Vertical Scaling: Resize your cluster to a larger instance type if you need more power.

  • Horizontal Scaling: Add more compute nodes to a cluster.

  • Elastic Resize: Quickly add/remove nodes with minimal disruption.

  • Concurrency Scaling: Launches additional transient clusters to absorb bursts of read traffic — Redshift automatically routes queries to these without manual tuning.

Example:
An e-commerce platform can scale up compute power during Black Friday to handle massive query volumes and scale it back down afterward to save costs.

Performance

Performance improvements are baked into Redshift’s core:

  • Massively Parallel Processing (MPP): All nodes and slices work in parallel to execute queries.

  • Advanced Query Optimizer: Redshift applies cost-based optimization, including join order optimizations and distribution strategy adjustments.

  • Data Skipping and Zone Maps: Metadata about data ranges (e.g., min/max values) is stored to skip blocks irrelevant to a query (see the sketch after this list).

  • Materialized Views and Result Caching: Reduce recomputation on repeated queries.
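To sketch how zone maps pay off (reusing the hypothetical sales table, which is sorted on sale_date):

```sql
-- Because the table is sorted on sale_date, each 1 MB block's min/max
-- metadata lets Redshift skip blocks entirely outside the date range,
-- scanning only a small fraction of the table.
SELECT region_id, SUM(sales_amount) AS november_sales
FROM sales
WHERE sale_date BETWEEN '2024-11-01' AND '2024-11-30'
GROUP BY region_id;
```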

Example:
A marketing analytics team can quickly slice user behavior data by geography and campaign without experiencing slowdowns, even as datasets grow year over year.

Cost-Effectiveness

Redshift provides several ways to manage and reduce costs:

  • Pay-as-You-Go (On-Demand Pricing): No long-term commitments, billed per hour per node.

  • Reserved Instances: Significant discounts (up to 75%) for committing to 1- or 3-year terms.

  • Redshift Serverless: Launch a data warehouse without managing clusters, paying only for usage.

  • RA3 Storage Optimization: In RA3 nodes, storage scales separately — you don’t have to pay for compute you don’t need just to get more storage.

  • Data Sharing: Share live data across Redshift clusters without copying it, reducing duplication costs.
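Data sharing is also driven by SQL. A minimal sketch, with placeholder share and namespace values:

```sql
-- On the producer cluster: expose live tables without copying data.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.sales;
GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '<consumer-namespace-guid>';

-- On the consumer cluster: mount the share as a queryable database.
CREATE DATABASE sales_from_producer
FROM DATASHARE sales_share OF NAMESPACE '<producer-namespace-guid>';
```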

Example:
A SaaS company running customer-facing analytics might choose Redshift Serverless for development environments to save money and use provisioned RA3 clusters for production systems requiring predictable, high performance.

 

Redshift Challenges: General Overview and Eightfold.ai’s Specific Case

Amazon Redshift has been a foundational data warehouse for thousands of organizations looking to centralize and analyze their structured data. For classic internal business intelligence (BI) reporting—where dashboards refresh every few hours and query concurrency is limited—Redshift works well.

But as companies push into real-time, customer-facing analytics, agent-driven systems, and large-scale AI applications, several architectural limitations in Redshift start to surface.

General Challenges with Amazon Redshift:

  • Single Leader Node Bottleneck
    Redshift’s architecture revolves around a single leader node responsible for query coordination. No matter how many compute nodes you add, all queries must pass through the leader node, which becomes a hard limit on concurrency and throughput. High query-per-second (QPS) demands create queuing and rising latencies.

  • Limited and Expensive Concurrency Scaling
    Redshift offers “concurrency scaling” to handle query bursts by spinning up extra clusters. However:

    • New clusters take time to spin up, introducing latency.

    • There’s a cap (usually about 10–11 extra clusters).

    • Extra clusters mean extra cost, and managing them adds operational complexity.

  • Serverless Trade-offs for High-Throughput Workloads
    While Redshift Serverless removes infrastructure management, it struggles to deliver consistent low-latency performance at high QPS:

    • Caching behavior is unpredictable.

    • Queries often need to fetch data from S3, which has much higher I/O latency than local storage.

    • Cold starts and scaling lag can introduce sudden spikes in query response times.

  • Rigid Partitioning and Noisy Neighbor Problems
    Data in Redshift is distributed across a fixed number of “slices” in the cluster.

    • Heavy tenants (like a major customer) can monopolize the slices they’re mapped to.

    • This leads to “noisy neighbor” problems, where smaller tenants experience degraded performance through no fault of their own.

    • Fine-grained tenant-aware partitioning is not possible within Redshift’s model.

  • Operational Maintenance Burden
    Running Redshift well means constant vigilance:

    • Managing WLM queues (Workload Management).

    • Performing VACUUM operations to reclaim space and optimize storage.

    • Redistributing slices after schema changes.

    For teams wanting real-time, low-latency analytics, this ongoing tuning becomes a heavy tax.

  • Difficulties Supporting Real-Time Customer-Facing Analytics
    Redshift’s original design is optimized for internal analytics, not for serving sub-second queries to external users embedded inside products. When product teams need LinkedIn-like live engagement metrics or AI agents querying analytics APIs every few seconds, Redshift begins to crack under the pressure.

Eightfold.ai's Redshift Journey: A Specific Case Study

 

To make this concrete, let's walk through the experience of Eightfold.ai, a leading talent intelligence platform.

About Eightfold.ai:

Eightfold is an AI-driven solution provider specializing in talent acquisition, talent management, and workforce planning. Their platform helps organizations — including 200 of the Fortune 500 companies — match candidates to jobs, manage internal mobility, and analyze workforce dynamics using deep learning models.

As part of their innovation, Eightfold wanted to embed real-time, customer-facing analytics directly into their application. Think dashboards showing live candidate engagement, recruiter activity, or hiring funnel health—similar to the real-time engagement metrics you see in LinkedIn.

When they attempted to build this on Redshift, they encountered all the general challenges mentioned earlier, magnified by their scale and goals.

Specific Problems at Eightfold:

  • Single Leader Node Bottleneck
    Even as they added more compute nodes to the cluster, concurrency did not scale. With customers interacting live with analytics features, the leader node became overwhelmed, leading to query queueing and high variability in response times.

  • Expensive and Limited Concurrency Scaling
    Redshift’s concurrency scaling was inadequate.

    • Spinning up new clusters during peak loads took time—defeating the purpose of "real-time" responsiveness.

    • The limit of 10–11 extra clusters wasn’t enough to support their concurrency demands.

    • The cost of maintaining enough concurrency scaling clusters around-the-clock was prohibitive.

  • Serverless Redshift Couldn’t Meet Low-Latency SLAs
    When testing Redshift Serverless, Eightfold found that performance degraded unpredictably under load:

    • Data wasn’t always cached locally on compute nodes.

    • Cold-start and IO penalties caused query latencies to spike, especially under sudden user surges.

  • Tenant Isolation Challenges
    Eightfold’s customer base ranges from large enterprises (e.g., major banks, global tech firms) to smaller companies.
    Heavy-usage tenants monopolized Redshift slices, creating noisy neighbor effects:

    • Smaller customers' analytics queries slowed down dramatically.

    • Slice resource contention was hard to predict or mitigate.

  • Operational Headaches
    Managing workload queues (WLM), rebalancing slices, monitoring vacuum jobs, tuning for concurrency—all became daily operational burdens.
    This manual overhead conflicted with Eightfold's need to move fast on delivering new AI-driven analytics features.

  • Real-Time Analytics Was Out of Reach
    Redshift performed well for internal reporting (like quarterly hiring reports), but fell short for embedding live product analytics into Eightfold’s platform.
    Customer-facing dashboards demanded sub-second query latencies at high concurrency—something Redshift couldn’t reliably deliver.

How StarRocks Solved Eightfold’s Redshift Pain Points

At Eightfold, the search for a new analytics engine was driven by very specific pain points they encountered while using Amazon Redshift for customer-facing analytics. Let’s walk through how StarRocks addressed each of these core challenges:

1. Eliminating the Single Leader Node Bottleneck

In Redshift, all query coordination runs through a single leader node, creating a natural ceiling for concurrency and throughput. Even if compute nodes were scaled horizontally, the leader node remained a bottleneck, limiting the number of simultaneous queries that could be processed efficiently.

StarRocks solved this by adopting a different architecture:

  • Instead of a single coordinator, StarRocks spreads responsibilities across multiple Frontend (FE) nodes that can be scaled horizontally.

  • Queries are planned by any FE and executed in parallel across Backend (BE) nodes, removing the single choke point.

  • This design allows StarRocks to scale both query planning and execution capacity linearly as needed.

Impact for Eightfold: It unlocked the ability to serve high volumes of low-latency, concurrent customer queries without queuing delays.

2. True Elastic Scaling without Concurrency Penalties

While Redshift offers “Concurrency Scaling,” it involves launching additional short-lived clusters, which comes with cold start delays, operational complexity, and strict upper limits (only about 10–11 extra clusters).

StarRocks approached concurrency differently:

  • Compute nodes (BEs) can be added dynamically without spinning up isolated clusters.

  • There’s no concurrency ceiling tied to temporary cluster limits.

  • Scaling compute means scaling QPS (queries per second) capability directly.

Impact for Eightfold: It provided a much smoother and truly elastic concurrency model, with no manual orchestration or startup latency to manage.

3. Solving the I/O Bottlenecks of Serverless Architectures

Redshift Serverless aims to simplify operations, but it introduces another problem:

  • Data is not consistently cached locally.

  • Queries often have to fetch data from Amazon S3, introducing significant I/O overhead, especially during traffic bursts or cold starts.

StarRocks designed for locality and high cache efficiency:

  • Even when using S3 as the durable storage layer, StarRocks aggressively caches "hot" data onto local disk volumes (EBS or equivalent).

  • Queries are served from local caches whenever possible, minimizing expensive object storage reads.

Impact for Eightfold: Queries remained consistently fast even under bursty, unpredictable workloads, without suffering from the cold-start latency typical of serverless Redshift.

4. Smarter Partitioning to Avoid Noisy Neighbor Problems

In Redshift, data is distributed across a fixed number of slices.

  • Heavy tenants (large customers) can dominate slices, leaving smaller tenants competing for scarce resources.

  • Fine-grained, tenant-aware isolation wasn’t practical because slices are a fixed cluster-level resource.

StarRocks introduced fine-grained, tenant-friendly partitioning:

  • Data is partitioned at the tablet level (similar to logical shards).

  • Tablets can be distributed flexibly across multiple BE nodes.

  • This means large tenants don't monopolize physical resources, and small tenants are no longer unfairly throttled.
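A minimal StarRocks DDL sketch of this idea (schema and names are hypothetical, not Eightfold’s actual model):

```sql
-- Partition by date and hash-bucket by (tenant_id, user_id): a heavy
-- tenant's rows are spread across many tablets, which the cluster
-- balances across BE nodes instead of pinning them to fixed slices.
CREATE TABLE tenant_events (
    event_date  DATE,
    tenant_id   BIGINT,
    user_id     BIGINT,
    event_type  VARCHAR(64),
    duration_ms BIGINT
)
DUPLICATE KEY (event_date, tenant_id)
PARTITION BY RANGE (event_date) (
    START ("2024-01-01") END ("2025-01-01") EVERY (INTERVAL 1 MONTH)
)
DISTRIBUTED BY HASH (tenant_id, user_id) BUCKETS 32;
```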

Impact for Eightfold: They could finally achieve tenant-level resource balancing and high QPS without suffering from "noisy neighbor" slowdowns.

5. Reducing Operational Burden with a Simpler Stack

Managing Redshift at scale also involved significant operational overhead:

  • Manual tuning of WLM (Workload Management).

  • Regular vacuuming to manage deleted data and storage bloat.

  • Careful planning to redistribute slices during cluster expansions.

StarRocks simplified operations in multiple ways:

  • No manual WLM configuration required.

  • Background compaction and storage maintenance are automatic.

  • Adding or removing nodes requires minimal rebalancing work.

Impact for Eightfold: The team could focus on building new analytics features rather than constantly tuning and maintaining the database layer.

6. Enabling Truly Real-Time, Customer-Facing Analytics

Finally, perhaps the most critical gap:
Redshift was built for internal BI-style reporting, not real-time interactive analytics embedded inside customer-facing applications. Query latencies, concurrency ceilings, and cache misses made true "in-product" analytics impractical.

StarRocks, by contrast, was designed to:

  • Serve sub-second queries at high concurrency.

  • Perform large-scale joins across fact and dimension tables efficiently (crucial for Eightfold’s star schema models).

  • Handle both real-time data ingestion and live query traffic simultaneously.
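For example, a live hiring-funnel widget might issue a star-schema join like the following on every page load (tables are illustrative):

```sql
-- Fact-to-dimension joins of this shape are the workload StarRocks'
-- cost-based optimizer and vectorized engine are tuned for.
SELECT d.department_name, s.stage_name, COUNT(*) AS candidate_events
FROM fact_pipeline_events f
JOIN dim_department d ON f.department_id = d.department_id
JOIN dim_stage s      ON f.stage_id = s.stage_id
WHERE f.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY d.department_name, s.stage_name;
```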

Impact for Eightfold: They could power live dashboards and metrics directly inside their product — giving customers instant visibility into hiring pipelines, talent analytics, and workforce trends, without needing to pre-compute or batch-refresh dashboards.

Eightfold’s Results After Adopting StarRocks

Switching to StarRocks brought immediate and measurable improvements:

  • 2x Query Performance: Even in the worst-case benchmarks, StarRocks delivered at least twice the query performance compared to Redshift — and in many real-world scenarios, the gains were even greater.

  • 2x Cost Reduction: By eliminating the need for expensive concurrency scaling clusters and minimizing operational overhead, Eightfold achieved approximately a 50% reduction in overall infrastructure cost.

  • Sub-Second Latency at High Concurrency: StarRocks enabled live dashboards and customer-facing analytics features that consistently met sub-second query SLAs, even during traffic surges.

  • Operational Simplicity: Without the need for constant WLM tuning, manual vacuuming, and slice redistribution, the Eightfold engineering team could reallocate time from infrastructure maintenance to building new product features.

  • Scalable Real-Time Analytics Foundation: With StarRocks’ architecture, Eightfold unlocked the ability to innovate toward agentic analytics and more autonomous, AI-driven analytics experiences without being constrained by underlying system bottlenecks.

Redshift vs. StarRocks: A Detailed Comparison

| Feature | Amazon Redshift | StarRocks |
| --- | --- | --- |
| Leader Node Architecture | Single leader node bottleneck | Multiple front-end nodes, no bottleneck |
| Concurrency Scaling | Expensive, limited (max 10–11 clusters) | Unlimited horizontal scaling |
| Serverless Read Behavior | Cold reads from S3, cache invalidation risk | EBS caching + intelligent separation |
| Partitioning | Fixed slices; noisy neighbor issues | Fine-grained partitioning via tablets |
| Query Latency | Sub-second possible, but inconsistent | Consistent sub-second query performance |
| Star Schema Joins | Supported but bottlenecked under load | Optimized for large, normalized joins |
| Deployment Simplicity | Requires WLM tuning, cluster tuning | Simple scaling with minimal moving parts |
| Storage Durability | S3-based (with compute cache) | S3-based shared data lake architecture |
| Materialized Views | Traditional; expensive to refresh | Async refresh, partition-based |
| Agentic Analytics Readiness | Limited concurrency and QPS | Built for agent-driven, high-QPS workloads |

 

Conclusion: Choosing the Right Analytics Engine for Modern Needs

Amazon Redshift has long been a reliable foundation for cloud-based data warehousing, especially for internal business intelligence and structured reporting. Its architecture, built for centralized query coordination and periodic analytics workloads, served a generation of organizations transitioning from on-premise systems to the cloud.

But as analytics moves beyond back-office reporting into real-time, customer-facing applications, the demands on data infrastructure have fundamentally changed. Today, systems must support high concurrency, consistent sub-second latency, elastic scaling, and fine-grained multi-tenant isolation — often under unpredictable and bursty workloads.

Eightfold.ai’s journey highlights this shift: pushing Redshift beyond its design center revealed critical architectural limitations that were manageable for internal reporting but untenable for live, in-product analytics. Challenges like the single leader node bottleneck, costly concurrency scaling, serverless cold-start penalties, and rigid partitioning made it clear that a different approach was necessary.

StarRocks addresses these new realities not with incremental patches, but with a reimagined architecture: distributed coordination, intelligent caching, elastic scalability, and native support for real-time operational analytics at scale.

As organizations increasingly embed analytics into their products, empower autonomous agents, and deliver interactive user experiences, the requirements for cloud analytics platforms are evolving rapidly.
Choosing the right engine is no longer just about running SQL queries faster — it's about aligning the architecture with where modern data demands are going.

In this context, Redshift remains strong for traditional warehouse-centric use cases. But for real-time, high-concurrency, customer-embedded, and AI-driven analytics, newer architectures like StarRocks are shaping the future.

 

Frequently Asked Questions (FAQ)


1. What is Amazon Redshift primarily designed for?

Amazon Redshift was originally designed as a cloud-based data warehouse optimized for internal business intelligence (BI) workloads. It excels at periodic reporting, dashboarding, and complex analytical queries over structured and semi-structured data. It is best suited for use cases where query concurrency and real-time requirements are moderate rather than extreme.

2. Why does Redshift have a single leader node architecture, and why is that a limitation?

The single leader node design simplifies query coordination and metadata management. However, as workloads grow, the leader node becomes a bottleneck because all queries must pass through it, regardless of how many compute nodes exist. This limits Redshift's ability to scale concurrency and can introduce query queuing under heavy load.

3. What is Redshift Concurrency Scaling, and what are its limitations?

Concurrency Scaling automatically adds transient clusters to handle query surges. However:

  • New clusters take time to spin up (cold start latency).

  • There is a limit (about 10-11 extra clusters per queue).

  • These clusters incur additional costs.

  • Managing traffic between base and transient clusters adds operational complexity.

It is a useful feature, but not sufficient for real-time, high-concurrency applications.

4. How does Redshift Serverless differ from provisioned Redshift?

Redshift Serverless removes the need to manage cluster infrastructure, offering automatic scaling based on demand. However, Serverless instances frequently read data directly from S3, which can cause higher query latencies due to slower I/O compared to data cached on local disks. For workloads needing consistently low latencies (e.g., live customer dashboards), this can be problematic.

5. What is the "noisy neighbor" problem in Redshift?

In Redshift, compute nodes are divided into slices, and each slice handles part of the data. If a large tenant monopolizes its assigned slices (e.g., due to high query volume or large data size), it can impact the performance of smaller tenants sharing the same physical resources. This "noisy neighbor" problem is hard to mitigate in Redshift's fixed slice architecture.

6. Why is maintenance overhead higher with Redshift?

Efficient Redshift operation requires:

  • Manual configuration and tuning of Workload Management (WLM) queues.

  • Regular VACUUM and ANALYZE operations to manage disk space and optimize performance.

  • Careful slice redistribution after schema changes.

These tasks are essential to maintaining healthy performance but add to operational complexity.

7. Is StarRocks a direct replacement for Redshift in all cases?

Not necessarily. Redshift remains an excellent choice for:

  • Traditional BI reporting.

  • Scheduled dashboards and periodic reporting.

  • Organizations heavily invested in AWS-native services.

StarRocks is better suited for:

  • Real-time, customer-facing analytics.

  • AI-driven, agentic analytics systems.

  • Use cases requiring very high concurrency and low latency.

Choosing between them depends on your workload characteristics and business needs.

8. What should teams consider when evaluating a migration from Redshift to StarRocks?

  • Workload Profile: Are you moving from batch BI to real-time analytics?

  • Concurrency Needs: Will you need to support many simultaneous queries?

  • Latency Expectations: Do you need consistent sub-second performance?

  • Operational Model: Are you willing to adopt a newer architecture with simpler operational management?

  • Cost Sensitivity: Do you seek better price/performance at scale?

If the answer to many of these is "yes," then evaluating StarRocks seriously makes sense.