
    The ability to extract timely insights from data separates the winners from the rest, and this is especially true for real-time user-facing analytics. But many businesses drag their feet when it comes to implementing real-time user-facing analytics. Why is it so hard?

    In this article, we'll examine how the need for speed and concurrency makes delivering real-time user-facing analytics difficult, and equip you with the knowledge to better navigate these challenges.

     

Low Latency, Especially for Multi-Table Queries

    Users demand immediacy, especially when interacting with applications or dashboards. But addressing this demand is a twofold problem that requires engineers to design for:

    1. User Experience: Users' expectations of fast and immediate feedback necessitate rapid query execution. Slow responses can lead to user disinterest, hurting user experience.

    2. Timeliness: Certain user-facing analytics scenarios require time-sensitive data processing, such as stock prices or social media mentions, making low-latency queries crucial to ensure data relevancy.

While existing approaches can offer low latency for single-table queries, they necessitate denormalization in the upstream pipeline, a process that is both complex and costly, especially when performed in real time.

    To counter this issue, you should always prioritize solutions that provide efficient query performance for both flat table schemas and star and snowflake schemas, circumventing the need for denormalization. You can read about this approach here.
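To make the trade-off concrete, here is a toy sketch (not tied to any specific engine, with made-up table names and numbers) of the same aggregate answered two ways: joined at query time from a star schema, and read from a flat table that an upstream denormalization pipeline would have to produce and keep in sync.

```python
# Toy illustration: the same aggregate answered two ways.
# Star schema: a fact table joined to a dimension at query time.
# Denormalized: the dimension's attributes copied into every fact row
# up front, which is what an upstream denormalization pipeline produces.

fact_sales = [  # (customer_id, amount)
    (1, 100), (2, 250), (1, 75),
]
dim_customer = {1: "EMEA", 2: "APAC"}  # customer_id -> region

# Star schema: resolve the dimension at query time (the "join").
star = {}
for cust_id, amount in fact_sales:
    region = dim_customer[cust_id]
    star[region] = star.get(region, 0) + amount

# Denormalized: region was baked into each row by the pipeline, so
# every dimension change forces rewriting the affected fact rows.
flat_sales = [(dim_customer[c], a) for c, a in fact_sales]
flat = {}
for region, amount in flat_sales:
    flat[region] = flat.get(region, 0) + amount

assert star == flat == {"EMEA": 175, "APAC": 250}
```

An engine that can execute the join path quickly lets you keep the schema normalized and skip the real-time denormalization pipeline entirely.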

     

    High Concurrency for Complex Queries

    High concurrency is a fundamental requirement for user-facing analytics platforms. These platforms must be ready to support a large influx of users generating a significant volume of queries per second (QPS). Failure to account for this can lead to resource contention, query hanging, and performance decline.

Hundreds of thousands of complex OLAP queries can be troublesome, but a higher query volume also means a higher likelihood of semantic equivalence or overlap in the data scanned. Partial results from these computations can be reused to save computation, which is especially useful for interactive dashboards and reports, where queries follow predictable patterns.

Despite these optimization opportunities, most solutions rely only on final-result caching. That approach helps only when two queries and their underlying data are exactly the same, a scenario that rarely aligns with the dynamic nature of high-concurrency OLAP use cases.

    An intelligent approach to user-facing analytics is to seek databases that offer more versatile caching options, such as intermediate result caching. By optimizing and diversifying caching strategies, you can propel your system's concurrency capabilities to new heights.
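The idea behind intermediate-result caching can be sketched in a few lines. This is a hypothetical illustration, not any database's actual implementation: partial aggregates are cached per partition, so two queries that are not identical can still share work.

```python
# Hypothetical sketch of intermediate-result caching: cache partial
# aggregates per (partition, operation) instead of final results, so
# queries that differ in their final shape can still hit the cache.

partitions = {
    "2024-01": [3, 5, 7],
    "2024-02": [2, 8],
}

partial_cache = {}   # (partition, op) -> cached partial result
scans = 0            # how many times raw data was actually scanned

def partial_sum(part):
    global scans
    key = (part, "sum")
    if key not in partial_cache:
        scans += 1   # cache miss: scan the partition once
        partial_cache[key] = sum(partitions[part])
    return partial_cache[key]

# Query 1: total over both partitions (cold cache: scans both).
q1 = partial_sum("2024-01") + partial_sum("2024-02")

# Query 2: January only. Not identical to query 1, so a final-result
# cache would miss, but the cached partial aggregate is reused.
q2 = partial_sum("2024-01")

assert (q1, q2, scans) == (25, 15, 2)
```

Two distinct queries, yet only two partition scans in total; a final-result cache would have required three.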

     

    Real-Time Mutable Data

    Delayed data means bad decisions and missed opportunities. This makes fresh data crucial for user-facing analytics. Analytics systems must be ready to process data streams that change rapidly and unpredictably to ensure data freshness. This presents significant technical challenges for scalability, reliability, and consistency.

Many systems use the merge-on-read (MOR) technique to handle real-time mutable data, wherein data changes are written to separate delta files rather than applied in place. The base data and the deltas are then merged at query time (and consolidated by background compaction later), but this causes problems when data needs to be fresh within seconds. The merge work performed during reads can frustrate users and significantly affect performance, making queries slow, unstable, and unpredictable.
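A minimal sketch of the merge-on-read pattern described above (a toy model with invented keys, not any particular system's storage format) shows where the read-time cost comes from:

```python
# Toy merge-on-read sketch: updates land in an append-only delta log,
# and every read must merge pending deltas with the base copy. The
# per-query merge is what makes reads slower and less predictable as
# unmerged deltas pile up between compactions.

base = {"sku-1": 10, "sku-2": 4}   # row key -> stock count
delta = []                          # append-only (key, new_value) log

def write(key, value):
    delta.append((key, value))      # cheap: no in-place update

def read(key):
    # Merge on read: scan the delta log newest-first before falling
    # back to the base copy; cost grows with the log's length.
    for k, v in reversed(delta):
        if k == key:
            return v
    return base[key]

write("sku-1", 7)
write("sku-1", 6)

assert read("sku-1") == 6   # latest delta wins
assert read("sku-2") == 4   # untouched rows still come from base
```

Writes stay cheap, but every read pays for the merge, which is exactly the cost profile that hurts second-level freshness requirements.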

     

    Efficient Resource Isolation

    Resource isolation is an essential requirement for businesses, particularly in B2B SaaS models where multiple user groups concurrently access the same system. Here are two great examples of why resource isolation is needed:

    • Maintaining Business-Critical Operations: Certain operations cannot afford to be disrupted or slowed down by other workloads. Ensuring these crucial tasks have dedicated resources preserves performance.

    • Assuring Quality of Service for Each User Group: In a multi-tenant environment, various user groups require a guaranteed minimum amount of resources to sustain their workloads. Resource isolation ensures each user group has the resources it needs.

    Most of today's solutions employ hard resource isolation at the Virtual Machine (VM) level. While this strategy indeed provides effective isolation, it has some significant drawbacks:

    • Low Hardware Utilization: VM-level isolation often results in low overall hardware usage. Each VM needs a certain amount of dedicated resources, which cannot be shared or overprovisioned. This often leads to underutilized resources.

    • High Costs: The inability to share or overprovision resources means that businesses must provide enough resources to meet the peak demand of each user group. As a result, they may need to purchase additional resources that remain idle most of the time. This issue becomes even more costly when dealing with hundreds or thousands of user groups.
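A back-of-the-envelope comparison makes the cost gap concrete. The group names and demand numbers below are hypothetical; the point is only that with hard isolation you must provision for the sum of each group's peak, while a shared pool with soft limits only needs to cover the peak of the combined load.

```python
# Hypothetical per-group demand (resource units) over four time slots.
demand = {
    "reporting":  [8, 2, 1, 1],
    "dashboards": [1, 1, 8, 2],
    "ad_hoc":     [1, 2, 1, 8],
}

# Hard (VM-level) isolation: each group gets dedicated capacity sized
# for its own peak, idle or not -- provision the sum of the peaks.
hard_capacity = sum(max(series) for series in demand.values())

# Shared pool with soft limits: size for the peak of the combined
# load, since groups rarely all peak at the same time.
combined = [sum(slot) for slot in zip(*demand.values())]
shared_capacity = max(combined)

assert hard_capacity == 24
assert shared_capacity == 11
```

With hundreds or thousands of user groups whose peaks rarely coincide, this gap between 24 and 11 units is what turns VM-level isolation into the cost problem described above.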

     

    Make Real-Time User-Facing Analytics Easy

    Delivering real-time user-facing analytics involves traversing a landscape riddled with challenges. Navigating this terrain can be tough, but thankfully, you don't have to do it alone.

    CelerData Cloud, powered by the open-source project StarRocks, is uniquely equipped to address these challenges head-on. You can learn more about it here or get started now with a 30-day free trial. Sign up here.
