
    <1s latency | 5-second data freshness | 100s of concurrency

     

    About Yuno

    Yuno is a fast-growing fintech company that provides payment orchestration and merchant analytics across Latin America. Its platform connects merchants to hundreds of payment providers, helping them optimize routing, monitor performance, and improve conversion in real time.

    With hundreds of merchants and millions of daily transactions, Yuno's business relies on delivering insights instantly — every second of delay can impact customer satisfaction and revenue.

     

    Yuno's Customer-Facing Data Stack & Requirements

    Yuno's analytics platform powers a range of customer-facing workloads that demand exceptional speed and scale, including:

    • Merchant dashboards for real-time approval, decline, and fraud monitoring.
    • Conversion funnel analytics combining checkout behavior with payment outcomes.
    • Routing optimization tools that help merchants dynamically adjust providers.

    To support these workloads, the data platform needed to evolve into a real-time analytics layer capable of handling:

    • Concurrency: hundreds of simultaneous dashboard queries from merchants.
    • Latency: sub-second response times for interactive workloads.
    • Freshness: data updated within seconds of each transaction.

    These requirements quickly outgrew Yuno's previous Snowflake-based architecture and drove the search for a new, high-performance analytics solution.

     

    Previous Architecture

    Before adopting CelerData, Yuno operated a hybrid data stack centered on Snowflake for analytics and Athena/Hudi for historical reporting. The system was designed for batch analytics, not for high-frequency updates or real-time dashboards.

    Core Components

    • Snowflake – main analytical warehouse for real-time dashboards and reports.
    • Snowpipe + dbt – handled data ingestion and transformations.
    • Athena on Hudi (S3) – stored historical data and powered ad-hoc or long-tail queries.

    Operational Model

    • Each use case or team had its own Snowflake warehouse to avoid concurrency limits.
    • Data was loaded in batches and merged periodically, often taking more than an hour to propagate.
    • Analytics teams maintained two separate query systems — Snowflake for live dashboards and Athena for deep history — resulting in duplicated logic and fragmented governance.

     

    Challenges

    This dual-system setup — Snowflake for batch analytics and Athena on Hudi for historical access — served its purpose early on but could no longer keep pace with Yuno's rapidly growing, customer-facing workloads.

    • Data Freshness: Data latency frequently exceeded one hour, far from the second-level goal.
    • Latency: Query response times could reach several seconds during peak load.
    • Concurrency: Snowflake's warehouse isolation limited scalability; supporting more users required provisioning new warehouses.
    • Scalability: As data volume grew to terabytes, with hundreds of tables and billions of rows, performance degraded and maintenance effort increased.
    • Data Model Complexity: The main analytics table — a large "flattened" schema — combined numerous joins from transaction and event data. Over time, this table became massive and slow to maintain, with query latency compounding as data volume grew.
    • Complexity & Cost: Managing two analytical systems was operationally heavy. Each required separate tuning, security, and governance, making the overall stack hard to maintain.

    Yuno needed a solution that could unify real-time ingestion, fast queries, and cost efficiency.

     

    Evaluating the Next Solution

    The team evaluated multiple technologies, including ClickHouse, Apache Druid, and CelerData (powered by StarRocks).

    ClickHouse offered strong query performance but lacked the lakehouse integration and native support for real-time upserts that Yuno required. Druid provided good time-series capabilities but introduced significant operational overhead and performance challenges.

    CelerData stood out for several reasons:

    • Unified real-time and federated analytics: StarRocks can query real-time data in its internal storage format together with external tables on Hudi/S3 in a single statement, with full SQL semantics.
    • Native support for data upserts from CDC sources: second-level data freshness without impacting query performance.
    • Materialized views and shared-data architecture: high concurrency and sub-second response times, even at large scale.
    • High performance: CelerData's fully vectorized C++ engine and cost-based optimizer deliver consistently low-latency queries, even under high concurrency, making it ideal for customer-facing workloads.

    After a successful proof of concept — where CelerData handled 500 queries per second with sub-second latency and second-level freshness — Yuno decided to migrate its customer-facing workloads.

     

    The New Architecture

    Yuno's new analytics platform is built around CelerData as the central query engine, directly connected to its operational data and data lake.

    Data Pipeline:

    • Sources: Streaming data is captured through Flink CDC, preprocessed, and ingested into CelerData's primary-key tables.
    • Data Lake: Historical datasets remain in Hudi on S3, accessible via federated queries.
    • Processing: CelerData maintains both synchronous materialized views (for near-instant aggregation) and asynchronous views (for complex precomputations on large joins).
    • Access Layer: Merchant dashboards and internal tools query CelerData directly through APIs, achieving 5–10 second data freshness across transactional and aggregated data.
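    The ingestion pattern above can be sketched in StarRocks DDL. This is a minimal, illustrative example of a primary-key table that a Flink CDC pipeline could upsert into; the table and column names are assumptions, not Yuno's actual schema.

```sql
-- Illustrative sketch; names are assumptions, not Yuno's real schema.
-- A primary-key table lets the Flink CDC connector apply row-level
-- upserts and deletes, so each payment's latest state becomes
-- queryable within seconds of the change.
CREATE TABLE payments (
    payment_id   BIGINT NOT NULL,
    merchant_id  BIGINT NOT NULL,
    provider     VARCHAR(64),
    status       VARCHAR(32),       -- e.g. APPROVED / DECLINED
    amount       DECIMAL(18, 2),
    updated_at   DATETIME NOT NULL
)
PRIMARY KEY (payment_id)
DISTRIBUTED BY HASH (payment_id);
```

    The CDC connector translates changelog events (inserts, updates, deletes) into operations keyed on `payment_id`, which is what keeps per-row state current without periodic batch merges.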

    This architecture eliminated the need for separate batch and streaming systems. It allowed Yuno to serve both live and historical analytics from a single source of truth.
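    As a sketch of what "a single source of truth" looks like in practice: assuming an external catalog (here called `hudi_catalog`, an illustrative name) has been registered over the Hudi tables on S3, one statement can combine hot internal data with lake history.

```sql
-- Illustrative: `hudi_catalog` and the table names are assumptions.
-- One query serves both live and historical data.
SELECT merchant_id, DATE(updated_at) AS day, COUNT(*) AS txns
FROM payments                                  -- hot data, internal PK table
WHERE updated_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY merchant_id, DATE(updated_at)
UNION ALL
SELECT merchant_id, DATE(updated_at) AS day, COUNT(*) AS txns
FROM hudi_catalog.history_db.payments_archive  -- history on Hudi/S3
WHERE updated_at < DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY merchant_id, DATE(updated_at);
```

    Because the federation happens inside the query engine, dashboards no longer need duplicated logic across two systems to cover both recent and long-tail history.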

     

    Implementation

    The migration was executed in phases:

    • Schema Redesign: Yuno transitioned from a massive flattened table to a set of primary-key tables optimized for upserts and incremental refresh.
    • Materialized Views: Critical dashboards (e.g., merchant conversion rates, provider success metrics) were powered by synchronous MVs with sub-second latency. More complex metrics used asynchronous MVs that refreshed every 20–30 seconds.
    • Partitioning Optimization: Early ingestion issues were mitigated by switching from daily to monthly partitions, which drastically reduced IOPS and improved stability.
    • CI/CD and Metadata Governance: CelerData was integrated with Yuno's deployment pipeline. Every schema or column change required metadata annotations and was automatically propagated to CelerData, building a semantic layer for downstream analytics and LLM-driven query generation.
    • LLM and Self-Service Analytics: Yuno built an internal layer where users could type natural-language questions, which the LLM converted to SQL queries on CelerData. The system reused metadata and sample data to generate charts and dashboards automatically, turning complex analytics into a self-service experience.
    • Observability and Troubleshooting: The team enhanced ingestion reliability and visibility, adding monitoring for partition health, versioning, and MV refresh status to prevent previous stability issues.

     

    Results

    After migrating to CelerData, Yuno achieved measurable improvements:

    • Query Latency: Reduced from ~3 seconds on Snowflake to under 1 second for most dashboards.
    • Data Freshness: Improved from 1 hour to around 5 seconds for core transactional data.
    • Concurrency: Supported hundreds of concurrent dashboard queries without performance degradation.
    • Cost Efficiency: Eliminated the need for multiple Snowflake warehouses and reduced compute cost per query by over 40%.
    • Reliability: The shared-data architecture and optimized partitioning significantly improved ingestion stability.

    Operational teams can now monitor provider performance, identify conversion drops, and adjust routing rules in real time — transforming analytics from a retrospective tool into a live decision engine.

     

    Future Work

    Yuno continues to evolve its analytics platform with CelerData. The next steps include:

    • Integrating S3 Express One Zone for lower latency and reduced cost of hot data storage.
    • Enhancing LLM-driven analytics by expanding the internal knowledge base with query examples and column semantics.
    • Improving disaster recovery and metadata resilience to further harden ingestion during upgrades and reboots.
    • Expanding customer-facing analytics to new regions and product lines, supported by CelerData's scalability and lakehouse integration.

    With CelerData as the foundation, Yuno has built a modern analytics architecture that delivers real-time insights at scale — combining the speed of a database with the openness of a lakehouse, and turning data into a true competitive advantage.
