- Improved gaming experiences
- Increased in-game purchase revenue
- Higher customer satisfaction
- Shortened new game development time
- Reduced data engineering costs
- High-performance, low-latency queries
- Real-time data updates
- High concurrency queries
- Effortless operations and linear scalability
- MySQL protocol compatible
Analytics in Gaming
Analytics plays a critical role in the gaming industry. Game developers rely on data to guide high-impact storyline development, create individualized in-game experiences, scale promotions, and optimize monetization. Other areas of business operations, such as sales and marketing, financial analysis, fraud detection, and security, also rely heavily on data analytics.
Technical Challenges at YooZoo Games
YooZoo Games' previous analytics architecture consisted of several components:
- Presto and ClickHouse were used for business metrics calculation, usually at minute or hourly intervals.
- Spark Streaming and Apache Flink were used to read data from Apache Kafka and calculate real-time metrics.
- The results were written to MySQL, which serves as the backend for the reporting and dashboard system.
- Apache HBase was used for interactive queries as well as tagging tables. Data in HBase was exposed to other systems through DataAPI. For example, the Customer Services system could look up a gamer's tag information in HBase.
- Apache Hive was used to provide metadata services to other components.
Together, these components formed the analytical platform at YooZoo Games, but this approach was far from perfect and presented several challenges:
- High Operational Costs - Each of the aforementioned components had to be managed individually.
- High Development Costs - Developers had to learn different syntaxes and tools for these components.
- Consistency Issues - Data and calculations in different systems could become inconsistent over time.
- Limited Scalability - MySQL's multidimensional query performance deteriorated with data volume growth.
Selecting a Next-Generation OLAP Layer
To address these issues and improve its analytics capabilities, YooZoo Games set out to build a new OLAP platform. Their selection criteria were:
- Fast ingestion - event data should be ingested into the OLAP layer within a couple of seconds.
- Low query latency - queries should return within milliseconds.
- High query performance on joined tables.
- Easy to manage and expand.
- High concurrency support.
- A shallow learning curve for application developers.
During the evaluation process, YooZoo Games looked at ClickHouse, Apache Doris, and StarRocks, and came to the following conclusions:
- ClickHouse is hard to maintain and does not handle multi-table join queries well.
- During performance comparisons, StarRocks showed better performance than Apache Doris.
Satisfied with the results of their evaluation, YooZoo Games selected StarRocks as the next-generation OLAP layer for their Big Data platform, citing the following strengths:
- Query Performance - With technologies such as an MPP architecture, a cost-based optimizer, columnar storage, and a fully vectorized engine, StarRocks outperforms other technologies by a wide margin.
- Rich Data Ingestion Methods - Depending on factors like data structure and volume, StarRocks supports multiple data loading techniques, including Broker Load, Spark Load, Stream Load, and Routine Load. In most scenarios, users don't need third-party ETL tools.
- Straightforward Operations - StarRocks doesn't need third-party components such as ZooKeeper. Operating and expanding a StarRocks cluster is simple.
- Rich Data Model Support - StarRocks supports detailed (duplicate key), aggregate, update (unique key), and primary key table models, as well as intelligent materialized views for optimal query performance.
- MySQL Protocol Support - Most existing BI dashboards and applications can be reused.
- Support for External Tables - This is critical for lakehouse analytics and for avoiding vendor lock-in.
How StarRocks Improved YooZoo Games' Analytics Experience
Real-Time Metrics Calculation
With the introduction of StarRocks, Flink now only needs to handle simple ETL operations, such as interacting with HBase to generate user and role login information, tagging the corresponding log records, and resolving IP information. After initial processing in Flink, data is written into Kafka as well as Hive, and is eventually organized into different stages in StarRocks. Dashboards and reports read the final results directly from StarRocks, and the same results are exposed through DataAPI.
Data Models: from De-normalized Tables To Star Schema
In YooZoo Games' previous ClickHouse-based architecture, the data model consisted mainly of de-normalized wide tables. ClickHouse's query performance on wide tables was great, but modifying the data model to reflect changes in the source was time-consuming and error-prone.
StarRocks, on the other hand, has great query performance on multi-table joins and supports real-time updates. This allowed YooZoo Games to move from de-normalized tables to star schema models, making it much easier to handle dimension changes. Additionally, fact tables and dimension tables are now decoupled, which offers more flexibility to run multi-dimensional analysis.
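The benefit of decoupling fact and dimension tables can be illustrated with a small sketch. It uses Python's built-in SQLite module purely as a stand-in for the OLAP engine, and the table names (`dim_server`, `fact_purchase`) and columns are hypothetical, not YooZoo Games' actual schema. The point is only that a dimension change touches one dimension row while the fact rows stay untouched.

```python
import sqlite3

# SQLite is only a stand-in here; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per game server.
    CREATE TABLE dim_server (server_id INTEGER PRIMARY KEY, region TEXT);
    -- Fact table: one row per purchase event.
    CREATE TABLE fact_purchase (event_id INTEGER PRIMARY KEY,
                                server_id INTEGER, amount REAL);
    INSERT INTO dim_server VALUES (1, 'EU'), (2, 'US');
    INSERT INTO fact_purchase VALUES
        (100, 1, 4.99), (101, 1, 9.99), (102, 2, 1.99);
""")


def revenue_by_region(conn):
    """Join facts to the dimension at query time and aggregate by region."""
    rows = conn.execute("""
        SELECT d.region, ROUND(SUM(f.amount), 2)
        FROM fact_purchase AS f
        JOIN dim_server AS d ON d.server_id = f.server_id
        GROUP BY d.region
        ORDER BY d.region
    """).fetchall()
    return dict(rows)


before = revenue_by_region(conn)

# A dimension change updates exactly one row; no fact rows are rewritten.
cur = conn.execute("UPDATE dim_server SET region = 'APAC' WHERE server_id = 2")
rows_updated = cur.rowcount

after = revenue_by_region(conn)
print(before, after, rows_updated)
```

With a de-normalized wide table, the same change would mean rewriting the region value on every purchase row for that server, which is the kind of maintenance work the star schema avoids.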
Exactly-Once Data Ingestion
ClickHouse could not guarantee data uniqueness during real-time ingestion, so YooZoo Games often had to de-duplicate data in downstream logic. This was tedious and time-consuming.
With StarRocks, YooZoo Games was able to use the Flink-Connector-StarRocks (FCS) plugin to ingest data. The FCS plugin can guarantee that data is ingested exactly once, which removes the need for downstream de-duplication and greatly improves productivity.
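One common way to obtain an exactly-once effect is to make each batch load idempotent, so that a retried batch cannot be applied twice. The toy model below sketches that idea under stated assumptions: every batch carries a unique label, and a label can be committed only once. The `ToyTable` class and the label format are hypothetical, and this is not the connector's actual implementation.

```python
class ToyTable:
    """In-memory stand-in for a table that accepts labeled batch loads."""

    def __init__(self):
        self.rows = []
        self.committed_labels = set()

    def load(self, label, batch):
        """Commit a batch at most once per label.

        Returns True if the rows were written, False if the label was
        already committed and the batch was ignored as a replay.
        """
        if label in self.committed_labels:
            return False
        self.rows.extend(batch)
        self.committed_labels.add(label)
        return True


table = ToyTable()
batch = [
    {"user_id": 1, "event": "login"},
    {"user_id": 2, "event": "purchase"},
]

# First attempt commits the batch.
first = table.load("checkpoint-42", batch)

# The writer never saw the acknowledgement and retries the same batch.
retry = table.load("checkpoint-42", batch)

print(first, retry, len(table.rows))
```

Because the retry reuses the same label, the table ends up with each event stored once, and no downstream de-duplication step is needed.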
A New Metrics Storage Architecture
As mentioned earlier, in the previous architecture calculated metrics and reports were loaded into MySQL from ClickHouse and Hive, using tools such as Sqoop or other homegrown programs.
StarRocks removes this step because it supports federated queries through external tables. Data can stay in Hive or MySQL instead of being moved between different components, resulting in improved data freshness and a simplified data pipeline.
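As a rough analogy for querying data where it lives, the sketch below joins tables from two separate databases in a single SQL statement without copying rows between them. It uses SQLite's `ATTACH DATABASE` from Python's standard library; this is not StarRocks' external table syntax, and the database alias `ext` and the table names are hypothetical.

```python
import sqlite3

# The main database plays the role of the OLAP layer.
conn = sqlite3.connect(":memory:")
# A second, separate database plays the role of an external source.
conn.execute("ATTACH DATABASE ':memory:' AS ext")

conn.execute("CREATE TABLE main.daily_revenue (game_id INTEGER, revenue REAL)")
conn.execute("CREATE TABLE ext.games (game_id INTEGER, title TEXT)")

conn.executemany(
    "INSERT INTO main.daily_revenue VALUES (?, ?)",
    [(1, 120.0), (2, 80.0)],
)
conn.executemany(
    "INSERT INTO ext.games VALUES (?, ?)",
    [(1, "Game A"), (2, "Game B")],
)

# One query spans both databases; nothing is exported or re-imported.
report = conn.execute("""
    SELECT g.title, r.revenue
    FROM main.daily_revenue AS r
    JOIN ext.games AS g ON g.game_id = r.game_id
    ORDER BY g.title
""").fetchall()

print(report)
```

The design point carries over: when the query engine can read the source directly, there is no export job to schedule, and reports see the source data as it currently stands.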
What's Next for YooZoo Games?
YooZoo Games has big plans for StarRocks in 2023. The goal is to further improve their StarRocks-based analytics platform through several projects:
- Move all other real-time analytics workloads to StarRocks.
- Build a unified analytics platform on top of StarRocks.
- Fully integrate StarRocks into the backend of DataAPI services.
- Build monitoring facilities around query performance, task scheduling, system health, and more.