Zepp Health is a cloud-based health service provider that has world-leading smart wearable technologies. Zepp Health's vision is to create smart products that promote a healthy lifestyle.
At Zepp Health, a huge amount of event tracking data is generated every day. Previous Apache HBase® computation and analytics system is inefficient and inflexible in dealing with queries against data at such a large scale. During event tracking data analysis, users often query metrics in a single dimension or a combination of multiple dimensions. The dimension can be time, event name, city, or device attribute. The metric can be the number of users and the number of times an event is tracked.
In face of such massive event tracking data, how to flexibly and efficiently compute dimensions and metrics is an issue worth exploring. Building an event tracking analytics platform on top of an efficient OLAP engine has become an important part of business development.
In the original solution, to analyze event tracking data, engineers need to collect metric statistics, use Apache Spark™ or Apache Hive™ to pre-calculate metrics such as page views (PVs) and unique views (UVs), and then write the pre-calculated data to Apache HBase for point queries.
This solution has the following three drawbacks:
To address the problems and challenges at the data storage layer, we are looking for a high-performance, simple, and easy-to-maintain database product to replace the existing Apache Spark + Apache HBase architecture. In addition, we hope to go beyond point queries of Apache HBase® and use real-time multi-table joins to cultivate more business possibilities.
There are a variety of OLAP database products on the market, including Apache Impala®, Apache Druid®, ClickHouse, and StarRocks. After comprehensive and thorough comparison, we find that StarRocks stands out among many products thanks to its outstanding read/write performance. In addition, the active and inclusive community ecology offered superior development and user experience. So we choose StarRocks as the OLAP engine to replace Apache HBase.
Product | Advantage | Disadvantage | O&M Cost | Performance |
ClickHouse |
|
|
High | High |
StarRocks |
|
|
Low | High |
Apache Impala |
|
|
Medium | Low |
Apache Druid |
|
|
High | Medium |
StarRocks is a blazing-fast enterprise-grade massively parallel processing (MPP) database designed for all analytical scenarios. It offers a mix of innovative features, including online horizontal scaling, finance-grade high availability, compatibility with MySQL protocol and MySQL ecology, fully vectorized engine, and query federation on various data sources. StarRocks provides a unified solution for OLAP services in all scenarios. It is suitable for various scenarios that have demanding requirements for performance, real-time analytics, concurrency, and flexibility.
Description of the overall architecture:
The detailed process is as follows:
StarRocks' monitoring platform shows that the average query latency is about 100 ms, and the P99 latency is about 250 ms, which fits our business needs well.
The system can quickly respond to users' query needs. The latency of large queries is reduced from minutes to seconds.
Users can analyze metrics in a combination of dimensions and time periods. There is no need to pre-compute redundant metrics.
StarRocks has an efficient storage structure, which reduces the cost of storing the same volume of data by 20%.
StarRocks is easy to O&M, which reduces the maintenance labor cost compared to ClickHouse.
Users can perform self-service queries. The data retrieving experience is improved. The latency of point queries against some metrics has been reduced from minutes to seconds, and even to milliseconds for some metrics.
An open and thriving community is also an important reason why we choose StarRocks. An open community helps users quickly learn product features and ensures timely problem resolution. We at Zepp Health are also actively engaged in community building. We hope that by giving back to the community, we can grow and thrive together with the community. We contributed the following feature improvements, which have been merged into the main branch of the StarRocks community:
When we reconstruct the Zepp Health big data OLAP platform, we have close communication with the StarRocks community in business implementation and problem tracking and resolution. The platform is impossible without the support from the StarRocks community.
The Open & Collaborate community spirit not only drives efficient resolution of tricky problems but also enhances the technical skills of all community participants.
StarRocks is an excellent OLAP product. Zepp Health is willing to use and grow with StarRocks. In the future, we will continue to engage in community building and make our share of contribution. We have the following improvement plans:
Apache®, Apache Spark™, Apache Flink®, Apache Hadoop®, Apache Hive™, Apache Druid®, Apache Hudi, Apache Impala®, Apache HBase®, and their logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.