How to Squeeze Value out of Real-Time Analytics at Scale
Jan 9, 2023 11:48:04 AM
Online analytical processing (OLAP) has provided businesspeople with insights into data for many years – but not without its tradeoffs.
Early database technologies lacked the scalability and performance characteristics companies are familiar with today. As a result, vendors designed OLAP systems to provide analytics based on prepared summary data – data sets that were incomplete and hours or even days old.
With today’s cloud computing technologies, such limitations should no longer hold companies back. Business professionals expect real-time insights into all relevant data, without the constraints of yesterday’s legacy technologies.
The Power of Real-Time Data
The term real-time has different meanings in different contexts. Multiplayer games like Fortnite require real-time interactions, but this low-latency connotation of real-time doesn’t apply in most business situations.
Some businesses, like real-time stock trading, require data that are current to the millisecond – but this bar is higher than most organizations set for their data-centric interactions. More generally, companies require up-to-the-minute information about their businesses. Whether data are a millisecond or a minute old, however, is beside the point. What counts is that business decision makers have information current enough to make the best decisions they can without delay.
Delivering this level of real-time data, however, requires a rethink of the entire data architecture end-to-end. Data collection on web sites and devices must take place in real time. Middleware, databases, and visualization technologies must similarly operate in real time. If any component in the data lifecycle presents a bottleneck, then the performance of the entire interaction will suffer.
For this reason, modern OLAP technologies often depend upon massively parallel processing (MPP): dividing each query into pieces that run in parallel across many nodes. Here are three stories of companies that struggled with their existing OLAP solutions in the face of real-time requirements, and that moved to StarRocks and its MPP framework to resolve their OLAP bottlenecks.
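To make the MPP idea concrete, here is a minimal sketch of how a single aggregation can be split into partial aggregations that run in parallel and are then merged. It is an illustration only, not how StarRocks or any other product is implemented: the booking data, the `partial_sum` and `parallel_revenue_by_category` names, and the use of a thread pool as a stand-in for separate nodes are all assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical booking events, already split into partitions the way an MPP
# system would spread rows across nodes. Each row is (category, amount).
PARTITIONS = [
    [("hotel", 120.0), ("flight", 300.0), ("hotel", 80.0)],
    [("flight", 250.0), ("hotel", 100.0)],
    [("hotel", 60.0), ("flight", 150.0), ("flight", 50.0)],
]


def partial_sum(rows):
    """Scan one partition and return its per-category subtotals."""
    totals = Counter()
    for category, amount in rows:
        totals[category] += amount
    return totals


def parallel_revenue_by_category(partitions):
    """Aggregate every partition concurrently, then merge the partial results."""
    # Threads stand in for worker nodes here; a real MPP engine would run
    # each partial aggregation on a different machine.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))

    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds values key by key
    return dict(merged)


result = parallel_revenue_by_category(PARTITIONS)
print(result)
```

The point of the sketch is the shape of the work: each worker touches only its own slice of the data, and only the small partial results travel to the final merge step.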
Trip.com: High-Concurrency Queries Supporting Diverse Needs
Trip.com is a leading online travel platform that provides booking services for over 1.5 million hotels worldwide. Its business intelligence platform provides data visualization across these services to its lodging line of business to support rapid decision making. The Trip.com website receives heavy consumer traffic, especially during holidays. As a result, the company deals with massive quantities of data daily. Furthermore, each team in the lodging department cares about different metrics and therefore requires its own views of real-time data.
To support these requirements, Trip.com implemented ClickHouse, a fast open-source column-oriented DBMS that supports the generation of analytical data reports in real-time. However, Trip.com found that ClickHouse could not support the high concurrency of queries that its lodging department required.
Furthermore, holiday peaks increased traffic to Trip.com’s real-time dashboard by a factor of ten, threatening the stability of its server infrastructure. To prevent this calamity, the company had to implement throttling on its front end, a solution that pleased nobody. After testing a variety of alternatives, Trip.com decided to purchase the StarRocks high-performance distributed relational columnar database.
StarRocks’ most important characteristic was its MPP framework. Each node in the MPP framework can process up to 10 billion rows of data per second, sufficient to meet Trip.com’s needs during peak traffic times without throttling. StarRocks’ disaster recovery capabilities provided a further win for Trip.com, as the platform supported the company’s high availability requirements in the case of failure.
Zepp Health: Tracking Smart Wearable Data
Zepp Health is a cloud-based health service provider whose smart wearable technologies generate massive quantities of data every day. Zepp’s business users need the ability to query metrics in one or more dimensions over massive event tracking data sets. Zepp’s Apache HBase computation and analytics system wasn’t able to provide the analytics Zepp required, as HBase stored data as key-value pairs, limiting Zepp’s ability to conduct complex analytics.
To address this limitation, Zepp selected StarRocks because of its superior read/write performance. StarRocks MPP provided Zepp Health with a unified OLAP solution that worked for all the scenarios that Zepp personnel required. The support of StarRocks’ community was also a deciding factor for Zepp Health, as the company leveraged the community for training and support while contributing feature improvements back to it.
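The kind of query described in this story, a metric grouped by one or more dimensions, can be sketched in a few lines. This is a hedged illustration rather than Zepp Health’s actual schema or workload: the `EVENTS` rows, the field names, and the `aggregate` helper are all hypothetical. In an OLAP database the same request would typically be a single GROUP BY statement, whereas a store that only addresses rows by key generally leaves this scan-and-group work to application code.

```python
from collections import defaultdict

# Hypothetical event-tracking rows; the field names are illustrative only.
EVENTS = [
    {"device": "watch", "region": "EU", "event": "sync", "count": 3},
    {"device": "watch", "region": "US", "event": "sync", "count": 5},
    {"device": "band", "region": "EU", "event": "sync", "count": 2},
    {"device": "watch", "region": "EU", "event": "workout", "count": 1},
]


def aggregate(events, dimensions, metric):
    """Sum `metric` over all events, grouped by the given dimension fields."""
    totals = defaultdict(int)
    for row in events:
        # The group key is the tuple of this row's values for each dimension.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[metric]
    return dict(totals)


# The same data viewed along one dimension, then along two.
by_device = aggregate(EVENTS, ["device"], "count")
by_device_region = aggregate(EVENTS, ["device", "region"], "count")
print(by_device)
print(by_device_region)
```

Because the dimensions are passed in as a list, different teams can ask for different views of the same events without changing how the data is stored.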
SF Technology: Bringing the Smarts to Smart Logistics
SF Technology is a subsidiary of Chinese multinational logistics company SF Holding and the sister company of delivery services firm SF Express. As SF Holding and its subsidiaries grew, SF Technology implemented a big data ecosystem for collecting, storing, and analyzing data across the organization.
SF Technology built an OLAP system on ClickHouse, adding Elasticsearch (a data store, search engine, and analytics solution) and Presto (an open-source SQL query engine). However, SF Technology ran into compatibility issues among these products, limiting its ability to maintain and upgrade its OLAP solution. These challenges led SF Technology to select StarRocks to provide integrated services for its big data analytics needs. The company’s core requirements centered on real-time data analytics and frequent data updates. StarRocks’ high availability, autoscaling, and real-time capabilities gave SF Technology and its sister companies the analytics capabilities they required.
The Intellyx Take
Achieving real-time data analytics requires organizations to remove the bottlenecks that slow things down. Removing those bottlenecks, however, doesn’t mean that moving to a next-generation OLAP solution like StarRocks forces companies to rip out all of their existing technologies.
On the contrary, as the three examples above illustrate, StarRocks works in conjunction with most existing data processing technologies. Organizations can rest assured that the transition to StarRocks doesn’t require a risky rip-and-replace strategy. Instead, StarRocks can work with existing data collection and visualization technologies. Over time, organizations may well replace older technologies across the full data landscape. After all, companies replace legacy gear all the time. Regardless, the stories above illustrate how organizations can implement StarRocks to resolve real-time analytics challenges today.
Copyright © Intellyx LLC. StarRocks is an Intellyx customer. None of the other organizations mentioned in this article is an Intellyx customer. Intellyx retains final editorial control of this article.