Today, every selfie, social media interaction, financial transaction and webpage visit generates data that businesses must process and store before it becomes useful. Snowflake architecture delivers a way to do that. We explain how.
The modern era’s changing data needs are putting great strain on traditional data storage solutions. Built on the infrastructure-as-a-service (IaaS) model, these solutions are unable to handle the increasingly diverse formats and greater quantity of data that users constantly generate.
However, business needs have changed, too. New business requirements demand more than big data infrastructure’s ability to handle larger volumes of data. Business users need modern data storage solutions that allow them to process, categorize and retrieve data in near real time to meet consumers’ needs for options, rapid access and more.
In short, while traditional data storage solutions deliver many benefits, they also suffer from:
- Limited Storage Capacity: Storage arrays in traditional warehouses require archiving, causing some data to be unavailable for ongoing analysis.
- High Cost: Larger warehouses lead to higher disk storage and processing power costs.
- Limited Processing Capability: Individual queries run quickly in traditional warehouses, but simultaneous queries from multiple users strain single-core machines or otherwise limited infrastructure.
- No Scalability: As data grows, the systems should be scalable enough to store it. However, conventional databases eventually hit a practical limit.
- Maintenance: Aging hardware leads to hardware failures.
- Skilled Admin Dependency: Traditional warehouses need many highly skilled people to maintain indexes, update metadata, choose data distribution, collect “garbage,” maintain memory and more.
- Data Security: When users access data directly from numerous points, security problems give hackers a chance to move in.
As a result, instead of more or bigger data warehouses, data managers today need cloud-based data storage solutions that enable diverse data-management architectures. That way, they can spend their time managing and fine-tuning data rather than worrying about infrastructure setups, node failures, security, day-to-day maintenance and bad results, all while meeting the business’s data needs.
In 2019, Gartner released its report, “Magic Quadrant for Data Management Solutions for Analytics,” announcing that a solution called Snowflake had emerged as a market leader. According to the report, the number of Snowflake customers tripled in 2018, fueling 247 percent year-over-year revenue growth.
Gartner’s report was an eye-opener that caught the imagination of the world. What made Snowflake’s data storage solution so successful? Let’s take a deeper look into the Snowflake architecture to find out.
Why Snowflake?
One key to Snowflake’s success is that it serves multiple purposes: data lake, operational data store, data warehouse, and data mart. With increasing demand, people need a data architecture or solution that is:
- Robust
- Massively scalable
- Democratic and less prone to error
- Free of data silos
- Able to share governed data
- Capable of executing diverse analytics workloads for big data delivery.
Furthermore, Snowflake’s platform enables enterprises to fully automate core business processes by:
- Providing the power of data warehousing: Snowflake collects and aggregates data from one or many sources, delivering a longer view of an organization’s data over time.
- Providing flexibility of big data platforms: Snowflake supports semi-structured data and can ingest it directly into a relational table using the VARIANT data type. Once the data is loaded, you can query the JSON using SQL, which makes data pipelines much simpler. Users don’t need to modify code every time they add a new column or parameter, which reduces the DevOps burden for app developers.
- Handling elasticity over the cloud: The cloud provides the option to resize your virtual warehouse for additional compute resources at a fraction of the cost of traditional solutions.
However, Snowflake’s cloud solution goes beyond elasticity because it is cloud-agnostic. Whether data is stored in Amazon Web Services (AWS), Azure or some other cloud, Snowflake can manage it, including across multiple clouds. That gives Snowflake an added edge over its competitors.
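To make the earlier point about semi-structured data more concrete, here is a minimal sketch in plain Python. It is not Snowflake code and does not use any Snowflake API. It only imitates the idea behind a VARIANT-style column: each JSON document is kept whole, and queries pull out fields by path. The sample records and the helper names `load_events`, `get_path` and `select` are hypothetical, invented for this illustration.

```python
import json

# Hypothetical raw events, as they might arrive from an application.
# The second record carries an extra "device" field that the first one lacks.
RAW_EVENTS = [
    '{"user": {"id": 1, "city": "Austin"}, "amount": 25.0}',
    '{"user": {"id": 2, "city": "Boston"}, "amount": 40.0, "device": "mobile"}',
]


def load_events(raw_rows):
    """Parse each JSON string and keep the whole document, like one semi-structured column."""
    return [json.loads(row) for row in raw_rows]


def get_path(document, path):
    """Follow a dotted path such as "user.city"; return None if any key is missing."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select(events, *paths):
    """Project the requested paths from every event, similar in spirit to a SQL SELECT."""
    return [tuple(get_path(event, path) for path in paths) for event in events]


events = load_events(RAW_EVENTS)
print(select(events, "user.city", "amount"))
print(select(events, "user.id", "device"))
```

The second query asks for a field that only newer records contain. Older records simply yield `None` for it, so nothing had to be altered when the new field appeared. That is the behavior the bullet on big data flexibility describes.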
Let’s delve deeper and look at Snowflake’s architecture.
Inside Snowflake Architecture – High Level
Snowflake’s high-level architecture is depicted in the diagram below. It has three main components, which together make up the Snowflake data warehouse:
- Storage Layer
- Compute Layer
- Cloud Services Layer
At the highest level, Snowflake’s architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures.
In a shared-nothing architecture, data is partitioned and distributed over a number of machines. Each machine has sole access to, and responsibility for, its own portion of the data, without sharing that responsibility with other machines. That allows data to be segregated, with each node controlling its own subset of data.
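The partitioning idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Snowflake or any particular database implements it: the `Node` and `ShardedStore` classes are hypothetical, and hashing the key with CRC32 is just one simple way to pick an owner.

```python
import zlib


class Node:
    """One machine in a hypothetical shared-nothing cluster; it alone holds its rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class ShardedStore:
    """Routes every key to exactly one owning node."""

    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def owner(self, key):
        # crc32 gives the same result on every run, unlike Python's built-in hash() for strings.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self.owner(key).put(key, value)

    def get(self, key):
        return self.owner(key).get(key)


store = ShardedStore(node_count=3)
for customer in ["alice", "bob", "carol", "dave"]:
    store.put(customer, {"name": customer})

for node in store.nodes:
    print(node.name, sorted(node.rows))
```

Because each key maps to a single owner, a read or write for that key only ever involves one node, and no other node needs to be consulted.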
It may seem that shared-disk architecture, as the opposite of shared-nothing, would be preferable. After all, isn’t it better for all those machines, or nodes, to “share” information with each other? Not always! In some use cases, shared-disk can lead to a distributed locking problem: every node can reach the same data and each node has its own lock, so the locks must be coordinated across nodes. But with shared-nothing, data can flow straight through to the disk without any lock mediation. When only one machine has ownership, you only need one lock.
However, in other use cases, a shared-disk architecture is better. That’s because queries come in all shapes and sizes. Some are complex, requiring data from multiple locations, while others are simpler and draw from only one source. In today’s world, we must be able to accommodate both types of requests. A hybrid approach is now best because it makes the data warehouse more agile for the massive quantities of diverse data and queries that users generate today.
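The difference between simple and complex queries can be illustrated with a small Python sketch. It assumes a hypothetical set of order totals spread over three partitions by customer id. The function names and the modulo placement rule are made up for this example and are not taken from Snowflake.

```python
# Hypothetical layout: order totals keyed by integer customer id, spread over 3 partitions.
PARTITION_COUNT = 3


def partition_for(customer_id):
    """Pick the partition that owns a customer (simple modulo placement)."""
    return customer_id % PARTITION_COUNT


def build_partitions(orders):
    """Distribute the orders so that each customer lives in exactly one partition."""
    partitions = [dict() for _ in range(PARTITION_COUNT)]
    for customer_id, total in orders.items():
        partitions[partition_for(customer_id)][customer_id] = total
    return partitions


def lookup(partitions, customer_id):
    """Simple query: only the owning partition is read. Returns (value, partitions touched)."""
    owning_partition = partitions[partition_for(customer_id)]
    return owning_partition.get(customer_id), 1


def grand_total(partitions):
    """Complex query: each partition computes a partial sum, then the partials are combined."""
    partials = [sum(partition.values()) for partition in partitions]
    return sum(partials), len(partitions)


orders = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
partitions = build_partitions(orders)
print(lookup(partitions, 4))    # (40.0, 1)
print(grand_total(partitions))  # (100.0, 3)
```

The lookup reads a single partition, while the total has to visit all of them and merge the results. A system that must serve both kinds of request well is the motivation for the hybrid design described above.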
Conclusion
Modern problems demand modern solutions. In this post, I’ve given an overview of one modern solution for the modern problem of storage solutions that slow down because of constantly evolving data types. My hope is that it will help you realize that if you are experiencing data-management problems, you have access to new technologies that can help.
In my next post, I will look at the more technical aspects of Snowflake’s approach.