When it launched in October 2014, Snowflake was the first data warehouse solution designed to be delivered in the cloud. The commercial name chosen for the solution was Snowflake Elastic Data Warehouse.
Snowflake Competitors: All You Need To Know
What are some good Snowflake alternatives?
Although Snowflake was the first player to define itself as a provider of a Data Warehouse on the cloud, several products from the Cloud providers themselves have emerged to compete with it.
Amazon Redshift
Redshift is the Data Warehouse product offered by Amazon Web Services. Like Snowflake, Redshift is designed to be fast and scalable.
However, it is based solely on a Shared Nothing architecture, unlike Snowflake, which combines Shared Disk and Shared Nothing.
In terms of price, Redshift combines storage and compute power but also allows users to scale compute power only.
Customers opt for a Storage & Computing Power package that corresponds to their needs and can exceed this package by paying a supplement per use if a specific need arises.
Redshift is generally less expensive than Snowflake and also offers substantial discounts for a one-year or three-year commitment with prepayment.
Google BigQuery
BigQuery is the Data Warehouse product offered by Google Cloud Platform. Like Google Cloud Functions, BigQuery is based on a serverless architecture, so its users do not have to worry about the computing power or the storage space of a cluster. The service automatically adapts to the queries and data sent to it.
Oracle Autonomous Data Warehouse
The Oracle Autonomous Data Warehouse (OADW) offering, like Google BigQuery, is intended to be fully automated, from deployment to data security, including connecting to data sources and scaling up if necessary.
Of the products listed here, OADW is the only one also available on-premises, for customers wishing to continue operating such infrastructure.
How did Snowflake start?
The idea behind Snowflake Elastic Data Warehouse was to offer users a cloud solution that would bring together all their data and processing workloads in a single data warehouse, while guaranteeing good processing performance, flexible storage, and ease of use.
What did Snowflake offer?
Data warehousing as a service
Thanks to the Cloud, Snowflake eliminates the problems of infrastructure administration and database management. As with DBaaS, users can focus on processing and storing data. With no physical infrastructure to maintain, the cost of a data warehouse becomes variable and can be adapted to the size and power required by the customer.
Multidimensional elasticity
Unlike other products on the market at the time, Snowflake could scale storage space and computing power independently for each user. It was thus possible to load data while running queries without sacrificing performance, because resources are allocated dynamically according to the needs at each moment.
Single storage destination for all data
Snowflake allows all of the company’s structured and semi-structured data to be stored centrally. Analysts wishing to manipulate this data will be able to access it in a single system without the need for processing before they can do their analytical work.
The unique architecture of Snowflake
A hybrid architecture between Shared Disk and Shared Nothing
Snowflake makes large-scale data storage and analysis possible with its innovative architecture. Being exclusively a cloud product, Snowflake relies on virtual computing instances, such as Elastic Compute Cloud (EC2) at AWS, for calculation and analysis operations, and on a storage service, such as Simple Storage Service (S3), to persist the data in the Data Warehouse.
As with any database, a Snowflake cluster has storage resources (or disk memory), RAM, and CPU computing power. Snowflake is based on a hybrid architecture, mixing shared disk (Shared-Disk Architecture) with an isolated architecture (Shared Nothing Architecture).
All the data stored in the Snowflake Data Warehouse is gathered in a single central repository, as in shared-disk architectures, and is accessible by all the compute nodes present in the cluster.
On the other hand, queries made on Snowflake run on MPP (Massively Parallel Processing) compute clusters, in which each node holds only a portion of the data present in the Data Warehouse.
By mixing the two approaches, Snowflake offers the simplicity of data management of a centralized data space, combined with the query performance of a Shared Nothing architecture.
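The combination described above can be illustrated with a small sketch. It is only a toy model, not Snowflake's implementation: the `SHARED_STORAGE` dictionary, the partition names and the round-robin assignment are all assumptions made for the example. Every node can see the central storage, but each one scans only its own share of the partitions, and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Central storage visible to every node (the shared-disk side).
# Partition names and values are hypothetical.
SHARED_STORAGE = {
    "part-0": [120, 80, 45],
    "part-1": [300, 15],
    "part-2": [60, 60, 60, 20],
}


def assign_partitions(partition_names, node_count):
    """Give each compute node its own disjoint subset of partitions."""
    assignments = [[] for _ in range(node_count)]
    for index, name in enumerate(sorted(partition_names)):
        assignments[index % node_count].append(name)
    return assignments


def node_partial_sum(partition_names):
    """One node scans only its assigned partitions (the shared-nothing side)."""
    return sum(sum(SHARED_STORAGE[name]) for name in partition_names)


def parallel_total(node_count):
    """Fan the query out to the nodes, then merge their partial results."""
    assignments = assign_partitions(SHARED_STORAGE.keys(), node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_partial_sum, assignments))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_total(node_count=2))
```

The total is the same whatever the number of nodes; only the way the work is divided changes.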
What are the three layers of Snowflake?
The Snowflake Data Warehouse is based on 3 layers:
- Data storage
- Request processing
- Cloud services
When data is loaded into your Snowflake warehouse, Snowflake compresses it, reorganizes it into its columnar format, and enriches it with metadata and statistics. The raw data is then no longer accessed directly, but only through queries (SQL, R or Python) made through Snowflake.
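To make the storage step more concrete, here is a minimal sketch of the general idea: pivot rows into columns, compress each column, and keep simple statistics next to it. Snowflake's actual storage format is not described in this article, so the use of `zlib` and JSON, the function names, and the choice of count/min/max as statistics are assumptions for illustration only. The sketch also assumes every row has the same keys.

```python
import json
import zlib


def to_columnar(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def store_column(values):
    """Compress one column and record simple statistics about it."""
    payload = zlib.compress(json.dumps(values).encode("utf-8"))
    stats = {"count": len(values), "min": min(values), "max": max(values)}
    return {"payload": payload, "stats": stats}


def load_column(stored):
    """Decompress a stored column back into a list of values."""
    return json.loads(zlib.decompress(stored["payload"]).decode("utf-8"))


def may_contain(stored, value):
    """Use the statistics alone to decide whether a scan is worthwhile."""
    return stored["stats"]["min"] <= value <= stored["stats"]["max"]


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 3.0}]
    stored = {name: store_column(vals) for name, vals in to_columnar(rows).items()}
    print(stored["amount"]["stats"])
    print(may_contain(stored["id"], 5))
```

Keeping statistics beside each column is what lets a query engine skip data that cannot match a filter without decompressing it.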
Snowflake also has a processing layer to handle queries on the data. Data queries are executed on “virtual warehouses”, or virtual data warehouses. Each virtual warehouse is an MPP cluster based on a Shared Nothing architecture, having several nodes, but storing only part of the entire data of the Data Warehouse.
Each Virtual Warehouse is capable of processing many simultaneous requests, and its computing cluster can grow or shrink depending on the workload at any given time. The different virtual warehouses share no compute, memory, or storage resources, so no warehouse suffers resource conflicts or contention from competing requests for the same data.
Finally, cloud services form the top layer of the Snowflake infrastructure, allowing different services to coordinate across the infrastructure of a Data Warehouse. These services allow users to authenticate, launch or optimize data queries, administer clusters and many other features.
Data Protection in Snowflake
Snowflake ensures the integrity of the data it hosts via two features, Time Travel and Fail-safe.
Time Travel preserves the previous state of data when it is modified, for the entire configured retention period. Limited to a single day of history in the standard edition, Time Travel can be configured for up to 90 days with the Snowflake Enterprise license, and allows a table, a schema, or an entire database to be reverted to a previous state.
The Fail-safe feature offers a 7-day backup after the end of the Time Travel period, in order to recover data corrupted by errors during operations.
Both of these features generate additional data themselves and add to the billed storage of the Snowflake cluster.
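The two retention windows can be summarized in a short sketch. It uses only the figures given above (a Time Travel period of 1 to 90 days, followed by 7 days of Fail-safe); the function name, the returned labels, and the inclusive treatment of the boundary days are assumptions made for the example, not Snowflake's API.

```python
FAIL_SAFE_DAYS = 7          # from the text: 7 days after Time Travel ends
MAX_TIME_TRAVEL_DAYS = 90   # from the text: upper limit with the Enterprise license


def recovery_status(days_since_change, time_travel_days=1):
    """Say which mechanism, if any, could still recover a changed object.

    Boundary days are treated as inclusive, which is a simplification.
    """
    if days_since_change < 0:
        raise ValueError("days_since_change must not be negative")
    if not 0 <= time_travel_days <= MAX_TIME_TRAVEL_DAYS:
        raise ValueError("time_travel_days must be between 0 and 90")
    if days_since_change <= time_travel_days:
        return "time_travel"
    if days_since_change <= time_travel_days + FAIL_SAFE_DAYS:
        return "fail_safe"
    return "unrecoverable"


if __name__ == "__main__":
    print(recovery_status(1))                       # within the default window
    print(recovery_status(5))                       # past Time Travel, within Fail-safe
    print(recovery_status(30, time_travel_days=90)) # longer configured retention
```

A longer Time Travel period widens the recovery window, but it also means more historical data is kept and billed.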
How much does a Snowflake cluster cost?
Because Snowflake is available exclusively in the cloud, its price varies with your usage and depends on three parameters that underlie the operation of the service: storage costs, compute costs, and the desired level of service.
Storage costs on Snowflake are simple and explicit: $23 per terabyte per month if you choose to pay in advance for a year of use. If you prefer to pay monthly, the price rises to $40 per terabyte per month.
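As a quick worked example of the storage arithmetic, the sketch below compares the two rates quoted above over twelve months. The function name and the 10 TB volume are hypothetical, and the calculation assumes a constant volume all year.

```python
PREPAID_USD_PER_TB_MONTH = 23    # rate quoted above for one year paid in advance
ON_DEMAND_USD_PER_TB_MONTH = 40  # rate quoted above for monthly payment


def annual_storage_cost(terabytes, prepaid=True):
    """Yearly storage bill in USD for a constant volume of data."""
    rate = PREPAID_USD_PER_TB_MONTH if prepaid else ON_DEMAND_USD_PER_TB_MONTH
    return terabytes * rate * 12


if __name__ == "__main__":
    volume_tb = 10  # hypothetical volume
    prepaid = annual_storage_cost(volume_tb, prepaid=True)
    on_demand = annual_storage_cost(volume_tb, prepaid=False)
    print(prepaid, on_demand, on_demand - prepaid)  # 2760 4800 2040
```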
Compute costs are more complex to understand, since they vary depending on the choice of cloud provider (Azure, AWS or GCP) as well as the geographical region where you want to host your cluster.
Snowflake measures computing capacity in a unit it calls a credit. One credit represents one full hour of an XS size compute node.
Snowflake charges per second of use of your warehouse, with a minimum billing of one minute.
Depending on the support criteria, features and standards you want for your Warehouse, you will need to choose a Snowflake license. The same credit has a different cost depending on the license you choose.
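The billing rules above (one credit for one hour of an XS node, per-second billing, one-minute minimum) can be sketched as follows. The price per credit is not given in this article, so `usd_per_credit` is a hypothetical parameter, and `credits_per_hour` is an assumed knob for warehouses that consume credits faster than an XS node.

```python
MIN_BILLED_SECONDS = 60   # from the text: minimum billing of one minute
SECONDS_PER_HOUR = 3600


def billed_seconds(run_seconds):
    """Apply the one-minute minimum to a warehouse run."""
    if run_seconds <= 0:
        return 0
    return max(run_seconds, MIN_BILLED_SECONDS)


def credits_used(run_seconds, credits_per_hour=1.0):
    """Credits consumed; 1.0 credit per hour corresponds to an XS node."""
    return billed_seconds(run_seconds) * credits_per_hour / SECONDS_PER_HOUR


def compute_cost(run_seconds, usd_per_credit, credits_per_hour=1.0):
    """Cost in USD; usd_per_credit depends on license, provider and region."""
    return credits_used(run_seconds, credits_per_hour) * usd_per_credit


if __name__ == "__main__":
    # A 10-second query is billed as 60 seconds.
    print(billed_seconds(10))
    # Half an hour on an XS node, at a hypothetical 2.0 USD per credit.
    print(compute_cost(1800, usd_per_credit=2.0))
```

Because of the one-minute minimum, many very short runs that each start the warehouse can cost more than the same work grouped into a single longer run.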
For an SME that wants to get into big data and is setting up its first Data Warehouse, the budget for a Snowflake cluster is around $7,000 per year with prepayment.
For a larger company, ingesting a lot of data and requiring large processing and calculation capacities, the invoices can go beyond $500,000 annually.
The Snowflake pricing page does not provide a simulation of cluster costs, but there are unofficial calculators to help you estimate your annual costs for a Snowflake Data Warehouse.
What are the benefits of Snowflake?
Today, more than 6,000 companies have chosen Snowflake, collectively paying more than a billion dollars to the American firm.
Snowflake’s hybrid architecture is its real strength, giving users the ability to run complex queries on a large volume of data efficiently and concurrently.
Snowflake also offers multi-cloud and multi-region availability. Whether your information system runs on AWS, GCP or Azure, Snowflake can accommodate it.
Its adaptability to structured data (organized in columns, as in SQL tables) and semi-structured data (XML, JSON, Avro, Parquet, etc.) is another strong point of Snowflake. It can ingest this type of data and reorganize it internally so as to answer future queries involving it.
Snowflake's approach to data accessibility allows administrators to open read-only access directly from their Snowflake cloud interface. Applications or partners external to the company can then rely on the warehouse's data volume and processing performance while holding only limited access rights to its data.