Snowflake roles keep data access and privacy policies organized and consistent by establishing, in one centralized location, who has access to what at both the data and compute level. We explain in part five of our blog series.
In the previous entries in our Snowflake security and data privacy blog series, we discussed how to store and organize data, how to identify which data is sensitive, and a variety of ways to apply granular access control. In this entry, we’ll discuss the importance of Snowflake roles and how they drive all the policies and techniques we’ve discussed so far.
Using Snowflake Roles to Determine Access Control
A “Role” in Snowflake controls not only what data a given user can see but also what they can do, which in turn has performance and cost impacts.
In traditional databases, giving someone access to a database server gives them access to both the data on that server and access to use its CPU and RAM to run queries. One of Snowflake’s key features is that it treats storage and compute separately. I can give both finance and data science users access to the same storage – the same data. But I can also give finance access to a standard compute engine to run reports while giving data scientists access to an extra-large compute engine to run complex analysis or machine-learning algorithms.
Both compute engines can operate simultaneously, and both can query the same data without any impact on each other whatsoever – the data science routines do not slow down or interfere with the finance reports.
Snowflake calls these sets of compute power “warehouses,” which is a bit confusing. A warehouse, in this context, simply refers to a set of distributed CPU and RAM – a set of virtual machines – and does not involve any data. If I want to let finance query some data, I first have to give them access to the data itself and also give them access to a warehouse (CPU and RAM) that allows them to do work on that data.
Snowflake, like most cloud data platforms, charges very little for the storage itself and primarily charges based on how much compute power you use. Separate compute allows you not only to isolate workloads but also to track usage and (if appropriate) charge back to the right department.
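As a sketch of this separation, the same stored data can be shared while compute is split and sized per team. The warehouse and role names below are illustrative assumptions, not names from this series:

```sql
-- Two independently sized compute engines ("warehouses") for two teams.
CREATE WAREHOUSE finance_wh
  WITH WAREHOUSE_SIZE = 'SMALL'
       AUTO_SUSPEND   = 60      -- suspend after 60s idle to control spend
       AUTO_RESUME    = TRUE;

CREATE WAREHOUSE data_science_wh
  WITH WAREHOUSE_SIZE = 'XLARGE'
       AUTO_SUSPEND   = 60
       AUTO_RESUME    = TRUE;

-- Both teams can query the same data; each is isolated and billed
-- through its own warehouse.
GRANT USAGE ON WAREHOUSE finance_wh      TO ROLE finance_analyst;
GRANT USAGE ON WAREHOUSE data_science_wh TO ROLE data_scientist;
```

Because each warehouse accrues its own usage, this split also gives you the per-department tracking and chargeback described above.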
For this reason, as well as the security considerations described above, you should define a Role based on a common set of activities, responsibilities and behaviors, not solely on a set of permissions.
Snowflake Roles in Practice
For example, if you create a Snowflake role called “Prod_Read_Only,” that could describe a very wide variety of people with very different responsibilities. It also makes maintenance difficult – you have to add and remove specific people from a whole variety of “roles” (Prod_Read_Only, Dev_Read_Write, UAT_Read_Write) as their needs change. Further, it only describes their data access – what if you want both finance and data scientists to have “Prod_Read_Only” access but want them to use different warehouses?
Roles should instead be specific to their real-world position. “Data_Developer,” “Application_Developer,” “Data_Tester,” “Finance_Analyst,” “Fraud_Analyst,” “Broker,” “Supplier” and so on – all have a need to access different, but overlapping, data sets in different ways. Even non-human roles exist, such as “Data_Loader” or “Data_Monitor” for automated routines.
After you define your roles, it’s relatively easy to define and maintain their access to both processing power and data, including all the granular security options we’ve discussed in this series. For example, your role descriptions might look like this:
Data Developer:
- Read/Write access to Dev sales data
- Read/Write access to UAT sales data
- Read access to Prod sales data
- Usage access to a medium-powered warehouse for loading and querying in any environment
- Masking applied to all PII and sensitive brand data in all environments.
Application Tester:
- Read access to Dev sales data
- Read access to UAT sales data
- Read access to Prod sales data
- Usage access on a small warehouse for basic querying anywhere
- Masking applied to all PII, but not sensitive subsidiary data, since they need to be able to test all of it.
Data Monitor (non-human):
- Read access to [all data in all environments]
- Usage access on a large warehouse for frequent, efficient checks of large amounts of information
- No masking applied, so it can identify discrepancies even in sensitive fields.
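A role description like "Data Developer" above might translate into grants along these lines. The database, schema, warehouse and policy names are assumptions for illustration, and the masking policy is a minimal sketch:

```sql
CREATE ROLE data_developer;

-- Read/Write on Dev sales data; UAT would follow the same pattern.
GRANT USAGE ON DATABASE dev TO ROLE data_developer;
GRANT USAGE ON SCHEMA dev.sales TO ROLE data_developer;
GRANT SELECT, INSERT, UPDATE, DELETE
  ON ALL TABLES IN SCHEMA dev.sales TO ROLE data_developer;

-- Read-only on Prod sales data.
GRANT USAGE ON DATABASE prod TO ROLE data_developer;
GRANT USAGE ON SCHEMA prod.sales TO ROLE data_developer;
GRANT SELECT ON ALL TABLES IN SCHEMA prod.sales TO ROLE data_developer;

-- Compute: a medium-powered warehouse for loading and querying.
GRANT USAGE ON WAREHOUSE medium_wh TO ROLE data_developer;

-- Masking: PII stays masked for this role because the policy only
-- reveals real values to the roles named in its CASE expression.
CREATE MASKING POLICY pii_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('DATA_MONITOR') THEN val
       ELSE '***MASKED***' END;

ALTER TABLE prod.sales.customers
  MODIFY COLUMN email SET MASKING POLICY pii_mask;
```

Note that the masking policy is written once and attached to columns; you never have to touch the Data Developer role itself to keep its PII masked.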
Several roles may have the same permissions. This is fine and not wasted effort: at any moment, it may make sense to remove Prod read access from the data testers but keep it for the application testers, at which point you’ll be happy you kept them separate. At the same time, perhaps you’ll have all those development roles share a single warehouse until you find conflicting needs demand a second.
Snowflake roles can easily be inherited and have their permissions mixed and matched as needed. Consider a hierarchy like this:
- Finance_Analyst [global]
  - Finance-US
    - Finance-USEast
    - Finance-USMidwest
    - Finance-USWest
  - Finance-EU
    - Finance-France
    - Finance-Germany
    - (and so on)
In this example, perhaps all members of the “Finance_Analyst” role have access to sensitive financial columns (unmasked), but none have access to PII columns. Members of Finance-US can see US rows, and all US finance analysts share a single medium-powered warehouse for querying. Members of individual US regions have further row filtering but still share the same warehouse with all their US colleagues.
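In Snowflake, granting one role to another passes the granted role's privileges to the grantee, so one way to wire up this hierarchy is sketched below. The direction of the grants and the warehouse name are assumptions; row filtering itself would come from row access policies like those discussed earlier in this series:

```sql
-- Each regional role inherits the global role's base privileges
-- (unmasked financial columns, no PII access).
GRANT ROLE finance_analyst TO ROLE finance_us;

-- Sub-regions inherit the US role (and, through it, the global role).
GRANT ROLE finance_us TO ROLE finance_useast;
GRANT ROLE finance_us TO ROLE finance_usmidwest;
GRANT ROLE finance_us TO ROLE finance_uswest;

-- All US analysts, including every sub-region, share one
-- medium-powered warehouse granted at the US level.
GRANT USAGE ON WAREHOUSE finance_us_wh TO ROLE finance_us;
```

Granting the warehouse once at the Finance-US level is what lets all the regional roles share it without any further maintenance.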
Inheritance also makes it easy to manage temporary access. Perhaps the data support team, who cannot normally see Prod data, needs temporary read access in Prod to investigate a mismatch in sensitive data alerted by the data monitor role. You can temporarily add one person – or the entire role – under “Data_Monitor” and inherit all its access while they investigate the problem. Then, when they finish investigating, you can remove them in a single step.
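The temporary-access pattern above comes down to a pair of one-line statements. The user and role names here are hypothetical:

```sql
-- Grant the whole support role (or a single user) the monitor's access...
GRANT ROLE data_monitor TO ROLE data_support;
-- GRANT ROLE data_monitor TO USER jsmith;   -- or just one person

-- ...then remove it in a single step when the investigation ends.
REVOKE ROLE data_monitor FROM ROLE data_support;
```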
Moving Forward With Snowflake Roles
Because there are so many available methods for securing data, we need to strike the right balance between security, performance and ease of maintenance. Managing access via Snowflake roles is somewhat more bullet-proof than managing access via mapping tables (described in part four) – the risk of a configuration mistake is lower with roles – but roles are noticeably harder to maintain in large quantities.
An appropriate plan for your organization will depend on how many regions, roles, internal and external users, and databases you’re likely to have, along with decisions about automated provisioning and keeping these roles in sync with other parts of your organization (e.g., Active Directory). Here are a few guidelines to help you implement roles and other access control tools efficiently:
- Resort to separate roles for highly sensitive security divisions – i.e., if you need to be able to prove that two organizations could never see each other’s data, this is easier to demonstrate with roles than with mapping tables.
- Use mapping tables for granular security like local-office separation – this is much easier to maintain than dozens or hundreds of roles.
- Create one role per category of user, team or department – “Data_Developer,” “DevOps,” “Prod_Support,” “Data_Scientist,” and so on.
- Consider hierarchy by country or global region, if needed, especially for compliance with data-privacy regulations such as GDPR.
- Consider performance, usage and budgeting expectations – remember roles are not only for data access but also delineate compute power and therefore spend between user groups.
- Avoid any roles specific to an environment (Dev, Prod). These make deployments, maintenance and testing much more complicated and error-prone, and are better managed through each role's permissions.
Conclusion
Role management has always been an important part of data security but traditionally followed a server-by-server approach. With Snowflake’s scale and flexibility, you can reimagine roles and access to provide far better control and ease of maintenance. Taking a little time to examine our assumptions and make a plan that considers new ways of working will allow us to provide better access to data with better security and lower overhead.
In our final entry, we’ll take a look at the risks and benefits of data access (including legal considerations such as GDPR) and bring together a final list of guiding principles that can help us make the right security decisions.