Keep data secure by using Snowflake security features and privacy control to determine what any given user can see based on their role.
In part one of our Snowflake Security blog series, we discussed how to think about storing and organizing your data, from the Organization level all the way down to individual tables.
Using some of Snowflake’s advanced features, we can take this further and apply rules that control which specific rows and columns from a single table that a given user can see.
Before we dive into leveraging specific Snowflake security features, let’s first review a couple of key concepts that drive our decisions.
Grain / Granularity
In part one of our series, we discussed applying privacy and access controls at increasing levels of detail and specificity. Grain refers to the amount of detail that exists in the data itself as stored. A table with each individual sales transaction is very granular or fine-grained. A table with total spending per customer per month is less granular, and a table with total purchases per month across all customers by country is an even broader grain.
Data granularity is a key nuance to consider when planning your security model. It is easy to say finance analysts at your company should not have access to credit card transaction data, but we really mean they shouldn’t have access to it at the individual customer level. If you roll it up by month across all customers or by category, it may still be proprietary to your company, but it’s no longer private in the same sense.
Even ignoring the security ramifications, it’s important to remember data granularity when we look to correlate information from different sources and for different purposes. For example, the data we have may not be fine-grained enough: individual sales transaction data is very fine-grained but may not give us the information we need for detailed product performance analysis. For that, we might want access to each individual item purchased, not only the order information.
Obviously, the finer-grained the data, the more of it there is – the more it costs to store and the slower it is to query. However, modern cloud platforms like Snowflake make it more affordable to store large amounts of data and query it with vastly better performance, assuming it is organized properly. With the added ability to isolate, aggregate, tag and secure information, the industry best practice has evolved to collecting and storing all available data at the finest-possible grain, so you have it when you need it.
This “grab everything” bias must also acknowledge legal, regulatory and ethical practices regarding collection and retention of data – granular data that may be useful in the future may also be too “expensive” to collect and hold from a legal or ethical point of view. Ideally, we will:
- Exclude sensitive data entirely where we can’t imagine a valid and allowable analytical use for it.
- Example: A Social Security number has no analytical value, and you should only collect it when necessary for business operational purposes, exclude it from all analytical platforms and permanently delete it when and where possible.
- Isolate, de-identify and secure sensitive data that does have an acceptable analytical purpose.
- Example: Birth year can tell us about usage patterns across cohorts. If we separate it from the person’s identifying info (e.g., name and address) and expose only the year portion only to those who need it for analysis, keeping granular birth-year data may become valuable and acceptable.
- Subject to the above limitations, collect and keep original, raw, granular data as well as cleaned and aggregated data so we can:
- Prove the validity of cleaned and aggregated data
- Support audit requirements
- Reprocess, reclean and reaggregate data when needs and rules change.
- Expose the aggregated versions only unless and until a user needs the details.
- Default to least-privilege (only grant access to the specific data needed for a given purpose) and just-in-time (retrieve item details only when and where needed to conduct a detailed analysis).
- Make policies and procedures governing data access and usage clear, including the context, to allow independent judgment of new scenarios.
- Example: A ZIP code is not sensitive or personally identifiable, but it is highly correlated to race in the US. You don’t necessarily need to control access to ZIP code, but you do need to control use.
- And don’t “throw out the baby with the bathwater.” If we can enforce granular security and access controls, we can collect and store granular data responsibly.
Pulling Snowflake Security Features into Your Granular Control Toolbox
Technology developments in the last few years have made this nuanced approach far more feasible than before, especially combined with some careful planning as laid out in the rest of this series. Here are a few Snowflake security features to consider when planning for granular control:
The simplest approach is to create a view, which acts as a lens or filter on a given table. (Note in traditional databases, views could cause performance problems, especially when combined. This is much less of a concern in Snowflake, where they encourage views).
If you create a view that shows everything from the employee table except for private fields and remove all access to the underlying table, your private columns are secure. By itself, this approach can be inflexible – if different people need different access, you may need to create one view that hides all sensitive columns, another that hides all except for birthdate and so on. However, combined with the other tools below, you can avoid this issue.
With a standard view, tech-savvy users might glean some limited information about the underlying tables (such as column names) by looking at the performance plan. If you need to completely prevent this, Snowflake lets us create a Secure View, which obscures all information about the underlying tables. However, there is a performance trade-off as it can no longer optimize for the underlying table structures either.
If query performance on a live view becomes an issue, you can create a materialized view, which runs the query but stores a copy of the results as if it were a table. This can conflict with some of the advanced security concepts that follow, but you can avoid this conflict by choosing a consistent pattern.
Columns: Dynamic Data Masking
In Snowflake, you can apply a policy that masks the data in a given column depending on the specific permissions of the user querying it. A detailed explanation will come in an upcoming blog entry, but you can create a policy such that the query “SELECT SSN FROM Employee” will return:
- ###-##-#### for most users
- ###-##-6789 for members of the HR role
- 123-45-6789 for members of the Payroll role.
This masking can be whatever you’d like, including just a blank result. You can apply column masking policies directly to selected columns in tables or views, and it doesn’t alter the underlying data in any way. Note you must apply masking policies on a column-by-column basis. You can add these to any or all columns at the time you create a table or view, or you can add them later.
Rows: Row Access Policies
Similarly, you can apply a policy to filter the rows that are returned from a query. This is different from a column policy in that:
- There is no masking – the affected rows are simply not returned, and
- You do not choose specific rows (as you do with columns) – you create a filtering rule that is applied based on the contents of each row.
Again, more details about this will come later in the series, but in short: Given a sales transaction table with a “MerchantID” column, you can create a policy such that “SELECT * FROM transactions” will only return rows where the MerchantID in the row = the Merchant ID of the user. As above, you can directly apply this to a table or a view. If needed, you can even nest them: you can apply a filtering rule on a table and another rule on specific views that further filter the data from the underlying table.
If we choose the right combination of tools in our Snowflake security toolbox, we can control access to our data with almost surgical precision without making bespoke copies – which means without the development or maintenance headaches of data feeds and without worrying about inconsistent or out-of-date information.
Further, remember that all the above features apply equally and automatically to Data Shares. If you choose to create a share with access to some of your tables or views, any masking or row-access policies carry forward invisibly, ensuring there is no possibility of data leakage.