We wrap up our Snowflake security and privacy blog series talking about how you can regulate your internal data and protect the reputation of your organization.
The up-and-coming data cloud Snowflake makes providing high-performance data access easier and more efficient than traditional databases, giving us the opportunity to get more value from our information. But it also requires us to think differently about keeping data safe.
In this six-part blog series, we’ve laid out some best practices for managing information access and Snowflake security, from organizing and isolating data to object tagging. In our final entry, we’ll discuss a balanced approach to internal controls and share some guiding principles that will help us make decisions along the way.
Snowflake Security vs. Privacy
Data security refers to protecting data from outside access or interference (essentially, blocking hackers). In Snowflake, all data is encrypted while in motion and while at rest. This protects data from direct outside theft. If desired, special Snowflake accounts are available for federal government work and for HIPAA compliance, but even the standard Snowflake environment is highly secure.
A well-configured, secure Snowflake environment that uses tools like Active Directory integration and Single Sign-On (SSO) will not allow any direct access from outside your organization. Employees will be able to log in from within your network, but external users will access data only through interfaces you create or provide, such as dashboards or applications (or tightly controlled data shares, if desired). Employees who authenticate through SSO against their Active Directory accounts automatically lose access when they leave the organization.
Of greater concern is the proper protection of data privacy within the world of authorized users.
Protecting Data Access and Your Reputation
As I mentioned in part one, information is simultaneously valuable and dangerous. Technology companies have (intentionally or unintentionally) been careless about protecting private consumer data since the rise of the internet and social media. Governments around the world are stepping in to hold companies accountable:
- GDPR regulations in Europe spell out how companies must protect the data of EU citizens wherever it is stored.
- On-soil laws in countries like India and China restrict the movement of data beyond their borders.
- State regulations in Colorado, California, and a growing number of other states mimic GDPR in inconsistent ways, and similar US federal laws are currently under negotiation.
- Separately, HIPAA regulations spell out exactly who may access which health-related data and under which circumstances.
Fines and related business impacts can reach tens of millions of dollars – Google, British Airways, H&M and Marriott have each had GDPR fines over €10M, and Equifax recently reached a settlement of $425M for their 2017 data breach. Earlier this year, one global credit card brand had to stop issuing cards in India entirely due to alleged non-compliance with their on-soil laws.
A potentially greater concern is the damage to an organization’s reputation. In the last few years, large and profitable companies have seen their stock prices plummet when they make the news due to data breaches, and their customers leave in droves. Similar problems are simmering beneath the surface elsewhere: Medical organizations reliably comply with HIPAA regulations outside their walls but may not have the time or expertise to prevent inappropriate access by internal employees.
At many organizations, any employee has full access to private information because it’s too hard to separate by role. These practices may meet legal compliance, but – as we’ve seen with the rise of phishing and ransomware – leave open opportunities for disaster.
Regulating Data Access in Snowflake
Here are a few recommendations for careful and efficient privacy practices.
Consider anonymization, encryption and masking. True anonymization, per historical standards, involves transforming sensitive data to an anonymous format that cannot be reversed: it physically replaces values with characters that do not correspond in any way to the underlying data, such as “000-000-0000” for all phone numbers. In contrast, standard field-level encryption produces values with no visible pattern but can be reversed with the right key, which is desirable in some cases. Masking, as used in the rest of this section, hides values at query time from users who are not authorized to see them.
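To make the distinction concrete, here is a minimal sketch in plain Python (it is not Snowflake code and does not call any Snowflake API). It contrasts irreversible anonymization with role-based masking. The function names, the role name `PII_READER`, and the sample phone number are all hypothetical and chosen only for illustration.

```python
# Illustrative only: plain Python, not Snowflake's API.

AUTHORIZED_ROLES = {"PII_READER"}  # hypothetical role name


def anonymize_phone(phone: str) -> str:
    """Irreversible: every input maps to the same placeholder,
    so the original value cannot be recovered from the output."""
    return "000-000-0000"


def mask_phone(phone: str, role: str) -> str:
    """Masking: the stored value is untouched; what a caller
    sees depends on the role making the request."""
    if role in AUTHORIZED_ROLES:
        return phone
    # Default-nothing: hide every digit, keep the punctuation.
    return "".join("*" if ch.isdigit() else ch for ch in phone)


stored = "303-555-0142"  # made-up sample value
print(anonymize_phone(stored))
print(mask_phone(stored, "ANALYST"))
print(mask_phone(stored, "PII_READER"))
```

The anonymized value is the same for every row, so nothing about the original survives. The masked value is computed per request, so an authorized role still sees the real data.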
You may not need this bar-the-doors approach in your environment once the recommended isolation, masking, role, and row-level policies are combined. Here are some guiding principles to follow:
- Create carefully organized roles that can be used to enforce access rules.
- For more granular Snowflake security, create mapping tables (for example, to associate people to specific regions to further limit their data access).
- Create tags for all potentially sensitive data domains and entities.
- Store and manage tags and mapping tables in an isolated security database with minimal access allowed.
- Hide data that users should not access directly in one or more isolated databases, only giving access to necessary processes.
- Create views and derived tables in another limited-access database, directing users there.
- Where feasible, also store sensitive data elements separately, allowing only authorized users to connect the sensitive and non-sensitive portions.
- Apply dynamic data masking to all potentially sensitive columns using the tags above, with a default-nothing approach that masks data for all but explicitly authorized users.
- Apply row access policies to all potentially sensitive tables, again with a default-nothing approach.
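The way these principles combine can be sketched in plain Python. This is a simulation of the decision logic only, not Snowflake syntax or API: a mapping table limits which rows a role can see, tags identify sensitive columns, and both checks default to showing nothing. All role names, regions, tags, and column names below are hypothetical.

```python
# Illustrative only: plain Python standing in for policy logic.

# Hypothetical mapping table: role -> regions that role may see.
ROLE_REGIONS = {
    "SALES_WEST": {"WEST"},
    "SALES_ADMIN": {"WEST", "EAST"},
}

# Hypothetical tags: column name -> sensitivity tag.
COLUMN_TAGS = {"email": "PII", "phone": "PII"}

# Hypothetical grants: tag -> roles allowed to see unmasked values.
TAG_READERS = {"PII": {"SALES_ADMIN"}}


def visible_rows(rows, role):
    """Row-level check: a role with no mapping entry sees no rows."""
    allowed = ROLE_REGIONS.get(role, set())
    return [r for r in rows if r["region"] in allowed]


def mask_row(row, role):
    """Column-level check: tagged columns are masked unless the
    role is explicitly listed as a reader for that tag."""
    out = {}
    for col, val in row.items():
        tag = COLUMN_TAGS.get(col)
        if tag is None or role in TAG_READERS.get(tag, set()):
            out[col] = val
        else:
            out[col] = "***MASKED***"
    return out


def query(rows, role):
    """Apply the row filter first, then mask what remains."""
    return [mask_row(r, role) for r in visible_rows(rows, role)]


rows = [
    {"id": 1, "region": "WEST", "email": "a@example.com"},
    {"id": 2, "region": "EAST", "email": "b@example.com"},
]
print(query(rows, "SALES_WEST"))
print(query(rows, "INTERN"))
```

Because both lookups fall back to an empty set, a new role or a newly tagged column is locked down until someone grants access explicitly, which is the default-nothing behavior described above.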
For anonymizing production data in lower environments (e.g., for testing), the same masking policies will apply if you clone data between environments. If you need further anonymization (or true irreversible anonymization), you can create and apply similar masking policies while making an automated copy of the data, resulting in an irreversibly masked derived table.
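As a rough illustration of the second option, the sketch below builds an irreversibly masked derived copy in plain Python. It is not Snowflake code; the column names and placeholder values are hypothetical, and in practice the copy would be produced by an automated job against your own tables.

```python
# Illustrative only: plain Python, not Snowflake's API.

# Hypothetical sensitive columns and the fixed values that replace them.
SENSITIVE_PLACEHOLDERS = {
    "phone": "000-000-0000",
    "email": "user@example.invalid",
}


def anonymized_copy(rows):
    """Return new rows with sensitive values overwritten.
    The source rows are left unchanged, and the copy holds no
    information from which the originals could be recovered."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the source row is untouched
        for col, placeholder in SENSITIVE_PLACEHOLDERS.items():
            if col in new_row:
                new_row[col] = placeholder
        result.append(new_row)
    return result


prod = [{"id": 1, "phone": "303-555-0142"}]  # made-up sample row
test_env = anonymized_copy(prod)
print(test_env)
```

Unlike a masking policy evaluated at query time, the derived copy never contains the real values, so no role in the lower environment can unmask them.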
Conclusion
The Snowflake security and data privacy tools described in this series do take work to implement properly, but all of this is far easier than it used to be with traditional database platforms. Equally important is the value side of the equation: your information does you no good locked up in a vault nobody can access.
A safety net of reliable access control means you can unlock the value of your data without putting it, or your organization, at risk. You can empower your employees to analyze the information you have and get creative with new ways of using it. Your IT organization can be faster and more cost-effective, and your business users can be more self-sufficient. Using data shares (and even the Data Marketplace), you can monetize your data with low maintenance effort and complete control.
Companies are already seeing the opportunities here: cloud CRM vendor HubSpot was able to skip the development and maintenance of an API for their customers to retrieve data, instead configuring a data share and saying, “take what you need.” A major processor of airline tickets opened a new revenue stream by sharing aggregated sales information with banks and market research companies.
We hope you’ll take advantage of this information to build your organization’s safety net and start to explore the possibilities that it brings.