The unstructured and semi-structured data an organization generates has tremendous value. It may be less valuable on, say, a per-megabyte basis than the information stored in your data warehouse, and it may take some time after you begin collecting the data before that value is realized, but there’s no doubt that the details of the minute interactions between customers and systems can be leveraged to transform a business. Moreover, Hadoop is increasingly positioned both as a landing zone for all of an organization’s data, structured and unstructured, where exploratory analysis can be performed, and as an archive for aged data from a data warehouse (see this blog post by a colleague to see where Hadoop fits in the IBM Watson Foundations vision). It follows that an organization’s Hadoop store will generally contain sensitive data. It could be sensitive personal information governed by regulation or simply valuable, proprietary information, but either way it needs to be secured just as it would be in a traditional relational data store.
As I hinted in my last post, governance of Big Data initiatives was considered early on in IBM’s BigInsights development. Fortunately, IBM already had leading security and data privacy capabilities in-house and extended them to the Big Data space. InfoSphere Data Privacy for Hadoop allows an organization to secure its Hadoop environments by:
- Defining and sharing big data project blueprints and data definitions – build a big data glossary of terms, and define what counts as sensitive data and the policies that apply to it
- Discovering and classifying sensitive big data – find sensitive data and classify it
- Masking and redacting sensitive data within and for Hadoop systems – de-identify sensitive data either at the source or within Hadoop, and obfuscate data whether structured or unstructured
- Monitoring Hadoop data activity – monitor big data sources and the entire Hadoop stack, issue alerts as necessary, and gather audit information for reporting purposes
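To make the discover, classify and mask steps in the list above a little more concrete, here is a minimal Python sketch of the general idea. It is not the InfoSphere Data Privacy for Hadoop API and says nothing about how the product works internally. The two regular expressions, the `demo-salt` value and the function names `classify`, `pseudonymize` and `mask` are all assumptions made up for illustration; a real deployment would take its definitions from the organization's own policies.

```python
import hashlib
import re

# Hypothetical sensitive-data definitions, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text):
    """Return the sorted names of the sensitive-data classes found in text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))


def pseudonymize(value, salt="demo-salt"):
    """Map a value to a short, repeatable token using a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask(text):
    """Replace every sensitive match with a tagged, repeatable token."""
    for name, pat in PATTERNS.items():
        text = pat.sub(
            lambda m, n=name: f"<{n}:{pseudonymize(m.group(0))}>", text
        )
    return text


# A made-up, semi-structured log line of the kind that might land in Hadoop.
record = "2014-03-02 login ok user=jane.doe@example.com ssn=123-45-6789"
print(classify(record))
print(mask(record))
```

Because the token is derived from the original value, the same email address always maps to the same token, so masked records can still be joined or counted per customer without exposing the underlying value. Note that a truncated salted hash is only a stand-in here: it is repeatable but offers limited protection for low-entropy values such as identification numbers.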