In this segment of “Office Optional with Larry English,” Larry discusses why having good data is a key part of your AI strategy.
A few years ago, one of Uber’s self-driving cars hit and killed a pedestrian crossing a street outside a crosswalk. What went wrong? When the technologists trained the car to recognize pedestrians, they mostly used images containing a crosswalk. They had inadvertently “taught” the AI system that the crosswalk was the important part.
While most companies bringing AI into their operations aren’t dealing with stakes as high as human life, there’s a salient lesson here: Feed AI systems bad data, and you’ll get bad results. AI will undoubtedly become the next big business differentiator, but only for companies that can get their data under control.
Bad Data, Bad AI
Responsible AI is such a buzzword these days because so many companies have a serious data problem: They don’t know what data they have, and what they do have is inconsistent and unsecured. Feeding unknown, unmanaged data into an AI system is just asking for a data breach, regulatory violation, misinformed strategic decision, unintended bias or reputational damage.
Many companies have either a haphazard strategy or no strategy at all for data governance, meaning the rules and processes for collecting, using and storing data.
Intent on chasing flashier, revenue-generating projects, organizations don’t pause to figure out their data strategy. However, when they want to put that data together (say, for an AI tool), they can’t, because there are no overarching rules for how to handle data. They’re left with a big mess that takes a hefty amount of time and investment to untangle.
Retroactively applying data governance to all the data in an organization is a colossal undertaking. Thankfully, it’s not necessary to go that big to embark on your next AI project.
A Pragmatic Approach To Fixing Your Data
Here’s a pragmatic, just-in-time approach to fixing your data, leveraging the power of AI, and creating value incrementally along the way:
Pick a use case.
Start out by picking a single use case for AI. What’s a major business mandate AI can help with? Where do you know you have proprietary or third-party data that can be mined for AI? You’ll want to channel Goldilocks here, picking a use case that’s neither too big nor too small, ideally something that’s internal. Your first use case should also have limited data domain requirements—in other words, a use case that only requires data from one source.
Then, figure out the state of the data you’ll be working with. What do you need to correct before feeding that data into an AI system?
Fix the data required for that use case.
Once you have a feasible use case and have assessed the state of the data needed to move forward, it’s time for a clean-up job. Your data doesn’t have to be perfect to start creating value from an AI tool, but you do need to understand its flaws before you leverage it.
You’ll need to put as much governance and strategy in place as that single use case requires. Non-negotiable data governance components include:
- Data acquisition: How are you going to get your data from the source system and where are you going to store it?
- Data quality: How complete and accurate is your data? Does the data carry the risk of AI bias? Do you need to clean it up before feeding it into an AI system?
- Data privacy: Does your data include private or protected information, such as health information? Will the AI system put together data in a way that causes privacy problems? For example, if AI adds information to generic customer records that makes individuals identifiable and exposes protected details about them, that’s called classification by compilation and needs to be guarded against.
These elements together form data governance: a plan for how you’re going to get your data, how it can be used, and which controls and policies will guard against misuse.
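To make the quality and privacy questions concrete, here is a minimal Python sketch of a pre-flight check you might run on a single data source before feeding it to an AI tool. It is an illustration rather than a prescribed method: the field names (`customer_id`, `region`, `signup_date`), the list of sensitive fields and the `assess_records` helper are all hypothetical and would need to be replaced with whatever your own source system and policies define.

```python
from collections import Counter

# Hypothetical field names, chosen only for illustration.
REQUIRED_FIELDS = ["customer_id", "region", "signup_date"]
SENSITIVE_FIELDS = {"ssn", "diagnosis", "date_of_birth"}


def assess_records(records, required=REQUIRED_FIELDS,
                   sensitive=SENSITIVE_FIELDS, key="customer_id"):
    """Summarize completeness, duplicate keys and sensitive fields
    for a list of dict records from one data source."""
    total = len(records)

    # Data quality: count records where a required field is absent or empty.
    missing = Counter()
    for rec in records:
        for field in required:
            if rec.get(field) in (None, ""):
                missing[field] += 1
    completeness = {
        field: (1 - missing[field] / total) if total else 0.0
        for field in required
    }

    # Data quality: the same key appearing more than once hints at duplicates.
    key_counts = Counter(
        rec.get(key) for rec in records if rec.get(key) not in (None, "")
    )
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    # Data privacy: flag any field on the (assumed) sensitive list.
    sensitive_found = sorted(
        {name for rec in records for name in rec if name in sensitive}
    )

    return {
        "total": total,
        "completeness": completeness,
        "duplicate_keys": duplicate_keys,
        "sensitive_fields": sensitive_found,
    }


if __name__ == "__main__":
    sample = [
        {"customer_id": "c1", "region": "east", "signup_date": "2023-01-05"},
        {"customer_id": "c2", "region": "", "signup_date": "2023-02-11",
         "diagnosis": "n/a"},
        {"customer_id": "c1", "region": "west", "signup_date": None},
    ]
    print(assess_records(sample))
```

A report like this won’t fix anything by itself, but it tells you where the flaws are (incomplete fields, duplicate records, sensitive columns) so you can decide what has to be cleaned up or restricted for this one use case.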
Create your overarching data strategy.
At the same time as you’re exploring your initial use case, begin putting together an overarching data framework and strategy. This will inform how you collect, maintain and secure data throughout your entire organization moving forward.
Once your first use case is complete, pick another area to focus on. Build on the successes and lessons of the first use case. How do you need to clean your data for your next project? How do you need to tweak your data strategy? Keep repeating, cleaning up your data along the way.
The mantra for this approach is to think big, start small. By going one use case at a time while getting your organization’s data in order, you’ll incrementally create value with AI while building a solid foundation of data governance to fuel any future AI initiatives.