Machine Learning is commonplace in today’s digital society. Its impact on business practices only increases with its functionality.
In the first installment of this series, my colleague Carmen Fontana talked about why everyone should get excited about Artificial Intelligence and Machine Learning. I agree!
Machine Learning (ML) continues to grow in its impact, providing exciting learning opportunities for technologists like myself.
So what is ML exactly? I’ll explain the basics below.
Computers can learn!
Before getting deep into ML, let’s start with a basic definition.
I have seen many complex definitions, but the one I find most impactful is also one of the simplest: Machine learning “gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).
ML started in the ’50s and has risen and fallen in fashion over the years. However, ML is in its prime now thanks to the popularity of Cloud technologies.
Cloud enables ML to ingest and process enormous amounts of data, which makes it more powerful. Additionally, new Cloud services make ML much more accessible than it used to be.
The predictive features of ML make it highly useful in areas such as fraud detection, customer service, energy production, healthcare, security, and manufacturing, among many others.
Data and Learning
There are two basic types of ML: Unsupervised and Supervised.
The essential differences between Supervised and Unsupervised Learning are the types of data they ingest and the algorithms they leverage.
Unsupervised Learning uses unlabeled data and “self-guided” learning algorithms. Supervised Learning, on the other hand, uses labeled data and defined training algorithms.
The primary goals are also different. In Supervised Learning, predictive analytics is the main goal. In contrast, Unsupervised Learning focuses on finding data patterns.
When we think about predicting outcomes with ML, we are typically referring to Supervised Learning.
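To make the difference concrete, here is a minimal sketch using toy one-dimensional data that I made up for illustration. The supervised function learns from (value, label) pairs, while the unsupervised one only sees values and has to group them on its own. The function names and the data are hypothetical, and a real project would rely on a library algorithm rather than this hand-rolled version.

```python
from statistics import mean

def train_centroids(labeled):
    """Supervised: learn one average value (centroid) per known label."""
    groups = {}
    for value, label in labeled:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}

def predict(centroids, value):
    """Return the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_clusters(values, rounds=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    if low == high:
        # All values are identical, so there is nothing to separate.
        return sorted(values), []
    for _ in range(max(1, rounds)):
        group_a = [v for v in values if abs(v - low) <= abs(v - high)]
        group_b = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_a), mean(group_b)
    return sorted(group_a), sorted(group_b)

# Toy data (assumed for illustration only).
labeled_data = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
                (5.0, "large"), (5.5, "large")]
unlabeled_data = [value for value, _ in labeled_data]

centroids = train_centroids(labeled_data)
print(predict(centroids, 4.0))
print(two_clusters(unlabeled_data))
```

Notice that the supervised model can name its prediction because it was given the labels, whereas the unsupervised one can only report that two groups exist.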
AWS ML Services
Most AWS ML services are oriented towards Supervised Learning. Some of the most commonly used services are:
- Amazon SageMaker
- AWS DeepLens
- Amazon Lex
- Amazon Machine Learning
- Amazon Polly
That said, services like Amazon EMR with the Spark Machine Learning Library (MLlib) are useful for unlabeled data and Unsupervised Learning.
Machine Learning Workflow
There are five core tasks in the common ML workflow:
1. Get Data
The first step in the Machine Learning process is getting data.
This process depends on your project and data type. For example, are you planning to collect real-time data from an IoT system or static data from an existing database?
You can also use data from internet repository sites such as Kaggle.
2. Clean, Prepare & Manipulate Data
Real-world data often has unorganized, missing, or noisy elements. Therefore, for Machine Learning success, after we choose our data, we need to clean, prepare, and manipulate it.
This process is a critical step, and practitioners often report spending up to 80% of their project time in this stage. Having a clean data set improves your model’s accuracy down the road.
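As a rough sketch of what cleaning can look like, the snippet below removes exact duplicate rows and fills missing numeric values with the median of the known ones. The field names and records are invented for illustration, and median filling is just one common choice that I am assuming here, not the only option.

```python
from statistics import median

def clean_rows(rows, numeric_field):
    """Drop exact duplicate rows, then fill missing numeric values with the median."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is left untouched
    # Note: median() raises StatisticsError if every value is missing.
    known = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill_value = median(known)
    for r in unique:
        if r[numeric_field] is None:
            r[numeric_field] = fill_value
    return unique

# Hypothetical raw records with one missing value and one duplicate.
raw = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 30.0},
    {"id": 1, "amount": 10.0},
]
cleaned = clean_rows(raw, "amount")
print(cleaned)
```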
After getting the data to a state you like, you need to convert the data sets into a format your chosen ML platform accepts. For example, you may need to export the data to a .CSV file and upload it to Amazon S3.
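Here is a small sketch of the conversion step using only Python's standard csv module. The column names and rows are assumptions made for this example, and the exact layout your ML platform expects may differ.

```python
import csv
import io

def rows_to_csv(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical cleaned records.
records = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.0},
]
csv_text = rows_to_csv(records, ["id", "amount"])
print(csv_text)
```

Uploading the resulting file to Amazon S3 is a separate step (for example through the AWS console or an SDK such as boto3) and is left out of this sketch.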
Finally, you split your data into training and test data sets. The training set is used to train the model in the next step, while the test data is used to validate the model in the fourth step. The typical default is a 70/30 split between training and test sets.
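The split described above can be sketched in a few lines. The function name, the fixed seed, and the sample rows are my own assumptions; the shuffle matters because data often arrives sorted, and a fixed seed keeps the split reproducible between runs.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows, then cut it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))  # stand-in for ten cleaned records
train_set, test_set = split_rows(rows)
print(len(train_set), len(test_set))
```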
3. Train Model
This step is where the magic happens! The training data set is fed to an algorithm, and the algorithm leverages sophisticated mathematical modeling to learn from the data and develop predictions.
These algorithms commonly fall into one of three categories:
- Binary classification – Classify into two categories
- Multiclass classification – Classify into many categories
- Regression – Predict a numeric value
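Of the three categories above, regression is the easiest to show in a few lines. The sketch below "trains" a straight-line model with ordinary least squares on toy numbers I made up; it is only meant to show what learning from data means, and a managed ML service would pick and fit a far more capable algorithm for you.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict_value(model, x):
    """Apply the trained line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Toy training data that follows y = 2x + 1 (assumed for illustration).
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]
model = fit_line(train_x, train_y)
print(model, predict_value(model, 10))
```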
4. Test Model
Now, it’s time to validate your trained model. Using the test data set aside in Step 2, we check the model’s accuracy.
If the results are not satisfactory, you need to improve and retrain your ML model (Step 5).
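For a classification model, the accuracy check can be as simple as counting how many test predictions match the known answers. The labels and the 0.9 target below are hypothetical; what counts as satisfactory depends on your project.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the known test labels."""
    if not actuals or len(predictions) != len(actuals):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

# Hypothetical model output compared with the held-out test labels.
predicted = ["fraud", "ok", "ok", "ok"]
actual = ["fraud", "ok", "fraud", "ok"]

score = accuracy(predicted, actual)
TARGET = 0.9  # assumed threshold for this example
needs_retraining = score < TARGET
print(score, needs_retraining)
```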
5. Improve Model
Practice makes perfect! Here are a few things you can do to refine your model and improve accuracy:
- Review your model’s results with your business stakeholders. Are there other data elements worth adding to your model to make it more accurate?
- Reconsider your algorithm choice. Within each class of algorithm, there are dozens of algorithm choices. A different algorithm may perform better for you.
- Adjust the parameters of your chosen algorithm to improve performance. Sometimes small adjustments have a significant impact.
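To illustrate the last suggestion, here is a tiny sketch of parameter tuning: it tries several values for a single parameter (a cut-off threshold) and keeps the one that scores best. The classifier, the candidate values, and the data are all invented for this example. In practice you would score candidates on held-out data that is separate from your final test set.

```python
def threshold_classifier(threshold):
    """Build a toy model that labels values at or above the threshold as 'high'."""
    return lambda value: "high" if value >= threshold else "low"

def best_threshold(candidates, values, labels):
    """Return the candidate threshold with the highest accuracy."""
    def score(threshold):
        model = threshold_classifier(threshold)
        hits = sum(model(v) == label for v, label in zip(values, labels))
        return hits / len(labels)
    return max(candidates, key=score)

# Hypothetical labeled data and candidate parameter values.
values = [1, 2, 3, 6, 7, 8]
labels = ["low", "low", "low", "high", "high", "high"]
chosen = best_threshold([2, 5, 7], values, labels)
print(chosen)
```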
These are just high-level steps. ML can be as complicated as you choose to make it! That said, by understanding the basic process of developing ML models, you gain a solid foundation for further learning.
Speaking of which, check out the rest of our five-part blog series on Artificial Intelligence and Machine Learning. There is something for everyone, from Data Science principles to leveraging your trained models to understanding the Change Management impact of AI.