Categories of Machine Learning 2

Last week we talked about categorizing machine learning algorithms by their learning signal and response, which can loosely be equated to the learning style of a person. This week we are going to talk about the other popular system for categorizing machine learning algorithms: categorization based on the desired outcome.

Today we have three main areas of interest in machine learning: classification, regression and clustering. These just happen to be three types of problems that are fairly easy for a person, but quite difficult for a computer system. People can easily pick up on the significant differences between objects, whereas a computer is only able to look at minute details.

The first category is classification. You can think of classification like sorting objects into piles. I like to give the example of working a jigsaw puzzle. Most people start working a jigsaw puzzle by sorting out all the edge pieces and then sorting the pieces further by shape and color. This is an easy problem for people; even a three-year-old is capable of sorting puzzle pieces. Training a computer for such a task, however, requires showing the computer thousands of samples of puzzle pieces, each labeled as an edge or center piece. This step alone took computer scientists several years to accomplish.
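To make the puzzle example a little more concrete, here is a minimal sketch of one simple classification technique, known as k-nearest neighbors. It is only an illustration, not how any real puzzle-sorting system works. The two features (number of flat sides and number of tabs), the eight labeled pieces and the function name classify are all made up for this example.

```python
from collections import Counter
import math

# Hypothetical labeled samples: (flat_sides, tabs) -> label.
# A real system would need thousands of these, not eight.
TRAINING = [
    ((1, 2), "edge"),
    ((1, 3), "edge"),
    ((2, 1), "edge"),
    ((2, 2), "edge"),
    ((0, 2), "center"),
    ((0, 3), "center"),
    ((0, 4), "center"),
    ((0, 1), "center"),
]


def classify(piece, training=TRAINING, k=3):
    """Label a piece by majority vote among its k closest labeled samples."""
    # Sort the labeled samples by straight-line distance to the new piece.
    by_distance = sorted(training, key=lambda sample: math.dist(piece, sample[0]))
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify((2, 2)))  # a piece with two flat sides
print(classify((0, 4)))  # a piece with no flat sides
```

The computer never learns what an edge piece is. It only compares the numbers for a new piece to the numbers for pieces a person has already labeled, which is why the labeled samples matter so much.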

Classification algorithms are most widely used for things like spam filters, which send junk e-mail directly to your trash. Training Google’s spam filter required millions of e-mail messages labeled as spam by Gmail users before the automatic filtering was usable. Other widely known use cases for classification algorithms are Facebook’s facial recognition engine, which automatically labels people in an image, and “Seek” by iNaturalist, a phone application that uses a classification algorithm to identify plants and animals.

A regression algorithm, for lack of a better example, is an algorithm used to predict a numeric value, such as the next value in a sequence. These were my least favorite problems in algebra: given the sequence 1, 4, 9, 16, 25, what comes next? I will let you enjoy figuring out the answer. There are several different regression algorithms, but I will focus on the simplest, known as linear regression. Linear regression attempts to find the straight line that passes as close as possible to all of the known points. This is usually done using the method of least squares. Least squares measures each error (the difference between the real point and the point on the line) and squares it so that all the errors turn out as positive numbers; whether the predicted number was high or low does not matter. The line that results in the lowest total of the squared errors is used to predict the next value in the sequence. Algorithms like these are used by stock investors attempting to predict the stock market and by weather forecasters predicting temperature and humidity. Of course, they use much more complex algorithms than linear regression, because they are predicting curves and fluctuations instead of just straight lines.
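For readers who want to see least squares in action, the description above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook formulas for the slope and intercept of the best-fit line. The five data points and the function names fit_line and total_squared_error are made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def total_squared_error(xs, ys, slope, intercept):
    """Add up the squared gaps between each real point and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical readings: position in the sequence and the value observed.
positions = [1, 2, 3, 4, 5]
values = [2, 4, 5, 4, 5]

slope, intercept = fit_line(positions, values)
next_value = slope * 6 + intercept  # follow the line out to position 6

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("predicted value at position 6:", round(next_value, 2))
print("total squared error:", round(total_squared_error(positions, values, slope, intercept), 2))
```

Try changing the slope or intercept by hand and passing them to total_squared_error; any other straight line gives a larger total for these points.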

The last category is clustering. Clustering is very similar to classification, but with one major difference. In classification the categories are known from the start and the algorithm is trained with labeled data, just like we learned to work the puzzle by sorting pieces into categories. In clustering the categories are not known ahead of time and the data is not labeled, so the algorithm has to discover the groups on its own. Clustering is like sorting the toys in your kids’ bedroom. You might start by thinking you have cars, blocks, puzzles and books, then suddenly you find horses and dinosaurs. The longer you clean the room, the more categories of toys you discover. Clustering is commonly used in marketing to group items within an online store, so that recommended and similar items can be displayed while you shop. Next time you are given the task of organizing the kitchen or the junk drawer, remember you are using clustering to sort your items, because you never know what you may find.
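Here is a minimal sketch of one common clustering technique, called k-means. It is not a perfect match for the toy room: k-means has to be told how many groups to look for, while other clustering methods discover the number of groups as they go. The toy measurements, the starting guesses and the function name k_means are all made up for this example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Group points around the given starting centers; return (clusters, centroids)."""
    for _ in range(iterations):
        # Step 1: put every point in the pile whose center is closest.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Step 2: move each center to the average of the points in its pile.
        # An empty pile keeps its old center.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical toys described by two unlabeled measurements each.
toys = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]

# Assumption: we ask for two groups and start from two of the toys.
clusters, centers = k_means(toys, centroids=[toys[0], toys[3]])

for number, cluster in enumerate(clusters, start=1):
    print("group", number, cluster)
```

Notice that nobody told the computer what the groups mean. It only reports that three of the toys sit close together and the other three sit close together; naming the piles is still up to a person.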

Until next week, stay safe and learn something new.

Scott Hamilton is an Expert in Emerging Technologies at ATOS and can be reached with questions and comments via email to shamilton@techshepherd.org or through his website at https://www.techshepherd.org.