What is Machine Learning?

In a previous article we talked about Machine Learning without defining it. Machine Learning is a subset, or tool, of (weak) artificial intelligence.

Machine Learning algorithms use statistics to find patterns in massive amounts of data. By data we mean numbers, words, images and sounds: in general, anything that can be stored digitally can be fed to a Machine Learning algorithm to draw conclusions.

Later we will discuss the fundamental difference with Deep Learning, which is another term that is often used as a synonym for Artificial Intelligence.

Machine Learning algorithms are classified into three main families: Supervised, Unsupervised and Reinforcement Learning (we will skip the last one for now). The distinction between the first two depends on whether the data set from which we want to draw conclusions is labeled, i.e. whether each record already contains the answer we want the model to learn to produce.

For example, a supervised model that studies the price of apartments as a function of their surface area in m² and their distance to the city center would have the form (surface, distance) → price. A single labeled example would be:

(100 m², 3 km) → 375,000 €

Unsupervised models are used when it is not possible to have a labeled dataset. A very interesting use case is anomaly detection: we have a series of observations, each with several variables that record the performance of an engine model, and we want to find out how to characterize those manufactured engines that may fail “prematurely”.
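As a minimal sketch of the anomaly-detection idea, the snippet below flags observations that lie far from the mean of a single variable. The z-score rule, the threshold of 2 and the temperature readings are illustrative assumptions, not a method or data taken from a real engine study.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the indices of values lying more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical operating temperatures (in °C), one reading per engine.
temperatures = [90.1, 89.8, 90.3, 90.0, 89.9, 90.2, 97.5, 90.1, 89.7, 90.0]
anomalies = find_anomalies(temperatures)
print(anomalies)  # index of the engine with the unusual reading
```

No labels are needed here: the engine at index 6 stands out only because it differs from the rest. A real case would combine several variables rather than one.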

Types of supervised machine learning algorithms

The oldest and simplest algorithm of all is linear regression: we have a cloud of points and we want to find the line that lies as close as possible to all of them. The system is said to learn the coefficients of the line that defines our prediction function. Learning reaches its goal when it minimizes the error between the prediction and the actual value across all the training data.

We define the training data as (X1, Y1), …, (Xn, Yn).

We have to find the coefficients a1 and a0 of the prediction function: H(x) = a1*x + a0

In an iterative process called gradient descent, the coefficients are adjusted step by step in the direction that reduces the total error, until the following error function is minimized:

$$\varepsilon(a_1, a_0) = \sum_{i=1}^{n} \left(Y_i - H(X_i)\right)^2$$
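The iterative process can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the learning rate, the number of steps and the sample points are assumptions chosen for the example. It minimizes the mean of the squared errors, which has the same minimum as their sum.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit H(x) = a1*x + a0 by gradient descent on the mean squared error."""
    a1, a0 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Difference between the prediction H(Xi) and the actual value Yi.
        errors = [(a1 * x + a0) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_a1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_a0 = (2.0 / n) * sum(errors)
        # Move each coefficient a small step against its gradient.
        a1 -= learning_rate * grad_a1
        a0 -= learning_rate * grad_a0
    return a1, a0


# Illustrative training data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a1, a0 = fit_line(xs, ys)
print(round(a1, 4), round(a0, 4))  # close to 2.0 and 1.0
```

With a learning rate that is too large the updates overshoot and the error grows instead of shrinking, so in practice this value is tuned to the data.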

The philosophy is the same when we predict a numerical value from more than two variables, or when we observe that the fit should be polynomial rather than linear.
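To illustrate that the approach carries over, the sketch below runs gradient descent over an arbitrary number of input columns and then reuses it for a polynomial fit by treating 1, x and x² as three separate variables. The function names, learning rate and data are assumptions made for this example.

```python
def fit_linear_model(rows, ys, learning_rate=0.1, steps=5000):
    """Gradient descent for H(row) = sum of a_j * row[j] over all columns j."""
    n, k = len(rows), len(rows[0])
    coeffs = [0.0] * k
    for _ in range(steps):
        errors = [
            sum(a * x for a, x in zip(coeffs, row)) - y
            for row, y in zip(rows, ys)
        ]
        grads = [
            (2.0 / n) * sum(e * row[j] for e, row in zip(errors, rows))
            for j in range(k)
        ]
        coeffs = [a - learning_rate * g for a, g in zip(coeffs, grads)]
    return coeffs


def polynomial_features(x, degree):
    """Turn one input x into the columns [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]


# Illustrative data generated from y = 3x^2 - 2x + 1.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3 * x ** 2 - 2 * x + 1 for x in xs]
rows = [polynomial_features(x, 2) for x in xs]
a0, a1, a2 = fit_linear_model(rows, ys)
print(round(a0, 4), round(a1, 4), round(a2, 4))  # close to 1.0, -2.0, 3.0
```

The model is still linear in its coefficients, which is why the same minimization works. With real variables on very different scales, such as m² and km, the columns are usually rescaled first so that a single learning rate suits all of them.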

With Supervised Machine Learning we can, in general, perform two main tasks: prediction of a numerical value (regression), or classification of a data point into one of a set of predefined classes, e.g. Fraudulent or Legal.

Classification problems can be tackled with several types of classifiers. The simplest of them is logistic regression, which is very similar to linear regression; its particularity is that it is a binary classifier (2 classes).
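The similarity with linear regression shows in a small sketch: the same linear function a1*x + a0 is passed through a sigmoid to obtain a probability between 0 and 1, and the coefficients are again learned by gradient descent, here on the log-loss. The transaction amounts, labels and training settings below are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Learn a1 and a0 for P(class 1 | x) = sigmoid(a1*x + a0)."""
    a1, a0 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Predicted probability minus the true label (0 or 1).
        errors = [sigmoid(a1 * x + a0) - y for x, y in zip(xs, labels)]
        a1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        a0 -= learning_rate * sum(errors) / n
    return a1, a0


def predict(x, a1, a0):
    """Return 1 (Fraudulent) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(a1 * x + a0) >= 0.5 else 0


# Hypothetical transaction amounts (thousands of €): 1 = Fraudulent, 0 = Legal.
amounts = [0.5, 1.0, 1.5, 2.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
a1, a0 = fit_logistic(amounts, labels)
print([predict(x, a1, a0) for x in amounts])
```

The 0.5 cut-off is a convention; in a fraud setting it is often moved to trade false alarms against missed cases.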

There are elegant ways to tackle the generalization problem when we have to implement a multi-class classifier. One of the most common algorithms is the Support Vector Machine, which establishes boundaries between the data; each boundary delimits the “region”, or class, to which a data point belongs. Other options include k-nearest neighbors, Naive Bayes and decision trees.
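Of the algorithms listed, k-nearest neighbors is the easiest to sketch, and it handles more than two classes without modification: a new point receives the most common label among its k closest training points. The points, the class names "A", "B" and "C", and the choice of k = 3 are assumptions for this example.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative 2-dimensional points belonging to three classes.
points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.0),  # class A
    (6.0, 6.0), (6.5, 7.0), (7.0, 6.0),  # class B
    (1.0, 7.0), (1.5, 6.5), (2.0, 7.5),  # class C
]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

print(knn_predict(points, labels, (1.2, 1.4)))  # A
print(knn_predict(points, labels, (6.2, 6.4)))  # B
print(knn_predict(points, labels, (1.4, 6.8)))  # C
```

There is no training step: the labeled data itself acts as the model, which keeps the method simple but makes each prediction slower as the dataset grows.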

Types of Unsupervised Machine Learning Algorithms

Unsupervised learning is a type of machine learning that looks for patterns in unlabeled data. The learning is based on the structure of the inputs themselves, for example their estimated probability distribution, rather than on known answers.

The main tasks performed with unsupervised learning are the formation of groups (clustering), the detection of anomalies, and the reduction of dimensions, which represents a complex problem in a space of fewer dimensions that is easier to analyze while losing as little information about the data as possible. The best-known dimensionality reduction algorithm is PCA (Principal Component Analysis).
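As a rough sketch of dimensionality reduction, the code below computes the first principal component of 2-dimensional data and projects every point onto it, turning two coordinates into one. It relies on the closed-form angle of the main axis of a 2×2 covariance matrix, which only works in two dimensions; general PCA uses an eigenvalue decomposition instead. The data is contrived so that the result is easy to check.

```python
import math


def first_principal_component(points):
    """Return the unit direction of greatest variance of 2-D points
    and the 1-D coordinate of each point along that direction."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    direction = (math.cos(theta), math.sin(theta))
    projections = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projections


# Contrived data lying exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, projections = first_principal_component(points)
print(direction)    # roughly (0.447, 0.894), i.e. the direction (1, 2)
print(projections)  # one number per point instead of two
```

Because these points lie on a single line, the one remaining coordinate keeps all the information; with noisy data, the part perpendicular to the principal direction is what gets lost.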

The clearest example of unsupervised learning that we have been living with for years is the recommendation engine of e-commerce sites such as Amazon.

To oversimplify, imagine a 2-dimensional space (X: Customers, Y: Products) in which we plot the purchase history of all customers: we will get a cloud of points with a certain structure in groups. Once the boundaries of each group are identified, if a purchase by a new customer (Cn) lets us associate that customer with a particular group, there is a high probability of a purchase if we recommend the products of that group (the ones inside the orange ellipse).
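The last step of that reasoning can be sketched as follows, assuming the groups have already been found by a clustering algorithm. The group names, the products and the matching rule (the group sharing the most purchases with the customer) are hypothetical simplifications, not how any particular e-commerce site works.

```python
def recommend(groups, purchases):
    """Assign a customer to the group sharing the most purchased products,
    then suggest that group's products the customer has not bought yet.

    groups: dict mapping a group name to the set of products typical of it.
    purchases: set of products the new customer has bought so far.
    """
    best_group = max(groups, key=lambda name: len(groups[name] & purchases))
    suggestions = sorted(groups[best_group] - purchases)
    return best_group, suggestions


# Hypothetical groups, assumed to come from a previous clustering step.
groups = {
    "home_office": {"desk", "monitor", "keyboard", "webcam"},
    "outdoor": {"tent", "backpack", "boots", "stove"},
}

group, suggestions = recommend(groups, {"monitor", "keyboard"})
print(group, suggestions)  # home_office ['desk', 'webcam']
```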

In future articles we will lay out the basic concepts of artificial intelligence, Machine Learning and Deep Learning that allow us to address use cases, especially in the field of competitive intelligence in companies.