Maximum Likelihood Estimation is a method for estimating the parameters of a model by choosing the values that maximize the probability of the observed data.

Let us see this step by step through an example. Then you will understand how maximum likelihood estimation (MLE) applies to machine learning.

## Maximum Likelihood Estimation (MLE) – Example

### Problem:

A box contains 3 balls, each of which is either yellow or red. You can’t look inside the box to see what color the balls are. But you get 5 chances to pick one ball at a time and look at its color. Each time, you put the ball back in, shuffle, and pick again. In probability, this is called sampling “with replacement”. Because the box is restored before every pick, each outcome is independent of the previous ones.
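To make the sampling procedure concrete, here is a minimal Python sketch of drawing with replacement. The box contents `["R", "R", "Y"]` and the helper name `draw_with_replacement` are assumptions made only for illustration; in the actual problem the contents are unknown.

```python
import random

def draw_with_replacement(box, n_picks, seed=None):
    """Pick n_picks balls from box, returning each ball before the next pick."""
    rng = random.Random(seed)
    # rng.choice never removes a ball, so every pick sees the full box
    # and the picks are independent of one another.
    return [rng.choice(box) for _ in range(n_picks)]

# Hypothetical box contents, chosen only to demonstrate the procedure.
example_box = ["R", "R", "Y"]
picks = draw_with_replacement(example_box, n_picks=5, seed=0)
print("".join(picks))
```

The box is unchanged after the five picks, which is exactly what “with replacement” means.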

The outcome of the five picks is found to be: red, yellow, red, red, red (RYRRR).

You are asked to guess how many red balls are in the box.

### Given:

You are told only that each ball in the box is either red or yellow. The contents of the box must therefore be one of the following:

• all 3 red balls
• all 3 yellow balls
• 2 red 1 yellow balls
• 1 red 2 yellow balls

Only one of these can be true.

### Solution:

The picture below is broken down and explained in the sections that follow. Feel free to scroll down if it looks a little complex.

You are allowed five chances to pick one ball at a time. On chance 1, say you pick a ball and it turns out to be red. On the second chance, you put the first ball back in and pick again. This time it is a yellow ball. Similarly, in the next 3 chances you get red, red, red.

So, overall, in five picks you got red, yellow, red, red, red. Can you now tell the colors of the 3 balls in the box? One thing we can be sure of is that they are not all red or all yellow, because our picks are a mix of colors.

As the outcomes contain both red and yellow (RYRRR), the box must also contain both red and yellow balls. Is it 2 red and 1 yellow, or 1 red and 2 yellow? In other words, how many red balls does the box contain? (This was the initial question.) How will you approach this problem?

Even though we already know that the all-red and all-yellow combinations are ruled out, it is useful to work through every case step by step.

We know that only four combinations are possible for the box contents: YYY, YYR, YRR, and RRR. We will analyze each case and find which one gives the highest probability of producing RYRRR. That will be our answer.
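Because the picks are independent, the probability of the whole sequence is the product of the per-pick probabilities. As a compact sketch, writing r for the number of red balls in the box (the symbol r is notation introduced here, not part of the problem statement), all four cases share one formula:

```latex
P(R) = \frac{r}{3}, \qquad P(Y) = \frac{3 - r}{3}, \qquad
P(\text{RYRRR} \mid r) = \left(\frac{r}{3}\right)^{4} \cdot \frac{3 - r}{3},
\qquad r \in \{0, 1, 2, 3\}
```

The sequence RYRRR has four reds and one yellow, which is where the exponent 4 comes from.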

### Scenario – 1 (Colors YYY)

Let’s analyze the first possibility.

What if the box originally contained all yellow balls? If that is the case, what is the probability of getting RYRRR in five picks? (We know this cannot happen: we can’t get a red ball out of a box containing only yellow balls.) Still, we will follow the procedure and calculate it.

Contents of the box: YYY balls

Total number of balls in the box = 3

Number of yellow balls = 3

Number of red balls = 0

Probability of yellow ball P(Y) = Number of yellow balls / Total number of balls

= 3/3 = 1

Probability of red ball P(R) = Number of red balls / Total number of balls

= 0/3 = 0

Probability of getting RYRRR in five picks with replacement is:

P(RYRRR) = P(R) x P(Y) x P(R) x P(R) x P(R)

= 0 x 1 x 0 x 0 x 0

= 0

This confirms that the box cannot contain 3 yellow balls, because such a box could never produce RYRRR in five picks.

Let us calculate the probability for the remaining 3 scenarios and see which scenario has the maximum probability.

### Scenario – 2 (Colors YYR)

Let us analyze what happens if the box had contained 2 yellow and 1 red ball. What are the chances that you get RYRRR in 5 picks?

Contents of the box in this case: YYR balls

Total number of balls in the box = 3

Number of yellow balls = 2

Number of red balls = 1

Probability of yellow ball P(Y) = Number of yellow balls / Total number of balls

= 2/3

Probability of red ball P(R) = Number of red balls / Total number of balls

= 1/3

Probability of getting RYRRR in five picks with replacement is:

P(RYRRR) = P(R) x P(Y) x P(R) x P(R) x P(R)

= 1/3 x 2/3 x 1/3 x 1/3 x 1/3 = 2/243

≈ 0.0082
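The arithmetic above can be double-checked with a short Python sketch. The function name `sequence_probability` and the choice to represent a box as a string such as `"YYR"` are assumptions made for illustration. Exact fractions are used so the result can be compared directly with 2/243.

```python
from fractions import Fraction

def sequence_probability(box, sequence):
    """Probability of observing `sequence` when drawing from `box` with replacement."""
    prob = Fraction(1)
    for color in sequence:
        # P(color) = number of balls of that color / total number of balls
        prob *= Fraction(box.count(color), len(box))
    return prob

p = sequence_probability("YYR", "RYRRR")
print(p, float(p))
```

The same function works for any of the four candidate boxes by changing the first argument.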

### Scenario – 3 (Colors YRR)

In this case, we will see what happens if the box contains 1 yellow 2 red balls. What are the chances that you get RYRRR in 5 picks?

Contents of the box in this case: YRR

Total number of balls in the box = 3

Number of yellow balls = 1

Number of red balls = 2

Probability of yellow ball P(Y) = Number of yellow balls / Total number of balls

= 1/3

Probability of red ball P(R) = Number of red balls / Total number of balls

= 2/3

Probability of getting RYRRR in five picks with replacement is:

P(RYRRR) = P(R) x P(Y) x P(R) x P(R) x P(R)

= 2/3 x 1/3 x 2/3 x 2/3 x 2/3 = 16/243

≈ 0.0658
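As a sanity check on this value, the experiment can also be simulated many times. The sketch below is a rough Monte Carlo estimate, not part of the MLE procedure itself; the function name `estimate_probability`, the number of trials, and the seed are all arbitrary assumptions.

```python
import random

def estimate_probability(box, target, n_trials=100_000, seed=0):
    """Estimate P(target) by repeating the five-pick experiment n_trials times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One experiment: draw len(target) balls with replacement.
        picks = "".join(rng.choice(box) for _ in range(len(target)))
        if picks == target:
            hits += 1
    return hits / n_trials

estimate = estimate_probability("YRR", "RYRRR")
print(estimate)  # should land close to 16/243, up to sampling noise
```

The simulated frequency will not match 16/243 exactly, but it should get closer as the number of trials grows.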

### Scenario – 4 (Colors RRR)

In this case, we will see what happens when all the balls in the box are red. What is the chance of getting RYRRR in five picks with replacement?

(We know there is no chance of getting a yellow ball from a box of all red balls. Let us still solve this case anyway.)

Contents of the box in this case: RRR

Total number of balls in the box = 3

Number of yellow balls = 0

Number of red balls = 3

Probability of yellow ball P(Y) = Number of yellow balls / Total number of balls

= 0/3 = 0

Probability of red ball P(R) = Number of red balls / Total number of balls

= 3/3 = 1

Probability of getting RYRRR in five picks with replacement is:

P(RYRRR) = P(R) x P(Y) x P(R) x P(R) x P(R)

= 1 x 0 x 1 x 1 x 1

= 0

### How many red balls in the box? – Answer

So far we have analyzed four scenarios to find which one has the highest likelihood of producing the result RYRRR. The probabilities are:

• Scenario – 1 : YYY in box : P(RYRRR) = 0
• Scenario – 2 : YYR in box : P(RYRRR) = 0.0082
• Scenario – 3 : YRR in box : P(RYRRR) = 0.0658
• Scenario – 4 : RRR in box : P(RYRRR) = 0

The third scenario, YRR, has the highest probability, 0.0658. So given that the observed outcome is RYRRR, the most plausible contents of the box are YRR (1 yellow and 2 red balls), because this scenario has the maximum chance (maximum likelihood) of producing RYRRR.
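The whole comparison can be sketched in a few lines of Python: compute the likelihood of RYRRR for each candidate box, then pick the box with the largest value. The names `candidate_boxes`, `likelihood`, and `best_box` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

observed = "RYRRR"
candidate_boxes = ["YYY", "YYR", "YRR", "RRR"]

def likelihood(box, sequence):
    """Probability of drawing `sequence` from `box` with replacement."""
    result = Fraction(1)
    for color in sequence:
        result *= Fraction(box.count(color), len(box))
    return result

# Likelihood of the observed picks under each candidate box.
likelihoods = {box: likelihood(box, observed) for box in candidate_boxes}

# The maximum likelihood estimate is the candidate with the largest likelihood.
best_box = max(likelihoods, key=likelihoods.get)

for box, value in likelihoods.items():
    print(box, value)
print("most likely box:", best_box, "| red balls:", best_box.count("R"))
```

Trying every candidate works here because there are only four of them. With continuous parameters, the maximum is usually found with calculus or numerical optimization instead.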

We were initially asked, “How many red balls are present in the box?” Now you have the answer: the maximum likelihood estimate is 2 red balls, because the scenario YRR gives the maximum likelihood. This is the most plausible value given the picks, not a certainty.

The quantity you are estimating is conventionally called theta. Here you are estimating the number of red balls in the box. So theta is the number of red balls in the box, and maximum likelihood estimation (MLE) gives theta = 2.
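In symbols, this result can be written as follows. The hat notation for an estimate and the arg max operator are standard conventions, not something defined earlier in this article:

```latex
\hat{\theta} = \underset{\theta \in \{0, 1, 2, 3\}}{\arg\max}\; P(\text{RYRRR} \mid \theta) = 2
```

Read it as: the estimate is the value of theta, among the allowed values, that makes the observed picks most probable.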

## Maximum Likelihood in Machine Learning

We have just seen a simple example of estimating the number of red balls in the box.

In machine learning, you solve prediction problems using models that have parameters. You need to estimate the best set of parameters for the model, so that the model best fits the data.

For example, in linear regression, the model needs to find a best-fit line. A simple equation of a line is y = mx + c. Here, m is the slope and c is the y-intercept. These are the parameters that have to be estimated.

I call it a simple equation because it has only one independent variable, x. Usually, there will be many independent variables, so you will be estimating a coefficient for each variable as well as the constant c.

In machine learning problems, what you want is a line that gives the least possible error. You have to estimate which parameters have the maximum chance (maximum likelihood) of producing the data you observed, similar to the balls-in-a-box example we saw above.
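The link between least error and maximum likelihood can be sketched in code under one extra assumption that this article does not state: the observed y values equal mx + c plus Gaussian noise with a fixed standard deviation. Under that assumption, the line with the smallest sum of squared errors is also the line with the highest likelihood. The toy data and function names below are hypothetical.

```python
import math

# Hypothetical toy data lying roughly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

def fit_line(xs, ys):
    """Closed-form least-squares estimates of slope m and intercept c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

def gaussian_log_likelihood(m, c, xs, ys, sigma=1.0):
    """Log-likelihood of the data if y = m*x + c + Gaussian noise (assumed std sigma)."""
    n = len(xs)
    sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

m_hat, c_hat = fit_line(xs, ys)
best = gaussian_log_likelihood(m_hat, c_hat, xs, ys)

# Any other line has a larger squared error and therefore a lower likelihood.
other = gaussian_log_likelihood(m_hat + 0.5, c_hat - 0.5, xs, ys)
print(m_hat, c_hat, best > other)
```

Here the parameters m and c play the role that the number of red balls played in the box example, except that they can take any real value instead of one of four.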

In the example, we estimated just one parameter: the number of red balls in the box. In machine learning, there will usually be many parameters to estimate.