
Method

Maximum Likelihood simply uses all those Max/Min strategies that we learned in high-school calculus and then promptly forgot.

Here's the general strategy for finding the value of a parameter that maximizes the probability of the data (a small symbolic sketch follows the list):

  1. Take the first derivative of the function with respect to the parameter that you want to solve for.
  2. Set the derivative equal to zero and attempt to solve for the parameter.
  3. If you come up with a single solution, take the second derivative of the original function with respect to the parameter, substitute in your solution for the parameter, and check that the result is less than zero. If it is, you have found the value that maximizes the function. (This has worked in almost every situation I have encountered.)
  4. If you come up with multiple solutions, evaluate the original function at each solution and at the endpoints of the parameter's range, and keep the one that gives the largest value. (You almost never have to do this.)
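
For instance, here is a minimal symbolic sketch of that recipe in Python (using sympy). The function $ f(\theta) = \theta e^{-\theta}$ is not one of the likelihoods used below; it is just a stand-in chosen to keep the algebra short:

    import sympy as sp

    theta = sp.symbols('theta', positive=True)
    f = theta * sp.exp(-theta)               # stand-in for a likelihood function

    # Step 1: first derivative with respect to the parameter
    d1 = sp.diff(f, theta)

    # Step 2: set it equal to zero and solve for the parameter
    solutions = sp.solve(sp.Eq(d1, 0), theta)            # [1]

    # Step 3: single solution, so check the sign of the second derivative there
    d2 = sp.diff(f, theta, 2)
    print(solutions[0], d2.subs(theta, solutions[0]))    # 1, -exp(-1) < 0 -> maximum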

Often the log of the likelihood function is maximized instead of the likelihood function itself. This is because it is almost always easier to work with the log of the likelihood function than with the likelihood function itself. We can justify this simplification because all probability distributions are non-negative over the domain of $ x$, and $ \log[x]$ is an increasing function of $ x$; thus, the value of the parameter that maximizes the probability of the data is also the value that maximizes the log of that probability. Also, we'll use the notation $ \mathcal{L}(\theta \vert {\bf X})$ for the likelihood of $ \theta$ (the parameter that we want to estimate) given the data, X, that is, the probability of the data viewed as a function of $ \theta$. It is also worth noting that most statisticians use ``log'' to mean ``natural log'' or ``ln''.
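
A quick numerical illustration of why the log transformation is safe (the sample and the grid of candidate values for $ \mu $ below are made up purely for the demonstration): the likelihood and the log-likelihood peak at the same place.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=50)    # hypothetical sample

    mus = np.linspace(3, 7, 401)                   # candidate values for mu
    lik = np.array([np.prod(norm.pdf(x, loc=m, scale=2.0)) for m in mus])
    loglik = np.array([np.sum(norm.logpdf(x, loc=m, scale=2.0)) for m in mus])

    # Both curves are maximized by the same candidate because log is increasing.
    print(mus[np.argmax(lik)], mus[np.argmax(loglik)])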


Example 3.9.2.2

From the overview, let's assume that we have X, a vector of $ n$ independent data points, $ x_1, \ldots, x_n$, collected from the same normal distribution where both $ \mu $ and $ \sigma^2$ are unknown. Since the elements of X are independent, the probability of the data as a whole is the product of the probabilities of the individual elements of X.

We will begin by finding an estimate for $ \mu $. To do this we will assume that we know $ \sigma^2$.


$\displaystyle \mathcal{L}(\mu , \sigma^2 \vert {\bf X}) = \prod^{n}_{i=1} \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{1}{2\sigma^2}(x_i - \mu)^2} = \frac{1}{(2 \pi \sigma^2)^{n/2}} e^{-\frac{1}{2 \sigma^2} \sum^n_{i=1}(x_i - \mu)^2},
$

and

$\displaystyle \log\left[\mathcal{L}(\mu , \sigma^2 \vert {\bf X})\right] = -\frac{n}{2}\log[2\pi] - \frac{n}{2}\log[\sigma^2] - \frac{1}{2\sigma^2}\sum^n_{i = 1}(x_i - \mu)^2.
$
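
As a sanity check, the closed-form log-likelihood above can be compared against the sum of per-point log densities computed by scipy (the simulated sample and the particular values of $ \mu $ and $ \sigma^2$ are arbitrary, used only for the comparison):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    mu, sigma2 = 5.0, 4.0
    x = rng.normal(mu, np.sqrt(sigma2), size=100)
    n = len(x)

    # Closed form from the equation above
    loglik_formula = (-n / 2 * np.log(2 * np.pi)
                      - n / 2 * np.log(sigma2)
                      - np.sum((x - mu) ** 2) / (2 * sigma2))

    # Same quantity built up point by point
    loglik_pointwise = np.sum(norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))

    print(np.isclose(loglik_formula, loglik_pointwise))   # True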

The partial derivative with respect to $ \mu $ is,

$\displaystyle \frac{\partial}{\partial \mu}\log\left[\mathcal{L}(\mu, \sigma^2 \vert {\bf X})\right] = \frac{1}{\sigma^2} \sum^n_{i = 1}(x_i - \mu) \stackrel{\mathrm{set}}{=} 0.
$

Thus,

\begin{align}
\frac{1}{\sigma^2} \sum^n_{i = 1}(x_i - \mu) &= 0 \notag\\
\sum^n_{i = 1}(x_i - \mu) &= 0 \notag\\
\sum^n_{i = 1}x_i - n\mu &= 0 \notag\\
n\mu &= \sum^n_{i = 1}x_i \notag\\
\hat{\mu} &= \frac{1}{n} \sum^n_{i = 1}x_i = \bar{X}. \tag{3.9.1}
\end{align}
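
As a numerical cross-check of Equation 3.9.1 (the simulated data and the fixed value of $ \sigma^2$ are only illustrative), maximizing the log-likelihood over $ \mu $ with a general-purpose optimizer lands on the sample mean:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(2)
    x = rng.normal(loc=10.0, scale=3.0, size=200)
    sigma2 = 9.0                        # treat sigma^2 as known, as in the text

    def neg_loglik(mu):
        # negative log-likelihood (constants kept for completeness)
        n = len(x)
        return (n / 2 * np.log(2 * np.pi * sigma2)
                + np.sum((x - mu) ** 2) / (2 * sigma2))

    res = minimize_scalar(neg_loglik, bounds=(0, 20), method='bounded')
    print(res.x, np.mean(x))            # agree to within the optimizer's tolerance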

Verifying that $ \hat{\mu}$ is indeed a maximum requires us to take the second derivative of the log-likelihood with respect to $ \mu $ and make sure that it is negative.

$\displaystyle \frac{\partial^2}{\partial \mu^2}\log\left[\mathcal{L}(\mu, \sigma^2 \vert {\bf X})\right] = \frac{-n}{\sigma^2} < 0.
$

Thus, since $ \hat{\mu}$ is the only extreme point, it is indeed a maximum.
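
The same second-derivative check can be reproduced symbolically. The sketch below uses five symbolic data points (the number is arbitrary) and recovers $ -n/\sigma^2$:

    import sympy as sp

    mu = sp.symbols('mu')
    s2 = sp.symbols('sigma2', positive=True)
    xs = sp.symbols('x1:6')             # five symbolic data points x1, ..., x5
    n = len(xs)

    loglik = (-sp.Rational(n, 2) * sp.log(2 * sp.pi * s2)
              - sum((xi - mu) ** 2 for xi in xs) / (2 * s2))

    # Second derivative with respect to mu: -n/sigma^2, negative for any sigma^2 > 0
    print(sp.simplify(sp.diff(loglik, mu, 2)))   # -5/sigma2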

Now we will solve for $ \hat{\sigma}^2$, the MLE of $ \sigma^2$. Starting from the log-likelihood and substituting in our solution for $ \mu $, we can take the partial derivative with respect to $ \sigma^2$. Thus,

$\displaystyle \frac{\partial}{\partial \sigma^2}\log\left[\mathcal{L}(\sigma^2 \vert {\bf X})\right] = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum^n_{i=1}(x_i - \bar{X})^2 \stackrel{\mathrm{set}}{=} 0,
$

and

\begin{align}
\frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum^n_{i=1}(x_i - \bar{X})^2 &= 0 \notag\\
-n\sigma^2 + \sum^n_{i=1}(x_i - \bar{X})^2 &= 0 \notag\\
\hat{\sigma}^2 &= \frac{1}{n}\sum^n_{i=1}(x_i - \bar{X})^2. \tag{3.9.2}
\end{align}
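
Again, a small numerical cross-check of Equation 3.9.2 (simulated data, purely illustrative): the value of $ \sigma^2$ found by a numerical optimizer matches $ \frac{1}{n}\sum(x_i - \bar{X})^2$, which is what numpy's var computes with ddof=0.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(3)
    x = rng.normal(loc=0.0, scale=2.0, size=500)
    n, xbar = len(x), np.mean(x)

    def neg_loglik(sigma2):
        # negative log-likelihood with mu already replaced by the sample mean
        return (n / 2 * np.log(2 * np.pi * sigma2)
                + np.sum((x - xbar) ** 2) / (2 * sigma2))

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method='bounded')
    print(res.x, np.var(x, ddof=0))     # both approximately (1/n) * sum((x - xbar)^2)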

To verify that our solution for $ \hat{\sigma}^2$ is indeed a maximum, we have,

$\displaystyle \frac{\partial^2}{\partial (\sigma^2)^2}\log\left[\mathcal{L}(\sigma^2 \vert {\bf X})\right] = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum^n_{i = 1}(x_i - \bar{X})^2.
$

Substituting in our solution for $ \sigma^2$ we have,

\begin{multline*}
\frac{n^3}{2\left(\sum(x_i - \bar{X})^2\right)^2} - \frac{n^3}{\left(\sum(x_i - \bar{X})^2\right)^2} \\
= -\frac{n^3}{2\left(\sum(x_i - \bar{X})^2\right)^2} < 0,
\end{multline*}

and thus, our solution for $ \sigma^2$ is also a maximum.
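
The substitution step is easy to get wrong by hand, so here is a small symbolic confirmation; the symbol S stands in for $ \sum(x_i - \bar{X})^2$ and is assumed positive:

    import sympy as sp

    n, S, s2 = sp.symbols('n S sigma2', positive=True)

    # Second derivative of the log-likelihood with respect to sigma^2
    second_deriv = n / (2 * s2**2) - S / s2**3

    # Substitute the candidate solution sigma^2 = S/n and simplify
    at_mle = sp.simplify(second_deriv.subs(s2, S / n))
    print(at_mle)                       # -n**3/(2*S**2), negative since n, S > 0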

One final note before we conclude this example. If we had attempted to solve for the MLE of $ \sigma^2$ before solving for $ \hat{\mu}$, we would have ended up with the solution

$\displaystyle \sigma^2 = \frac{1}{n}\sum^n_{i=1}(x_i - \mu)^2,
$

which still contains the unknown parameter $ \mu $. At this point, we would have to pause in our derivation of $ \hat{\sigma}^2$ and solve for $ \hat{\mu}$. Once we had a solution for $ \hat{\mu}$, we would then substitute it in for $ \mu $ to complete our derivation of $ \hat{\sigma}^2$.
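
Equivalently, both partial-derivative equations can be handed to a solver at once, and the joint solution is the same pair of estimates. The tiny three-point data set below is made up purely to keep the symbolic output readable:

    import sympy as sp

    mu = sp.symbols('mu')
    s2 = sp.symbols('sigma2', positive=True)
    data = [1, 2, 6]                    # arbitrary illustrative data
    n = len(data)

    loglik = (-sp.Rational(n, 2) * sp.log(2 * sp.pi * s2)
              - sum((x - mu) ** 2 for x in data) / (2 * s2))

    # Solve both partial-derivative equations simultaneously
    sol = sp.solve([sp.diff(loglik, mu), sp.diff(loglik, s2)], [mu, s2], dict=True)
    print(sol)                          # mu = 3 (sample mean), sigma2 = 14/3 (1/n variance)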



Frank Starmer 2004-05-19