The steps for obtaining the maximum likelihood estimate (MLE) of a model's parameters are as follows:

- Perform a certain experiment to collect the data.
- Choose a parametric model of the data, with certain modifiable parameters.
- Formulate the likelihood as an objective function to be maximized.
- Maximize the objective function and derive the parameters of the model.

We shall take the case of a discrete variable as an example. Suppose we have an unfair coin and want to determine the probability of obtaining a head when the coin is tossed. Suppose that after 5 tosses we have 3 heads and 2 tails. What, then, is the probability $p$ of getting a head on a toss? Common sense says that the probability of a head is 0.6 (= 3/5) and that of a tail is 0.4 (= 2/5). Here, however, we want to take a more theoretical approach to the problem.

Suppose the probabilities of a head and a tail are $p$ and $q$, respectively. If we view the 5 tosses as independent events, then the probability of obtaining a particular sequence of 3 heads and 2 tails is $$ J(p, q) = p^3q^2, \; \text{with} \; p+q=1. $$ (Counting every ordering of the tosses would multiply this by the constant $\binom{5}{3} = 10$, which does not change the maximizing values.) Since 3 heads and 2 tails were actually observed, we assume this outcome is an event of high probability. As a result, we can search for the $p$ and $q$ that maximize the above objective function. Before attempting the maximization, we need the following inequality between the arithmetic mean and the geometric mean: $$ \frac{\sum_{i=1}^n x_i}{n} \geq \left(\prod_{i=1}^n x_i\right)^{1/n}, \; \text{with} \; x_i \geq 0 \; \forall i. $$ Equality holds if and only if $x_1 = x_2 = \cdots = x_n$.
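The inequality can be spot-checked numerically before it is used. The following is a minimal Python sketch; the helper names `arithmetic_mean` and `geometric_mean` and the sample values are illustrative choices, not part of the original text.

```python
import math

def arithmetic_mean(xs):
    """Left-hand side of the inequality: (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """Right-hand side of the inequality: (x_1 * ... * x_n)^(1/n)."""
    return math.prod(xs) ** (1.0 / len(xs))

unequal = [1.0, 2.0, 4.0]   # illustrative values, not all equal
equal = [3.0, 3.0, 3.0]     # illustrative values, all equal

# Strict inequality when the values differ.
print(arithmetic_mean(unequal) > geometric_mean(unequal))

# Equality (up to floating-point rounding) when all values coincide.
print(math.isclose(arithmetic_mean(equal), geometric_mean(equal)))
```

Both lines print `True`, which matches the statement that equality occurs only when all the $x_i$ are identical.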

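Applying the above inequality to the five terms $p/3$, $p/3$, $p/3$, $q/2$, and $q/2$, we have $$ \frac{p/3+p/3+p/3+q/2+q/2}{5} \geq \left( \left(\frac{p}{3}\right)^3 \left(\frac{q}{2}\right)^2 \right)^{1/5}. $$ The left side equals $(p+q)/5 = 1/5$, so $p^3q^2 \leq 3^3 \cdot 2^2 / 5^5$, a bound that does not depend on $p$ or $q$. The likelihood reaches this maximum exactly when equality holds, that is, when $p/3=q/2$. Since $p+q=1$, we have $p=3/5$ and $q=2/5$.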
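As a sanity check on this result, the likelihood can also be maximized by brute force. The sketch below is not the derivation used in the text: it substitutes $q = 1 - p$ and searches a grid of 1001 candidate values of $p$. The function name `coin_likelihood` and the grid resolution are assumptions made for illustration.

```python
def coin_likelihood(p, heads=3, tails=2):
    """Likelihood J(p, q) = p^heads * q^tails with q = 1 - p."""
    return p ** heads * (1.0 - p) ** tails

# Candidate values of p on a uniform grid over [0, 1] (step 0.001).
grid = [i / 1000 for i in range(1001)]

# The grid point with the largest likelihood.
p_hat = max(grid, key=coin_likelihood)

# Largest value of p^3 q^2 permitted by the inequality: 3^3 * 2^2 / 5^5.
bound = 3 ** 3 * 2 ** 2 / 5 ** 5

print(p_hat, coin_likelihood(p_hat), bound)
```

The grid maximizer lands at $p = 0.6$, and the likelihood there agrees with the bound $3^3 \cdot 2^2 / 5^5$ up to floating-point rounding.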

Similarly, suppose we toss a 3-sided die $n$ times and obtain $n_1$ occurrences of side 1, $n_2$ of side 2, and $n_3$ of side 3. What are the most likely probabilities for sides 1, 2, and 3, respectively? Again, we can formulate the objective function as the likelihood: $$ J(p, q, r)=p^{n_1}q^{n_2}r^{n_3}, \; \text{with} \; p+q+r=1. $$ By using the above inequality, we can obtain the maximizing values of $p$, $q$, and $r$ as follows: $$ \frac{p}{n_1}=\frac{q}{n_2}=\frac{r}{n_3} \Longrightarrow p=\frac{n_1}{n_1+n_2+n_3}, \; q=\frac{n_2}{n_1+n_2+n_3}, \; r=\frac{n_3}{n_1+n_2+n_3}. $$
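The count-proportional formula is straightforward to compute and to spot-check. The following Python sketch assumes hypothetical tallies of 4, 3, and 5 for the three sides (these numbers are not from the text) and compares the log-likelihood of the closed-form estimate against 1000 randomly drawn probability vectors. The function names are illustrative.

```python
import math
import random

def categorical_mle(counts):
    """Return the estimates n_k / (n_1 + ... + n_K) for each side k."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one observation")
    return [c / total for c in counts]

def log_likelihood(probs, counts):
    """Log of J = prod_k p_k^{n_k}; sides with n_k = 0 contribute nothing."""
    return sum(c * math.log(p) for p, c in zip(probs, counts) if c > 0)

counts = [4, 3, 5]                     # hypothetical tallies n_1, n_2, n_3
p_hat = categorical_mle(counts)        # closed-form estimates for p, q, r
best = log_likelihood(p_hat, counts)

# Draw random probability vectors (positive entries, normalized to sum to 1).
rng = random.Random(0)
candidates = []
for _ in range(1000):
    raw = [rng.uniform(0.01, 1.0) for _ in counts]
    scale = sum(raw)
    candidates.append([x / scale for x in raw])

# No random candidate should beat the closed-form estimate.
print(p_hat)
print(all(log_likelihood(c, counts) <= best + 1e-12 for c in candidates))
```

Working with the log-likelihood rather than $J$ itself leaves the maximizer unchanged, since the logarithm is increasing, and it avoids very small products when $n$ is large.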

Data Clustering and Pattern Recognition (資料分群與樣式辨認)