4-2 MLE for Discrete Events

The steps for obtaining the maximum likelihood estimate (MLE) of a model's parameters are as follows:
  1. Perform a certain experiment to collect the data.
  2. Choose a parametric model of the data, with certain modifiable parameters.
  3. Formulate the likelihood as an objective function to be maximized.
  4. Maximize the objective function and derive the parameters of the model.
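The four steps can be sketched in Python for the coin-tossing case discussed next. This is a minimal sketch under stated assumptions. Step 1 is simulated with a hypothetical true head probability (`true_p`) instead of a real experiment, and the model is a Bernoulli distribution. The sketch maximizes the log-likelihood, which has the same maximizer as the likelihood because the logarithm is increasing, and which avoids numerical underflow for long sequences. Step 4 uses a simple grid search rather than a closed-form derivation. All function names are hypothetical.

```python
import math
import random

def collect_data(n_tosses, true_p, seed=0):
    """Step 1 (simulated): toss a coin whose head probability is true_p; 1 = head, 0 = tail."""
    rng = random.Random(seed)
    return [1 if rng.random() < true_p else 0 for _ in range(n_tosses)]

def log_likelihood(p, data):
    """Steps 2-3: Bernoulli model; log of the product of p^x * (1 - p)^(1 - x) over all tosses."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(p) + tails * math.log(1.0 - p)

def maximize_likelihood(data, steps=999):
    """Step 4: grid search over p = 0.001, 0.002, ..., 0.999 (endpoints excluded to avoid log(0))."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda p: log_likelihood(p, data))

data = collect_data(1000, true_p=0.7, seed=42)
p_hat = maximize_likelihood(data)
print(f"grid MLE = {p_hat:.3f}, fraction of heads = {sum(data) / len(data):.3f}")
```

The grid maximizer lands within one grid step (0.001) of the fraction of heads in the data. This anticipates the closed-form result derived below.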

We take the case of a discrete variable as an example. Suppose we have an unfair coin and want to determine the probability of getting heads on a toss. Suppose that after 5 tosses we have 3 heads and 2 tails. What, then, is the probability $p$ of getting heads on a toss? Common sense suggests that the probability of heads is 0.6 (= 3/5) and that of tails is 0.4 (= 2/5). Here, however, we take a more theory-oriented approach to the problem.

Suppose the probabilities of heads and tails are $p$ and $q$, respectively. If we regard the 5 tosses as independent events, then the probability of observing this particular sequence of 3 heads and 2 tails is $$ J(p, q) = p^3 q^2, \quad \text{with} \quad p+q=1. $$ (The probability of 3 heads and 2 tails in any order carries an extra factor $\binom{5}{3}=10$, which is a constant and does not change where the maximum occurs.) Since 3 heads and 2 tails were actually observed, it is reasonable to assume that this outcome has high probability. Hence we search for the $p$ and $q$ that maximize the above objective function. Before doing so, we need the following inequality between the arithmetic mean and the geometric mean (the AM-GM inequality): $$ \frac{\sum_{i=1}^n x_i}{n} \geq \left(\prod_{i=1}^n x_i\right)^{1/n}, \quad \text{with} \quad x_i \geq 0 \; \forall i. $$ Equality holds if and only if $x_1 = x_2 = \cdots = x_n$.
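Here is a quick numerical sanity check of the mean inequality above (commonly called the AM-GM inequality), written as a Python sketch. It draws random nonnegative vectors, confirms that the arithmetic mean never falls below the geometric mean, and shows the equality case. The helper names, the random seed, and the sample sizes are arbitrary choices for illustration. This is a spot check, not a proof.

```python
import math
import random

def arithmetic_mean(xs):
    """(x_1 + ... + x_n) / n"""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """(x_1 * ... * x_n) ** (1/n); math.prod requires Python 3.8+."""
    return math.prod(xs) ** (1.0 / len(xs))

# Check AM >= GM on random nonnegative vectors; a tiny relative tolerance absorbs rounding.
rng = random.Random(0)
violations = 0
for _ in range(10_000):
    xs = [rng.uniform(0.0, 10.0) for _ in range(rng.randint(1, 8))]
    if arithmetic_mean(xs) < geometric_mean(xs) * (1 - 1e-12):
        violations += 1
print("violations:", violations)

# Equality case: all x_i are equal, so both means coincide (up to rounding).
equal = [2.5] * 6
print(arithmetic_mean(equal), geometric_mean(equal))
```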

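Applying this inequality to the five nonnegative numbers $p/3, p/3, p/3, q/2, q/2$ gives $$ \frac{p/3+p/3+p/3+q/2+q/2}{5} \geq \left( \left(\frac{p}{3}\right)^3 \left(\frac{q}{2}\right)^2 \right)^{1/5}. $$ The left-hand side equals $(p+q)/5 = 1/5$, a constant, so $$ \left(\frac{p}{3}\right)^3 \left(\frac{q}{2}\right)^2 \leq \left(\frac{1}{5}\right)^5 \;\Longrightarrow\; p^3 q^2 \leq \frac{3^3 \cdot 2^2}{5^5}. $$ Thus $J(p, q)$ reaches its maximum exactly when equality holds, that is, when $p/3=q/2$. Combined with $p+q=1$, this gives $p/3=q/2=(p+q)/5=1/5$, so $p=3/5$ and $q=2/5$, which agrees with the common-sense answer.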
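To double-check this result, the Python sketch below evaluates $J(p, 1-p) = p^3(1-p)^2$ on a grid of candidate values of $p$ and picks the largest. Using `fractions.Fraction` is an implementation choice, not part of the method. It keeps the arithmetic exact, so the grid maximizer can be compared with the bound $3^3 \cdot 2^2 / 5^5$ implied by the inequality without any rounding error. The grid resolution of 0.001 is an arbitrary assumption.

```python
from fractions import Fraction

def coin_likelihood(p):
    """J(p, q) = p^3 q^2 with q = 1 - p (3 heads and 2 tails observed)."""
    q = 1 - p
    return p**3 * q**2

# Exact rational grid p = 0, 1/1000, ..., 1, so comparisons involve no rounding error.
grid = [Fraction(k, 1000) for k in range(1001)]
p_best = max(grid, key=coin_likelihood)

# Upper bound implied by the inequality: p^3 q^2 <= 3^3 * 2^2 / 5^5.
bound = Fraction(3**3 * 2**2, 5**5)
print(p_best, coin_likelihood(p_best), bound)   # prints: 3/5 108/3125 108/3125
```

The grid maximum coincides with the bound, which confirms that the bound is attained at $p = 3/5$.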

Similarly, suppose we toss a 3-sided die $n$ times and obtain side 1 $n_1$ times, side 2 $n_2$ times, and side 3 $n_3$ times, where $n = n_1+n_2+n_3$. What are the most likely probabilities $p$, $q$, and $r$ of sides 1, 2, and 3, respectively? Again, we formulate the likelihood as the objective function: $$ J(p, q, r)=p^{n_1}q^{n_2}r^{n_3}, \quad \text{with} \quad p+q+r=1. $$ Apply the inequality above to $n_1$ copies of $p/n_1$, $n_2$ copies of $q/n_2$, and $n_3$ copies of $r/n_3$ (assuming each $n_i > 0$). Their arithmetic mean is $(p+q+r)/n = 1/n$, a constant, so $J$ is maximized when all these terms are equal. Since $p+q+r=1$, the common ratio is $1/(n_1+n_2+n_3)$, which gives $$ \frac{p}{n_1}=\frac{q}{n_2}=\frac{r}{n_3} \Longrightarrow p=\frac{n_1}{n_1+n_2+n_3}, \; q=\frac{n_2}{n_1+n_2+n_3}, \; r=\frac{n_3}{n_1+n_2+n_3}. $$
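The following Python sketch computes the closed-form estimate $n_i / n$ from a list of counts. As an empirical check, it then confirms that no randomly drawn probability vector achieves a higher log-likelihood. The text treats three sides, but the code accepts any number of sides. The counts `[7, 2, 11]` are made-up illustration data, and all function names are hypothetical.

```python
import math
import random

def multinomial_mle(counts):
    """MLE for a k-sided die: p_i = n_i / (n_1 + ... + n_k)."""
    n = sum(counts)
    return [c / n for c in counts]

def log_likelihood(probs, counts):
    """log(p_1^{n_1} * ... * p_k^{n_k}); sides never observed (n_i = 0) contribute nothing."""
    return sum(c * math.log(p) for p, c in zip(probs, counts) if c > 0)

def random_probs(k, rng):
    """A random probability vector with strictly positive entries (used only for comparison)."""
    ws = [rng.uniform(0.01, 1.0) for _ in range(k)]
    s = sum(ws)
    return [w / s for w in ws]

counts = [7, 2, 11]          # hypothetical n_1, n_2, n_3 from 20 tosses of a 3-sided die
mle = multinomial_mle(counts)
best = log_likelihood(mle, counts)

# Count how many random (p, q, r) beat the closed-form MLE; this should be zero.
rng = random.Random(1)
beaten = sum(
    log_likelihood(random_probs(3, rng), counts) > best + 1e-12
    for _ in range(10_000)
)
print("MLE:", mle, "beaten:", beaten)
```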


Data Clustering and Pattern Recognition (資料分群與樣式辨認)