In the previous section, we covered the mathematics of EM (expectation maximization), which, within the framework of MLE (maximum likelihood estimation), can be used to find the optimal parameters of a GMM (Gaussian mixture model). In this section, we demonstrate the use of GMMs for PDF modeling.

For the first example, we use a GMM to model the probability density function of 1D data, as follows.

In the example above, the data is generated from three Gaussian PDFs centered at -2, 0, and 2 (see the contents of dcData.m). The first plot is the histogram of the dataset; the second plot shows the log probability versus the number of iterations. From this example, we can make the following observations:

- The identified centers are very close to the means of the three Gaussian PDFs.
- The log probability is monotonically nondecreasing throughout the training iterations, as EM guarantees: no iteration can decrease the likelihood of the training data.
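The EM training loop behind this example can be sketched in plain Python for the 1D case. This is a minimal sketch, not the MATLAB code used in this section: the helper names `normal_pdf` and `gmm_em_1d`, the standard deviation of 0.3 used to generate each cluster, the quantile-based initial centers, and the fixed budget of 50 iterations are all assumptions for illustration. The function records the log likelihood before each M-step, so the nondecreasing behavior noted above can be checked directly.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of the 1D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_em_1d(data, k, iters=50):
    """Fit a k-component 1D GMM with EM (hypothetical helper).

    Returns (weights, means, variances, history), where history[t] is the
    log likelihood of the parameters at the start of iteration t.
    """
    n = len(data)
    xs = sorted(data)
    # Assumed initialization: centers at evenly spaced quantiles of the data,
    # every variance equal to the overall data variance, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    history = []
    for _ in range(iters):
        # E-step: responsibilities P(component j | x) and the log likelihood
        resp = []
        loglik = 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        history.append(loglik)
        # M-step: re-estimate each component from its weighted share of the data
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
    return weights, means, variances, history

# Hypothetical data: 200 points from each Gaussian centered at -2, 0, 2 (std 0.3)
rng = random.Random(1)
data = [rng.gauss(mu, 0.3) for mu in (-2, 0, 2) for _ in range(200)]
weights, means, variances, history = gmm_em_1d(data, k=3)
print("centers:", [round(m, 2) for m in sorted(means)])
print("log likelihood nondecreasing:",
      all(b >= a - 1e-6 for a, b in zip(history, history[1:])))
```

The sketch has no safeguard against a component collapsing onto a single point; that is acceptable for well-separated synthetic data like this, but a practical implementation usually puts a floor under each variance.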

We can use the following example to plot the PDF after training:

From the above example, we can see that the identified GMM PDF closely matches the data histogram. This relies on the following three conditions:

- The dataset is large enough. (The above example has 600 data entries.)
- The number of Gaussians is guessed correctly.
- The data is indeed generated by a GMM.
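The first condition can be illustrated numerically. The sketch below compares a GMM PDF with histograms (normalized to unit area) built from 60 and from 6000 samples. To isolate the effect of data size, it uses the generating parameters in place of trained ones; the weights of 1/3, the centers -2, 0, 2, the common standard deviation 0.3, the bin layout, and the helper names `gmm_pdf`, `sample`, `histogram_density`, and `mean_abs_gap` are all assumptions for illustration.

```python
import math
import random

WEIGHTS = [1 / 3, 1 / 3, 1 / 3]   # assumed mixture weights
MEANS = [-2.0, 0.0, 2.0]          # assumed component centers
STD = 0.3                         # assumed common standard deviation

def gmm_pdf(x):
    """Mixture density: weighted sum of the component Gaussian densities."""
    return sum(w * math.exp(-(x - m) ** 2 / (2 * STD ** 2)) / (STD * math.sqrt(2 * math.pi))
               for w, m in zip(WEIGHTS, MEANS))

def sample(n, rng):
    """Draw n points: pick a component by weight, then sample from it."""
    return [rng.gauss(rng.choices(MEANS, WEIGHTS)[0], STD) for _ in range(n)]

def histogram_density(data, lo, hi, bins):
    """Histogram normalized to unit area; returns (bin centers, densities)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, [c / (len(data) * width) for c in counts]

def mean_abs_gap(data, lo=-4.0, hi=4.0, bins=40):
    """Average |histogram density - GMM PDF| over the bin centers."""
    centers, dens = histogram_density(data, lo, hi, bins)
    return sum(abs(d - gmm_pdf(c)) for c, d in zip(centers, dens)) / bins

rng = random.Random(0)
gaps = {n: mean_abs_gap(sample(n, rng)) for n in (60, 6000)}
for n, gap in gaps.items():
    print(f"{n} samples -> mean |histogram - PDF| = {gap:.3f}")
```

With few samples the histogram is dominated by counting noise, so even the true PDF looks like a poor match; the gap shrinks as the sample size grows.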

In practice, the above three conditions do not always hold. The basic remedies are:

- Collect as much data as possible.
- Use a heuristic search to find the optimal number of Gaussian PDFs.
- Increase the number of mixture components so that the GMM can approximate an arbitrary smooth PDF, given enough training data.
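One common heuristic for the second remedy (not necessarily the one used in the toolbox) is to fit GMMs with different numbers of components and keep the one with the smallest Bayesian information criterion, BIC = -2 log L + p ln n, where p is the number of free parameters: 3k - 1 for a 1D GMM with k components (k means, k variances, and k - 1 independent weights). The sketch below reuses a compact 1D EM; the helper names `fit_loglik_1d` and `bic`, the synthetic data, and the candidate range of 1 to 5 components are assumptions.

```python
import math
import random

def fit_loglik_1d(data, k, iters=100):
    """Compact 1D EM (hypothetical helper); returns the final log likelihood."""
    n = len(data)
    xs = sorted(data)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]   # quantile initialization
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k

    def e_step():
        """Responsibilities and log likelihood under the current parameters."""
        resp, loglik = [], 0.0
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        return resp, loglik

    for _ in range(iters):
        resp, _ = e_step()
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # small floor keeps a variance from collapsing to zero
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, 1e-6)
    return e_step()[1]   # log likelihood of the final parameters

def bic(loglik, k, n):
    """BIC for a 1D GMM with k components (3k - 1 free parameters)."""
    return -2 * loglik + (3 * k - 1) * math.log(n)

rng = random.Random(2)
data = [rng.gauss(mu, 0.3) for mu in (-2, 0, 2) for _ in range(200)]
scores = {k: bic(fit_loglik_1d(data, k), k, len(data)) for k in range(1, 6)}
best_k = min(scores, key=scores.get)
print("BIC per k:", {k: round(s, 1) for k, s in scores.items()})
print("selected number of Gaussians:", best_k)
```

BIC trades goodness of fit against model size; AIC or the log likelihood on held-out data are common alternatives that penalize extra components differently.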

In the following example, we use a GMM to model the 2D donut dataset:

In the above example, you should be able to see an animation of the training process. Moreover, since we have set `gmmPrm.useKmeans=0`, the training process randomly selects several data points as the initial centers instead of using k-means to determine a better set of initial centers. Because the initial centers are random, the program needs more time (more iterations) to adjust these 6 Gaussians.
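The two initialization strategies described above can be sketched as follows: pick k random data points as centers, then optionally refine them with a few rounds of k-means (Lloyd's algorithm). Because each k-means round never increases the sum of squared distances from points to their nearest centers, the refined centers fit the data at least as well as the random ones. The donut generator, the helper names `donut`, `nearest`, `sse`, `random_centers`, and `kmeans_refine`, and the choice of 10 rounds are assumptions for illustration, not the toolbox's code.

```python
import math
import random

def donut(n, rng, r_in=1.0, r_out=2.0):
    """Hypothetical donut data: radius uniform in [r_in, r_out], angle uniform."""
    pts = []
    for _ in range(n):
        r = rng.uniform(r_in, r_out)
        t = rng.uniform(0, 2 * math.pi)
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)),
               key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        c = centers[nearest(p, centers)]
        total += (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return total

def random_centers(points, k, rng):
    """Random initialization: k distinct data points used as centers."""
    return rng.sample(points, k)

def kmeans_refine(points, centers, rounds=10):
    """Lloyd rounds: assign points to the nearest center, move centers to cluster means."""
    centers = list(centers)
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if its cluster becomes empty
                centers[j] = (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
    return centers

rng = random.Random(3)
points = donut(600, rng)
init = random_centers(points, 6, rng)
refined = kmeans_refine(points, init)
print("SSE with random centers:  ", round(sse(points, init), 1))
print("SSE after k-means rounds: ", round(sse(points, refined), 1))
```

In GMM training, the refined centers would serve as the initial means for EM, which is why starting from k-means centers typically needs fewer EM iterations than starting from random data points.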

Not every dataset modeled by a GMM yields a satisfactory result, as the following example shows.

Judging from the scatter plot of the dataset, three Gaussians should be used to cover the three clusters: the two at the upper left corner should be sharper, while the one at the center should be flatter. In practice, however, we often end up with a "big circle surrounding a small one" configuration, which indicates that the training process was trapped in a local maximum. (Since the data is randomly generated, you should run the program several times to see the range of possible results.)
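Because EM only converges to a local maximum, a common practical remedy, in the spirit of running the program several times, is to train from several random initializations and keep the model with the highest log likelihood. The sketch below does this for a 2D GMM with diagonal covariance matrices; the cluster positions and spreads, the diagonal-covariance simplification, the variance floor, the five restarts, and all helper names (`log_gauss_diag`, `logsumexp`, `em_2d`) are assumptions for illustration.

```python
import math
import random

def log_gauss_diag(p, mean, var):
    """Log density of a 2D Gaussian with diagonal covariance diag(var)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(p, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def em_2d(points, k, rng, iters=60, floor=1e-4):
    """EM for a k-component diagonal-covariance GMM from random initial centers.

    Returns (log likelihood of the final parameters, (weights, means, variances)).
    """
    n = len(points)
    means = [list(p) for p in rng.sample(points, k)]  # random data points as centers
    mu = [sum(p[d] for p in points) / n for d in (0, 1)]
    var0 = [sum((p[d] - mu[d]) ** 2 for p in points) / n for d in (0, 1)]
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k

    def e_step():
        """Responsibilities and log likelihood, computed in the log domain."""
        resp, loglik = [], 0.0
        for p in points:
            lp = [math.log(w) + log_gauss_diag(p, m, v)
                  for w, m, v in zip(weights, means, variances)]
            lse = logsumexp(lp)
            loglik += lse
            resp.append([math.exp(l - lse) for l in lp])
        return resp, loglik

    for _ in range(iters):
        resp, _ = e_step()
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty component
            weights[j] = nj / n
            for d in (0, 1):
                means[j][d] = sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                variances[j][d] = max(sum(r[j] * (p[d] - means[j][d]) ** 2
                                          for r, p in zip(resp, points)) / nj, floor)
    return e_step()[1], (weights, means, variances)

rng = random.Random(4)
# Hypothetical data: two sharp clusters at the upper left, one broad cluster at the center
spec = [((-2.0, 2.0), 0.2), ((-1.0, 2.5), 0.2), ((0.0, 0.0), 1.0)]
points = [(rng.gauss(cx, s), rng.gauss(cy, s)) for (cx, cy), s in spec for _ in range(100)]

runs = [em_2d(points, 3, rng) for _ in range(5)]
logliks = [ll for ll, _ in runs]
best_ll, best_model = max(runs, key=lambda run: run[0])
print("final log likelihood per restart:", [round(ll, 1) for ll in logliks])
print("kept:", round(best_ll, 1))
```

Restarts do not guarantee the global maximum, but they make a poor local maximum such as the "big circle surrounding a small one" configuration less likely to be the model that is kept.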

Here is another example of GMM modeling (using 4 Gaussians) over 2D data:

Data Clustering and Pattern Recognition (資料分群與樣式辨認)