5-6 Quadratic Classifiers (二次分類器)

If we regard each data point as a point in a high-dimensional space, and assume that the data points of each class are generated by a high-dimensional Gaussian probability density function, then we can use MLE (maximum likelihood estimation) to find the optimal parameters of that Gaussian density. The classifier obtained in this way is called a quadratic classifier, since the decision boundaries it produces are quadratic functions of the input features. The procedure can be summarized as follows:

  1. Assume that the data of each class i is generated by a d-dimensional Gaussian probability density function (a plain-MATLAB sketch of this MLE fitting follows the list):
    $$ g_i(\mathbf{x}, \mathbf{\mu}_i, \Sigma_i) = (2\pi)^{-d/2} |\Sigma_i|^{-1/2} \exp \left[ -\frac{(\mathbf{x}-\mathbf{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\mathbf{\mu}_i)}{2} \right] $$
    where $\mathbf{\mu}_i$ is the mean vector and $\Sigma_i$ is the covariance matrix of the Gaussian density; the optimal $\mathbf{\mu}_i$ and $\Sigma_i$ can be derived via MLE from the data of class i.
  2. If necessary, each Gaussian probability density function can be multiplied by a class weight $w_i$.
  3. During classification, the larger the value of $w_i g_i(\mathbf{x}, \mathbf{\mu}_i, \Sigma_i)$, the more likely it is that the data point $\mathbf{x}$ belongs to class i.
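
To make step 1 concrete, here is a minimal sketch in plain MATLAB, independent of the toolbox functions used in the examples below; the variable names (X, y, mu, sigma, w) are ours, not part of any toolbox:

% MLE training sketch for a quadratic classifier.
% X: d-by-n matrix of feature vectors; y: 1-by-n vector of integer
% class labels. (Both are assumptions made for this sketch.)
classLabel = unique(y);
classNum = length(classLabel);
for i = 1:classNum
    Xi = X(:, y==classLabel(i));    % All data points of class i
    mu{i} = mean(Xi, 2);            % MLE estimate of the mean vector
    sigma{i} = cov(Xi', 1);         % MLE covariance (normalized by n, not n-1)
    w(i) = size(Xi, 2)/size(X, 2);  % Class weight = empirical prior
end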

In practice, instead of computing $w_i g_i(\mathbf{x}, \mathbf{\mu}_i, \Sigma_i)$ directly, we usually compute $\log(w_i g_i(\mathbf{x}, \mathbf{\mu}_i, \Sigma_i)) = \log w_i + \log g_i(\mathbf{x}, \mathbf{\mu}_i, \Sigma_i)$, in order to avoid the problems (such as insufficient precision and extra computation time) that may arise when evaluating the exponential. Writing the class weight $w_i$ as the prior probability $p(c_i)$, the log probability is:

$$ \ln \left( p(c_i)g(\mathbf{x}|\mathbf{\mu}, \Sigma) \right) = \ln p(c_i) - \frac{d \ln(2\pi)+ \ln |\Sigma|}{2} - \frac{(\mathbf{x}-\mathbf{\mu})^T\Sigma^{-1}(\mathbf{x}-\mathbf{\mu})}{2} $$

The decision boundary between classes i and j is the locus of points where the two weighted densities are equal:

$$ p(c_i)g(\mathbf{x}|\mathbf{\mu}_i, \Sigma_i) = p(c_j)g(\mathbf{x}|\mathbf{\mu}_j, \Sigma_j) $$

Taking the logarithm of both sides, we have

$$ \ln p(c_i) - \frac{d \ln(2\pi)+ \ln |\Sigma_i|}{2} - \frac{(\mathbf{x}-\mathbf{\mu}_i)^T\Sigma_i^{-1}(\mathbf{x}-\mathbf{\mu}_i)}{2} = \ln p(c_j) - \frac{d \ln(2\pi)+ \ln |\Sigma_j|}{2} - \frac{(\mathbf{x}-\mathbf{\mu}_j)^T\Sigma_j^{-1}(\mathbf{x}-\mathbf{\mu}_j)}{2} $$

Multiplying by 2 and canceling the common term $d\ln(2\pi)$, the decision boundary becomes:

$$ 2\ln p(c_i) - \ln |\Sigma_i| - (\mathbf{x}-\mathbf{\mu}_i)^T\Sigma_i^{-1}(\mathbf{x}-\mathbf{\mu}_i) = 2\ln p(c_j) - \ln |\Sigma_j| - (\mathbf{x}-\mathbf{\mu}_j)^T\Sigma_j^{-1}(\mathbf{x}-\mathbf{\mu}_j) $$

$$ (\mathbf{x}-\mathbf{\mu}_i)^T\Sigma_i^{-1}(\mathbf{x}-\mathbf{\mu}_i) - (\mathbf{x}-\mathbf{\mu}_j)^T\Sigma_j^{-1}(\mathbf{x}-\mathbf{\mu}_j) = \ln \left( \frac{p^2(c_i)\,|\Sigma_j|}{p^2(c_j)\,|\Sigma_i|} \right) $$

where the right-hand side is a constant. Since both $(\mathbf{x}-\mathbf{\mu}_i)^T\Sigma_i^{-1}(\mathbf{x}-\mathbf{\mu}_i)$ and $(\mathbf{x}-\mathbf{\mu}_j)^T\Sigma_j^{-1}(\mathbf{x}-\mathbf{\mu}_j)$ are quadratic in $\mathbf{x}$, the above equation defines a decision boundary of quadratic form in the d-dimensional feature space.
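
Continuing the sketch above, the log-domain evaluation can be written as follows (plain MATLAB; mu, sigma, w, and classNum come from the previous sketch, and x is a d-by-1 test vector):

% Evaluate log(w_i*g_i(x)) for every class in the log domain, avoiding
% the underflow that exp() would cause for points far from the mean.
d = size(x, 1);
logProb = zeros(1, classNum);
for i = 1:classNum
    dx = x - mu{i};
    mahal = dx'*(sigma{i}\dx);    % (x-mu)'*inv(sigma)*(x-mu) without explicit inv()
    logProb(i) = log(w(i)) - (d*log(2*pi) + log(det(sigma{i})))/2 - mahal/2;
end
[~, predicted] = max(logProb);    % Assign x to the most likely class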

In particular, if $\Sigma_i=\Sigma_j=\Sigma$, the determinant terms on the right-hand side cancel and, after expanding both quadratic forms, the common term $\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ drops out:

$$ (\mathbf{x}-\mathbf{\mu}_i)^T\Sigma^{-1}(\mathbf{x}-\mathbf{\mu}_i) - (\mathbf{x}-\mathbf{\mu}_j)^T\Sigma^{-1}(\mathbf{x}-\mathbf{\mu}_j) = -2(\mathbf{\mu}_i-\mathbf{\mu}_j)^T\Sigma^{-1}\mathbf{x} + \mathbf{\mu}_i^T\Sigma^{-1}\mathbf{\mu}_i - \mathbf{\mu}_j^T\Sigma^{-1}\mathbf{\mu}_j $$

Hence the decision boundary reduces to a linear equation:

$$ \underbrace{2(\mathbf{\mu}_i-\mathbf{\mu}_j)^T\Sigma^{-1}}_{\mathbf{c}^T} \mathbf{x} = \underbrace{\mathbf{\mu}_i^T\Sigma^{-1}\mathbf{\mu}_i - \mathbf{\mu}_j^T\Sigma^{-1}\mathbf{\mu}_j - \ln \left( \frac{p^2(c_i)}{p^2(c_j)} \right)}_{constant} \Longrightarrow \mathbf{c}^T\mathbf{x}=constant $$

In the following examples, we use a quadratic classifier on dimensions 3 and 4 of the IRIS dataset. First, we plot the data distribution:

Example 1: qcDataPlot01.m

DS = prData('iris');
DS.input = DS.input(3:4, :);   % Only take dimensions 3 and 4 for 2d visualization
dsScatterPlot(DS);             % Scatter plot of the dataset

Next, we can use qcTrain to build a quadratic classifier (QC):

Example 2: qcTrain01.m

DS = prData('iris');
DS.input = DS.input(3:4, :);   % Only take dimensions 3 and 4 for 2d visualization
[qcPrm, logProb, recogRate, hitIndex] = qcTrain(DS);
fprintf('Recog. rate = %f%%\n', recogRate*100);

Output:
Recog. rate = 98.000000%

As shown above, if we use only the third and fourth features, the recognition rate of the QC is 98%. This kind of test, in which the classifier is evaluated on its own training data, is called an inside test.
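
The same inside test can be reproduced with the plain-MATLAB sketches given earlier (mirroring the recogRate returned by qcTrain; again, all variable names are ours):

% Inside test: classify every training sample with the fitted
% class-wise Gaussians, then measure the rate of correct labels.
[d, n] = size(X);
predicted = zeros(1, n);
for k = 1:n
    for i = 1:classNum
        dx = X(:,k) - mu{i};
        logProb(i) = log(w(i)) - (d*log(2*pi) + log(det(sigma{i})))/2 - dx'*(sigma{i}\dx)/2;
    end
    [~, idx] = max(logProb);
    predicted(k) = classLabel(idx);
end
recogRate = sum(predicted==y)/n;   % Fraction of correctly classified samples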

The plot above shows the data points as well as the misclassified ones (marked by crosses). Note in particular that the training above involves classWeight, a vector that specifies the weight of each class; there are usually two ways to set it: give every class the same weight, or make each class's weight proportional to the number of its training samples, as sketched below.
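
For instance, the two settings can be computed as follows (a sketch reusing the label vector y from the earlier training sketch; classWeight1 and classWeight2 are our names, not toolbox parameters):

% Two common ways to set the class weights.
classLabel = unique(y);
classNum = length(classLabel);
classWeight1 = ones(1, classNum)/classNum;   % Equal weight for every class
for i = 1:classNum
    counts(i) = sum(y==classLabel(i));       % Data count of class i
end
classWeight2 = counts/sum(counts);           % Weights proportional to class sizes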

We can also display the Gaussian probability density function of each class as a 3D surface and plot its contours, as shown in the following example:

Example 3: qcPlot01.m

DS = prData('iris');
DS.input = DS.input(3:4, :);
[qcPrm, logProb, recogRate, hitIndex] = qcTrain(DS);
qcPlot(DS, qcPrm, '2dPdf');

Based on these Gaussian density functions, we can plot the decision boundaries between the classes, as follows:

Example 4: qcPlot02.m

DS = prData('iris');
DS.input = DS.input(3:4, :);
[qcPrm, logProb, recogRate, hitIndex] = qcTrain(DS);
DS.hitIndex = hitIndex;   % Attach hitIndex to DS for plotting
qcPlot(DS, qcPrm, 'decBoundary');

In fact, as derived earlier, these boundaries are quadratic functions, which can be verified in the above plot.

If the amounts of data in the classes differ greatly, the quadratic classifier may produce results that look wrong but are in fact correct. We leave this situation to be explored in the exercises.

We can also set the class weights according to the costs incurred by false positives and false negatives; this will be detailed later.

On the other hand, if the training data is too complex to be classified satisfactorily by a single Gaussian probability density function per class, we can adopt a similar but more sophisticated approach, such as Gaussian mixture models (GMM), described later.

