11-2 PCA (Principal Component Analysis)

English version (Note: the Chinese version is not updated in sync with the English version!)

First, let us check whether the principal axes produced by PCA agree with the shape of the data, as in the following example:

Example 1: pca01.m

    clear j
    dataNum = 1000;
    data = randn(1,dataNum)+j*randn(1,dataNum)/3;   % 2-D cloud stored as complex numbers
    data = data*exp(j*pi/6);        % Rotate by 30 degrees
    data = data-mean(data);         % Mean subtraction
    plot(real(data), imag(data), '.'); axis image;
    DS.input=[real(data); imag(data)];
    [DS2, v, eigValue] = pca(DS);
    v1 = v(:, 1);
    v2 = v(:, 2);
    % Arrow template, then rotate it to each axis and scale by eigValue/dataNum
    arrow = [-1 0 nan -0.1 0 -0.1]+1+j*[0 0 nan 0.1 0 -0.1];
    arrow1 = 2*arrow*(v1(1)+j*v1(2))*eigValue(1)/dataNum;
    arrow2 = 2*arrow*(v2(1)+j*v2(2))*eigValue(2)/dataNum;
    line(real(arrow1), imag(arrow1), 'color', 'r', 'linewidth', 4);
    line(real(arrow2), imag(arrow2), 'color', 'k', 'linewidth', 4);
    title('Axes for PCA');

Clearly, the first principal axis of PCA lies exactly along the direction in which the data is most spread out.
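To make this observation concrete, here is a minimal sketch that finds the principal axes with base MATLAB functions only (`eig`, `sort`), without the `pca` function called in Example 1. It is an illustration under assumptions: deterministic points on an ellipse stand in for the random cloud so that the result is reproducible, and names such as `axisAngle` are ours, not the toolbox's.

```matlab
% Deterministic stand-in for the random cloud: points on an ellipse
% with semi-axes 3 and 1, rotated by 30 degrees (an assumption for illustration).
t = linspace(0, 2*pi, 201); t(end) = [];     % 200 evenly spaced angles
theta = pi/6;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
X = R*[3*cos(t); sin(t)];                    % 2-by-200, one column per data point

% PCA by eigen-decomposition of the sample covariance matrix
Xc = bsxfun(@minus, X, mean(X, 2));          % subtract the mean of each feature
C = Xc*Xc'/(size(X, 2) - 1);
[V, D] = eig((C + C')/2);                    % symmetrize to guard against round-off
[eigValue, order] = sort(diag(D), 'descend');
V = V(:, order);                             % columns ordered by decreasing variance

% Direction of the first principal axis, folded into [0, 180) degrees
axisAngle = mod(atan2(V(2,1), V(1,1))*180/pi, 180);
fprintf('First principal axis at %.1f degrees\n', axisAngle);
```

The first axis comes out at 30 degrees, the rotation applied to the data, and the ratio of the two eigenvalues is 9, the squared ratio of the semi-axes. The sign of an eigenvector is arbitrary, which is why the angle is folded into a half-turn.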

In the next example, we apply PCA to the 150 entries of the IRIS dataset:

Example 2: pcaIris01.m

    DS=prData('iris');
    DS2=pca(DS);
    DS3=DS2; DS3.input=DS3.input(1:2, :);           % Keep the first two dimensions
    subplot(2,1,1); dsScatterPlot(DS3); axis image
    xlabel('Input 1'); ylabel('Input 2');
    title('IRIS projected onto the first two dim of PCA');
    DS3=DS2; DS3.input=DS3.input(end-1:end, :);     % Keep the last two dimensions
    subplot(2,1,2); dsScatterPlot(DS3); axis image
    xlabel('Input 3'); ylabel('Input 4');
    title('IRIS projected onto the last two dim of PCA');

In the example above, the first plot projects the IRIS data onto the first and second PCA vectors, and the second plot projects it onto the third and fourth PCA vectors. Clearly, the data points are widely spread in the first plot, whereas their spread in the second plot is much smaller. (Note that the axis range of the second plot is much smaller than that of the first.)

We can perform a similar computation on the WINE dataset:

Example 3: pcaWine01.m

    DS=prData('wine');
    DS2=pca(DS);
    DS3=DS2; DS3.input=DS3.input(1:2, :);           % Keep the first two dimensions
    subplot(2,1,1); dsScatterPlot(DS3); axis image
    xlabel('Input 1'); ylabel('Input 2');
    title('WINE projected onto the first two dim of PCA');
    DS3=DS2; DS3.input=DS3.input(end-1:end, :);     % Keep the last two dimensions
    subplot(2,1,2); dsScatterPlot(DS3); axis image
    xlabel('Input 12'); ylabel('Input 13');
    title('WINE projected onto the last two dim of PCA');

Clearly, the spread in the first plot is much larger than in the second plot.
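The shrinking spread can be read directly from the eigenvalues: the variance of the data projected onto each PCA vector equals the corresponding eigenvalue of the covariance matrix. The following sketch checks this on hypothetical 4-D data (not IRIS or WINE, which require the toolbox's `prData`); the matrices `S` and `Q` and the names `projVar` and `explained` are assumptions made for illustration.

```matlab
% Hypothetical 4-D data: four orthogonal sinusoids with amplitudes 4, 2, 1, 0.5,
% mixed by an orthogonal matrix Q so that no input axis is a principal axis.
t = linspace(0, 2*pi, 401); t(end) = [];
S = [4*cos(t); 2*sin(t); cos(2*t); 0.5*sin(2*t)];
Q = [1 1 1 1; 1 -1 1 -1; 1 1 -1 -1; 1 -1 -1 1]/2;    % Q*Q' is the identity
X = Q*S;                                             % 4-by-400, one column per point

% PCA via the covariance eigenvectors
Xc = bsxfun(@minus, X, mean(X, 2));
C = Xc*Xc'/(size(X, 2) - 1);
[V, D] = eig((C + C')/2);
[eigValue, order] = sort(diag(D), 'descend');
V = V(:, order);

Y = V'*Xc;                          % data expressed in PCA coordinates
projVar = var(Y, 0, 2);             % variance along each PCA dimension
explained = eigValue/sum(eigValue); % fraction of the total variance per dimension
fprintf('Variance share per PCA dimension: ');
fprintf('%.1f%% ', 100*explained);
fprintf('\n');
```

In this toy case the first two dimensions carry about 94% of the total variance, so a scatter plot of the last two dimensions necessarily covers a much smaller range than a plot of the first two.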

Hint
Although both the IRIS and WINE datasets contain class labels, we did not use that class information when computing PCA in the examples above.

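The idea behind PCA is to "spread the data out" (that is, to project the data onto the subspace with the largest variance) without considering the class of each data point. Strictly speaking, therefore, PCA is not entirely suited to pattern recognition problems. However, "spreading the data out" and "spreading data of different classes apart" have something in common, so PCA is often used in pattern recognition problems where the data dimension is too high (face recognition, for example) to reduce both the dimensionality and the amount of computation.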
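A small constructed case shows how ignoring class labels can hurt. In the sketch below, two hypothetical classes are stretched along the horizontal axis but differ only in their vertical position, so the direction of largest variance carries no class information at all. The data and the variable `gap` are assumptions made for illustration.

```matlab
% Two hypothetical classes: same horizontal spread, separated only vertically
x = linspace(-5, 5, 50);
X = [x, x; 0.5*ones(1, 50), -0.5*ones(1, 50)];   % 2-by-100
label = [ones(1, 50), 2*ones(1, 50)];

% PCA via the covariance eigenvectors (class labels are never used)
Xc = bsxfun(@minus, X, mean(X, 2));
C = Xc*Xc'/(size(X, 2) - 1);
[V, D] = eig((C + C')/2);
[~, order] = sort(diag(D), 'descend');
V = V(:, order);
Y = V'*Xc;                                       % row 1: first PCA dimension

% Distance between the two class means along each PCA dimension
gap = abs(mean(Y(:, label == 1), 2) - mean(Y(:, label == 2), 2));
fprintf('Class-mean gap along PC1 = %g, along PC2 = %g\n', gap(1), gap(2));
```

Here the class means coincide along the first PCA dimension and are one unit apart along the second, so keeping only the first dimension would discard exactly the information a classifier needs. Real datasets are rarely this extreme, which is why PCA still works reasonably well in practice.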

To test how the number of dimensions kept after PCA projection affects the recognition rate, using KNNC and leave-one-out (LOO), we can run the following example on the IRIS dataset:

Example 4: pcaIrisDim01.m

    DS=prData('iris');
    [featureNum, dataNum] = size(DS.input);
    [recogRate, computed] = knncLoo(DS);
    fprintf('All data ===> LOO recog. rate = %d/%d = %g%%\n', sum(DS.output==computed), dataNum, 100*recogRate);
    DS2 = pca(DS);
    recogRate=[];
    for i = 1:featureNum
        DS3=DS2; DS3.input=DS3.input(1:i, :);
        [recogRate(i), computed] = knncLoo(DS3);
        fprintf('PCA dim = %d ===> LOO recog. rate = %d/%d = %g%%\n', i, sum(DS3.output==computed), dataNum, 100*recogRate(i));
    end
    plot(1:featureNum, 100*recogRate, 'o-'); grid on
    xlabel('No. of projected features based on PCA');
    ylabel('LOO recognition rates using KNNC (%)');

Output:

    All data ===> LOO recog. rate = 144/150 = 96%
    PCA dim = 1 ===> LOO recog. rate = 133/150 = 88.6667%
    PCA dim = 2 ===> LOO recog. rate = 144/150 = 96%
    PCA dim = 3 ===> LOO recog. rate = 144/150 = 96%
    PCA dim = 4 ===> LOO recog. rate = 144/150 = 96%

Testing the WINE dataset in the same way gives the following results:

Example 5: pcaWineDim01.m

    DS=prData('wine');
    [featureNum, dataNum] = size(DS.input);
    [recogRate, computed] = knncLoo(DS);
    fprintf('All data ===> LOO recog. rate = %d/%d = %g%%\n', sum(DS.output==computed), dataNum, 100*recogRate);
    DS2 = pca(DS);
    recogRate=[];
    for i = 1:featureNum
        DS3=DS2; DS3.input=DS3.input(1:i, :);
        [recogRate(i), computed] = knncLoo(DS3);
        fprintf('PCA dim = %d ===> LOO recog. rate = %d/%d = %g%%\n', i, sum(DS3.output==computed), dataNum, 100*recogRate(i));
    end
    plot(1:featureNum, 100*recogRate, 'o-'); grid on
    xlabel('No. of projected features based on PCA');
    ylabel('LOO recognition rates using KNNC (%)');

Output:

    All data ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 1 ===> LOO recog. rate = 126/178 = 70.7865%
    PCA dim = 2 ===> LOO recog. rate = 128/178 = 71.9101%
    PCA dim = 3 ===> LOO recog. rate = 130/178 = 73.0337%
    PCA dim = 4 ===> LOO recog. rate = 135/178 = 75.8427%
    PCA dim = 5 ===> LOO recog. rate = 136/178 = 76.4045%
    PCA dim = 6 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 7 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 8 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 9 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 10 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 11 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 12 ===> LOO recog. rate = 137/178 = 76.9663%
    PCA dim = 13 ===> LOO recog. rate = 137/178 = 76.9663%

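Judging from these two examples, it seems that the more features we keep, the better the result. In other words, for classification, PCA does not seem able to pick out a small number of effective features that maximize the recognition rate. This is reasonable, because PCA does not take class information into account when it chooses the projection directions. (In the next section, you can compare these results with those obtained by applying LDA to the same datasets.)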
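For readers without the toolbox, the experiment in Examples 4 and 5 can be approximated with base MATLAB alone. The sketch below is not the toolbox's `knncLoo`: it assumes a 1-nearest-neighbor classifier with Euclidean distance, and it runs on a hypothetical deterministic toy dataset rather than IRIS or WINE. All variable names are ours.

```matlab
% Hypothetical toy data: 3 classes, 3 features, 60 points (deterministic)
k = 1:60;
label = ceil(k/20);
X = [label + 0.6*sin(7.1*k); 0.5*label + 0.6*cos(3.3*k); 0.6*sin(1.7*k + 1)];
[featureNum, dataNum] = size(X);

% PCA projection via the covariance eigenvectors
Xc = bsxfun(@minus, X, mean(X, 2));
C = Xc*Xc'/(dataNum - 1);
[V, D] = eig((C + C')/2);
[~, order] = sort(diag(D), 'descend');
Y = V(:, order)'*Xc;

% Entry 1 is the raw data; entry i+1 keeps the first i PCA dimensions
candidate = [{X}, arrayfun(@(i) Y(1:i, :), 1:featureNum, 'UniformOutput', false)];
rate = zeros(1, numel(candidate));
for c = 1:numel(candidate)
    Z = candidate{c};
    correct = 0;
    for n = 1:dataNum
        d = sum(bsxfun(@minus, Z, Z(:, n)).^2, 1);  % squared distances to all points
        d(n) = inf;                                 % leave the test point out
        [~, nearest] = min(d);
        correct = correct + (label(nearest) == label(n));
    end
    rate(c) = correct/dataNum;
end
rawRate = rate(1);
recogRate = rate(2:end);
fprintf('Raw data: %g%%\n', 100*rawRate);
fprintf('PCA dim = %d: %g%%\n', [1:featureNum; 100*recogRate]);
```

Keeping all PCA dimensions only rotates the data, so pairwise distances and hence nearest neighbors are unchanged. This is why the last line of each output above (96% for IRIS, 76.9663% for WINE) matches the rate on the original data.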

If we wrap the example above into a function, pcaPerfViaKnncLoo.m, we can use it to test how data normalization affects the recognition rate, as in the following example:

Example 6: pcaWineDim02.m

    DS=prData('wine');
    recogRate1=pcaPerfViaKnncLoo(DS);
    DS2=DS; DS2.input=inputNormalize(DS2.input);    % data normalization
    recogRate2=pcaPerfViaKnncLoo(DS2);
    [featureNum, dataNum] = size(DS.input);
    plot(1:featureNum, 100*recogRate1, 'o-', 1:featureNum, 100*recogRate2, '^-'); grid on
    legend('Raw data', 'Normalized data');
    xlabel('No. of projected features based on PCA');
    ylabel('LOO recognition rates using KNNC (%)');

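From the example above, we can see that, for this application, data normalization raises the recognition rate considerably. However, the overall recognition rate is still lower than that of LDA; see the next section for an introduction to LDA and related examples.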

Hint
The effect of data normalization on the recognition rate varies with the application and the classifier; normalization does not always improve the recognition rate.
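One reason normalization can matter for PCA is scale: a feature measured in much larger units dominates the covariance matrix and therefore the first PCA vector. The sketch below illustrates this on hypothetical two-feature data. It assumes that normalization means zero mean and unit variance per feature; whether `inputNormalize` does exactly this is not stated in the text, so treat that as an assumption.

```matlab
% Hypothetical data: two correlated features, the first in much larger units
t = linspace(0, 2*pi, 201); t(end) = [];
X = [1000*cos(t); cos(t) + sin(t)];          % 2-by-200, one column per point

% First PCA vector of the raw data
Xc = bsxfun(@minus, X, mean(X, 2));
C = Xc*Xc'/(size(X, 2) - 1);
[V, D] = eig((C + C')/2);
[~, top] = max(diag(D));
v1raw = V(:, top);

% Assumed normalization: zero mean and unit variance for every feature (row)
Xn = bsxfun(@rdivide, Xc, std(X, 0, 2));
Cn = Xn*Xn'/(size(X, 2) - 1);
[Vn, Dn] = eig((Cn + Cn')/2);
[~, top] = max(diag(Dn));
v1norm = Vn(:, top);

fprintf('Raw data:        |v1| = [%.4f %.4f]\n', abs(v1raw));
fprintf('Normalized data: |v1| = [%.4f %.4f]\n', abs(v1norm));
```

On the raw data the first PCA vector is almost entirely the large-unit feature (weights of roughly 1 and 0.001), while after normalization both features contribute equally (about 0.707 each). Whether this helps classification still depends on the data, as the hint above says.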

Hint
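In fact, the recognition rates obtained in this chapter are slightly optimistic, because we run PCA on all of the data first and only then perform the LOO test with KNNC. In other words, PCA has already "peeked" at all of the data, so the recognition rates obtained from this test are slightly too high.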
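One way to remove this optimism is to refit PCA inside every leave-one-out fold, so that the held-out point never influences the projection. The sketch below shows the idea with base MATLAB on hypothetical toy data, assuming a 1-nearest-neighbor classifier. It is not the procedure used in the examples above, and `dimKeep` and the other names are ours.

```matlab
% Hypothetical toy data: 3 classes, 3 features, 60 points (deterministic)
k = 1:60;
label = ceil(k/20);
X = [label + 0.6*sin(7.1*k); 0.5*label + 0.6*cos(3.3*k); 0.6*sin(1.7*k + 1)];
dataNum = size(X, 2);
dimKeep = 2;                              % number of PCA dimensions to keep

correct = 0;
for n = 1:dataNum
    trainIdx = [1:n-1, n+1:dataNum];      % every point except the held-out one
    Xtrain = X(:, trainIdx);
    mu = mean(Xtrain, 2);
    Xc = bsxfun(@minus, Xtrain, mu);
    C = Xc*Xc'/(numel(trainIdx) - 1);
    [V, D] = eig((C + C')/2);
    [~, order] = sort(diag(D), 'descend');
    W = V(:, order(1:dimKeep));           % projection fitted without point n
    Ztrain = W'*Xc;
    zTest = W'*(X(:, n) - mu);            % held-out point uses the training mean and axes
    d = sum(bsxfun(@minus, Ztrain, zTest).^2, 1);
    [~, nearest] = min(d);
    correct = correct + (label(trainIdx(nearest)) == label(n));
end
foldRate = correct/dataNum;
fprintf('LOO rate with PCA refit in every fold = %g%%\n', 100*foldRate);
```

The cost is one eigen-decomposition per data point instead of one in total, which is usually acceptable for datasets the size of IRIS or WINE.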

Data Clustering and Pattern Recognition (資料分群與樣式辨認)