2-3 Using the GPU to Accelerate Computation

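The emergence and spread of GPUs (graphics processing units) is arguably the biggest change in scientific computing in recent years. Because a GPU can process a large amount of work in parallel, it is an excellent fit for applications that lend themselves to parallel computation.

GPU programming in C or C++, however, is somewhat more complicated than ordinary programming: you must understand the design concepts and hardware structure of the GPU itself to make full use of its computing power. Using a GPU from MATLAB to accelerate computations is, by contrast, quite easy, because the complicated details are already wrapped inside simple MATLAB commands.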

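The two basic MATLAB commands most closely related to the GPU are gpuDeviceCount, which reports how many GPU cards your machine has, and gpuDevice, which returns information about the currently selected GPU card.

For example, the following shows how many GPU cards your machine has, along with information about the default GPU card: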

Example 1: 02-程式碼與記憶體之最佳化/gpuDevice01.m

    d = gpuDeviceCount
    g = gpuDevice

Output:

    d =
         1
    g =
      CUDADevice with properties:

                          Name: 'GeForce GTX 970M'
                         Index: 1
             ComputeCapability: '5.2'
                SupportsDouble: 1
                 DriverVersion: 7
                ToolkitVersion: 6.5000
            MaxThreadsPerBlock: 1024
              MaxShmemPerBlock: 49152
            MaxThreadBlockSize: [1024 1024 64]
                   MaxGridSize: [2.1475e+09 65535 65535]
                     SIMDWidth: 32
                   TotalMemory: 3.2212e+09
               AvailableMemory: 2.9349e+09
           MultiprocessorCount: 10
                  ClockRateKHz: 1038000
                   ComputeMode: 'Default'
          GPUOverlapsTransfers: 1
        KernelExecutionTimeout: 1
              CanMapHostMemory: 1
               DeviceSupported: 1
                DeviceSelected: 1

The example above shows that my machine has only one graphics card, and lists that card's properties.
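Before relying on a GPU, a script can check that one is actually present. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed; the variable names and message text are illustrative and not part of the original example.

```matlab
% Query the number of GPUs and report basic facts about the selected one.
n = gpuDeviceCount;          % number of GPU devices MATLAB can see
if n == 0
    fprintf('No supported GPU found; computations will stay on the CPU.\n');
else
    g = gpuDevice;           % currently selected (default) device
    fprintf('%d GPU(s); using #%d: %s\n', n, g.Index, g.Name);
    fprintf('Memory: %.2f of %.2f GB available\n', ...
        g.AvailableMemory/2^30, g.TotalMemory/2^30);
end
```

The properties read here (Index, Name, AvailableMemory, TotalMemory) are the same ones listed in the output of Example 1.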

When computing on a graphics card, we usually need to follow these basic steps:

  1. Use the gpuArray command to move variables from the MATLAB workspace into GPU memory.
  2. Use the variables in GPU memory to carry out the computations on the GPU.
  3. Use the gather command to move the variables stored on the GPU back into the MATLAB workspace.

The following example uses a simple matrix multiplication to show how these steps work in practice:

Example 2: 02-程式碼與記憶體之最佳化/gpuStep01.m

    a=rand(100, 10000);
    b=rand(100, 10000)';
    tic
    c=a*b;
    fprintf('CPU time = %g sec\n', toc);
    A=gpuArray(a);		% Put a to GPU's memory
    B=gpuArray(b);		% Put b to GPU's memory
    tic
    C=A*B;			% Multiplication via GPU
    fprintf('GPU time = %g sec\n', toc);
    c2=gather(C);		% Put C to MATLAB's workspace
    fprintf('isequal(c, c2) = %g\n', isequal(c, c2));
    fprintf('Mean deviation = %g\n', mean(mean(abs(c-c2))));

Output:

    CPU time = 0.00463387 sec
    GPU time = 0.000350486 sec
    isequal(c, c2) = 0
    Mean deviation = 5.55428e-13

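From the example above we can observe the following:
  1. The GPU computation time (excluding data transfer time) is only about 1/13 of the CPU computation time (roughly 0.00035 sec versus 0.0046 sec in the run shown).
  2. The GPU result is not exactly identical to the CPU result, but the difference between the two is extremely small.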
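A likely reason for the second observation is that the GPU accumulates the sums of products in a different order than the CPU, and floating-point addition is not associative. The sketch below illustrates this on the CPU alone and shows a tolerance-based comparison as an alternative to isequal; the tolerance value and variable names are assumptions chosen for illustration.

```matlab
% Floating-point addition is not associative: regrouping changes the last bits.
x = (0.1 + 0.2) + 0.3;
y = 0.1 + (0.2 + 0.3);
fprintf('isequal(x, y) = %d\n', isequal(x, y));   % exact comparison
fprintf('abs(x - y)    = %g\n', abs(x - y));

% Compare with a relative tolerance instead of demanding exact equality.
relTol = 1e-10;                                   % illustrative choice
isClose = abs(x - y) <= relTol * max(abs(x), abs(y));
fprintf('isClose       = %d\n', isClose);
```

The same idea applies to the matrices of Example 2, for instance by checking max(abs(c(:)-c2(:))) <= relTol*max(abs(c(:))) rather than isequal(c, c2).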

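Note in particular that the GPU time above does not include data transfer time. In general, we should keep data transfers to a minimum and do as much of the parallel computation on the GPU as possible; otherwise the cost can outweigh the gain.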
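To see how much data movement costs, the transfers can be timed separately from the multiplication. The sketch below assumes a supported GPU is present. Calling wait on the device before reading toc is an addition that is not in the original example; it is there because MATLAB may return from a GPU operation before the GPU has finished it. The variable names are illustrative.

```matlab
a = rand(100, 10000);
b = rand(100, 10000)';
dev = gpuDevice;                 % handle to the selected GPU

t = tic;
A = gpuArray(a);                 % host -> GPU
B = gpuArray(b);
wait(dev);                       % block until the GPU has finished
tUpload = toc(t);

t = tic;
C = A*B;                         % multiplication on the GPU
wait(dev);
tCompute = toc(t);

t = tic;
c2 = gather(C);                  % GPU -> host
tDownload = toc(t);

fprintf('upload = %g s, compute = %g s, download = %g s\n', ...
    tUpload, tCompute, tDownload);
fprintf('total GPU path = %g s\n', tUpload + tCompute + tDownload);
```

Comparing the total with the CPU time for c=a*b indicates whether moving the data to the GPU pays off for a given matrix size.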

In the previous example, the amount of speedup depends heavily on the matrix dimensions. The next example explores this relationship:

Example 3: 02-程式碼與記憶體之最佳化/gpuSpeedup01.m

    fprintf('computer = %s\n', computer);
    fprintf('version = %s\n', version);
    % Speed test
    step=10000;
    colCounts=step*(1:1:20);
    for i=1:length(colCounts)
    	fprintf('%d/%d\n', i, length(colCounts));
    	n=colCounts(i);
    	a=rand(100, n);
    	b=rand(100, n)';
    	myTic=tic; c=a*b; cpuTime(i)=toc(myTic);
    	A=gpuArray(a);
    	B=gpuArray(b);
    	myTic=tic; C=A*B; gpuTime(i)=toc(myTic);
    end
    subplot(211); plot(colCounts, cpuTime, '.-', colCounts, gpuTime, '.-');
    legend('CPU time', 'GPU time', 'location', 'northwest');
    title('CPU & GPU time'); ylabel('Time (sec)');
    subplot(212); plot(colCounts, cpuTime./gpuTime, 'o-');
    title('GPU speedup ratio'); ylabel('Ratios'); xlabel('No. of columns');

Output:

    computer = PCWIN64
    version = 9.3.0.651671 (R2017b) Prerelease
    1/20
    2/20
    3/20
    4/20
    5/20
    6/20
    7/20
    8/20
    9/20
    10/20
    11/20
    12/20
    13/20
    14/20
    15/20
    16/20
    17/20
    18/20
    19/20
    20/20

In the example above the speedup is very large: the GPU is more than 500 times as fast as the CPU. Note, however, that the timings in this example do not include the time needed for data transfer.
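Timings taken with tic and toc around a single GPU call can be optimistic, since the call may return before the GPU has finished. As a cross-check, the sketch below measures one matrix size with timeit and gputimeit, which run the function several times; gputimeit also waits for the GPU to complete. It assumes a supported GPU and a MATLAB release that provides gputimeit, and the size n is an arbitrary choice.

```matlab
n = 100000;                          % number of columns (illustrative)
a = rand(100, n);
b = rand(100, n)';
A = gpuArray(a);                     % transfers are done before timing starts
B = gpuArray(b);

cpuT = timeit(@() a*b);              % repeated, robust CPU timing
gpuT = gputimeit(@() A*B);           % GPU timing, synchronized with the device

fprintf('CPU = %g s, GPU = %g s, speedup = %.1f\n', cpuT, gpuT, cpuT/gpuT);
```

Like Example 3, this measures computation only; the transfers performed by gpuArray are outside the timed region.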

Hint
Modify the example above so that data transfer time is also counted in the total computation time, and see whether the cost ends up outweighing the gain.

(To be continued)


MATLAB Programming: Advanced Topics (MATLAB程式設計：進階篇)