10-2 Discrete Fourier Transform (聽葚鉥?

在前一節中，我們可以使用「離散時間傅立葉轉換」（簡稱 DTFT）來將一段數位訊號轉換成各個頻譜的分量，但這是一個連續的函數，並不適合在電腦中處理，因此本節將介紹「離散傅立葉轉換」（Discrete Fourier Transform），簡稱 DFT，其功能是將一段數位訊號轉換成其各個離散頻率的弦波分量，以便後續使用電腦進行各種處理。

如果我們的訊號可以表示成 x[n], n = 0~N-1，那麼 DFT 的公式如下：

X[k]=(1/N)*S_n=0^N-1 x[n]*exp(-j*2p*n*k/N), k=0, ..., N-1 這些傅立葉係數 X[k] 所代表的資訊是 k 的函數，而 k 直接和頻率有正比關係，因此這些係數 X[k] 通稱為「頻譜」（Spectrum），而對於 X[k] 的分析，我們通稱為「頻譜分析」（Spectral Analysis）。我們也可以由這些傅立葉係數 X[k]，來反推原始訊號 x[n]，如下： x[n]=S_k=0^N-1 X[k]*exp(j*2p*n*k/N), n=0, ..., N-1

Hint

DTFT 和 DFT 很類似，兩者都用來處理離散時間的訊號，只是前者產生一個角頻率的連續函數，而後者產生離散頻率的訊號，更適合使用電腦來進行處理。

這邊有幾點要說明：

如果原始訊號 x[n] 有 N 點，那麼轉換出來的訊號 X[k] 也會有 N 點。
一般而言，X[k] 是一個複數，其大小是 |(X[k])| （abs(X[k]) in MATLAB），相位是 ∠X[k] (angle(X[k]) or atan(imag(X[k])/real(X[k]) in MATLAB)。
如果原始訊號 x[n] 都是實數，那麼 X[k] 和 X[N-k] 會是共軛複數，滿足 |(X[k])| = |(X[N-k])| 以及 ∠X[k] = -∠X[N-k]。

如果 x[n] 是實數，我們可以將 x[n] 表示如下： x[n] = X[0]
+ X[1]*exp(j*2p*n*1/N) + X[N-1]*exp(j*2p*n*(N-1)/N)
+ X[2]*exp(j*2p*n*2/N) + X[N-2]*exp(j*2p*n*(N-2)/N)
+ X[3]*exp(j*2p*n*3/N) + X[N-3]*exp(j*2p*n*(N-3)/N)
+ ... 對上述第 k 項而言，我們有 X[k]*exp(j*2p*n*k/N) + X[N-k]*exp(j*2p*n*(N-k)/N)
= X[k]*exp(j*2p*n*k/N) + X[N-k]*exp(j*2p*n)*exp(-j*2p*n*k/N)
= X[k]*exp(j*2p*n*k/N) + X[N-k]*exp(-j*2p*n*k/N)
= 2*Re(X[k]*exp(j*2p*n*k/N)) 如果我們以 m_k 表示 X[k] 的大小，以 p_k 表示 X[k] 的相位，那麼上式可以化簡如下： 2*Re(m_k*exp(j*p_k)*exp(j*2p*n*k/N))
= 2*Re(m_k*exp(j*(2p*n*k/N + p_k))
= 2*m_k*cos(2p*n*k/N + p_k) 一般而言，N 是 2 的倍數，因此 x[n] 可以表示成 x[n] = X[0] + 2*S_k=1^N/2-1m_k*cos(2p*n*k/N + p_k) + m_N/2*cos(p*n + p_N/2) 換句話說，我們可以將原始訊號拆解成一個直流訊號 X[0] 再加上 N/2 個弦波的組合，這些弦波的震幅就是 X[k] 的大小，而相位則是 X[k] 的相位。因此對於實數的 x[n] 而言，我們只需要看單邊的 X[k]，k = 0 ~ N/2，此種可稱為「單邊頻譜」。組成這些單邊頻譜的弦波共有 1+N/2 個，頻率由小到大分別是 fs/N*(0:N/2)，這些弦波稱為「基本弦波」。

若是套用上述公式來計算 DFT，所需要的複雜度是 O(n²)，但在 1965 年，有兩位學者提出來一套更精簡的演算法，所需的複雜度只有 O(n log n)，這一套演算法稱為「快速傅立葉轉換」（Fast Fourier Transform，簡稱 FFT），換句話說，FFT 是用來計算 DFT 的快速方法。若使用 MATLAB，相關的指令也是 fft。

首先，我們使用 MATLAB 的 fft 指令來驗證實數訊號之 DFT 係數的共軛性，如下：

Example 1: fftSym01.m% This example demonstrates the pair-wise conjugate of DFT (此範例展示實數訊號之 DFT 係數的共軛性) N=64; % Length of vector x=randn(N, 1); z=fft(x); plot(z, 'o'); grid on %compass(z); % Connect conjugate pairs (將上下對稱的共軛點連起來) for i=2:N/2+1 twoPoint=z([i, N-i+2]); line(real(twoPoint), imag(twoPoint), 'color', 'r'); end

由上圖可以看出，DFT 係數會對稱於複數平面的實數軸，代表這些係數都會以共軛複數的方式成對出現。

Hint

對於實數訊號 x[n] 而言：

當 N 為偶數時，只有 X[0] 和 X[N/2] 是單一實數，其餘均為成對出現之共軛複數。
當 N 為奇數時，只有 X[0] 是單一實數，其餘均為成對出現之共軛複數。

如果 x[n] 恰巧是這些基本弦波中的其中一個，那麼計算出來的雙邊頻譜，應該只有兩個係數不為零，我們可用 MATLAB 驗證如下：

Example 2: fft01.m% This example demonstrates the two-side DFT of a sinusoidal function % (此範例展示一個簡單正弦波的傅立葉轉換，以雙邊頻譜來顯示) % Since the sinusoidal function has a frequency to be a multiple of fs/N, the two-side DFT have only two nonzero terms. % (此正弦波的頻率恰巧是 freqStep 的整數倍，所以雙邊頻譜應該只有兩個非零點) N = 256; % length of vector (點數) fs = 8000; % sample rate (取樣頻率) freqStep = fs/N; % freq resolution in spectrum (頻域的頻率的解析度) f = 10*freqStep; % freq of the sinusoid (正弦波的頻率，恰是 freqStep 的整數倍) time = (0:N-1)/fs; % time resolution in time-domain (時域的時間刻度) y = cos(2*pi*f*time); % signal to analyze Y = fft(y); % spectrum Y = fftshift(Y); % put zero freq at the center (將頻率軸的零點置中) % Plot time data subplot(3,1,1); plot(time, y, '.-'); title('Sinusoidal signals'); xlabel('Time (seconds)'); ylabel('Amplitude'); axis tight % Plot spectral magnitude freq = freqStep*(-N/2:N/2-1); % freq resolution in spectrum (頻域的頻率的解析度) subplot(3,1,2); plot(freq, abs(Y), '.-b'); grid on xlabel('Frequency)'); ylabel('Magnitude (Linear)'); % Plot phase subplot(3,1,3); plot(freq, angle(Y), '.-b'); grid on xlabel('Frequency)'); ylabel('Phase (Radian)');

由上圖可以看出：

能量頻譜只有在兩點不為零，其他各點理論上應該是零，但由於計算精度之故，實際值很接近於零，但不一定是零。
相位頻譜沒有一定規則，像是亂數。這是由於大部分的能量頻譜很小，因此實際的相位頻譜並沒有任何意義。

如果 x[n] 的頻率不是這些弦波中的其中一個，那麼 DFT 還是會將此訊號拆解成基本弦波的組合，因此能量頻譜就會分散在各個基本弦波，如下所示：

Example 3: fft02.m% This example demonstrates the one-side DFT of a sinusoidal function (此範例展示一個簡單正弦波的傅立葉轉換，以雙邊頻譜來顯示) % Since the sinusoidal function has a frequency not a multiple of fs/N, the two-side DFT smears. (此正弦波的頻率不是 freqStep 的整數倍，所以雙邊頻譜會「散開」(Smearing)) N = 256; % length of vector (點數) fs = 8000; % sample rate (取樣頻率) freqStep = fs/N; % freq resolution in spectrum (頻域的頻率的解析度) f = 10.5*freqStep; % freq of the sinusoid (正弦波的頻率，不是 freqStep 的整數倍) time = (0:N-1)/fs; % time resolution in time-domain (時域的時間刻度) signal = cos(2*pi*f*time); % signal to analyze [mag, phase, freq]=fftTwoSide(signal, fs, 1); % compute and plot the two-side DFT

在上述範例中，我們使用 fftTwoSide.m 函數（位於 SAP Toolbox）來進行雙邊頻譜的計算，並將同時畫出時域訊號、頻譜能量、頻譜相位三個圖。

由於對於實數的 x[n] 而言，雙邊頻譜能量圖會左右對稱，而雙邊頻譜相位圖則是左右反對稱，因此我們可以只看單邊頻譜即可，範例如下：

Example 4: fft03.m% Same as fft02.m but use one-side DFT instead (同 fft02.m，但以單邊頻譜來顯示) N = 256; % length of vector (點數) fs = 8000; % sample rate (取樣頻率) freqStep = fs/N; % freq resolution in spectrum (頻域的頻率的解析度) f = 10.5*freqStep; % freq of the sinusoid (正弦波的頻率，不是 freqStep 的整數倍) time = (0:N-1)/fs; % time resolution in time-domain (時域的時間刻度) signal = cos(2*pi*f*time); % signal to analyze [mag, phase, freq]=fftOneSide(signal, fs, 1); % Compute and plot one-side DFT

在上述範例中，我們使用 fftOneSide.m 函數（位於 SAP Toolbox）來進行單邊頻譜的計算，並將同時畫出時域訊號、頻譜能量、頻譜相位三個圖。

Hint

一般實際世界中的訊號，都是實數訊號，因此我們只要使用 fftOneSide.m 來計算單邊頻譜即可。

下面這個範例，顯示一個語音音框的單邊頻譜：

Example 5: fft04.m% Display DFT of a frame in a real-world audio signal (顯示一個語音音框的單邊頻譜) au=myAudioRead('welcome.wav'); frame=au.signal(2047:2047+237-1); [mag, phase, freq]=fftOneSide(frame, au.fs, 1);

在上述範例中，我們可以看到很明顯的諧波出現在頻譜能量，這是由於訊號的週期性所造成的結果。（為了凸顯諧波現象，我們的音框剛好包含了三個完整的基本週期。）

當 k 越大時，代表對應的弦波是高頻訊號，因此我們可以捨棄高頻訊號，只用幾個低頻弦波，來合成原始訊號，達到下列兩項目的：

資料壓縮
濾除高頻訊號

以下這個範例，顯示如何以弦波來合成一個方波：

Example 6: fftApproximate01.m% This example demos the effect of square wave approximation by DFT figure L = 15; N = 25; x = [ones(1,L), zeros(1,N-L)]; frameSize=length(x); runNum=3; for i=1:runNum, pointNum=ceil(frameSize/(2*runNum)*i); % Actually 2*pointNum-1 coefs are taken X = fft(x); magX = abs(X); remainIndex=[1:pointNum, frameSize-pointNum+2:frameSize]; X2=0*X; X2(remainIndex)=X(remainIndex); x2=ifft(X2); x2=real(x2); subplot(3,2,2*i-1); stem(x); hold on plot(x2, 'r'); hold off title(sprintf('x[n] and %d-points approximation', 2*pointNum-1)); axis([-inf,inf,-0.5,1.5]) subplot(3,2,2*i); shiftedMagX=fftshift(magX); plot(shiftedMagX, '.-'); axis tight title('DFT of x[n]') hold on temp=ifftshift(1:frameSize); ind=temp(remainIndex); plot(ind, shiftedMagX(ind), 'or'); grid on hold off end

由此可見當我們所用的 DFT 係數越多，所合成的波形就會越接近原來的波形。

下面這個範例，則是以弦波來合成一個實際說話聲音的波形：

Example 7: fftApproximate02.m% Display the effect of FFT approximation au=myAudioRead('welcome.wav'); x=au.signal(2047:2047+237-1); figure frameSize=length(x); runNum=3; for i=1:runNum, pointNum=ceil(frameSize/(8*runNum)*i); % Actually 2*pointNum-1 coefs are taken X = fft(x); magX = abs(X); remainIndex=[1:pointNum, frameSize-pointNum+2:frameSize]; X2=0*X; X2(remainIndex)=X(remainIndex); x2=ifft(X2); x2=real(x2); subplot(3,2,2*i-1); plot(x, '.-'); hold on plot(x2, 'r'); hold off title(sprintf('x[n] and %d-points approximation', 2*pointNum-1)); set(gca, 'xlim', [-inf inf]); subplot(3,2,2*i); shiftedMagX=fftshift(magX); plot(shiftedMagX, '.-'); title('DFT of x[n]') hold on temp=ifftshift(1:frameSize); ind=temp(remainIndex); plot(ind, shiftedMagX(ind), 'or'); grid on hold off set(gca, 'xlim', [-inf inf]); end

由此可見在實際的聲音波形中，我們只要少數幾個低頻的係數，就可以合成類似原來的訊號，由此也可以知道很多高頻訊號是沒有作用的。

對於週期性的波形，DFT 會插入零點，如下：

Example 8: fftRepeat01.m% This example demos the effect of FFT for purely periodic signals au=myAudioRead('welcome.wav'); x=au.signal(2047:2126); % A full fundamental period runNum=5; for i=1:runNum repeatedX = x*ones(1,i); signal = repeatedX(:)+0.1; % signal=zeros(runNum*length(x), 1); % Zero-padding version % signal(1:length(repeatedX))=repeatedX(:); % Zero-padding version [mag, phase, freq, powerDb]=fftOneSide(signal, au.fs); mag=mag/length(signal); % Divided by vector length to normalize magnitude (due to the formula used by MATLAB) subplot(runNum,2,2*i-1); plot(signal, '.-'); grid on title('x[n]'); set(gca, 'xlim', [-inf inf]); subplot(runNum,2,2*i); plot(freq, mag, '.-'); grid on; % set(gca, 'yscale', 'log'); title('DFT of x[n]'); axis tight; end

為什麼 DFT 會插入零點呢？若不使用詳細的數學分析，各位同學是否可以以直覺的方式來推導出這個結果呢？

若在時域波形加上零點，則在頻域所產生的效果是內差以增加點數，如下所示：

Example 9: fftZeroPadding01.m% This example demos the effect of zero-padding of DFT for i=1:3 L = 5; N = 20*i; x = [ones(1,L), zeros(1,N-L)]; subplot(3,3,i*3-2); stem(x); title(sprintf('x[n] with N=%d',N)); set(gca, 'xlim', [-inf inf]); omega=((1:N)-ceil((N+1)/2))*(2*pi/N); X = fft(x); magX = fftshift(abs(X)); subplot(3,3,i*3-1); plot(omega, magX, '.-'); title('Magnitude of DFT of x[n]') set(gca, 'xlim', [-inf inf]); phase=fftshift(angle(X)); subplot(3,3,i*3); plot(omega, phase, '.-'); title('Phase of DFT of x[n]') set(gca, 'xlim', [-inf inf]); set(gca, 'ylim', [-pi pi]); end

換句話說，加上零點，在實質上並沒有增加訊號的資訊，但是 DFT 的點數必須隨之增加，因此在頻譜的效應則是進行內差以增加點數。

Hint

一般在進行音訊處理時，如果音框的點數不是 2ⁿ 的格式，我們通常就是使用補零的方式，使最後的點數變成 2ⁿ，這樣在進行 DFT 時，才比較方便採取比較快速的 FFT 計算方式。

若在時域進行 Downsample，則在頻域所產生的效果如下：

Example 10: fftReSample01.m% This example demos the effect of FFT approximation au=myAudioRead('welcome.wav'); x=au.signal(2047:2126); x=au.signal(2047:2326); n=length(x); F = (0:n/2)*au.fs/n; runNum=5; for i=1:runNum, newX=x(1:2^(i-1):length(x)); newFs=au.fs/(2^(i-1)); X = fft(newX); magX = abs(X); frameSize=length(newX); subplot(runNum,2,2*i-1); plot(newX, '.-'); title('x[n]'); set(gca, 'xlim', [-inf inf]); subplot(runNum,2,2*i); freq = (0:frameSize/2)*newFs/frameSize; magX = magX(1:length(freq)); M=nan*F; M(1:length(magX))=magX; plot(F, M, '.-'); title('DFT of x[n]') axis tight; end

當我們 Downsample 進行越多次，曲線會越平滑，代表高頻部分已經慢慢不見了！

此外，我們可以將原先的時域訊號表示成數個弦波函數的線性組合，並使用最小平方法（lease-squares estimate）來進行曲線擬合（Curve fitting）以找出最佳的線性係數，這些線性係數和 fft 所得到的結果是一樣的，範例如下：

Example 11: fftViaLse01.m% FFT via least-squares estimate N=8; fs=1; time=(0:N-1)'/fs; x=rand(N,1)*2-1; A=ones(N,1); for i=1:N/2 A=[A, cos(2*pi*(i*fs/N)*time), sin(2*pi*(i*fs/N)*time)]; end th=A\x; plotNum=fix(N/2)+2; subplot(plotNum, 1, 1); N2=(N-1)*5+1; % Insert 4 points between every 2 points for better observation (兩點間插入四點，以便觀察波形) timeFine=linspace(min(time), max(time), N2); x2=th(1)*ones(N,1); plot(timeFine, th(1)*ones(N2,1), '.-', time, x2, 'or'); % Plot the first term (畫出第一項) ylabel('Term 0 (DC)'); axis([0 N/fs -1 1]); grid on for i=1:N/2 % Plot terms 2 to 1+N/2 (畫出第二至第 1+N/2 項) freq=i*fs/N; y=th(2*i)*cos(2*pi*freq*time)+th(2*i+1)*sin(2*pi*freq*time); % a term (每一項) x2=x2+y; fprintf('i=%d, sse=%f\n', i, norm(x-x2)/sqrt(N)); subplot(plotNum, 1, i+1); yFine=th(2*i)*cos(2*pi*(i*fs/N)*timeFine)+th(2*i+1)*sin(2*pi*(i*fs/N)*timeFine); % Finer verison for plotting plot(timeFine, yFine, '.-', time, y, 'or'); ylabel(sprintf('Term %d', i)); axis([0 N/fs -1 1]); grid on end % Plot the original signal (畫出原來訊號) subplot(plotNum, 1, plotNum) plot(time, x, 'o-'); axis([0 N/fs -1 1]); grid on xlabel('Time'); ylabel('Orig signals'); % Transform LSE result back to fft format for comparison (將 th 轉回 fft 並比較結果) F=fft(x); F2=[]; F2(1)=th(1)*N; for i=1:N/2 F2(i+1)=(th(2*i)-sqrt(-1)*th(2*i+1))*N/2; if (i==N/2) F2(i+1)=2*F2(i+1); end end % symmetric of DFT (DFT 的對稱性) for i=N/2+2:N F2(i)=F2(N-i+2)'; end error1=sum(abs(F2-F.')); % F.' is simple transpose (F.' 是不進行共軛轉換的轉置) error2=sum(abs(F2-F')); % F' is conjugate transpose (F' 是進行共軛轉換的轉置) fprintf('Errors after transforming LSE to DFT coefficients (將 LSE 轉換成 DFT 係數的誤差)：error1=%f, error2=%f\n', error1, error2); fprintf('Due to the symmetry of DFT, one of the above error terms should be zero. (由於 DFT 的對稱性，上述誤差應該有一項為零。)\n');i=1, sse=0.626965 i=2, sse=0.306336 i=3, sse=0.008109 i=4, sse=0.000000 Errors after transforming LSE to DFT coefficients (將 LSE 轉換成 DFT 係數的誤差)：error1=0.000000, error2=22.709014 Due to the symmetry of DFT, one of the above error terms should be zero. (由於 DFT 的對稱性，上述誤差應該有一項為零。)

Audio Signal Processing and Recognition (音訊處理與辨識)