10-2 Discrete Fourier Transform (聽葚鉥?

In the previous section, we have introducde the discrete-time Fourier transform (DTFT) that transforms a discrete-time signal into a continuous function of its frequency components. Since DTFT is continuous in frequency, it is not suitable for digital processing by computers. In this section, we shall introduce discrete Fourier transform (DFT for short) that converts a discrete-time signals into its discrete-frequency components, which are suitable for further computer processing.

Suppose that the discrete-time signal can be expressed as x[n], n = 0~N-1. Then the formula for DFT is: $$ X[k]=\frac{1}{N} \sum_{n=0}^{N-1} x[n] e^{-j2 \pi \frac{nk}{N}}, k=0, ..., N-1 $$ The coefficient X[k] represents the amplitude and phase of the component of x[n], n = 0~N-1 at a specific frequency (to be detailed later). Therefore X[k], k = 0~N-1, are usually referred to as spectrum.

From X[k], we can also compute the original signal x[n], as follows: $$ x[n]=\sum_{k=0}^{N-1}X[k] e^{j2\pi \frac{nk}{N}}, n=0, ..., N-1 $$

Hint

DTFT and DFT are very similar; both are used to convert a discrete-time signal into its components at different frequencies. However, the former generates a continuous-frequency spectrum while the later produces a discrete-frequency spectrum.

Here we have several important facts about X[k]:

If the length of the original signal x[n] is N, then the length of X[k] is also N.
In general, X[k] is a complex number with a magnitude of |(X[k])| （abs(X[k]) in MATLAB） and a phase of ∠X[k] (angle(X[k]) or atan(imag(X[k])/real(X[k]) in MATLAB).
If the time-domain signal x[n] is real, then X[k] and X[N-k] are complex conjugate staisfying |(X[k])| = |(X[N-k])| and ∠X[k] = -∠X[N-k].

If x[n] is real, we can express it as follows: x[n] = X[0]
+ X[1]e^j2pn/N + X[N-1]e^j2pn(N-1)/N
+ X[2]e^j2pn2/N + X[N-2]e^j2pn(N-2)/N
+ X[3]e^j2pn3/N + X[N-3]e^j2pn(N-3)/N
+ ... For the terms involving k and N-k, we have X[k]e^j2pnk/N + X[N-k]e^j2pn(N-k)/N
= X[k]e^j2pnk/N + X[N-k]e^j2pne^-j2pnk/N
= X[k]e^j2pnk/N + X[N-k]e^-j2pnk/N
= 2 Re{X[k]e^j2pnk/N} If we use m_k and p_k to represent the magnitude and phase of X[k], respectively, then the above equation can be simplified as follows: 2 Re{m_kexp(jp_k)exp(j2pnk/N)}
= 2 Re{m_kexp(j(2pnk/N + p_k)}
= 2m_kcos(2pnk/N + p_k) In general, N is even and we can express x[n] as x[n] = X[0] + 2S_k=1^N/2-1m_kcos(2pnk/N + p_k) + m_N/2cos(pn + p_N/2) The above equation states that we can decompose x[n] into a DC component X[0] plus the summation of N/2 sinusoidal functions, where the amplitude and phase are determined by those of X[k], respectively. Therefore for a given real sequence x[n], we only need to have to observe X[k]，k = 0 ~ N/2 for spectrum analysis. This "one-side spectrum" consists of 1+N/2 sinusoidal functions, with frequencies located at 0, f_s/N, 2f_s/N, 3f_s/N, ..., N/2f_s/N (=f_s/2), with f_s as the sample rate. These sinusoidal functions are referred as the fundamental sinusoids.

If we employ the above direct method for computing DFT, the time complexity if O(n²). In 1965, a smart method based on divide-and-conquer was proposed to compute DFT with a time complexity of O(n log n). The method is called fast Fourier transform (FFT for short). In other words, FFT is an efficient method for computing DFT.

First of all, let us use the MATLAB command fft to verify the conjugate property of the DFT coefficients:

Example 1: fftSym01.m% This example demonstrates the pair-wise conjugate of DFT (此範例展示實數訊號之 DFT 係數的共軛性) N=64; % Length of vector x=randn(N, 1); z=fft(x); plot(z, 'o'); grid on %compass(z); % Connect conjugate pairs (將上下對稱的共軛點連起來) for i=2:N/2+1 twoPoint=z([i, N-i+2]); line(real(twoPoint), imag(twoPoint), 'color', 'r'); end

From the above plot, it can be seen that the DFT coefficients appear as complex conjugate pairs, which have the x-axis as the symmetric axis.

Hint

For real sequence x[n], n = 0~N-1:

If N is even, then X[0] and X[N/2] are real and all the other terms are complex conjugate pairs.
When N is odd, only X[0] is real and all the other terms are complex conjugate pairs.

If x[n] happens to be one of the fundamental sinusoids, then the DFT coefficients should have only two non-zero terms, as shown in the following example:

Example 2: fft01.m% This example demonstrates the two-side DFT of a sinusoidal function % (此範例展示一個簡單正弦波的傅立葉轉換，以雙邊頻譜來顯示) % Since the sinusoidal function has a frequency to be a multiple of fs/N, the two-side DFT have only two nonzero terms. % (此正弦波的頻率恰巧是 freqStep 的整數倍，所以雙邊頻譜應該只有兩個非零點) N = 256; % length of vector (點數) fs = 8000; % sample rate (取樣頻率) freqStep = fs/N; % freq resolution in spectrum (頻域的頻率的解析度) f = 10*freqStep; % freq of the sinusoid (正弦波的頻率，恰是 freqStep 的整數倍) time = (0:N-1)/fs; % time resolution in time-domain (時域的時間刻度) y = cos(2*pi*f*time); % signal to analyze Y = fft(y); % spectrum Y = fftshift(Y); % put zero freq at the center (將頻率軸的零點置中) % Plot time data subplot(3,1,1); plot(time, y, '.-'); title('Sinusoidal signals'); xlabel('Time (seconds)'); ylabel('Amplitude'); axis tight % Plot spectral magnitude freq = freqStep*(-N/2:N/2-1); % freq resolution in spectrum (頻域的頻率的解析度) subplot(3,1,2); plot(freq, abs(Y), '.-b'); grid on xlabel('Frequency)'); ylabel('Magnitude (Linear)'); % Plot phase subplot(3,1,3); plot(freq, angle(Y), '.-b'); grid on xlabel('Frequency)'); ylabel('Phase (Radian)');

We can observe that:

In theory, the magnitude spectrum should have only two nonzero points. In practice, those zero points are not zero exactly, but very close to zero due to truncation or round-off errors.
The phase spectrum appears to be random. This is simply because of the fact that most of the spectrum is zero, and hence the phase does not bear any significant meanings at all.

If x[n] is not one of the fundamental sinusoids, DFT will still decompose x[n] into the combination of fundamental sinusoids with spreaded magnitude spectrum:

Example 3: fft02.m% This example demonstrates the one-side DFT of a sinusoidal function (此範例展示一個簡單正弦波的傅立葉轉換，以雙邊頻譜來顯示) % Since the sinusoidal function has a frequency not a multiple of fs/N, the two-side DFT smears. (此正弦波的頻率不是 freqStep 的整數倍，所以雙邊頻譜會「散開」(Smearing)) N = 256; % length of vector (點數) fs = 8000; % sample rate (取樣頻率) freqStep = fs/N; % freq resolution in spectrum (頻域的頻率的解析度) f = 10.5*freqStep; % freq of the sinusoid (正弦波的頻率，不是 freqStep 的整數倍) time = (0:N-1)/fs; % time resolution in time-domain (時域的時間刻度) signal = cos(2*pi*f*time); % signal to analyze [mag, phase, freq]=fftTwoSide(signal, fs, 1); % compute and plot the two-side DFT

In the above example, we have used the function fftTwoSide.m (in the SAP Toolbox) for the computation of two-side spectrum, and then plot the time-domain signal, magnitude spectrum and phase spectrum.

If x[n] is real, the magnitude spectrum is symmetric while the phase spectrum is anti-symmetric. Since we are dealing with audio signals which are all real, so we only need to have one-side spectrum for our analysis. In the following example, we shall use the function fftOneSide.m (in the SAP Toolbox) to plot the time-domain signal and one-side magnitude/phase spectrum:

Example 4: fft03.m% Same as fft02.m but use one-side DFT instead (同 fft02.m，但以單邊頻譜來顯示) N = 256; % length of vector (點數) fs = 8000; % sample rate (取樣頻率) freqStep = fs/N; % freq resolution in spectrum (頻域的頻率的解析度) f = 10.5*freqStep; % freq of the sinusoid (正弦波的頻率，不是 freqStep 的整數倍) time = (0:N-1)/fs; % time resolution in time-domain (時域的時間刻度) signal = cos(2*pi*f*time); % signal to analyze [mag, phase, freq]=fftOneSide(signal, fs, 1); % Compute and plot one-side DFT

Hint

Signals from the real world are all real, so we only need to use fftOneSide.m to obtain one-side spectrum for spectrum analysis.

In the following example, we shall demonstrate the one-side spectrum for a frame of an audio signal:

Example 5: fft04.m% Display DFT of a frame in a real-world audio signal (顯示一個語音音框的單邊頻譜) au=myAudioRead('welcome.wav'); frame=au.signal(2047:2047+237-1); [mag, phase, freq]=fftOneSide(frame, au.fs, 1);

In the above example, we can observe that there are harmonics in the magnitude spectrum. This is the result due to the fact that the time-domain signal is quasi-periodic. (To make the harmonics more obvious, we have carefully selected a time-domain signal containing three fundamental periods.)

When the value of k is getting bigger, we will have the high-frequency components which are more likely to be noises in the original time-domain signal. We can actually delete high-frequency components and use only low-frequency components to approximate and synthesize the original signal, to achieve the following goals:

Data compression
Low-pass filter

In the next example, we shall demonstrate how to synthesize a square wave using sinusolids:

Example 6: fftApproximate01.m% This example demos the effect of square wave approximation by DFT figure L = 15; N = 25; x = [ones(1,L), zeros(1,N-L)]; frameSize=length(x); runNum=3; for i=1:runNum, pointNum=ceil(frameSize/(2*runNum)*i); % Actually 2*pointNum-1 coefs are taken X = fft(x); magX = abs(X); remainIndex=[1:pointNum, frameSize-pointNum+2:frameSize]; X2=0*X; X2(remainIndex)=X(remainIndex); x2=ifft(X2); x2=real(x2); subplot(3,2,2*i-1); stem(x); hold on plot(x2, 'r'); hold off title(sprintf('x[n] and %d-points approximation', 2*pointNum-1)); axis([-inf,inf,-0.5,1.5]) subplot(3,2,2*i); shiftedMagX=fftshift(magX); plot(shiftedMagX, '.-'); axis tight title('DFT of x[n]') hold on temp=ifftshift(1:frameSize); ind=temp(remainIndex); plot(ind, shiftedMagX(ind), 'or'); grid on hold off end

It is obvious that when we use more DFT coefficients for the synthesis, the reconstructed waveform is closer to the original one.

The following example uses sinusoids to approximate the waveform of an utterance:

Example 7: fftApproximate02.m% Display the effect of FFT approximation au=myAudioRead('welcome.wav'); x=au.signal(2047:2047+237-1); figure frameSize=length(x); runNum=3; for i=1:runNum, pointNum=ceil(frameSize/(8*runNum)*i); % Actually 2*pointNum-1 coefs are taken X = fft(x); magX = abs(X); remainIndex=[1:pointNum, frameSize-pointNum+2:frameSize]; X2=0*X; X2(remainIndex)=X(remainIndex); x2=ifft(X2); x2=real(x2); subplot(3,2,2*i-1); plot(x, '.-'); hold on plot(x2, 'r'); hold off title(sprintf('x[n] and %d-points approximation', 2*pointNum-1)); set(gca, 'xlim', [-inf inf]); subplot(3,2,2*i); shiftedMagX=fftshift(magX); plot(shiftedMagX, '.-'); title('DFT of x[n]') hold on temp=ifftshift(1:frameSize); ind=temp(remainIndex); plot(ind, shiftedMagX(ind), 'or'); grid on hold off set(gca, 'xlim', [-inf inf]); end

From the above example, it is known that we only need a few DFT coefficents to synthesize the original signal with satisfactory quality. This fact demonstrate that the high-frequency components are not important in reconstructing the original signal.

For perfectly periodic signals, DFT will insert zeros inbetween, as follows:

Example 8: fftRepeat01.m% This example demos the effect of FFT for purely periodic signals au=myAudioRead('welcome.wav'); x=au.signal(2047:2126); % A full fundamental period runNum=5; for i=1:runNum repeatedX = x*ones(1,i); signal = repeatedX(:)+0.1; % signal=zeros(runNum*length(x), 1); % Zero-padding version % signal(1:length(repeatedX))=repeatedX(:); % Zero-padding version [mag, phase, freq, powerDb]=fftOneSide(signal, au.fs); mag=mag/length(signal); % Divided by vector length to normalize magnitude (due to the formula used by MATLAB) subplot(runNum,2,2*i-1); plot(signal, '.-'); grid on title('x[n]'); set(gca, 'xlim', [-inf inf]); subplot(runNum,2,2*i); plot(freq, mag, '.-'); grid on; % set(gca, 'yscale', 'log'); title('DFT of x[n]'); axis tight; end

Why is it so? Without rigorous analysis, can you explain this phonomenon by intuition?

If we pad the signal x[n] with zeros, the corresponding effect in frequency domain is interpolation, as shown in the next example:

Example 9: fftZeroPadding01.m% This example demos the effect of zero-padding of DFT for i=1:3 L = 5; N = 20*i; x = [ones(1,L), zeros(1,N-L)]; subplot(3,3,i*3-2); stem(x); title(sprintf('x[n] with N=%d',N)); set(gca, 'xlim', [-inf inf]); omega=((1:N)-ceil((N+1)/2))*(2*pi/N); X = fft(x); magX = fftshift(abs(X)); subplot(3,3,i*3-1); plot(omega, magX, '.-'); title('Magnitude of DFT of x[n]') set(gca, 'xlim', [-inf inf]); phase=fftshift(angle(X)); subplot(3,3,i*3); plot(omega, phase, '.-'); title('Phase of DFT of x[n]') set(gca, 'xlim', [-inf inf]); set(gca, 'ylim', [-pi pi]); end

In other words, zero padding does not add new information to x[n]. However, DFT need to have the same length and the net result is the interpolation to have a dense sampling in the frequency domain.

Hint

For common audio signal processing, if the frame size is not 2ⁿ, then we need to use zero padding until the frame size is 2ⁿ. This is also easier for DFT since a fast method based on 2-radix FFT can be invoked directly.

If we down sample the signal in the time domain, then the result in the spectrum is shown next:

Example 10: fftReSample01.m% This example demos the effect of FFT approximation au=myAudioRead('welcome.wav'); x=au.signal(2047:2126); x=au.signal(2047:2326); n=length(x); F = (0:n/2)*au.fs/n; runNum=5; for i=1:runNum, newX=x(1:2^(i-1):length(x)); newFs=au.fs/(2^(i-1)); X = fft(newX); magX = abs(X); frameSize=length(newX); subplot(runNum,2,2*i-1); plot(newX, '.-'); title('x[n]'); set(gca, 'xlim', [-inf inf]); subplot(runNum,2,2*i); freq = (0:frameSize/2)*newFs/frameSize; magX = magX(1:length(freq)); M=nan*F; M(1:length(magX))=magX; plot(F, M, '.-'); title('DFT of x[n]') axis tight; end

The more times we do down sampling, the smoother the time-domain signal will be. Therefore the high-frequency components in the spectrum are missing gradually.

Moreover, we can express the time-domain signal as a linear combination of the fundamental sinusoids and then apply the least-square estimate to find the best coefficients of these sinusoids. It turns out that the coefficients identified by the least-square method are the same as those by fft, as shown next:

Example 11: fftViaLse01.m% FFT via least-squares estimate N=8; fs=1; time=(0:N-1)'/fs; x=rand(N,1)*2-1; A=ones(N,1); for i=1:N/2 A=[A, cos(2*pi*(i*fs/N)*time), sin(2*pi*(i*fs/N)*time)]; end th=A\x; plotNum=fix(N/2)+2; subplot(plotNum, 1, 1); N2=(N-1)*5+1; % Insert 4 points between every 2 points for better observation (兩點間插入四點，以便觀察波形) timeFine=linspace(min(time), max(time), N2); x2=th(1)*ones(N,1); plot(timeFine, th(1)*ones(N2,1), '.-', time, x2, 'or'); % Plot the first term (畫出第一項) ylabel('Term 0 (DC)'); axis([0 N/fs -1 1]); grid on for i=1:N/2 % Plot terms 2 to 1+N/2 (畫出第二至第 1+N/2 項) freq=i*fs/N; y=th(2*i)*cos(2*pi*freq*time)+th(2*i+1)*sin(2*pi*freq*time); % a term (每一項) x2=x2+y; fprintf('i=%d, sse=%f\n', i, norm(x-x2)/sqrt(N)); subplot(plotNum, 1, i+1); yFine=th(2*i)*cos(2*pi*(i*fs/N)*timeFine)+th(2*i+1)*sin(2*pi*(i*fs/N)*timeFine); % Finer verison for plotting plot(timeFine, yFine, '.-', time, y, 'or'); ylabel(sprintf('Term %d', i)); axis([0 N/fs -1 1]); grid on end % Plot the original signal (畫出原來訊號) subplot(plotNum, 1, plotNum) plot(time, x, 'o-'); axis([0 N/fs -1 1]); grid on xlabel('Time'); ylabel('Orig signals'); % Transform LSE result back to fft format for comparison (將 th 轉回 fft 並比較結果) F=fft(x); F2=[]; F2(1)=th(1)*N; for i=1:N/2 F2(i+1)=(th(2*i)-sqrt(-1)*th(2*i+1))*N/2; if (i==N/2) F2(i+1)=2*F2(i+1); end end % symmetric of DFT (DFT 的對稱性) for i=N/2+2:N F2(i)=F2(N-i+2)'; end error1=sum(abs(F2-F.')); % F.' is simple transpose (F.' 是不進行共軛轉換的轉置) error2=sum(abs(F2-F')); % F' is conjugate transpose (F' 是進行共軛轉換的轉置) fprintf('Errors after transforming LSE to DFT coefficients (將 LSE 轉換成 DFT 係數的誤差)：error1=%f, error2=%f\n', error1, error2); fprintf('Due to the symmetry of DFT, one of the above error terms should be zero. (由於 DFT 的對稱性，上述誤差應該有一項為零。)\n');i=1, sse=0.626965 i=2, sse=0.306336 i=3, sse=0.008109 i=4, sse=0.000000 Errors after transforming LSE to DFT coefficients (將 LSE 轉換成 DFT 係數的誤差)：error1=0.000000, error2=22.709014 Due to the symmetry of DFT, one of the above error terms should be zero. (由於 DFT 的對稱性，上述誤差應該有一項為零。)

Audio Signal Processing and Recognition (音訊處理與辨識)