In the previous section, we have introducde the discrete-time Fourier transform (DTFT) that transforms a discrete-time signal into a continuous function of its frequency components. Since DTFT is continuous in frequency, it is not suitable for digital processing by computers. In this section, we shall introduce

discrete Fourier transform(DFTfor short) that converts a discrete-time signals into its discrete-frequency components, which are suitable for further computer processing.Suppose that the discrete-time signal can be expressed as x[n], n = 0~N-1. Then the formula for DFT is: $$ X[k]=\frac{1}{N} \sum_{n=0}^{N-1} x[n] e^{-j2 \pi \frac{nk}{N}}, k=0, ..., N-1 $$ The coefficient X[k] represents the amplitude and phase of the component of x[n], n = 0~N-1 at a specific frequency (to be detailed later). Therefore X[k], k = 0~N-1, are usually referred to as

spectrum.From X[k], we can also compute the original signal x[n], as follows: $$ x[n]=\sum_{k=0}^{N-1}X[k] e^{j2\pi \frac{nk}{N}}, n=0, ..., N-1 $$

Here we have several important facts about X[k]:

If x[n] is real, we can express it as follows:

- If the length of the original signal x[n] is N, then the length of X[k] is also N.
- In general, X[k] is a complex number with a magnitude of |(X[k])| ¡]abs(X[k]) in MATLAB¡^ and a phase of ¡çX[k] (angle(X[k]) or atan(imag(X[k])/real(X[k]) in MATLAB).
- If the time-domain signal x[n] is real, then X[k] and X[N-k] are complex conjugate staisfying |(X[k])| = |(X[N-k])| and ¡çX[k] = -¡çX[N-k].
x[n] = X[0] For the terms involving k and N-k, we have

+ X[1]e^{j2pn/N}+ X[N-1]e^{j2pn(N-1)/N}

+ X[2]e^{j2pn2/N}+ X[N-2]e^{j2pn(N-2)/N}

+ X[3]e^{j2pn3/N}+ X[N-3]e^{j2pn(N-3)/N}

+ ...X[k]e If we use m^{j2pnk/N}+ X[N-k]e^{j2pn(N-k)/N}

= X[k]e^{j2pnk/N}+ X[N-k]e^{j2pn}e^{-j2pnk/N}

= X[k]e^{j2pnk/N}+ X[N-k]e^{-j2pnk/N}

= 2 Re{X[k]e^{j2pnk/N}}_{k}and p_{k}to represent the magnitude and phase of X[k], respectively, then the above equation can be simplified as follows:2 Re{m In general, N is even and we can express x[n] as_{k}exp(jp_{k})exp(j2pnk/N)}

= 2 Re{m_{k}exp(j(2pnk/N + p_{k})}

= 2m_{k}cos(2pnk/N + p_{k})x[n] = X[0] + 2S The above equation states that we can decompose x[n] into a DC component X[0] plus the summation of N/2 sinusoidal functions, where the amplitude and phase are determined by those of X[k], respectively. Therefore for a given real sequence x[n], we only need to have to observe X[k]¡Ak = 0 ~ N/2 for spectrum analysis. This "one-side spectrum" consists of 1+N/2 sinusoidal functions, with frequencies located at 0, f_{k=1}^{N/2-1}m_{k}cos(2pnk/N + p_{k}) + m_{N/2}cos(pn + p_{N/2})_{s}/N, 2f_{s}/N, 3f_{s}/N, ..., N/2f_{s}/N (=f_{s}/2), with f_{s}as the sample rate. These sinusoidal functions are referred as the fundamental sinusoids.If we employ the above direct method for computing DFT, the time complexity if O(n

^{2}). In 1965, a smart method based on divide-and-conquer was proposed to compute DFT with a time complexity of O(n log n). The method is calledfast Fourier transform(FFTfor short). In other words, FFT is an efficient method for computing DFT.First of all, let us use the MATLAB command fft to verify the conjugate property of the DFT coefficients:

From the above plot, it can be seen that the DFT coefficients appear as complex conjugate pairs, which have the x-axis as the symmetric axis.

If x[n] happens to be one of the fundamental sinusoids, then the DFT coefficients should have only two non-zero terms, as shown in the following example:

We can observe that:If x[n] is not one of the fundamental sinusoids, DFT will still decompose x[n] into the combination of fundamental sinusoids with spreaded magnitude spectrum:

- In theory, the magnitude spectrum should have only two nonzero points. In practice, those zero points are not zero exactly, but very close to zero due to truncation or round-off errors.
- The phase spectrum appears to be random. This is simply because of the fact that most of the spectrum is zero, and hence the phase does not bear any significant meanings at all.

In the above example, we have used the function fftTwoSide.m (in the SAP Toolbox) for the computation of two-side spectrum, and then plot the time-domain signal, magnitude spectrum and phase spectrum.If x[n] is real, the magnitude spectrum is symmetric while the phase spectrum is anti-symmetric. Since we are dealing with audio signals which are all real, so we only need to have one-side spectrum for our analysis. In the following example, we shall use the function fftOneSide.m (in the SAP Toolbox) to plot the time-domain signal and one-side magnitude/phase spectrum:

In the following example, we shall demonstrate the one-side spectrum for a frame of an audio signal:

In the above example, we can observe that there areharmonicsin the magnitude spectrum. This is the result due to the fact that the time-domain signal is quasi-periodic. (To make the harmonics more obvious, we have carefully selected a time-domain signal containing three fundamental periods.)When the value of k is getting bigger, we will have the high-frequency components which are more likely to be noises in the original time-domain signal. We can actually delete high-frequency components and use only low-frequency components to approximate and synthesize the original signal, to achieve the following goals:

In the next example, we shall demonstrate how to synthesize a square wave using sinusolids:

- Data compression
- Low-pass filter

It is obvious that when we use more DFT coefficients for the synthesis, the reconstructed waveform is closer to the original one.The following example uses sinusoids to approximate the waveform of an utterance:

From the above example, it is known that we only need a few DFT coefficents to synthesize the original signal with satisfactory quality. This fact demonstrate that the high-frequency components are not important in reconstructing the original signal.For perfectly periodic signals, DFT will insert zeros inbetween, as follows:

Why is it so? Without rigorous analysis, can you explain this phonomenon by intuition?If we pad the signal x[n] with zeros, the corresponding effect in frequency domain is interpolation, as shown in the next example:

In other words, zero padding does not add new information to x[n]. However, DFT need to have the same length and the net result is the interpolation to have a dense sampling in the frequency domain.

If we down sample the signal in the time domain, then the result in the spectrum is shown next:

The more times we do down sampling, the smoother the time-domain signal will be. Therefore the high-frequency components in the spectrum are missing gradually.Moreover, we can express the time-domain signal as a linear combination of the fundamental sinusoids and then apply the least-square estimate to find the best coefficients of these sinusoids. It turns out that the coefficients identified by the least-square method are the same as those by fft, as shown next:

Audio Signal Processing and Recognition (µ°T³B²z»P¿ëÃÑ)