3-1 Introduction to Audio Signals (音訊基本介紹)



Audio signals generally refer to signals that are audible to humans. They usually come from a sound source that vibrates within the audible frequency range (roughly 20 Hz to 20 kHz). The vibrations push the air to form pressure waves that travel at about 340 meters per second. Our inner ears receive these pressure waves and send them to the brain for further recognition.
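
As a quick illustration of a vibration in the audible range, the following sketch (not part of the original examples; the 440 Hz frequency and 16 kHz sample rate are arbitrary choices) synthesizes a one-second sine wave and plays it back with MATLAB's built-in sound function:

% A minimal sketch: synthesize and play a pure tone in the audible range.
% The frequency (440 Hz) and sample rate (16 kHz) are arbitrary choices.
fs=16000;              % Sample rate in Hz
t=0:1/fs:1;            % One second of time samples
freq=440;              % Tone frequency, well inside 20 Hz to 20 kHz
y=sin(2*pi*freq*t);    % Sinusoidal waveform (a pressure-like signal)
sound(y, fs);          % Play the tone through the speaker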

There are numerous ways to classify audio signals. If we consider the source of audio signals, we can classify them into two categories:

If we consider repeated patterns within audio signals, we can classify them into another two categories:

In principle, we can divide each short segment (also known as a frame, typically about 20 ms long) of human speech into two types:

Voiced sounds: produced with vibration of the vocal cords, such as the vowel "ay" in "sunday".
Unvoiced sounds: produced without vibration of the vocal cords, such as the consonant "s" in "sunday".

It is very easy to distinguish between these two types of sound. When you pronounce an utterance, just put your hand on your throat to see if you can feel the vibration of your vocal cords. If yes, the sound is voiced; otherwise it is unvoiced. You can also observe the waveform to see if you can identify fundamental periods. If yes, the sound is voiced; otherwise it is unvoiced.

The following figure shows the voiced sound of "ay" in the utterance "sunday".

Example 1: voicedFrame01.m
figure;
waveFile='sunday.wav';
au=myAudioRead(waveFile); y=au.signal; fs=au.fs; nbits=au.nbits;
y=y*2^nbits/2;          % Restore the signal to its original integer range
% Plot the whole utterance
subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y);
axis([min(time), max(time), -2^nbits/2, 2^nbits/2]);
xlabel('Time (seconds)'); ylabel('Amplitude');
title('Waveforms of "sunday"');
% Mark a 512-point frame within the voiced "ay"
frameSize=512;
index1=0.606*fs;
index2=index1+frameSize-1;
line(time(index1)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
line(time(index2)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
% Plot a closeup of the selected frame
subplot(2,1,2);
time2=time(index1:index2);
y2=y(index1:index2);
plot(time2, y2, '.-');
axis([min(time2), max(time2), -2^nbits/2, 2^nbits/2]);
xlabel('Time (seconds)'); ylabel('Amplitude');
title('Waveforms of the voiced "ay" in "sunday"');

You can easily identify the fundamental period in the closeup plot.
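
Beyond visual inspection, you can also estimate the fundamental period numerically. The following sketch (not part of the original examples) reuses the frame y2 and the sample rate fs from Example 1, assumes a typical pitch range of roughly 80 to 400 Hz, and uses xcorr from the Signal Processing Toolbox to locate the strongest autocorrelation peak within that range:

% A minimal sketch: estimate the fundamental period of the voiced frame
% via autocorrelation. Assumes y2 and fs from Example 1 are in the workspace,
% and that the pitch lies roughly between 80 and 400 Hz (an assumption).
frameSize=length(y2);
acf=xcorr(y2, 'coeff');       % Normalized autocorrelation
acf=acf(frameSize:end);       % Keep non-negative lags; acf(k+1) is lag k
minLag=round(fs/400);         % Smallest lag to search (pitch <= 400 Hz)
maxLag=round(fs/80);          % Largest lag to search (pitch >= 80 Hz)
[~, idx]=max(acf(minLag+1:maxLag+1));
lag=minLag+idx-1;             % Lag (in samples) of the strongest peak
fprintf('Estimated fundamental period = %.4f s (pitch about %.1f Hz)\n', lag/fs, fs/lag);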

On the other hand, we can also observe the unvoiced sound of "s" in the utterance "sunday", as shown in the following example:

Example 2: unvoicedFrame01.m
waveFile='sunday.wav';
au=myAudioRead(waveFile); y=au.signal; fs=au.fs; nbits=au.nbits;
y=y*2^nbits/2;          % Restore the signal to its original integer range
% Plot the whole utterance
subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y);
axis([min(time), max(time), -2^nbits/2, 2^nbits/2]);
xlabel('Time (seconds)'); ylabel('Amplitude');
title('Waveforms of "sunday"');
% Mark a 512-point frame within the unvoiced "s"
frameSize=512;
index1=0.18*fs;
index2=index1+frameSize-1;
line(time(index1)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
line(time(index2)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
% Plot a closeup of the selected frame
subplot(2,1,2);
time2=time(index1:index2);
y2=y(index1:index2);
plot(time2, y2, '.-');
axis([min(time2), max(time2), -inf inf]);
xlabel('Time (seconds)'); ylabel('Amplitude');
title('Waveforms of the unvoiced "s" in "sunday"');

In contrast, there is no fundamental period and the waveform is noise-like.
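
One simple numerical cue that reflects this contrast is the zero-crossing rate, which tends to be much higher for noise-like unvoiced frames than for voiced ones. The sketch below is only an illustration (the frame positions 0.606 s and 0.18 s are taken from Examples 1 and 2, and the zero-crossing count is approximated by counting sign changes):

% A minimal sketch: compare zero-crossing counts of the voiced and unvoiced
% frames used in Examples 1 and 2 (frame positions taken from those examples).
au=myAudioRead('sunday.wav'); y=au.signal; fs=au.fs;
frameSize=512;
voicedFrame=y(round(0.606*fs)+(0:frameSize-1));      % Voiced "ay"
unvoicedFrame=y(round(0.18*fs)+(0:frameSize-1));     % Unvoiced "s"
zcrVoiced=sum(abs(diff(sign(voicedFrame))))/2;       % Approximate zero-crossing count
zcrUnvoiced=sum(abs(diff(sign(unvoicedFrame))))/2;
fprintf('Zero crossings per frame: voiced = %g, unvoiced = %g\n', zcrVoiced, zcrUnvoiced);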

Hint
You can also use CoolEdit for simple recording, playback, observation, and processing of audio signals.

Audio signals actually represent air pressure as a function of time, which is continuous in both time and amplitude. When we want to digitize such signals for storage in a computer, there are several parameters to consider, including the sample rate, the bit resolution, and the number of channels.
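
As a quick check, these parameters can be inspected directly in MATLAB. The following sketch (not part of the original examples) uses the built-in audioinfo function instead of the myAudioRead utility used elsewhere in this section:

% A minimal sketch: inspect the digitization parameters of a wave file
% using MATLAB's built-in audioinfo (assuming sunday.wav is on the path).
info=audioinfo('sunday.wav');
fprintf('Sample rate    = %d Hz\n', info.SampleRate);
fprintf('Bit resolution = %d bits\n', info.BitsPerSample);
fprintf('Channels       = %d\n', info.NumChannels);
fprintf('Sample count   = %d\n', info.TotalSamples);
fprintf('Duration       = %.2f seconds\n', info.Duration);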

Let us take my utterance of "sunday" as an example. It is a mono recording with a sample rate of 16000 Hz (16 kHz) and a bit resolution of 16 bits (2 bytes per sample). It contains 15716 sample points, corresponding to a duration of 15716/16000 ≈ 0.98 seconds. Therefore the file size is about 15716*2 = 31432 bytes ≈ 31.4 KB. In fact, the file size for storing uncompressed audio signals is usually quite large. For instance, audio at CD quality (a 44.1 kHz sample rate, 16 bits per sample, two channels) requires 44100*2*2 = 176,400 bytes per second, or roughly 10 MB per minute.
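
The arithmetic above can be reproduced in a few lines; the following sketch simply recomputes the numbers quoted in the text (it does not read any file):

% A minimal sketch: storage requirements of uncompressed audio.
% The numbers for "sunday.wav" are those quoted in the text above.
sampleCount=15716; fs=16000; bitResolution=16; channelCount=1;
duration=sampleCount/fs;                               % About 0.98 seconds
fileSize=sampleCount*bitResolution/8*channelCount;     % About 31432 bytes
fprintf('sunday.wav: %.2f seconds, %d bytes\n', duration, fileSize);
% CD-quality audio: 44.1 kHz, 16 bits per sample, stereo
bytesPerSecond=44100*16/8*2;                           % 176400 bytes per second
fprintf('CD quality: about %.1f MB per minute\n', bytesPerSecond*60/2^20);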

