3-1 Introduction to Audio Signals (音訊基本介紹)

Audio signals generally refer to signals that are audible to humans. They usually come from a sound source that vibrates within the audible frequency range. The vibrations alternately compress and stretch the surrounding air, forming pressure waves that travel at about 340 meters per second. When these waves reach the ear, the eardrum senses the pressure variations, and the inner ear converts them into nerve signals that the brain then interprets.

There are numerous ways to classify audio signals. If we consider the source of audio signals, we can classify them into two categories:

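If we consider whether an audio signal contains repeated patterns, we can classify it into another two categories: periodic sounds, whose waveforms repeat a basic pattern, and aperiodic sounds, whose waveforms do not.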

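In principle, each short segment of human voice (called a frame, about 20 ms long) can be classified into one of two types, depending on whether it has a pitch. Voiced sounds are produced while the vocal cords vibrate; they have a pitch, and their waveforms show fundamental periods. Unvoiced sounds are produced without vocal-cord vibration and have no pitch.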
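
To show what a 20 ms frame means in samples, here is a minimal MATLAB sketch that splits a signal into non-overlapping 20 ms frames. It assumes a 16 kHz sample rate and a length of 15716 samples, which are the figures quoted later for the "sunday" recording. A synthetic sine wave stands in for the real file. The variable names are illustrative only, and practical systems often use overlapping frames instead.

```matlab
% Split a signal into non-overlapping 20 ms frames (illustrative sketch)
fs = 16000;                          % assumed sample rate (Hz)
frameDuration = 0.02;                % frame length in seconds (about 20 ms)
frameSize = round(frameDuration*fs); % 320 samples per frame at 16 kHz
y = sin(2*pi*200*(0:15715)'/fs);     % synthetic stand-in for a 15716-sample recording
frameNum = floor(length(y)/frameSize);                          % leftover samples at the end are dropped
frames = reshape(y(1:frameNum*frameSize), frameSize, frameNum); % one frame per column
fprintf('%d frames of %d samples each\n', frameNum, frameSize);
```

At 16 kHz, this gives 49 frames of 320 samples, and the last 36 samples are discarded.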

It is easy to distinguish between these two types of sound. When you pronounce an utterance, put your hand on your throat and check whether you feel your vocal cords vibrate. If you do, the sound is voiced; otherwise, it is unvoiced. You can also examine the waveform to see whether you can identify fundamental periods. If you can, the sound is voiced; otherwise, it is unvoiced.

The following figure shows the voiced sound of "ay" in the utterance "sunday".

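Example 1: voicedFrame01.m

```matlab
% Plot "sunday" and zoom in on a 512-sample frame of the voiced "ay".
% myAudioRead is not a built-in MATLAB function (presumably from the book's toolbox);
% it returns a struct with fields signal, fs, and nbits.
figure;
waveFile='sunday.wav';
au=myAudioRead(waveFile);
y=au.signal; fs=au.fs; nbits=au.nbits;
y=y*2^nbits/2;                      % scale normalized samples to integer amplitudes
subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y);
axis([min(time), max(time), -2^nbits/2, 2^nbits/2]);
xlabel('Time (seconds)'); ylabel('Amplitude'); title('Waveforms of "sunday"');
frameSize=512;
index1=round(0.606*fs);             % frame start at 0.606 s; round() guarantees an integer index
index2=index1+frameSize-1;
% Red vertical lines mark the frame boundaries in the full waveform
line(time(index1)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
line(time(index2)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
subplot(2,1,2);
time2=time(index1:index2);
y2=y(index1:index2);
plot(time2, y2, '.-');
axis([min(time2), max(time2), -2^nbits/2, 2^nbits/2]);
xlabel('Time (seconds)'); ylabel('Amplitude'); title('Waveforms of the voiced "ay" in "sunday"');
```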

You can easily identify the fundamental period in the close-up plot.

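The fundamental period can also be found numerically instead of by eye. One common approach, which is not the method used in the example above, is to find the lag at which the frame best matches a shifted copy of itself. The MATLAB sketch below does this with a normalized autocorrelation. It uses a synthetic 200 Hz frame in place of the real "ay" frame. The following are assumptions for illustration: the 50 to 400 Hz search range, the rule of taking the first local peak above 0.5, and all variable names.

```matlab
% Estimate the fundamental period of a voiced frame from its autocorrelation (sketch)
fs = 16000;                                   % assumed sample rate (Hz)
n = (0:511)';                                 % a 512-sample frame, as in Example 1
frame = sin(2*pi*200*n/fs) + 0.5*sin(2*pi*400*n/fs);  % synthetic voiced frame, f0 = 200 Hz
minLag = round(fs/400);                       % shortest period searched (400 Hz)
maxLag = round(fs/50);                        % longest period searched (50 Hz)
score = zeros(maxLag, 1);
for lag = minLag:maxLag
    a = frame(1:end-lag);                     % original frame
    b = frame(1+lag:end);                     % frame shifted by lag samples
    score(lag) = sum(a.*b)/sqrt(sum(a.^2)*sum(b.^2));  % normalized correlation in [-1, 1]
end
% Take the first local maximum above an assumed threshold of 0.5;
% if none exists, the frame is treated as having no pitch (period stays NaN)
period = NaN;
for lag = minLag+1:maxLag-1
    if score(lag) > 0.5 && score(lag) > score(lag-1) && score(lag) >= score(lag+1)
        period = lag;
        break;
    end
end
f0 = fs/period;
fprintf('period = %d samples, f0 = %.1f Hz\n', period, f0);
```

The code takes the first peak rather than the global maximum. A perfectly periodic frame matches itself equally well at 80, 160, 240, and 320 samples, so a bare maximum could report a multiple of the true period.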

On the other hand, we can also observe the unvoiced sound of "s" in the utterance "sunday", as shown in the following example:

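Example 2: unvoicedFrame01.m

```matlab
% Plot "sunday" and zoom in on a 512-sample frame of the unvoiced "s".
% myAudioRead is not a built-in MATLAB function (presumably from the book's toolbox).
waveFile='sunday.wav';
au=myAudioRead(waveFile);
y=au.signal; fs=au.fs; nbits=au.nbits;
y=y*2^nbits/2;                      % scale normalized samples to integer amplitudes
subplot(2,1,1);
time=(1:length(y))/fs;
plot(time, y);
axis([min(time), max(time), -2^nbits/2, 2^nbits/2]);
xlabel('Time (seconds)'); ylabel('Amplitude'); title('Waveforms of "sunday"');
frameSize=512;
index1=round(0.18*fs);              % frame start at 0.18 s; round() guarantees an integer index
index2=index1+frameSize-1;
% Red vertical lines mark the frame boundaries in the full waveform
line(time(index1)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
line(time(index2)*[1, 1], 2^nbits/2*[-1 1], 'color', 'r');
subplot(2,1,2);
time2=time(index1:index2);
y2=y(index1:index2);
plot(time2, y2, '.-');
axis([min(time2), max(time2), -inf inf]);  % let MATLAB choose the vertical range
xlabel('Time (seconds)'); ylabel('Amplitude'); title('Waveforms of the unvoiced "s" in "sunday"');
```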

In contrast, the close-up plot shows no fundamental period: the waveform is noise-like and aperiodic.

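One simple way to quantify "noise-like" is the zero-crossing rate: the fraction of adjacent sample pairs whose signs differ. This heuristic is not part of the text above. The MATLAB sketch below compares a synthetic 200 Hz frame, standing in for a voiced frame, with Gaussian white noise, standing in for the unvoiced "s". Both signals and the variable names are assumptions for illustration.

```matlab
% Compare zero-crossing rates of a periodic frame and a noise-like frame (sketch)
fs = 16000;                                   % assumed sample rate (Hz)
n = (0:511)';                                 % 512-sample frames, as in the examples
voicedLike = sin(2*pi*200*n/fs + 0.1);        % synthetic periodic frame (200 Hz)
unvoicedLike = randn(512, 1);                 % white noise as a stand-in for "s"
% Zero-crossing rate: fraction of adjacent sample pairs with opposite signs
zcr = @(x) mean(x(1:end-1).*x(2:end) < 0);
zcrVoiced = zcr(voicedLike);
zcrUnvoiced = zcr(unvoicedLike);
fprintf('ZCR: periodic frame %.3f, noise frame %.3f\n', zcrVoiced, zcrUnvoiced);
```

The 200 Hz frame crosses zero only 12 times in 512 samples, a rate of about 0.02. White noise changes sign on roughly half of all sample pairs. Real speech frames are less clear-cut than these idealized signals.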

Hint
You can also use CoolEdit for simple recording, playback, observation, and processing of audio signals.

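Audio signals actually represent air pressure as a function of time, which is continuous in both time and amplitude. To store such a signal in a computer, we must first digitize it. Several parameters must be considered when doing so, including the sample rate (the number of samples per second), the bit resolution (the number of bits per sample), and the number of channels (for example, mono or stereo).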

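Digitization happens in two steps: sampling in time at the sample rate, then quantizing each amplitude to one of 2^nbits integer levels. The MATLAB sketch below illustrates both steps on a single channel. It assumes a 440 Hz tone with amplitude 0.8 as a stand-in for a continuous pressure signal. It scales by 2^(nbits-1), the same factor as `2^nbits/2` in the examples above.

```matlab
% Digitize a "continuous" tone: sample in time, then quantize the amplitude (sketch)
x = @(t) 0.8*sin(2*pi*440*t);          % stand-in for a continuous pressure signal
fs = 16000;                            % sample rate: samples per second
nbits = 16;                            % bit resolution: bits per sample
duration = 0.01;                       % 10 ms of signal
t = (0:round(duration*fs)-1)/fs;       % sampling instants
samples = x(t);                        % sampled, but amplitudes are still real-valued
maxInt = 2^(nbits-1);                  % 32768 for 16-bit audio
q = round(samples*maxInt);             % quantize to integer levels
q = min(max(q, -maxInt), maxInt-1);    % clip to the signed 16-bit range
err = max(abs(q/maxInt - samples));    % worst-case quantization error (normalized units)
fprintf('%d samples, max error %.2e (half step = %.2e)\n', numel(q), err, 0.5/maxInt);
```

Raising the sample rate captures faster variations in time. Each extra bit of resolution halves the quantization step.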

Take my recording of "sunday" as an example. It is a mono recording with a sample rate of 16000 Hz (16 kHz) and a bit resolution of 16 bits (2 bytes per sample). It contains 15716 sample points, corresponding to a duration of 15716/16000 ≈ 0.98 seconds. The file size is therefore about 15716*2 = 31432 bytes ≈ 31.4 KB. In fact, audio files are usually quite large without compression. For instance:

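The size arithmetic for "sunday" can be reproduced with a few lines of MATLAB. The sketch below uses only the parameters stated above. It counts the sample data alone, so a real file would also carry a small header. The per-minute figure extends the same settings to one minute as an illustration.

```matlab
% Reproduce the size arithmetic for the "sunday" recording (uncompressed samples only)
fs = 16000;                % sample rate (Hz)
nbits = 16;                % bit resolution (bits per sample)
channels = 1;              % mono
sampleNum = 15716;         % number of sample points per channel
duration = sampleNum/fs;                        % about 0.98 seconds
bytes = sampleNum*channels*nbits/8;             % 31432 bytes
bytesPerMinute = fs*channels*(nbits/8)*60;      % 1920000 bytes at the same settings
fprintf('duration = %.2f s, size = %d bytes (%.1f KB)\n', duration, bytes, bytes/1000);
fprintf('one minute at these settings = %d bytes\n', bytesPerMinute);
```

The size grows linearly with each parameter. Doubling the sample rate, the bit resolution, or the number of channels doubles the storage needed.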


Audio Signal Processing and Recognition (音訊處理與辨識)