3-3 Human Voice Production

The process from human voice production to voice recognition involves the following steps:

  1. Rapid opening and closing of the vocal cords (or glottis), which sets the airflow vibrating.
  2. Resonance in the pharyngeal, nasal, and oral cavities.
  3. Vibration of the air.
  4. Vibration of the listener's eardrum (or tympanum).
  5. Reception by the nerves of the inner ear.
  6. Recognition by the brain.

The following diagram illustrates the production mechanism of the human voice.

The production mechanism of human voices

Owing to the muscle tension of the glottis and the air pushed up from the lungs, the vocal cords open and close very quickly. The resulting alternation of compressed and rarefied air is the source of the human voice. This vibration is then shaped by the resonances of the pharyngeal, nasal, and oral cavities, which produces sounds of different timbre.

The following figure shows the airflow velocity around the glottis and the resulting voice signal measured near the mouth.


Airflow velocity around the glottis and the resulting voice signal

You can observe the movement of the vocal cords at the following link:

http://www.humnet.ucla.edu/humnet/linguistics/faciliti/demos/vocalfolds/vocalfolds.htm

In fact, capturing the movement of the vocal cords is not easy because they vibrate at high frequency, so a high-speed camera is needed, for instance:

http://www.kayelemetrics.com/Product%20Info/9700/9700.htm

We can describe human voice production with a source-filter model, where the source is the airflow pulsed by the vocal cords and the filter comprises the pharyngeal, nasal, and oral cavities (including the shape of the mouth). The following figure shows a representative spectrum for each stage:


Source-filter model and the corresponding spectra

Expressed as a mathematical model, the source-filter view of human voice production can be drawn as the following block diagram:


Block diagram representation of the source-filter model
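To make the source-filter idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the model drawn in the figure: the source is idealized as an impulse train (one pulse per glottal cycle), and the whole vocal tract is reduced to a single two-pole resonator. The function names, the 100 Hz pitch, and the 700 Hz resonance with 100 Hz bandwidth are all hypothetical values chosen for the example.

```python
import math

def impulse_train(f0, fs, duration):
    """Idealized glottal source: one unit pulse per pitch period."""
    period = round(fs / f0)            # samples per pitch period
    n_samples = round(fs * duration)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, fc, bw, fs):
    """Two-pole resonator standing in for a single vocal-tract resonance.

    Implements y[n] = x[n] + a1*y[n-1] + a2*y[n-2].
    """
    r = math.exp(-math.pi * bw / fs)   # pole radius set by the bandwidth
    theta = 2.0 * math.pi * fc / fs    # pole angle set by the centre frequency
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y = []
    y1 = y2 = 0.0                      # previous two output samples
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

fs = 8000                              # sampling rate in Hz (assumed)
source = impulse_train(f0=100, fs=fs, duration=0.1)   # the "source" stage
voice = resonator(source, fc=700, bw=100, fs=fs)      # the "filter" stage
```

Changing `f0` alters only the spacing of the pulses (the pitch), while changing `fc` and `bw` alters only the resonance (the timbre), which is the separation the source-filter model is meant to express. A more realistic sketch would cascade several resonators, one per resonance.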

In general, regular vibration of the glottis generates quasi-periodic voiced sounds. If the source is instead an irregular, noise-like airflow, we get unvoiced sounds. Take the utterance "six" as an example:


Unvoiced and voiced sounds

We can clearly observe that "s" and "k" are unvoiced sounds, while "i" is the only voiced sound.
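The difference between a quasi-periodic source and a noise-like one can also be seen numerically. The sketch below uses the zero-crossing rate, a common heuristic that this section does not itself introduce, on two synthetic stand-ins: a 150 Hz sine for a voiced frame and uniform white noise for an unvoiced frame. These are not recordings of "six", and the threshold `ZCR_THRESHOLD` is a hypothetical value picked for this example only.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)

fs = 8000                    # sampling rate in Hz (assumed)
n = 800                      # 0.1 s of samples
rng = random.Random(0)       # fixed seed so the example is repeatable

# Stand-in for a voiced frame: a regular, periodic waveform.
voiced_like = [math.sin(2.0 * math.pi * 150.0 * i / fs) for i in range(n)]
# Stand-in for an unvoiced frame: irregular, noise-like samples.
unvoiced_like = [rng.uniform(-1.0, 1.0) for _ in range(n)]

ZCR_THRESHOLD = 0.25         # hypothetical decision threshold

def label(frame):
    """Call a frame unvoiced when it changes sign often, voiced otherwise."""
    return "unvoiced" if zero_crossing_rate(frame) > ZCR_THRESHOLD else "voiced"

print(label(voiced_like), label(unvoiced_like))
```

The periodic frame changes sign only twice per cycle, so its rate stays low, whereas the noise changes sign on roughly half of all sample pairs; the script therefore prints `voiced unvoiced`. Real speech is less clear-cut, and a practical detector would usually combine this measure with frame energy or a periodicity estimate.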

In Mandarin, almost all unvoiced sounds occur at the beginning of a syllable rather than at the end. Take the utterance of "清" in "清華大學" as an example:

  1. With no vibration of the glottis, close your teeth and push your tongue tip forward against the lower teeth, so that a jet of air produces the unvoiced sound "ㄑ".
  2. Keep almost the same position but start the glottis vibrating to pronounce the voiced "ㄧ".
  3. Keep the glottis vibrating but retract your tongue to pronounce the final voiced "ㄥ".

Hint
To check whether your glottis is vibrating, just put your hand on your throat and you can feel the vibration.

Here are some terms in both English and Chinese for your reference:

  1. Cochlea:耳蝸
  2. Phoneme:音素、音位
  3. Phonics:聲學;聲音基礎教學法(以聲音為基礎進而教拼字的教學法)
  4. Phonetics:語音學
  5. Phonology:音系學、語音體系
  6. Prosody:韻律學;作詩法
  7. Syllable:音節
  8. Tone:音調
  9. Alveolar:齒槽音
  10. Silence:靜音
  11. Noise:雜訊
  12. Glottis:聲門
  13. Larynx:喉頭
  14. Pharynx:咽頭
  15. Pharyngeal:咽部的,喉音的
  16. Velum:軟顎
  17. Vocal cords:聲帶
  18. Esophagus:食管
  19. Diaphragm:橫隔膜
  20. Trachea:氣管

Audio Signal Processing and Recognition (音訊處理與辨識)