3-3 Human Voice Production


The process from human voice production to voice recognition involves the following steps:

  1. Rapid opening and closing of the vocal cords (or glottis), which generates vibration in the airflow.
  2. Resonance of the pharyngeal cavity, nasal cavity, and oral cavity.
  3. Propagation of the vibration through the air.
  4. Vibration of the eardrum (or tympanum).
  5. Reception by the inner ear.
  6. Recognition by the brain.
The following diagram illustrates the mechanism of human voice production.

The production mechanism of human voices.
Driven by the air pushed up from the lungs, the vocal cords open and close very quickly, which generates vibrations in the airflow. These vibrations are then modulated by the resonances of the pharyngeal, nasal, and oral cavities, which give each voice its timbre. In other words, the vibrating vocal cords supply the raw sound, and the shape of the cavities it passes through determines how that sound is colored.

The following figure demonstrates the airflow velocity around the glottis and the voice signals measured around the mouth.

Airflow velocity around the glottis and the resulting voice signals
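The quasi-periodic airflow in the figure above can be approximated with a simple analytic pulse shape. The sketch below uses the Rosenberg glottal-pulse model, a standard textbook approximation that is not necessarily the model behind the figure: a raised-cosine opening phase, a quarter-cosine closing phase, and zero flow while the glottis is closed. The function names and the opening/closing ratios (0.4 and 0.16 of the period) are illustrative assumptions.

```python
import math


def rosenberg_pulse(n1: int, n2: int) -> list[float]:
    """One glottal pulse of the Rosenberg glottal-flow model.

    n1: number of samples in the opening phase (flow rises from 0 to 1)
    n2: number of samples in the closing phase (flow falls from 1 toward 0)
    """
    opening = [0.5 * (1 - math.cos(math.pi * n / n1)) for n in range(n1 + 1)]
    closing = [math.cos(math.pi * n / (2 * n2)) for n in range(1, n2 + 1)]
    return opening + closing


def glottal_flow(f0: float, fs: int, duration: float,
                 open_ratio: float = 0.4, close_ratio: float = 0.16) -> list[float]:
    """Quasi-periodic glottal airflow: one pulse every 1/f0 seconds.

    open_ratio and close_ratio (fractions of the period) are illustrative
    assumptions; for the rest of each period the glottis is closed (zero flow).
    """
    period = int(round(fs / f0))
    n1 = max(1, int(round(open_ratio * period)))
    n2 = max(1, int(round(close_ratio * period)))
    pulse = rosenberg_pulse(n1, n2)
    if len(pulse) > period:
        raise ValueError("opening + closing phases exceed one period")
    cycle = pulse + [0.0] * (period - len(pulse))
    n_samples = int(round(duration * fs))
    return [cycle[i % period] for i in range(n_samples)]


# Usage: a 200 Hz voice sampled at 8 kHz for 20 ms (4 glottal cycles).
flow = glottal_flow(f0=200, fs=8000, duration=0.02)
print(len(flow), max(flow))  # prints: 160 1.0
```

Raising `f0` shortens the period, so the vocal cords open and close more often per second; this rate is what we hear as pitch.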

You can observe the movement of the vocal cords at the following link:

http://www.humnet.ucla.edu/humnet/linguistics/faciliti/demos/vocalfolds/vocalfolds.htm

In fact, it is not easy to capture the movement of the vocal cords, since they vibrate very fast (on the order of a hundred to several hundred times per second during speech). High-speed cameras are therefore needed for this purpose, for instance:

http://www.kayelemetrics.com/Product%20Info/9700/9700.htm
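A little arithmetic shows why ordinary video is not enough. If the vocal cords vibrate at f0 cycles per second and the camera records F frames per second, each cycle is covered by only F / f0 frames. The 200 Hz vibration rate and the target of 20 frames per cycle below are assumed example values, not specifications of any particular camera.

```python
def frames_per_cycle(camera_fps: float, f0_hz: float) -> float:
    """Number of video frames captured during one vocal-fold vibration cycle."""
    return camera_fps / f0_hz


def required_fps(f0_hz: float, frames_per_cycle_wanted: int) -> float:
    """Frame rate needed to sample each vibration cycle with the given number of frames."""
    return f0_hz * frames_per_cycle_wanted


# Assumed example: vocal cords vibrating at 200 Hz.
print(frames_per_cycle(30, 200))  # ordinary 30 fps video: 0.15 frames per cycle
print(required_fps(200, 20))      # 20 frames per cycle needs 4000 fps
```

With fewer than one frame per cycle, a 30 fps camera cannot follow a single opening and closing; the recorded motion is aliased instead.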

We can view human voice production as a source-filter model, where the source is the airflow generated by the vibrating vocal cords and the filter consists of the pharyngeal, nasal, and oral cavities. The following figure shows the representative spectrum at each stage:

Source-filter model and the corresponding spectra

We can also use the following block diagram to represent the source-filter model of human voice production:

Block diagram representation of source-filter model
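The block diagram can be turned into a minimal synthesis sketch. Here the source is assumed to be an impulse train at the pitch frequency, and the filter is assumed to be a cascade of two-pole resonators, one per formant (a resonance of the vocal tract). The formant frequencies and bandwidths below are rough, assumed values for an /a/-like vowel, and all function names are hypothetical.

```python
import math


def impulse_train(f0: float, fs: int, n_samples: int) -> list[float]:
    """Source: one unit impulse per glottal period (a crude voiced excitation)."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(x: list[float], freq: float, bw: float, fs: int) -> list[float]:
    """Filter: a two-pole resonator centred at `freq` Hz with bandwidth `bw` Hz.

    Implements y[n] = x[n] + a1*y[n-1] + a2*y[n-2], whose poles lie at
    r*exp(+/- j*theta) with r set by the bandwidth and theta by the frequency.
    """
    r = math.exp(-math.pi * bw / fs)
    theta = 2 * math.pi * freq / fs
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y


def source_filter(f0: float, formants: list[tuple[float, float]],
                  fs: int = 8000, duration: float = 0.05) -> list[float]:
    """Excite a cascade of resonators (one per formant) with an impulse train."""
    signal = impulse_train(f0, fs, int(round(fs * duration)))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal


# Assumed, roughly /a/-like formant frequencies and bandwidths in Hz.
vowel_a = source_filter(f0=120, formants=[(730, 80), (1090, 90), (2440, 120)])
print(len(vowel_a))  # 400 samples = 50 ms at 8 kHz
```

Keeping the source and the filter as separate functions mirrors the model: changing `f0` alters the pitch without touching the formants, and changing the formants alters the vowel without touching the pitch. Replacing the impulse train with random noise would give an unvoiced-like excitation instead.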

In general, regular vibration of the glottis generates quasi-periodic voiced sounds. On the other hand, if the source is irregular (turbulent) airflow, the result is an unvoiced sound. Take the utterance of "six" as an example:

Unvoiced and voiced sounds

We can clearly observe that "s" and "k" are unvoiced sounds, while "i" is a voiced sound.
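One common heuristic (not necessarily the method used to produce the figure) separates the two classes frame by frame: noise-like unvoiced frames such as "s" cross zero often, while periodic voiced frames such as "i" cross zero rarely. The sketch below assumes non-overlapping frames, a zero-crossing-rate threshold of 0.3, and an energy floor for silence; these thresholds are illustrative, and the synthetic signals stand in for a real recording of "six".

```python
import math
import random


def frame_features(x: list[float], frame_size: int) -> list[tuple[float, float]]:
    """Per non-overlapping frame: (mean energy, zero-crossing rate)."""
    feats = []
    for start in range(0, len(x) - frame_size + 1, frame_size):
        frame = x[start:start + frame_size]
        energy = sum(v * v for v in frame) / frame_size
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_size - 1)))
    return feats


def label_frames(feats, zcr_threshold=0.3, silence_energy=1e-4):
    """Heuristic labels: low energy -> silence, high ZCR -> unvoiced, else voiced."""
    labels = []
    for energy, zcr in feats:
        if energy < silence_energy:
            labels.append("silence")
        elif zcr > zcr_threshold:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels


# Synthetic stand-ins (assumptions, not a real recording of "six"), fs = 8 kHz:
fs, n = 8000, 256
rng = random.Random(0)
hiss = [rng.uniform(-0.3, 0.3) for _ in range(n)]                       # noise-like, "s"-ish
vowel = [0.8 * math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]  # periodic, "i"-ish
pause = [0.0] * n
print(label_frames(frame_features(hiss + vowel + pause, n)))
# ['unvoiced', 'voiced', 'silence']
```

On real speech the thresholds would need tuning, and energy and zero-crossing rate are usually combined with other features.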

For Mandarin, almost all unvoiced sounds occur at the beginning of a syllable. Take the syllable "清" (qing), as in "清華大學" (Tsing Hua University), for example:

  1. No vibration from the glottis. Close your teeth and push the tip of your tongue forward against the lower teeth to produce the unvoiced sound "ㄑ" (q) with a jet of airflow.
  2. Keep almost the same position but start the glottis vibration to pronounce the voiced "ㄧ" (i).
  3. Keep the glottis vibrating but retract your tongue to pronounce the final voiced "ㄥ" (ng).
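The observation that Mandarin unvoiced sounds sit at the start of a syllable can be expressed as a small lookup on pinyin spellings: every final is voiced, and among the initials only m, n, l, and r are voiced. The sketch below assumes toneless pinyin input and ignores rare interjections and connected-speech effects; `INITIALS`, `VOICED_INITIALS`, and `split_syllable` are hypothetical names.

```python
# Pinyin initials of Mandarin. Two-letter initials come first so that
# "zh", "ch", "sh" are matched before "z", "c", "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]
# Among the initials only these are voiced; every final is voiced.
VOICED_INITIALS = {"m", "n", "l", "r"}


def split_syllable(pinyin: str) -> list[tuple[str, str]]:
    """Split a toneless pinyin syllable into (segment, voicing) pairs."""
    for initial in INITIALS:
        if pinyin.startswith(initial) and len(pinyin) > len(initial):
            voicing = "voiced" if initial in VOICED_INITIALS else "unvoiced"
            return [(initial, voicing), (pinyin[len(initial):], "voiced")]
    # Zero-initial syllables such as "an", "er", "yi", or "wo": entirely voiced.
    return [(pinyin, "voiced")]


print(split_syllable("qing"))  # 清: [('q', 'unvoiced'), ('ing', 'voiced')]
print(split_syllable("ma"))    # [('m', 'voiced'), ('a', 'voiced')]
```

For "qing", the unvoiced part is the initial "q" (ㄑ), and the voiced final "ing" covers both "ㄧ" and "ㄥ".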

If you put your hand on your throat while pronouncing it, you can feel the vibration of the glottis during the voiced parts.

Here are some related terms for your reference:

  1. Cochlea
  2. Phoneme
  3. Phonics
  4. Phonetics
  5. Phonology
  6. Prosody
  7. Syllable
  8. Tone
  9. Alveolar
  10. Silence
  11. Noise
  12. Glottis
  13. Larynx
  14. Pharynx
  15. Pharyngeal
  16. Velum
  17. Vocal cords
  18. Esophagus
  19. Diaphragm
  20. Trachea

Audio Signal Processing and Recognition