3-3 Human Voice Production


The process that runs from the production of a human voice to its recognition by a listener involves the following steps:

  1. The vocal cords (or glottis) rapidly open and close, generating vibrations in the airflow.
  2. The pharyngeal, nasal, and oral cavities resonate.
  3. The vibration propagates through the air.
  4. The eardrum (or tympanum) vibrates.
  5. The inner ear receives the vibration.
  6. The brain recognizes the sound.
The following diagram demonstrates the production mechanism of human voices.

The production mechanism of human voices.
Driven by the pressure at the glottis and the air pushed out of the lungs, the vocal cords open and close very quickly, generating vibrations in the airflow. These vibrations are then modulated by the resonances of the pharyngeal, nasal, and oral cavities, which give your voice its timbre. In other words, the vibrating vocal cords act as the sound source, and the cavities act as resonators that shape it.

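One way to see how the cavities shape timbre is a standard textbook simplification that this section does not itself use: treat the vocal tract as a uniform tube, closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, F_n = (2n - 1) * c / (4 * L). The sketch below assumes a tract length L of 17.5 cm and a speed of sound c of 350 m/s (typical assumed values, not measurements from this text); the function name `tube_resonances` is hypothetical.

```python
def tube_resonances(length_cm=17.5, speed_of_sound_cm_s=35000.0, count=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    length_cm and speed_of_sound_cm_s are assumed, illustrative defaults.
    """
    return [(2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


print(tube_resonances())               # assumed adult-length tract
print(tube_resonances(length_cm=14.0))  # a shorter (hypothetical) tract
```

With these assumptions the first three resonances are 500, 1500, and 2500 Hz, the figures often quoted for a neutral vowel. Shortening the tube to 14 cm raises them to 625, 1875, and 3125 Hz, which is one reason different speakers sound different even at the same pitch.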

The following figure demonstrates the airflow velocity around the glottis and the voice signals measured around the mouth.

Airflow velocity around the glottis and the resulting voice signals
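Glottal airflow typically rises gradually while the vocal cords open, falls more quickly as they close, and stays near zero while they are shut. A common way to synthesize such a waveform, not necessarily the one used for this figure, is Rosenberg's trigonometric glottal pulse model. In the sketch below, `rosenberg_pulse` is a hypothetical helper, and the opening and closing fractions (40% and 16% of the cycle) are assumed illustrative values.

```python
import math


def rosenberg_pulse(period, open_frac=0.4, close_frac=0.16):
    """One cycle of glottal airflow from Rosenberg's trigonometric model.

    period     -- number of samples in one glottal cycle
    open_frac  -- assumed fraction of the cycle spent opening (flow rising)
    close_frac -- assumed fraction of the cycle spent closing (flow falling)
    """
    n1 = max(1, int(round(open_frac * period)))   # length of the opening phase
    n2 = max(1, int(round(close_frac * period)))  # length of the closing phase
    pulse = []
    for n in range(period):
        if n <= n1:
            # Opening phase: flow rises smoothly from 0 to a peak of 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n1)))
        elif n <= n1 + n2:
            # Closing phase: flow drops back toward 0, faster than it rose.
            pulse.append(math.cos(math.pi * (n - n1) / (2.0 * n2)))
        else:
            # Closed phase: the vocal cords are shut, so there is no airflow.
            pulse.append(0.0)
    return pulse


one_cycle = rosenberg_pulse(80)  # 80 samples = 100 Hz at an assumed 8 kHz rate
flow = one_cycle * 5             # five identical cycles of glottal airflow
```

Repeating the cycle, as in `flow`, gives a strictly periodic airflow; real glottal flow is only quasi-periodic, with small cycle-to-cycle variations.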

You can observe the movement of the vocal cords at the following link:

http://www.humnet.ucla.edu/humnet/linguistics/faciliti/demos/vocalfolds/vocalfolds.htm (local copy)

In practice, the movements of the vocal cords are hard to capture because they vibrate at such a high frequency, so high-speed cameras are needed for this purpose, for instance:

http://www.kayelemetrics.com/Product%20Info/9700/9700.htm (local copy)
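A quick calculation shows why ordinary video is not enough. Assuming a fundamental frequency of roughly 120 Hz for an adult male speaking voice (an illustrative figure, not one given in this text), a 30 frames-per-second camera captures only a quarter of a frame per vocal-cord cycle, so individual openings and closings are missed entirely. The hypothetical helpers below do the arithmetic; the target of 10 frames per cycle is an assumption.

```python
def frames_per_cycle(camera_fps, f0_hz):
    """Number of video frames that fall within one vocal-cord vibration cycle."""
    return camera_fps / f0_hz


def required_fps(f0_hz, frames_wanted_per_cycle):
    """Frame rate needed to capture each vibration cycle a given number of times."""
    return f0_hz * frames_wanted_per_cycle


print(frames_per_cycle(30, 120))  # ordinary video: 0.25 frames per cycle
print(required_fps(120, 10))      # 1200 frames per second
print(required_fps(250, 10))      # a higher-pitched voice needs 2500 frames per second
```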

We can think of human voice production as a source-filter model, in which the source is the airflow caused by the vocal cords and the filter consists of the pharyngeal, nasal, and oral cavities. The following figure shows a representative spectrum for each stage:

Source-filter model and the corresponding spectra
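The source-filter idea can be sketched in a few lines: an idealized source (one impulse per glottal period) is passed through a cascade of resonators, each standing in for one resonance of the vocal tract, so the output spectrum is the source spectrum multiplied by the filter's frequency response. This is a minimal sketch under stated assumptions, not the method used to produce the figure: the two-pole resonator is a common formant-synthesis building block, the helper names are hypothetical, and the formant frequencies and bandwidths in `VOWEL_A` are assumed approximate values for an adult male /a/.

```python
import math


def impulse_train(f0, fs, n_samples):
    """Source: one unit impulse per glottal period (an idealized voiced excitation)."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(x, freq, bandwidth, fs):
    """Filter stage: a two-pole digital resonator modelling one formant.

    y[n] = x[n] + a1*y[n-1] + a2*y[n-2], with poles at radius r, angle 2*pi*freq/fs.
    """
    r = math.exp(-math.pi * bandwidth / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1  # shift the two-sample output history
    return y


def synthesize_vowel(f0, formants, fs=8000, duration=0.5):
    """Pass the source through a cascade of formant resonators (the filter)."""
    signal = impulse_train(f0, fs, int(fs * duration))
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, fs)
    return signal


# Assumed (frequency, bandwidth) pairs in Hz, roughly an adult male /a/.
VOWEL_A = [(730, 90), (1090, 110), (2440, 170)]
vowel = synthesize_vowel(100, VOWEL_A)  # 100 Hz pitch, 0.5 s at 8 kHz
```

Changing `f0` alters the pitch while leaving the formants, and hence the vowel quality, unchanged; changing the formant list alters the vowel while keeping the pitch. That separation is what the source-filter model expresses.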

We can also use the following block diagram to represent the source-filter model of human voice production:

Block diagram representation of source-filter model
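The source half of a block diagram like this is often drawn as a switch between two generators, a periodic impulse train for voiced sounds and random noise for unvoiced sounds, followed by a gain. The sketch below assumes that layout (the diagram itself may label its blocks differently); `excitation` and its parameters are hypothetical names, and its output would then be fed to a vocal-tract filter.

```python
import random


def excitation(voiced, n_samples, fs=8000, f0=100.0, gain=1.0, seed=0):
    """Source side of a source-filter model: choose an excitation, then apply a gain.

    voiced=True  -> periodic impulse train at pitch f0 (vibrating glottis)
    voiced=False -> white Gaussian noise (irregular, turbulent airflow)
    """
    if voiced:
        period = int(round(fs / f0))  # samples per glottal cycle
        source = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    else:
        rng = random.Random(seed)  # seeded so results are reproducible
        source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return [gain * s for s in source]


buzz = excitation(True, 800)             # 10 impulses, one every 80 samples
hiss = excitation(False, 800, gain=0.1)  # noise-like source for unvoiced sounds
```

Here `hiss` plays the role of the turbulent airflow behind a sound such as "s", while `buzz` plays the role of the vibrating glottis behind a vowel.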

In general, regular vibration of the glottis generates quasi-periodic voiced sounds. If instead the source is irregular airflow, the result is an unvoiced sound. Take the utterance of "six" as an example:

Unvoiced and voiced sounds

We can clearly observe that "s" and "k" are unvoiced sounds, while "i" is a voiced sound.

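The difference visible in the figure can also be detected automatically. A common heuristic, not described in this section, computes two short-time features per frame: energy, which separates silence from sound, and zero-crossing rate (ZCR), which is high for noise-like unvoiced sounds such as "s" and low for quasi-periodic voiced sounds such as "i". In the sketch below, the helper names, the frame length, and both thresholds are assumed values, and the input is a hypothetical synthetic signal rather than a real recording of "six".

```python
import math
import random


def frame_features(frame):
    """Short-time energy and zero-crossing rate of one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)


def classify_frames(signal, frame_len=256, zcr_threshold=0.3, energy_threshold=1e-4):
    """Label each non-overlapping frame as silence, unvoiced, or voiced."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        energy, zcr = frame_features(signal[start:start + frame_len])
        if energy < energy_threshold:
            labels.append("silence")   # too quiet to be speech
        elif zcr > zcr_threshold:
            labels.append("unvoiced")  # noise-like: many sign changes
        else:
            labels.append("voiced")    # quasi-periodic: few sign changes


    return labels


# Hypothetical test signal: noise ("s"-like), then a 150 Hz tone ("i"-like), then silence.
fs = 8000
rng = random.Random(0)
hiss = [0.1 * rng.gauss(0.0, 1.0) for _ in range(1024)]
vowel = [0.5 * math.sin(2 * math.pi * 150 * n / fs) for n in range(1024)]
silence = [0.0] * 512
labels = classify_frames(hiss + vowel + silence)
print(labels)
```

On real recordings the thresholds must be tuned to the signal level and sampling rate; the sketch only illustrates why the two features separate the three classes.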

For Mandarin, almost all unvoiced sounds occur at the beginning of a syllable. Take the utterance of "清" (qīng) as in "清華大學" (Tsinghua University) as an example:

  1. With no vibration from the glottis, close your teeth and push your tongue tip forward against the lower teeth to generate the unvoiced sound "ㄑ" (pinyin q) with a jet of airflow.
  2. Keep almost the same position but start the glottis vibrating to pronounce the voiced "ㄧ" (pinyin i).
  3. Keep the glottis vibrating but retract your tongue to pronounce the final voiced "ㄥ" (pinyin ng).

Hint
Put your hand on your throat while speaking, and you can feel whether the glottis is vibrating: the vibration is present during voiced sounds and absent during unvoiced ones.

Here are some terms in both English and Chinese for your reference:

  1. Cochlea: 耳蝸
  2. Phoneme: 音素、音位
  3. Phonics: 聲學；以聲音為基礎的識字教學法
  4. Phonetics: 語音學
  5. Phonology: 音韻學、語音體系
  6. Prosody: 韻律學；作詩法
  7. Syllable: 音節
  8. Tone: 音調
  9. Alveolar: 齒槽音
  10. Silence: 靜音
  11. Noise: 雜訊
  12. Glottis: 聲門
  13. Larynx: 喉頭
  14. Pharynx: 咽喉
  15. Pharyngeal: 咽部的，咽音
  16. Velum: 軟顎
  17. Vocal cords: 聲帶
  18. Esophagus: 食道
  19. Diaphragm: 橫隔膜
  20. Trachea: 氣管

Audio Signal Processing and Recognition (音訊處理與辨識)