5-4 Pitch

(Note: the original Chinese version of this page is not kept in sync with the English version.)

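Pitch is another important feature of audio signals. Intuitively, pitch represents how high or low the frequency of a sound is, where the frequency in question is the fundamental frequency, which is the reciprocal of the fundamental period.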

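If we inspect the waveform directly, the fundamental period is not hard to see as long as the sound is stable. Take a 3-second recording of a tuning fork: if we extract a 256-point frame and plot it, the fundamental period is clearly visible, as shown in the following example: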

Example 1: framePitchDisp4tuningFork01.m

    waveFile='tuningFork01.wav';
    au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
    index1=11000;
    frameSize=256;
    index2=index1+frameSize-1;
    frame=y(index1:index2);
    subplot(2,1,1); plot(y); grid on
    xlabel('Sample index'); ylabel('Amplitude'); title(['Waveform of ', waveFile]);
    axis([1, length(y), -1 1]);
    subplot(2,1,2); plot(frame, '.-'); grid on
    xlabel('Sample index within frame'); ylabel('Amplitude');
    point=[7, 226];    % Peaks
    axis([1, length(frame), -1 1]);
    periodCount=6;
    fp=((point(2)-point(1))/periodCount);    % fundamental period
    ff=fs/fp;                                % fundamental frequency
    pitch=69+12*log2(ff/440);
    fprintf('Fundamental period (fp) = (%g-%g)/%g = %g points\n', point(2), point(1), periodCount, fp);
    fprintf('Fundamental frequency (ff) = %g/%g = %g Hz\n', fs, fp, ff);
    fprintf('Pitch = %g semitone\n', pitch);
    % === For plotting arrows, etc
    % ====== Frame boundary
    subplot(211);
    line(index1*[1 1], [-1 1], 'color', 'r', 'linewidth', 1);
    line(index2*[1 1], [-1 1], 'color', 'r', 'linewidth', 1);
    % ====== FP coverage
    subplot(212); line(point, frame(point), 'marker', 'o', 'color', 'red');
    % ====== Axis locations
    subplot(211); loc1=get(gca, 'position');
    subplot(212); loc2=get(gca, 'position');
    % ====== arrow 1
    x1=[loc1(1)+(index1(1)-1)/(length(y)-1)*loc1(3), loc2(1)];
    y1=[loc1(2), loc2(2)+loc2(4)];
    ah=annotation('arrow', x1, y1, 'color', 'r', 'linewidth', 1);
    % ======= arrow 2
    x2=[loc1(1)+(index2-1)/(length(y)-1)*loc1(3), loc2(1)+loc2(3)];
    y2=[loc1(2), loc2(2)+loc2(4)];
    ah=annotation('arrow', x2, y2, 'color', 'r', 'linewidth', 1);
    % ====== Texts indicating start/end indices
    h1=text(point(1), frame(point(1)), [' \leftarrow index=', int2str(point(1))], 'rotation', 30);
    h2=text(point(2), frame(point(2)), [' \leftarrow index=', int2str(point(2))], 'rotation', 30);

Output:

    Fundamental period (fp) = (226-7)/6 = 36.5 points
    Fundamental frequency (ff) = 16000/36.5 = 438.356 Hz
    Pitch = 68.9352 semitone

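In the example above, the red lines in the upper plot mark the position of the frame, and the lower plot shows the 256-point frame itself. The red line in the lower plot spans 6 fundamental periods covering 219 sample points (from index 7 to index 226), so the corresponding fundamental frequency is fs/(219/6) = 16000/36.5 = 438.356 Hz, which is equivalent to 68.9352 semitones. The formula for converting a fundamental frequency to semitones is as follows: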

semitone = 69 + 12*log₂(frequency/440)

In other words, a fundamental frequency of 440 Hz corresponds to a semitone value of 69, which is the "central La" or "A4" on a piano; see the figure below.

Hint

The fundamental frequency of a typical tuning fork is designed to be very close to 440 Hz, so tuning forks are commonly used to fine-tune the pitch of a piano.

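The semitone value produced by the formula above is also the standard used in MIDI music files. The formula also shows that:

Each octave contains 12 semitones (seven white keys and five black keys).
Going up one octave doubles the frequency. For example, the central La is 440 Hz (69 semitones); one octave higher, the frequency becomes 880 Hz (81 semitones).
The human ear's "linear perception" of pitch is proportional to the logarithm of the fundamental frequency.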
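
The properties listed above can be checked numerically. The following is a minimal sketch, not part of the original examples; the helper name `freq2semitone` is a hypothetical name introduced here for illustration.

```matlab
% Hypothetical helper (name is an assumption, not from the original toolbox):
% converts a frequency in Hz to a semitone value (MIDI note number).
freq2semitone = @(f) 69 + 12*log2(f/440);

a4   = freq2semitone(440);          % central La (A4)
a5   = freq2semitone(880);          % one octave above A4
fork = freq2semitone(16000/36.5);   % tuning-fork frame: fs=16000, fp=36.5 points

fprintf('440 Hz -> %g semitones\n', a4);
fprintf('880 Hz -> %g semitones\n', a5);
fprintf('Octave span = %g semitones\n', a5-a4);
fprintf('Tuning fork = %g semitones\n', fork);
```

Doubling the frequency adds exactly 12 semitones, and the tuning-fork value agrees with the pitch printed by Example 1.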

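The sound of a tuning fork is very clean, and its waveform is very close to a sinusoid, so the fundamental period is obvious. For my own recording of the utterance "清華大學資訊系" (the Department of Computer Science at Tsing Hua University), we can zoom in on the syllable "華" and again see the fundamental period clearly, as shown in the following example: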

Example 2: framePitchDisp4speech01.m

    waveFile='csNthu.wav';
    au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
    index1=11050;
    frameSize=512;
    index2=index1+frameSize-1;
    frame=y(index1:index2);
    subplot(2,1,1); plot(y); grid on
    xlabel('Sample index'); ylabel('Amplitude'); title(['Waveform of ', waveFile]);
    axis([1, length(y), -1 1]);
    subplot(2,1,2); plot(frame, '.-'); grid on
    xlabel('Sample index within frame'); ylabel('Amplitude');
    point=[83, 485];    % Peaks
    point=[75, 477];    % Valleys (this assignment overrides the peaks above)
    axis([1, length(frame), -1 1]);
    periodCount=3;
    fp=((point(2)-point(1))/periodCount);    % fundamental period
    ff=fs/fp;                                % fundamental frequency
    pitch=69+12*log2(ff/440);
    fprintf('Fundamental period (fp) = (%g-%g)/%g = %g points\n', point(2), point(1), periodCount, fp);
    fprintf('Fundamental frequency (ff) = %g/%g = %g Hz\n', fs, fp, ff);
    fprintf('Pitch = %g semitone\n', pitch);
    % === For plotting arrows, etc
    % ====== Frame boundary
    subplot(211);
    line(index1*[1 1], [-1 1], 'color', 'r', 'linewidth', 1);
    line(index2*[1 1], [-1 1], 'color', 'r', 'linewidth', 1);
    % ====== FP coverage
    subplot(212); line(point, frame(point), 'marker', 'o', 'color', 'red');
    % ====== Axis locations
    subplot(211); loc1=get(gca, 'position');
    subplot(212); loc2=get(gca, 'position');
    % ====== arrow 1
    x1=[loc1(1)+(index1(1)-1)/(length(y)-1)*loc1(3), loc2(1)];
    y1=[loc1(2), loc2(2)+loc2(4)];
    ah=annotation('arrow', x1, y1, 'color', 'r', 'linewidth', 1);
    % ======= arrow 2
    x2=[loc1(1)+(index2-1)/(length(y)-1)*loc1(3), loc2(1)+loc2(3)];
    y2=[loc1(2), loc2(2)+loc2(4)];
    ah=annotation('arrow', x2, y2, 'color', 'r', 'linewidth', 1);
    % ====== Texts indicating start/end indices
    h1=text(point(1), frame(point(1)), [' \leftarrow index=', int2str(point(1))], 'rotation', -10);
    h2=text(point(2), frame(point(2)), [' \leftarrow index=', int2str(point(2))], 'rotation', -10);

Output:

    Fundamental period (fp) = (477-75)/3 = 134 points
    Fundamental frequency (ff) = 16000/134 = 119.403 Hz
    Pitch = 46.42 semitone

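The lower plot of the example above is a 512-point frame taken from around the final (vowel part) of the syllable "華". The red line spans 3 fundamental periods covering 402 sample points, so the corresponding fundamental frequency is fs/(402/3) = 16000/(402/3) = 119.403 Hz, which is equivalent to 46.420 semitones. This is 22.58 semitones below the central La, close to but not quite two octaves (24 semitones).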

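When examining an audio waveform, the starting point of each fundamental period is called a "pitch mark" (PM). Pitch marks are usually local maxima or local minima of the waveform. In the tuning-fork example above, the two pitch marks we picked are local maxima. In the example of my own voice, the local maxima are not distinct, so we picked two local minima as pitch marks to compute the pitch. Pitch marks are typically used to adjust the pitch of a segment of sound and are important in speech synthesis.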
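
To illustrate how pitch marks at local maxima lead to a pitch value, here is a minimal sketch that does not use the original recordings. It assumes a synthetic 440 Hz sinusoid as a stand-in for the tuning-fork frame, and a simple neighbor comparison as the peak picker; both are assumptions made for illustration only.

```matlab
% Synthetic stand-in for a tuning-fork frame (assumption: pure 440 Hz sinusoid).
fs = 16000;                      % sampling rate in Hz
frameSize = 256;                 % frame length in samples
t = (0:frameSize-1)'/fs;         % time axis in seconds
frame = sin(2*pi*440*t);

% Candidate pitch marks: samples strictly larger than both neighbors.
inner = frame(2:end-1);
isPeak = (inner > frame(1:end-2)) & (inner > frame(3:end));
pm = find(isPeak) + 1;           % +1 maps back to indices of the full frame

% Average spacing between the first and last pitch marks.
periodCount = length(pm) - 1;
fp = (pm(end) - pm(1))/periodCount;   % fundamental period in sample points
ff = fs/fp;                           % fundamental frequency in Hz
pitch = 69 + 12*log2(ff/440);         % semitone value

fprintf('%d pitch marks, fp = %g points, ff = %g Hz, pitch = %g semitones\n', ...
    length(pm), fp, ff, pitch);
```

Averaging over several periods, as the examples above do with `periodCount`, reduces the error caused by pitch marks falling on integer sample indices. On real speech a plain neighbor comparison would pick up many spurious local maxima within each period, so this sketch is only suitable for clean, nearly sinusoidal signals.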

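Because of physiological differences, male and female voices have different pitch ranges. In general:

The pitch range of male voices is about 35 to 72 semitones, corresponding to frequencies of 62 to 523 Hz.
The pitch range of female voices is about 45 to 83 semitones, corresponding to frequencies of 110 to 1000 Hz.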

However, we do not distinguish male from female voices by pitch alone; we also rely on timbre (formants), as explained later.
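
The frequency ranges quoted above follow from inverting the semitone formula, that is, frequency = 440*2^((semitone-69)/12). The following is a minimal sketch; the helper name `semitone2freq` is a hypothetical name introduced here for illustration.

```matlab
% Inverse of semitone = 69 + 12*log2(frequency/440).
% Hypothetical helper (name is an assumption, not from the original toolbox).
semitone2freq = @(s) 440*2.^((s-69)/12);

maleRange   = semitone2freq([35 72]);   % male pitch range in Hz
femaleRange = semitone2freq([45 83]);   % female pitch range in Hz

fprintf('Male:   %.1f ~ %.1f Hz\n', maleRange(1), maleRange(2));
fprintf('Female: %.1f ~ %.1f Hz\n', femaleRange(1), femaleRange(2));
```

The upper female bound of 83 semitones comes out at roughly 988 Hz, which the text rounds to 1000 Hz.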

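Computing pitch by visual inspection is not too hard, but having a computer compute it automatically requires more in-depth study. The various methods of pitch tracking are covered in detail in later chapters.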

Audio Signal Processing and Recognition (音訊處理與辨識)