Chapter 7: Exercises

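To extract pitch from the following singing voices, which one is likely to require a longer frame?

Qi-Tian (啟田), whose voice is rich and magnetic
Xiao-Hu (小虎), whose voice is gentle and graceful
Yu-Sheng (宇升), whose voice is high-pitched and impassioned
Da-Niu (大牛), whose voice is as low as an ox's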

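When processing a piece of music, which of the following is the hardest for a computer to accomplish automatically?

Finding the tempo of the music (BPM, beats per minute)
Removing the bass drum sound
Separating the rhythm guitar and lead guitar tracks
Trimming the song into a ten-second mobile phone ringtone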

(*)Frame size and peak picking of ACF: Suppose we are dealing with an audio signal with a sample rate of 16000 Hz, and the range of human fundamental frequency (in Hz) is [100, 1000].

What is the reasonable minimum frame size (to cover at least two fundamental periods)?
What is the reasonable range of (zero-based) indices in which to search for the ACF peak in order to compute the pitch?
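
As a hedged illustration of the arithmetic involved (a sketch of one way to reason about it, not an official answer), the following converts the frequency limits into periods measured in samples. The variable names are made up for this example.

```matlab
fs = 16000;                 % sample rate in Hz (from the exercise)
fMin = 100; fMax = 1000;    % fundamental-frequency range in Hz (from the exercise)

longestPeriod  = fs/fMin;   % samples per period at the lowest pitch
shortestPeriod = fs/fMax;   % samples per period at the highest pitch

% A frame must hold at least two of the longest periods.
minFrameSize = 2*longestPeriod;

% With zero-based lags, a period of P samples produces an ACF peak at lag P.
lagRange = [shortestPeriod, longestPeriod];

fprintf('Minimum frame size: %d samples\n', minFrameSize);
fprintf('ACF peak search range (zero-based lag): [%d, %d]\n', lagRange);
```

Under these assumptions the frame needs 320 samples (20 ms), and the peak search covers lags 16 through 160.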

(*)PDF formulas: Assume a frame of an audio signal is represented by $s(t)$, $t=0, 1, \dots, n-1$, where $n$ is the frame size. Express the formulas of the following PDFs (pitch detection functions) in terms of $s(t)$.

ACF (auto-correlation function)
AMDF (average magnitude difference function)
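
For reference, one common convention writes the two functions as truncated sums over the overlapping part of the frame, with $\tau$ denoting the lag. This is only one of several forms in use; some texts divide by the number of terms or by $n$.

$$\mathrm{acf}(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau), \qquad \tau = 0, 1, \dots, n-1$$

$$\mathrm{amdf}(\tau) = \sum_{t=0}^{n-1-\tau} \lvert s(t) - s(t+\tau) \rvert, \qquad \tau = 0, 1, \dots, n-1$$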

(*)Comparison between ACF and AMDF: What are the two primary advantages of AMDF over ACF in terms of computation?
(*)Tapering effect of ACF:

What is the tapering effect of the ACF function?
Give a modified ACF formula to avoid the tapering effect, assuming an audio frame is represented by $s(t)$, $t=0, 1, \dots, n-1$.

(*)Frequency to semitone conversion: Write down the formula that converts a fundamental frequency (in Hz) to a pitch in semitones.
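
A minimal sketch of such a conversion follows, assuming the MIDI note-number convention in which A4 (440 Hz) corresponds to semitone 69. The name freq2semitone is hypothetical and is not a toolbox function.

```matlab
% Assumed convention: MIDI note numbers, where A4 = 440 Hz maps to semitone 69.
freq2semitone = @(freq) 69 + 12*log2(freq/440);

a4 = freq2semitone(440);      % 69 (A4)
a5 = freq2semitone(880);      % 81 (one octave, i.e. 12 semitones, higher)
c4 = freq2semitone(261.63);   % approximately 60 (middle C)
```

Each doubling of the frequency adds exactly 12 semitones, which is why the logarithm is taken in base 2.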
(**)Frame-to-ACF computation: Write an m-file function myFrame2acf.m to compute ACF from a given frame, with the following usage:
acf = myFrame2acf(frame);
where frame is the input frame, and acf is the output ACF. The length of acf should be the same as that of frame. (Hint: You can check out the function frame2acf.m in the SAP Toolbox. You can also use xcorr.m in the Signal Processing Toolbox to complete this exercise.)
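
One possible starting point is the direct summation below, a sketch that assumes the unnormalized, truncated-sum form of ACF. The test frame is hypothetical, and wrapping the loop into myFrame2acf.m is left to the reader.

```matlab
% Hypothetical test frame: 32 samples of a sinusoid with a period of 8 samples.
frame = sin(2*pi*(0:31)'/8);
n = length(frame);

acf = zeros(n, 1);
for tau = 0:n-1
    % Multiply the frame by a copy of itself shifted by tau samples and sum
    % over the overlapping part. MATLAB indices are one-based, so the value
    % for lag tau is stored in acf(tau+1).
    acf(tau+1) = sum(frame(1:n-tau) .* frame(1+tau:n));
end
```

For this frame, acf(1) is the frame energy and the next large peak appears at acf(9), that is, at a lag of 8 samples. If the Signal Processing Toolbox is available, `c = xcorr(frame); acf = c(n:end);` should produce the same values.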
(**)Frame-to-AMDF computation: Write an m-file function myFrame2amdf.m to compute AMDF from a given frame, with the following usage:
amdf = myFrame2amdf(frame);
where frame is the input frame and amdf is the output AMDF. The length of amdf should be the same as that of frame. (Hint: You can check out the function frame2amdf.m in the SAP Toolbox.)
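
A comparable sketch for AMDF, again assuming an unnormalized sum over the overlapping samples (some definitions divide by the number of terms). The test frame is hypothetical.

```matlab
% Hypothetical test frame: 32 samples of a sinusoid with a period of 8 samples.
frame = sin(2*pi*(0:31)'/8);
n = length(frame);

amdf = zeros(n, 1);
for tau = 0:n-1
    % Sum of absolute differences between the frame and its shifted copy.
    % The value for lag tau is stored in amdf(tau+1).
    amdf(tau+1) = sum(abs(frame(1:n-tau) - frame(1+tau:n)));
end
```

Here the pitch period shows up as a valley rather than a peak: amdf(9), the value at a lag of 8 samples, is close to zero, while amdf(5) at half a period is large.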
(**)Frame-to-AMDF/ACF computation: Write an m-file function myFrame2amdfOverAcf.m to compute AMDF over ACF from an input frame, with the following usage:
amdfOverAcf = myFrame2amdfOverAcf(frame);
where frame is the input frame, and amdfOverAcf is the output AMDF/ACF. The length of amdfOverAcf should be the same as that of frame. Use this function to plot the curve of AMDF/ACF similar to this example. Does this method perform better than ACF or AMDF alone? If not, what methods can be used to improve the performance? (Hint: Please refer to frame2acfOverAmdf.m in the SAP Toolbox.)
(**)ACF-to-pitch computation: Write an m-file function myAcf2pitch.m which computes the pitch from a given vector of ACF, with the usage:
pitch = myAcf2pitch(acf, fs, plotOpt);
where acf is the input ACF vector, fs is the sample rate, pitch is the output pitch value in semitones. If plotOpt is not zero, your function needs to plot ACF and the selected pitch point.
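
The peak-picking step can be sketched as follows. This is a simplified illustration, not the toolbox's method: it assumes a search range of 100 to 1000 Hz, the MIDI semitone convention (A4 = 440 Hz = 69), and a synthetic test tone, and it omits the plotting controlled by plotOpt.

```matlab
fs = 16000;                              % sample rate in Hz
t = (0:511)'/fs;
frame = sin(2*pi*200*t);                 % hypothetical test frame: a 200 Hz tone

% ACF by direct summation (lag tau is stored at index tau+1).
n = length(frame);
acf = zeros(n, 1);
for tau = 0:n-1
    acf(tau+1) = sum(frame(1:n-tau) .* frame(1+tau:n));
end

% Assumed search range: fundamental frequencies between 100 and 1000 Hz.
minLag = ceil(fs/1000);                  % shortest period, in samples
maxLag = floor(fs/100);                  % longest period, in samples
[~, k] = max(acf(minLag+1:maxLag+1));    % peak within the allowed lags
lag = minLag + k - 1;                    % convert back to a zero-based lag

freq = fs/lag;                           % fundamental frequency in Hz
pitch = 69 + 12*log2(freq/440);          % semitones, assuming the MIDI convention
```

For this test tone the peak lands at a lag of 80 samples, giving 200 Hz, or roughly 55.35 semitones. Restricting the search range is what keeps the lag-0 maximum from being selected.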
(**)AMDF-to-pitch computation: Write an m-file function myAmdf2pitch.m which computes the pitch from a given vector of AMDF, with the usage:
pitch = myAmdf2pitch(amdf, fs, plotOpt);
where amdf is the input AMDF vector, fs is the sample rate, and pitch is the output pitch value in semitones. If plotOpt is not zero, your function needs to plot AMDF and the selected pitch point.
(**)Frame-to-pitch computation: Write an m-file function myFrame2pitch which computes the pitch from a given frame using various methods, with the usage:
pitch = myFrame2pitch(frame, fs, method, plotOpt);
where frame is the input frame vector, fs is the sample rate, method is the pitch tracking method to use ('acf' for ACF, 'amdf' for AMDF, etc.), and pitch is the output pitch value in semitones. If plotOpt is not zero, your function needs to plot the frame, the ACF or AMDF curve, and the selected pitch point. Moreover, your function should be capable of a self demo. Please refer to frame2acf.m or frame2amdf.m in the SAP Toolbox. (Hint: You will use several functions in the SAP Toolbox, including freq2pitch.m, frame2acf.m, frame2amdf.m, etc. Moreover, you will also use myAcf2pitch.m and myAmdf2pitch.m from the previous exercises.)
(**)Computation, display, and playback of pitch by ACF: Before trying this exercise, you should fully understand this example, since the exercise follows it closely. In this exercise, you are requested to write an m-file script myWave2PitchByAcf01.m which accomplishes the following tasks:

Record your own singing for 8 seconds at a 16 kHz sample rate, 16-bit resolution, mono, and save it to a file test.wav. (You can use speech instead of singing, but the pitch tracking will be harder.)
Read test.wav and do frame blocking with a frame size of 512 and an overlap of 0.
Compute the volume of each frame and find the volume threshold.
Compute ACF for each frame.
Identify the pitch point and compute the pitch in semitone.
Process the identified pitch vector to make it smooth and to remove unlikely pitch values. Possible methods include:

If a frame has a volume lower than the volume threshold, set the corresponding pitch to zero.
If an element in the pitch vector is out of the range of human voices, set it to zero.
If an element in the pitch vector is much higher or lower than its neighbors, set it to the average of its neighbors (assuming its neighbors have similar pitch).
Smooth the pitch vector using median filter. (The corresponding command is median.)
Any other methods that you can think of to do better post-processing on the pitch vector.
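
The first, second, and fourth ideas above can be sketched as follows. All data values, the volume threshold, and the semitone range are hypothetical choices for illustration and would need tuning on a real recording.

```matlab
% Hypothetical per-frame data for illustration only.
pitch  = [0 60 60.2 72 60.1 59.9 95 60 60.3 0];      % raw pitch in semitones
volume = [0.01 0.5 0.6 0.5 0.6 0.5 0.4 0.5 0.6 0.02];
volTh  = 0.1;                       % assumed volume threshold
pitchRange = [40 84];               % assumed plausible vocal range in semitones

% 1. Silence frames whose volume is below the threshold.
pitch(volume < volTh) = 0;

% 2. Discard pitch values outside the assumed vocal range.
pitch(pitch < pitchRange(1) | pitch > pitchRange(2)) = 0;

% 3. Three-point median filter built from the median command.
%    The first and last elements are left unchanged.
smoothed = pitch;
for i = 2:length(pitch)-1
    smoothed(i) = median(pitch(i-1:i+1));
end
```

With this data the isolated jump to 72 is replaced by a neighboring value and the gap left by the out-of-range 95 is filled in, giving [0 60 60.2 60.2 60.1 59.9 59.9 60 60 0]. If the Signal Processing Toolbox is available, medfilt1 offers a ready-made median filter, although its edge handling differs from the loop above.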

Plot the result with three subplots in a figure:

The first subplot is the original waveform of the audio signals.
The second subplot is the volume.
The third subplot is the identified pitch vector.
All three subplots should share the same time axis, in seconds.
Play the identified pitch vector. (Hint: You can use pvPlay.m in the SAP Toolbox.)
You need to fine-tune your program so that the identified pitch is as close as possible to the original singing. You need to demo the following items to the TA:

Playback of test.wav.
Plots mentioned earlier.
Playback of the pitch vector.

(***)Computation, display, and playback of pitch by AMDF: Write an m-file script myWave2PitchByAmdf01.m to repeat the previous exercise, but use AMDF instead. Compare your result with that of the previous example. (Hint: This is harder than the previous one since there might be several equally good minimum points to select.)
(***)Computation, display, and playback of pitch by ACF/AMDF: Write an m-file script myWave2PitchByAmdf01.m to repeat the previous exercise, but use ACF/AMDF instead. Compare your result with that of the previous example. (Hint: This is harder than the previous one since there might be several equally good minimum points to select.)
(***)Wave-to-pitch computation: Write an m-file function myWave2pitch.m which computes a pitch vector from a given stream of audio signals, with the usage:
pitch = myWave2pitch(wave, fs, frameSize, overlap, method);
where wave is the input audio signal, fs is the sample rate, frameSize is the frame size in samples, overlap is the overlap in samples, and method specifies the method used for pitch tracking: 'acf' for ACF and 'amdf' for AMDF. If there is no pitch in a given frame, the corresponding element in the output pitch vector should be zero. (Hint: You should try the previous exercise before attempting this one.)
(**)Pitch verification via simultaneous playback: Please follow the steps to do pitch verification via simultaneous playback:

Record a 10-sec clip of your singing, with 16-kHz sample rate and 16-bit resolution.
Perform pitch tracking on the clip using ptByPf.m (in SAP toolbox).
Convert the pitch vector into audio signals using pv2wave.m (in SAP toolbox).
Create a stereo wav file where left channel is the original singing, and right channel is the synthesized signals from the pitch vector. Play the file to see if the pitch is correct. If not, modify the parameters to ptByPf.m such that a correct pitch vector can be identified.

(*)Recording task: Children's songs: This recording task requires you to do recordings of children's songs and to manually label the pitch of the recordings. Please refer to this page for more details.
(*)Recording task: YouTube pop song recording: This recording task requires you to record pop songs that are available on YouTube. Please refer to this page for more details.
(*)Recording task: Tapping recording: This recording task requires you to record your tapping at the onset of each music note of a song. You then need to manually label the onsets using CoolEdit. Please refer to this page for more details.
(***)Programming contest: Pitch tracking: Detailed description.
(***)Programming contest: SU/V detection: Detailed description.

Audio Signal Processing and Recognition (音訊處理與辨識)