5-2 Zero Crossing Rate


Zero-crossing rate (ZCR) is another basic acoustic feature that can be computed easily. It is equal to the number of zero crossings of the waveform within a given frame. ZCR has the following characteristics:

To avoid DC bias, we usually need to perform mean subtraction on each frame. Here is a straightforward example of ZCR:

Example 1: zcr01.m

```matlab
waveFile='csNthu.wav';
frameSize=256;
overlap=0;
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameMat=enframe(y, frameSize, overlap);    % one frame per column
frameNum=size(frameMat, 2);
for i=1:frameNum
    frameMat(:,i)=frameMat(:,i)-mean(frameMat(:,i));    % mean subtraction to remove DC bias
end
zcr=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<0);    % count sign changes in each frame
sampleTime=(1:length(y))/fs;
frameTime=((0:frameNum-1)*(frameSize-overlap)+0.5*frameSize)/fs;    % frame centers in seconds
subplot(2,1,1); plot(sampleTime, y); ylabel('Amplitude'); title(waveFile);
subplot(2,1,2); plot(frameTime, zcr, '.-'); xlabel('Time (sec)'); ylabel('Count'); title('ZCR');
```
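The example above relies on the helper functions myAudioRead and enframe and on a recorded wave file. As a minimal sketch of the same computation using only built-in functions, the following code runs on a synthetic tone instead. The sample rate, the 440 Hz tone, the DC offset of 0.8, and the name countZcr are all assumptions chosen for illustration; they do not come from the original example.

```matlab
% Hypothetical synthetic input (not the csNthu.wav recording used above)
fs = 8000;                            % assumed sample rate in Hz
frameSize = 256;                      % samples per frame, no overlap
t = (0:4*frameSize-1)'/fs;            % exactly four frames of samples
y = 0.5*sin(2*pi*440*t) + 0.8;        % 440 Hz tone riding on a DC offset of 0.8

% Split into non-overlapping frames, one frame per column
frameMat = reshape(y, frameSize, []);

% Count sign changes between adjacent samples in each column
countZcr = @(m) sum(m(1:end-1, :).*m(2:end, :) < 0, 1);

% Without mean subtraction the waveform never reaches zero
zcrRaw = countZcr(frameMat);

% Subtract each frame's own mean before counting
frameMean = repmat(mean(frameMat, 1), frameSize, 1);
zcrCentered = countZcr(frameMat - frameMean);

disp(zcrRaw);
disp(zcrCentered);
```

Because the offset (0.8) is larger than the tone amplitude (0.5), zcrRaw is zero for every frame. After mean subtraction, each frame should show roughly 2*440*256/8000, or about 28, crossings.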

We can use the function "frame2zcr" to simplify the above example:

Example 2: zcr02.m

```matlab
waveFile='csNthu.wav';
frameSize=256;
overlap=0;
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameMat=enframe(y, frameSize, overlap);
frameNum=size(frameMat, 2);
zcr=frame2zcr(frameMat);
sampleTime=(1:length(y))/fs;
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap)/fs;
subplot(2,1,1); plot(sampleTime, y); ylabel('Amplitude'); title(waveFile);
subplot(2,1,2); plot(frameTime, zcr, '.-'); xlabel('Time (sec)'); ylabel('Count'); title('ZCR');
```

In the above example, methods 1 and 2 return similar ZCR curves. In order to use ZCR to distinguish unvoiced sounds from environmental noise, we can shift the waveform before computing ZCR. This is particularly useful if the noise is not too large. An example follows:

Example 3: zcrWithShift.m

```matlab
waveFile='csNthu.wav';
frameSize=256;
overlap=0;
au=myAudioRead(waveFile); y=au.signal; fs=au.fs;
frameMat=enframe(y, frameSize, overlap);
frameNum=size(frameMat,2);
volume=frame2volume(frameMat);
[minVolume, index]=min(volume);
shiftAmount=2*max(abs(frameMat(:,index)));    % shiftAmount is equal to twice the max. abs. sample value within the frame of min. volume
method=1;
zcr1=frame2zcr(frameMat, method);
zcr2=frame2zcr(frameMat, method, shiftAmount);
sampleTime=(1:length(y))/fs;
frameTime=frame2sampleIndex(1:frameNum, frameSize, overlap)/fs;
subplot(2,1,1); plot(sampleTime, y); ylabel('Amplitude'); title(waveFile);
subplot(2,1,2); plot(frameTime, zcr1, '.-', frameTime, zcr2, '.-');
xlabel('Time (sec)'); ylabel('Count'); title('ZCR');
legend('ZCR without shift', 'ZCR with shift');
```

In this example, the shift amount is equal to twice the maximal absolute sample value within the frame of minimum volume. Since the shift exceeds the amplitude of the quietest frame, low-level background noise rarely crosses the shifted level. Therefore the ZCR of the silence is reduced drastically, making it easier to tell unvoiced sounds from silence using ZCR.
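The effect of the shift can be checked on a tiny synthetic case. The sketch below assumes that shifting means subtracting a constant from every sample before counting sign changes, and that volume is the sum of absolute sample values in a frame. Both are assumptions made for illustration, since the internals of frame2zcr and frame2volume are not shown here. The two frames are made-up alternating-sign signals standing in for quiet background noise and a louder unvoiced sound.

```matlab
frameSize = 256;
n = (0:frameSize-1)';
alternating = (-1).^n;                 % sign flips at every sample

% Column 1: low-level "silence"; column 2: louder "unvoiced" stand-in
frameMat = [0.01*alternating, 0.3*alternating];

% Assumed volume measure: sum of absolute sample values per frame
volume = sum(abs(frameMat), 1);
[~, index] = min(volume);              % frame of minimum volume

% Twice the max. abs. sample value within the frame of min. volume
shiftAmount = 2*max(abs(frameMat(:, index)));

% Count sign changes between adjacent samples in each column
countZcr = @(m) sum(m(1:end-1, :).*m(2:end, :) < 0, 1);

zcrNoShift = countZcr(frameMat);               % both frames cross at every sample
zcrShift = countZcr(frameMat - shiftAmount);   % quiet frame no longer crosses

disp(zcrNoShift);    % 255 255
disp(zcrShift);      % 0 255
```

Without the shift, the two frames are indistinguishable by ZCR. With the shift, the quiet frame drops to zero while the louder frame keeps its count.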

Moreover, we should be aware of the following facts:

  1. If a sample is located exactly at zero, should we count it as a zero crossing? Depending on the answer to this question, we have two methods for ZCR implementation. The two methods give different results, so it is worth inspecting both before choosing one.
  2. Most ZCR computation is based on the original integer values of audio signals. If we want to do mean subtraction, the mean value should be rounded to the nearest integer too; otherwise, subtracting a fractional DC bias can increase the ZCR.
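Both points can be checked by hand on a short frame. The following sketch uses a made-up vector of eight integer samples (not taken from any wave file) and compares the two counting rules after subtracting the rounded mean, then repeats the strict count with the unrounded mean.

```matlab
% Hypothetical integer samples; the mean is 1.25
frame = [3; 1; 1; -2; 1; 4; 0; 2];

% Subtract the rounded mean (1) so that all values stay integers
centered = frame - round(mean(frame));     % [2; 0; 0; -3; 0; 3; -1; 1]
prodInt = centered(1:end-1).*centered(2:end);

zcr1 = sum(prodInt < 0);     % Method 1: samples sitting at zero are not counted
zcr2 = sum(prodInt <= 0);    % Method 2: samples sitting at zero are counted

% Subtract the unrounded mean instead: no sample sits exactly at zero
centeredFloat = frame - mean(frame);
zcrFloat = sum(centeredFloat(1:end-1).*centeredFloat(2:end) < 0);

fprintf('Method 1: %d, Method 2: %d, unrounded mean: %d\n', zcr1, zcr2, zcrFloat);
% prints "Method 1: 2, Method 2: 7, unrounded mean: 4"
```

In this frame, three samples sit exactly at zero after the rounded mean is removed, which is why method 2 reports a much larger count than method 1. With the unrounded mean, those samples become small negative values and produce extra sign changes, so the strict count rises from 2 to 4.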

In the following, we use the two methods mentioned above to compute the ZCR of the wave file csNthu8b.wav:

Example 4: zcrOn8bit.m

```matlab
waveFile='csNthu8b.wav';
frameSize=256;
overlap=0;
au=myAudioRead(waveFile); y=au.signal; fs=au.fs; nbits=au.nbits;
y=y*2^nbits/2;    % scale the signal back to integer sample values
frameMat=enframe(y, frameSize, overlap);
frameNum=size(frameMat, 2);
for i=1:frameNum
    frameMat(:,i)=frameMat(:,i)-round(mean(frameMat(:,i)));    % Zero justification with a rounded mean
end
zcr1=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<0);     % Method 1
zcr2=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<=0);    % Method 2
sampleTime=(1:length(y))/fs;
frameTime=((0:frameNum-1)*(frameSize-overlap)+0.5*frameSize)/fs;
subplot(2,1,1); plot(sampleTime, y); ylabel(waveFile);
subplot(2,1,2); plot(frameTime, zcr1, '.-', frameTime, zcr2, '.-');
title('ZCR'); xlabel('Time (sec)');
legend('Method 1', 'Method 2');
```

From the above example, it is obvious that these two methods generate different ZCR curves, although the overall trends agree. The first method does not count "zero sitting" as "zero crossing", so the corresponding ZCR values are smaller. Moreover, silence is likely to have low ZCR under method 1 and high ZCR under method 2, since a quiet recording is likely to contain many "zero sitting" samples in the silence region. However, this observation holds only for a low sample rate (8 kHz in this case). For the same recording at 16 kHz (csNthu.wav), the result is shown next:

Example 5: zcrOn16bit.m

```matlab
waveFile='csNthu.wav';
frameSize=256;
overlap=0;
au=myAudioRead(waveFile); y=au.signal; fs=au.fs; nbits=au.nbits;
y=y*2^nbits/2;
frameMat=enframe(y, frameSize, overlap);
frameNum=size(frameMat, 2);
for i=1:frameNum
    frameMat(:,i)=frameMat(:,i)-round(mean(frameMat(:,i)));    % Zero justification
end
zcr1=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<0);     % Method 1
zcr2=sum(frameMat(1:end-1, :).*frameMat(2:end, :)<=0);    % Method 2
sampleTime=(1:length(y))/fs;
frameTime=((0:frameNum-1)*(frameSize-overlap)+0.5*frameSize)/fs;
subplot(2,1,1); plot(sampleTime, y); ylabel(waveFile);
subplot(2,1,2); plot(frameTime, zcr1, '.-', frameTime, zcr2, '.-');
title('ZCR'); xlabel('Time (sec)');
legend('Method 1', 'Method 2');
```

To detect meaningful voice activity in a stream of audio signals, we need to perform end-point detection (also called speech detection). The most straightforward method for end-point detection is based on volume and ZCR. Please refer to the next chapter for more information.


Audio Signal Processing and Recognition (音訊處理與辨識)