5-3 Zero Crossing Rate

Zero-crossing rate (ZCR) is another basic acoustic feature that can be computed easily. It is equal to the number of zero crossings of the waveform within a given frame. ZCR has the following characteristics:

In general, the ZCR of unvoiced sounds and environmental noise is larger than that of voiced sounds, which have clearly observable fundamental periods (vowels, for example).
It is hard to distinguish unvoiced sounds from environmental noise by using ZCR alone, since their ZCR values are similar; which one is higher depends on the recording conditions and the environmental noise. However, unvoiced sounds usually have a larger volume than the noise.
ZCR is often used in conjunction with energy (or volume) for end-point detection. In particular, ZCR is used for detecting the start and end positions of unvoiced sounds.
Some people use ZCR for rough fundamental frequency estimation, but the result is highly unreliable unless further processing is applied to refine it.
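To see why ZCR gives only a rough pitch estimate, consider that a pure tone crosses zero twice per period, so for a frame of N samples at sample rate fs, f0 is roughly ZCR * fs / (2 * (N - 1)). The Python sketch below is an illustration, not the author's code; the sample rate, frame size, and test signals are assumptions chosen for the demonstration. It applies that formula to a 200 Hz tone and then to the same tone with a dominant high harmonic added.

```python
import math

def zero_crossings(frame):
    """Count strict sign changes between adjacent samples (exact zeros are not counted)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def rough_f0_from_zcr(frame, fs):
    """Rough pitch estimate: a pure tone crosses zero twice per period,
    so f0 ~= ZCR * fs / (2 * (N - 1)) for a frame of N samples."""
    return zero_crossings(frame) * fs / (2 * (len(frame) - 1))

fs, n = 8000, 400  # assumed sample rate (Hz) and frame size (samples)
# 200 Hz pure tone; the 0.1 rad phase offset keeps samples off exact zero
tone = [math.sin(2 * math.pi * 200 * k / fs + 0.1) for k in range(n)]
# Same tone plus a dominant 8th harmonic (1600 Hz) with amplitude 1.5
rich = [t + 1.5 * math.sin(2 * math.pi * 1600 * k / fs) for k, t in enumerate(tone)]

print(round(rough_f0_from_zcr(tone, fs), 1))  # 190.5, close to the true 200 Hz
print(rough_f0_from_zcr(rich, fs))            # far above 200 Hz: the harmonic dominates
```

The pure tone yields a reasonable estimate, but once a strong harmonic is present the zero crossings follow the harmonic rather than the fundamental period, which is why ZCR-based pitch estimates need further refinement.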
To remove DC bias, we usually perform mean subtraction on each frame. Here is a straightforward example of ZCR computation:
Example 1: zcr01.m
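The listing for zcr01.m is not reproduced here. As a rough stand-in, the following is a minimal Python sketch (not the original MATLAB code) of the same idea: split the signal into frames, subtract each frame's mean, and count sign changes. The helper names, the default frame size of 256, and the zero overlap are assumptions for illustration.

```python
def frame_signal(x, frame_size, overlap):
    """Split x into frames of frame_size samples, with `overlap` samples shared
    by consecutive frames; a trailing partial frame is dropped."""
    hop = frame_size - overlap
    return [x[i:i + frame_size] for i in range(0, len(x) - frame_size + 1, hop)]

def zcr_per_frame(x, frame_size=256, overlap=0):
    """ZCR of each frame after subtracting that frame's mean (DC bias removal)."""
    result = []
    for frame in frame_signal(x, frame_size, overlap):
        mean = sum(frame) / len(frame)
        centered = [s - mean for s in frame]
        # count strict sign changes between adjacent samples
        result.append(sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0))
    return result

# A signal riding on a DC offset of 0.5: it never crosses zero until the mean is removed
x = [0.5 + 0.3 * (-1) ** n for n in range(32)]
print(zcr_per_frame(x, frame_size=8))  # [7, 7, 7, 7]
```

Without the mean subtraction every sample of x is positive and the ZCR would be 0 in every frame, which shows how a DC bias can hide the actual oscillation.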

We can use the function "frame2zcr" to simplify the above example:
Example 2: zcr02.m

In the above example, methods 1 and 2 return similar ZCR curves. In order to use ZCR to distinguish unvoiced sounds from environmental noise, we can shift the waveform before computing ZCR. This is particularly useful if the noise is not too large. An example follows:
Example 3: zcrWithShift.m

In this example, the shift amount is equal to twice the maximal absolute sample value within the frame with the minimum volume. As a result, the ZCR of silence is reduced drastically, making it easier to distinguish unvoiced sounds from silence using ZCR.
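The shifting rule just described can be sketched in Python as follows. This is an illustrative sketch, not zcrWithShift.m itself: it assumes the frames are already mean-subtracted, takes volume as the sum of absolute sample values, and shifts each waveform down (rather than up) by the shift amount; all of these choices are assumptions.

```python
def frame_volume(frame):
    """Volume as the sum of absolute sample values (an assumption for this sketch)."""
    return sum(abs(s) for s in frame)

def count_crossings(frame):
    """Strict sign changes between adjacent samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def zcr_with_shift(frames):
    """Return (plain ZCR list, shifted ZCR list, shift amount) for mean-subtracted frames.
    shift = 2 * max |sample| in the frame with the minimum volume."""
    quietest = min(frames, key=frame_volume)
    shift = 2 * max(abs(s) for s in quietest)
    plain = [count_crossings(f) for f in frames]
    # shift each waveform down by a constant before counting (direction is an assumption)
    shifted = [count_crossings([s - shift for s in f]) for f in frames]
    return plain, shifted, shift

silence = [1, -1] * 4                          # low-level noise
unvoiced = [10, -10] * 4                       # noise-like but louder
voiced = [50, 50, 50, 50, -50, -50, -50, -50]  # slowly varying, large amplitude
plain, shifted, shift = zcr_with_shift([silence, unvoiced, voiced])
print(shift, plain, shifted)  # 2 [7, 7, 1] [0, 7, 1]
```

Before shifting, the silence and unvoiced frames have the same ZCR; after shifting, the low-level noise can no longer reach the new zero level, while the louder unvoiced frame still crosses it.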
Moreover, we should be aware of the following facts when computing ZCR:

If a sample is located exactly at zero, should we count it as a zero crossing? Depending on the answer to this question, we have two methods for ZCR implementation. The two methods can produce noticeably different results, so the data should be examined carefully before choosing one.
Most ZCR computation is based on the integer values of audio signals. If we want to perform mean subtraction, the mean value should be rounded to the nearest integer too; subtracting a non-integer mean from integer samples can increase the ZCR.
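The effect of rounding the mean can be checked on a tiny example. The Python sketch below uses a hypothetical near-silent frame of integer samples, such as 8-bit audio might produce, and counts only strict sign changes (samples exactly at zero are never crossings). Subtracting the unrounded mean pushes the zero-valued samples slightly below zero, which creates crossings that are not really there.

```python
def zcr_strict(frame):
    """Count only strict sign changes (a * b < 0); zero samples are never crossings."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

silence = [0, 1, 0, 0, 1, 0, 0, 0]  # hypothetical near-silent frame, integer samples
mean = sum(silence) / len(silence)   # 0.25

float_centered = [s - mean for s in silence]         # zeros become -0.25
int_centered = [s - round(mean) for s in silence]    # mean rounded to 0, zeros stay zero

print(zcr_strict(float_centered))  # 4
print(zcr_strict(int_centered))    # 0
```

Note that Python's round() rounds halves to the nearest even integer, whereas MATLAB's round() rounds halves away from zero; the two differ only when the mean ends in exactly .5.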
In the following example, we apply the two methods mentioned above to compute the ZCR of the wave file csNthu8b.wav:
Example 4: zcrOn8bit.m

From the above example, it is obvious that the two methods generate different ZCR curves, although their overall trends are consistent. The first method does not count "zero sitting" (a sample located exactly at zero) as a zero crossing; therefore its ZCR values are smaller. Moreover, silence is likely to have a low ZCR under method 1 and a high ZCR under method 2, since the silence region is likely to contain many zero-sitting samples. The difference is especially large when the recording environment is very quiet, so that the silence samples sit at or hover around zero. However, this observation is only true for a low sample rate (8 kHz in this case). For the same recording at 16 kHz (csNthu.wav), the result is shown next:
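The contrast between the two counting conventions can be reproduced with a small Python sketch. The helper zcr() below is hypothetical and is not the toolbox function frame2zcr; it simply assumes method 1 counts only strict sign changes, while method 2 also counts any adjacent pair in which a sample sits exactly at zero. The frames are made-up integer values.

```python
def zcr(frame, method=1):
    """Hypothetical helper for the two counting conventions.
    method 1: count a crossing only when adjacent samples have strictly opposite signs.
    method 2: also count pairs in which at least one sample sits exactly at zero."""
    if method == 1:
        return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b <= 0)

quiet = [0, 0, 1, 0, 0, -1, 0, 0]             # near-silent frame: mostly zero sitting
voiced = [20, 35, 30, 10, -15, -30, -25, -5]  # larger, slowly varying samples

print(zcr(quiet, 1), zcr(quiet, 2))    # 0 7
print(zcr(voiced, 1), zcr(voiced, 2))  # 1 1
```

The quiet frame swings from the lowest to the highest possible ZCR depending on the convention, while the voiced frame is unaffected. Under this method-2 convention, a single crossing that passes through an exact zero, such as 1, 0, -1, is counted twice.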
Example 5: zcrOn16bit.m

If we want to detect the meaningful voice activity in a stream of audio signals, we need to perform end-point detection (also called speech detection). The most straightforward method for end-point detection is based on volume and ZCR. Please refer to the next chapter for more information.
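As a deliberately simplified illustration of combining volume and ZCR (not necessarily the method presented in the next chapter), the Python sketch below first marks the loud voiced core using a volume threshold and then extends the boundaries outward while neighboring frames have a high ZCR, so that unvoiced sounds at the start or end are included. The function name, both thresholds, and the per-frame values are hypothetical.

```python
def detect_endpoints(volumes, zcrs, vol_th, zcr_th):
    """Return (start, end) frame indices of the detected segment, or None.
    1) Frames whose volume exceeds vol_th mark the voiced core.
    2) Extend the boundaries outward while neighboring frames have ZCR above zcr_th,
       to include unvoiced sounds at the beginning or end."""
    loud = [i for i, v in enumerate(volumes) if v > vol_th]
    if not loud:
        return None
    start, end = loud[0], loud[-1]
    while start > 0 and zcrs[start - 1] > zcr_th:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > zcr_th:
        end += 1
    return start, end

# Hypothetical per-frame features: an unvoiced onset (frame 2) and offset (frame 6)
volumes = [5, 6, 8, 40, 90, 80, 10, 5]
zcrs = [3, 4, 30, 25, 8, 6, 28, 2]
print(detect_endpoints(volumes, zcrs, vol_th=20, zcr_th=15))  # (2, 6)
```

Volume alone would report frames 3 to 5; the ZCR check adds frames 2 and 6, whose volume is low but whose ZCR is high, as expected for unvoiced sounds.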
Audio Signal Processing and Recognition (音訊處理與辨識)