Tutorial on Anomaly Sound Detection (by Roger Jang)

This tutorial covers the basics of using HMM (Hidden Markov Models) for anomaly sound detection (ASD) for a pounding machine. The anomaly sounds were collected when there is no object to be pounded under the machine. Here are the videos to show the situation when the audio data was collected.

Preprocessing
Dataset collection and feature extraction
Data analysis and visualization
Model training and test using HMM
Summary
Appendix

Preprocessing

Before we start, let's add necessary toolboxes to the search path of MATLAB:

addpath d:/users/jang/matlab/toolbox/utility
addpath d:/users/jang/matlab/toolbox/sap
addpath d:/users/jang/matlab/toolbox/machineLearning

All the above toolboxes can be downloaded from the author's toolbox page. Make sure you are using the latest toolboxes to work with this script.

For compatibility, here we list the platform and MATLAB version that we used to run this script:

fprintf('Platform: %s\n', computer);
fprintf('MATLAB version: %s\n', version);
fprintf('Date & time: %s\n', char(datetime));
scriptStartTime=tic;	% Timing for the whole script

Platform: PCWIN64
MATLAB version: 9.6.0.1214997 (R2019a) Update 6
Date & time: 20-Jan-2020 00:25:22

Most of the modifiable options for ASD are set in asdOptSet.m:

type asdOptSet

function asdOpt=asdOptSet
% vdOptSet: Set options for ASD (anomaly sound detection)
%
%	Usage:
%		asdOpt=asdOptSet;
%
%	Description:
%		asdOpt=asdOptSet returns the default options for ASD.
%
%	Example:
%		asdOpt=asdOptSet

%	Category: Options for anomaly sound detection
%	Roger Jang, 20191213

%% === Function for feature extraction and plotting
asdOpt.feaType='mfcc';		% 'MFCC' or 'spectrum';
asdOpt.feaExtractFcn=@asdFeaExtractFromFile;
asdOpt.hmmPlotFcn=@asdHmmPlot;
%% === Folder for wave files
asdOpt.audioDir='D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset';
%% === Parameters for ASD
asdOpt.frameDuration=32;	% ms
asdOpt.overlapDuration=16;		% Frame step/shift
asdOpt.outputName={'silence', 'normal', 'abnormal'};
asdOpt.gaussianNum=3;	% No. of Gaussians for each class

If you want to run this script, you need to change asOpt.audioDir such that it points to a folder of sound files containing audio files with normal or abnormal pounding sounds. The dataset used in the script can be downloaded from this link.

Dataset collection and feature extraction

First of all, we can collect all the sound files once they are downloaded. We can use the commmand "mmDataCollect" to collect all the file information:

asdOpt=asdOptSet;
opt=mmDataCollect('defaultOpt');
opt.extName='wav';
auSet=mmDataCollect(asdOpt.audioDir, opt, 1);

Collecting 2 files with extension "wav" from "D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset"...

For each audio file, we assume its pounding sounds have been labeled as cues by using Cooledit. Then we can perform frame-based feature extraction and obtain the groundtruth for each frame. This is achieved by the function "asdFeaExtractFromFile.m", as shown by the following example:

auFile='dataSet/abnormal-cue.wav';
au=myAudioRead(auFile);
au.signal=au.signal(:,1);	% Use a single channle for feature extraction
asdOpt=asdOptSet;
asdOpt.feaType='mfcc';
fea=asdFeaExtractFromFile(au, asdOpt, 1);

Now we can read audio contents from all audio files. We need to do so since we want to use channel 1 as the training set and channel 2 as the test set.

for i=1:length(auSet)
	au=myAudioRead(auSet(i).path);
%	auSet(i)=structCopy(auSet(i), au);
	auSet(i).signal=au.signal;
	auSet(i).fs=au.fs;
	auSet(i).nbits=au.nbits;
	auSet(i).file=au.file;
end

Create the training set from channel 1:

auSetTrain=auSet;
for i=1:length(auSetTrain)
	auSetTrain(i).signal=auSetTrain(i).signal(:,1);
end

Create the test set from channel 2:

auSetTest=auSet;
for i=1:length(auSetTest)
	auSetTest(i).signal=auSetTest(i).signal(:,2);
end

Now we can perform feature extraction and attach the features to auSet:

myTic=tic;
fprintf('Feature extraction from auSetTrain:\n');
auSetTrain=auSetFeaExtract(auSetTrain, asdOpt, 1);
fprintf('Feature extraction from auSetTest:\n');
auSetTest=auSetFeaExtract(auSetTest, asdOpt, 1);
fprintf('Saving auSetTrain & auSetTest to asdAuSet.mat...\n');
save asdAuSet auSetTrain auSetTest
fprintf('time=%g sec\n', toc(myTic));

Feature extraction from auSetTrain:
1/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/abnormal-cue.wav, duration=11.3546 sec, time=0.862437 sec

2/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/normal-cue.wav, duration=114.335 sec, time=7.8572 sec

Total time=16.4878 sec
Feature extraction from auSetTest:
1/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/abnormal-cue.wav, duration=11.3546 sec, time=0.803897 sec

2/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/normal-cue.wav, duration=114.335 sec, time=7.89879 sec

Total time=16.494 sec
Saving auSetTrain & auSetTest to asdAuSet.mat...
time=36.5891 sec

We can also create a variable DS for all kinds of data visualization and static classification:

feature=[auSetTrain.feature];
output=[auSetTrain.tOutput];
ds.input=feature;
ds.output=output;
%ds.inputName=asdOpt.featureName;
ds.outputName=asdOpt.outputName;
ds2=ds; ds2.input=inputNormalize(ds2.input);	% input normalization
fprintf('ds and ds2 created.\n');

ds and ds2 created.

Data analysis and visualization

We can display data count for each class:

figure; [classSize, classLabel]=dsClassSize(ds, 1);

39 features
7848 instances
3 classes

We can plot feature distribution among different classes:

figure; dsBoxPlot(ds);

We can also plot all features in each class with the same scale:

figure; dsFeaVecPlot(ds);

Model training and test using HMM

Using the collected auSet, we can start HMM training for ASD:

fprintf('Start HMM training...\n');
figure;
myTic=tic;
asdHmmModel=hmmTrain4audio(auSetTrain, asdOpt, 1);
fprintf('time=%g sec\n', toc(myTic));
fprintf('Saving asdHmmModel...\n');
save asdHmmModel asdHmmModel	% Obtain asdHmmModel

Start HMM training...
time=1.58787 sec
Saving asdHmmModel...

Now we can perform inside test us the training set:

fprintf('HMM inside test using channel 1:\n');
for i=1:length(auSetTrain)
	fprintf('%d/%d: file=%s\n', i, length(auSetTrain), auSetTrain(i).file);
	myTic=tic;
	figure; au=hmmEval4audio(auSetTrain(i), asdOpt, asdHmmModel, 1);
	fprintf('\tTime=%g sec, accuracy=%g%%\n', toc(myTic), au.rr*100);
	snapnow;
end

HMM inside test using channel 1:
1/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/abnormal-cue.wav

	Time=4.52382 sec, accuracy=98.1638%

2/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/normal-cue.wav

	Time=24.4576 sec, accuracy=96.3445%

And the outside test using the test set:

fprintf('HMM outside test using channel 2:\n');
for i=1:length(auSetTest)
	fprintf('%d/%d: file=%s\n', i, length(auSetTest), auSetTest(i).file);
	myTic=tic;
	figure; au=hmmEval4audio(auSetTest(i), asdOpt, asdHmmModel, 1);
	fprintf('\tTime=%g sec, accuracy=%g%%\n', toc(myTic), au.rr*100);
	snapnow;
end

HMM outside test using channel 2:
1/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/abnormal-cue.wav

	Time=3.87035 sec, accuracy=98.1638%

2/2: file=D:\users\jang\app\anomalySoundDetect-iii\anomalySoundDetect\dataset/normal-cue.wav

	Time=25.2628 sec, accuracy=96.3445%

Summary

This is a brief tutorial on using HMM for ASD. As shown in the above examples, HMM can achieve a decent result on the training dataset. In particular, most of the errors occurs at boundaries with possibly ambiguous human labeling. As a result, the accuracy should be even higher. However, since we do not have extra data for test, the capability of HMM cannot be evaluated objectively.

There are several directions for further improvement:

Acquire more audio files for objective evaluation of all classifiers.
Investigate new features for ASD.
Change the configuration of the GMM used in HMM.
Use other classifiers for ASD.

Appendix

List of functions, scripts, and datasets used in this script:

Date and time when finishing this script:

fprintf('Date & time: %s\n', char(datetime));

Date & time: 20-Jan-2020 00:27:18

Overall elapsed time:

toc(scriptStartTime)

Elapsed time is 116.175418 seconds.

Jyh-Shing Roger Jang, created on

datetime

ans = 

  datetime

   20-Jan-2020 00:27:18

If you are interested in the original MATLAB code for this page, you can type "grabcode(URL)" under MATLAB, where URL is the web address of this page.