
waveAssess

PURPOSE

waveAssess: Wave assessment which generates a CM (confidence measure) file (and a pitch file, if necessary) from a given wave file and the corresponding text

SYNOPSIS

function [cmObj, dosCmd, time, exeStatus, exeResult, pitchObj0, pitchObj]=waveAssess(waveFile, text, language, plotOpt, pitchFile, cmFile, labFile, plpFile)

DESCRIPTION

 waveAssess: Wave assessment which generates a CM (confidence measure) file (and a pitch file, if necessary) from a given wave file and the corresponding text
    Usage:  [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language)
        [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language, plotOpt)
        [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language, plotOpt, pitchFile)
        [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language, plotOpt, pitchFile, cmFile, labFile, plpFile)
        
    Inputs:
        waveFile: input audio file (.wav, .flv, or .wa1); a headerless .raw file (16-bit, 16 kHz, little-endian) is also accepted
        text: input text to be aligned, or the name of a .txt file that contains the text
        language: a language string ('english' or 'chinese'), or a structure of recognition parameters (see saParamSet)
        plotOpt: 1 to plot the results (default: 0)
        pitchFile: output pitch file ([] if not needed)
        cmFile: output CM (confidence measure) file ([] if not needed)
        labFile: output label file ([] if not needed)
        plpFile: output PLP (phone log probability) file ([] if not needed)
        Note: in this version, the code that copies these output files from {asrRoot}\exe\output is commented out, so a non-empty pitchFile only turns on pitch extraction, and cmFile, labFile, and plpFile are ignored.
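A headerless .raw file carries no sample-rate or format information, so waveAssess assumes 16 kHz, 16-bit samples and converts the file to a temporary .wav (via the toolbox helper rawRead, then wavwrite) before calling the executable. The following is a minimal sketch of that conversion step using only fopen/fread, assuming little-endian int16 samples; it is not the toolbox's rawRead, and the small test file it writes is made up for illustration.

```matlab
% Sketch (not the toolbox's rawRead): read a headerless .raw file of
% 16-bit little-endian PCM and scale it to [-1, 1), as waveAssess does
% with wave/32768 before writing a temporary .wav file.
rawFile=[tempname, '.raw'];                 % hypothetical test file
samples=int16([0, 16384, -32768, 32767]);   % made-up samples
fid=fopen(rawFile, 'w', 'ieee-le'); fwrite(fid, samples, 'int16'); fclose(fid);

fid=fopen(rawFile, 'r', 'ieee-le');         % 'ieee-le' = little-endian byte order
wave=fread(fid, inf, 'int16');              % column vector of doubles
fclose(fid);
delete(rawFile);

wave=wave/32768;                            % same scaling as in waveAssess
fs=16000;                                   % sample rate assumed for .raw input
duration=length(wave)/fs;                   % length in seconds
```

The scaled vector can then be written with wavwrite, as the source does, or with audiowrite on MATLAB releases that no longer ship wavwrite.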

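    Outputs:
        cmObj: CM (confidence measure) structure
            cmObj.word(i).phone(j).pitch: pitch of phone j within word i
            cmObj.word(i).phone(j).volume: volume of phone j within word i
            cmObj.word(i).phone(j).pitch0: unbroken pitch of phone j identified via UPDUDP (generally not used)
        dosCmd: the DOS command used to generate the result
            You can copy dosCmd and run it in {asrRoot}\exe to see its output.
        time: elapsed time of the DOS command, in seconds
        exeStatus, exeResult: status and console output returned by dos(dosCmd)
        pitchObj0, pitchObj: unbroken and segmented pitch objects (fields frameRate and signal), computed only when plotOpt is set and pitch extraction is on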
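Because cmObj nests phones inside words, per-phone quantities are reached with a double loop over cmObj.word(i).phone(j). The sketch below builds a small hypothetical cmObj with made-up pitch values (only the pitch field is assumed) and computes the mean pitch of each phone, treating zero-pitch frames as unvoiced; zero is the value the source assigns to the pitch of aspirated phones.

```matlab
% Hypothetical cmObj with the documented word/phone layout (values made up)
cmObj.word(1).phone(1).pitch=[60 61 62];
cmObj.word(1).phone(2).pitch=[0 0];         % e.g. an aspirated phone with pitch zeroed
cmObj.word(2).phone(1).pitch=[55 57];

meanPitch={};    % meanPitch{i}(j): mean pitch of phone j within word i
for i=1:length(cmObj.word)
    phone=cmObj.word(i).phone;
    meanPitch{i}=zeros(1, length(phone));
    for j=1:length(phone)
        voiced=phone(j).pitch(phone(j).pitch>0);    % drop zero (unvoiced) frames
        if ~isempty(voiced)
            meanPitch{i}(j)=mean(voiced);
        end                                         % otherwise stays 0
    end
end
```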

    If efficiency is a concern, use the simplest form (no plotting and no pitch file):
        cmObj=waveAssess(waveFile, text, language)

    Example of English assessment:
        waveFile='what_would_you_like_to_know.wav';
        text='what would you like to know';
        language='english';
        plotOpt=1;
        pitchFile='test.pitch';
        [cmObj, dosCmd]=waveAssess(waveFile, text, language, plotOpt, pitchFile)

    Example of Chinese assessment:
        waveFile='yi_cuen_xiang_s_yi_cuen_huei.wav';
        text='一寸想思一寸灰';
        language='chinese';
        plotOpt=1;
        pitchFile='test.pitch';
        [cmObj, dosCmd]=waveAssess(waveFile, text, language, plotOpt, pitchFile)

    Example of Japanese assessment:
        waveFile='ka_ko_no_o_mo_i_de_o_hu_ri_ka_e_tte_mi_ru.wav';
        text='X';            % This is "don't care" since we don't know how to display Japanese characters
        plotOpt=1;
        rp=saParamSet('japanese');
        asrRoot=fileparts(which('waveAssess'));
        rp.sylFile=[asrRoot, '\japanese0001.syl'];    % This is required since we don't have character-to-pinyin conversion for Japanese
        [cmObj, dosCmd, time]=waveAssess(waveFile, text, rp, plotOpt)

    Whenever this function fails, run the corresponding DOS command (dosCmd) in {asrRoot}\exe to debug it.
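To rerun the executable outside waveAssess, the same pattern the source uses applies: change into {asrRoot}\exe, run dosCmd, and look for 'Done!' in the output. The sketch below is a hedged illustration that uses system (which behaves like dos on Windows but also works elsewhere), a hypothetical stand-in command ('echo Done!') in place of the real assess.exe call, and tempdir in place of {asrRoot}\exe. Unlike the plain cd calls in the source, it restores the working directory even if the command throws an error.

```matlab
exeDir=tempdir;                 % hypothetical stand-in for {asrRoot}\exe
dosCmd='echo Done!';            % hypothetical stand-in for the assess.exe command line
currDir=pwd;
cd(exeDir);
try
    [exeStatus, exeResult]=system(dosCmd);
catch err
    cd(currDir);                % restore the working directory on error
    rethrow(err);
end
cd(currDir);
succeeded=~isempty(strfind(exeResult, 'Done!'));    % the same success test waveAssess uses
if ~succeeded
    fprintf('dosCmd=%s\nexeStatus=%d\nexeResult=%s\n', dosCmd, exeStatus, exeResult);
end
```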


SUBFUNCTIONS

function selfdemo

SOURCE CODE

function [cmObj, dosCmd, time, exeStatus, exeResult, pitchObj0, pitchObj]=waveAssess(waveFile, text, language, plotOpt, pitchFile, cmFile, labFile, plpFile)
% waveAssess: Wave assessment which generates a CM (confidence measure) file (and a pitch file, if necessary) from a given wave file and the corresponding text
%    Usage:  [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language)
%        [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language, plotOpt)
%        [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language, plotOpt, pitchFile)
%        [cmObj, dosCmd, time, exeStatus, exeResult]=waveAssess(waveFile, text, language, plotOpt, pitchFile, cmFile, labFile, plpFile)
%
%    Inputs:
%        waveFile: input audio file (.wav, .flv, or .wa1); a headerless .raw file (16-bit, 16 kHz, little-endian) is also accepted
%        text: input text to be aligned, or the name of a .txt file that contains the text
%        language: a language string ('english' or 'chinese'), or a structure of recognition parameters (see saParamSet)
%        plotOpt: 1 to plot the results (default: 0)
%        pitchFile: output pitch file ([] if not needed)
%        cmFile: output CM (confidence measure) file ([] if not needed)
%        labFile: output label file ([] if not needed)
%        plpFile: output PLP (phone log probability) file ([] if not needed)
%        Note: in this version, the code that copies these output files from {asrRoot}\exe\output is commented out, so a non-empty pitchFile only turns on pitch extraction, and cmFile, labFile, and plpFile are ignored.
%
%    Outputs:
%        cmObj: CM (confidence measure) structure
%            cmObj.word(i).phone(j).pitch: pitch of phone j within word i
%            cmObj.word(i).phone(j).volume: volume of phone j within word i
%            cmObj.word(i).phone(j).pitch0: unbroken pitch of phone j identified via UPDUDP (generally not used)
%        dosCmd: the DOS command used to generate the result
%            You can copy dosCmd and run it in {asrRoot}\exe to see its output.
%        time: elapsed time of the DOS command, in seconds
%        exeStatus, exeResult: status and console output returned by dos(dosCmd)
%        pitchObj0, pitchObj: unbroken and segmented pitch objects (fields frameRate and signal), computed only when plotOpt is set and pitch extraction is on
%
%    If efficiency is a concern, use the simplest form (no plotting and no pitch file):
%        cmObj=waveAssess(waveFile, text, language)
%
%    Example of English assessment:
%        waveFile='what_would_you_like_to_know.wav';
%        text='what would you like to know';
%        language='english';
%        plotOpt=1;
%        pitchFile='test.pitch';
%        [cmObj, dosCmd]=waveAssess(waveFile, text, language, plotOpt, pitchFile)
%
%    Example of Chinese assessment:
%        waveFile='yi_cuen_xiang_s_yi_cuen_huei.wav';
%        text='一寸想思一寸灰';
%        language='chinese';
%        plotOpt=1;
%        pitchFile='test.pitch';
%        [cmObj, dosCmd]=waveAssess(waveFile, text, language, plotOpt, pitchFile)
%
%    Example of Japanese assessment:
%        waveFile='ka_ko_no_o_mo_i_de_o_hu_ri_ka_e_tte_mi_ru.wav';
%        text='X';            % This is "don't care" since we don't know how to display Japanese characters
%        plotOpt=1;
%        rp=saParamSet('japanese');
%        asrRoot=fileparts(which('waveAssess'));
%        rp.sylFile=[asrRoot, '\japanese0001.syl'];    % This is required since we don't have character-to-pinyin conversion for Japanese
%        [cmObj, dosCmd, time]=waveAssess(waveFile, text, rp, plotOpt)
%
%    Whenever this function fails, run the corresponding DOS command (dosCmd) in {asrRoot}\exe to debug it.

%    Roger Jang, 20070103, 20070405, 20081112

if nargin<1, selfdemo; return; end
if nargin<4, plotOpt=0; end
if nargin<5, pitchFile=[]; end
if nargin<6, cmFile=[]; end
if nargin<7, labFile=[]; end
if nargin<8, plpFile=[]; end

if exist(waveFile, 'file')~=2
    error(sprintf('Cannot find %s within %s!\n', waveFile, mfilename));
end

% ====== If the extension is raw, save it as a temporary wave file (wav, flv, and wa1 files are used as-is)
[wavePath, waveMainName, waveFileExtName]=fileparts(waveFile);
switch lower(waveFileExtName)
    case '.raw'    % waveFile has a "raw" extension
        fs=16000; nbits=16;    % By default
        wave=rawRead(waveFile, nbits);
        waveFile=[tempname, '.wav'];
        wavwrite(wave/32768, fs, nbits, waveFile);
    case {'.wav', '.flv', '.wa1'}
        % Do nothing
    otherwise
        error(sprintf('Unknown audio file extension = %s!\n', waveFileExtName));
end

% ====== Create rp if necessary
if ischar(language)    % language='english' or 'chinese'
    rp=saParamSet(language);
else
    rp=language;
end

% ===== Create text file if necessary
[parentDir, mainName, textFileExtName]=fileparts(text);
if strcmpi(textFileExtName, '.txt')        % text is actually the text file to be aligned
    textFile=text;
else
    textFile=[tempname, '.txt'];
    if strcmp(rp.language, 'chinese')
        text=text(isTradChinese(text));        % Remove half-width characters
    end
    fid=fopen(textFile, 'w'); fprintf(fid, '%s\n', text); fclose(fid);
    tempTxtFile=textFile;        % For later deletion
end

if ~isempty(pitchFile) & ~isnan(pitchFile)
    rp.getPitch=1;
end

% ====== Find the path to the executables
parentDir=fileparts(which(mfilename));
exeDir=[parentDir, '\exe'];

% ====== Execute the executable
if ~isAbsPath(waveFile)
    waveFile=which(waveFile);    % Convert into an absolute path
end
dosCmd=sprintf('"assess.exe" "%s" "%s" "%s" %d "%s" "%s" "%s" "%s" %d', rp.file, waveFile, textFile, rp.useEpd, rp.outputDir, rp.sylFile, rp.netFile, rp.wpaFile, rp.getPitch);

currDir=pwd;
cd(exeDir);
cd('output'); delete('*.*');        % Delete previous output files
cd(exeDir);
%fprintf('dosCmd=%s\n', dosCmd); keyboard
t0=clock; [exeStatus, exeResult]=dos(dosCmd); time=etime(clock, t0);
cd(currDir);

debug=0;
if isempty(strfind(exeResult, 'Done!'))
    debug=1;
    fprintf('dosCmd=%s\n', dosCmd);
    fprintf('exeStatus=%d\n', exeStatus);
    fprintf('exeResult=%s\n', exeResult);
    fprintf('Something went wrong!\nYou can copy dosCmd and run it in a DOS window within the {asrRoot}\\exe directory to see the results for debugging.\n');
    cmObj=[];
    pitchObj0=[];
    pitchObj=[];
    return;
end

% ====== Read output.xml directly
xmlFile=[exeDir, '\output\output.xml'];
output=asraOutputXmlRead(xmlFile);
cmObj=output.confidenceMeasure;

%{
% ====== Create the CM file if necessary
origCmFile=[exeDir, '\output\phone.cm'];
if ~exist(origCmFile)
    msg=sprintf('Cannot find %s!\n', origCmFile);
    error(msg);
end
cmObj=cmRead(origCmFile);
if ~(isempty(cmFile) | isnan(cmFile))
%    copyfile(origCmFile, cmFile);
    myCopyFile(origCmFile, cmFile);
else
    cmFile=origCmFile;
end

% ====== Create the LAB file if necessary
origLabFile=[exeDir, '\output\phone.lab'];
if ~exist(origLabFile)
    msg=sprintf('Cannot find %s!\n', origLabFile);
    error(msg);
end
if ~(isempty(labFile) | isnan(labFile))
%    copyfile(origLabFile, labFile);
    myCopyFile(origLabFile, labFile);
else
    labFile=origLabFile;
end

% ====== Create the plp file if necessary
origPlpFile=[exeDir, '\output\phone.plp'];
if ~exist(origPlpFile)
    msg=sprintf('Cannot find %s!\n', origPlpFile);
    error(msg);
end
if ~(isempty(plpFile) | isnan(plpFile))
%    copyfile(origPlpFile, plpFile);
    myCopyFile(origPlpFile, plpFile);
else
    plpFile=origPlpFile;
end

% ====== Create the pitch file if necessary
if ~(isempty(pitchFile) | isnan(pitchFile))
    origPitchFile=[exeDir, '\output\pitch.txt'];
    if ~exist(origPitchFile)
        msg=sprintf('Cannot find %s!\n', origPitchFile);
        error(msg);
    end
%    copyfile(origPitchFile, pitchFile);
    myCopyFile(origPitchFile, pitchFile);
end

% ====== Read the final score
fid=fopen([exeDir, '\output\score.txt'], 'r'); score=fscanf(fid, '%f'); fclose(fid);

% ====== Put pitch info into cmObj
if rp.getPitch
    qiYin=textread(rp.qiYinFile, '%s', 'delimiter', '\n', 'whitespace', '');
    pitch0=asciiRead(pitchFile);    % This is the unbroken pitch by UPDUDP
%    fprintf('Length of pitch0 = %d\n', length(pitch0));
%    fprintf('Last pos = %d\n', cmObj.word(end).phone(end).interval(2)*100);
    % Extend pitch so that its number of points equals the number of ASR frames
    extendedPointNum=cmObj.word(end).phone(end).interval(2)*100-length(pitch0);
    pitch0=[pitch0, pitch0(end)*ones(1, extendedPointNum)];
    for i=1:length(cmObj.word)
        wordName=cmObj.word(i).name;
        volume=cmObj.word(i).volume;
        wordStartIndex=cmObj.word(i).phone(1).interval(1)*100+1;
        for j=1:length(cmObj.word(i).phone)
            startIndex=cmObj.word(i).phone(j).interval(1)*100+1;
            endIndex=cmObj.word(i).phone(j).interval(2)*100;
            cmObj.word(i).phone(j).pitch0=pitch0(startIndex:endIndex);    % Pitch by UPDUDP
            cmObj.word(i).phone(j).pitch=pitch0(startIndex:endIndex);    % Pitch via forced alignment
            cmObj.word(i).phone(j).volume=volume((startIndex:endIndex)-wordStartIndex+1);
            phoneName=cmObj.word(i).phone(j).name;
            plusIndex=findstr(phoneName, '+');
            monoPhoneName=phoneName;
            if length(plusIndex)==1
                monoPhoneName=phoneName(1:plusIndex-1);
            end
            index=find(strcmp(qiYin, monoPhoneName));
            if ~isempty(index)    % Aspirated (breathy) phone ===> pitch=0
                cmObj.word(i).phone(j).pitch=0*cmObj.word(i).phone(j).pitch;
            end
        end
        % Combine phone pitch to have word pitch
        cmObj.word(i).pitch0=[cmObj.word(i).phone.pitch0];
        cmObj.word(i).pitch=[cmObj.word(i).phone.pitch];
    end
end
%}

% ====== Plotting if necessary
tempWaveFile='';
pitchObj0=[]; pitchObj=[];    % Assigned below only when plotting with pitch
if plotOpt
    % === Convert flv to wav (for plotting only)
    flvWaveFile='';
    if strcmpi(waveFileExtName, '.flv')
        flvWaveFile=[tempname, '.wav'];
        flv2wav(waveFile, flvWaveFile);    % FLV to WAV conversion
        waveFile=flvWaveFile;
    end
    % === Resample the wave file (for plotting only)
    tempWaveFile=[tempname, '.wav'];    % For converting to 16 kHz, 16 bits
    wave2wave(waveFile, tempWaveFile, 16000, 16);    % Convert to 16 kHz, 16 bits
    if ~isempty(flvWaveFile) && exist(flvWaveFile, 'file')==2, delete(flvWaveFile); end    % Intermediate FLV-to-WAV file is no longer needed
    if exist(tempWaveFile, 'file')~=2
        error(sprintf('Cannot find %s within %s!\n', tempWaveFile, mfilename));
    end
    % === Plotting
    if ~rp.getPitch
        waveCmPlot(tempWaveFile, xmlFile);
    else
        [y, fs, nbits]=wavread(tempWaveFile);
        frameSize=640; overlap=480;                % These should be the same as those used in the C program
        pitchObj0.frameRate=fs/(frameSize-overlap);
        pitchObj0.signal=[cmObj.word.pitch];
        cmObj2=cmObj;
        for i=1:length(cmObj2.word)
            for j=1:length(cmObj2.word(i).phone)
                if ~cmObj2.word(i).phone(j).pitched
                    cmObj2.word(i).phone(j).pitch=0*cmObj2.word(i).phone(j).pitch;
                end
            end
            cmObj2.word(i).pitch=[cmObj2.word(i).phone.pitch];
        end
        pitchObj=pitchObj0;
        pitchObj.signal=[cmObj2.word.pitch];
        waveCmPitchPlot(tempWaveFile, xmlFile, pitchObj0, pitchObj);
        legend('Pitch1: unbroken', 'Pitch2: segmented');
    end
end

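% ====== Delete temporary files (otherwise too many files accumulate in the temp directory and slow down execution)
if ~debug
    if ~isempty(tempWaveFile) && exist(tempWaveFile, 'file')==2, delete(tempWaveFile); end
    if exist('tempTxtFile', 'var') && exist(tempTxtFile, 'file')==2, delete(tempTxtFile); end
    if strcmpi(waveFileExtName, '.raw') && exist(waveFile, 'file')==2, delete(waveFile); end    % Temporary WAV converted from RAW
end
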
% ====== selfdemo
function selfdemo

% === Example of English assessment
waveFile='what_would_you_like_to_know.wav';
text='what would you like to know';
language='english';
plotOpt=1;
pitchFile='test.pitch';
cmObj=waveAssess(waveFile, text, language, plotOpt, pitchFile)
return

% === Example of Japanese assessment
waveFile='ka_ko_no_o_mo_i_de_o_hu_ri_ka_e_tte_mi_ru.wav';
text='XXXXXXX';        % This is "don't care"
parentDir=fileparts(which('waveAssess'));
exeDir=[parentDir, '\exe'];
rp.file=[exeDir, '\japanese.sa.prm'];
rp.qiYinFile=[exeDir, '\asraData\japanese\japanese.qiYin'];
rp.language='japanese';
rp.useEpd=0;
rp.outputDir='output';
rp.sylFile=[parentDir, '\japanese0001.syl'];
rp.netFile='';
rp.wpaFile='';
rp.getPitch=1;
plotOpt=1;
pitchFile='test.pitch';
cmObj=waveAssess(waveFile, text, rp, plotOpt, pitchFile)
return

% === Example of Chinese assessment
waveFile='yi_cuen_xiang_s_yi_cuen_huei.wav';
text='一寸想思一寸灰';
language='chinese';
plotOpt=1;
pitchFile='test.pitch';
cmObj=waveAssess(waveFile, text, language, plotOpt, pitchFile)
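The commented-out block in the source slices the pitch track per phone by converting each phone's interval (in seconds) to frame indices at 100 frames per second. This matches the pitch frame rate computed in the plotting code, fs/(frameSize-overlap) = 16000/(640-480) = 100. The sketch below replays those steps on made-up data: computing the frame rate, padding the pitch with its last value, slicing per phone, stripping the right context after '+', and zeroing aspirated phones. The phone names, intervals, pitch values, and qiYin list are all hypothetical. Unlike the source, it rounds interval*frameRate, since a product such as 0.07*100 evaluates to 7.000000000000001 and is not a valid index.

```matlab
% Frame rate used by the pitch tracker: fs/(frameSize-overlap)
fs=16000; frameSize=640; overlap=480;
frameRate=fs/(frameSize-overlap);          % 100 frames per second

% Hypothetical alignment: phone names and [start, end] times in seconds
phoneName={'sil', 'k+a', 'a'};
interval=[0 0.03; 0.03 0.05; 0.05 0.10];
qiYin={'k', 'p', 't'};                     % hypothetical list of aspirated phones
pitch0=60+(1:8);                           % hypothetical pitch track, shorter than the alignment

% Pad pitch0 with its last value so that it covers every ASR frame
frameNum=round(interval(end, 2)*frameRate);
pitch0=[pitch0, pitch0(end)*ones(1, frameNum-length(pitch0))];

phonePitch=cell(1, numel(phoneName));
for j=1:numel(phoneName)
    startIndex=round(interval(j, 1)*frameRate)+1;
    endIndex=round(interval(j, 2)*frameRate);
    phonePitch{j}=pitch0(startIndex:endIndex);
    monoPhoneName=strtok(phoneName{j}, '+');    % 'k+a' -> 'k'
    if any(strcmp(qiYin, monoPhoneName))
        phonePitch{j}=0*phonePitch{j};          % aspirated phone ==> pitch=0
    end
end
```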

Generated on Tue 01-Jun-2010 09:50:19 by m2html © 2003