wpaExtract

PURPOSE

wpaExtract: Extract from a dictionary (wpa) the entries whose words occur in a given text

SYNOPSIS

function newWpa=wpaExtract(text, wpa)

DESCRIPTION

 wpaExtract: Extract from a dictionary (wpa) the entries whose words occur in a given text
    Usage: newWpa=wpaExtract(text, wpa)
        text: Text whose words are to be looked up in the dictionary
        wpa: Original dictionary, a struct array with a "word" field (e.g., as returned by wpaRead)
        newWpa: Entries of wpa whose words occur in text
    Words that are not found in wpa are reported on screen and appended to missingWord.txt.
    Calling wpaExtract with no arguments runs a self demo.
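
The lookup described above can be sketched without the toolbox functions. This is only an illustration under stated assumptions: the three-entry dictionary is hypothetical (only its "word" field comes from the source), and the regexp call is a stand-in for textNormalize4english, whose exact normalization rules are not shown on this page.

```matlab
% Hypothetical dictionary; real wpa entries presumably carry more fields.
wpa = struct('word', {'movies', 'seen', 'what'});

% Assumed stand-in for textNormalize4english: lowercase, keep alphabetic runs.
words = regexp(lower('What movies have you seen?'), '[a-z]+', 'match');

% Keep every dictionary entry whose word occurs in the text.
wpaWordList = {wpa.word};
keep = ismember(wpaWordList, words);
newWpa = wpa(keep);

% Words of the text that have no dictionary entry.
missing = words(~ismember(words, wpaWordList));

disp({newWpa.word});
disp(missing);
```

Testing the dictionary words against the text (rather than the reverse) keeps all entries that share a word, in dictionary order, which should match what the loop followed by unique(foundIndex) produces in the source.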

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function selfdemo

SOURCE CODE

function newWpa=wpaExtract(text, wpa)
% wpaExtract: Extract from a dictionary (wpa) the entries whose words occur in a given text
%    Usage: newWpa=wpaExtract(text, wpa)
%        text: Text whose words are to be looked up in the dictionary
%        wpa: Original dictionary, a struct array with a "word" field (e.g., as returned by wpaRead)
%        newWpa: Entries of wpa whose words occur in text
%    Words that are not found in wpa are reported on screen and appended to missingWord.txt.
%    Calling wpaExtract with no arguments runs a self demo.

%    Roger Jang, 20050607

if nargin<1, selfdemo; return; end

wpaWordList={wpa.word};
words=textNormalize4english(text);    % Cell array of the words in the text

foundIndex=[];    % Indices into wpa of the matched entries
missIndex=[];     % Indices into words of the words with no match
for i=1:length(words)
%    fprintf('%d/%d\n', i, length(words));
    word=words{i};
    index=find(strcmp(word, wpaWordList));
    if ~isempty(index)
        foundIndex=[foundIndex, index];
    else
        fprintf('Warning: Cannot find "%s" in the given dict!\n', word);
        missIndex=[missIndex, i];
    end
end

% Append the missing words to a log file
if ~isempty(missIndex)
    logFile='missingWord.txt';
    fprintf('Save the missing words to %s!\n', logFile);
%    delete logFile;
    fid=fopen(logFile, 'a');
    if fid<0, error('Cannot open %s for writing!', logFile); end
    for i=1:length(missIndex)
        fprintf(fid, '%s\r\n', words{missIndex(i)});
    end
    fclose(fid);
    pause(0.2);     % Pause briefly so that the file is fully closed
end

% Remove duplicates (a word may occur several times in the text)
foundIndex=unique(foundIndex);
newWpa=wpa(foundIndex);

% ====== Self demo
function selfdemo
text='what movies have you seen recently?';
wpaFile='d:/users/jang/application/asr/dict/english.wpa';
fprintf('Reading %s...\n', wpaFile);
wpa=wpaRead(wpaFile);
newWpa=wpaExtract(text, wpa);
newWpaFile='test.wpa';
fprintf('Writing %s...\n', newWpaFile);
wpaWrite(newWpa, newWpaFile);
type(newWpaFile);
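
The logging step in the source appends one line per missing occurrence, so a word that appears several times in the text is logged several times. A possible variant, shown only as a sketch, logs each missing word once. The word list and the temporary log file name below are made up for illustration, and the 'stable' option of unique assumes a reasonably recent MATLAB release.

```matlab
% Hypothetical list of missing words, with one repeat.
missing = {'have', 'you', 'have'};

% Drop repeats while keeping the order of first occurrence.
missing = unique(missing, 'stable');

% Append to a throwaway log file (the source uses missingWord.txt instead).
logFile = [tempname, '.txt'];
fid = fopen(logFile, 'a');
if fid < 0, error('Cannot open %s for writing!', logFile); end
fprintf(fid, '%s\r\n', missing{:});    % The format is reused for each word
fclose(fid);

logText = fileread(logFile);
```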

Generated on Tue 01-Jun-2010 09:50:19 by m2html © 2003