
textNormalize4english

PURPOSE

textNormalize4english: English text normalization for pinyin

SYNOPSIS

function words=textNormalize4english(text)

DESCRIPTION

 textNormalize4english: English text normalization for pinyin
    Usage: words=textNormalize4english(text)
        text: a sentence to be normalized before its pinyin is looked up in the dictionary
        words: cell array of strings holding the words in the text

    This should be synchronized with englishNormalization() within text.cpp.

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function selfdemo

SOURCE CODE

function words=textNormalize4english(text)
% textNormalize4english: English text normalization for pinyin
%    Usage: words=textNormalize4english(text)
%        text: a sentence to be normalized before its pinyin is looked up in the dictionary
%        words: cell array of strings holding the words in the text
%
%    This should be synchronized with englishNormalization() within text.cpp.

%    Roger Jang, 20051111

if nargin<1, selfdemo; return; end

newText=text;
newText=regexprep(newText, '["?.!,_\-]', ' ');    % Replace punctuation with blanks (note: "-" must be escaped with a backslash to cancel its special meaning)
newText=lower(newText);                % Convert everything to lowercase to match the dictionary lookup
newText=charCutLeadingTrailing(newText, ' ');            % Cut leading/trailing blanks

% Get rid of empty words
words=split(newText, ' ');
words(cellfun('isempty', words))=[];    % Drop empty cells left by consecutive blanks

% ====== Self demo
function selfdemo
text='Give me 10 bucks, OK?';
words=feval(mfilename, text);
fprintf('text = "%s"\n', text);
fprintf('words = "%s"\n', cell2str(words));
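
The listing above relies on helpers that are not part of base MATLAB (charCutLeadingTrailing, cell2str, and possibly a toolbox version of split). As a rough illustration of the same steps, the following sketch uses only built-in functions. It is an assumption-laden approximation, not the author's implementation: strtrim also strips tabs and newlines, whereas the original cuts only blanks, and strsplit collapses consecutive delimiters by default.

```matlab
% Minimal sketch of the normalization steps using built-ins only.
% Assumption: strtrim/strsplit stand in for the toolbox helpers
% charCutLeadingTrailing/split used in the original listing.
text = 'Give me 10 bucks, OK?';

cleaned = regexprep(text, '["?.!,_\-]', ' ');   % punctuation -> blanks
cleaned = lower(cleaned);                       % lowercase for dictionary lookup
cleaned = strtrim(cleaned);                     % drop leading/trailing whitespace

words = strsplit(cleaned, ' ');                 % split on blanks (repeats collapsed)
words = words(~cellfun('isempty', words));      % guard against empty cells

fprintf('text = "%s"\n', text);
fprintf('words = "%s"\n', strjoin(words, ' | '));
```

For the demo sentence this yields the five words give, me, 10, bucks, and ok. The final filtering line matters mainly when the input contains nothing but punctuation or blanks, in which case strsplit returns a single empty cell.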

Generated on Tue 01-Jun-2010 09:50:19 by m2html © 2003