textNormalize: English text normalization for pinyin

Usage: words=textNormalize(text)
    text: a sentence to be processed for its pinyin via dict
    words: cell string representing words in the text

This should be synchronized with englishNormalization() within text.cpp.
function words=textNormalize4english(text)
% textNormalize: English text normalization for pinyin
%	Usage: words=textNormalize(text)
%		text: a sentence to be processed for its pinyin via dict
%		words: cell string representing words in the text
%
%	This should be synchronized with englishNormalization() within text.cpp.

%	Roger Jang, 20051111

if nargin<1, selfdemo; return; end

newText=text;
newText=regexprep(newText, '["?.!,_\-]', ' ');		% Replace punctuation with blanks (note: "-" must be escaped with a backslash to suppress its special meaning)
newText=lower(newText);					% Convert everything to lowercase, as required for dictionary lookup
newText=charCutLeadingTrailing(newText, ' ');		% Strip leading/trailing blanks

% Get rid of empty words (consecutive blanks produce empty entries after splitting)
words=split(newText, ' ');
index=[];
for i=1:length(words)
	if isempty(words{i})
		index=[index, i];
	end
end
words(index)=[];

% ====== Self demo
function selfdemo
text='Give me 10 bucks, OK?';
words=feval(mfilename, text);
fprintf('text = "%s"\n', text);
fprintf('words = "%s"\n', cell2str(words));
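The listing above relies on helpers that are not part of core MATLAB (charCutLeadingTrailing, cell2str, and possibly split, depending on the release). As a rough illustration of the same steps, here is a minimal script-style sketch that uses only built-in functions. It is an assumption-laden stand-in, not the author's implementation: strtrim replaces charCutLeadingTrailing (and also trims tabs and newlines, not just blanks), strsplit replaces split, and strjoin replaces cell2str for display.

```matlab
% Sketch of the normalization steps using built-ins only (hypothetical stand-in).
text = 'Give me 10 bucks, OK?';

% Replace the same punctuation set with blanks; "-" is escaped inside the class.
cleaned = regexprep(text, '["?.!,_\-]', ' ');

% Lowercase to match dictionary entries.
cleaned = lower(cleaned);

% Trim leading/trailing whitespace (broader than trimming blanks only).
cleaned = strtrim(cleaned);

% Split on blanks, then drop any empty entries that remain
% (for example, when the input contains nothing but punctuation).
words = strsplit(cleaned, ' ');
words = words(~cellfun(@isempty, words));

fprintf('text  = "%s"\n', text);
fprintf('words = %s\n', strjoin(words, ' | '));
% prints: words = give | me | 10 | bucks | ok
```

Note that digits such as "10" pass through unchanged, exactly as in the original routine; expanding them into words would have to happen elsewhere.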