21-2 ASRA for English

In this chapter, we shall cover the use of ASRA for spoken English on Unix/Linux. If you want to read the document for other platforms, simply change the platform option in the URL of this page. For instance:

To download the ASRA package for English, try the following links:

We shall cover the following items:

For further technical details, please contact Roger Jang at jang@cs.nthu.edu.tw.

For simplicity, we shall use {asraRoot} to denote the root directory of the ASRA package. Here is a list of directories/files under {asraRoot}:

The first thing you can do is to enter {asraRoot}/mainProgram and check out all the main programs. These main programs can be divided into two groups, speech assessment (SA) and voice command (VC), which are the two major functions provided by ASRA:

To compile all the above main programs, simply type the following command in a terminal window:

bash goMainCompile.sh

If there is no problem with compilation, we shall have an executable for each of the above main programs. These executables can then be used for SA or VC in different scenarios. To try out the executables, we can move to {asraRoot}/script4unixEnglish to run some scripts that invoke these main programs:

After the execution of the above scripts, several output files are generated under {asraRoot}/output. You can use these files to save computation and speed up SA/RA significantly. For example:

Here is a list of the input files:
  1. english.macb: Macro file containing all the HMM parameters for English acoustic models
  2. english.wpa: WPA file with the phonetic alphabets for all English words. Note that you should not modify this file since it is automatically generated from the CMU dictionary available at
    http://www.speech.cs.cmu.edu/cgi-bin/cmudict
    The mapping table between CMU phonetic alphabets and the commonly used KK phonetic alphabets can be found at
    http://blog.urdada.net/2005/07/17/17/
    If you need new entries (for new words in your recognizable texts), do not add them to this file; add them to english.wpaAddenda instead.
  3. english.wpaAddenda: User-defined WPA file, which can be used to hold extra entries not in english.wpa.
  4. english.qiYin: Phones of unvoiced sounds.
  5. mfcc2.cfg: Configuration file for MFCC
  6. phoneRank2scoreParam.txt: Parameters for converting phone ranking into scores.
  7. scoreDiscountParam.txt: Parameters for score discount
  8. english.tnm: Text normalization mapping. You can open this file with a text editor to see the mapping for text normalization.
You can also open the recognition parameter file using a text editor. A brief explanation of each entry in the parameter file is also given in the file.
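
Before running SA or VC, it can be useful to confirm that all of the input files listed above are in place. The following is a minimal sketch in Python, not part of the ASRA package. The file names come from the list above; the helper name missing_input_files and the idea that all files sit in one directory are assumptions made for illustration, so pass whichever directory actually holds the files in your installation.

```python
from pathlib import Path

# File names taken from the list of input files above.
EXPECTED_INPUT_FILES = [
    "english.macb",
    "english.wpa",
    "english.wpaAddenda",
    "english.qiYin",
    "mfcc2.cfg",
    "phoneRank2scoreParam.txt",
    "scoreDiscountParam.txt",
    "english.tnm",
]


def missing_input_files(input_dir):
    """Return the expected input file names that are not found in input_dir.

    Assumption: all input files live directly in a single directory.
    """
    base = Path(input_dir)
    return [name for name in EXPECTED_INPUT_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python checkInputs.py /path/to/input/dir
    input_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_input_files(input_dir)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All expected input files are present.")
```

An empty result means every listed file was found; otherwise the returned names tell you which files to restore before invoking the main programs.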

If necessary, you can specify "outputDir" in the prm file to set where the output of ASRA is stored. Some of the output files may help with debugging:

  1. output.xml: This is the major output file for SA/VC, which lists all the details for computing the final score of confidence measure. Some comments are interleaved into the file to make it self-explanatory.
  2. output.wpa: Minimum wpa file containing phonetic alphabets (PA) for the given txt file. You can use this file instead of the original comprehensive wpa file to shorten the loading time. (Of course, the minimum wpa file is only good for the corresponding txt file.)
  3. output.syl: All possible PA sequences for sentences in the txt file. The first column is the PA sequence; the second column is the index (0-based) into the txt file. If some of the PA sequences are unlikely, you can simply delete them and use the updated output.syl as the input to ASRA for VC. (Note that SA will not generate output.syl.)
  4. output.net: Lexicon net (same format as HTK's net file)
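
Deleting unlikely PA sequences from output.syl can be done by hand in a text editor, but for long files a small script may be easier. The sketch below is in Python and is not part of ASRA. It relies on the two-column layout described above and additionally assumes that the index is the last whitespace-separated token on each line and that everything before it is the PA sequence; check this against your own output.syl before relying on it. The function names and the sample PA strings in the usage note are hypothetical.

```python
def parse_syl_line(line):
    """Split one output.syl line into (pa_sequence, sentence_index).

    Assumption: the 0-based index is the last whitespace-separated token,
    and everything before it is the PA sequence.
    """
    pa_sequence, index_text = line.rstrip("\n").rsplit(None, 1)
    return pa_sequence.strip(), int(index_text)


def filter_syl_lines(lines, unwanted):
    """Return the lines whose (PA sequence, index) pair is not in `unwanted`.

    `unwanted` is a set of (pa_sequence, sentence_index) tuples to delete.
    Blank lines are dropped; kept lines always end with a newline.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if parse_syl_line(line) not in unwanted:
            kept.append(line if line.endswith("\n") else line + "\n")
    return kept
```

For instance, to drop one made-up pronunciation of sentence 0, you could read the file with open("output.syl").readlines(), call filter_syl_lines(lines, {("hh eh l ow", 0)}), and write the result to a new file that is then given to ASRA for VC. Writing to a new file rather than overwriting keeps the original available for comparison.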
Other technical issues:
Audio Signal Processing and Recognition (音訊處理與辨識)