After running the ASRA library for either VC or SA, the main output file is output.xml. A typical output.xml can be produced by running goSaLibFile.bat, which is available in the ASRA package for English on the Windows platform.
The major items of output.xml are:
- version: This is the version of the XML format.
- text: For VC, this is the identified text after recognition. For SA, this is the text corresponding to the utterance.
- syl: The phonetic alphabets corresponding to the text. For SA, this is empty, since ASRA generates the phonetic alphabets by table lookup.
- confidenceMeasure: A large item that records the confidence measure of the utterance with respect to the text. The items within the "confidenceMeasure" tag are explained next.
The items within "confidenceMeasure" tags are:
- language: The language of the utterance.
- score: The overall score, between 0 and 100, representing the similarity between the text and the utterance.
- timberScore: The timbre score of the utterance.
- word: Information about each word within the utterance. For English, each entry is a single word; for Chinese, each entry is a single Chinese character. The items within the "word" tag are explained next.
The items within "word" tags are:
- name: For English, this is the sequence of phonetic alphabets (PAs) of the word. For Chinese, this is the Hanyu Pinyin of the Chinese character.
- interval: The time interval of the word, obtained from Viterbi search.
- text: For English, this is the spelling of the word. For Chinese, this is the Chinese character in Big5 encoding.
- timberScore: The timbre score of the word.
- pitch: The pitch of the utterance corresponding to this word, at a rate of 100 pitch points per second. If the "getPitch" option is false, this tag contains no pitch information.
- volume: The volume of the utterance corresponding to this word, computed as the sum of the absolute sample values within a frame.
- timberScores: The timbre scores obtained from 3 different scoring mechanisms. This item is intended primarily for development engineers.
- phone: The phoneme sequence within the word, explained next.
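The volume and pitch descriptions above can be made concrete with a short sketch. It assumes integer audio samples held in a plain Python sequence and a hypothetical frame size of 160 samples (the frame size used by ASRA is not stated here). The names `frame_volume` and `pitch_times` are illustrative only and are not part of ASRA.

```python
from typing import List, Sequence

PITCH_RATE = 100  # pitch points per second, as stated for the "pitch" item


def frame_volume(samples: Sequence[int], frame_size: int = 160) -> List[int]:
    """Volume per frame: sum of absolute sample values within each frame.

    frame_size is an assumption; trailing samples that do not fill a
    whole frame are ignored.
    """
    n_frames = len(samples) // frame_size
    return [
        sum(abs(s) for s in samples[i * frame_size:(i + 1) * frame_size])
        for i in range(n_frames)
    ]


def pitch_times(n_points: int, start: float = 0.0) -> List[float]:
    """Time stamp (in seconds) of each pitch point at 100 points per second."""
    return [start + i / PITCH_RATE for i in range(n_points)]


if __name__ == "__main__":
    samples = [1, -2, 3, -4, 5, -6, 7, -8]
    print(frame_volume(samples, frame_size=4))  # [10, 26]
    print(pitch_times(3))                       # [0.0, 0.01, 0.02]
```

With the word's interval start passed as `start`, `pitch_times` gives a time axis against which the pitch values of that word could be plotted.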
The items within "phone" tags are:
- name: The name of the phoneme.
- interval: The time interval of the phoneme, obtained from Viterbi search.
- timberScore: The timbre score of the phoneme.
- pitched: A logical value indicating whether the phoneme is pitched.
- pitch: The pitch of the utterance corresponding to this phoneme.
- volume: The volume of the utterance corresponding to this phoneme.
- cumLogProb: Cumulative log likelihood of the phoneme.
- rankRatio: Rank ratio of the phoneme.
- timeRatio: Time ratio of the phoneme.
- competingModel: List of competing models of this phoneme.
- competingModelLogProb: List of log likelihoods of all the competing models.
- timberScores: The timbre scores obtained from 3 different scoring mechanisms. This item is intended primarily for development engineers.
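The nesting described above (output.xml, then "confidenceMeasure", then "word", then "phone") can be walked with a standard XML parser. The following is a minimal sketch, not ASRA's own tooling: the sample document is invented for illustration, and it assumes that every item is stored as a child element with the item's name. The real output.xml may use attributes or a different root tag, so the lookups would need adjusting. The function name `weak_phones` and the threshold of 60 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the layout and all values are invented for illustration.
SAMPLE = """
<output>
  <version>1.0</version>
  <text>hello</text>
  <syl></syl>
  <confidenceMeasure>
    <language>English</language>
    <score>85</score>
    <timberScore>80</timberScore>
    <word>
      <name>hh ah l ow</name>
      <text>hello</text>
      <timberScore>80</timberScore>
      <phone><name>hh</name><timberScore>90</timberScore></phone>
      <phone><name>ah</name><timberScore>55</timberScore></phone>
    </word>
  </confidenceMeasure>
</output>
"""


def weak_phones(xml_text, threshold=60.0):
    """Return (word text, phone name, timberScore) for phones below threshold."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for word in root.iter("word"):
        word_text = word.findtext("text", default="")
        for phone in word.findall("phone"):
            # A missing score becomes NaN, which never compares below threshold.
            score = float(phone.findtext("timberScore", default="nan"))
            if score < threshold:
                result.append((word_text, phone.findtext("name", default=""), score))
    return result


if __name__ == "__main__":
    print(weak_phones(SAMPLE))  # [('hello', 'ah', 55.0)]
```

Listing low-scoring phonemes per word in this way is one possible use of the phone-level timberScore, for example to point a learner at the sounds that most need practice.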
Audio Signal Processing and Recognition (音訊處理與辨識)