- HTKBooks, 
 - 苏统华 (Su Tonghua), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006, 
 - Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan(R.O.C.).
 
Environment: 
- HTK 3.4 
 - Cygwin NT-5.1 1.5.25
 
Section 1 is Data Preparation
- Step 1 - the Task Grammar: the grammar data used by the recognition model (gram -> wdnet)
 
This tutorial builds a recognizer for voice-dialling phone commands.
Typical commands are:
- Dial 3323654
 - Dial 9045109
 - Phone Woodland
 - Call Steve Young
 
So we can build a grammar model for voice dialling that follows the way people actually speak.
For example, nobody says "9045108 call" or "199 Steve Young".
The grammar can therefore be drawn as a graph like Fig. 3.1 in the HTKBook.
We can define that graph with a few simple symbols.
================================================
$digit = ONE | TWO | THREE | FOUR | FIVE |
SIX | SEVEN | EIGHT | NINE | OH | ZERO;
$name = [ SUE ] LAW |
[ JULIAN ] TYLER |
[ DAVE ] WOOD |
[ PHIL ] LEE |
[ STEVE ] YOUNG;
( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
=================================================
Vertical bars, "|", separate alternatives.
Square brackets, "[" and "]", enclose optional items.
Angle brackets, "<" and ">", denote one or more repetitions.
That is why Fig. 3.1 looks the way it does.
Because of "<$digit>", a digit can be repeated one or more times.
Because of the vertical bar in "DIAL <$digit> | (PHONE|CALL) $name", "DIAL xxxx" is an alternative to "PHONE someone" or "CALL someone".
Because of "[ PHIL ] LEE", we can say "Call Lee" or "Call Phil Lee".
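To make the three notations concrete, here is a minimal Python sketch that mirrors the grammar as a regular expression and checks whether a word sequence is allowed. It is only an illustration and is not how HParse works internally; the helper name accepts is hypothetical, and the name list assumes TYLER as the surname for JULIAN, as listed among the wdnet nodes.

```python
import re

# $digit: alternatives separated by "|"
DIGIT = r"(?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|OH|ZERO)"

# $name: "[ X ] Y" becomes an optional first name followed by a surname
# (TYLER is assumed from the wdnet node list)
NAME = (r"(?:(?:SUE )?LAW|(?:JULIAN )?TYLER|(?:DAVE )?WOOD|"
        r"(?:PHIL )?LEE|(?:STEVE )?YOUNG)")

# <$digit> becomes "one digit, then zero or more further digits"
SENTENCE = re.compile(
    rf"SENT-START (?:DIAL {DIGIT}(?: {DIGIT})*|(?:PHONE|CALL) {NAME}) SENT-END"
)

def accepts(words):
    """Return True if the word list is a sentence of the grammar."""
    return SENTENCE.fullmatch(" ".join(words)) is not None

print(accepts("SENT-START DIAL THREE THREE TWO SENT-END".split()))
print(accepts("SENT-START CALL PHIL LEE SENT-END".split()))
print(accepts("SENT-START NINE CALL SENT-END".split()))
```

The first two sequences follow the grammar, while the third puts a digit before CALL and is rejected, just like "9045108 call" above.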
Now we can use the HTK tool HParse to build a word network in HTK Standard Lattice Format (SLF).
HParse usage:
- Without options.
 
------------------------
$ HParse gram wdnet
------------------------
The wdnet file will look like the following,
======================
VERSION=1.0
N=31   L=62   
I=0    W=SENT-END            
I=1    W=YOUNG               
I=2    W=!NULL               
I=3    W=STEVE               
I=4    W=LEE                 
I=5    W=PHIL                
I=6    W=WOOD                
I=7    W=DAVE                
I=8    W=TYLER               
I=9    W=JULIAN              
I=10   W=LAW                 
I=11   W=SUE                 
I=12   W=CALL                
I=13   W=!NULL               
I=14   W=PHONE               
I=15   W=ZERO                
I=16   W=!NULL               
I=17   W=OH                  
I=18   W=NINE                
I=19   W=EIGHT               
I=20   W=SEVEN               
I=21   W=SIX                 
I=22   W=FIVE                
I=23   W=FOUR                
I=24   W=THREE               
I=25   W=TWO                 
I=26   W=ONE                 
I=27   W=DIAL                
I=28   W=SENT-START          
I=29   W=!NULL               
I=30   W=!NULL               
J=0     S=2    E=0    
J=1     S=16   E=0   
...omitted
======================
The word network contains 31 nodes (N=31) and 62 arcs (L=62).
The nodes are numbered I=0 to I=30, 31 in total.
"J=0 S=2 E=0" means that arc 0 starts at node 2 (!NULL) and ends at node 0 (SENT-END).
This is the low-level description used by HTK; gram is the high-level description written by users.
- With -l, which includes LM log probs in the lattice.
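To see how the I= and J= lines fit together, the following Python sketch reads a few lines in the same layout and prints each arc with the words of its start and end nodes. It is a simplified reader written for this listing only (the function name parse_slf is hypothetical), and it assumes whitespace-separated key=value fields, which is narrower than the full SLF definition.

```python
def parse_slf(text):
    """Collect node words (I= lines) and arcs (J= lines) from SLF-like text."""
    nodes, arcs = {}, []
    for line in text.splitlines():
        # Split "I=0    W=SENT-END" into {"I": "0", "W": "SENT-END"}
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "I" in fields:
            nodes[int(fields["I"])] = fields.get("W")
        elif "J" in fields:
            arcs.append((int(fields["J"]), int(fields["S"]), int(fields["E"])))
    return nodes, arcs

# A few lines copied from the wdnet listing
sample = """VERSION=1.0
N=31   L=62
I=0    W=SENT-END
I=2    W=!NULL
I=16   W=!NULL
J=0     S=2    E=0
J=1     S=16   E=0
"""

nodes, arcs = parse_slf(sample)
for j, s, e in arcs:
    print(f"arc {j}: node {s} ({nodes[s]}) -> node {e} ({nodes[e]})")
```

Header lines such as VERSION= and N=/L= carry neither I nor J, so the sketch skips them.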
 
---------------------------
$ HParse -l gram wdnet
---------------------------
The wdnet file will look like the following,
Each arc entry now carries a log probability.
======================
Omitted....
J=0     S=2    E=0    l=0.00 
J=1     S=16   E=0    l=-2.48 
Omitted....
======================
How are the log probabilities calculated?
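The listing does not say how HParse derives these values. One reading that fits the two arcs shown is that every arc leaving a node is treated as equally likely, so each arc gets the natural log of 1/n, where n is the number of arcs leaving its start node. Under that assumption, node 2 would have a single successor (SENT-END), and node 16, the !NULL node after the digit loop, would have 12 (the 11 digits plus SENT-END). This is inferred from the numbers only and has not been checked against the HParse source; the function name below is hypothetical.

```python
import math

def uniform_log_prob(num_outgoing_arcs):
    """Natural log of 1/n, assuming all outgoing arcs are equally likely."""
    return math.log(1.0 / num_outgoing_arcs)

# Assumed successor counts: 1 for node 2, 12 for node 16
print(f"{uniform_log_prob(1):.2f}")   # 0.00
print(f"{uniform_log_prob(12):.2f}")  # -2.48
```

Both results agree with the l=0.00 and l=-2.48 values in the listing, which is why the uniform reading seems plausible.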
You can write different gram files to generate different wdnet files.
To learn more about the Standard Lattice Format (SLF), please refer to Section 20 of the HTKBook.
