Friday, December 12, 2008

HTK Chapter 3 - Section 1 - Step 1

The paragraphs below draw on:
  • the HTKBook,
  • Su Tonghua (苏统华), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
  • Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan (R.O.C.).

Environment: 
  • HTK 3.4 
  • Cygwin NT-5.1 1.5.25
Section 1 is Data Preparation.
  • Step 1 - the Task Grammar: the grammar data used by the recognition model (gram -> wdnet)
The tutorial task is to build a recognizer for voice-dialling commands spoken over the phone.
Typical voice commands are:
  • Dial 3323654
  • Dial 9045109
  • Phone Woodland
  • Call Steve Young
So we can build a grammar for voice dialling that follows the way people actually speak.

For example, nobody would say "9045108 call" or "199 Steve Young".
So we can draw a grammar graph like Fig. 3.1 in the HTKBook.
The grammar graph can be defined with a few simple symbols.
================================================
$digit = ONE | TWO | THREE | FOUR | FIVE |
         SIX | SEVEN | EIGHT | NINE | OH | ZERO;
$name  = [ SUE ] LAW |
         [ JULIAN ] TYLER |
         [ DAVE ] WOOD |
         [ PHIL ] LEE |
         [ STEVE ] YOUNG;
( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
=================================================
Vertical bars, "|", separate alternatives.
Square brackets, "[" and "]", enclose optional items.
Angle brackets, "<" and ">", denote one or more repetitions.

That is why Fig. 3.1 looks the way it does.
Because of "<$digit>", a digit can be repeated one or more times.
Because of the top-level vertical bar in "DIAL <$digit> | (PHONE|CALL) $name", "DIAL XXXX" is an alternative to "PHONE someone" or "CALL someone".
Because of "[ PHIL ] LEE", we can say either "Call LEE" or "Call PHIL LEE".
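
To check these three rules against concrete sentences, the grammar can be transcribed by hand into a regular expression. This is only an illustrative sketch: the names DIGIT, NAME, SENTENCE and in_grammar are made up here, SENT-START and SENT-END are left implicit, and it is not how HParse itself processes the gram file.

```python
import re

# Hand transcription of the gram file; an illustration, not HTK's own parser.
DIGIT = r"(?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|OH|ZERO)"
# [ X ] Y  becomes an optional first name followed by a mandatory surname.
NAME = (r"(?:(?:SUE )?LAW|(?:JULIAN )?TYLER|(?:DAVE )?WOOD|"
        r"(?:PHIL )?LEE|(?:STEVE )?YOUNG)")
# <$digit> becomes "one digit, then zero or more further digits".
SENTENCE = re.compile(rf"(?:DIAL {DIGIT}(?: {DIGIT})*|(?:PHONE|CALL) {NAME})")


def in_grammar(sentence: str) -> bool:
    """Return True if the space-separated word sequence fits the grammar."""
    return SENTENCE.fullmatch(sentence) is not None


print(in_grammar("DIAL THREE THREE TWO"))  # True
print(in_grammar("CALL PHIL LEE"))         # True
print(in_grammar("CALL LEE"))              # True
print(in_grammar("NINE ZERO CALL"))        # False
```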

Now we can use the HTK tool HParse to build a word network in HTK Standard Lattice Format (SLF).
HParse usage:
  • Without options.
------------------------
$ HParse gram wdnet
------------------------
The wdnet file will look like the following:
======================
VERSION=1.0
N=31   L=62   
I=0    W=SENT-END            
I=1    W=YOUNG               
I=2    W=!NULL               
I=3    W=STEVE               
I=4    W=LEE                 
I=5    W=PHIL                
I=6    W=WOOD                
I=7    W=DAVE                
I=8    W=TYLER               
I=9    W=JULIAN              
I=10   W=LAW                 
I=11   W=SUE                 
I=12   W=CALL                
I=13   W=!NULL               
I=14   W=PHONE               
I=15   W=ZERO                
I=16   W=!NULL               
I=17   W=OH                  
I=18   W=NINE                
I=19   W=EIGHT               
I=20   W=SEVEN               
I=21   W=SIX                 
I=22   W=FIVE                
I=23   W=FOUR                
I=24   W=THREE               
I=25   W=TWO                 
I=26   W=ONE                 
I=27   W=DIAL                
I=28   W=SENT-START          
I=29   W=!NULL               
I=30   W=!NULL               
J=0     S=2    E=0    
J=1     S=16   E=0   
...omitted
======================
The word network contains 31 nodes and 62 arcs (N=31 and L=62 in the header).
The nodes are numbered I=0 to I=30, 31 in total.
"J=0 S=2 E=0" means arc 0 starts at node 2 (!NULL) and ends at node 0 (SENT-END).
This is the low-level description that HTK uses; gram is the high-level description written by the user.
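
To make the node and arc lines easier to read, here is a minimal sketch of a reader for the small SLF subset shown above. The function name parse_slf is hypothetical, the sample text is a shortened copy of the listing, and the full SLF format has more fields than this handles.

```python
def parse_slf(text):
    """Parse the simple SLF subset shown above into (header, nodes, arcs)."""
    header, nodes, arcs = {}, {}, []
    for line in text.splitlines():
        # Every field has the form KEY=VALUE, separated by whitespace.
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if "I" in fields:        # node definition: I=<id> W=<word>
            nodes[int(fields["I"])] = fields.get("W")
        elif "J" in fields:      # arc definition: J=<id> S=<start> E=<end>
            arcs.append((int(fields["J"]), int(fields["S"]), int(fields["E"])))
        else:                    # header fields such as VERSION, N and L
            header.update(fields)
    return header, nodes, arcs


sample = """VERSION=1.0
N=31   L=62
I=0    W=SENT-END
I=2    W=!NULL
J=0     S=2    E=0
"""
header, nodes, arcs = parse_slf(sample)
j, s, e = arcs[0]
print(f"arc {j}: {nodes[s]} -> {nodes[e]}")  # arc 0: !NULL -> SENT-END
```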
  • With -l, which includes language model (LM) log probabilities in the lattice.
---------------------------
$ HParse -l gram wdnet
---------------------------
The wdnet file will look like the following.
Each arc entry now carries a log probability.
======================
Omitted....
J=0     S=2    E=0    l=0.00 
J=1     S=16   E=0    l=-2.48 
Omitted....
======================
How is the log probability calculated?
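
One reading that fits both arcs shown above, and that seems to agree with the description of -l in the HParse reference section, is that every arc leaving a node gets the natural log of 1 / (number of followers of that node). Node 2 is the !NULL after a name and can only be followed by SENT-END, so ln(1/1) = 0.00. Node 16 is the !NULL after the digit loop and can presumably be followed by SENT-END or by any of the 11 digits, so ln(1/12) is about -2.48. The follower counts are inferred from the grammar, not read from the omitted arc list, and follower_log_prob is a made-up helper name.

```python
import math


def follower_log_prob(num_followers: int) -> float:
    """Natural log of a uniform probability over a node's followers."""
    return math.log(1.0 / num_followers)


# Node 2: only SENT-END can follow.
print(f"{follower_log_prob(1):.2f}")   # 0.00
# Node 16: SENT-END or one of 11 digits (assumed from the grammar).
print(f"{follower_log_prob(12):.2f}")  # -2.48
```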

You can write different gram files to generate different wdnet files.

For more details about HParse, please refer to Section 17.16 in the HTKBook.
For what goes on inside the wdnet file, please refer to Section 12.2 in the HTKBook.
To learn more about Standard Lattice Format (SLF), please refer to Section 20 in the HTKBook.
