- HTKBook,
- Su Tonghua (苏统华), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
- Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan(R.O.C.).
Environment:
- HTK 3.4
- Cygwin NT-5.1 1.5.25
Section 1: Data Preparation
- Step 1 - the Task Grammar: the grammar the recognizer will use (gram -> wdnet)
This tutorial builds a recognizer for voice-operated phone dialling.
Typical voice commands are:
- Dial 3323654
- Dial 9045109
- Phone Woodland
- Call Steve Young
Because people follow regular patterns when they speak such commands, we can build a grammar model for voice dialling.
For example, nobody would say "9045108 call" or "199 Steve Young".
The resulting grammar graph looks like Fig. 3.1 in the HTKBook.
We can define the grammar graph with a few simple symbols, in a file called gram:
================================================
$digit = ONE | TWO | THREE | FOUR | FIVE |
SIX | SEVEN | EIGHT | NINE | OH | ZERO;
$name = [ SUE ] LAW |
[ JULIAN ] TYLER |
[ DAVE ] WOOD |
[ PHIL ] LEE |
[ STEVE ] YOUNG;
( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
=================================================
Vertical bars, "|", separate alternatives.
Square brackets, "[" and "]", enclose optional items.
Angle brackets, "<" and ">", denote one or more repetitions.
That is why Fig. 3.1 looks the way it does.
Because of "<$digit>", a digit can be repeated one or more times.
Because of the top-level vertical bar in "DIAL <$digit> | (PHONE|CALL) $name", "DIAL" followed by digits is an alternative to "PHONE" or "CALL" followed by a name.
Because of "[ PHIL ] LEE", we can say either "Call LEE" or "Call PHIL LEE".
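To make the three symbols concrete, here is a minimal Python sketch that transcribes the grammar into a regular expression and checks a few word sequences against it. It is only an illustration and not part of HTK: the function name accepts is made up for this example, and TYLER is assumed to be the surname that follows JULIAN, as listed in the word network.

```python
import re

# Vocabulary copied from the gram file ($digit and $name).
DIGIT = r"(?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|OH|ZERO)"
NAME = (r"(?:(?:SUE )?LAW|(?:JULIAN )?TYLER|(?:DAVE )?WOOD|"
        r"(?:PHIL )?LEE|(?:STEVE )?YOUNG)")

# "|"   -> regex alternation
# "[x]" -> optional group (?:x )?
# "<x>" -> one or more repetitions: x(?: x)*
SENTENCE = re.compile(
    rf"SENT-START (?:DIAL {DIGIT}(?: {DIGIT})*|(?:PHONE|CALL) {NAME}) SENT-END"
)

def accepts(words: str) -> bool:
    """Return True if the word sequence is allowed by the grammar (hypothetical helper)."""
    return SENTENCE.fullmatch(f"SENT-START {words} SENT-END") is not None

print(accepts("DIAL THREE THREE TWO"))  # digits may repeat
print(accepts("CALL PHIL LEE"))         # first name is optional ...
print(accepts("CALL LEE"))              # ... so this is allowed too
print(accepts("NINE OH CALL"))          # wrong order, rejected
```

The first three calls print True and the last prints False, matching the rules described above.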
Now we can use the HTK tool HParse to build a word network in HTK Standard Lattice Format (SLF).
HParse usage:
- Without options.
------------------------
$ HParse gram wdnet
------------------------
The wdnet file will look like the following:
======================
VERSION=1.0
N=31 L=62
I=0 W=SENT-END
I=1 W=YOUNG
I=2 W=!NULL
I=3 W=STEVE
I=4 W=LEE
I=5 W=PHIL
I=6 W=WOOD
I=7 W=DAVE
I=8 W=TYLER
I=9 W=JULIAN
I=10 W=LAW
I=11 W=SUE
I=12 W=CALL
I=13 W=!NULL
I=14 W=PHONE
I=15 W=ZERO
I=16 W=!NULL
I=17 W=OH
I=18 W=NINE
I=19 W=EIGHT
I=20 W=SEVEN
I=21 W=SIX
I=22 W=FIVE
I=23 W=FOUR
I=24 W=THREE
I=25 W=TWO
I=26 W=ONE
I=27 W=DIAL
I=28 W=SENT-START
I=29 W=!NULL
I=30 W=!NULL
J=0 S=2 E=0
J=1 S=16 E=0
...omitted
======================
The word network contains 31 nodes (N=31) and 62 arcs (L=62).
The nodes are numbered I=0 to I=30, 31 in total.
"J=0 S=2 E=0" means that arc 0 starts at node 2 (!NULL) and ends at node 0 (SENT-END).
This is the low-level description used by HTK; gram is the high-level description written by users.
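Reading the I= and J= lines can also be sketched in a few lines of Python. This is a rough illustration that only handles the simple "KEY=value" fields shown in the listing, not the full SLF specification; parse_slf and SLF_EXCERPT are names invented for this example, and the excerpt contains just the lines needed to resolve the two arcs shown.

```python
# A few lines copied from the wdnet listing (most nodes and arcs left out).
SLF_EXCERPT = """\
VERSION=1.0
N=31 L=62
I=0 W=SENT-END
I=2 W=!NULL
I=16 W=!NULL
J=0 S=2 E=0
J=1 S=16 E=0
"""

def parse_slf(text):
    """Split SLF text into header fields, a node table and an arc list."""
    header, nodes, arcs = {}, {}, []
    for line in text.splitlines():
        # Every token on a line has the form KEY=value.
        fields = dict(tok.split("=", 1) for tok in line.split())
        if "I" in fields:        # node definition: I=<id> W=<word>
            nodes[int(fields["I"])] = fields["W"]
        elif "J" in fields:      # arc definition: J=<id> S=<start> E=<end>
            arcs.append((int(fields["J"]), int(fields["S"]), int(fields["E"])))
        else:                    # header line such as VERSION=1.0 or N=31 L=62
            header.update(fields)
    return header, nodes, arcs

header, nodes, arcs = parse_slf(SLF_EXCERPT)
for arc_id, start, end in arcs:
    print(f"arc {arc_id}: {nodes[start]} (node {start}) -> {nodes[end]} (node {end})")
```

For arc 0 this prints "arc 0: !NULL (node 2) -> SENT-END (node 0)", which is the reading given above.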
- -l: include LM log probabilities in the lattice.
---------------------------
$ HParse -l gram wdnet
---------------------------
The wdnet file will look like the following;
each arc entry now carries a log probability.
======================
Omitted....
J=0 S=2 E=0 l=0.00
J=1 S=16 E=0 l=-2.48
Omitted....
======================
How is the log probability calculated?
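One reading that fits the two values shown is that each arc leaving a node gets an equal share of the probability, so l = log(1 / number of followers) with the natural logarithm. Treat this as an assumption to check against the HParse reference in the HTKBook. The follower counts used below are also assumptions, since the arcs are omitted from the listing: node 2 is taken to have 1 follower, and node 16 is taken to have 12 (the 11 digit words plus the exit to SENT-END). The function name is made up for this sketch.

```python
import math

def uniform_arc_logprob(num_followers: int) -> float:
    """Natural-log probability of one arc when all followers are equally likely."""
    return math.log(1.0 / num_followers)

# Assumed follower counts (the arcs themselves are omitted in the listing).
print(f"{uniform_arc_logprob(1):.2f}")   # node 2: a single follower
print(f"{uniform_arc_logprob(12):.2f}")  # node 16: 11 digits + exit arc
```

Under these assumptions the two lines print 0.00 and -2.48, the same values as the l= fields of arcs J=0 and J=1.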
You can write different gram files to generate different wdnet files.
To learn more about the Standard Lattice Format (SLF), please refer to Section 20 of the HTKBook.