Wednesday, February 18, 2009

HTK Chapter 3 - Section 2 - Step 6

The paragraphs below draw on the following sources:
  • The HTK Book,
  • 苏统华 (Su Tonghua), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
  • Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan(R.O.C.).
Environment:
  • HTK 3.4 
  • Cygwin NT-5.1 1.5.25
Section 2 is Creating Monophone HMMs. It consists of three steps:
  • Step 6 - Creating Flat Start Monophones
  • Step 7 - Fixing the Silence Models
  • Step 8 - Realigning the Training Data
At this point we have the feature vectors for both the training data and the testing data.
.\data\train\feature\S0001.mfcc
.\data\train\feature\S0002.mfcc
.\data\train\feature\S0003.mfcc
.\data\train\feature\S0004.mfcc
.\data\train\feature\S0005.mfcc
.\data\train\feature\S0006.mfcc
.\data\train\feature\S0007.mfcc
.\data\train\feature\S0008.mfcc
.\data\train\feature\S0009.mfcc
... and so on. These are the training data.

The testing data are as follows:
.\data\test\feature\T0001.mfc
.\data\test\feature\T0002.mfc
.\data\test\feature\T0003.mfc
.\data\test\feature\T0004.mfc
.\data\test\feature\T0005.mfc
.\data\test\feature\T0006.mfc
.\data\test\feature\T0007.mfc
.\data\test\feature\T0008.mfc
.\data\test\feature\T0009.mfc

Next, we have to give HTK a prototype HMM definition and supply initial values for the parameters of the model we use.
=============================
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
  <NumStates> 5
  <State> 2
    <Mean> 39
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    <Variance> 39
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  <State> 3
    <Mean> 39
      0.0 (x39), same as state 2
    <Variance> 39
      1.0 (x39)
  <State> 4
    <Mean> 39
      0.0 (x39)
    <Variance> 39
      1.0 (x39)
  <TransP> 5
    0.0 1.0 0.0 0.0 0.0
    0.0 0.6 0.4 0.0 0.0
    0.0 0.0 0.6 0.4 0.0
    0.0 0.0 0.0 0.7 0.3
    0.0 0.0 0.0 0.0 0.0
<EndHMM>
=============================
~o <VecSize> 39 <MFCC_0_D_A>
~o is a global options macro. It tells every HTK tool that the feature vectors have size 39 and parameter kind MFCC_0_D_A.

~h "proto"
means that the file defines an HMM named "proto", so elsewhere you will see definitions such as ~h "hmm0" or ~h "hmm1". The name must be enclosed in double quotation marks, " ".

HMM definition keywords are written inside angle brackets, < >, much like tags in markup code.
<BeginHMM> and <EndHMM> form a pair.
Only macro statements can appear before <BeginHMM>.
If the feature vector size is local to the HMM, <VecSize> appears after <BeginHMM>, as shown in Section 7.2 of the HTK Book.

As we know, an HMM is a state machine model, so we have to tell HTK how many states we want.
The <NumStates> tag gives the number of states. Note that HTK always treats the first state and the last state as non-emitting states.
So 5 means that State 1 and State 5 are non-emitting, while State 2, State 3, and State 4 are emitting, as in Fig. 7.1, Section 7.1 of the HTK Book. Pay attention: the state diagram is the same as in that figure, but the transition matrix is not.

Then we have to give initial values for each emitting state. What we supply depends on the kind of output probability distribution we choose.
In this example,
states 2, 3, and 4 all have the same initial mean and variance.
Because our feature vectors have 39 components, we assign 39 zeros to <Mean> and 39 ones to <Variance>.
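
Typing 39 zeros and 39 ones by hand is error-prone, so the prototype can also be generated by a script. The following is a minimal Python sketch, not part of HTK. The function name make_proto and its parameters are assumptions made for illustration. It writes the same 5-state prototype as above, with the tags written without spaces inside the angle brackets.

```python
def make_proto(vec_size=39, kind="MFCC_0_D_A", name="proto"):
    """Return the text of a 5-state prototype HMM (states 2-4 emitting).

    Hypothetical helper: every emitting state gets a zero mean vector
    and a unit variance vector of length vec_size.
    """
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [
        f"~o <VecSize> {vec_size} <{kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        "  <NumStates> 5",
    ]
    for state in (2, 3, 4):
        lines += [
            f"  <State> {state}",
            f"    <Mean> {vec_size}",
            f"      {zeros}",
            f"    <Variance> {vec_size}",
            f"      {ones}",
        ]
    lines += [
        "  <TransP> 5",
        "    0.0 1.0 0.0 0.0 0.0",
        "    0.0 0.6 0.4 0.0 0.0",
        "    0.0 0.0 0.6 0.4 0.0",
        "    0.0 0.0 0.0 0.7 0.3",
        "    0.0 0.0 0.0 0.0 0.0",
        "<EndHMM>",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the prototype; redirect the output to a file named proto.
    print(make_proto(), end="")
```

Changing vec_size and kind gives a prototype for a different feature configuration, as long as both agree with the config file used to extract the features.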

The final item is <TransP>, the transition probability matrix. The size of the matrix depends on the number of states.
We use 5 states here, so the transition matrix is 5x5.
Each entry in the matrix is the probability of moving from state i to state j.
  0.0 1.0 0.0 0.0 0.0
  0.0 0.6 0.4 0.0 0.0
  0.0 0.0 0.6 0.4 0.0
  0.0 0.0 0.0 0.7 0.3
  0.0 0.0 0.0 0.0 0.0
means 
  • transition probability from state 1 to state 2 is 1.0
  • transition probability from state 2 to state 2 is 0.6
  • transition probability from state 2 to state 3 is 0.4
  • transition probability from state 3 to state 3 is 0.6
  • transition probability from state 3 to state 4 is 0.4
  • transition probability from state 4 to state 4 is 0.7
  • transition probability from state 4 to state 5 is 0.3
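The total probability in each row except the last is 1.0: row 1 is 0.0+1.0+0.0+0.0+0.0 = 1.0,
row 2 is 0.0+0.6+0.4+0.0+0.0 = 1.0, and so on. The last row is all zeros because no transitions are allowed out of the final state.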
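
The row sums can be checked mechanically before handing the prototype to HTK. This is a small Python sketch, not an HTK tool. The name check_transp is hypothetical. It treats the last row as a special case, because the final (non-emitting) state has no outgoing transitions and its row sums to 0.0.

```python
# Transition matrix from the prototype above.
TRANSP = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]


def check_transp(matrix, tol=1e-6):
    """Raise ValueError unless the matrix is square, every row but the
    last sums to 1.0, and the last row sums to 0.0."""
    n = len(matrix)
    for i, row in enumerate(matrix, start=1):
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} entries, expected {n}")
        expected = 0.0 if i == n else 1.0
        if abs(sum(row) - expected) > tol:
            raise ValueError(f"row {i} sums to {sum(row)}, expected {expected}")
    return True


if __name__ == "__main__":
    print(check_transp(TRANSP))
```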

Section 7.2 of the HTK Book describes the different output probability distributions and how to use them.
You can use a Gaussian mixture model (GMM) as your output probability distribution; just use the <NumMixes> and <Mixture> tags.
For example, as in Fig. 7.3:
<State> 2 <NumMixes> 2
    <Mixture> 1 0.4
        <Mean> 4
            0.3  0.2  0.2  1.0
        <Variance> 4
            1.0  1.0  1.0  1.0
    <Mixture> 2 0.6
        <Mean> 4
            0.1  0.0  0.0  0.8
        <Variance> 4
            1.0  1.0  1.0  1.0
The definition above uses 2 mixture components as the output probability model. We also have to give initial values for each component. Note that the mixture weights must sum to 1.0: 0.4 + 0.6 = 1.0.
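
As an illustration of the weight constraint, the sketch below formats one mixture state from a list of components and refuses weights that do not sum to 1.0. It is written in Python and is not part of HTK. The function name format_mixture_state and the tuple layout (weight, means, variances) are assumptions made for this example.

```python
def format_mixture_state(state, components, tol=1e-6):
    """Return the text of one <State> with Gaussian mixture components.

    components is a list of (weight, means, variances) tuples.
    """
    total = sum(weight for weight, _, _ in components)
    if abs(total - 1.0) > tol:
        raise ValueError(f"mixture weights sum to {total}, expected 1.0")
    lines = [f"<State> {state} <NumMixes> {len(components)}"]
    for index, (weight, means, variances) in enumerate(components, start=1):
        if len(means) != len(variances):
            raise ValueError("mean and variance vectors differ in length")
        lines.append(f"    <Mixture> {index} {weight}")
        lines.append(f"        <Mean> {len(means)}")
        lines.append("            " + "  ".join(str(x) for x in means))
        lines.append(f"        <Variance> {len(variances)}")
        lines.append("            " + "  ".join(str(x) for x in variances))
    return "\n".join(lines)


if __name__ == "__main__":
    # The two components of the example above.
    print(format_mixture_state(2, [
        (0.4, [0.3, 0.2, 0.2, 1.0], [1.0, 1.0, 1.0, 1.0]),
        (0.6, [0.1, 0.0, 0.0, 0.8], [1.0, 1.0, 1.0, 1.0]),
    ]))
```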

In Fig. 7.4, you can see that each state can also be given a different number of mixtures.
State 2 in Fig. 7.4 has 2 mixtures.
State 3 in Fig. 7.4 has only 1 mixture.
Another important point in Fig. 7.4, it's to replace simple .

To learn more about HMM definitions, refer to Chapter 7 of the HTK Book.

After defining the HMM, we scan the whole training set to compute the global mean and variance.
--------------------------------------------------------------------------------------
$ HCompV -C ./config/config1 -f 0.01 -m -S train.scp -M ./hmms/hmm0 proto
--------------------------------------------------------------------------------------
The inputs are config1, train.scp, and the prototype proto. The outputs, written to ./hmms/hmm0 (given by -M), are a new proto and vFloors (generated by -f).
./config/config1
==================
# Coding parameters
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
==================
New proto
==================================================================
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><MFCC_D_A_0><DIAGC>
~h "proto"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 39
 -3.864637e-01 -1.276892e+00 6.429603e-01 -4.361009e+00 6.207581e-01 -6.569096e-01 2.480589e+00 -2.788665e+00 -1.313366e-01 6.740692e-01 -3.017518e+00 -1.560625e+00 5.566235e+01 1.460256e-03 4.730462e-04 -4.827005e-04 -7.249162e-04 -6.306474e-04 6.637267e-04 1.647110e-03 -2.088301e-03 9.362018e-05 -1.825078e-03 -1.855212e-03 -1.778467e-03 3.644651e-03 -1.163874e-04 1.333342e-04 -2.520498e-05 -1.577687e-05 1.496438e-04 -1.295793e-04 -2.109938e-04 5.133062e-04 3.661055e-04 6.873756e-05 1.892049e-04 1.713871e-04 -1.179971e-05
<VARIANCE> 39
 4.492153e+01 2.800227e+01 4.004902e+01 7.262168e+01 3.713427e+01 5.923348e+01 3.089855e+01 3.635918e+01 4.011551e+01 3.448929e+01 3.661570e+01 3.404308e+01 7.104830e+01 1.414941e+00 1.002086e+00 1.289929e+00 1.967196e+00 1.588490e+00 1.981885e+00 1.694523e+00 2.165956e+00 1.937736e+00 1.799082e+00 1.821838e+00 1.620020e+00 1.004474e+00 1.865744e-01 1.427446e-01 1.801455e-01 2.748002e-01 2.518953e-01 3.164474e-01 3.122217e-01 3.736564e-01 3.291466e-01 3.174342e-01 3.133416e-01 2.797257e-01 1.262979e-01
<GCONST> 1.081255e+02
<STATE> 3
<MEAN> 39
 -3.864637e-01 -1.276892e+00 6.429603e-01 -4.361009e+00 6.207581e-01 -6.569096e-01 2.480589e+00 -2.788665e+00 -1.313366e-01 6.740692e-01 -3.017518e+00 -1.560625e+00 5.566235e+01 1.460256e-03 4.730462e-04 -4.827005e-04 -7.249162e-04 -6.306474e-04 6.637267e-04 1.647110e-03 -2.088301e-03 9.362018e-05 -1.825078e-03 -1.855212e-03 -1.778467e-03 3.644651e-03 -1.163874e-04 1.333342e-04 -2.520498e-05 -1.577687e-05 1.496438e-04 -1.295793e-04 -2.109938e-04 5.133062e-04 3.661055e-04 6.873756e-05 1.892049e-04 1.713871e-04 -1.179971e-05
<VARIANCE> 39
 4.492153e+01 2.800227e+01 4.004902e+01 7.262168e+01 3.713427e+01 5.923348e+01 3.089855e+01 3.635918e+01 4.011551e+01 3.448929e+01 3.661570e+01 3.404308e+01 7.104830e+01 1.414941e+00 1.002086e+00 1.289929e+00 1.967196e+00 1.588490e+00 1.981885e+00 1.694523e+00 2.165956e+00 1.937736e+00 1.799082e+00 1.821838e+00 1.620020e+00 1.004474e+00 1.865744e-01 1.427446e-01 1.801455e-01 2.748002e-01 2.518953e-01 3.164474e-01 3.122217e-01 3.736564e-01 3.291466e-01 3.174342e-01 3.133416e-01 2.797257e-01 1.262979e-01
<GCONST> 1.081255e+02
<STATE> 4
<MEAN> 39
 -3.864637e-01 -1.276892e+00 6.429603e-01 -4.361009e+00 6.207581e-01 -6.569096e-01 2.480589e+00 -2.788665e+00 -1.313366e-01 6.740692e-01 -3.017518e+00 -1.560625e+00 5.566235e+01 1.460256e-03 4.730462e-04 -4.827005e-04 -7.249162e-04 -6.306474e-04 6.637267e-04 1.647110e-03 -2.088301e-03 9.362018e-05 -1.825078e-03 -1.855212e-03 -1.778467e-03 3.644651e-03 -1.163874e-04 1.333342e-04 -2.520498e-05 -1.577687e-05 1.496438e-04 -1.295793e-04 -2.109938e-04 5.133062e-04 3.661055e-04 6.873756e-05 1.892049e-04 1.713871e-04 -1.179971e-05
<VARIANCE> 39
 4.492153e+01 2.800227e+01 4.004902e+01 7.262168e+01 3.713427e+01 5.923348e+01 3.089855e+01 3.635918e+01 4.011551e+01 3.448929e+01 3.661570e+01 3.404308e+01 7.104830e+01 1.414941e+00 1.002086e+00 1.289929e+00 1.967196e+00 1.588490e+00 1.981885e+00 1.694523e+00 2.165956e+00 1.937736e+00 1.799082e+00 1.821838e+00 1.620020e+00 1.004474e+00 1.865744e-01 1.427446e-01 1.801455e-01 2.748002e-01 2.518953e-01 3.164474e-01 3.122217e-01 3.736564e-01 3.291466e-01 3.174342e-01 3.133416e-01 2.797257e-01 1.262979e-01
<GCONST> 1.081255e+02
<TRANSP> 5
 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
 0.000000e+00 6.000000e-01 4.000000e-01 0.000000e+00 0.000000e+00
 0.000000e+00 0.000000e+00 6.000000e-01 4.000000e-01 0.000000e+00
 0.000000e+00 0.000000e+00 0.000000e+00 7.000000e-01 3.000000e-01
 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
<ENDHMM>
==================================================================
vFloors
==============================================================================
~v varFloor1
39
 4.492153e-01 2.800227e-01 4.004902e-01 7.262168e-01 3.713427e-01 5.923348e-01 3.089855e-01 3.635918e-01 4.011551e-01 3.448929e-01 3.661570e-01 3.404307e-01 7.104830e-01 1.414941e-02 1.002086e-02 1.289929e-02 1.967196e-02 1.588490e-02 1.981885e-02 1.694523e-02 2.165956e-02 1.937735e-02 1.799082e-02 1.821838e-02 1.620020e-02 1.004474e-02 1.865744e-03 1.427446e-03 1.801455e-03 2.748002e-03 2.518953e-03 3.164474e-03 3.122217e-03 3.736564e-03 3.291466e-03 3.174342e-03 3.133416e-03 2.797257e-03 1.262979e-03
==============================================================================
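
The -f 0.01 option sets each variance floor to 0.01 times the corresponding global variance, which is why the vFloors values are the VARIANCE values of the new proto shifted by two decimal places. The Python sketch below checks this for the first three components listed above. The name variance_floors is a hypothetical helper, not an HTK function.

```python
import math


def variance_floors(global_variance, scale=0.01):
    """Scale each global variance by the factor passed to HCompV -f."""
    return [scale * v for v in global_variance]


# First three entries of <VARIANCE> in the new proto and of varFloor1.
GLOBAL_VARIANCE = [4.492153e+01, 2.800227e+01, 4.004902e+01]
REPORTED_FLOORS = [4.492153e-01, 2.800227e-01, 4.004902e-01]

floors = variance_floors(GLOBAL_VARIANCE)

if __name__ == "__main__":
    for computed, reported in zip(floors, REPORTED_FLOORS):
        same = math.isclose(computed, reported, rel_tol=1e-6)
        print(f"{computed:.6e}  {reported:.6e}  match={same}")
```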

--------------------------------------------------------------------------------------------------------------------
$ HERest -C ./config/config1 -I ./labels/phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H ./hmms/hmm0/macros -H ./hmms/hmm0/hmmdefs -M ./hmms/hmm1 ./lists/monophones0
--------------------------------------------------------------------------------------------------------------------
The inputs of HERest are
  • ./config/config1
  • ./labels/phones0.mlf
  • train.scp
  • ./hmms/hmm0/macros
  • ./hmms/hmm0/hmmdefs
  • ./lists/monophones0
The outputs are written to ./hmms/hmm1, the directory given by -M:
  • macros
  • hmmdefs
./lists/monophones0 is monophones1 with "sp" removed. If you use monophones1 at this point, HERest cannot find a corresponding ~h "sp" in the hmmdefs file and you will get an error message.
=========================
k
ao
l
d
ey
v
ay
ax
t
f
r
jh
uw
ia
n
y
iy
ow
w
ah
ih
sil
s
eh
th
uh
ng
z
=========================
./hmms/hmm0/macros: the only difference between vFloors and macros is the global options statement added at the top (the ~o line and the line after it).
===========================================================================
~o
<VECSIZE> 39 <MFCC_0_D_A>
~v varFloor1
39
 4.492153e-001 2.800227e-001 4.004902e-001 7.262168e-001 3.713427e-001 5.923348e-001 3.089855e-001 3.635918e-001 4.011551e-001 3.448929e-001 3.661570e-001 3.404307e-001 7.104830e-001 1.414941e-002 1.002086e-002 1.289929e-002 1.967196e-002 1.588490e-002 1.981885e-002 1.694523e-002 2.165956e-002 1.937735e-002 1.799082e-002 1.821838e-002 1.620020e-002 1.004474e-002 1.865744e-003 1.427446e-003 1.801455e-003 2.748002e-003 2.518953e-003 3.164474e-003 3.122217e-003 3.736564e-003 3.291466e-003 3.174342e-003 3.133416e-003 2.797257e-003 1.262979e-003
===========================================================================
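
HERest also reads ./hmms/hmm0/hmmdefs. In the HTK Book tutorial this file is made by copying the trained prototype once per phone in the list and relabeling each copy with the phone name. The Python sketch below shows one way to build both files as text. It is an illustration under that assumption, not the script used for this post, and the names build_macros and build_hmmdefs are hypothetical.

```python
def build_macros(vfloors_text, header="~o\n<VECSIZE> 39 <MFCC_0_D_A>\n"):
    """macros = global options statement followed by the vFloors content."""
    return header + vfloors_text


def build_hmmdefs(proto_text, phones):
    """Copy the prototype body once per phone, relabeled with ~h "phone".

    Everything up to and including the ~h line of the prototype is
    dropped; the global options live in the macros file instead.
    """
    lines = proto_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("~h"))
    body = "\n".join(lines[start + 1:]) + "\n"
    return "".join(f'~h "{phone}"\n{body}' for phone in phones)


if __name__ == "__main__":
    # Tiny stand-in prototype; in practice read ./hmms/hmm0/proto,
    # ./hmms/hmm0/vFloors and ./lists/monophones0 from disk.
    proto = '~o\n<VECSIZE> 2<MFCC>\n~h "proto"\n<BEGINHMM>\n<ENDHMM>\n'
    print(build_hmmdefs(proto, ["k", "ao", "sil"]), end="")
```

The phone list passed in should be the content of monophones0, one name per line, so that every model named in the list has a matching ~h entry.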

If you see the following messages,
=======================================================
Pruning-On[250.0 150.0 1000.0]
ERROR [+6510]  LOpen: Unable to open label file .\data\train\feature\S0001.lab
FATAL ERROR - Terminating program HERest
=======================================================
the error occurs because HERest cannot open S0001.lab. The content of S0001.lab is the same as the small part labeled "*/S0001.lab" in phones0.mlf.

S0001.lab looks like this:
==============================================
sil
d
ay
ax
l
ey
t
f
ay
v
sil
==============================================
You can download the *.lab files from HERE to avoid this error.
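
If you prefer not to download or hand-write the label files, they can be cut out of phones0.mlf. The Python sketch below assumes the usual MLF layout: a #!MLF!# header, then for each utterance a quoted pattern line such as "*/S0001.lab", one label per line, and a line holding a single period as terminator. The function names split_mlf and write_lab_files are hypothetical, and the output directory is whatever directory HERest reported in its error message.

```python
import os


def split_mlf(mlf_text):
    """Return {label file name: [labels]} parsed from MLF text."""
    entries = {}
    current = None
    for raw in mlf_text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue
        if current is None:
            # Pattern line, e.g. "*/S0001.lab": keep only the file name.
            current = line.strip('"').replace("\\", "/").split("/")[-1]
            entries[current] = []
        elif line == ".":
            current = None  # end of this utterance
        else:
            entries[current].append(line)
    return entries


def write_lab_files(entries, out_dir):
    """Write one .lab file per entry, one label per line."""
    os.makedirs(out_dir, exist_ok=True)
    for name, labels in entries.items():
        with open(os.path.join(out_dir, name), "w") as handle:
            handle.write("\n".join(labels) + "\n")


if __name__ == "__main__":
    sample = '#!MLF!#\n"*/S0001.lab"\nsil\nd\nay\nsil\n.\n'
    print(split_mlf(sample))
```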

Then we re-estimate two more times, as shown below. The commands are almost the same, but each pass starts from the results of the previous estimation.
  1. we generate hmm1/macros and hmm1/hmmdefs from hmm0/macros and hmm0/hmmdefs
  2. we generate hmm2/macros and hmm2/hmmdefs from hmm1/macros and hmm1/hmmdefs
  3. we generate hmm3/macros and hmm3/hmmdefs from hmm2/macros and hmm2/hmmdefs
--------------------------------------------------------------------------------------------------------------------
$ HERest -C ./config/config1 -I ./labels/phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H ./hmms/hmm1/macros -H ./hmms/hmm1/hmmdefs -M ./hmms/hmm2 ./lists/monophones0
--------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------
$ HERest -C ./config/config1 -I ./labels/phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H ./hmms/hmm2/macros -H ./hmms/hmm2/hmmdefs -M ./hmms/hmm3 ./lists/monophones0
--------------------------------------------------------------------------------------------------------------------
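
The three passes differ only in the source and target directory numbers, so the command lines can be generated in a loop. The Python sketch below only builds and prints the commands using the paths from this post. The function name herest_command is hypothetical, and the output directories hmm1 to hmm3 are assumed to exist already.

```python
def herest_command(src, dst):
    """Argument list for one re-estimation pass from hmm<src> to hmm<dst>."""
    return [
        "HERest",
        "-C", "./config/config1",
        "-I", "./labels/phones0.mlf",
        "-t", "250.0", "150.0", "1000.0",
        "-S", "train.scp",
        "-H", f"./hmms/hmm{src}/macros",
        "-H", f"./hmms/hmm{src}/hmmdefs",
        "-M", f"./hmms/hmm{dst}",
        "./lists/monophones0",
    ]


# hmm0 -> hmm1, hmm1 -> hmm2, hmm2 -> hmm3
commands = [herest_command(i, i + 1) for i in range(3)]

if __name__ == "__main__":
    for command in commands:
        print(" ".join(command))
```

With HTK installed, each list could be passed to subprocess.run(command, check=True) instead of being printed, so that a failing pass stops the loop.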

This finishes Step 6. Go on to the next step, Step 7.

3 comments:

iNouf said...

hello,
I happen to have the "unable to open label file" error at when trying to execute hresults "i was happy i finally got the final part T__T".
though the lab file exists.
any help?

thank you.

Unknown said...

It's very hard to solve the problem you got. In which step you got the trouble?

Unknown said...

Hi!
I have the same error when i try to run HERest:
Pruning-On[250.0 150.0 1000.0]
ERROR [+6510] LOpen: Unable to open label file .\data\train\feature\S0001.lab
FATAL ERROR - Terminating program HERest

But if I have created a MLF why do I have to put all the .lab files where it says?
Is that the only solution for this error? because if thats true I will have to create all the .lab files one by one.

answer me please.
Thanks!
