JR.Gemini's Knowledge Base: February 2009

Friday, February 20, 2009

HTK Chapter 3 - Section 2 - Step 7

The paragraphs below are based on

the HTK Book,
Su Tonghua (苏统华), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan (R.O.C.).

Environment:

HTK 3.4
Cygwin NT-5.1 1.5.25

Section 2 is Creating Monophone HMMs

Step 6 - Creating Flat Start Monophones
Step 7 - Fixing the Silence Models
Step 8 - Realigning the Training Data

In Step 6, we generated hmm0 through hmm3, each containing the silence model "sil".

Now the tutorial shows how to build the silence models of Fig. 3.9 in Subsection 3.2.2 of the HTK Book.

Physical meaning,....

Create a 3-state model for "sp", so the "sp" model has just one emitting state (states 1 and 3 are non-emitting).

How to do that?

Use text editor to...
Use HHEd

The content of "sil" is

=================================================================

~h "sil"

< BEGINHMM >

< NUMSTATES > 5

< STATE > 2

< MEAN > 39

-9.389361e-001 -1.287944e+000 8.473723e-002 -4.411200e+000 5.332393e-001 1.843251e-001 2.939802e+000 -2.362492e+000 3.039350e-001 5.898609e-003 -3.105349e+000 -1.462931e+000 5.539479e+001 -2.752953e-002 -2.782337e-002 5.648132e-003 4.534409e-002 1.876847e-002 2.492056e-002 1.361921e-002 -1.723138e-002 1.886967e-002 3.497830e-002 1.276191e-002 2.784961e-002 -3.208526e-002 3.180009e-004 1.971325e-003 -3.830043e-003 -1.048350e-002 -1.810746e-003 -1.773861e-003 -9.375007e-004 3.254613e-004 8.180511e-004 3.765909e-003 1.624564e-003 -3.620259e-004 3.278390e-003

< VARIANCE > 39

4.180664e+001 3.271134e+001 3.581472e+001 6.693031e+001 3.528064e+001 5.052157e+001 2.934049e+001 3.423428e+001 3.710680e+001 3.691701e+001 3.710829e+001 2.969890e+001 6.507053e+001 1.337987e+000 1.072017e+000 1.307518e+000 1.887581e+000 1.697909e+000 1.890241e+000 1.785829e+000 2.181937e+000 1.875866e+000 1.797650e+000 1.730149e+000 1.642454e+000 9.926388e-001 1.710149e-001 1.462792e-001 1.848170e-001 2.626473e-001 2.605441e-001 2.970572e-001 3.222261e-001 3.782587e-001 3.125882e-001 3.063583e-001 2.895371e-001 2.911398e-001 1.187189e-001

< GCONST > 1.071964e+002

< STATE > 3

< MEAN > 39

-1.991913e+000 -4.775551e-002 2.959489e+000 2.209434e+000 2.078557e+000 5.562240e+000 5.464221e+000 -4.776323e+000 1.673594e+000 2.683963e+000 -4.633354e+000 -9.166243e-001 4.628856e+001 -1.207492e-001 -8.760695e-002 -7.070365e-002 7.516075e-002 -4.011013e-003 3.128541e-002 8.115381e-002 -3.286631e-002 1.295639e-001 1.558424e-001 5.380721e-002 1.054287e-001 -1.449030e-001 1.667164e-002 2.022874e-002 1.105829e-003 -2.183086e-002 -7.496935e-003 -4.172942e-002 -3.657551e-002 1.193289e-002 -1.476659e-002 -2.710904e-002 1.349834e-002 9.330045e-004 2.211097e-002

< VARIANCE > 39

5.752877e+000 5.706749e+000 9.791572e+000 1.276698e+001 1.414043e+001 1.682921e+001 1.643664e+001 1.884838e+001 1.942560e+001 2.041147e+001 1.927709e+001 1.510888e+001 1.051241e+001 2.168639e-001 3.732721e-001 6.485465e-001 8.246439e-001 9.308486e-001 1.138545e+000 1.447520e+000 1.688959e+000 1.681041e+000 1.680561e+000 1.580671e+000 1.330634e+000 9.859556e-002 3.477598e-002 6.478215e-002 1.191088e-001 1.600942e-001 1.801341e-001 2.153407e-001 2.852951e-001 3.301157e-001 3.403606e-001 3.369383e-001 3.197604e-001 2.676942e-001 1.323066e-002

< GCONST > 7.787578e+001

< STATE > 4

< MEAN > 39

-2.982345e+000 -1.252340e+000 1.087486e+000 7.909203e-001 1.536108e+000 3.573169e+000 5.625374e+000 -3.234990e+000 2.314626e+000 3.188504e+000 -9.258319e-001 1.509047e+000 4.699720e+001 -7.613304e-003 5.702919e-003 -6.563795e-003 -4.346590e-003 -7.446251e-003 -8.997340e-003 -3.822424e-003 -2.726374e-003 -3.682886e-003 -1.174716e-003 1.001520e-002 1.304566e-002 -2.283418e-003 -2.802775e-004 1.980037e-003 1.587337e-003 -6.755204e-004 2.919145e-003 1.646213e-003 -1.079046e-004 1.305768e-003 2.884402e-004 -2.650670e-003 -2.699222e-003 -4.054980e-003 3.949025e-003

< VARIANCE > 39

5.313723e+000 4.299637e+000 5.806711e+000 7.572632e+000 1.195562e+001 1.127259e+001 1.345822e+001 1.842092e+001 1.902783e+001 1.841946e+001 1.679353e+001 1.275744e+001 2.541775e+000 1.125962e-001 2.241242e-001 3.554686e-001 4.804470e-001 7.102868e-001 8.679712e-001 1.053879e+000 1.259253e+000 1.247817e+000 1.199414e+000 1.138910e+000 9.791774e-001 7.236452e-002 2.274701e-002 4.417740e-002 7.068438e-002 9.644291e-002 1.455498e-001 1.809241e-001 2.171511e-001 2.593471e-001 2.625059e-001 2.464305e-001 2.333392e-001 2.013770e-001 1.429966e-002

< GCONST > 6.495581e+001

< TRANSP > 5

0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000

0.000000e+000 9.399074e-001 6.009261e-002 0.000000e+000 0.000000e+000

0.000000e+000 0.000000e+000 8.703428e-001 1.296572e-001 0.000000e+000

0.000000e+000 0.000000e+000 0.000000e+000 9.800954e-001 1.990458e-002

0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000

< ENDHMM >

=================================================================

We copy the highlighted state definition from "sil" to form the "sp" model. Remember that "sp" has only 3 states, so the copied state must be renumbered to state 2.

=======================================================================

~h "sp"

< BEGINHMM >

< NUMSTATES > 3

< STATE > 2

< MEAN > 39

< VARIANCE > 39

< GCONST > 1.071964e+002

< TRANSP > 3

0.000000e+000 1.000000e+000 0.000000e+000

0.000000e+000 8.703428e-001 1.296572e-001

0.000000e+000 0.000000e+000 0.000000e+000

< ENDHMM >

=======================================================================

Then use HHEd to modify hmm4/macros and hmm4/hmmdefs according to the edit commands in sil.hed.

--------------------------------------------------------------------------------------------------------------------

$ HHEd -H ./hmms/hmm4/macros -H ./hmms/hmm4/hmmdefs -M ./hmms/hmm5 sil.hed ./lists/monophones1

--------------------------------------------------------------------------------------------------------------------

The following statements are added to the new hmmdefs:

======================================================================================

~s "silst"

< MEAN > 39

-9.389361e-01 -1.287944e+00 8.473723e-02 -4.411200e+00 5.332393e-01 1.843251e-01 2.939802e+00 -2.362492e+00 3.039350e-01 5.898609e-03 -3.105349e+00 -1.462931e+00 5.539479e+01 -2.752953e-02 -2.782337e-02 5.648132e-03 4.534409e-02 1.876847e-02 2.492056e-02 1.361921e-02 -1.723138e-02 1.886967e-02 3.497830e-02 1.276191e-02 2.784961e-02 -3.208526e-02 3.180009e-04 1.971325e-03 -3.830043e-03 -1.048350e-02 -1.810746e-03 -1.773861e-03 -9.375007e-04 3.254613e-04 8.180511e-04 3.765909e-03 1.624564e-03 -3.620259e-04 3.278390e-03

< VARIANCE > 39

4.180664e+01 3.271134e+01 3.581472e+01 6.693031e+01 3.528064e+01 5.052157e+01 2.934049e+01 3.423428e+01 3.710680e+01 3.691701e+01 3.710829e+01 2.969890e+01 6.507053e+01 1.337987e+00 1.072017e+00 1.307518e+00 1.887581e+00 1.697909e+00 1.890241e+00 1.785829e+00 2.181937e+00 1.875866e+00 1.797650e+00 1.730149e+00 1.642454e+00 9.926388e-01 1.710149e-01 1.462792e-01 1.848170e-01 2.626473e-01 2.605441e-01 2.970572e-01 3.222261e-01 3.782587e-01 3.125882e-01 3.063583e-01 2.895371e-01 2.911398e-01 1.187189e-01

< GCONST > 1.071964e+02

======================================================================================

The original ~h "sil" and ~h "sp" definitions then become:

====================================================

~h "sp"

< BEGINHMM >

< NUMSTATES > 3

< STATE > 2

~s "silst"

< TRANSP > 3

0.000000e+00 7.000000e-01 3.000000e-01

0.000000e+00 8.703428e-01 1.296572e-01

0.000000e+00 0.000000e+00 0.000000e+00

< ENDHMM >

~h "sil"

< BEGINHMM >

< NUMSTATES > 5

< STATE > 2

< MEAN > 39

< VARIANCE > 39

< GCONST > 1.071964e+02

< STATE > 3

~s "silst"

< STATE > 4

< MEAN > 39

-2.982345e+00 -1.252340e+00 1.087486e+00 7.909203e-01 1.536108e+00 3.573169e+00 5.625374e+00 -3.234990e+00 2.314626e+00 3.188504e+00 -9.258319e-01 1.509047e+00 4.699720e+01 -7.613304e-03 5.702919e-03 -6.563795e-03 -4.346590e-03 -7.446251e-03 -8.997340e-03 -3.822424e-03 -2.726374e-03 -3.682886e-03 -1.174716e-03 1.001520e-02 1.304566e-02 -2.283418e-03 -2.802775e-04 1.980037e-03 1.587337e-03 -6.755204e-04 2.919145e-03 1.646213e-03 -1.079046e-04 1.305768e-03 2.884402e-04 -2.650670e-03 -2.699222e-03 -4.054980e-03 3.949025e-03

< VARIANCE > 39

5.313723e+00 4.299637e+00 5.806711e+00 7.572632e+00 1.195562e+01 1.127259e+01 1.345822e+01 1.842092e+01 1.902783e+01 1.841946e+01 1.679353e+01 1.275744e+01 2.541775e+00 1.125962e-01 2.241242e-01 3.554686e-01 4.804470e-01 7.102868e-01 8.679712e-01 1.053879e+00 1.259253e+00 1.247817e+00 1.199414e+00 1.138910e+00 9.791774e-01 7.236452e-02 2.274701e-02 4.417740e-02 7.068438e-02 9.644291e-02 1.455498e-01 1.809241e-01 2.171511e-01 2.593471e-01 2.625059e-01 2.464305e-01 2.333392e-01 2.013770e-01 1.429966e-02

< GCONST > 6.495583e+01

< TRANSP > 5

0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

0.000000e+00 7.519259e-01 4.807409e-02 2.000000e-01 0.000000e+00

0.000000e+00 0.000000e+00 8.703428e-01 1.296572e-01 0.000000e+00

0.000000e+00 2.000000e-01 0.000000e+00 7.840764e-01 1.592367e-02

0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

< ENDHMM >

====================================================

This is the result of the commands in sil.hed:

============================

AT 2 4 0.2 {sil.transp}

AT 4 2 0.2 {sil.transp}

AT 1 3 0.3 {sp.transp}

TI silst {sil.state[3],sp.state[2]}

============================

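AT i j prob itemList(t) is described on page 256 of the HTK Book 3.4. It adds a transition from state i to state j with probability prob to every transition matrix in itemList.

The other transition probabilities in row i are rescaled so that the row still sums to 1.0.

For example, in ~h "sp",

we apply AT 1 3 0.3 {sp.transp}, so the sp.transp matrix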

==================================

< TRANSP > 3

0.000000e+000 1.000000e+000 0.000000e+000

0.000000e+000 8.703428e-001 1.296572e-001

0.000000e+000 0.000000e+000 0.000000e+000

==================================

is rescaled to

==================================

< TRANSP > 3

0.000000e+00 7.000000e-01 3.000000e-01

0.000000e+00 8.703428e-01 1.296572e-01

0.000000e+00 0.000000e+00 0.000000e+00

==================================
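The rescaling can be checked numerically. The Python sketch below is only my reading of the AT behaviour described above, not HHEd's source: the helper name `add_transition` is hypothetical, and it assumes that the other entries of row i are scaled by one common factor so that the row sums to 1.0 again. Under that assumption it reproduces the matrices shown in this post.

```python
def add_transition(transp, i, j, prob):
    """Return a copy of `transp` with a_ij set to `prob` (states are 1-based).

    Assumption: the other entries of row i are scaled by one common factor
    so that the row still sums to 1.0.
    """
    rows = [list(r) for r in transp]
    row = rows[i - 1]
    rest = sum(v for k, v in enumerate(row) if k != j - 1)
    scale = (1.0 - prob) / rest if rest > 0.0 else 0.0
    rows[i - 1] = [prob if k == j - 1 else v * scale for k, v in enumerate(row)]
    return rows


# Transition matrix of "sp" before editing (copied from this post).
sp = [
    [0.0, 1.0, 0.0],
    [0.0, 8.703428e-01, 1.296572e-01],
    [0.0, 0.0, 0.0],
]
sp_new = add_transition(sp, 1, 3, 0.3)

# Transition matrix of "sil" before editing, then AT 2 4 0.2 and AT 4 2 0.2.
sil = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 9.399074e-01, 6.009261e-02, 0.0, 0.0],
    [0.0, 0.0, 8.703428e-01, 1.296572e-01, 0.0],
    [0.0, 0.0, 0.0, 9.800954e-01, 1.990458e-02],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
sil_new = add_transition(add_transition(sil, 2, 4, 0.2), 4, 2, 0.2)

for r in sp_new + sil_new:
    print(" ".join(f"{v:.6e}" for v in r))
```

Row 1 of "sp" and rows 2 and 4 of "sil" come out within rounding of the values in the new hmmdefs.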

TI ties all the items in itemList together and stores the shared item as the macro macroname.

-------------------------------------

TI macroname itemList

-------------------------------------

To learn more about the TI command, see Section 10.3 of the HTK Book. (PS: Sections 10.3 and 10.4 should be swapped.)

To learn more about HHEd, see Chapter 10 of the HTK Book.

To be continued...

Wednesday, February 18, 2009

Memo to use CLAMP to generate simulation data

Feel free to use the saved model file named "generate_100_200_0.25.mdl".

Another one, identical to the tutorial's, is named "generate_6_200_0.25.mdl".

Download from HERE.

"generate_100_200_0.25.mdl" means

Number of curves: 100
Total time: 200 seconds
Sample interval: 0.25
Concentrations: 1, 0.99, 0.98, ..., 0.01
Injection time: 100 seconds
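
The file name encodes the first three settings. As a small sketch (the helper `parse_model_name` is my own name, not part of CLAMP, and the 0.01 concentration step is taken from the list above for the 100-curve file only), the name can be unpacked like this:

```python
def parse_model_name(name):
    """Split 'generate_<curves>_<seconds>_<interval>.mdl' into its three numbers."""
    stem = name.rsplit(".", 1)[0]              # drop the ".mdl" extension
    _, curves, total, interval = stem.split("_")
    return int(curves), float(total), float(interval)


curves, total_time, interval = parse_model_name("generate_100_200_0.25.mdl")

# Concentrations 1, 0.99, ..., 0.01 as listed above (assumed step of 0.01).
concentrations = [round(1.0 - 0.01 * k, 2) for k in range(curves)]

print(curves, total_time, interval)
print(concentrations[0], concentrations[1], concentrations[-1])
```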

To generate a simulation file with noise quickly:

Select "add noise" and choose the value you want
Save the file with "Save sim"
Click the model page
Press Simulation again
You will recover the data without noise

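English names of punctuation marks

I am sure there are people like me whose English is not very good.

When reading documents you run into all kinds of symbols that you may not be familiar with, so the pages below, which collect quite a few of them, are handy.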

http://www.eol.cn/article/20060227/3175182.shtml

http://www.hoyo.idv.tw/78/english-comma.htm

http://pub.thit.edu.tw/ThitCC/邱世芬老師/doc/3-1-6/3-1-6-2.htm

http://www.grammarbook.com/

HTK Chapter 3 - Section 2 - Step 6

The paragraphs below are based on

the HTK Book,
Su Tonghua (苏统华), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan (R.O.C.).

Environment:

HTK 3.4
Cygwin NT-5.1 1.5.25

Section 2 is Creating Monophone HMMs

Step 6 - Creating Flat Start Monophones
Step 7 - Fixing the Silence Models
Step 8 - Realigning the Training Data

Now, we have the feature vectors of training data and testing data.
.\data\train\feature\S0001.mfcc
.\data\train\feature\S0002.mfcc
.\data\train\feature\S0003.mfcc
.\data\train\feature\S0004.mfcc
.\data\train\feature\S0005.mfcc
.\data\train\feature\S0006.mfcc
.\data\train\feature\S0007.mfcc
.\data\train\feature\S0008.mfcc
.\data\train\feature\S0009.mfcc
... and so on. These are the training data.

The testing data are as follows:
.\data\test\feature\T0001.mfc
.\data\test\feature\T0002.mfc
.\data\test\feature\T0003.mfc
.\data\test\feature\T0004.mfc
.\data\test\feature\T0005.mfc
.\data\test\feature\T0006.mfc
.\data\test\feature\T0007.mfc
.\data\test\feature\T0008.mfc
.\data\test\feature\T0009.mfc

Next, we have to give HTK a prototype HMM that defines the model topology and supplies initial values for the model parameters.
=============================

~o < VecSize > 39 < MFCC_0_D_A >

~h "proto"

< BeginHMM >

< NumStates > 5

< State > 2

< Mean > 39

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

< Variance > 39

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

< State > 3

< Mean > 39

0.0 (x39), the same as state 2

< Variance > 39

1.0 (x39)

< State > 4

< Mean > 39

0.0 (x39)

< Variance > 39

1.0 (x39)

< TransP > 5

0.0 1.0 0.0 0.0 0.0

0.0 0.6 0.4 0.0 0.0

0.0 0.0 0.6 0.4 0.0

0.0 0.0 0.0 0.7 0.3

0.0 0.0 0.0 0.0 0.0

< EndHMM >

=============================
~o < VecSize > 39 < MFCC_0_D_A >
~o is a macro definition for global options. It tells every HTK tool that the feature vectors have size 39 and parameter kind MFCC_0_D_A.

~h "proto"

means that this definition is the HMM named "proto"; other definitions look the same, for example ~h "hmm0" or ~h "hmm1". The name must be enclosed in double quotation marks, " ".

HMM definition keywords are written between angle brackets < >, much like tags in markup.

< BeginHMM > and < EndHMM > form a pair.

Only macro statements can appear before < BeginHMM >.

If the feature vector settings are local to one HMM, < VecSize > appears after < BeginHMM >; see Section 7.2 of the HTK Book.

An HMM is a state machine, so we have to tell HTK how many states we want.

The < NumStates > tag gives the number of states. Note that HTK always treats the first and the last state as non-emitting states.

So 5 means that State 1 and State 5 are non-emitting, while States 2, 3, and 4 are emitting, as in Fig. 7.1, Section 7.1 of the HTK Book. Note that only the state diagram is the same; the transition matrix differs.

Then we have to give initial values for each emitting state; which values are needed depends on the kind of output probability distribution we choose.

In this example,

states 2, 3, and 4 all start with the same initial mean and variance.

Because our feature vectors have 39 components, we assign 39 zeros to < Mean > and 39 ones to < Variance >.

The last item is < TransP >, the transition probability matrix; its size depends on the number of states.

We use 5 states, so the transition matrix is 5x5.

The entry in row i, column j is the probability of moving from state i to state j.

0.0 1.0 0.0 0.0 0.0

0.0 0.6 0.4 0.0 0.0

0.0 0.0 0.6 0.4 0.0

0.0 0.0 0.0 0.7 0.3

0.0 0.0 0.0 0.0 0.0

means

transition probability from state 1 to state 2 is 1.0
transition probability from state 2 to state 2 is 0.6
transition probability from state 2 to state 3 is 0.4
transition probability from state 3 to state 3 is 0.6
transition probability from state 3 to state 4 is 0.4
transition probability from state 4 to state 4 is 0.7
transition probability from state 4 to state 5 is 0.3

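The probabilities in each row sum to 1.0: row 1 is 0.0+1.0+0.0+0.0+0.0 = 1.0,

row 2 is 0.0+0.6+0.4+0.0+0.0 = 1.0, and so on. The only exception is the last row, which is all zeros because no transition leaves the final non-emitting state.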
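
The same reading of the matrix can be done mechanically. This is just a sketch in Python; `describe` and `row_problems` are hypothetical helper names of mine, and the rule that the last row may be all zeros is taken from the prototype above.

```python
TRANSP = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]


def describe(transp):
    """List every non-zero transition, using 1-based state numbers."""
    return [f"transition probability from state {i} to state {j} is {p}"
            for i, row in enumerate(transp, start=1)
            for j, p in enumerate(row, start=1) if p > 0.0]


def row_problems(transp, tol=1e-6):
    """Return the 1-based numbers of rows that are malformed.

    Every row must sum to 1.0 except the last one (the exit state),
    which must sum to 0.0.
    """
    n = len(transp)
    problems = []
    for i, row in enumerate(transp, start=1):
        expected = 0.0 if i == n else 1.0
        if len(row) != n or abs(sum(row) - expected) > tol:
            problems.append(i)
    return problems


for line in describe(TRANSP):
    print(line)
print("bad rows:", row_problems(TRANSP))
```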

Section 7.2 of the HTK Book describes the other output probability distributions and how to use them.

You can use a Gaussian mixture model (GMM) as the output probability distribution; just use the tags < NumMixes > and < Mixture >.

For example, like Fig. 7.3,

< State > 2 < NumMixes > 2

< Mixture > 1 0.4

< Mean > 4

0.3 0.2 0.2 1.0

< Variance > 4

1.0 1.0 1.0 1.0

< Mixture > 2 0.6

< Mean > 4

0.1 0.0 0.0 0.8

< Variance > 4

1.0 1.0 1.0 1.0

The definition above uses 2 mixture components as the output probability model, and each component needs its own initial values. Note that the mixture weights must sum to 1.0: 0.6 + 0.4 = 1.0.
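
To see what such a state computes, here is a minimal Python sketch of a two-component mixture with diagonal covariance, using the numbers of the example above. It is the textbook formula, not HTK code; `diag_gauss` and `output_prob` are names I made up.

```python
import math


def diag_gauss(o, mean, var):
    """Density of a Gaussian with diagonal covariance at observation o."""
    log_p = 0.0
    for x, m, v in zip(o, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)


def output_prob(o, mixtures):
    """Weighted sum of the component densities."""
    return sum(w * diag_gauss(o, m, v) for w, m, v in mixtures)


# (weight, mean, variance) of each mixture component, as in the example above.
mixtures = [
    (0.4, [0.3, 0.2, 0.2, 1.0], [1.0, 1.0, 1.0, 1.0]),
    (0.6, [0.1, 0.0, 0.0, 0.8], [1.0, 1.0, 1.0, 1.0]),
]

weights_sum = sum(w for w, _, _ in mixtures)
b = output_prob([0.3, 0.2, 0.2, 1.0], mixtures)
print(weights_sum, b)
```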

In Fig. 7.4, you can see that each state can have a different number of mixture components.

State 2 in Fig. 7.4 has 2 mixtures.

State 3 in Fig. 7.4 has only 1 mixture.

Another important point in Fig. 7.4, it's to replace simple .

To learn more about HMM definitions, see Chapter 7 of the HTK Book.
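
Typing 39 zeros and ones by hand is error-prone, so the prototype can also be generated. The following Python sketch writes the same kind of prototype as the one shown earlier in this post; `make_proto` and its parameters are my own invention, and the tags are written without the extra spaces that this page shows inside the brackets.

```python
def make_proto(vec_size=39, kind="MFCC_0_D_A", self_loops=(0.6, 0.6, 0.7)):
    """Build the text of a left-to-right prototype HMM.

    Every emitting state gets zero means and unit variances; `self_loops`
    holds the self-transition probability of each emitting state.
    """
    n = len(self_loops) + 2                  # emitting states + entry + exit
    lines = [f"~o <VecSize> {vec_size} <{kind}>", '~h "proto"',
             "<BeginHMM>", f"<NumStates> {n}"]
    for state in range(2, n):
        lines += [f"<State> {state}",
                  f"<Mean> {vec_size}", " ".join(["0.0"] * vec_size),
                  f"<Variance> {vec_size}", " ".join(["1.0"] * vec_size)]
    lines.append(f"<TransP> {n}")
    for i in range(n):
        row = [0.0] * n
        if i == 0:
            row[1] = 1.0                     # entry state always moves to state 2
        elif i < n - 1:
            row[i] = self_loops[i - 1]       # stay in the same state
            row[i + 1] = round(1.0 - self_loops[i - 1], 10)   # or move right
        lines.append(" ".join(str(v) for v in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


proto = make_proto()
print(proto)
```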

After defining the HMM, we scan the whole training set to obtain the global mean and variance.

--------------------------------------------------------------------------------------

$ HCompV -C ./config/config1 -f 0.01 -m -S train.scp -M ./hmms/hmm0 proto

--------------------------------------------------------------------------------------

The inputs are config1, train.scp, and the prototype file proto. The outputs, written to ./hmms/hmm0, are the updated proto and vFloors (generated by -f).

./config/config1

==================

# Coding parameters

TARGETKIND = MFCC_0_D_A

TARGETRATE = 100000.0

SAVECOMPRESSED = T

SAVEWITHCRC = T

WINDOWSIZE = 250000.0

USEHAMMING = T

PREEMCOEF = 0.97

NUMCHANS = 26

CEPLIFTER = 22

NUMCEPS = 12

ENORMALISE = F

==================

New proto

==================================================================

~o

< STREAMINFO > 1 39

< VECSIZE > 39< NULLD >< MFCC_D_A_0 >< DIAGC >

~h "proto"

< BEGINHMM >

< NUMSTATES > 5

< STATE > 2

< MEAN > 39

-3.864637e-01 -1.276892e+00 6.429603e-01 -4.361009e+00 6.207581e-01 -6.569096e-01 2.480589e+00 -2.788665e+00 -1.313366e-01 6.740692e-01 -3.017518e+00 -1.560625e+00 5.566235e+01 1.460256e-03 4.730462e-04 -4.827005e-04 -7.249162e-04 -6.306474e-04 6.637267e-04 1.647110e-03 -2.088301e-03 9.362018e-05 -1.825078e-03 -1.855212e-03 -1.778467e-03 3.644651e-03 -1.163874e-04 1.333342e-04 -2.520498e-05 -1.577687e-05 1.496438e-04 -1.295793e-04 -2.109938e-04 5.133062e-04 3.661055e-04 6.873756e-05 1.892049e-04 1.713871e-04 -1.179971e-05

< VARIANCE > 39

4.492153e+01 2.800227e+01 4.004902e+01 7.262168e+01 3.713427e+01 5.923348e+01 3.089855e+01 3.635918e+01 4.011551e+01 3.448929e+01 3.661570e+01 3.404308e+01 7.104830e+01 1.414941e+00 1.002086e+00 1.289929e+00 1.967196e+00 1.588490e+00 1.981885e+00 1.694523e+00 2.165956e+00 1.937736e+00 1.799082e+00 1.821838e+00 1.620020e+00 1.004474e+00 1.865744e-01 1.427446e-01 1.801455e-01 2.748002e-01 2.518953e-01 3.164474e-01 3.122217e-01 3.736564e-01 3.291466e-01 3.174342e-01 3.133416e-01 2.797257e-01 1.262979e-01

< GCONST > 1.081255e+02

< STATE > 3

< MEAN > 39

< VARIANCE > 39

< GCONST > 1.081255e+02

< STATE > 4

< MEAN > 39

< VARIANCE > 39

< GCONST > 1.081255e+02

< TRANSP > 5

0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

0.000000e+00 6.000000e-01 4.000000e-01 0.000000e+00 0.000000e+00

0.000000e+00 0.000000e+00 6.000000e-01 4.000000e-01 0.000000e+00

0.000000e+00 0.000000e+00 0.000000e+00 7.000000e-01 3.000000e-01

0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00

< ENDHMM >

==================================================================

vFloors

==============================================================================

~v varFloor1

4.492153e-01 2.800227e-01 4.004902e-01 7.262168e-01 3.713427e-01 5.923348e-01 3.089855e-01 3.635918e-01 4.011551e-01 3.448929e-01 3.661570e-01 3.404307e-01 7.104830e-01 1.414941e-02 1.002086e-02 1.289929e-02 1.967196e-02 1.588490e-02 1.981885e-02 1.694523e-02 2.165956e-02 1.937735e-02 1.799082e-02 1.821838e-02 1.620020e-02 1.004474e-02 1.865744e-03 1.427446e-03 1.801455e-03 2.748002e-03 2.518953e-03 3.164474e-03 3.122217e-03 3.736564e-03 3.291466e-03 3.174342e-03 3.133416e-03 2.797257e-03 1.262979e-03

==============================================================================
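
Comparing the two listings, each entry of vFloors is 0.01 times the corresponding global variance in the new proto, which is what -f 0.01 asks for. A small Python sketch of that relation follows; `global_stats` and `variance_floor` are hypothetical helpers, and dividing by N (not N-1) in the variance is my assumption, not something I checked in HCompV.

```python
def global_stats(frames):
    """Per-dimension mean and variance over all frames (variance divides by N)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def variance_floor(var, f=0.01):
    """Variance floor as a fraction f of the global variance."""
    return [f * v for v in var]


# Toy data, only to show the helper; real frames would come from the .mfcc files.
toy_mean, toy_var = global_stats([[1.0, 2.0], [3.0, 6.0]])

# First three global variances from the new proto above.
floors = variance_floor([4.492153e+01, 2.800227e+01, 4.004902e+01])
print(toy_mean, toy_var)
print(" ".join(f"{v:.6e}" for v in floors))
```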

--------------------------------------------------------------------------------------------------------------------

$ HERest -C ./config/config1 -I ./labels/phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H ./hmms/hmm0/macros -H ./hmms/hmm0/hmmdefs -M ./hmms/hmm1 ./lists/monophones0

--------------------------------------------------------------------------------------------------------------------

The inputs of HERest are

./config/config1
./labels/phones0.mlf
train.scp
./hmms/hmm0/macros
./hmms/hmm0/hmmdefs
./lists/monophones0

Outputs are in ./hmms/hmm1 given by -M

macros
hmmdefs

./lists/monophones0 is monophones1 with "sp" removed. If you use monophones1 at this point, HERest cannot find a corresponding ~h "sp" in the hmmdefs file and reports an error.

=========================

sil

=========================

./hmms/hmm0/macros differs from vFloors only by the options line at the top (shown in boldface in the original post).

===========================================================================

~o < VECSIZE > 39 < MFCC_0_D_A >

~v varFloor1

4.492153e-001 2.800227e-001 4.004902e-001 7.262168e-001 3.713427e-001 5.923348e-001 3.089855e-001 3.635918e-001 4.011551e-001 3.448929e-001 3.661570e-001 3.404307e-001 7.104830e-001 1.414941e-002 1.002086e-002 1.289929e-002 1.967196e-002 1.588490e-002 1.981885e-002 1.694523e-002 2.165956e-002 1.937735e-002 1.799082e-002 1.821838e-002 1.620020e-002 1.004474e-002 1.865744e-003 1.427446e-003 1.801455e-003 2.748002e-003 2.518953e-003 3.164474e-003 3.122217e-003 3.736564e-003 3.291466e-003 3.174342e-003 3.133416e-003 2.797257e-003 1.262979e-003

===========================================================================
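
The HTK Book tutorial builds hmm0/macros and hmm0/hmmdefs by hand from vFloors and the new proto: macros is the options line plus vFloors, and hmmdefs is one copy of the prototype per monophone. Below is a Python sketch of that copy-and-rename step. The helper names are mine, the prototype and vFloors texts are shortened placeholders, and the phone "aa" is a made-up example entry.

```python
def build_macros(options_line, vfloors_text):
    """macros = global options line followed by the content of vFloors."""
    return options_line.rstrip("\n") + "\n" + vfloors_text


def build_hmmdefs(proto_text, phones):
    """Copy the body of the prototype once per phone and rename it."""
    lines = proto_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("~h"))
    body = lines[start + 1:]                 # <BEGINHMM> ... <ENDHMM>
    out = []
    for phone in phones:
        out.append(f'~h "{phone}"')
        out.extend(body)
    return "\n".join(out) + "\n"


# Shortened stand-ins for the real files (states and matrices left out).
PROTO = """~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><MFCC_D_A_0><DIAGC>
~h "proto"
<BEGINHMM>
<NUMSTATES> 5
<ENDHMM>
"""
VFLOORS = "~v varFloor1\n4.492153e-01 2.800227e-01\n"

macros = build_macros("~o <VECSIZE> 39 <MFCC_0_D_A>", VFLOORS)
hmmdefs = build_hmmdefs(PROTO, ["sil", "aa"])
print(macros)
print(hmmdefs)
```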

If you see the following messages,

=======================================================

Pruning-On[250.0 150.0 1000.0]

ERROR [+6510] LOpen: Unable to open label file .\data\train\feature\S0001.lab

FATAL ERROR - Terminating program HERest

=======================================================

This happens because HERest cannot find the labels for S0001: there is no S0001.lab file. The content of S0001.lab is the same as the part of phones0.mlf that is labeled "*/S0001.lab".

S0001.lab looks like this:

==============================================

sil

==============================================

You can download the *.lab files from HERE to avoid this error.
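
If you prefer to create the *.lab files yourself, they can be cut out of the MLF. Here is a rough Python sketch; `split_mlf` is a hypothetical helper, and the sample MLF text (including its phone labels) is made up for illustration, it is not my real phones0.mlf.

```python
def split_mlf(mlf_text):
    """Return {label file name: list of label lines} for each entry of an MLF."""
    labels = {}
    name = None
    for raw in mlf_text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue                         # skip blanks and the MLF header
        if line.startswith('"'):
            name = line.strip('"').split("/")[-1]   # "*/S0001.lab" -> S0001.lab
            labels[name] = []
        elif line == ".":
            name = None                      # a single dot ends the entry
        elif name is not None:
            labels[name].append(line)
    return labels


SAMPLE_MLF = """#!MLF!#
"*/S0001.lab"
sil
d
ay
sil
.
"*/S0002.lab"
sil
.
"""

for lab_name, lab_lines in split_mlf(SAMPLE_MLF).items():
    print(lab_name, lab_lines)
```

Writing each list to a file of the same name next to the feature files gives HERest the per-utterance label files it is looking for.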

Then we re-estimate two more times. The commands are almost the same, but each pass starts from the results of the previous estimation.

we generate hmm1/macros and hmm1/hmmdefs from hmm0/macros and hmm0/hmmdefs
we generate hmm2/macros and hmm2/hmmdefs from hmm1/macros and hmm1/hmmdefs
we generate hmm3/macros and hmm3/hmmdefs from hmm2/macros and hmm2/hmmdefs

--------------------------------------------------------------------------------------------------------------------

$ HERest -C ./config/config1 -I ./labels/phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H ./hmms/hmm1/macros -H ./hmms/hmm1/hmmdefs -M ./hmms/hmm2 ./lists/monophones0

--------------------------------------------------------------------------------------------------------------------

$ HERest -C ./config/config1 -I ./labels/phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H ./hmms/hmm2/macros -H ./hmms/hmm2/hmmdefs -M ./hmms/hmm3 ./lists/monophones0

--------------------------------------------------------------------------------------------------------------------
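
Since the three commands differ only in the directory numbers, they can be generated in a loop. The Python sketch below only builds and prints the command lines (it does not run HERest); `herest_command` is a name I made up, and all flags and paths are copied from the commands in this post.

```python
def herest_command(i):
    """Argument list for the pass that reads ./hmms/hmm<i> and writes ./hmms/hmm<i+1>."""
    src, dst = f"./hmms/hmm{i}", f"./hmms/hmm{i + 1}"
    return ["HERest", "-C", "./config/config1", "-I", "./labels/phones0.mlf",
            "-t", "250.0", "150.0", "1000.0", "-S", "train.scp",
            "-H", f"{src}/macros", "-H", f"{src}/hmmdefs", "-M", dst,
            "./lists/monophones0"]


commands = [herest_command(i) for i in range(3)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run, after creating the output directory, if you want to run the passes from a script.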

That finishes Step 6. Go on to the next step, Step 7.

Wednesday, February 11, 2009

Memo to use lmfit-2.4

Thanks to the author of lmfit. http://sourceforge.net/projects/lmfit/

lmfit contains the following code:

lm_test.c - main( ), which supplies simple input data pairs and a fitting function and solves the problem
lmmin.c - lm_initialize_control, lm_minimize, lm_lmdif, lm_lmpar, lm_qrfac, lm_qrsolv, lm_enorm
lmmin.h - structure definition for lm_control_type
lm_eval.c - lm_evaluate_default, lm_print_default
lm_eval.h - structure definition for the input data, &data in lm_minimize( )

Run lm_test.exe to see a simple demo.

How do we build the executable?

On Linux, just follow the instructions in lm_test.c:

--------------------------------------------------

$ gcc -o lmtest -lm lmmin.c lm_eval.c lm_test.c

--------------------------------------------------

On Windows, include lmmin.c and lm_eval.c in your lm_test.c like this:

#include "lmmin.c"

#include "lm_eval.c"

The result will be

==============================================

C:\>lm_test.exe

modify or replace lm_print_default for less verbous fitting

starting minimization

par: 1 1 1 => norm: 0.533515

determining gradient (iteration 1)

par: 1 1 1 => norm: 0.533515

determining gradient (iteration 1)

par: 1 1 1 => norm: 0.533515

determining gradient (iteration 1)

par: 1 1 1 => norm: 0.533515

trying step in gradient direction

par: 1 1 1 => norm: 1.29168

trying step in gradient direction

par: 1 1 1 => norm: 0.247969

.....

determining gradient (iteration 10)

par: 6.25914 16.1219 6.75155 => norm: 0.0152438

trying step in gradient direction

par: 6.25914 16.1219 6.75155 => norm: 0.0152438

terminated after 42 evaluations

par: 6.25914 16.1219 6.75155 => norm: 0.0152438

fitting data as follows:

t[ 0]= 0.07 y= 0.24 fit= 0.24262 residue= -0.00261956

t[ 1]= 0.13 y= 0.35 fit= 0.346227 residue= 0.00377306

t[ 2]= 0.19 y= 0.43 fit= 0.423766 residue= 0.00623411

t[ 3]= 0.26 y= 0.49 fit= 0.498947 residue= -0.00894747

t[ 4]= 0.32 y= 0.55 fit= 0.555683 residue= -0.00568282

t[ 5]= 0.38 y= 0.61 fit= 0.607558 residue= 0.00244152

t[ 6]= 0.44 y= 0.66 fit= 0.65571 residue= 0.00429038

t[ 7]= 0.51 y= 0.71 fit= 0.708095 residue= 0.00190477

t[ 8]= 0.57 y= 0.75 fit= 0.750267 residue=-0.000266809

t[ 9]= 0.63 y= 0.79 fit= 0.790257 residue=-0.000256849

t[10]= 0.69 y= 0.83 fit= 0.828305 residue= 0.00169472

t[11]= 0.76 y= 0.87 fit= 0.870492 residue=-0.000491909

t[12]= 0.82 y= 0.9 fit= 0.904938 residue= -0.00493756

t[13]= 0.88 y= 0.94 fit= 0.937935 residue= 0.00206541

t[14]= 0.94 y= 0.97 fit= 0.969591 residue= 0.000409208

status: success (f) after 42 evaluations

==================================================

The input data are pairs stored in t[ ] and y[ ]; both t and y are known.

p[ ] is the unknown parameter vector of the given function, called my_fit_function( ).

In lm_test.c, the fitting function g is

g(t_i) = [p_0 * t_i + (1 - p_0 + p_1 + p_2) * t_i^2] / (1 + p_1 * t_i + p_2 * t_i^2)

We have to find the parameter vector (p_0, p_1, p_2) such that g(t_i) is as close as possible to y_i for all i.

So the unknowns are the parameters p[0], p[1], and p[2], while the function g and the pairs (t, y) are known.

For example,

we have the following data pairs (t, y)

t y

0.07 0.24

0.13 0.35

0.19 0.43

0.26 0.49

0.32 0.55

0.38 0.61

0.44 0.66

0.51 0.71

0.57 0.75

0.63 0.79

0.69 0.83

0.76 0.87

0.82 0.9

0.88 0.94

0.94 0.97
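
As a quick cross-check of the demo output, the fitting function can be evaluated at the parameters that lm_test reported. This Python sketch only re-evaluates g and the residues; it does not perform the Levenberg-Marquardt fit itself. The y values are the ones printed in the lm_test output above, and the parameters are the rounded values from that output.

```python
import math


def g(t, p):
    """Fitting function of lm_test.c as described in this post."""
    return (p[0] * t + (1 - p[0] + p[1] + p[2]) * t * t) / (1 + p[1] * t + p[2] * t * t)


t = [0.07, 0.13, 0.19, 0.26, 0.32, 0.38, 0.44, 0.51,
     0.57, 0.63, 0.69, 0.76, 0.82, 0.88, 0.94]
y = [0.24, 0.35, 0.43, 0.49, 0.55, 0.61, 0.66, 0.71,
     0.75, 0.79, 0.83, 0.87, 0.90, 0.94, 0.97]
p_fit = [6.25914, 16.1219, 6.75155]          # rounded result printed by lm_test

residues = [yi - g(ti, p_fit) for ti, yi in zip(t, y)]
norm = math.sqrt(sum(r * r for r in residues))   # Euclidean norm of the residues

for ti, yi, ri in zip(t, y, residues):
    print(f"t={ti:.2f} y={yi:.2f} fit={yi - ri:.6f} residue={ri:.6g}")
print("norm:", norm)
```

The norm agrees with the 0.0152438 printed by lm_test up to the rounding of the parameters.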

So we call lm_minimize(m_dat, n_p, p, lm_evaluate_default, lm_print_default, &data, &control) with the appropriate arguments.

m_dat : number of data points, that is, the size of t[ ]. Because t and y are data pairs, they have the same size.
n_p : number of parameters, that is, the size of p[ ].
lm_evaluate_default : the default evaluation routine from lm_eval.c
lm_print_default : shows the progress messages and results
&data : of type lm_data_type, defined in lm_eval.h
&control : a structure that controls the LM algorithm, for example the thresholds for terminating the loop.

The initial values of control are set by lm_initialize_control( ) in lmmin.c:

maxcall = 100
epsilon = 6.661338e-015 (that is, 30*2.220446e-016, from the definition of LM_USERTOL)
stepbound = 100
ftol = 6.661338e-015
xtol = 6.661338e-015
gtol = 6.661338e-015

The tolerance values come from the following definitions:

==========================================================

/* machine-dependent constants from float.h */

#define LM_MACHEP DBL_EPSILON /* resolution of arithmetic */

#define LM_DWARF DBL_MIN /* smallest nonzero number */

#define LM_SQRT_DWARF sqrt(DBL_MIN) /* square should not underflow */

#define LM_SQRT_GIANT sqrt(DBL_MAX) /* square should not overflow */

#define LM_USERTOL 30*LM_MACHEP /* users are recommened to require this */

==========================================================

DBL_EPSILON is 2.220446e-016 on my computer.

DBL_MIN is 2.225074e-308

DBL_MAX is 1.797693e+308
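
The same constants can be read without a C compiler. A small Python sketch, assuming Python floats are IEEE 754 doubles (which is the case on common platforms); the variable names simply mirror the macros above.

```python
import sys

LM_MACHEP = sys.float_info.epsilon    # same value as DBL_EPSILON
LM_DWARF = sys.float_info.min         # same value as DBL_MIN
DBL_MAX = sys.float_info.max
LM_USERTOL = 30 * LM_MACHEP           # default for epsilon, ftol, xtol, gtol

print(f"{LM_MACHEP:e} {LM_DWARF:e} {DBL_MAX:e} {LM_USERTOL:e}")
```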

continue...

Friday, February 6, 2009

Memo to use cclip-1.2

Thanks to the author of cclip. http://sourceforge.net/projects/cclip/

cclip contains the following code:

main.c
funcs.c
clip/clip.c
clip/clip.h

Because clip/clip.c is written in C++, I did not know how to compile it with a C compiler.

So I renamed main.c to main.cpp, and then it works.

I added some functions to make it behave more like a professional program.

For example, a first-time user may run the program without arguments, or type "--help" or "-h" to understand how to use the options.

So I added this if statement:

====================

if (argc == 1)

{

PrintfUsage(variables);

}

====================

Then you can just type the command without arguments. For example,

on Linux,

------------------------------

$ ./main

------------------------------

In DOS,

------------------------------

$ main.exe

------------------------------

The screen shows brief usage information:

===================================

C:\>main.exe

USAGE : main [ options] +src Default

-i inputdata Load the input data XXXX

===================================

How to add options?

Modify opt options[NUM_OPTS] in clip.c
Include the file that contains your new function
Update #define NUM_OPTS 6 to the new number of entries

opt options [NUM_OPTS] = {

{ "f", 1, SHORT, Givingformat },

{ "o", 1, SHORT, SaveResult },

{ "i", 1, SHORT, LoadFile },

{ "d", 0, SHORT, display_input_data },

{ "help", 0, LONG, PrintInfo },

{ "h", 0, SHORT, PrintInfo },

};

Comparing with the usage information,

{ "f", 1, SHORT, Givingformat } means that -f argument1 invokes the subprocedure Givingformat( ).

{ "help", 0, LONG, PrintInfo } means that --help, without arguments, invokes the subprocedure PrintInfo( ).

If you need an option with 2 or more arguments, change the 2nd value to the number of arguments you want.

For example, "--G goal1 goal2 goal3" corresponds to { "G", 3, LONG, GOAL }.

NUM_OPTS must match the number of entries in options[ ].

If your subprocedure is not in funcs.c, you have to include its source file as well.

For example,

#include "../funcs.c"

#include "../goal.c"

The content of goal.c looks like this:

#include <stdio.h>

/* Print the three arguments passed to the --G option. */

int GOAL(char *args[])

{

printf("argument1 is %s, argument2 is %s, argument3 is %s\n", args[0], args[1], args[2]);

return 0;

}

This should work,

and it prints "argument1 is goal1, argument2 is goal2, argument3 is goal3" on the screen.
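
To make the idea of the option table concrete, here is a Python analogue of the { name, number of arguments, SHORT/LONG, handler } entries. This is not cclip's code: `dispatch`, `goal` and `print_info` are names I made up, and treating SHORT as a single dash and LONG as a double dash is an assumption based on the examples above.

```python
SHORT, LONG = "-", "--"


def goal(args):
    return "argument1 is %s, argument2 is %s, argument3 is %s" % tuple(args)


def print_info(args):
    return "USAGE : main [options]"


# (name, number of arguments, prefix, handler), like the entries of options[].
OPTIONS = [
    ("G", 3, LONG, goal),
    ("help", 0, LONG, print_info),
    ("h", 0, SHORT, print_info),
]


def dispatch(argv):
    """Walk through argv and call the handler of every option found."""
    results = []
    i = 0
    while i < len(argv):
        for name, nargs, prefix, handler in OPTIONS:
            if argv[i] == prefix + name:
                args = argv[i + 1:i + 1 + nargs]
                if len(args) != nargs:
                    raise ValueError(f"{argv[i]} needs {nargs} argument(s)")
                results.append(handler(args))
                i += nargs                   # skip the consumed arguments
                break
        else:
            raise ValueError(f"unknown option {argv[i]}")
        i += 1
    return results


print(dispatch(["--G", "goal1", "goal2", "goal3"]))
print(dispatch(["-h"]))
```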

Thursday, February 5, 2009

HTK Chapter 3 - Section 1 - Step 5

The paragraphs below are based on

the HTK Book,
Su Tonghua (苏统华), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan (R.O.C.).

Environment:

HTK 3.4
Cygwin NT-5.1 1.5.25

Section 1 is Data Preparation

HCopy is a general-purpose tool in HTK. Besides copying and editing files, it also provides speech coding (feature extraction). See Section 5.16 of the HTK Book.

Now we have the wave files recorded with HSLab from trainprompts and testprompts.

train/S0001.wav is a recording of the first prompt in trainprompts, "DIAL EIGHT FIVE", and so on.

Now we have to convert S0001.wav into a sequence of feature vectors, with a command like this:

----------------------------------------------------

$ HCopy -T 1 -C config S0001.wav S0001.mfcc

----------------------------------------------------

-T is a standard option of every HTK command; see Section 4.4 of the HTK Book. Setting it to 1 lets you trace the HCopy process.

-C is another standard option; it tells the program to read the configuration file.

The config file is (the // notes are my annotations, not part of the file)

============================================

# Coding parameters

TARGETKIND = MFCC_0_D_A // Parameter kind of the target: MFCC with the _0 qualifier, delta coefficients, and acceleration coefficients; default is ANON.

TARGETRATE = 100000.0 // Sample period of the target in 100 ns units (100000 x 100 ns = 10 ms), Section 5.2, default 0.0.

SAVECOMPRESSED = T // Save the output file in compressed form, refer to Section 5.16, default False.

SAVEWITHCRC = T // Attach a checksum to the output parameter file, refer to Section 5.16, default True.

WINDOWSIZE = 250000.0 // Analysis window size in 100 ns units (25 ms), refer to Section 5.2, default 256000.

USEHAMMING = T // Use a Hamming window, refer to Section 5.2, default True.

PREEMCOEF = 0.97 // Pre-emphasis coefficient, refer to Section 5.2, default 0.97.

NUMCHANS = 26 // Number of filterbank channels, refer to Section 5.6, default 20.

CEPLIFTER = 22 // Cepstral liftering coefficient, refer to Section 5.3, default 22.

NUMCEPS = 12 // Number of cepstral coefficients, refer to Section 5.3, default 12.

ENORMALISE = F // Normalise the log energy, refer to Section 5.8, default True.

============================================

The default values above are easy to find in Section 5.18.
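
HTK gives all times in units of 100 ns, which is easy to misread. A tiny Python sketch of the conversion for the two timing values in the config file; `htk_units_to_ms` is my own helper name.

```python
def htk_units_to_ms(value):
    """Convert an HTK time value given in 100 ns units to milliseconds."""
    return value / 10000.0


frame_shift_ms = htk_units_to_ms(100000.0)    # TARGETRATE
window_ms = htk_units_to_ms(250000.0)         # WINDOWSIZE
frames_per_second = 1000.0 / frame_shift_ms

print(frame_shift_ms, window_ms, frames_per_second)
```

So with this config a 25 ms window is moved in 10 ms steps, giving 100 feature vectors per second of speech.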

HTK supports FFT-based and LPC-based analysis.

So TARGETKIND can take different parameter kinds.

MFCC means Mel Frequency Cepstral Coefficients (13 attributes: 12 cepstral coefficients plus C_0 or the log energy)

You can attach different qualifiers to the parameter kind.

For MFCC, we have

_0, appends the 0'th cepstral parameter C_0
_E, appends the log energy, related to (ENORMALISE, SILFLOOR, ESCALE)
_D, appends the 1st order (delta) coefficients, related to (DELTAWINDOW)
_A, appends the 2nd order (acceleration) coefficients, related to (ACCWINDOW)
_T, appends the 3rd order (third differential) coefficients, related to (THIRDWINDOW)

_D, _A, and _T depend on each other: use _A only with _D, and use _T only with _D and _A. Related to V1COMPAT, SIMPLEDIFFS.

Because _0 and _E play a similar role as the energy term, we usually use only one of them.

MFCC, means Mel Frequency Cepstral Coefficients

MFCC_0, means C_0 to be Energy

MFCC_E, means with Energy

MFCC_E_D, means with Energy and Delta

MFCC_E_D_Z, means with Energy, Delta, and Cepstral Mean Normalization

MFCC_E_D_A, means with Energy, Delta, and Acceleration Coefficients

MFCC_0_D, means C_0 to be Energy and Delta

MFCC_0_D_A, means C_0 to be Energy, Delta, and Acceleration Coefficients
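
The qualifiers also explain why the vector size is 39 for MFCC_0_D_A. A Python sketch of the counting follows; `vector_size` is a hypothetical helper that only handles the MFCC qualifiers _0, _E, _D, _A and _T, and it ignores qualifiers such as _Z that do not change the size.

```python
def vector_size(target_kind, num_ceps=12):
    """Number of components per frame for an MFCC parameter kind."""
    base, *qualifiers = target_kind.split("_")
    if base != "MFCC":
        raise ValueError("this sketch only handles MFCC kinds")
    # Static part: the cepstral coefficients plus C_0 and/or log energy.
    static = num_ceps + ("0" in qualifiers) + ("E" in qualifiers)
    # Each of _D, _A, _T appends one more block of the same size.
    blocks = 1 + sum(q in qualifiers for q in ("D", "A", "T"))
    return static * blocks


print(vector_size("MFCC_0_D_A"))   # (12 + 1) * 3
print(vector_size("MFCC_E_D"))     # (12 + 1) * 2
```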

For LPC, we have

LPC, means Linear Prediction Coefficient
LPREFC, means Linear Prediction REFlection Coefficients
LPCEPSTRA, means Linear Prediction derived CEPSTRAl coefficients
LPDELCEP, means Linear Prediction cepstra plus DELta coefficients
IREFC, means LPREFC stored as 16-bit integers

To know more about this, refer to Section 5.10.1 in HTKBook.

If you have many wave files to convert into mfcc files,

you can use the -S option with a script file to convert them all in one run.

The content of the script file is as follows (the file extension does not matter):

============================================

.\data\train\speech\S0001.wav .\data\train\feature\S0001.mfcc

.\data\train\speech\S0002.wav .\data\train\feature\S0002.mfcc

.\data\train\speech\S0003.wav .\data\train\feature\S0003.mfcc

.\data\train\speech\S0004.wav .\data\train\feature\S0004.mfcc

.\data\train\speech\S0005.wav .\data\train\feature\S0005.mfcc

.\data\train\speech\S0006.wav .\data\train\feature\S0006.mfcc

.\data\train\speech\S0007.wav .\data\train\feature\S0007.mfcc

.\data\train\speech\S0008.wav .\data\train\feature\S0008.mfcc

.\data\train\speech\S0009.wav .\data\train\feature\S0009.mfcc

....

============================================

Command is

------------------------------------------------------------

$ HCopy -T 1 -C config -S scriptfile

------------------------------------------------------------
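
The script file itself is tedious to type, so it can be generated. Here is a short Python sketch that produces lines in the format shown above; `script_lines` and its defaults are my own, with the directory names taken from the listing in this post.

```python
def script_lines(count, prefix="S",
                 src_dir=".\\data\\train\\speech",
                 dst_dir=".\\data\\train\\feature"):
    """One 'source target' pair per line, numbered from 0001."""
    sep = "\\"
    lines = []
    for i in range(1, count + 1):
        name = f"{prefix}{i:04d}"
        lines.append(f"{src_dir}{sep}{name}.wav {dst_dir}{sep}{name}.mfcc")
    return lines


lines = script_lines(9)
print("\n".join(lines))
```

Saving the printed lines to a file gives the scriptfile for the -S option.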

To learn more about HCopy, see Section 17.4 of the HTK Book.
