Thursday, February 5, 2009

HTK Chapter 3 - Section 1 - Step 5

The paragraphs below draw on
  • HTKBook,
  • Su Tonghua (苏统华), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006,
  • Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan (R.O.C.).
Environment:
  • HTK 3.4 
  • Cygwin NT-5.1 1.5.25
Section 1 is Data Preparation.
HCopy is a general-purpose HTK tool. Besides copying and editing files, it can also code speech files into parameter form. See Section 5.16 in HTKBook.

We now have the wave files recorded with HSLab from trainprompts and testprompts.
train/S0001.wav holds the recording of the first prompt in trainprompts, "DIAL EIGHT FIVE", and so on.

Next, S0001.wav has to be converted into a sequence of feature vectors, with a command like
----------------------------------------------------
$HCopy  -T 1 -C config  S0001.wav S0001.mfcc
----------------------------------------------------
-T is a standard option shared by all HTK commands; see Section 4.4 in HTKBook. Setting it to 1 traces what HCopy is doing.
-C is another standard option; it tells the program which configuration file to read.
The config file is
============================================
# Coding parameters
TARGETKIND = MFCC_0_D_A // Parameter kind of target: MFCC with qualifier _0, delta coeff., and acceleration coeff., default is ANON.
TARGETRATE = 100000.0  // 10 ms frame period; sample period of target in 100ns units, Section 5.2, default 0.0.
SAVECOMPRESSED = T // Save the output file in compressed form, refer to Section 5.16, default False.
SAVEWITHCRC = T  // Attach a checksum to output parameter file, refer to Section 5.16, default True.
WINDOWSIZE = 250000.0 // Analysis window size in 100ns units, refer to Section 5.2, default 256000.
USEHAMMING = T  // Use a Hamming window, refer to Section 5.2, default True.
PREEMCOEF = 0.97 // Set pre-emphasis coefficient, refer to Section 5.2, default 0.97.
NUMCHANS = 26  // Number of filterbank channels, refer to Section 5.6, default 20.
CEPLIFTER = 22    // Cepstral liftering coefficient, refer to Section 5.3, default 22.
NUMCEPS = 12     // Number of cepstral coefficients, refer to Section 5.3, default 12.
ENORMALISE = F  // Normalise the log energy measure when one is used, refer to Section 5.8, default True.
============================================
The default values above are listed in Section 5.18.
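
Since TARGETRATE and WINDOWSIZE are given in 100ns units, it can help to convert them by hand once. The following is only a small sketch of that arithmetic; the helper names and the 16 kHz sampling rate are my own assumptions, not something HTK or the post specifies.

```python
# HTK expresses durations in units of 100 ns, so 10,000 units = 1 ms.
UNITS_PER_MS = 10_000
UNITS_PER_SECOND = 10_000_000


def htk_units_to_ms(units):
    """Convert a duration in 100 ns units to milliseconds."""
    return units / UNITS_PER_MS


def htk_units_to_samples(units, sample_rate_hz):
    """Convert a duration in 100 ns units to a whole number of samples."""
    return round(units * sample_rate_hz / UNITS_PER_SECOND)


TARGETRATE = 100000.0   # value from the config file above
WINDOWSIZE = 250000.0   # value from the config file above
SAMPLE_RATE_HZ = 16000  # assumption: the post does not state the recording rate

frame_shift_ms = htk_units_to_ms(TARGETRATE)
window_ms = htk_units_to_ms(WINDOWSIZE)
frame_shift_samples = htk_units_to_samples(TARGETRATE, SAMPLE_RATE_HZ)
window_samples = htk_units_to_samples(WINDOWSIZE, SAMPLE_RATE_HZ)

print(f"frame shift: {frame_shift_ms} ms ({frame_shift_samples} samples)")
print(f"window size: {window_ms} ms ({window_samples} samples)")
```

With these settings each feature vector covers a 25 ms window and a new vector is produced every 10 ms, so consecutive windows overlap.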

HTK supports both FFT-based and LPC-based analysis,
so TARGETKIND can take different base kinds.

MFCC stands for Mel Frequency Cepstral Coefficients (13 values per frame when C_0 or the energy is added to the 12 cepstra).
Qualifiers can be appended to the base kind to change what is stored.
For MFCC, we have
  • _0, appends the 0th cepstral parameter C_0
  • _E, appends the log energy measure, related to (ENORMALISE, SILFLOOR, ESCALE)
  • _D, appends the 1st order (delta) coefficients, related to (DELTAWINDOW)
  • _A, appends the 2nd order (acceleration) coefficients, related to (ACCWINDOW)
  • _T, appends the 3rd order (third differential) coefficients, related to (THIRDWINDOW)
_D, _A, and _T depend on one another: _A requires _D, and _T requires both _D and _A. Related to V1COMPAT, SIMPLEDIFFS.
Because _0 and _E both supply an energy-like term, usually only one of them is used.

MFCC, Mel Frequency Cepstral Coefficients only
MFCC_0, with C_0 as the energy term
MFCC_E, with log energy
MFCC_E_D, with log energy and delta coefficients
MFCC_E_D_Z, with log energy, delta coefficients, and cepstral mean normalization
MFCC_E_D_A, with log energy, delta, and acceleration coefficients
MFCC_0_D, with C_0 as the energy term and delta coefficients
MFCC_0_D_A, with C_0 as the energy term, delta, and acceleration coefficients
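
To see how the qualifiers change the size of each feature vector, here is a rough sketch that counts the values per frame for an MFCC target kind. The function name feature_dim is hypothetical, and the sketch assumes only the _0, _E, _D, _A, _T and _Z qualifiers discussed here (it ignores others such as _N).

```python
def feature_dim(target_kind, num_ceps=12):
    """Estimate values per frame for an MFCC target kind such as 'MFCC_0_D_A'.

    Assumption: only the qualifiers _0, _E, _D, _A, _T and _Z are handled.
    """
    base, *qualifiers = target_kind.split("_")
    if base != "MFCC":
        raise ValueError("this sketch only handles the MFCC base kind")

    # Static part: the cepstra, plus C_0 and/or log energy if requested.
    static = num_ceps
    if "0" in qualifiers:
        static += 1
    if "E" in qualifiers:
        static += 1

    # Each of _D, _A, _T appends one more block the size of the static part.
    blocks = 1 + sum(1 for q in ("D", "A", "T") if q in qualifiers)

    # _Z (cepstral mean normalization) changes values, not the vector size.
    return static * blocks


for kind in ("MFCC", "MFCC_0", "MFCC_E_D", "MFCC_E_D_Z", "MFCC_0_D_A"):
    print(kind, feature_dim(kind))
```

With NUMCEPS = 12 as in the config file, MFCC_0_D_A gives 13 static values plus 13 deltas plus 13 accelerations per frame.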

For LPC, we have
  • LPC, Linear Prediction Coefficients
  • LPREFC, Linear Prediction REFlection Coefficients
  • LPCEPSTRA, Linear Prediction derived CEPSTRAl coefficients
  • LPDELCEP, Linear Prediction cepstra plus DELta coefficients
  • IREFC, LPREFC stored as 16-bit integers
For more on these, see Section 5.10.1 in HTKBook.
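
The parameter kind also ends up in the header of every HTK parameter file, so one way to check what HCopy wrote is to decode that header. The sketch below follows the 12-byte header layout and kind codes as I understand them from Section 5.10.1; the function name is hypothetical, big-endian byte order is assumed, and the header used in the demo is synthetic rather than read from a real file.

```python
import struct

# Base kind codes and qualifier bits as listed in HTKBook Section 5.10.1.
BASE_KINDS = ["WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
              "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP"]
QUALIFIER_BITS = [("E", 0o000100), ("N", 0o000200), ("D", 0o000400),
                  ("A", 0o001000), ("C", 0o002000), ("Z", 0o004000),
                  ("K", 0o010000), ("0", 0o020000), ("V", 0o040000),
                  ("T", 0o100000)]


def parse_htk_header(header_bytes):
    """Decode the 12-byte header of an HTK parameter file.

    Assumption: the file is stored in big-endian byte order.
    """
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        ">iihH", header_bytes[:12])
    base = BASE_KINDS[parm_kind & 0o77]  # low 6 bits hold the base kind
    qualifiers = {name for name, bit in QUALIFIER_BITS if parm_kind & bit}
    return {
        "n_samples": n_samples,      # number of frames in the file
        "samp_period": samp_period,  # frame period in 100 ns units
        "samp_size": samp_size,      # bytes per frame
        "base": base,
        "qualifiers": qualifiers,
    }


# Synthetic header for an uncompressed MFCC_0_D_A file (39 floats per frame).
demo_kind = 6 | 0o000400 | 0o001000 | 0o020000
demo_header = struct.pack(">iihH", 100, 100000, 39 * 4, demo_kind)
info = parse_htk_header(demo_header)
print(info)
```

For a real file, the first 12 bytes read with open(path, "rb").read(12) could be passed in instead of the synthetic header; a file saved with SAVECOMPRESSED and SAVEWITHCRC would additionally show the C and K qualifiers.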

If you have many wave files to convert into mfcc files,
you can use the -S option with a script file to convert them all in one run.
The script file contains (the file extension does not matter)
============================================
.\data\train\speech\S0001.wav .\data\train\feature\S0001.mfcc
.\data\train\speech\S0002.wav .\data\train\feature\S0002.mfcc
.\data\train\speech\S0003.wav .\data\train\feature\S0003.mfcc
.\data\train\speech\S0004.wav .\data\train\feature\S0004.mfcc
.\data\train\speech\S0005.wav .\data\train\feature\S0005.mfcc
.\data\train\speech\S0006.wav .\data\train\feature\S0006.mfcc
.\data\train\speech\S0007.wav .\data\train\feature\S0007.mfcc
.\data\train\speech\S0008.wav .\data\train\feature\S0008.mfcc
.\data\train\speech\S0009.wav .\data\train\feature\S0009.mfcc
....
============================================
The command is
------------------------------------------------------------
$HCopy  -T 1 -C config  -S scriptfile
------------------------------------------------------------
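
Typing the script file by hand gets tedious with many recordings, so it can be generated. This is just one possible sketch: the helper names are hypothetical, the directory names are copied from the listing above, and the range of nine utterances only mirrors the lines shown there, since the full count is not given.

```python
def make_script_lines(names, wav_dir, mfcc_dir, sep="\\"):
    """Return one 'source target' line per utterance name."""
    return [
        f"{wav_dir}{sep}{name}.wav {mfcc_dir}{sep}{name}.mfcc"
        for name in names
    ]


def write_script_file(path, lines):
    """Write the lines to a script file, one pair per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


# Assumption: utterances are named S0001, S0002, ... as in the listing above.
names = [f"S{i:04d}" for i in range(1, 10)]
lines = make_script_lines(names,
                          r".\data\train\speech",
                          r".\data\train\feature")
print("\n".join(lines))
```

Calling write_script_file("scriptfile", lines) would then produce a file that can be passed to HCopy with -S as in the command above.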
To know more about HCopy, refer to Section 17.4 in HTKBook.

1 comment:

Unknown said...

Hi,

I am a master's student and I am really struggling with HTK HCopy for my project.

My problem is that I want to use this tool to segment the data and then convert these segments to MFCC files.

I know how to convert to MFCC format, but my issue is with the segmentation.

I hope you can help me

Thanks in advance,
rada
