- HTKBook
- 苏统华 (Su Tonghua), Artificial Intelligence Research Lab, Harbin Institute of Technology, October 30, 2006
- Howard Hung-Ju Chou, Intelligence Information Retrieval Lab., NCKU, Taiwan (R.O.C.)
- HTK 3.4
- Cygwin NT-5.1 1.5.25
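- Step 1 - the Task Grammar: the grammar data used by the recognition model (gram -> wdnet)
- Step 2 - the Dictionary: the dictionary data used by the recognition model (the dictionary translates each word in wlist into a string of phones, e.g. one -> w ah n)
- Step 3 - Recording the Data: record the speech files used for recognition (generate the prompts and record *.wav files from them)
- Step 4 - Creating the Transcription Files: build the transcription files (transcribe the experiment data using the dictionary)
- Step 5 - Coding the Data: encode the speech files (*.wav -> *.mfcc)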
HCopy is one of the most commonly used HTK tools. Besides editing files, it can also convert speech files from one encoding to another.
We now have the wave files recorded with HSLab from trainprompts and testprompts.
train/S0001.wav holds the recording of the first prompt in trainprompts, "DIAL EIGHT FIVE", and so on.
Next, S0001.wav has to be converted into a sequence of feature vectors, with a command like this:
----------------------------------------------------
$HCopy -T 1 -C config S0001.wav S0001.mfcc
----------------------------------------------------
-T is a standard option shared by all HTK commands (see Section 4.4 in HTKBook). Setting it to 1 traces the HCopy process.
-C is another standard option; it tells the program to read the given configuration file.
The config file is:
============================================
# Coding parameters
# TARGETKIND: parameter kind of the target. Here: MFCC with qualifiers _0 (C_0), _D (delta coefficients), and _A (acceleration coefficients). Default is ANON.
TARGETKIND = MFCC_0_D_A
# TARGETRATE: sample period of the target in 100 ns units (Section 5.2), so 100000.0 means one frame every 10 ms. Default is 0.0.
TARGETRATE = 100000.0
============================================
The default values above can be found in Section 5.18 of HTKBook.
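To make the TARGETRATE unit concrete, here is a minimal sketch of the arithmetic. The function names are made up for this illustration and are not part of HTK; the only fact used is that the value is expressed in units of 100 ns.

```python
# TARGETRATE is given in units of 100 ns, so:
#   1 ms = 10,000 units, 1 s = 10,000,000 units.
UNITS_PER_MS = 10_000.0
UNITS_PER_SECOND = 10_000_000.0


def frame_period_ms(target_rate: float) -> float:
    """Convert a TARGETRATE value (100 ns units) to milliseconds per frame."""
    return target_rate / UNITS_PER_MS


def frames_per_second(target_rate: float) -> float:
    """Number of feature vectors produced per second of speech."""
    return UNITS_PER_SECOND / target_rate


if __name__ == "__main__":
    rate = 100000.0  # the value used in the config file above
    print(frame_period_ms(rate), "ms per frame")
    print(frames_per_second(rate), "frames per second")
```

With the value from the config file this gives one feature vector every 10 ms, that is, 100 vectors per second of speech.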
HTK supports both FFT-based and LPC-based analysis, so TARGETKIND can take several different values.
MFCC means Mel Frequency Cepstral Coefficients (12 cepstral coefficients by default, 13 attributes once C_0 or the energy is added).
Qualifiers can be appended to the base kind to add further attributes to your data.
For MFCC, we have
- _0, appends the 0th cepstral parameter C_0
- _E, appends the log energy measure, related to (ENORMALISE, SILFLOOR, ESCALE)
- _D, appends the 1st order coefficients (delta coefficients), related to (DELTAWINDOW)
- _A, appends the 2nd order coefficients (acceleration coefficients), related to (ACCWINDOW)
- _T, appends the 3rd order coefficients (third differential coefficients), related to (THIRDWINDOW)
_D, _A, and _T depend on each other: use _A only together with _D, and use _T only together with _D and _A. See also V1COMPAT and SIMPLEDIFFS.
Because _0 and _E play a similar role (both supply an energy-like term), usually only one of them is used.
Some common combinations:
MFCC, means Mel Frequency Cepstral Coefficients only
MFCC_0, means MFCC with C_0 as the energy term
MFCC_E, means MFCC with log energy
MFCC_E_D, means MFCC with log energy and delta coefficients
MFCC_E_D_Z, means MFCC with log energy, delta coefficients, and cepstral mean normalization
MFCC_E_D_A, means MFCC with log energy, delta, and acceleration coefficients
MFCC_0_D, means MFCC with C_0 as the energy term and delta coefficients
MFCC_0_D_A, means MFCC with C_0 as the energy term, delta, and acceleration coefficients
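The qualifier rules can be turned into a small calculation of the feature vector size. The following is only a sketch under stated assumptions: `feature_dimension` is a hypothetical helper, not an HTK function; it assumes the default of 12 cepstral coefficients (NUMCEPS) and only handles the qualifiers _0, _E, _D, _A, _T, and _Z discussed in this post.

```python
def feature_dimension(target_kind: str, num_ceps: int = 12) -> int:
    """Estimate the vector size for an MFCC TARGETKIND string.

    Assumptions (not taken from HTK source code): num_ceps mirrors the
    NUMCEPS default of 12, and only _0, _E, _D, _A, _T, _Z are supported.
    """
    base, *qualifiers = target_kind.split("_")
    if base != "MFCC":
        raise ValueError("this sketch only handles MFCC kinds")
    quals = set(qualifiers)
    unknown = quals - {"0", "E", "D", "A", "T", "Z"}
    if unknown:
        raise ValueError(f"unsupported qualifiers: {sorted(unknown)}")
    # Dependency rules: _A needs _D, and _T needs both _D and _A.
    if "A" in quals and "D" not in quals:
        raise ValueError("_A must be used together with _D")
    if "T" in quals and not {"D", "A"} <= quals:
        raise ValueError("_T must be used together with _D and _A")
    # Static part: cepstra plus optional C_0 and/or log energy.
    static = num_ceps + int("0" in quals) + int("E" in quals)
    # Each of _D, _A, _T appends one more block of the same size.
    blocks = 1 + int("D" in quals) + int("A" in quals) + int("T" in quals)
    # _Z (mean normalization) changes values, not the vector size.
    return static * blocks


if __name__ == "__main__":
    for kind in ("MFCC", "MFCC_0", "MFCC_E_D", "MFCC_0_D_A"):
        print(kind, feature_dimension(kind))
```

For the TARGETKIND used in this post, MFCC_0_D_A, the sketch gives 13 static attributes times 3 blocks, i.e. 39 attributes per frame.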
For LPC, we have
- LPC, means Linear Prediction Coefficients
- LPREFC, means Linear Prediction REFlection Coefficients
- LPCEPSTRA, means Linear Prediction derived CEPSTRAl coefficients
- LPDELCEP, means Linear Prediction CEPstra plus DELta coefficients
- IREFC, means LPREFC stored as 16-bit integers
For more details, refer to Section 5.10.1 in HTKBook.
If you have many wave files to convert into mfcc files, you can use the -S option, which reads the list of files from a script file.
Each line of the script file contains a source file followed by its target file (the file extension of the script file does not matter):
============================================
.\data\train\speech\S0001.wav .\data\train\feature\S0001.mfcc
.\data\train\speech\S0002.wav .\data\train\feature\S0002.mfcc
.\data\train\speech\S0003.wav .\data\train\feature\S0003.mfcc
.\data\train\speech\S0004.wav .\data\train\feature\S0004.mfcc
.\data\train\speech\S0005.wav .\data\train\feature\S0005.mfcc
.\data\train\speech\S0006.wav .\data\train\feature\S0006.mfcc
.\data\train\speech\S0007.wav .\data\train\feature\S0007.mfcc
.\data\train\speech\S0008.wav .\data\train\feature\S0008.mfcc
.\data\train\speech\S0009.wav .\data\train\feature\S0009.mfcc
....
============================================
The command is:
------------------------------------------------------------
$HCopy -T 1 -C config -S scriptfile
------------------------------------------------------------
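Writing the script file by hand gets tedious for a large corpus. Below is a minimal sketch that generates the same kind of lines. The helper name `make_hcopy_script` is made up for this illustration; the directory layout and the S0001-style numbering simply copy the example above and are assumptions about how your data is organized.

```python
def make_hcopy_script(count: int,
                      speech_dir: str = r".\data\train\speech",
                      feature_dir: str = r".\data\train\feature") -> str:
    """Build the text of an HCopy script file: one 'source target' pair per line.

    Assumes utterances are named S0001, S0002, ... as in the example above.
    """
    lines = []
    for index in range(1, count + 1):
        name = f"S{index:04d}"
        source = f"{speech_dir}\\{name}.wav"
        target = f"{feature_dir}\\{name}.mfcc"
        lines.append(f"{source} {target}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the first three lines; redirect the output to a file
    # to obtain the scriptfile that is passed to HCopy with -S.
    print(make_hcopy_script(3), end="")
```

The output can be saved under any file name and passed to HCopy with the -S option.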