Damon's notes: How to make a corpus for HTS training?

Recording script (must cover all 105 phonemes). Phoneme-counting program?

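Install Cool Edit to record the speaker's audio and to segment it.
(Manual segmentation) Use Cool Edit to set the timing of each character by hand: drag with the mouse over the speech waveform to select the span of one character, which becomes highlighted, then right-click and choose Add to Cue List. The Cue List panel can be opened from the View menu. Cool Edit automatically embeds the duration of each character in the wav file: the Cue List information is appended at the end of the wav, and the cue tag marks where the list information begins. (You can open the file with UltraEdit 17.X, which can display the data in hexadecimal.)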
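
To illustrate where the cue information sits in the file, here is a minimal sketch in C that walks the RIFF chunks of a wav image held in memory and collects the sample offset of each cue point. It relies on the standard cue chunk layout: a 4-byte count followed by 24-byte records whose last field is the sample offset. The function name read_cue_offsets is hypothetical, and the sketch assumes that the length of each selected range is stored in a separate list chunk, which it does not parse.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static unsigned long le32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/*
 * Scan an in-memory RIFF/WAVE image for the "cue " chunk and copy the
 * sample offset of each cue point into offsets[] (at most max entries).
 * Returns the number of offsets stored (0 if there is no cue chunk),
 * or -1 if the buffer is not a RIFF/WAVE image.
 * Hypothetical helper, not part of Cool Edit, sox, or HTS.
 */
int read_cue_offsets(const unsigned char *wav, size_t len,
                     unsigned long *offsets, int max)
{
    size_t pos = 12;            /* skip "RIFF", the RIFF size, and "WAVE" */
    unsigned long size, count, i;
    const unsigned char *body;
    int stored = 0;

    assert(wav != NULL && offsets != NULL);
    if (len < 12 || memcmp(wav, "RIFF", 4) != 0 ||
        memcmp(wav + 8, "WAVE", 4) != 0)
        return -1;

    while (pos + 8 <= len) {
        size = le32(wav + pos + 4);
        body = wav + pos + 8;
        if (size > len - pos - 8)
            break;              /* truncated chunk: stop scanning */
        if (memcmp(wav + pos, "cue ", 4) == 0 && size >= 4) {
            count = le32(body);
            /* Each record is 24 bytes; the sample offset is at byte 20. */
            for (i = 0; i < count && stored < max &&
                        4 + ((size_t)i + 1) * 24 <= size; i++)
                offsets[stored++] = le32(body + 4 + (size_t)i * 24 + 20);
            return stored;
        }
        pos += 8 + (size_t)size + (size & 1);   /* chunks are word-aligned */
    }
    return 0;
}
```

To use it, read the whole wav file into a buffer in binary mode and pass the buffer and its length; dividing each offset by the sample rate (16000 in these notes) gives the cue time in seconds.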

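HTS training data must be raw files, so the first 44 bytes of header information must be stripped from each recorded and trimmed wav file. For details, see https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
The character timing information from cue onward must be deleted as well!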
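
The 44-byte figure holds for a canonical PCM header; a wav file with extra chunks before the samples has a longer one. As a hedged alternative to counting bytes, the sketch below locates the data chunk in a wav image held in memory and reports where the samples start and how long they are, which excludes both the header and anything stored after the samples, such as the cue information. The function name find_pcm_payload is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static unsigned long rd_u32le(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/*
 * Locate the PCM samples inside an in-memory RIFF/WAVE image.
 * On success returns 1 and stores the byte offset and byte length of
 * the "data" chunk payload in *start and *size. Returns 0 if the image
 * is not RIFF/WAVE or contains no complete data chunk.
 * Hypothetical helper, not part of sox or HTS.
 */
int find_pcm_payload(const unsigned char *wav, size_t len,
                     size_t *start, size_t *size)
{
    size_t pos = 12;            /* skip "RIFF", the RIFF size, and "WAVE" */
    unsigned long chunk;

    assert(wav != NULL && start != NULL && size != NULL);
    if (len < 12 || memcmp(wav, "RIFF", 4) != 0 ||
        memcmp(wav + 8, "WAVE", 4) != 0)
        return 0;

    while (pos + 8 <= len) {
        chunk = rd_u32le(wav + pos + 4);
        if (chunk > len - pos - 8)
            return 0;           /* chunk claims more bytes than exist */
        if (memcmp(wav + pos, "data", 4) == 0) {
            *start = pos + 8;
            *size = (size_t)chunk;
            return 1;
        }
        pos += 8 + (size_t)chunk + (chunk & 1);  /* chunks are word-aligned */
    }
    return 0;
}
```

Writing the bytes from start to start + size to a new file opened in binary mode would give the raw samples. For a canonical header, start comes out as 44.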
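The procedure, using sox, is as follows:
1. First create the bat file
With Microsoft Visual Studio 2008 installed, open the Wav2Raw project; clicking Wav2Raw.sln opens and imports it automatically. (If you only have the single file wav2raw.c, create an empty project first and add wav2raw.c to it.) Compiling and running the program generates wav2raw.bat.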
Each line of the bat file is written by an fprintf call such as:
fprintf(fp,"sox -c 1 -s -2 -t wav -r 16000 CMICSD_athena_sen000%d.wav -c 1 -s -2 -t raw -r 16000 CMICSD_athena_sen000%d.raw\n",match[i][0],match[i][0]);
16000 is the sample rate.
CMICSD_athena_sen000%d.wav is the name of your wav file.
CMICSD_athena_sen000%d.raw is the name of your raw file.
num=1; is the number of corpus files to convert, counted from the start of the match table.

---------------- wav2raw.c --------------------------

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    int i;
    int num;
    /* Each entry is {sentence ID, number of digits in the ID}. */
    int match[50][2] = {
        {6,1}, {7,1}, {8,1}, {34,2}, {49,2},
        {51,2}, {63,2}, {85,2}, {112,3}, {121,3},
        {257,3}, {306,3}, {344,3}, {350,3}, {371,3},
        {493,3}, {613,3}, {624,3}, {674,3}, {728,3},
        {745,3}, {850,3}, {940,3}, {944,3}, {974,3},
        {984,3}, {1048,4}, {1049,4}, {1072,4}, {1309,4},
        {1418,4}, {1432,4}, {1540,4}, {1688,4}, {1751,4},
        {1815,4}, {1893,4}, {1968,4}, {2025,4}, {2135,4},
        {2150,4}, {2179,4}, {2336,4}, {2355,4}, {2377,4},
        {2422,4}, {2449,4}, {2474,4}, {2638,4}, {2858,4}
    };

    fp = fopen("wav2raw.bat", "w");
    if (fp == NULL) {
        /* Without this check, fprintf would be called on a NULL stream. */
        perror("wav2raw.bat");
        return 1;
    }

    num = 1;  /* number of entries of match[] to convert; must not exceed 50 */
    for (i = 0; i < num; i++) {
        /* The digit count selects how many zeros pad the ID to four digits. */
        switch (match[i][1]) {
        case 1:
            fprintf(fp,"sox -c 1 -s -2 -t wav -r 16000 CMICSD_athena_sen000%d.wav -c 1 -s -2 -t raw -r 16000 CMICSD_athena_sen000%d.raw\n",match[i][0],match[i][0]);
            break;
        case 2:
            fprintf(fp,"sox -c 1 -s -2 -t wav -r 16000 CMICSD_athena_sen00%d.wav -c 1 -s -2 -t raw -r 16000 CMICSD_athena_sen00%d.raw\n",match[i][0],match[i][0]);
            break;
        case 3:
            fprintf(fp,"sox -c 1 -s -2 -t wav -r 16000 CMICSD_athena_sen0%d.wav -c 1 -s -2 -t raw -r 16000 CMICSD_athena_sen0%d.raw\n",match[i][0],match[i][0]);
            break;
        case 4:
            fprintf(fp,"sox -c 1 -s -2 -t wav -r 16000 CMICSD_athena_sen%d.wav -c 1 -s -2 -t raw -r 16000 CMICSD_athena_sen%d.raw\n",match[i][0],match[i][0]);
            break;
        default:
            break;
        }
    }

    fclose(fp);
    system("pause");  /* Windows only: keeps the console window open */
    return 0;
}
-------------------------------------------------------------------------------
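2. Download and install sox
http://sox.sourceforge.net/
In Environment Variables >> Edit >> Path, append ;C:\sox-14-3-2\
Open cmd and type sox to check that it works!
Put wav2raw.bat in the same folder as the corpus (wav files). Double-click wav2raw.bat: a console window pops up and the raw files are generated in that folder!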
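
The digit-count column in wav2raw.c exists only to pad each sentence ID to four digits. As a sketch of a simpler way to build the same command lines, the %04d conversion does the zero padding directly. The function name format_sox_line is hypothetical; the file-name prefix and sox options are copied from wav2raw.c. Note that snprintf is a C99 function, so this assumes a compiler newer than Visual Studio 2008 (where _snprintf, or fprintf straight into the bat file, would play the same role).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one sox command for sentence `id`, zero-padding the ID to four
 * digits with %04d so that no digit-count table is needed.
 * Returns the snprintf result: the number of characters the complete
 * line needs, excluding the terminating null character.
 * Hypothetical helper; prefix and options are taken from wav2raw.c.
 */
int format_sox_line(char *buf, size_t size, int id)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
        "sox -c 1 -s -2 -t wav -r 16000 CMICSD_athena_sen%04d.wav "
        "-c 1 -s -2 -t raw -r 16000 CMICSD_athena_sen%04d.raw\n",
        id, id);
}
```

Calling it once per sentence ID and writing each line to wav2raw.bat with fputs would give the same batch file as the switch statement, for IDs of one to four digits.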

genLabelFromCueWaveWithPhonemePercentage

Description: converts the segmented wav files into mono & full-context labels, to be used as HTS training data

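Usage
	File	Description
Main input	input.txt cue_wave folder	wav files of the segmented corpus
	text.txt	Transcription text of the segmented wav files
	input.txt	Configuration file; can be generated by the program in the gen input.txt folder
Secondary input	consonant_mean.txt	Statistics of initial proportions in the TH-CoSS corpus?
	lexicon_toneModify_v6.tbl	Dictionary file required by the text analyzer
	punctuation.tbl	Punctuation file required by the text analyzer
Output	mono folder (must be created first)	Output mono labels
Output	full folder (must be created first)	Output full-context labels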

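Note: the mono & full labels are HTS training data but are generated under Windows, so their line endings must be converted to Unix format.
Method 1 (convert under Windows): open the file in Notepad++ and choose Edit > EOL Conversion > Unix format
Method 2 (convert under Ubuntu): use flip to change the Windows line ending 0D0A to the Unix line ending 0A >>flip -u *.lab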
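
If neither tool is at hand, the same conversion is easy to sketch in C. The function below rewrites a buffer in place, dropping the 0D byte of every 0D0A pair, and returns the new length. The name crlf_to_lf is hypothetical, and this is an illustration of what the conversion does rather than a replacement for flip.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert Windows line endings (CR LF, bytes 0D 0A) to Unix line
 * endings (LF, byte 0A) in place and return the new length.
 * A lone CR that is not followed by LF is left unchanged.
 * Hypothetical helper, not part of flip or Notepad++.
 */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        buf[out++] = buf[in];
    }
    return out;
}
```

When applying it to a .lab file, open the file in binary mode ("rb" for reading, "wb" for writing); in text mode, the C runtime on Windows would add the CR back on output.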

Also convert the wav files to raw files with the sox tool: http://sox.sourceforge.net/

Synthesis stage

C_TTS_1.06 project

Description: a speech synthesis program combining the new text analyzer (Cloud version) + the new model + hts_engine_1.06

-----Text analyzer----------------

analysis.c .h
component.c
graph.c
label.c
partOfSpeechBigram.h
syllable.h
wordHashTable.h

-----------------------------------------

Thursday, March 1, 2012