[Korean Speech Recognition] Kaldi zeroth-korea run.sh stage=2 log

Hello. This is 플랫폼공작소, a developer at 이사작전.com.
This is the run.sh stage=2 log from zeroth-korea, the Korean recipe for Kaldi, the well-known speech recognition toolkit.
root@yhsang2-desktop:/home/yhsang2/project/kaldi/egs/zeroth_korean/s5# ./run.sh
Re-segment transcripts: data/train_data_01/text --------------------------------------------
Loading model from 'data/local/lm/zeroth_morfessor.seg'...
Done.
No training data files specified.
Segmenting test data...
Reading corpus from '-'...
............................................Done.
 
Done.
Re-segment transcripts: data/test_data_01/text --------------------------------------------
Loading model from 'data/local/lm/zeroth_morfessor.seg'...
Done.
No training data files specified.
Segmenting test data...
Reading corpus from '-'...
Done.
 
Done.
Preparing phone lists and clustering questions
2 silence phones saved to: data/local/dict_nosp/silence_phones.txt
1 optional silence saved to: data/local/dict_nosp/optional_silence.txt
40 non-silence phones saved to: data/local/dict_nosp/nonsilence_phones.txt
3 extra triphone clustering-related questions saved to: data/local/dict_nosp/extra_questions.txt
Lexicon text file saved as: data/local/dict_nosp/lexicon.txt
utils/prepare_lang.sh data/local/dict_nosp <UNK> data/local/lang_tmp_nosp data/lang_nosp
Checking data/local/dict_nosp/silence_phones.txt ...
--> reading data/local/dict_nosp/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/silence_phones.txt is OK
 
Checking data/local/dict_nosp/optional_silence.txt ...
--> reading data/local/dict_nosp/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/optional_silence.txt is OK
 
Checking data/local/dict_nosp/nonsilence_phones.txt ...
--> reading data/local/dict_nosp/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/nonsilence_phones.txt is OK
 
Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.
 
Checking data/local/dict_nosp/lexicon.txt
--> reading data/local/dict_nosp/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/lexicon.txt is OK
 
Checking data/local/dict_nosp/lexiconp.txt
--> reading data/local/dict_nosp/lexiconp.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/lexiconp.txt is OK
 
Checking lexicon pair data/local/dict_nosp/lexicon.txt and data/local/dict_nosp/lexiconp.txt
--> lexicon pair data/local/dict_nosp/lexicon.txt and data/local/dict_nosp/lexiconp.txt match
 
Checking data/local/dict_nosp/extra_questions.txt ...
--> reading data/local/dict_nosp/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory data/local/dict_nosp]
 
fstaddselfloops data/lang_nosp/phones/wdisambig_phones.int data/lang_nosp/phones/wdisambig_words.int 
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang_nosp
Checking existence of separator file
separator file data/lang_nosp/subword_separator.txt is empty or does not exist, deal in word case.
Checking data/lang_nosp/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp/phones.txt is OK
 
Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp/words.txt is OK
 
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
 
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt
 
Checking data/lang_nosp/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang_nosp/phones/context_indep.txt
--> data/lang_nosp/phones/context_indep.int corresponds to data/lang_nosp/phones/context_indep.txt
--> data/lang_nosp/phones/context_indep.csl corresponds to data/lang_nosp/phones/context_indep.txt
--> data/lang_nosp/phones/context_indep.{txt, int, csl} are OK
 
Checking data/lang_nosp/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 184 entry/entries in data/lang_nosp/phones/nonsilence.txt
--> data/lang_nosp/phones/nonsilence.int corresponds to data/lang_nosp/phones/nonsilence.txt
--> data/lang_nosp/phones/nonsilence.csl corresponds to data/lang_nosp/phones/nonsilence.txt
--> data/lang_nosp/phones/nonsilence.{txt, int, csl} are OK
 
Checking data/lang_nosp/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang_nosp/phones/silence.txt
--> data/lang_nosp/phones/silence.int corresponds to data/lang_nosp/phones/silence.txt
--> data/lang_nosp/phones/silence.csl corresponds to data/lang_nosp/phones/silence.txt
--> data/lang_nosp/phones/silence.{txt, int, csl} are OK
 
Checking data/lang_nosp/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.int corresponds to data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.csl corresponds to data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.{txt, int, csl} are OK
 
Checking data/lang_nosp/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 22 entry/entries in data/lang_nosp/phones/disambig.txt
--> data/lang_nosp/phones/disambig.int corresponds to data/lang_nosp/phones/disambig.txt
--> data/lang_nosp/phones/disambig.csl corresponds to data/lang_nosp/phones/disambig.txt
--> data/lang_nosp/phones/disambig.{txt, int, csl} are OK
 
Checking data/lang_nosp/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 42 entry/entries in data/lang_nosp/phones/roots.txt
--> data/lang_nosp/phones/roots.int corresponds to data/lang_nosp/phones/roots.txt
--> data/lang_nosp/phones/roots.{txt, int} are OK
 
Checking data/lang_nosp/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 42 entry/entries in data/lang_nosp/phones/sets.txt
--> data/lang_nosp/phones/sets.int corresponds to data/lang_nosp/phones/sets.txt
--> data/lang_nosp/phones/sets.{txt, int} are OK
 
Checking data/lang_nosp/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 12 entry/entries in data/lang_nosp/phones/extra_questions.txt
--> data/lang_nosp/phones/extra_questions.int corresponds to data/lang_nosp/phones/extra_questions.txt
--> data/lang_nosp/phones/extra_questions.{txt, int} are OK
 
Checking data/lang_nosp/phones/word_boundary.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 194 entry/entries in data/lang_nosp/phones/word_boundary.txt
--> data/lang_nosp/phones/word_boundary.int corresponds to data/lang_nosp/phones/word_boundary.txt
--> data/lang_nosp/phones/word_boundary.{txt, int} are OK
 
Checking optional_silence.txt ...
--> reading data/lang_nosp/phones/optional_silence.txt
--> data/lang_nosp/phones/optional_silence.txt is OK
 
Checking disambiguation symbols: #0 and #1
--> data/lang_nosp/phones/disambig.txt has "#0" and "#1"
--> data/lang_nosp/phones/disambig.txt is OK
 
Checking topo ...
 
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang_nosp/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang_nosp/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang_nosp/phones/word_boundary.txt is OK
 
Checking word-level disambiguation symbols...
--> data/lang_nosp/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking word_boundary.int and disambig.int
--> generating a 84 word/subword sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 64 word/subword sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
 
Checking data/lang_nosp/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_nosp/oov.txt
--> data/lang_nosp/oov.int corresponds to data/lang_nosp/oov.txt
--> data/lang_nosp/oov.{txt, int} are OK
 
--> data/lang_nosp/L.fst is olabel sorted
--> data/lang_nosp/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang_nosp]
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_nosp_test_tgsmall/words.txt - data/lang_nosp_test_tgsmall/G.fst 
LOG (arpa2fst[5.5.380~1-0552e]:Read():arpa-file-parser.cc:94) Reading \data\ section.
LOG (arpa2fst[5.5.380~1-0552e]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
LOG (arpa2fst[5.5.380~1-0552e]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.
LOG (arpa2fst[5.5.380~1-0552e]:Read():arpa-file-parser.cc:149) Reading \3-grams: section.
LOG (arpa2fst[5.5.380~1-0552e]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 2408875 to 432520
utils/validate_lang.pl data/lang_nosp_test_tgsmall
Checking existence of separator file
separator file data/lang_nosp_test_tgsmall/subword_separator.txt is empty or does not exist, deal in word case.
Checking data/lang_nosp_test_tgsmall/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp_test_tgsmall/phones.txt is OK
 
Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp_test_tgsmall/words.txt is OK
 
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
 
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt
 
Checking data/lang_nosp_test_tgsmall/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang_nosp_test_tgsmall/phones/context_indep.txt
--> data/lang_nosp_test_tgsmall/phones/context_indep.int corresponds to data/lang_nosp_test_tgsmall/phones/context_indep.txt
--> data/lang_nosp_test_tgsmall/phones/context_indep.csl corresponds to data/lang_nosp_test_tgsmall/phones/context_indep.txt
--> data/lang_nosp_test_tgsmall/phones/context_indep.{txt, int, csl} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 184 entry/entries in data/lang_nosp_test_tgsmall/phones/nonsilence.txt
--> data/lang_nosp_test_tgsmall/phones/nonsilence.int corresponds to data/lang_nosp_test_tgsmall/phones/nonsilence.txt
--> data/lang_nosp_test_tgsmall/phones/nonsilence.csl corresponds to data/lang_nosp_test_tgsmall/phones/nonsilence.txt
--> data/lang_nosp_test_tgsmall/phones/nonsilence.{txt, int, csl} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in data/lang_nosp_test_tgsmall/phones/silence.txt
--> data/lang_nosp_test_tgsmall/phones/silence.int corresponds to data/lang_nosp_test_tgsmall/phones/silence.txt
--> data/lang_nosp_test_tgsmall/phones/silence.csl corresponds to data/lang_nosp_test_tgsmall/phones/silence.txt
--> data/lang_nosp_test_tgsmall/phones/silence.{txt, int, csl} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_nosp_test_tgsmall/phones/optional_silence.txt
--> data/lang_nosp_test_tgsmall/phones/optional_silence.int corresponds to data/lang_nosp_test_tgsmall/phones/optional_silence.txt
--> data/lang_nosp_test_tgsmall/phones/optional_silence.csl corresponds to data/lang_nosp_test_tgsmall/phones/optional_silence.txt
--> data/lang_nosp_test_tgsmall/phones/optional_silence.{txt, int, csl} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 22 entry/entries in data/lang_nosp_test_tgsmall/phones/disambig.txt
--> data/lang_nosp_test_tgsmall/phones/disambig.int corresponds to data/lang_nosp_test_tgsmall/phones/disambig.txt
--> data/lang_nosp_test_tgsmall/phones/disambig.csl corresponds to data/lang_nosp_test_tgsmall/phones/disambig.txt
--> data/lang_nosp_test_tgsmall/phones/disambig.{txt, int, csl} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 42 entry/entries in data/lang_nosp_test_tgsmall/phones/roots.txt
--> data/lang_nosp_test_tgsmall/phones/roots.int corresponds to data/lang_nosp_test_tgsmall/phones/roots.txt
--> data/lang_nosp_test_tgsmall/phones/roots.{txt, int} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 42 entry/entries in data/lang_nosp_test_tgsmall/phones/sets.txt
--> data/lang_nosp_test_tgsmall/phones/sets.int corresponds to data/lang_nosp_test_tgsmall/phones/sets.txt
--> data/lang_nosp_test_tgsmall/phones/sets.{txt, int} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 12 entry/entries in data/lang_nosp_test_tgsmall/phones/extra_questions.txt
--> data/lang_nosp_test_tgsmall/phones/extra_questions.int corresponds to data/lang_nosp_test_tgsmall/phones/extra_questions.txt
--> data/lang_nosp_test_tgsmall/phones/extra_questions.{txt, int} are OK
 
Checking data/lang_nosp_test_tgsmall/phones/word_boundary.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 194 entry/entries in data/lang_nosp_test_tgsmall/phones/word_boundary.txt
--> data/lang_nosp_test_tgsmall/phones/word_boundary.int corresponds to data/lang_nosp_test_tgsmall/phones/word_boundary.txt
--> data/lang_nosp_test_tgsmall/phones/word_boundary.{txt, int} are OK
 
Checking optional_silence.txt ...
--> reading data/lang_nosp_test_tgsmall/phones/optional_silence.txt
--> data/lang_nosp_test_tgsmall/phones/optional_silence.txt is OK
 
Checking disambiguation symbols: #0 and #1
--> data/lang_nosp_test_tgsmall/phones/disambig.txt has "#0" and "#1"
--> data/lang_nosp_test_tgsmall/phones/disambig.txt is OK
 
Checking topo ...
 
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> data/lang_nosp_test_tgsmall/phones/word_boundary.txt doesn't include disambiguation symbols
--> data/lang_nosp_test_tgsmall/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> data/lang_nosp_test_tgsmall/phones/word_boundary.txt is OK
 
Checking word-level disambiguation symbols...
--> data/lang_nosp_test_tgsmall/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking word_boundary.int and disambig.int
--> generating a 1 word/subword sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 17 word/subword sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
 
Checking data/lang_nosp_test_tgsmall/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in data/lang_nosp_test_tgsmall/oov.txt
--> data/lang_nosp_test_tgsmall/oov.int corresponds to data/lang_nosp_test_tgsmall/oov.txt
--> data/lang_nosp_test_tgsmall/oov.{txt, int} are OK
 
--> data/lang_nosp_test_tgsmall/L.fst is olabel sorted
--> data/lang_nosp_test_tgsmall/L_disambig.fst is olabel sorted
--> data/lang_nosp_test_tgsmall/G.fst is ilabel sorted
--> data/lang_nosp_test_tgsmall/G.fst has 432520 states
--> utils/lang/check_g_properties.pl successfully validated data/lang_nosp_test_tgsmall/G.fst
--> utils/lang/check_g_properties.pl succeeded.
--> SUCCESS [validating lang directory data/lang_nosp_test_tgsmall]
Succeeded in formatting data.
arpa-to-const-arpa --bos-symbol=503350 --eos-symbol=503351 --unk-symbol=11 'gunzip -c data/local/lm/zeroth.lm.tg.arpa.gz | utils/map_arpa_lm.pl data/lang_nosp_test_tglarge/words.txt|' data/lang_nosp_test_tglarge/G.carpa 
LOG (arpa-to-const-arpa[5.5.380~1-0552e]:BuildConstArpaLm():const-arpa-lm.cc:1078) Reading gunzip -c data/local/lm/zeroth.lm.tg.arpa.gz | utils/map_arpa_lm.pl data/lang_nosp_test_tglarge/words.txt|
utils/map_arpa_lm.pl: Processing "\data\"
utils/map_arpa_lm.pl: Processing "\1-grams:\"
LOG (arpa-to-const-arpa[5.5.380~1-0552e]:Read():arpa-file-parser.cc:94) Reading \data\ section.
LOG (arpa-to-const-arpa[5.5.380~1-0552e]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
utils/map_arpa_lm.pl: Processing "\2-grams:\"
LOG (arpa-to-const-arpa[5.5.380~1-0552e]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.
utils/build_const_arpa_lm.sh: line 47: 12546 Killed                  arpa-to-const-arpa --bos-symbol=$bos --eos-symbol=$eos --unk-symbol=$unk "gunzip -c $arpa_lm | utils/map_arpa_lm.pl $new_lang/words.txt|" $new_lang/G.carpa
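
The log above ends with the shell reporting that arpa-to-const-arpa was "Killed" while reading the 2-grams section. That message is what bash prints when a child process is terminated by SIGKILL. On Linux a common cause is the out-of-memory killer, but the log itself does not state the reason, so treat that as something to check rather than a conclusion. The following is a minimal sketch, not part of the zeroth-korea recipe, for scanning a saved log for validation successes and killed commands. The function name summarize and the two patterns are assumptions made for illustration, written against the line shapes seen in this log only.

```python
import re

# Assumed pattern for bash's report of a child ended by SIGKILL, e.g.
# "utils/build_const_arpa_lm.sh: line 47: 12546 Killed    arpa-to-const-arpa ..."
KILLED = re.compile(
    r"^(?P<script>\S+): line (?P<line>\d+):\s+(?P<pid>\d+) Killed\s+(?P<cmd>\S+)"
)

# Assumed pattern for the validation success lines, e.g.
# "--> SUCCESS [validating lang directory data/lang_nosp]"
SUCCESS = re.compile(
    r"^--> SUCCESS \[validating (?P<kind>\w+) directory (?P<path>\S+)\]"
)


def summarize(log_text):
    """Collect validated directories and killed commands from a log string."""
    validated = []
    killed = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = SUCCESS.match(line)
        if match:
            validated.append(match.group("path"))
            continue
        match = KILLED.match(line)
        if match:
            killed.append(
                (
                    match.group("script"),
                    int(match.group("line")),
                    int(match.group("pid")),
                    match.group("cmd"),
                )
            )
    return {"validated": validated, "killed": killed}


# A few lines copied from the log in this post.
SAMPLE = """\
--> SUCCESS [validating dictionary directory data/local/dict_nosp]
--> SUCCESS [validating lang directory data/lang_nosp]
--> SUCCESS [validating lang directory data/lang_nosp_test_tgsmall]
Succeeded in formatting data.
utils/build_const_arpa_lm.sh: line 47: 12546 Killed                  arpa-to-const-arpa --bos-symbol=$bos --eos-symbol=$eos --unk-symbol=$unk
"""

if __name__ == "__main__":
    result = summarize(SAMPLE)
    for path in result["validated"]:
        print("validated:", path)
    for script, line_no, pid, cmd in result["killed"]:
        print(f"killed: {cmd} (pid {pid}) at {script}:{line_no}")
```

For this run the summary shows three validated directories (data/local/dict_nosp, data/lang_nosp, data/lang_nosp_test_tgsmall) and one killed command, so the tgsmall G.fst was built but the tglarge G.carpa was not completed.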
Thank you.
License: Attribution, NonCommercial, ShareAlike (CC BY-NC-SA)