Abstract
We propose a technique for improving language modeling for automated speech recognition of medical dictations by interpolating finished text (25M words) with small human-generated literal and/or machine-generated semiliteral corpora. By building and testing interpolated language models (ILM) with literal (LILM), semiliteral (SILM) and partial (PILM) corpora, we show that both perplexity and recognition results improve significantly with LILM and SILM, with the two yielding very close results.
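The abstract does not specify the interpolation scheme, so the following is only a minimal sketch of one common choice, linear interpolation of two models, together with the perplexity measure mentioned above. The unigram models, add-one smoothing, the weight `lam`, and the toy sentences are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_small, p_large, lam):
    """Linear interpolation: lam * p_small + (1 - lam) * p_large."""
    return {w: lam * p_small[w] + (1 - lam) * p_large[w] for w in p_small}


def perplexity(model, tokens):
    """Exponential of the average negative log-probability per token."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data, invented for illustration only (hypothetical, not from the paper).
finished = "the patient has no known allergies the patient is stable".split()
literal = "uh the patient has no known allergies period next paragraph".split()
held_out = "the patient is stable period".split()

vocab = set(finished) | set(literal) | set(held_out)
p_finished = unigram_model(finished, vocab)  # stands in for the large finished-text model
p_literal = unigram_model(literal, vocab)    # stands in for the small literal-corpus model
p_mixed = interpolate(p_literal, p_finished, lam=0.5)  # lam is an assumed weight

print(perplexity(p_finished, held_out))
print(perplexity(p_mixed, held_out))
```

In practice the weight would typically be tuned on held-out data, and the component models would be higher-order n-gram models rather than unigrams; the sketch only shows how the mixture and the perplexity measure fit together.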
Original language | English (US) |
---|---|
Title of host publication | 6th International Conference on Spoken Language Processing, ICSLP 2000 |
Publisher | International Speech Communication Association |
ISBN (Electronic) | 7801501144, 9787801501141 |
State | Published - 2000 |
Event | 6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China. Duration: Oct 16 2000 → Oct 20 2000 |
Bibliographical note
Funding Information: Financial support from the Academy of Finland is gratefully acknowledged (Grant Number 111692). The author would also like to thank Johnny Lindroos, Fredrick Sundell and Marketta Hiisa for their contribution to the project and their assistance in carrying out some of the experiments.