We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, it may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method is not dependent on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with descriptions).
|Title of host publication||Proceedings of the Computational Linguistics-Applications Conference|
|Editors||K. Jassem, P. Fuglewicz, M. Piasecki, A. Przepiorkowski|
|Number of pages||6|
|Place of Publication||Jachranka|
|Publisher||Polish Information Processing Society|
|Publication status||Published - 2011|
|Event||Computational Linguistics-Applications Conference - CAL'11 - Jachranka, Poland (Duration: 17 Oct 2011 → 19 Nov 2011)|
|Conference||Computational Linguistics-Applications Conference - CAL'11|
|Period||17/10/2011 → 19/11/2011|
Uneson, M., & Juel Henrichsen, P. (2011). Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection. In K. Jassem, P. Fuglewicz, M. Piasecki, & A. Przepiorkowski (Eds.), Proceedings of the Computational Linguistics-Applications Conference (pp. 93-98). Polish Information Processing Society.