Abstract
We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, it may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method is not dependent on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with descriptions).
Original language | English |
---|---|
Title of host publication | Proceedings of the Computational Linguistics-Applications Conference |
Editors | K. Jassem, P. Fuglewicz, M. Piasecki, A. Przepiorkowski |
Number of pages | 6 |
Place of Publication | Jachranka |
Publisher | Polish Information Processing Society |
Publication date | 2011 |
Pages | 93-98 |
ISBN (Electronic) | 9788360810477 |
Publication status | Published - 2011 |
Event | Computational Linguistics-Applications Conference - CAL'11 - Jachranka, Poland. Duration: 17 Oct 2011 → 19 Nov 2011. http://www.cla-conf.org/ |
Conference
Conference | Computational Linguistics-Applications Conference - CAL'11 |
---|---|
Country/Territory | Poland |
City | Jachranka |
Period | 17/10/2011 → 19/11/2011 |