Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection

Marcus Uneson, Peter Juel Henrichsen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review



    We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, the method may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method does not depend on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with their descriptions).
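    The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Unit` schema, the tuple encoding of facts, the `expand` function, and the reading of "redundancy" as the probability of expressing a fact twice are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded segment labeled with the fact it expresses (hypothetical schema)."""
    audio_id: str  # placeholder identifier for a speech segment
    fact: tuple    # e.g. ("color", "ball", "red")


def expand(scene, units, redundancy=0.0, seed=0):
    """Build a new description of `scene` by choosing units for their meaning.

    redundancy: assumed here to be the probability in [0, 1] that a fact is
    expressed a second time, by a different unit when one is available.
    """
    rng = random.Random(seed)

    # Index the database by semantic content rather than by acoustic features.
    by_fact = {}
    for unit in units:
        by_fact.setdefault(unit.fact, []).append(unit)

    facts = sorted(scene)  # fixed starting order, so the seed alone decides the result
    rng.shuffle(facts)     # rearrange the order in which facts are mentioned

    description = []
    for fact in facts:
        candidates = by_fact.get(fact)
        if not candidates:
            raise ValueError(f"no unit expresses {fact}")
        first = rng.choice(candidates)  # paraphrase: any unit with this meaning will do
        description.append(first)
        if rng.random() < redundancy:
            others = [u for u in candidates if u != first] or candidates
            description.append(rng.choice(others))
    return description


# Toy database and scene (invented for this sketch).
UNITS = [
    Unit("a1", ("color", "ball", "red")),
    Unit("a2", ("color", "ball", "red")),
    Unit("b1", ("left_of", "ball", "box")),
]
SCENE = {("color", "ball", "red"), ("left_of", "ball", "box")}

if __name__ == "__main__":
    for unit in expand(SCENE, UNITS, redundancy=0.5, seed=1):
        print(unit.audio_id, unit.fact)
```

    Because units are looked up by the fact they express, the scene passed to `expand` need not be one that any informant actually described, as long as every fact in it is covered by at least one unit.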
    Original language: English
    Title of host publication: Proceedings of the Computational Linguistics-Applications Conference
    Editors: K. Jassem, P. Fuglewicz, M. Piasecki, A. Przepiorkowski
    Number of pages: 6
    Place of Publication: Jachranka
    Publisher: Polish Information Processing Society
    Publication date: 2011
    ISBN (Electronic): 9788360810477
    Publication status: Published - 2011
    Event: Computational Linguistics-Applications Conference - CAL'11 - Jachranka, Poland
    Duration: 17 Oct 2011 → 19 Nov 2011


    Conference: Computational Linguistics-Applications Conference - CAL'11