Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection

Marcus Uneson, Peter Juel Henrichsen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, it may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method is not dependent on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with descriptions).
    Original language: English
    Title of host publication: Proceedings of the Computational Linguistics-Applications Conference
    Editors: K. Jassem, P. Fuglewicz, M. Piasecki, A. Przepiorkowski
    Number of pages: 6
    Place of Publication: Jachranka
    Publisher: Polish Information Processing Society
    Publication date: 2011
    Pages: 93-98
    ISBN (Electronic): 9788360810477
    Publication status: Published - 2011
    Event: Computational Linguistics-Applications Conference - CAL'11 - Jachranka, Poland
    Duration: 17 Oct 2011 → 19 Nov 2011
    http://www.cla-conf.org/

    Conference

    Conference: Computational Linguistics-Applications Conference - CAL'11
    Country: Poland
    City: Jachranka
    Period: 17/10/2011 → 19/11/2011

    Cite this

    Uneson, M., & Juel Henrichsen, P. (2011). Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection. In K. Jassem, P. Fuglewicz, M. Piasecki, & A. Przepiorkowski (Eds.), Proceedings of the Computational Linguistics-Applications Conference (pp. 93-98). Jachranka: Polish Information Processing Society.
    Uneson, Marcus ; Juel Henrichsen, Peter. / Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection. Proceedings of the Computational Linguistics-Applications Conference. editor / K. Jassem ; P. Fuglewicz ; M. Piasecki ; A. Przepiorkowski. Jachranka : Polish Information Processing Society, 2011. pp. 93-98
    @inproceedings{88ed1b6539fe44c39217090c471b3442,
    title = "Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection",
    abstract = "We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, it may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method is not dependent on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with descriptions).",
    author = "Marcus Uneson and {Juel Henrichsen}, Peter",
    year = "2011",
    language = "English",
    pages = "93--98",
    editor = "K. Jassem and P. Fuglewicz and M. Piasecki and A. Przepiorkowski",
    booktitle = "Proceedings of the Computational Linguistics-Applications Conference",
    publisher = "Polish Information Processing Society",
    address = "Poland",

    }

    Uneson, M & Juel Henrichsen, P 2011, Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection. in K Jassem, P Fuglewicz, M Piasecki & A Przepiorkowski (eds), Proceedings of the Computational Linguistics-Applications Conference. Polish Information Processing Society, Jachranka, pp. 93-98, Jachranka, Poland, 17/10/2011.

    Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection. / Uneson, Marcus; Juel Henrichsen, Peter.

    Proceedings of the Computational Linguistics-Applications Conference. ed. / K. Jassem; P. Fuglewicz; M. Piasecki; A. Przepiorkowski. Jachranka : Polish Information Processing Society, 2011. p. 93-98.

    TY - GEN

    T1 - Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection

    AU - Uneson, Marcus

    AU - Juel Henrichsen, Peter

    PY - 2011

    Y1 - 2011

    N2 - We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, it may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method is not dependent on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with descriptions).

    AB - We present a method for the controlled expansion of a corpus based on “semantic unit selection”: units from a speech database are chosen not for closeness to an acoustic or phonetic target, but rather for their semantic content. While unsuited for general speech synthesis, it may be useful for restricted domains. We provide an application example from our current line of research: induction of lexical structure (i.e., acoustic, combinatorial, and semantic information) from unanalyzed recordings of informants describing small, closed-world scenarios. Here, semantic unit selection permits existing descriptions to be freely paraphrased and rearranged into new ones. The amount of redundancy can be parameterized, offering a way to control the difficulty of the task. The method is not dependent on the original scene described but can take a formal description of a new scene as input, or even enumerate all scenes describable by the data (along with descriptions).

    M3 - Article in proceedings

    SP - 93

    EP - 98

    BT - Proceedings of the Computational Linguistics-Applications Conference

    A2 - Jassem, K.

    A2 - Fuglewicz, P.

    A2 - Piasecki, M.

    A2 - Przepiorkowski, A.

    PB - Polish Information Processing Society

    CY - Jachranka

    ER -

    Uneson M, Juel Henrichsen P. Expanding a Corpus of Closed-World Descriptions by Semantic Unit Selection. In Jassem K, Fuglewicz P, Piasecki M, Przepiorkowski A, editors, Proceedings of the Computational Linguistics-Applications Conference. Jachranka: Polish Information Processing Society. 2011. p. 93-98