Fishing in Speech Stream: Angling for a Lexicon

Peter Juel Henrichsen

    Research output: Contribution to journal › Conference article in journal › Research › peer-review


    We present a learning device able to deduce a set of Danish color and shape terms. Only two data sources are available to the learner: a phonetic transcription of a human informant solving a description task, and a minimal formal model of the picture being described. The system thus contains no preconceived lexical, morphological, or semantic categories. The test data are from the phonetic corpus DanPASS, a standard Danish reference corpus. The learning device, called InShape-2, is an early result of an ambitious research programme at CMOL on data-driven language learning.
    Original language: English
    Journal: NEALT (Northern European Association of Language Technology) Proceedings Series
    Issue number: 11
    Pages (from-to): 90-97
    Number of pages: 8
    Publication status: Published - 2011
    Event: NODALIDA 2011. The 18th Nordic Conference of Computational Linguistics - Riga, Latvia
    Duration: 11 May 2011 to 13 May 2011
    Conference number: 18


    Conference: NODALIDA 2011. The 18th Nordic Conference of Computational Linguistics