Access to digital archives via metadata and lemmatization

Facts

Duration
01/2013 – 08/2015
Sponsors

Federal Ministry of Research, Technology and Space

Description

The project will develop a tool for improved access to selected digital historical text archives. The tool will support queries for lemmata (head words) and allow a corpus to be composed flexibly on the basis of metadata. The mechanisms and resources this requires (databases, lexica, morphological analyzers) will be made accessible via web services. The project focuses on Polish, a demanding test case because of its rich morphology and orthographic variation. Methods that cope with this complexity should also be transferable to other languages. Cooperation with the CLARIN-D centers in Saarbrücken, Tübingen, Nijmegen, Berlin and Leipzig is instrumental for the realization of this project.
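
The idea of combining a lemma query with a metadata-based corpus selection can be illustrated with a minimal sketch. It does not show the project's actual tool or web services: the names `LEXICON`, `Document`, `select_corpus` and `search_lemma`, the metadata fields `year` and `genre`, and the three sample texts are all hypothetical, and the small hand-written lexicon stands in for a real morphological analyzer.

```python
from dataclasses import dataclass

PUNCTUATION = ".,;:!?"

# Hypothetical toy lexicon mapping inflected Polish forms to their lemma.
# A real system would ask a morphological analyzer instead.
LEXICON = {
    "dom": "dom", "domu": "dom", "domem": "dom", "domy": "dom", "domów": "dom",
    "książka": "książka", "książki": "książka", "książkę": "książka",
}


@dataclass
class Document:
    """One archive text with illustrative (assumed) metadata fields."""
    doc_id: str
    year: int
    genre: str
    text: str


def normalize(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation."""
    return token.strip(PUNCTUATION).lower()


def lemmatize(token: str) -> str:
    """Return the lemma of a token, or the normalized token if it is unknown."""
    form = normalize(token)
    return LEXICON.get(form, form)


def select_corpus(docs, **criteria):
    """Keep only the documents whose metadata equal every given criterion."""
    return [d for d in docs
            if all(getattr(d, key) == value for key, value in criteria.items())]


def search_lemma(docs, lemma):
    """Return (doc_id, token position, matched form) for each form of the lemma."""
    hits = []
    for doc in docs:
        for position, token in enumerate(doc.text.split()):
            if lemmatize(token) == lemma:
                hits.append((doc.doc_id, position, normalize(token)))
    return hits


# Invented sample data, for illustration only.
ARCHIVE = [
    Document("d1", 1850, "letter", "Wrócił do domu."),
    Document("d2", 1850, "newspaper", "Stare domy stoją."),
    Document("d3", 1900, "letter", "Czytam książkę."),
]

letters = select_corpus(ARCHIVE, genre="letter")
print(search_lemma(ARCHIVE, "dom"))   # searches the whole archive
print(search_lemma(letters, "dom"))   # searches only the selected sub-corpus
```

Searching for the lemma `dom` finds the inflected forms `domu` and `domy` without the user listing them, and restricting the corpus by metadata first narrows the search to the matching documents. Handling orthographic variation would need an additional normalization step that this sketch leaves out.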