Language technology

Functions of LiRI Language Technology

The main purposes of the LiRI Information System are:

  • To channel the large volume of data produced by the LiRI data acquisition units, as well as additional language data and databases from other academic institutions, into standardized, interoperable, open-access resources. Detailed access rules will apply to partners, along with minimum quality standards for hosted research data, digital assets, and metadata. All tools and data hosted by LiRI will be subject to the FAIR principles (Findable, Accessible, Interoperable, Reusable), in keeping with Open Access.
  • To support empirical research from data acquisition to publication.

The LiRI team provides expert support for data management, text processing, and database engineering. Experts in corpus linguistics, computational linguistics and data mining provide and develop tools for natural language processing.

Examples of available software tools:

  • Databases to store and search data collections
  • Natural language processing, such as automatic part-of-speech tagging, syntactic parsing, and semantic tagging
  • Data crawling/scraping and processing of web sources, batch download of documents, conversion to XML/JSON
  • Workflow tool for the management of linguistic annotation
  • Access to specialized software for semi-automatic data transcription
  • Implementation and configuration of tools for automatic text analysis and classification
  • General machine learning tasks (supervised or unsupervised), neural network learning, etc.
  • Audio signal processing (e.g. for human speech or animal vocalisations)
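
As an illustration of the "conversion to XML/JSON" step listed above, the following is a minimal sketch in Python using only the standard library. It is not LiRI's actual tooling: the function names, the record fields (id, source, tokens), the XML element names, and the example URL are all assumptions made for this example, and the whitespace tokenization stands in for a real NLP pipeline.

```python
import json
import xml.etree.ElementTree as ET


def to_record(doc_id, source_url, text):
    """Bundle one downloaded document with minimal metadata (hypothetical schema)."""
    return {
        "id": doc_id,
        "source": source_url,
        # Naive whitespace tokenization, for illustration only.
        "tokens": text.split(),
    }


def record_to_json(record):
    """Serialize the record as JSON, keeping non-ASCII characters readable."""
    return json.dumps(record, ensure_ascii=False)


def record_to_xml(record):
    """Serialize the record as XML with one <w> element per token."""
    doc = ET.Element("document", id=record["id"], source=record["source"])
    for position, token in enumerate(record["tokens"], start=1):
        word = ET.SubElement(doc, "w", n=str(position))
        word.text = token
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    record = to_record("doc-001", "https://example.org/page", "Grüezi mitenand")
    print(record_to_json(record))
    print(record_to_xml(record))
```

Keeping the intermediate record as a plain dictionary means the same data can be written to either format, which is one simple way to keep hosted resources interoperable.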