Upgrading the linguistic ORD ecosystem (UpLORD)

Description
Principal Investigators
Main outcomes
Dissemination

Description

UpLORD is a two-year project (2023-2024) funded by the swissuniversities ORD programme and hosted by the University of Zurich, with the support of the Zurich University Library and the CLARIN-CH Consortium. Since 2018, a consortium of partners has been building a national ecosystem of infrastructures that covers the whole linguistic data lifecycle in line with ORD requirements (FAIR principles: Findable, Accessible, Interoperable, Reusable), from data generation, processing and analysis to data sharing and archiving. As service providers, this ecosystem includes the national technology platform LiRI and SWISSUbase, the national repository for publishing and archiving linguistic data. It also includes a database of Swiss media texts and a platform for hosting and searching large text and audio/video corpora.

The project focuses on:

- upgrading the workflows and interoperability of existing infrastructure services;
- establishing working groups at the national level;
- documenting and promoting best practices;
- raising awareness of ORD practices in teaching, research and publishing, and providing training on them;
- building a robust practice of data curation.

In the long term, the project will lay a strong foundation for a sustainable ORD strategy for linguistic data in Switzerland.

Details about the Steering Committee and the governance of the project are available here (PDF, 172 KB).

Principal Investigators

Prof. Dr. Noah Bubenhofer (LiRI)

Dr. Andrea Malits (Universitätsbibliothek Zürich)

Dr. Cristina Grisot (CLARIN-CH)

Project coordinators: Dr. Letizia Volpin, Dr. Joanna Blochowiak

Main outcomes

Two sets of demands shaped the project. The first is the requirements of Open Science and the FAIR principles. The second is the growing number of challenging data sets, such as sensitive data or data subject to copyright restrictions. Against this background, we identified several gaps in the current situation in Switzerland, which the project addresses. The project's main outcomes are summarized below:

Technical implementations:

Software development: the LiRI Corpus Platform (LCP), with three interfaces for text, audio and video data
Development of data converters: read documentation and access code
Creation of complex annotation representations: read documentation
Development of a corpus query language, DQD, enabling complex and powerful queries across text, audio and video data
Formulation of a concept for registering complex metadata for specific types of language data, such as sign language and interactional linguistics data: contact the LaRS team
Creation of a Swiss CMDI profile to support metadata interoperability with CLARIN
Implementation of the CLARIN federated authentication system
Implementation of the harvesting of the SWISSUbase repository by the CLARIN VLO
Development of an API to push data from LiRI to SWISSUbase: read documentation

Community engagement, standardization and workflows:

Personalized support for Swiss corpus owners to increase the FAIR compliance of their corpora: read the CLARIN-CH recommendations for data sharing
Establishment of a national working group to formulate and disseminate good practices for managing sensitive data, copyright and intellectual property issues in language data: CLARIN-CH WG Management of Sensitive and Personal data, Ethical and Legal issues for linguistic data
Establishment of a national working group to collect metadata for Swiss learner corpora and formulate recommendations for building open learner corpora: CLARIN-CH WG
Formulation of a concept for registering corpus platforms and other services, such as LCP and Swiss-AL, in SWISSUbase (i.e. publishing metadata without the underlying data)
Creation of a survey to collect recommended and standard data formats, and integration of the Swiss results into the CLARIN Standard Information System

Training and documentation:

Formulation of data curation and version control workflows: read documentation
Organisation of webinars on managing sensitive data, copyright and intellectual property issues, to inform the scientific community
Organisation of a series of online training sessions to inform and train the scientific community in using the Swiss FAIR-compliant ecosystem of infrastructures 2.0 to enhance their research
Publication of open educational resources based on the webinar and training series in the CLARIN-CH Zenodo community
Creation of the CLARIN-CH Documentation Platform on the management of open research data

Certifications

Acquisition of the CoreTrustSeal certification for the Language Repository of Switzerland LaRS
Submission of an application for certification of the Linguistic Research Infrastructure LiRI and the Language Repository of Switzerland LaRS as a CLARIN B-center (results expected in early 2026)

Dissemination

LCP workshop at SwissText 2025: Bring your own data!
Grisot C., Craevschi A., Futter C., Vukovic T., Zehr J., Krasselt J., Dreesen P. (2025) The Swiss FAIR-compliant ecosystem of infrastructures 2.0. Extended abstract accepted for CLARIN Annual Conference.
Grisot C., Craevschi A. (2025) CLARIN-CH: supporting Open Research Data management. Poster at Swiss NLP Expo.
Grisot C. (2024) Presentation of CLARIN-CH ecosystem in the Tour de CLARIN
Blochowiak J., Grisot C. (2025) Building up the CLARIN-CH Training Programme. Extended abstract accepted for CLARIN Annual Conference.
Bubenhofer, N., Malits, A., Strebel, S., Graën, J., Buerli, S., & Grisot, C. (2023, December). Building and consolidating a FAIR-compliant ecosystem of infrastructures. In CLARIN Annual Conference Proceedings (pp. 95-99).
Schaber, J., Graën, J., McDonald, D., Mustac, I., Rajovic, N., Schneider, G., ... & Kontino, T. (2023, October). The LiRI Corpus Platform. In CLARIN Annual Conference Proceedings (pp. 145-149).
Schaber, J., Graën, J., Mustač, I., Rajović, N., Schneider, G., Zehr, J., & Bubenhofer, N. Swissdox@LiRI: a large database of media articles made accessible to researchers. In CLARIN Annual Conference Proceedings (pp. 111-115).
Poster at Open Access Week 2023 at UZH
Presentation at 2023 SWISSUbase Annual event at UZH (November 2023)
Presentation of UpLORD project (PDF, 1 MB) at 2024 CLARIN-CH Day at University of Neuchâtel (September 9, 2024)
Swissuniversities P5 Open Science closing event (November 18, 2024)

Additional Information

Common Language Resources and Technology Infrastructure


CLARIN is a pan-European research infrastructure that aims to make digital language resources and tools from all over Europe accessible through a single sign-on online environment. Swiss academic institutions founded the CLARIN-CH consortium in 2020.

SWISSUbase is a national repository that facilitates access to research data and projects across different disciplines and provides Swiss research institutions with a reliable data infrastructure.

Are you a member of the Swiss scientific community working with language resources? Are the topics addressed in this project relevant to your work?

Would you like to get involved?

Please send an email to Cristina Grisot.
