Upgrading the linguistic ORD-ecosystem UpLORD

Description
UpLORD is a swissuniversities ORD-funded 2-year project (2023-2024) hosted by the University of Zurich, with the support of the Zurich University Library and the CLARIN-CH Consortium. Since 2018, a consortium of partners has been building a national ecosystem of infrastructures that covers the whole linguistic data lifecycle according to ORD requirements (FAIR principles: Findable, Accessible, Interoperable, Reusable), from data generation, processing and analysis to data sharing and archiving. This ecosystem includes the national technology platform LiRI and the national repository for publishing and archiving linguistic data (SWISSUbase) as service providers, a database of Swiss media texts, and a platform for hosting and searching large text and audio/video corpora.
The project focuses on upgrading workflows and interoperability of existing infrastructure services, establishing working groups on the national level, documenting and promoting best practices, raising awareness of and providing training in ORD practices in the context of teaching, research and publishing, and building a robust practice of data curation. In the long term, this project will significantly contribute to a strong foundation for a sustainable ORD strategy for linguistic data in Switzerland.
Here (PDF, 172 KB) you can find details about the Steering Committee and the governance of the project.
Principal Investigators
Prof. Dr. Noah Bubenhofer (LiRI)
Dr. Andrea Malits (Universitätsbibliothek Zürich)
Dr. Cristina Grisot (CLARIN-CH)
Project coordinators: Dr. Letizia Volpin, Dr. Joanna Blochowiak
Main outcomes
Given the requirements of Open Science and the FAIR principles on the one hand, and the challenges posed by more difficult data sets (such as sensitive data or data with copyright issues) on the other, we identified several gaps in the current situation in Switzerland that are to be addressed through this ORD project. Below is an overview of the project's main outcomes:
Technical implementations:
- Software development: the LiRI Corpus Platform with three interfaces for text, audio and video data
- Building of data converters: read documentation and access code
- Creation of complex annotation representations: read documentation
- Development of a corpus query language, the DQD, to allow for complex and powerful queries in text, audio and video data
- Formulation of a concept for registering complex metadata for specific types of language data, such as sign language and interactional linguistics data: reach out to the LaRS team
- Creation of the Swiss CMDI profile, useful for metadata interoperability with CLARIN
- Implementation of the CLARIN federated authentication system
- Implementation of the harvesting of the SWISSUbase repository by the CLARIN VLO
- Construction of an API to push data from LiRI to SWISSUbase: read documentation
Community engagement, standardization and workflows:
- Personal support to Swiss corpus owners to increase the FAIR-compliance of their corpora: read the CLARIN-CH recommendations for data sharing
- Implementation of a national working group to formulate and disseminate good practices for the management of sensitive data, copyright and intellectual property issues for language data: CLARIN-CH WG Management of Sensitive and Personal data, Ethical and Legal issues for linguistic data
- Implementation of a national working group to collect metadata for Swiss learner corpora and formulate recommendations for building open learner corpora: CLARIN-CH WG
- Formulation of a concept for registering corpus platforms and other services, such as the LCP and Swiss-AL, in SWISSUbase, i.e. metadata publication without data
- Creation of a survey to collect recommended/standard data formats and integration of the Swiss results into the CLARIN Standard Information System
Training and documentation:
- Formulation of data curation and version control workflows: read documentation
- Organisation of webinars about the management of sensitive data, copyright and intellectual property issues to inform the scientific community
- Organisation of a series of online training sessions to inform and train the scientific community in how to use the Swiss FAIR-compliant ecosystem of infrastructures 2.0 to enhance their research
- Publication in the CLARIN-CH Zenodo community of open educational resources based on the webinar series and training series
- Creation of the CLARIN-CH Documentation Platform about management of open research data
Certifications
- Acquisition of the CoreTrustSeal certification for the Language Repository of Switzerland (LaRS)
- Submission of the application of the Linguistic Research Infrastructure (LiRI) and the Language Repository of Switzerland (LaRS) for certification as a CLARIN B-center (results expected in early 2026)
Dissemination
- LCP workshop at SwissText 2025: Bring your own data!
- Grisot C., Craevschi A., Futter C., Vukovic T., Zehr J., Krasselt J., Dreesen P. (2025) The Swiss FAIR-compliant ecosystem of infrastructures 2.0. Extended abstract accepted for CLARIN Annual Conference.
- Grisot C., Craevschi A. (2025) CLARIN-CH: supporting Open Research Data management. Poster at Swiss NLP Expo.
- Grisot C. (2024) Presentation of CLARIN-CH ecosystem in the Tour de CLARIN
- Blochowiak J., Grisot C. (2025) Building up the CLARIN-CH Training Programme. Extended abstract accepted for CLARIN Annual Conference.
- Bubenhofer, N., Malits, A., Strebel, S., Graën, J., Buerli, S., & Grisot, C. (2023, December). Building and consolidating a FAIR-compliant ecosystem of infrastructures. In CLARIN Annual Conference Proceedings (pp. 95-99).
- Schaber, J., Graën, J., McDonald, D., Mustac, I., Rajovic, N., Schneider, G., ... & Kontino, T. (2023, October). The LiRI Corpus Platform. In CLARIN annual conference proceedings (pp. 145-149).
- Schaber, J., Graën, J., Mustač, I., Rajović, N., Schneider, G., Zehr, J., & Bubenhofer, N. Swissdox@LiRI – a large database of media articles made accessible to researchers. In CLARIN annual conference proceedings (pp. 111-115).
- Poster at Open Access Week 2023 at UZH
- Presentation at the 2023 SWISSUbase Annual event at UZH (November 2023)
- Presentation of the UpLORD project (PDF, 1 MB) at the 2024 CLARIN-CH Day at the University of Neuchâtel (September 9, 2024)
- Swissuniversities P5 Open Science closing event (November 18, 2024)