LiRI - Linguistic Research Infrastructure Swissdox@LiRI

How To Get Started

This user manual is intended for beginners and first-time users of the Swissdox@LiRI platform.

More detailed and technical descriptions of how to use the Swissdox@LiRI platform can be found on the LiRI wiki.

If you need help navigating the platform or building queries, reach out to community@linguistik.uzh.ch.

1. Access the Swissdox@LiRI platform

When you open swissdox.linguistik.uzh.ch, you will be greeted by a login panel. If your university or institution is a supporter of Swissdox@LiRI, or if you applied for a Swissdox@LiRI voucher, you should be able to log in with the same short name and password you use for other university matters. If you encounter problems at this point, please reach out to swissdox@linguistik.uzh.ch.

2. Register your project

Once you have successfully logged in, you are ready to register your first "project". All queries are connected to a project, so this step is necessary.

Register a project by clicking on the link provided in the description text, or by navigating to the "projects" menu in the top navigation, where all your projects will be listed. 

If you already have one or more projects registered, you can select one in the drop-down menu on the top right. 

3. Create your query

Once you have registered a project, or selected the one you are currently working on in the drop-down menu on the top right, you can start building queries. To do this, either click on "Build new query" or navigate to "Corpus Query" in the top navigation:

Swissdox interface with focus frames on the 2 options to build a new query.

Setting keyword(s)

Generally speaking, the most important filter you will apply is the keyword or keywords you are interested in. Swissdox@LiRI returns the full text of every article that includes the respective term(s). To set a keyword, type it into the open text field; to include several, list them one after another, separated by commas. Keep in mind that the search is case-sensitive. The following query, for example, would not match the capitalized "Climate":

Overall View of the Query Building Mask, with only the text for keyword filled in, it says "environment, weather, climate"
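
Because the search is case-sensitive, one way to also catch capitalized forms is to list both variants in the keyword field. The sketch below only prepares such a comma-separated string locally. It assumes that each listed keyword is matched independently, and the function name with_case_variants is hypothetical, not part of the Swissdox@LiRI platform.

```python
def with_case_variants(keywords):
    """Return a comma-separated string with a lowercase and a capitalized form of each keyword."""
    variants = []
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # ignore empty entries
        # str.capitalize() upper-cases only the first character of the whole string.
        for form in (kw.lower(), kw.capitalize()):
            if form not in variants:
                variants.append(form)
    return ", ".join(variants)


print(with_case_variants(["environment", "weather", "climate"]))
# environment, Environment, weather, Weather, climate, Climate
```

The resulting string can be pasted into the keyword field as is.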

Other filtering options

Setting one or multiple of the following filters tells the database to consider only articles / items...

  • Languages: ... in the chosen language, such as "French". 
  • Document Types: ... published in the chosen document types, such as "Local Daily Newspaper". 
  • Sources: ... from the chosen media sources, such as "Aargauer Zeitung (AZ)". 
  • Date Ranges: ... from the chosen time frame. Set a start and an end date, such as "2023-01-01 ~ 2023-12-31" to cover all of 2023.

Setting filters can also be helpful in reducing query time. If you do not select any, no filtering is applied.
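
If you run the same query for several periods, it can help to generate the date range strings rather than type them. This is a minimal sketch that reproduces the "start ~ end" notation from the example above. The helper names year_range and month_range are hypothetical, and the strings are only meant for copying into the date range filter.

```python
import calendar
from datetime import date


def year_range(year):
    """Return 'YYYY-01-01 ~ YYYY-12-31' for the given year."""
    return f"{date(year, 1, 1).isoformat()} ~ {date(year, 12, 31).isoformat()}"


def month_range(year, month):
    """Return the first and last day of a month in the same notation."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return f"{date(year, month, 1).isoformat()} ~ {date(year, month, last_day).isoformat()}"


print(year_range(2023))      # 2023-01-01 ~ 2023-12-31
print(month_range(2024, 2))  # 2024-02-01 ~ 2024-02-29
```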

4. Submit your query

After clicking on "next", you are redirected to the metadata mask shown below. Give your query a meaningful name and choose its expiration date deliberately. You are then ready to submit your query.

Submit query mask, showing where you enter a name for your query and set the expiration date.

5. Download and open your results

Once your query is finished, you can download your results. Navigate to the "Retrieve datasets" page to see the finished query – as well as all other queries you've done within the selected project. 

Interface showing the download button, which is activated once your query is done.

The downloaded file ends in "tsv.xz", which indicates both the data format (tsv = tab-separated values) and the compression (xz). If you'd like to review your data before further processing, open the file as follows:

  1. Decompress the file: Mac users can simply double-click the file. Windows users can use a tool such as 7-Zip: right-click the file and choose "7-Zip" > "Extract here".
  2. The extracted tsv file can then be opened in a spreadsheet program (like Numbers or Excel) or in a text editor.
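
If no extraction tool is at hand, step 1 can also be done with Python's standard library. This is a minimal sketch; decompress_xz is a hypothetical helper name, not something provided by Swissdox@LiRI.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dest=None):
    """Decompress an .xz file and return the path of the extracted file."""
    src = Path(src)
    if dest is None:
        # Drop the final suffix: 'file.tsv.xz' becomes 'file.tsv'.
        dest = src.with_suffix("")
    dest = Path(dest)
    # Stream the data in chunks so large downloads need not fit into memory.
    with lzma.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest
```

Calling decompress_xz('file.tsv.xz') would write 'file.tsv' next to the download, which can then be opened as described in step 2.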

If you'd like to open it programmatically, here is a Python snippet to do so:


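import lzma

def read_xz_compressed_tsv(filepath):
    """Yield each data row of an xz-compressed TSV file as a list of strings."""
    # The with-block closes the file once iteration ends.
    with lzma.open(filepath, mode='rt', encoding='utf-8') as fh:
        for line in fh:
            # Skip blank lines and lines starting with '#'.
            if not line.strip() or line.startswith('#'):
                continue
            # Strip only the newline so that empty trailing fields are kept.
            yield line.rstrip('\n').split('\t')

for row in read_xz_compressed_tsv('file.tsv.xz'):
    print(row)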
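
For further processing it is often handier to address fields by column name than by position. The following is a minimal sketch under one assumption that you should check against your own download: the first non-empty, non-comment row holds the column names. The helper name read_xz_tsv_as_dicts is hypothetical.

```python
import lzma


def read_xz_tsv_as_dicts(filepath):
    """Yield one dict per data row, keyed by the names found in the first row."""
    with lzma.open(filepath, mode="rt", encoding="utf-8") as fh:
        header = None
        for line in fh:
            # Skip blank lines and lines starting with '#'.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if header is None:
                # Assumption: the first remaining row is the header.
                header = fields
                continue
            yield dict(zip(header, fields))

# Usage, assuming 'file.tsv.xz' is your downloaded dataset:
#     for record in read_xz_tsv_as_dicts('file.tsv.xz'):
#         print(record)
```

Note that zip() silently truncates rows that have more or fewer fields than the header, so malformed rows are worth checking separately.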