LiRI - Linguistic Research Infrastructure

LiRI/NCCR Friday Lunchtime Talks

On selected Fridays over lunchtime, 12:00-13:30 (1h talk + 30 min social lunch)

Photo: Daphne Be Frenchie; source: Unsplash

Members of LiRI and NCCR Evolving Language organize the Friday Lunchtime Talks series. The talks planned for the spring semester 2024 will cover topics such as low-resource languages, visual algorithms, and spatial data science in linguistics. If you are interested in participating, please follow the link to the planning document on SWITCHdrive: https://drive.switch.ch/index.php/s/r8MlFwrs352KtOV

Past Friday Lunchtime Talks


Huw Swanborough: Experimental Design Clinic, June 21, 12:00-13:30

This lunchtime talk will be a pseudo experimental-design clinic, going over various considerations and tips for language scientists gathering behavioural and physiological data. A form for topic suggestions and questions is available here.

Abstract:

Data collection is the experimental 'point of no return': once data have been collected, we can only analyse what is present in the data files; we cannot go back and interpolate missing information needed for a subsequent analytical model. Biases and confounding effects may be inseparable from the desired observed effects, potentially resulting in null results with no way of mitigating them, and leading to lost time, money, and peace of mind. These pitfalls and obstacles are created during the initial design of the experiment, yet are often only observable during analysis, and often lack an intuitive cause and effect (e.g. the format in which you save your data may prevent certain post-hoc comparisons from being made).

During this session we will go over some of these potential pitfalls and discuss ways of avoiding and mitigating them, with a particular focus on cognitive and psychoacoustic considerations in experimental design and stimulus presentation that may cause analysis problems down the line. The session will be part presentation, part open-floor clinic, so that we can grapple with the ideas in a more tangible manner.

Questions and contributions towards the content of the meeting are warmly welcomed; if you have any concerns, current obstacles, or examples of past problems caused during the design stage, please submit them to me by Monday and I will do my best to include them in the open-floor discussion. I will be including examples of the times I have painted myself into a corner, so please don't worry about any attacking critiques of your work: the aim is to be constructive and use shared experience to avoid repeating mistakes as a group.
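
As a concrete illustration of the data-format pitfall mentioned above, here is a toy Python sketch (not from the talk; column names and values are invented) that saves trial-level results in long format, keeping every design factor as its own column so that post-hoc comparisons remain possible:

    import csv

    # Toy example: one row per trial ("long" format), with every design factor
    # stored as its own column. Aggregating too early (e.g. saving only
    # per-condition means) would make later post-hoc contrasts impossible.
    trials = [
        {"subject": "s01", "trial": 1, "condition": "noise", "snr_db": 0, "rt_ms": 812, "correct": 1},
        {"subject": "s01", "trial": 2, "condition": "quiet", "snr_db": "", "rt_ms": 645, "correct": 1},
    ]

    with open("results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(trials[0]))
        writer.writeheader()
        writer.writerows(trials)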

Srikanth Madikeri Raghunathan: Large pre-trained self-supervised models for automatic speech processing, June 7, 2024, in AND 4.55/4.57 (Andreasstrasse 15).

Abstract:

In this talk, I will present our work on the application of large pre-trained self-supervised models to different speech processing tasks: low-resource automatic speech recognition, spoken language understanding, and language identification. The success of wav2vec 2.0-style self-supervised pre-training paved the way for rapid training of automatic speech recognition systems, and was later extended to other speech processing tasks such as speaker recognition and language recognition. Our work combines the success of hybrid ASR (so-called HMM/DNN approaches) with pre-trained audio encoders to leverage the best of both systems: from using Lattice-Free Maximum Mutual Information (LF-MMI) as the cost function for acoustic-model fine-tuning to adapters for parameter-efficient training.
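
(For readers unfamiliar with adapters: the sketch below is a generic PyTorch illustration of the idea, not the speaker's implementation; module names and dimensions are invented. A small residual bottleneck network is trained while the large pre-trained encoder stays frozen.)

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: a small residual MLP inserted after a frozen
        encoder layer; only its parameters are updated during fine-tuning."""
        def __init__(self, dim: int, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)
            self.act = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + self.up(self.act(self.down(x)))  # residual connection

    encoder_layer = nn.Linear(768, 768)       # stand-in for one pre-trained layer
    for p in encoder_layer.parameters():
        p.requires_grad = False                # the big encoder stays frozen
    adapter = Adapter(768)                     # only ~100k trainable parameters

    x = torch.randn(4, 100, 768)               # (batch, frames, features)
    print(adapter(encoder_layer(x)).shape)     # torch.Size([4, 100, 768])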

With effective ASR training methods in place, the current focus of research and development on spoken document processing has shifted towards downstream tasks such as intent detection, slot filling, information retrieval, and dialogue structure discovery. In our work, we compare different approaches to combining multiple hypotheses from the ASR system, as opposed to using only the one-best hypothesis.
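
(As a toy illustration of why multiple hypotheses can help, the hypothetical Python sketch below weights each n-best hypothesis's intent vote by its ASR posterior; the classifier and the scores are invented for the example.)

    from collections import defaultdict

    def classify_intent(text: str) -> str:
        # Hypothetical downstream classifier; a keyword lookup stands in here.
        if "alarm" in text:
            return "set_alarm"
        if "weather" in text:
            return "get_weather"
        return "unknown"

    def intent_from_nbest(nbest: list[tuple[str, float]]) -> str:
        """Accumulate intent votes weighted by ASR posterior probabilities,
        instead of trusting only the one-best transcript."""
        scores = defaultdict(float)
        for text, posterior in nbest:
            scores[classify_intent(text)] += posterior
        return max(scores, key=scores.get)

    nbest = [
        ("what's the weather at seven", 0.40),   # 1-best, but possibly wrong
        ("set an alarm for seven", 0.35),
        ("set an alarm for heaven", 0.25),
    ]
    print(intent_from_nbest(nbest))  # set_alarm (0.60 outweighs 0.40)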

Jean-Marc Odobez & Anshul Gupta: Decoding Visual Attention: from 3D Gaze to Social Gaze inference in Everyday Scenes, April 26, 2024

Abstract:

Beyond words, non-verbal behaviors (NVB) are known to play important roles in face-to-face interactions. However, decoding NVB is a challenging problem that involves both extracting subtle physical NVB cues and mapping them to higher-level communication behaviors or social constructs. Gaze, in particular, serves as a fundamental indicator of attention and interest, with functions related to communication and social signaling, and plays an important role in many fields, such as intuitive human-computer or human-robot interface design, or medical diagnosis, for example assessing Autism Spectrum Disorders (ASD) in children.
However, estimating the visual attention of others - that is, estimating their gaze (3D line of sight) and Visual Focus of Attention (VFOA) - is a challenging task, even for humans. It often requires not only inferring an accurate 3D gaze direction from the person's face and eyes, but also understanding the global context of the scene to decide which object in the field of view is actually being looked at. Context can include the person's (or other people's) activities, which provide priors about which objects are looked at, or the scene structure, used to detect obstructions in the line of sight. Hence, two lines of research have been followed recently. The first focused on improving appearance-based 3D gaze estimation from images and videos, while the second investigated gaze following: the task of estimating the 2D pixel location of where a person looks in an image.
In this presentation, we will discuss different methods that address the two cases mentioned above. We will first focus on several methodological ideas for improving 3D gaze estimation, including approaches to building personalized models through few-shot learning and gaze-redirection eye synthesis, differential gaze estimation, and taking advantage of priors on social interactions to obtain weak labels for model adaptation. In the second part, we will introduce recent models aimed at estimating gaze targets in the wild, showing how to take advantage of different modalities, including estimating the 3D field of view, as well as methods for inferring social labels (eye contact, shared attention).
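
(For context, a quantity that recurs throughout this line of work is the angular error between a predicted and a ground-truth 3D gaze direction; the short Python sketch below, with made-up vectors, shows the standard computation.)

    import numpy as np

    def angular_error_deg(g_pred: np.ndarray, g_true: np.ndarray) -> float:
        """Angle in degrees between predicted and ground-truth 3D gaze vectors,
        the usual evaluation metric in appearance-based gaze estimation."""
        g_pred = g_pred / np.linalg.norm(g_pred)
        g_true = g_true / np.linalg.norm(g_true)
        cos = np.clip(np.dot(g_pred, g_true), -1.0, 1.0)  # guard rounding error
        return float(np.degrees(np.arccos(cos)))

    pred = np.array([0.05, -0.10, -0.99])   # made-up gaze direction (camera coords)
    true = np.array([0.00, -0.08, -1.00])
    print(f"{angular_error_deg(pred, true):.2f} degrees")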


LiRI/NCCR Friday Lunchtime Talk with Catalina Torres

In an informal setting, we learn about each other's work and current research topics in the areas of linguistics, machine learning, and statistics.
The talks last one hour (12:00-13:00); afterwards there is the chance to have lunch together and socialise until 13:30 (in the provided room).
Everyone is welcome!

Date: Friday, March 15, 2024, 12:00-13:30
Location: Andreasstrasse 15, 8050 Zurich, 4th floor, room AND 4.55/4.57
Speaker: Catalina Torres


Corpus Phonetics Pipeline, a discussion with Eleanor Chodroff, December 8, 12:30-13:30

Access to and availability of large-scale spoken-language data have risen dramatically in the past few years, enabling a rise in related scientific and engineering research directions. In this session, we'll discuss the pipeline for processing such large-scale data, with a focus on scientific investigations in the form of “corpus phonetics”. In particular, we'll discuss transcription approaches, grapheme-to-phoneme (G2P) conversion, and forced-alignment systems. I'll also provide an overview and demo of some G2P systems, as well as the Montreal Forced Aligner.
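
(To give a flavour of the G2P step discussed above: real systems, such as the trained models used in the Montreal Forced Aligner ecosystem, generalize to unseen words, but the toy Python lookup below, with an invented two-word lexicon, shows the basic input-output contract.)

    # Toy grapheme-to-phoneme step: map an orthographic word to a phone sequence.
    # A plain dictionary lookup only covers listed words; trained G2P models
    # predict pronunciations for out-of-vocabulary words as well.
    LEXICON = {
        "speech": ["S", "P", "IY1", "CH"],
        "corpus": ["K", "AO1", "R", "P", "AH0", "S"],
    }

    def g2p(word: str) -> list[str]:
        phones = LEXICON.get(word.lower())
        if phones is None:
            raise KeyError(f"{word!r} not in lexicon; a trained model would predict it")
        return phones

    for w in ["speech", "corpus"]:
        print(w, "->", " ".join(g2p(w)))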

Date: Friday, December 8, 2023, 12:30-13:30
Location: Andreasstrasse 15, 8050 Zurich, 4th floor, room AND 4.55/4.57
Speaker: Eleanor Chodroff