LiRI - Linguistic Research Infrastructure

LiRI/NCCR Friday Lunchtime Talks

On selected Fridays at lunchtime, 12:00-13:30 (1h talk + 30 min social lunch)

Photo: Daphne Be Frenchie; source: Unsplash

Members of LiRI and NCCR Evolving Language organize the Friday Lunchtime Talks event series. The planned talks in the spring semester 2024 will cover topics such as low-resource languages, visual algorithms, and spatial data science in linguistics. If you are interested in participating, please follow the link to the planning document on SWITCHdrive: https://drive.switch.ch/index.php/s/r8MlFwrs352KtOV

Srikanth Madikeri Raghunathan: Large pre-trained self-supervised models for automatic speech processing, June 7, 2024 in AND 4.55/4.57 (Andreasstrasse 15).

Abstract:

In this talk, I will present our work on the application of large pre-trained self-supervised models to different speech processing tasks: low-resource automatic speech recognition, spoken language understanding, and language identification. The success of wav2vec 2.0-style self-supervised training paved the way for rapid training of automatic speech recognition systems, and was later extended to other speech processing tasks such as speaker recognition and language recognition. Our work combines the success of hybrid ASR (so-called HMM/DNN approaches) with pre-trained audio encoders to leverage the best of both systems: from using Lattice-Free Maximum Mutual Information (LF-MMI) as the cost function for acoustic model fine-tuning to adapters for parameter-efficient training.

With effective ASR training methods in place, the current focus of research and development on spoken document processing has shifted towards downstream tasks such as intent detection, slot filling, information retrieval, and dialogue structure discovery. In our work, we compare different approaches to combining multiple hypotheses from ASR, as opposed to using only the one-best hypothesis.
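The adapters mentioned in the abstract are small trainable modules inserted into a frozen pre-trained encoder, so that only a fraction of the parameters need updating during fine-tuning. The following is a minimal NumPy sketch of a residual bottleneck adapter; all dimensions, weights, and initializations are illustrative, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(h, w_down, w_up):
    """Residual bottleneck adapter: h + up(relu(down(h)))."""
    z = np.maximum(h @ w_down, 0.0)   # project into a small bottleneck
    return h + z @ w_up               # project back up and add the residual

d_model, d_bottleneck = 16, 4          # toy sizes for illustration
h = rng.standard_normal((2, d_model))  # output of a frozen encoder (batch of 2)

# Only these small matrices would be trained; the large encoder stays frozen.
w_down = rng.standard_normal((d_model, d_bottleneck)) * 0.01
w_up = np.zeros((d_bottleneck, d_model))  # zero-init: adapter starts as identity

out = adapter(h, w_down, w_up)
# With a zero-initialized w_up the adapter is an exact identity at the start
# of fine-tuning, so it cannot initially degrade the pre-trained model.
print(np.allclose(out, h))  # True
```

The zero-initialized up-projection is a common design choice: training begins from the pre-trained model's behaviour and gradually learns a task-specific correction.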

Jean-Marc Odobez & Anshul Gupta: Decoding Visual Attention: from 3D Gaze to Social Gaze inference in Everyday Scenes, April 26, 2024

Abstract:

Beyond words, non-verbal behaviors (NVB) are known to play important roles in face-to-face interactions. However, decoding NVB is a challenging problem that involves both extracting subtle physical NVB cues and mapping them to higher-level communication behaviors or social constructs. Gaze, in particular, serves as a fundamental indicator of attention and interest with functions related to communication and social signaling, and plays an important role in many fields, like intuitive human-computer or robot interface design, or for medical diagnosis, like assessing Autism Spectrum Disorders (ASD) in children. 
However, estimating the visual attention of others - that is, estimating their gaze (3D line of sight) and Visual Focus of Attention (VFOA) - is a challenging task, even for humans. It often requires not only inferring an accurate 3D gaze direction from the person's face and eyes but also understanding the global context of the scene to decide which object in the field of view is actually looked at. Context can include the activities of the person or of others, which can provide priors about which objects are looked at, or the scene structure, used to detect obstructions in the line of sight. Hence, two lines of research have been followed recently. The first focused on improving appearance-based 3D gaze estimation from images and videos, while the second investigated gaze following - the task of estimating the 2D pixel location of where a person looks in an image.
In this presentation, we will discuss different methods that address the two cases mentioned above. We will first focus on several methodological ideas on how to improve 3D gaze estimation, including approaches to build personalized models through few-shot learning and gaze-redirection eye synthesis, differential gaze estimation, or taking advantage of priors on social interactions to obtain weak labels for model adaptation. In the second part, we will introduce recent models aiming at estimating gaze targets in the wild, showing how to take advantage of different modalities, including estimating the 3D field of view, as well as methods for inferring social labels (eye contact, shared attention).
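The core geometric question behind VFOA estimation - does a candidate object lie on the person's line of sight? - can be sketched with a simple angle test. The positions, gaze vector, and 20-degree threshold below are illustrative assumptions, not values from the speakers' work, which uses learned models rather than a fixed rule.

```python
import numpy as np

def looks_at(eye_pos, gaze_dir, target_pos, max_angle_deg=20.0):
    """True if target lies within max_angle_deg of the 3D gaze ray."""
    to_target = target_pos - eye_pos
    cos_angle = (gaze_dir @ to_target) / (
        np.linalg.norm(gaze_dir) * np.linalg.norm(to_target)
    )
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return angle <= max_angle_deg

eye = np.array([0.0, 1.6, 0.0])      # eye at 1.6 m height (toy scene)
gaze = np.array([0.0, -0.2, 1.0])    # looking forward and slightly down
screen = np.array([0.0, 1.3, 1.5])   # object roughly along that ray
ceiling = np.array([0.0, 3.0, 1.0])  # object far off the gaze ray

print(looks_at(eye, gaze, screen))   # True
print(looks_at(eye, gaze, ceiling))  # False
```

Real systems replace the fixed threshold with learned scoring and add the contextual cues the abstract describes (scene structure, activities, other people's attention), but the underlying ray-to-target geometry is the same.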

Further information

LiRI/NCCR Friday Lunchtime Talk with Srikanth Madikeri Raghunathan June 7, 2024 in AND 4.55/4.57 (Andreasstrasse 15).


Topic: Large pre-trained self-supervised models for automatic speech processing

The talks will last 1h (12:00-13:00); then there will be the chance to have lunch together and socialise until 13:30 (in the provided room).
Everyone is welcome!

LiRI/NCCR Friday Lunchtime Talk with Idiap's Jean-Marc Odobez & Anshul Gupta

Decoding Visual Attention: from 3D Gaze to Social Gaze inference in Everyday Scenes

Jean-Marc Odobez and Anshul Gupta are from the Perception and Activity Understanding group at the Idiap Research Institute. Their main research interests are in the analysis of human activities from multi-modal data. In their talk, they discuss different methods that address the challenge of estimating visual attention.

Date: Friday, April 26, 2024, 12:00-13:30
Location: Andreasstrasse 15, 8050 Zurich, 4th floor, room AND 4.55/4.57
 


LiRI/NCCR Friday Lunchtime Talk with Catalina Torres

In an informal setting, we learn about each other's work and current research topics in the areas of linguistics, machine learning, and statistics.

Date: Friday, March 15, 2024, 12:00-13:30
Location: Andreasstrasse 15, 8050 Zurich, 4th floor, room AND 4.55/4.57
Speaker: Catalina Torres


Corpus Phonetics Pipeline, a discussion with Eleanor Chodroff, December 8, 12:30-13:30

Access to and availability of large-scale spoken language data have risen dramatically in the past few years, enabling a rise in related scientific and engineering research directions. In this session, we'll discuss the pipeline for processing such large-scale data with a focus on scientific investigations in the form of "corpus phonetics". In particular, we'll discuss transcription approaches, grapheme-to-phoneme (G2P) conversion, and forced alignment systems. I'll also provide an overview of, and a demo with, some G2P systems, as well as the Montreal Forced Aligner.
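The G2P step in such a pipeline maps each word of a transcript to a phone sequence, typically via a pronunciation lexicon, which is exactly the input a forced aligner like the Montreal Forced Aligner consumes. A minimal sketch of dictionary-based lookup follows; the two-word lexicon and its ARPAbet-style pronunciations are illustrative examples only.

```python
# Toy pronunciation lexicon (word -> phone sequence), illustrative only.
lexicon = {
    "corpus":    ["K", "AO1", "R", "P", "AH0", "S"],
    "phonetics": ["F", "AH0", "N", "EH1", "T", "IH0", "K", "S"],
}

def g2p(utterance, lexicon):
    """Map a transcript to phones; collect out-of-vocabulary words."""
    phones, oov = [], []
    for word in utterance.lower().split():
        if word in lexicon:
            phones.extend(lexicon[word])
        else:
            # In practice, OOV words are sent to a trained G2P model
            # or added to the lexicon by hand before alignment.
            oov.append(word)
    return phones, oov

phones, oov = g2p("corpus phonetics pipeline", lexicon)
print(oov)  # ['pipeline'] -> needs a G2P model or a manual lexicon entry
```

Trained G2P systems take over exactly where this lookup fails: they generalize pronunciation patterns to words the lexicon has never seen.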

Date: Friday, December 8, 2023, 12:30-13:30
Location: Andreasstrasse 15, 8050 Zurich, 4th floor, room AND 4.55/4.57
Speaker: Eleanor Chodroff