Members of LiRI and NCCR Evolving Language organize the Friday Lunchtime Talks event series. The talks planned for the spring semester of 2024 will cover topics such as low-resource languages, visual algorithms, and spatial data science in linguistics. If you are interested in participating, please follow the link to the planning document on SWITCHdrive: https://drive.switch.ch/index.php/s/r8MlFwrs352KtOV
This lunchtime talk will take the form of an informal experimental design clinic, going over various considerations and tips for language scientists gathering behavioural and physiological data. A form for topic suggestions and questions is available here.
Data collection is the experimental ‘point of no return’: once data have been collected, we can only analyse what is present in the data files and cannot go back and interpolate missing information that we want for a subsequent analytical model. Biases and confounding effects may be inseparable from the effects we want to observe, potentially producing null results that cannot be mitigated and costing time, money, and peace of mind. These pitfalls and obstacles are created during the initial design of the experiment, yet they often only become visible during analysis and frequently lack an intuitive cause and effect (e.g. the format in which you save your data may prevent certain post-hoc comparisons from being made).

During this session we will go over some of these potential pitfalls and discuss ways of avoiding or mitigating them, with a particular focus on cognitive and psychoacoustic considerations in experimental design and stimulus presentation that may lead to analysis problems down the line. The session will be part presentation, part open-floor clinic, so that we can grapple with the ideas in a more tangible manner.

Questions and contributions towards the content of the meeting are warmly welcomed: if you have any concerns, current obstacles, or examples of past problems caused during the design stage, please submit them to me by Monday and I will do my best to include them in the open-floor discussion. I will be including examples of the times I have painted myself into a corner, so please don’t worry about harsh critiques of your work; the aim is to be constructive and use shared experience to avoid repeating mistakes as a group.
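As one concrete illustration of the data-format pitfall mentioned above: saving only per-condition averages makes item-level post-hoc comparisons (for example, mixed-effects models with by-stimulus random effects) impossible after the fact, whereas logging one row per trial keeps them available. The sketch below is a minimal, hypothetical Python example; the column names and file path are illustrative assumptions, not part of any specific experiment.

```python
# Minimal sketch (assumed columns and file name): log one row per trial
# in long format instead of saving only per-condition averages.
import csv
import os

FIELDS = ["participant", "condition", "stimulus_id", "rt_ms", "accurate"]

def log_trial(path, **trial):
    """Append a single trial to a long-format CSV, writing a header if the file is new."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(trial)

# Example call with illustrative values.
log_trial("results.csv", participant="P01", condition="noise",
          stimulus_id="stim_042", rt_ms=731.5, accurate=True)
```

Appending trial by trial also means that a crash mid-session loses at most the current trial rather than an entire block of aggregated results.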
Abstract:
In this talk, I will present our work on the application of large pre-trained self-supervised models to different speech processing tasks: low-resource automatic speech recognition, spoken language understanding, and language identification. The success of wav2vec 2.0-style self-supervised pre-training paved the way for rapid training of automatic speech recognition systems, and was later extended to other speech processing tasks such as speaker recognition and language recognition. Our work combines the success of hybrid ASR (so-called HMM/DNN approaches) with pre-trained audio encoders to leverage the best of both systems: from using Lattice-Free Maximum Mutual Information (LF-MMI) as the cost function for acoustic model fine-tuning to adapters for parameter-efficient training.
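To make the adapter idea mentioned above more tangible, the sketch below shows a generic bottleneck adapter applied to the hidden states of a frozen pre-trained encoder. This is a rough illustration of parameter-efficient training in PyTorch under assumed dimensions and module names; it is not the specific architecture used in this work.

```python
# Minimal sketch of a bottleneck adapter for parameter-efficient fine-tuning.
# The hidden size (768) and bottleneck size (64) are illustrative assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, non-linearity, up-project, then residual add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# During fine-tuning, the pre-trained encoder weights would stay frozen and
# only the small adapter (and task head) parameters are updated.
encoder_out = torch.randn(2, 100, 768)   # (batch, frames, hidden)
adapter = Adapter(dim=768)
print(adapter(encoder_out).shape)        # torch.Size([2, 100, 768])
```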
With effective ASR training methods in place, the current focus of research and development on spoken document processing has shifted towards downstream tasks such as intent detection, slot filling, information retrieval, and dialogue structure discovery. In our work, we compare different approaches for combining multiple hypotheses from ASR, as opposed to using only the one-best hypothesis.
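To illustrate the contrast between one-best and multiple-hypothesis processing, here is a hedged sketch of one simple combination strategy: posterior-weighted voting over an ASR n-best list for intent detection. The classifier and the example n-best list are hypothetical placeholders, not the methods compared in the talk.

```python
# Hypothetical sketch: combine an ASR n-best list by posterior-weighted voting
# over intent labels, instead of classifying only the one-best hypothesis.
from collections import defaultdict

def classify_intent(text: str) -> str:
    """Placeholder intent classifier; in practice this would be a trained model."""
    return "set_alarm" if "alarm" in text else "other"

def intent_from_nbest(nbest):
    """`nbest` is a list of (hypothesis_text, posterior_score) pairs."""
    scores = defaultdict(float)
    for text, posterior in nbest:
        scores[classify_intent(text)] += posterior
    return max(scores, key=scores.get)

# Illustrative n-best list with assumed posteriors.
nbest = [("set an alarm for seven", 0.6),
         ("set an alarm for eleven", 0.3),
         ("sit on a arm for seven", 0.1)]
print(intent_from_nbest(nbest))  # set_alarm
```

The point of weighting by posterior is that a correct intent can still win even when the single best transcription is misrecognised, as long as enough probability mass in the n-best list supports it.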