Gerold Schneider
8050 Zürich
Campus Oerlikon
Gerold Schneider is Titular Professor of Computational Linguistics and co-coordinator of LiRI's service area "Natural Language Processing". His doctorate addressed large-scale dependency parsing, and his habilitation the use of computational models in corpus linguistics.
His research interests include corpus linguistics, cognitive linguistics, statistical approaches, Digital Humanities, learner language, text mining, automated content analysis and language modeling. He has published over 130 articles on these topics, as well as a book on statistics for linguists.
He also works with NLP methods and hate speech detection for the URPP Digital Religion(s) project. Find out more about Gerold's work on his Google Scholar page or his personal webpage.
Publications
ZORA Publication List
- The Influence of Automatic Speech Recognition on Linguistic Features and Automatic Alzheimer’s Disease Detection from Spontaneous Speech. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the International Conference on Computational Linguistics, Language Resources and Evaluation (pp. 15955–15969). Association for Computational Linguistics. https://aclanthology.org/2024.lrec-main.1386/
- The Visualisation and Evaluation of Semantic and Conceptual Maps. In M. Laitinen & J. Tyrkkö (Eds.), Linguistics across Disciplinary Borders: The March of Data (pp. 67–94). Bloomsbury Publishing. https://doi.org/10.5040/9781350362291.0009
- Native Language Identification Improves Authorship Attribution. 289–296. https://aclanthology.org/2024.icnlsp-1.0
- Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset. 4405–4424. https://doi.org/10.18653/v1/2024.naacl-long.248
- Investigating child language acquisition from a joint perspective: A comparison of traditional and new L1 speakers of English. In M. Schmalz, M. Vida-Mannl, & S. Buschfeld (Eds.), Acquisition and Variation in World Englishes: Bridging Paradigms and Rethinking Approaches (No. 69; pp. 133–157). De Gruyter. https://doi.org/10.1515/9783110733723-007
- Turkish Native Language Identification. 303–307. https://aclanthology.org/2023.icnlsp-1.0.pdf
- Exploring Hybrid Linguistic Features for Turkish Text Readability. 223–232. https://aclanthology.org/2023.icnlsp-1.0.pdf
- The LiRI Corpus Platform. In K. Linden, J. Niemi, & T. Kontino (Eds.), CLARIN Annual Conference Proceedings (pp. 145–149). CLARIN ERIC. http://hdl.handle.net/10138/570996
- “To boldly go where no man has gone before”: how iconic is the Star Trek split infinitive? Linguistics Vanguard, 9, 247–255. https://doi.org/10.1515/lingvan-2022-0168
- Exploring the role of AI in classifying, analyzing, and generating case reports on assisted suicide cases: feasibility and ethical implications. Frontiers in Artificial Intelligence, 6, 1328865. https://doi.org/10.3389/frai.2023.1328865
- Colloquialisation, compression and democratisation in British parliamentary debates. In M. Korhonen, H. Kotze, & J. Tyrkkö (Eds.), Exploring Language and Society with Big Data: Parliamentary discourse across time and space (pp. 336–372). John Benjamins Publishing. https://doi.org/10.1075/scl.111.12sch
- Swissdox@LiRI – a large database of media articles made accessible to researchers. In K. Linden, J. Niemi, & T. Kontino (Eds.), CLARIN Annual Conference Proceedings (pp. 111–115). CLARIN ERIC. https://helda.helsinki.fi/bitstreams/6aa6e46b-697e-45da-b0f0-d5211d4e78bc/download#page=120
- Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLOS Digital Health, 2, e0000347. https://doi.org/10.1371/journal.pdig.0000347
- Differences in syntactic annotation affect retrieval. International Journal of Corpus Linguistics, 28, 378–406. https://doi.org/10.1075/ijcl.21104.zeh
- Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data. 187–201. https://doi.org/10.18653/v1/2023.woah-1.19
- Detecting and Analysing Learner Difficulties Using a Learner Corpus Without Error Tagging. In K. Harrington & P. Ronan (Eds.), Demystifying Corpus Linguistics for English Language Teaching (pp. 229–257). Palgrave Macmillan. https://doi.org/10.1007/978-3-031-11220-1_12
- Replicable semi-supervised approaches to state-of-the-art stance detection of tweets. Information Processing & Management, 60, 103199. https://doi.org/10.1016/j.ipm.2022.103199
- Do Non-native Speakers Read Differently? Predicting Reading Times with Surprisal and Language Models of Native and Non-native Eye Tracking Data. In B. Busse, N. Dumrukcic, & I. Kleiber (Eds.), Language and Linguistics in a Complex World (pp. 153–188). De Gruyter. https://doi.org/10.1515/9783111017433-008
- Scaling Native Language Identification with Transformer Adapters. 5th International Conference on Natural Language and Speech Processing (ICNLSP), Trento. https://doi.org/10.48550/arXiv.2211.10117
- Complementing Kernel Density Estimation and Topic Modelling to Visualise Political Discourse (J. H. Jantunen et al., Eds.; pp. 12–27). University of Jyväskylä. https://jyx.jyu.fi/handle/123456789/84140