Article

How reliable are manual dialect transcriptions? Evaluating the GCND

Abstract

This paper investigates the reliability of manual dialect transcriptions, focusing on the potential impact of transcription errors on variationist linguistic research. It argues that such errors are particularly problematic when they are unevenly distributed across corpus components, as this may produce spurious regional or register effects. The study then presents an error analysis of the Spoken Corpus of (Southern) Dutch Dialects (GCND), first released in October 2024 (Breitbarth et al. 2024). Thirteen variationist linguists each reviewed one transcription in their native dialect. Their reviews indicate that, despite strict quality control and a detailed transcription protocol, the GCND transcriptions are not error-free. Still, the overall word error rates (WER) are low, at 1.29% for the lightly standardized layer and 0.73% for the heavily standardized layer, indicating high reliability. Most errors concern function words, which were frequently substituted and occasionally inserted or omitted. The lightly standardized layer also showed excessive Dutchification, with dialect-specific forms often wrongly replaced by standard variants. Error rates varied modestly across transcriptions (WER 0.27–3.17%), largely because of differences in reviewer strictness and protocol familiarity rather than regional bias. Transparent reporting of such patterns can help future corpus users interpret the data with appropriate caution.
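For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the minimum number of word substitutions, deletions and insertions needed to turn the transcription into the corrected version, divided by the number of words in the corrected version. It assumes this standard definition and simple whitespace tokenization; the abstract does not specify the exact procedure used for the GCND, and the function name and example tokens are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    `reference` is the corrected (reviewed) text, `hypothesis` the original
    transcription. Whitespace tokenization is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word missing in hypothesis
                curr[j - 1] + 1,                  # extra word in hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted function word out of four words.
print(word_error_rate("w1 w2 het w4", "w1 w2 dat w4"))  # 0.25
```

Under this definition, a WER of 1.29% corresponds to roughly 13 word-level corrections per 1,000 words of corrected text.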

How to Cite: Ghyselen, A., Deklerck, C., Farasyn, M., Colleman, T., Hellebaut, L. & Breitbarth, A. (2025) “Hoe betrouwbaar zijn manuele dialecttranscripties? Het GCND geëvalueerd”, Handelingen van de Koninklijke Commissie voor Toponymie en Dialectologie, 96(1). doi: https://doi.org/10.21825/hctd.94156