SwissDial Dataset

<sup>Photo by eberhard grossgasteiger on Unsplash</sup>

SwissDial is the first annotated parallel corpus of spoken Swiss German across 8 major dialects (AG, BE, BS, GR, LU, SG, VS, ZH). The dataset includes around 3 hours of high-quality audio per dialect, together with Swiss German and High German transcripts. The data is freely available to the research community and can be used to explore speech synthesis methods and other technologies that deal with Swiss German text, such as dialect identification and machine translation, or to study linguistic properties of the various dialects.

More details can be found in the dataset paper: https://arxiv.org/pdf/2103.11401.pdf

The dataset can be downloaded here: https://form.jotform.com/223344961502048

Samples from the SwissDial dataset can be found here: https://gitlab.inf.ethz.ch/ou-mtc-public/swiss-dial-samples
