SwissDial Dataset
SwissDial is the first annotated parallel corpus of spoken Swiss German across 8 major dialects (AG, BE, BS, GR, LU, SG, VS, ZH). The dataset includes around 3 hours of high-quality audio per dialect, together with Swiss German and High German transcripts. The data is freely available to the research community. It can be used to explore speech synthesis methods and other technologies that deal with Swiss German text, such as dialect identification and machine translation, or to study linguistic properties of the various dialects.
More details can be found in the dataset paper: https://arxiv.org/pdf/2103.11401.pdf
The dataset can be downloaded here: https://form.jotform.com/223344961502048
Samples from the SwissDial dataset can be found here: https://gitlab.inf.ethz.ch/ou-mtc-public/swiss-dial-samples