Text-to-Text Translation for Swiss Voice Assistant

Abstract

Building a Swiss German voice assistant with comprehension skills is very challenging, since Swiss German is a spoken dialect with no standardized writing and many different sub-dialects. Reusing the comprehension skills of a German voice assistant requires text-to-text translation between High German and Swiss German. The data used to build a simple dictionary stems from scraped websites, the ArchiMob corpus and transcribed SRF news reports. To gain a good understanding of the available data, we compute statistics and analyse the data. Simple word-based translation using the dictionary serves as a baseline for the more advanced approaches that follow. As a first step, we designed an optimized training and evaluation environment for sequence-to-sequence experiments. Our method, attention-based sequence-to-sequence translation, is a state-of-the-art translation method that is also deployed by large companies such as Google. In this paper, we explore and discuss word-to-word, word-to-letter and letter-to-letter sequence-to-sequence translation from German to the Zurich dialect. Letter-to-letter translation delivered promising results on the sparse data. Finally, we discuss possible enhancements of the model and solutions to the problem of high variation in the writing, to further increase performance.
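The word-based dictionary baseline and the difference between word-level and letter-level sequences can be illustrated with a minimal sketch. This is not the thesis implementation: the dictionary entries, the function names `translate_word_based` and `to_letters`, and the `<space>` marker are assumptions made for illustration only, and the toy entries are not taken from the scraped data, the ArchiMob corpus or the SRF transcripts.

```python
import re

# Hypothetical toy dictionary (High German -> Zurich dialect).
# Illustrative entries only; not drawn from the thesis data.
TOY_DICT = {
    "das": "das",
    "ist": "isch",
    "nicht": "nöd",
    "haus": "huus",
}


def translate_word_based(sentence, dictionary):
    """Translate word by word; words missing from the dictionary are copied unchanged."""
    # Lowercase, then split into words and standalone punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return " ".join(dictionary.get(tok, tok) for tok in tokens)


def to_letters(sentence):
    """Turn a sentence into a letter-level sequence with an explicit space marker."""
    return ["<space>" if ch == " " else ch for ch in sentence]


print(translate_word_based("Das ist nicht mein Haus", TOY_DICT))
# das isch nöd mein huus
print(to_letters("isch nöd"))
```

In this sketch the unknown word "mein" passes through untranslated, which shows why a pure dictionary lookup suffers when data is sparse. A letter-level sequence, by contrast, draws on a small, closed set of symbols, so a model trained on it never meets an out-of-vocabulary token.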


Felix Lunzenfichter

Bachelor's Thesis

Status:

Completed
