Open source

At the MTC, we strongly believe in open innovation, which is why we release our code and software to the public under open-source licenses. Our software is typically free to use for both commercial and non-commercial purposes. Through this commitment to open source, we have taken a first step toward successful technology transfer from academia to industry.

Machine Learning Frameworks / Models

We regularly release the code for the models and ML frameworks we build as part of our projects at the MTC.

Artificial but Natural Voices


A text-to-speech framework for Swiss voice generation. Find the repository on GitHub

AvatarForge

Generate your own deepfake avatar videos. Find the repository on GitHub

Journalistic Portfolio Analysis

An LLM-based library for extracting debate arguments from news articles. Find the repository on GitHub

Low-resource Multi-document Summarization

Entropy-based sampling approaches for abstractive multi-document summarization in low-resource settings. Find the repository on GitHub

Guided Single-document Summarization

An implementation of the mBART model with input guidance. Find the repository on GitHub

Interactive Post-editing Framework

An open-source translation framework for interactive post-editing research. Find the repository on GitHub

Aesthetify: Aesthetic Assessment Tool

An AI-powered aesthetic media retrieval system. Find out more

Federated Neural Collaborative Filtering

A federated learning approach to neural collaborative filtering of news articles. Find the repository on GitHub

A Simple Federated Learning Framework for a Small Number of Stakeholders

Our framework allows a small number of stakeholders to jointly train various machine learning models in a federated way. Find the repository on GitHub

Datasets

Machine learning is driven by data, so most ML projects start with the hunt for high-quality datasets. We are committed to sharing the valuable datasets we collect for research purposes.

Artificial but Natural Voices Audio Processing


A preprocessing pipeline for preparing data collections for text-to-speech training. Find the repository on GitHub

Absinth Dataset

A manually annotated dataset for hallucination detection in German news summarization. Find out more on GitHub

Multi-GeNews Dataset

An evaluation dataset of German news articles for abstractive multi-document summarization. Find out more on GitHub

Reddit Photo Critique Dataset (RPCD)

A dataset for aesthetic assessment that contains tuples of images and their photo critiques. Find out more

CHeeSE Dataset

A collection of manually annotated Swiss news articles in German. Each pair of a news article and a debate question is annotated with the article's stance towards the question, the emotion of the article, and the emotion of each individual paragraph. Find out more

SwissDial Dataset

An annotated parallel corpus of spoken Swiss German covering 8 major dialects (AG, BE, BS, GR, LU, SG, VS, ZH). The dataset includes around 3 hours of high-quality audio per dialect, together with Swiss German and High German transcripts. Find out more

Data collection tools

At the MTC, we collect data for various projects. As part of these efforts, we have developed several simple-to-use data collection tools that we would like to share.

Online Text Labelling

As part of our emotion and stance project, we developed a streamlined web app for collecting text labels. Although it was built for annotating news articles, it can easily be adapted to other use cases. Find out more

Tournament-style image rating tool

We developed a simple-to-use web tool, powered by a Swiss-system tournament, for collecting image rankings from users. Relative user preferences (one image over another) are used to build a global ranking. Find out more
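To illustrate how a Swiss-system tournament can turn pairwise preferences into a global ranking, here is a minimal sketch. It is not the MTC tool's actual implementation: the function names `swiss_round_pairs` and `run_tournament` are hypothetical, the user's choice is simulated by a `prefer` callback, and the global ranking is assumed to be a plain win count. Each round pairs images with similar current scores, avoiding repeat match-ups where possible.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def swiss_round_pairs(
    scores: Dict[str, int], played: Set[FrozenSet[str]]
) -> List[Tuple[str, str]]:
    """Pair images with similar scores, avoiding rematches where possible."""
    # Sort by current score (highest first); ties are broken by id so runs are deterministic.
    pool = sorted(scores, key=lambda img: (-scores[img], img))
    pairs: List[Tuple[str, str]] = []
    while len(pool) >= 2:
        first = pool.pop(0)
        # Take the closest-ranked opponent not met yet; if all were met, take the next one.
        partner_idx = next(
            (i for i, other in enumerate(pool) if frozenset((first, other)) not in played),
            0,
        )
        pairs.append((first, pool.pop(partner_idx)))
    # With an odd number of images, the last one sits out this round.
    return pairs


def run_tournament(
    images: List[str], prefer: Callable[[str, str], bool], rounds: int
) -> List[str]:
    """Run a Swiss-system tournament and return images ranked by number of wins."""
    scores = {img: 0 for img in images}
    played: Set[FrozenSet[str]] = set()
    for _ in range(rounds):
        for a, b in swiss_round_pairs(scores, played):
            played.add(frozenset((a, b)))
            # `prefer(a, b)` stands in for a user's pairwise choice: True means a beats b.
            winner = a if prefer(a, b) else b
            scores[winner] += 1
    # The global ranking aggregates the relative preferences as win counts.
    return sorted(images, key=lambda img: (-scores[img], img))


# Simulated users who always prefer the image with higher hidden quality (hypothetical data).
quality = {"A": 4, "B": 3, "C": 2, "D": 1}
ranking = run_tournament(list(quality), lambda a, b: quality[a] > quality[b], rounds=2)
print(ranking)  # ['A', 'B', 'C', 'D']
```

Pairing images with similar scores concentrates comparisons where the ranking is still uncertain, so far fewer comparisons are needed than with a full round-robin. A real tool might replace the win count with a rating model such as Elo or Bradley-Terry.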

Swiss German Data Collection Tool

We collaborated with Fachhochschule Nordwestschweiz (FHNW) to add gamification features to their voice collection platform. Find out more