Open source
At the MTC, we strongly believe in open innovation. Therefore we make our code and software available under open-source licenses to the public. Our software is typically available for commercial and non-commercial purposes at no additional cost. With our commitment to open source, we established the first step for successful technology transfers from academia to industry.
Machine Learning Frameworks / Models
We regularly release the code to any models or ML frameworks that we built as part of our projects at the MTC.
Artificial but Natural Voices
A Text-to-speech framework for Swiss voice generation. external page Find the repository on GitHub
AvatarForge
Generate your own Deepfake Avatar video. external page Find the repository on GitHub
Journalistic Portfolio Analysis
An LLM-based library for extracting debate arguments from news articles. external page Find the repository on GitHub
Low-resource Multi-document Summarization
Entropy-based sampling approaches for abstractive multi-document summarization in low-resource settings. external page Find the repository on GitHub
Guided Single-document Summarization
Implementation of the mBART model with input guidance. external page Find the repository on GitHub
Interactive Post-editing Framework
An open-source translation framework for interactive post-editing research. external page Find the repository on GitHub
Aesthetify: Aesthetic Assessment Tool
An AI-powered aesthetic media retrieval system. Find out more
Federated Neural Collaborative Filtering
A federated learning approach to neural collaborative filtering of news articles. external page Find the repository on Github
A Simple Federated Learning Framework for Small Number of Stakeholders
Our framework allows a small number of stakeholders to train various machine learning models in a federated way. external page Find the repository on Github
Datasets
Machine learning is driven by data. So most ML projects start with the hunt for quality datasets. We are committed to provide valuable datasets that we collect for research purposes.
Artificial but Natural Voices Audio Processing
A preprocessing pipeline for preparing data collections for text-to-speech training. external page Find the repository on GitHub
Absinth Dataset
A manually annotated dataset for hallucination detection in German news summarization. external page Find out more on GitHub
Multi-GeNews Dataset
An evaluation dataset of German news articles for abstractive multi-document summarization. external page Find out more on GitHub
Reddit Photo Critique Dataset (RPCD)
A dataset for aesthetic assessment that contains tuples of image and photo critiques. Find out more
CHeeSE Dataset
A collection of manually annotated Swiss news articles in German, where each pair of news articles and debate questions is annotated with the stance of the article towards the question, the article emotion, and the emotion of each individual paragraph. Find out more
SwissDial Dataset
An annotated parallel corpus of spoken Swiss German across 8 major dialects (AG, BE, BS, GR, LU, SG, VS, ZH). The dataset includes around 3 hours of high quality audio per dialect together with Swiss German and High German transcripts. Find out more
Data collection tools
At the MTC we are collecting data for various projects. As part of these efforts some simple-to-use data collection tools have been developed that we would like to share.
Online Text Labelling
As part of our emotion & stance project we developed a streamlined web app to collect text labels. The tool was developed for news article annotation but can be easily adapted for different use cases. Find out more
Tournament-style image rating tool
We developed a simple-to-use web tool to collect image rankings from users powered by a Swiss tournament style system. Relative user preference (one image over the other) are used to build a global ranking. Find out more
Swiss German Data Collection Tool
We collaborated with Fachhochschule Nordwestschweiz (FHNW) to add gamification features to their voice collection platform. Find out more