The Background Tool

Short online news articles and news distributed on social media platforms often lack crucial context and can lead to a fragmented perception of reality. This contrasts sharply with the value proposition of MTC industry partners, who provide relevant background information on almost any topic so that their audience can form informed opinions. This research project tackles abstractive multi-document summarization (MDS) of German text in order to provide abridged versions of topic-related German news articles. Because far less MDS data is available for German than for English, the project focuses on low-resource approaches to MDS.
Goal
This project focuses on multi-document summarization of German news articles. The goal is to research low-resource approaches that are well suited to languages such as German, for which MDS data is limited or unavailable.
Outcomes
The Multi-GeNews Dataset
We built a German MDS test set in the news domain that allows us to evaluate approaches to abstractive MDS. The dataset consists of 754 Swiss news articles organised into 402 groups of related articles. It is available to the research community to foster further research in German abstractive MDS.
Dataset Download
Dataset Paper
Research on Low-resource MDS
We performed research on low-resource MDS and implemented different entropy-based sampling approaches for MDS. The main advantages of these approaches are that they do not require MDS training data and that they can consider all source articles when generating the summary. We released our code to the research community so that researchers can use it as a benchmark in later work on low-resource abstractive MDS.
Public Open-Source Repository
Research Paper
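To give a rough intuition for what an entropy-based approach to MDS might look like, the sketch below shows one possible reading and is not taken from the project's released code. It assumes that a single-document summarization model yields one next-token probability distribution per source article at each decoding step, and that the distribution with the lowest entropy (the most "confident" article) determines the next token. For simplicity it picks the most probable token rather than sampling. The function names and the toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a distribution given as {token: probability}."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def pick_min_entropy(per_doc_probs):
    """Select the next token from the source article with the lowest-entropy distribution.

    per_doc_probs: list with one {token: probability} dict per source article
                   (assumed input format, for illustration only).
    Returns (article_index, token).
    """
    best = min(range(len(per_doc_probs)), key=lambda i: entropy(per_doc_probs[i]))
    dist = per_doc_probs[best]
    # Greedy choice; a sampling variant would draw from `dist` instead.
    token = max(dist, key=dist.get)
    return best, token


# Toy decoding step with two source articles (made-up probabilities).
step = [
    {"Bern": 0.4, "Zürich": 0.3, "Genf": 0.3},    # uncertain article
    {"Bern": 0.9, "Zürich": 0.05, "Genf": 0.05},  # confident article
]
article, token = pick_min_entropy(step)
print(article, token)
```

Because every article contributes a distribution at every step, a scheme of this kind can take all source articles into account without needing any MDS training data, which matches the advantages described above.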
Additional Project Resources
Resources for Industry Partners
Additional project resources for our industry partners are only available to registered users and can be found here.