The Background Tool

Duration

12 months

Status

Completed

MTC Team

Dr. Laura Mascarell, Ribin Chalumattu, Dr. Fabio Zünd, Prof. Ryan Cotterell

Collaborators

Christian Vogg (SRF), Florian Notter (SRF), Milena Djordjevic (TX Group), Dr. Dominic Herzog (TX Group), Dr. Tatyana Ruzsics (NZZ)



Short online news articles and news shared on social media platforms often lack crucial context, which can lead to a fragmented perception of reality. This contrasts sharply with the value proposition of the MTC industry partners, who provide relevant background information on almost any topic so that their audience can form informed opinions. This research project tackles abstractive multi-document summarization (MDS) of German text, with the aim of providing an abridged version of topic-related German news articles. Because far less MDS data exists for German than for English, the project focuses on low-resource approaches to MDS.

Goal

This project focuses on multi-document summarization of German news articles. The goal is to research low-resource approaches suited to languages such as German, for which MDS data is limited or unavailable.

Outcomes

The Multi-GeNews Dataset

We built a German MDS test set in the news domain, which allowed us to evaluate the performance of abstractive MDS approaches. The dataset consists of 754 Swiss news articles organised into 402 groups of related articles. It is available to the research community to foster further research on German abstractive MDS.

Dataset Download
Dataset Paper

Research on Low-resource MDS

We performed research on low-resource MDS and implemented several entropy-based sampling approaches. The main advantages of these approaches are that they do not require MDS training data and can take all source articles into account when generating the summary. We released our code to the research community so that researchers can use it as a benchmark in later work on low-resource abstractive MDS.

Public Open-Source Repository
Research Paper
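
To give a rough idea of what entropy-based selection across source articles can look like, the following Python sketch picks, at a single decoding step, the next-token distribution with the lowest Shannon entropy among the distributions predicted for each article, and then takes its most probable token. This is only an illustration under stated assumptions, not the project's released implementation: the function names, the greedy token choice, and the toy probabilities are all hypothetical, and in practice the distributions would come from a single-document summarization model.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_confident_source(distributions):
    """Index of the distribution with the lowest entropy.

    `distributions` holds one next-token distribution per source article
    (hypothetical input; a real system would obtain these from a model).
    """
    entropies = [shannon_entropy(d) for d in distributions]
    return min(range(len(distributions)), key=entropies.__getitem__)


def next_token(distributions, vocab):
    """Pick the next token from the lowest-entropy distribution.

    Greedy argmax is an assumption made for brevity; sampling from the
    selected distribution would be an equally valid illustration.
    """
    source = most_confident_source(distributions)
    dist = distributions[source]
    token_index = max(range(len(dist)), key=dist.__getitem__)
    return vocab[token_index], source


# Toy example: two source articles, three-word vocabulary (invented values).
vocab = ["der", "Bundesrat", "heute"]
per_article = [
    [0.40, 0.35, 0.25],  # article 0: uncertain prediction, high entropy
    [0.05, 0.90, 0.05],  # article 1: confident prediction, low entropy
]
token, source = next_token(per_article, vocab)
print(token, source)
```

Because the choice is made per decoding step from distributions computed on each article separately, a scheme of this kind needs no MDS training data and still lets every source article influence the summary.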
 


Additional Project Resources

Resources for Industry Partners

Additional project resources for our industry partners are only available to registered users and can be found here.
