With the rise of task-specific pre-training objectives, abstractive
summarization models like PEGASUS offer appealing zero-shot performance on
downstream summarization tasks. However, the performance of such unsupervised
models still lags significantly behind their supervised counterparts. Similarly
to the supervised setup, we notice very high variance in quality among
summary candidates from these models, yet only one candidate is kept as the
summary output. In this paper, we propose to re-rank summary candidates in an
unsupervised manner, aiming to close the performance gap between unsupervised
and supervised models. Our approach improves the unsupervised PEGASUS by up to
7.27% and ChatGPT by up to 6.86% in relative mean ROUGE across four widely adopted
summarization benchmarks, and achieves relative gains of 7.51% (up to 23.73%
from XSum to WikiHow) averaged over 30 zero-shot transfer setups (fine-tuning on
one dataset, evaluating on another).
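
To make the idea of unsupervised candidate re-ranking concrete, here is a minimal sketch in Python. It is not the scoring method of this paper, which the abstract does not specify. It assumes, purely for illustration, that each candidate is scored by its unigram F1 overlap with the source document, so no reference summary is needed. The names `unigram_f1` and `rerank` are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate: str, source: str) -> float:
    """Unigram-overlap F1 between a candidate summary and the source text.

    Illustrative stand-in for an unsupervised quality score; it uses only
    the source document, never a reference summary.
    """
    cand_counts = Counter(candidate.lower().split())
    src_counts = Counter(source.lower().split())
    # Multiset intersection: each shared token counted min(cand, src) times.
    overlap = sum((cand_counts & src_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(src_counts.values())
    return 2 * precision * recall / (precision + recall)


def rerank(source: str, candidates: list[str]) -> list[str]:
    """Return the candidates sorted from highest to lowest score."""
    return sorted(candidates, key=lambda c: unigram_f1(c, source), reverse=True)


if __name__ == "__main__":
    source = "the cat sat on the mat"
    candidates = ["a dog barked", "the cat sat"]
    # The first element of the re-ranked list would be kept as the output.
    print(rerank(source, candidates)[0])
```

In practice the candidates would come from a decoding method that yields several outputs per document (for example beam search or sampling), and the top-ranked candidate would replace the single default output.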