In this paper, we propose a theoretical framework to explain the efficacy of
prompt learning in zero/few-shot scenarios. First, we prove that the
conventional pre-training and fine-tuning paradigm fails in few-shot scenarios
because it overfits the unrepresentative labelled data. We then detail the
assumption that prompt learning is more effective because it empowers the
pre-trained language model, which is built upon massive text corpora, as well
as domain-related human knowledge, to participate more in prediction, thereby
reducing the impact of the limited label information provided by the small
training set. We further
hypothesize that language discrepancy can measure the quality of prompting.
Comprehensive experiments are performed to verify our assumptions. More
remarkably, inspired by the theoretical framework, we propose an
annotation-agnostic template selection method based on perplexity, which
enables us to "forecast" the prompting performance in advance. This approach
is especially encouraging because existing work still relies on a development
set to evaluate templates post hoc. Experiments show that this method leads to
significant prediction benefits compared to state-of-the-art zero-shot methods.
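
As a rough illustration of annotation-agnostic template selection, the sketch below ranks candidate templates by their mean perplexity over unlabelled inputs. It is not the paper's implementation. It assumes that lower perplexity (smaller language discrepancy) indicates a better template, and it substitutes a toy add-one smoothed unigram model for the pre-trained language model. The names `make_unigram_scorer`, `perplexity`, and `rank_templates`, and the `{text}` placeholder convention, are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A scorer maps a token sequence to per-token log-probabilities.
# In practice this role would be played by a pre-trained language model.
Scorer = Callable[[Sequence[str]], List[float]]


def make_unigram_scorer(corpus: Sequence[str]) -> Scorer:
    """Toy stand-in for a pre-trained LM: add-one smoothed unigram model."""
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen tokens

    def score(tokens: Sequence[str]) -> List[float]:
        return [math.log((counts[t] + 1) / (total + vocab)) for t in tokens]

    return score


def perplexity(text: str, scorer: Scorer) -> float:
    """exp of the negative mean token log-probability (whitespace tokens)."""
    tokens = text.lower().split()
    log_probs = scorer(tokens)
    return math.exp(-sum(log_probs) / len(log_probs))


def rank_templates(
    templates: Sequence[str], inputs: Sequence[str], scorer: Scorer
) -> List[Tuple[float, str]]:
    """Return (mean perplexity, template) pairs, lowest perplexity first.

    Only unlabelled inputs are used; no development set is consulted.
    """
    scored = []
    for template in templates:
        ppls = [perplexity(template.format(text=x), scorer) for x in inputs]
        scored.append((sum(ppls) / len(ppls), template))
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    scorer = make_unigram_scorer(
        ["the movie was great", "the film was terrible", "it was a good movie"]
    )
    candidates = ["{text} it was [MASK]", "{text} zxqv blorp [MASK]"]
    unlabelled = ["the movie was long", "a good film"]
    for ppl, template in rank_templates(candidates, unlabelled, scorer):
        print(f"{ppl:8.2f}  {template}")
```

Because the scorer is passed in as a function, the unigram model could be swapped for token log-probabilities from a real pre-trained model without changing the ranking logic.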