Click here to flash read.
Using a vocabulary that is shared across languages is common practice in
Multilingual Neural Machine Translation (MNMT). In addition to its simple
design, shared tokens play an important role in positive knowledge transfer,
assuming that shared tokens refer to similar meanings across languages.
However, when word overlap is small, especially due to different writing
systems, transfer is inhibited. In this paper, we define word-level information
transfer pathways via word equivalence classes and rely on graph networks to
fuse word embeddings across languages. Our experiments demonstrate the
advantages of our approach: 1) embeddings of words with similar meanings are
better aligned across languages, 2) our method achieves consistent BLEU
improvements of up to 2.3 points for high- and low-resource MNMT, and 3) less
than 1.0\% additional trainable parameters are required with a limited increase
in computational costs, while inference time remains identical to the baseline.
We release the codebase to the community.
No creative common's license