Conformer-based models have become the dominant end-to-end architecture
for speech processing tasks. In this work, we propose a carefully redesigned
Conformer with a new downsampling schema. The proposed model, named Fast
Conformer, is 2.8x faster than the original Conformer while preserving
state-of-the-art accuracy on Automatic Speech Recognition benchmarks. We also
replace the original Conformer's global attention with limited-context attention
post-training to enable transcription of hour-long audio. We further improve
long-form speech transcription by adding a global token. Fast Conformer
combined with a Transformer decoder also outperforms the original Conformer in
accuracy and in speed for Speech Translation and Spoken Language Understanding.
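
To illustrate the post-training switch from global to limited-context attention and the added global token, the following is a minimal sketch, not the paper's implementation: the function name limited_context_attention, the left_context/right_context window sizes, and the choice of frame 0 as the global token are illustrative assumptions. For clarity the sketch materializes the full attention matrix and masks it; an efficient long-form implementation would compute only the banded portion.

    import torch

    def limited_context_attention(q, k, v, left_context=128, right_context=128,
                                  use_global_token=False):
        # q, k, v: (batch, time, dim). Each frame attends only to frames within
        # [t - left_context, t + right_context]. If use_global_token is True,
        # frame 0 (an assumed placement) attends to, and is attended by, all frames.
        b, t, d = q.shape
        scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (b, t, t)

        # Band mask: True where attention is allowed.
        idx = torch.arange(t)
        rel = idx[None, :] - idx[:, None]  # key index minus query index
        mask = (rel >= -left_context) & (rel <= right_context)

        if use_global_token:
            mask[0, :] = True  # global token sees every frame
            mask[:, 0] = True  # every frame sees the global token

        scores = scores.masked_fill(~mask, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        return torch.matmul(attn, v)

    # Example: a 1000-frame sequence with 64-dim features.
    x = torch.randn(2, 1000, 64)
    out = limited_context_attention(x, x, x, use_global_token=True)
    print(out.shape)  # torch.Size([2, 1000, 64])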