Bird's eye view (BEV) representation is a recent perception formulation for
autonomous driving that is based on spatial fusion. Temporal fusion has also
been introduced into BEV representation with great success. In this work,
we propose a new method that merges spatial and temporal fusion into a
unified mathematical formulation. The unified fusion not only provides a new
perspective on BEV fusion but also brings new capabilities. With the proposed
unified spatial-temporal fusion, our method can support long-range fusion,
which is hard to achieve in conventional BEV methods. Moreover, the BEV fusion
in our work is temporal-adaptive, and the
weights of temporal fusion are learnable. In contrast, conventional methods
mainly use fixed and equal weights for temporal fusion. In addition, the
proposed unified fusion avoids the information loss of conventional BEV fusion
methods and makes full use of features. Extensive experiments and ablation
studies on the nuScenes dataset show the effectiveness of the proposed method,
which achieves state-of-the-art performance on the map segmentation task.
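
The contrast between fixed, equal temporal weights and learnable ones can be illustrated with a toy sketch. This is not the method proposed in this work: the helper names (`softmax`, `fuse_frames`), the feature values, and the per-frame scores are all hypothetical, and the frames are assumed to be already aligned to the current ego pose. It only shows how a weighted sum over past frames behaves when the weights are uniform versus when they come from a softmax over scores that a network could learn.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frames(frames, weights):
    """Weighted sum of per-frame feature vectors for one BEV grid cell.

    Assumes every frame has already been warped into the current frame.
    """
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


# Toy features for a single BEV cell, ordered oldest to newest (made-up values).
frames = [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]]

# Conventional setting described above: fixed and equal weights.
equal_weights = [1.0 / len(frames)] * len(frames)
equal_fused = fuse_frames(frames, equal_weights)

# Adaptive setting: weights derived from per-frame scores.
# In a real model these scores would be learned; here they are hand-picked.
scores = [0.0, 1.0, 2.0]
adaptive_weights = softmax(scores)
adaptive_fused = fuse_frames(frames, adaptive_weights)

print("equal:   ", equal_fused)
print("adaptive:", adaptive_fused)
```

With these hand-picked scores the newest frame receives the largest weight, so the adaptive result is pulled toward the most recent features, whereas the equal-weight result is the plain average of the three frames.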