Change detection plays a fundamental role in Earth observation for analyzing
temporal changes in a scene. However, recent studies have largely neglected
multimodal data, which offers significant practical and technical advantages
over single-modal approaches. This research focuses
on leveraging digital surface model (DSM) data and aerial images captured at
different times for detecting change beyond 2D. We observe that current
change detection methods struggle with multitask conflicts between the
semantic and height change detection tasks. To address this challenge, we propose an
efficient Transformer-based network that learns shared representation between
cross-dimensional inputs through cross-attention. It adopts a consistency
constraint to establish the multimodal relationship, which involves obtaining
pseudo change through height change thresholding and minimizing the difference
between semantic and pseudo change within their overlapping regions. A
DSM-to-image multimodal dataset encompassing three cities in the Netherlands
was constructed. It lays a new foundation for beyond-2D change detection from
cross-dimensional inputs. Compared to five state-of-the-art change detection
methods, our model demonstrates consistent multitask superiority in terms of
semantic and height change detection. Furthermore, the consistency strategy can
be seamlessly adapted to other methods, yielding promising improvements.
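The consistency constraint described above can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold `tau`, the 0.5 binarization cutoff, the L1 difference, and the definition of "overlapping regions" as pixels both maps mark as changed are all assumptions made here for illustration.

```python
import numpy as np

def pseudo_change(height_change, tau=1.0):
    """Binarize a height-change map: pixels whose absolute height
    difference exceeds the threshold tau (an assumed hyperparameter)
    are treated as pseudo change."""
    return (np.abs(height_change) > tau).astype(np.float32)

def consistency_loss(sem_prob, height_change, tau=1.0, eps=1e-8):
    """Mean absolute difference between the predicted semantic change
    probability and the thresholded pseudo change, restricted to their
    overlap (pixels where both maps indicate change)."""
    pseudo = pseudo_change(height_change, tau)
    sem_bin = (sem_prob > 0.5).astype(np.float32)
    overlap = sem_bin * pseudo                 # overlapping regions
    diff = np.abs(sem_prob - pseudo) * overlap
    return diff.sum() / (overlap.sum() + eps)  # masked mean
```

Minimizing this term pulls the semantic change prediction toward agreement with the height-derived pseudo change wherever the two tasks agree that change occurred, which is one plausible way to couple the two outputs.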