Learning discriminative task-specific features simultaneously for multiple
distinct tasks is a fundamental problem in multi-task learning. Recent
state-of-the-art models consider directly decoding task-specific features from
one shared task-generic feature (e.g., feature from a backbone layer), and
utilize carefully designed decoders to produce multi-task features. However,
because the input feature is fully shared and each task decoder applies the same
decoding parameters to every input sample, the feature decoding process is
static and produces less discriminative task-specific representations. To tackle
this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts
model that enables learning multiple representative task-generic feature spaces
and decoding task-specific features in a dynamic manner. Specifically,
TaskExpert introduces a set of expert networks to decompose the backbone
feature into several representative task-generic features. Then, the
task-specific features are decoded by using dynamic task-specific gating
networks operating on the decomposed task-generic features. Furthermore, to
establish long-range modeling of the task-specific representations from
different layers of TaskExpert, we design a multi-task feature memory that
updates at each layer and acts as an additional feature expert for dynamic
task-specific feature decoding. Extensive experiments demonstrate that our
TaskExpert clearly outperforms previous best-performing methods on all 9
metrics of two competitive multi-task learning benchmarks for visual scene
understanding (i.e., PASCAL-Context and NYUD-v2). Code and models will be made
publicly available at https://github.com/prismformore/Multi-Task-Transformer
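
The dynamic decoding described above can be illustrated with a minimal sketch in plain Python. It is not the released implementation: the linear experts, the softmax gates, the vector-valued features, the way the memory is refreshed, and all names (decode_task_features, random_matrix, the task labels) are assumptions made for illustration only. Each expert maps the shared backbone feature to one task-generic feature, and each task's gate computes sample-dependent weights over those features plus that task's memory entry, which is treated as one extra expert.

```python
import math
import random


def linear(weights, x):
    """Apply a bias-free dense layer; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def decode_task_features(x, experts, gates, memory):
    """Decode one feature per task from a single backbone feature x.

    experts: list of (dim x dim) matrices, one per expert.
    gates:   {task: (num_experts + 1) x dim matrix}, one gate per task.
    memory:  {task: vector of length dim}, used as an extra expert.
    """
    # Decompose the shared feature into several task-generic features.
    generic = [linear(w, x) for w in experts]
    features, weights = {}, {}
    for task, gate in gates.items():
        pool = generic + [memory[task]]
        # Gate weights depend on x, so the mixture changes per sample.
        w = softmax(linear(gate, x))
        features[task] = [
            sum(wi * f[d] for wi, f in zip(w, pool)) for d in range(len(x))
        ]
        weights[task] = w
    return features, weights


rng = random.Random(0)
dim, num_experts = 4, 3
tasks = ["segmentation", "depth"]  # illustrative task names
experts = [random_matrix(rng, dim, dim) for _ in range(num_experts)]
gates = {t: random_matrix(rng, num_experts + 1, dim) for t in tasks}
memory = {t: [0.0] * dim for t in tasks}  # assumed empty before the first layer

sample = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
features, weights = decode_task_features(sample, experts, gates, memory)

# Assumed update rule: the decoded features become the memory for the next layer.
memory = features
other_sample = [-v for v in sample]
_, other_weights = decode_task_features(other_sample, experts, gates, memory)
```

In this sketch the gate weights for `sample` and `other_sample` differ, which is the sense in which the decoding is dynamic rather than static: the parameters are shared across samples, but the mixture over experts is recomputed for each input.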