We investigate composed image retrieval with text feedback. Users gradually
locate the target of interest by moving from coarse- to fine-grained feedback.
However, existing methods focus only on the latter, i.e., fine-grained
search, by harnessing positive and negative pairs during training. This
pair-based paradigm considers only the one-to-one distance between a pair of
specific points, which is misaligned with the one-to-many coarse-grained
retrieval process and compromises the recall rate. To fill this
gap, we introduce a unified learning approach that simultaneously models
coarse- and fine-grained retrieval by considering the multi-grained
uncertainty. The key idea underpinning the proposed method is to unify
fine- and coarse-grained retrieval by treating them as matching data points with
small and large fluctuations, respectively. Specifically, our method contains
two modules: uncertainty modeling and uncertainty regularization. (1) Uncertainty
modeling simulates multi-grained queries by introducing identically
distributed fluctuations in the feature space. (2) Based on the uncertainty
modeling, we further introduce uncertainty regularization to adapt the matching
objective according to the fluctuation range. Compared with existing methods,
the proposed strategy explicitly prevents the model from pushing away potential
candidates in the early stage and thus improves the recall rate. On three
public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method
achieves gains of +4.03%, +3.38%, and +2.40% in Recall@50 over a strong
baseline, respectively.
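To make the two modules more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes the "identically distributed fluctuations" are i.i.d. Gaussian noise with scale `sigma` added to the composed-query features, uses a batch-wise InfoNCE loss as the matching objective, and substitutes a hypothetical weight `1 / (1 + relative noise norm)` for the paper's uncertainty regularization, so that heavily perturbed (coarse) queries are penalized less for failing to rank the exact target first. All function names are illustrative.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each feature vector to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def simulate_uncertainty(query, sigma, rng):
    """Uncertainty modeling (assumed form): add i.i.d. Gaussian noise to query features."""
    noise = rng.normal(0.0, sigma, size=query.shape)
    return query + noise, noise

def info_nce_per_query(query, target, temperature=0.1):
    """Batch-wise cross-entropy where query i should match target i."""
    logits = l2_normalize(query) @ l2_normalize(target).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs)

def uncertainty_regularized_loss(query, target, sigma, rng, temperature=0.1):
    """Fine-grained loss plus a fluctuation-weighted coarse-grained loss.

    The weighting rule below is a hypothetical stand-in for the paper's
    uncertainty regularization: larger perturbations get smaller weights.
    """
    noisy_query, noise = simulate_uncertainty(query, sigma, rng)
    fine = info_nce_per_query(query, target, temperature)
    coarse = info_nce_per_query(noisy_query, target, temperature)
    rel = np.linalg.norm(noise, axis=1) / (np.linalg.norm(query, axis=1) + 1e-12)
    weight = 1.0 / (1.0 + rel)
    return float(np.mean(fine + weight * coarse)), weight

# Usage: composed-query features and slightly shifted target features.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
t = q + 0.05 * rng.normal(size=(8, 16))
loss, w = uncertainty_regularized_loss(q, t, sigma=0.5, rng=rng)
```

With `sigma=0` the noise vanishes, every weight equals 1, and the objective reduces to twice the plain pair-based InfoNCE loss; increasing `sigma` simulates coarser queries whose matching term is softened.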
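The results above are reported as Recall@50. For reference, a common definition of Recall@K counts a query as correct when its ground-truth target appears among the K gallery items most similar to the composed query. The sketch below assumes cosine similarity and a single target index per query; the function name `recall_at_k` is hypothetical, and each benchmark's evaluation protocol may differ in details.

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, target_idx, k):
    """Fraction of queries whose target index is among the top-k cosine matches."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                                # (num_queries, num_gallery)
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of the k most similar items
    hits = (topk == np.asarray(target_idx)[:, None]).any(axis=1)
    return float(hits.mean())
```

Because a hit only requires the target to appear somewhere in the top K, a larger K such as 50 rewards keeping plausible candidates near the query, which is the coarse-grained behavior the method targets.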