Learned Image Compression with Mixed Transformer-CNN Architectures. (arXiv:2303.14978v1 [eess.IV])

Learned image compression (LIC) methods have exhibited promising progress and
superior rate-distortion performance compared with classical image compression
standards. Most existing LIC methods are based on either convolutional neural
networks (CNNs) or Transformers, and the two have different strengths. Combining
those strengths is worth exploring but poses two challenges: 1) how to
effectively fuse the two approaches, and 2) how to achieve higher performance at
an acceptable complexity. In this paper, we propose an efficient parallel
Transformer-CNN Mixture (TCM) block with a controllable complexity to
incorporate the local modeling ability of CNN and the non-local modeling
ability of Transformers to improve the overall architecture of image
compression models. In addition, inspired by recent progress in entropy
estimation models and attention modules, we propose a channel-wise entropy
model with parameter-efficient Swin-Transformer-based attention (SWAtten)
modules that use channel squeezing. Experimental results demonstrate that our
proposed method achieves state-of-the-art rate-distortion performance on three
datasets of different resolutions (i.e., Kodak, Tecnick, and CLIC Professional
Validation) compared to existing LIC methods. The code is available at
https://github.com/jmliu206/LIC_TCM.
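
As a rough illustration of the parallel idea described in the abstract, the
sketch below splits a feature map's channels into two groups, sends one group
through a local operator and the other through a non-local operator, and then
concatenates the results. It is a toy under stated assumptions, not the
authors' TCM block: the even channel split, the 3-tap moving average standing
in for the CNN path, the scalar self-attention standing in for the Transformer
path, the residual connection, and all function names are hypothetical choices
made here for illustration. The actual implementation is in the repository
linked above.

```python
import math


def local_branch(x):
    """Stand-in for a CNN path: 3-tap moving average over neighbouring positions."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - 1): min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def nonlocal_branch(x):
    """Stand-in for a Transformer path: scalar self-attention over all positions."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def mixed_block(channels):
    """Split channels in half (an assumption), run both branches, concatenate, add residual.

    `channels` is a list of channels; each channel is a list of float positions.
    """
    half = len(channels) // 2
    local_out = [local_branch(c) for c in channels[:half]]
    nonlocal_out = [nonlocal_branch(c) for c in channels[half:]]
    fused = local_out + nonlocal_out  # channel-wise concatenation
    return [[a + b for a, b in zip(orig, new)]
            for orig, new in zip(channels, fused)]


features = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
out = mixed_block(features)
```

Because each branch only sees part of the channels, the cost of the block can
be tuned through how the channels are divided, which is one plausible reading
of the "controllable complexity" the abstract mentions.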

Latest Change: March 28, 2023, 7:34 a.m.
