
Learning Convolutional Networks for Content-weighted Image Compression

Mu Li, Wangmeng Zuo, Shuhang Gu, Debin Zhao, David Zhang

This website accompanies our paper Learning Convolutional Networks for Content-weighted Image Compression.

Lossy image compression is generally formulated as a joint rate-distortion optimization to learn the encoder, quantizer, and decoder. However, the quantizer is non-differentiable, and discrete entropy estimation is usually required for rate control, which makes it very challenging to develop a convolutional neural network (CNN)-based image compression system. In this paper, motivated by the observation that the local information content is spatially variant in an image, we suggest that the bit rate of different parts of the image should be adapted to the local content, and we allocate the content-aware bit rate under the guidance of a content-weighted importance map. The sum of the importance map thus serves as a continuous alternative to discrete entropy estimation for controlling the compression rate. A binarizer is adopted to quantize the output of the encoder, since the binarization scheme is also directly defined by the importance map. Furthermore, a proxy function is introduced for the binary operation in backward propagation to make it differentiable. Therefore, the encoder, decoder, binarizer, and importance map can be jointly optimized in an end-to-end manner using a subset of the ImageNet database. In low bit rate image compression, experiments show that our system significantly outperforms JPEG and JPEG 2000 in terms of the structural similarity (SSIM) index, and produces much better visual results with sharp edges, rich textures, and fewer artifacts.
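As a rough illustration of the proxy-function idea, the sketch below (PyTorch; the class name and the straight-through identity gradient are choices made for this example, not the exact proxy defined in the paper) shows how a hard binarizer can still pass gradients back to the encoder:

```python
import torch


class BinarizeSTE(torch.autograd.Function):
    """Hard binarizer with a proxy gradient for backward propagation.

    Forward: threshold the encoder output (which lies in (0, 1)) at 0.5 to
    obtain binary codes. Backward: pass the gradient through unchanged
    (a straight-through style identity proxy). The paper defines its own
    proxy function; the identity choice here is only an illustration.
    """

    @staticmethod
    def forward(ctx, x):
        return (x > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Identity proxy: treat the binarizer as the identity map in backward.
        return grad_output


def binarize(x):
    return BinarizeSTE.apply(x)


# Gradients reach the encoder output despite the non-differentiable threshold.
enc_out = torch.rand(1, 64, 16, 16, requires_grad=True)
codes = binarize(enc_out)
codes.sum().backward()
print(enc_out.grad.shape)  # torch.Size([1, 64, 16, 16])
```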

The network structure

Layer                              | Activation size
Input                              | 3×128×128
8×8×128 conv, pad 2, stride 4      | 128×32×32
Residual block, 128 filters        | 128×32×32
4×4×256 conv, pad 1, stride 2      | 256×16×16
Residual block, 256 filters        | 256×16×16
Residual block, 256 filters        | 256×16×16
1×1×64(128) conv, pad 0, stride 1  | 64(128)×16×16

Table 1. Network architecture of the convolutional encoder.
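A minimal PyTorch sketch of the encoder in Table 1 could look as follows. The residual block follows the two-conv, no-batch-norm design of Figure 1; the 3×3 kernel size inside the block and the `features`/`head` split are assumptions made for this sketch.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Residual block from Figure 1: two stacked conv layers, no batch norm.
    The 3x3 kernel size and the absence of a ReLU after the skip connection
    are assumptions made for this sketch."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class Encoder(nn.Module):
    """Convolutional encoder following Table 1; n_codes is 64 or 128."""

    def __init__(self, n_codes=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 128, 8, stride=4, padding=2), nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            ResidualBlock(256),
            ResidualBlock(256),
        )
        # Last layer: 1x1 conv + Sigmoid so the output lies in (0, 1).
        self.head = nn.Sequential(nn.Conv2d(256, n_codes, 1), nn.Sigmoid())

    def forward(self, x):
        feat = self.features(x)  # 256x16x16, also fed to the importance map network (Table 3)
        return self.head(feat), feat


enc_out, feat = Encoder()(torch.zeros(1, 3, 128, 128))
print(enc_out.shape, feat.shape)  # torch.Size([1, 64, 16, 16]) torch.Size([1, 256, 16, 16])
```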



Layer                          | Activation size
Input                          | 64(128)×16×16
1×1×512 conv, pad 0, stride 1  | 512×16×16
Residual block, 512 filters    | 512×16×16
Residual block, 512 filters    | 512×16×16
Depth to space, stride 2       | 128×32×32
3×3×256 conv, pad 1, stride 1  | 256×32×32
Residual block, 256 filters    | 256×32×32
Depth to space, stride 4       | 16×128×128
3×3×32 conv, pad 1, stride 1   | 32×128×128
3×3×3 conv, pad 1, stride 1    | 3×128×128

Table 2. Network architecture of the convolutional decoder.
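Correspondingly, a hedged PyTorch sketch of the decoder in Table 2, with the depth-to-space layers implemented by `nn.PixelShuffle`, might look like this (the residual block is the same sketch as in the encoder example above):

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Same residual block sketch as in the encoder example: two conv layers, no batch norm."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class Decoder(nn.Module):
    """Convolutional decoder following Table 2; n_codes is 64 or 128."""

    def __init__(self, n_codes=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_codes, 512, 1), nn.ReLU(inplace=True),
            ResidualBlock(512),
            ResidualBlock(512),
            nn.PixelShuffle(2),                              # depth to space, stride 2: 512x16x16 -> 128x32x32
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.PixelShuffle(4),                              # depth to space, stride 4: 256x32x32 -> 16x128x128
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 3, padding=1),                  # no nonlinearity after the last layer
        )

    def forward(self, codes):
        return self.net(codes)


print(Decoder()(torch.zeros(1, 64, 16, 16)).shape)  # torch.Size([1, 3, 128, 128])
```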



Layer                                                          | Activation size
Input (the output of the last residual block of the encoder)  | 256×16×16
3×3×128 conv, pad 1, stride 1                                  | 128×16×16
3×3×128 conv, pad 1, stride 1                                  | 128×16×16
1×1×1 conv, pad 0, stride 1                                    | 1×16×16

Table 3. Network architecture of the importance map network.
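A PyTorch sketch of the importance map network in Table 3 (assuming, as described below, a ReLU after each of the first two layers and a Sigmoid after the last) could be:

```python
import torch
import torch.nn as nn


class ImportanceMapNet(nn.Module):
    """Importance map network following Table 3: two 3x3 conv + ReLU layers,
    then a 1x1 conv + Sigmoid producing a one-channel map in (0, 1)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1), nn.Sigmoid(),
        )

    def forward(self, encoder_features):
        # encoder_features: output of the encoder's last residual block (256x16x16).
        return self.net(encoder_features)


print(ImportanceMapNet()(torch.zeros(1, 256, 16, 16)).shape)  # torch.Size([1, 1, 16, 16])
```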




Figure 1. Structure of the residual blocks.


Table 1 and Table 2 give the network architectures of the convolutional encoder and decoder, respectively. Except for the last layer, each convolutional layer is followed by a ReLU nonlinearity. For the encoder, the last convolutional layer is followed by a Sigmoid nonlinearity to ensure that the output of the encoder lies in the interval (0,1). As for the decoder, there is no nonlinear layer after the last convolutional layer. For the residual blocks, we stack two convolutional layers in each block and remove the batch normalization layers. The architecture of the residual blocks is shown in Figure 1.

Table 3 illustrates the network structure of the importance map network. It takes the feature maps of the last residual block of the encoder as input and generates the importance map through three convolutional layers. The first two convolutional layers are each followed by a ReLU nonlinearity, while the last convolutional layer is followed by a Sigmoid nonlinearity to ensure that the generated importance map lies in the interval (0,1).
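To make the role of the importance map concrete, the following sketch shows one possible way it could drive spatially varying bit allocation: the map value at each location decides how many of the n binary code channels are kept there, and the sum of the map acts as the continuous rate estimate mentioned in the abstract. The quantization rule and mask definition here are illustrative and are not necessarily those used in the paper.

```python
import torch


def importance_mask(imp_map, n_channels, levels):
    """Illustrative importance-guided bit allocation (not the paper's exact rule).

    The map value at each spatial location is quantized into `levels` steps,
    and each step keeps an additional n_channels // levels binary code channels
    at that location; locations with near-zero importance keep no channels.
    """
    group = n_channels // levels
    q = torch.ceil(imp_map * levels).clamp(0, levels)         # quantized importance in {0, ..., levels}
    channel_idx = torch.arange(n_channels).view(1, n_channels, 1, 1)
    return (channel_idx < q * group).float()                  # keep the first q * group channels


imp = torch.rand(1, 1, 16, 16)                                # output of the importance map network
mask = importance_mask(imp, n_channels=64, levels=16)
codes = torch.randint(0, 2, (1, 64, 16, 16)).float()          # binarized encoder output
masked_codes = codes * mask                                   # content-weighted codes sent to the decoder
rate_proxy = imp.sum()                                        # continuous stand-in for the rate, as in the abstract
print(mask.shape, float(rate_proxy))
```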

Comparison of images compressed with and without the importance map


Figure 2. Comparison between our model with and without the importance map.

The images compressed by our model with and without the importance map are shown in Figure 2. More detailed textures and better visual quality are obtained by using the importance map, which indicates that the importance map gives our model a stronger ability to preserve textures and edges in low bit rate image compression.

We compare our method to JPEG with 4:2:0 chroma subsampling, and to the OpenJPEG implementation of JPEG 2000 with the default "multiple component transform".
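For readers who want to reproduce the baselines, a rough sketch of one JPEG rate-distortion point is given below, assuming Pillow for JPEG encoding with 4:2:0 chroma subsampling and scikit-image (0.19 or newer) for SSIM; `kodim01.png` is a placeholder filename, and the JPEG 2000 baseline produced with the OpenJPEG tools is not shown.

```python
import io

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity


def jpeg_rd_point(path, quality):
    """Encode an image as JPEG with 4:2:0 chroma subsampling at the given
    quality, and report bits per pixel and SSIM against the original."""
    img = Image.open(path).convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality, subsampling=2)  # subsampling=2 -> 4:2:0
    decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")

    ref, rec = np.asarray(img), np.asarray(decoded)
    bpp = 8.0 * buf.tell() / (img.width * img.height)
    ssim = structural_similarity(ref, rec, channel_axis=-1, data_range=255)
    return bpp, ssim


# Sweep the JPEG quality factor to trace a rate-distortion curve on one image.
for q in (5, 10, 20, 40):
    print(q, jpeg_rd_point("kodim01.png", q))
```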

Our test code is now available on GitHub. Please feel free to try it yourself.

Kodak set

Downloaded from here. Note: we removed a boundary of 8 pixels from each side to eliminate border artifacts and enable a fairer comparison. Click on an image to see the compression results.

All the compared images are downloaded from the website of the paper "End-to-end Optimized Image Compression", and the webpage style is also borrowed from that website. Here, we express our sincere gratitude to Ballé for the wonderful work.
