Online Supplement

Reference-based Multi-stage Progressive Restoration for Multi-degraded Images

Yi Zhang, Qixue Yang, Damon M. Chandler, and Xuanqin Mou

 

This webpage serves as the online supplement to the paper “Reference-based Multi-stage Progressive Restoration for Multi-degraded Images”, submitted to the IEEE Transactions on Image Processing.

 

Ref-IRT is a reference-based multi-degraded image restoration (MDIR) method that transfers similar textures from a reference image to the distorted image so that the edge/texture/structure information lost during quality degradation can be supplemented and restored. The method operates in three successive stages. The first stage performs a preliminary restoration of the distorted image so that similar edges/textures can be located more reliably in the reference image; the second (primary restoration) stage improves the restoration by transferring similar edges/textures from the reference image to the distorted image to help recover the lost information; and the third stage performs the final restoration, in which more accurate texture features are transferred to further enhance the restored image quality.

Due to the page-length limit of the journal article, we first introduce the proposed XRIR dataset in more detail here. Then, we describe the subjective study conducted to quantify the visual improvement achieved by the different MDIR methods tested on multi-degraded images. Next, we report the performance of MPENet in predicting distortion parameters. We also present more details about Ref-IRT+, a modified version of the Ref-IRT approach based on a practical degradation model [1]. Finally, we show visual results of different MDIR methods tested on real-world images taken from the LIVE Challenge dataset [2] and/or captured by our own camera.

[Download the code and dataset]

 

1. XJTU-referenced image restoration (XRIR) dataset

The XRIR dataset contains 200 high-resolution pristine images, each of which has a corresponding reference image. Among the 200 image pairs, 137 were carefully collected from the Internet and 63 were captured with our own camera. Compared with CUFED5 [3] and WR_SR [4], XRIR offers advantages in both the number of images and content diversity. Specifically, the 200 image pairs roughly cover 11 categories of image content: indoor, outdoor, building, landmark, animal, plant, ocean/lake, forest, mountain, human, and others (mainly man-made objects that do not belong to any of the aforementioned 10 categories). Sample images from the different categories of XRIR are shown in Figure 1, and the dataset distribution is given in Table 1. The image resolutions in XRIR range from a minimum of 1200×1600 pixels to a maximum of 5874×3810 pixels. Thus, the dataset can serve as a benchmark for evaluating both reference-based image super-resolution and restoration methods.

 

 

[Figure: eleven sample pristine-reference image pairs, one per category. Top group: indoor, outdoor, building, landmark, animal, plant; bottom group: ocean/lake, forest, mountain, human, others. Each group shows an “input” (pristine) row above a “reference” row.]

 

Figure 1. Sample pristine-reference image pairs from different categories in the XRIR dataset.

 

 

Table 1. Common image statistics computed for different image categories in the XRIR dataset.

| Image category | indoor | outdoor | building | landmark | animal | plant | ocean/lake | forest | mountain | human | others |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Percentage of total images | 6.5% | 13% | 19.5% | 12.5% | 3.5% | 7.5% | 16.5% | 4% | 9% | 3% | 5% |
| L* Mean | 46.53 | 51.24 | 50.62 | 52.36 | 56.88 | 49.91 | 51.37 | 45.53 | 51.73 | 47.07 | 45.46 |
| L* Variance | 455.16 | 660.02 | 638.14 | 463.98 | 454.62 | 541.61 | 505.16 | 564.37 | 505.79 | 617.38 | 629.65 |
| L* Skewness | 0.26 | 0.04 | -0.02 | -0.18 | -0.24 | 0.19 | -0.22 | 0.49 | 0.18 | -0.18 | 0.32 |
| L* Kurtosis | 2.99 | 2.34 | 2.09 | 2.93 | 2.53 | 2.47 | 2.31 | 2.62 | 2.54 | 2.22 | 2.46 |
| L* -Slope | 1.57 | 1.36 | 1.42 | 1.41 | 1.45 | 1.37 | 1.38 | 1.42 | 1.40 | 1.55 | 1.43 |
| a* Mean | 2.83 | 0.15 | 0.00 | 3.58 | -8.90 | -2.60 | -0.75 | -3.50 | -2.27 | 5.15 | 7.20 |
| a* Variance | 47.83 | 31.74 | 36.19 | 32.45 | 51.25 | 119.13 | 29.08 | 50.47 | 43.62 | 164.12 | 121.49 |
| a* Skewness | 1.33 | 1.16 | 1.01 | 1.15 | 0.73 | 0.07 | 1.19 | 0.20 | 0.30 | 1.94 | 0.94 |
| a* Kurtosis | 15.95 | 17.39 | 13.54 | 14.78 | 15.80 | 4.68 | 12.91 | 7.39 | 9.38 | 12.47 | 8.61 |
| a* -Slope | 1.50 | 1.38 | 1.43 | 1.37 | 1.36 | 1.43 | 1.36 | 1.53 | 1.39 | 1.58 | 1.51 |
| b* Mean | 11.70 | 1.58 | 2.61 | 5.36 | -0.23 | 6.22 | -3.69 | 9.08 | 0.64 | 10.28 | 7.15 |
| b* Variance | 114.35 | 173.74 | 125.49 | 214.57 | 161.74 | 216.19 | 142.48 | 105.86 | 199.23 | 216.71 | 120.28 |
| b* Skewness | 0.17 | 0.39 | 0.43 | -0.09 | 1.08 | 0.61 | 0.68 | 0.76 | 0.24 | 0.32 | 0.39 |
| b* Kurtosis | 7.19 | 4.82 | 4.96 | 3.54 | 8.62 | 4.02 | 4.65 | 4.98 | 3.53 | 5.60 | 5.81 |
| b* -Slope | 1.59 | 1.51 | 1.53 | 1.50 | 1.51 | 1.50 | 1.48 | 1.59 | 1.50 | 1.64 | 1.53 |

 

We also report some common image statistics of our dataset. Specifically, each image was first converted from the RGB to the CIE L*a*b* color space (using the rgb2lab function in Matlab, which assumes a D65 white point and sRGB input by default). Then, for each channel (L*, a*, and b*), the mean, variance, skewness, kurtosis, and the (negative) slope of the 1D-averaged magnitude spectrum were computed. Histograms of these statistics computed over all dataset images are shown in Figure 2; in each plot, the x-axis represents the statistic value and the y-axis the corresponding probability. Observe that the mean, skewness, and slope values generally follow a Gaussian distribution, whereas the variance and kurtosis values generally follow a Weibull distribution. The same statistics are also broken down by image category in Table 1, in which each entry is the average statistic value computed over all images belonging to that category. According to Table 1, the image categories differ mainly in the mean/skewness values of the a*/b* channels, suggesting that color statistics might be a useful feature for distinguishing between categories in the XRIR dataset. In summary, the dataset contains a reasonable variety of mean luminances and chromaticities, levels of activity/contrast, levels of sparseness, and smooth vs. busy regions.
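
For concreteness, the following Python sketch (our own minimal re-implementation; the statistics reported here were computed in Matlab, and the file name below is a placeholder) shows how these five per-channel statistics can be computed for a single image. Kurtosis is computed in the Pearson convention (a Gaussian gives 3), matching the values in Table 1.

```python
import numpy as np
from scipy import stats
from skimage import color, io

def neg_spectral_slope(channel):
    """Negative slope of the 1D (radially) averaged log magnitude spectrum
    versus log spatial frequency, fit by least squares."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(channel)))
    cy, cx = mag.shape[0] // 2, mag.shape[1] // 2
    y, x = np.indices(mag.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    radial = np.bincount(r.ravel(), weights=mag.ravel()) / np.bincount(r.ravel())
    freqs = np.arange(1, min(cy, cx))  # skip the DC component
    slope, _ = np.polyfit(np.log(freqs), np.log(radial[freqs]), 1)
    return -slope

img = io.imread('example.png')  # placeholder file name
lab = color.rgb2lab(img)        # sRGB input, D65 white point (as in Matlab's rgb2lab)
for idx, name in enumerate(('L*', 'a*', 'b*')):
    ch = lab[..., idx]
    v = ch.ravel()
    # Pearson kurtosis (fisher=False), so a Gaussian channel yields 3
    print(name, v.mean(), v.var(), stats.skew(v), stats.kurtosis(v, fisher=False),
          neg_spectral_slope(ch))
```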

 

Figure 2. Histograms of common image statistics computed for all 200 image pairs in the XRIR dataset.

 

 

2. Subjective study

Since Ref-IRT transfers similar textures from the reference to the target image, and deep-learning methods are prone to introducing texture artifacts that can be visually unpleasant, we conducted a subjective study to quantify the visual improvement achieved by the proposed method. To this end, 20 multi-degraded images were randomly selected from each of the three datasets (i.e., CUFED5 [3], WR_SR [4], and XRIR), resulting in 60 images in total. For each distorted image, 19 MDIR methods (see the paper for more details) were applied to yield 19 restored images. During the test, for each of the 60 distorted images, the subjects were presented with the 19 restored versions and asked to select the image(s) with the highest perceived quality. Specifically, if a subject was fully confident that one image was the best, that single image was selected and recorded; otherwise, the subject could select the 2 or 3 images that he/she considered to have the best quality. No more than 3 images could be selected per trial. In each trial, the 19 restored images were presented in random order, one image at a time, and the subject did not know which image was produced by which method, in order to avoid potential bias. In total, 14 subjects of different genders, aged 19-40 years, participated in the study.

Results for the three groups of dataset images are shown in Table 2, in which each entry denotes how many times the restored image produced by applying the method in the row to the distorted image in the column was selected. The last column of Table 2 reports the average probability (the method's share of all recorded selections on that dataset), which indicates how likely the method in the row is to produce the most preferred result. As can be observed, in most cases our method was selected more frequently than the others, demonstrating that images produced by our method are generally perceived to be of higher visual quality than those produced by the competing methods.

 

Table 2. Subjective results tested on sample images randomly selected from the CUFED5, WR_SR, and XRIR datasets.

CUFED5

| Image ID | '004' | '005' | '011' | '024' | '025' | '026' | '031' | '042' | '057' | '058' | '066' | '069' | '073' | '077' | '079' | '086' | '089' | '091' | '097' | '111' | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RL-Restore | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.006 |
| OWAN | 0 | 1 | 2 | 0 | 4 | 0 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0.043 |
| HOWAN | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.006 |
| RMBN | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.012 |
| MEPS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.009 |
| DnCNN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 |
| DuRN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.003 |
| MIRNet | 0 | 1 | 2 | 0 | 4 | 0 | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0.043 |
| COLA-Net | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.024 |
| SwinIR | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.012 |
| Restormer | 0 | 2 | 1 | 0 | 5 | 0 | 0 | 1 | 1 | 3 | 0 | 0 | 1 | 0 | 3 | 0 | 1 | 2 | 1 | 2 | 0.070 |
| DoubleUNet | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.003 |
| W-Net | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0.006 |
| StackUNet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.009 |
| TTSR | 0 | 0 | 4 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.024 |
| RefVAE | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.006 |
| MASA | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0.015 |
| DATSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.003 |
| Ref-IRT | 14 | 11 | 9 | 13 | 6 | 14 | 12 | 12 | 11 | 11 | 14 | 13 | 12 | 12 | 9 | 12 | 11 | 11 | 12 | 12 | 0.704 |

WR_SR

| Image ID | '002' | '004' | '014' | '016' | '019' | '024' | '026' | '028' | '029' | '030' | '032' | '037' | '039' | '053' | '058' | '060' | '061' | '063' | '067' | '079' | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RL-Restore | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 |
| OWAN | 0 | 4 | 5 | 4 | 0 | 5 | 4 | 0 | 2 | 0 | 2 | 0 | 4 | 1 | 0 | 0 | 3 | 3 | 0 | 1 | 0.089 |
| HOWAN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.012 |
| RMBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 1 | 0 | 1 | 0 | 1 | 0.021 |
| MEPS | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0.007 |
| DnCNN | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.014 |
| DuRN | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.014 |
| MIRNet | 0 | 3 | 2 | 3 | 5 | 4 | 3 | 2 | 3 | 3 | 0 | 3 | 1 | 3 | 1 | 0 | 4 | 7 | 0 | 1 | 0.113 |
| COLA-Net | 0 | 0 | 0 | 2 | 0 | 4 | 3 | 3 | 0 | 0 | 0 | 4 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0.045 |
| SwinIR | 0 | 0 | 0 | 1 | 0 | 2 | 5 | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 0.042 |
| Restormer | 1 | 4 | 2 | 1 | 4 | 2 | 2 | 2 | 1 | 5 | 0 | 0 | 2 | 1 | 2 | 0 | 7 | 5 | 0 | 1 | 0.099 |
| DoubleUNet | 0 | 0 | 0 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0.024 |
| W-Net | 0 | 1 | 0 | 4 | 5 | 5 | 1 | 1 | 3 | 2 | 0 | 2 | 0 | 2 | 1 | 2 | 1 | 5 | 0 | 0 | 0.082 |
| StackUNet | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.016 |
| TTSR | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0.021 |
| RefVAE | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.009 |
| MASA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0.009 |
| DATSR | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0.024 |
| Ref-IRT | 13 | 10 | 12 | 5 | 6 | 2 | 6 | 6 | 8 | 8 | 10 | 0 | 7 | 10 | 7 | 10 | 9 | 5 | 13 | 5 | 0.358 |

XRIR

| Image ID | '040' | '046' | '050' | '055' | '058' | '074' | '083' | '093' | '102' | '104' | '105' | '111' | '132' | '146' | '153' | '158' | '164' | '171' | '178' | '185' | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RL-Restore | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.005 |
| OWAN | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 3 | 2 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0.034 |
| HOWAN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.002 |
| RMBN | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0.029 |
| MEPS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.005 |
| DnCNN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.007 |
| DuRN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 |
| MIRNet | 0 | 4 | 0 | 2 | 6 | 2 | 3 | 3 | 5 | 6 | 2 | 3 | 1 | 3 | 2 | 3 | 1 | 0 | 1 | 4 | 0.124 |
| COLA-Net | 0 | 0 | 0 | 0 | 1 | 0 | 6 | 1 | 0 | 0 | 1 | 1 | 3 | 1 | 2 | 0 | 1 | 0 | 1 | 0 | 0.044 |
| SwinIR | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.019 |
| Restormer | 0 | 1 | 0 | 7 | 3 | 8 | 0 | 0 | 2 | 5 | 2 | 6 | 3 | 6 | 2 | 0 | 1 | 0 | 0 | 4 | 0.122 |
| DoubleUNet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0.010 |
| W-Net | 0 | 1 | 0 | 1 | 1 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | 2 | 0 | 1 | 0 | 1 | 0 | 2 | 0 | 0.039 |
| StackUNet | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.007 |
| TTSR | 0 | 6 | 0 | 0 | 2 | 4 | 1 | 2 | 2 | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 1 | 0.061 |
| RefVAE | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.012 |
| MASA | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.017 |
| DATSR | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.015 |
| Ref-IRT | 13 | 10 | 14 | 12 | 5 | 7 | 6 | 5 | 8 | 8 | 4 | 7 | 10 | 4 | 9 | 13 | 13 | 12 | 12 | 12 | 0.448 |

 

 

3. Performance of MPENet

In Ref-IRT, MPENet is employed to predict the three distortion parameter values of a multi-degraded image, which are then used to add the same levels of distortion to the reference image for better content/texture matching. Here, we evaluate the performance of MPENet in distortion parameter estimation. To this end, the PLCC/SROCC values between the estimated distortion parameters and the recorded ground-truth distortion parameters were calculated for each dataset. Table 3 shows the PLCC/SROCC results computed for each distortion type in the three datasets (i.e., CUFED5 [3], WR_SR [4], and XRIR). PLCC/SROCC values closer to one indicate better prediction.
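
As a minimal sketch of this evaluation (assuming SciPy; the arrays below are synthetic placeholders rather than our data), the two correlation metrics can be computed as follows:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 4.0, 100)        # placeholder ground-truth values (e.g., blur strength)
pred = gt + rng.normal(0.0, 0.1, 100)  # placeholder MPENet estimates
plcc, _ = pearsonr(pred, gt)           # Pearson linear correlation coefficient (PLCC)
srocc, _ = spearmanr(pred, gt)         # Spearman rank-order correlation coefficient (SROCC)
print(f'PLCC = {plcc:.3f}, SROCC = {srocc:.3f}')
```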

 

Table 3. PLCC and SROCC values computed for each distortion type in the three testing datasets.

| Metric | CUFED5 Blur | CUFED5 Noise | CUFED5 JPEG | WR_SR Blur | WR_SR Noise | WR_SR JPEG | XRIR Blur | XRIR Noise | XRIR JPEG |
|---|---|---|---|---|---|---|---|---|---|
| PLCC | 0.980 | 0.999 | 0.999 | 0.966 | 0.999 | 0.999 | 0.967 | 0.999 | 0.998 |
| SROCC | 0.985 | 0.998 | 1.000 | 0.961 | 0.998 | 1.000 | 0.970 | 0.998 | 1.000 |

 

As observed in Table 3, in general, all distortion parameter values can be predicted well, though the performance on Gaussian blur is somewhat weaker. This is because the blur artifact can easily be masked by the subsequent noise and compression distortions. In addition, the non-strict boundary between a blurry image and its pristine version adds further difficulty to blur parameter estimation; for example, a pristine image that focuses on a specific object can also exhibit blur in the surrounding areas. Despite this inaccuracy, we believe that the proposed MPENet is adequate for the distortion parameter estimation task, because only similar (not identical) distortions are required on the reference image to achieve a decent matching result.

 

4. More details about Ref-IRT+

(1) Network architecture of the modified MPENet

To narrow the performance gap of Ref-IRT between synthesized distortions and real-world applications, we additionally trained our method on images corrupted using a practical degradation model [1]. In addition to JPEG compression, this degradation model considers different Gaussian blur kernels (i.e., isotropic and anisotropic) to generate the blur distortion, different Gaussian noise models (i.e., channel-independent additive white Gaussian noise (AWGN), gray-scale AWGN, and the general case) to generate the noise distortion, and different sequential orders in which the distortions are applied, thereby expanding the degradation space.

 


Figure 3. Network architecture of the modified MPENet in Ref-IRT+.

 

To enable our approach to work with this new degradation model, the distortion parameter estimation (DPE) block in MPENet has to be modified, since more distortion parameters are required: (1) the 2×2 covariance matrix of the multivariate normal distribution used to generate the isotropic/anisotropic Gaussian blur kernels; (2) the 3×3 covariance matrix of the Gaussian noise model; and (3) the quality parameter of the JPEG compression. Since the two covariance matrices are symmetric, the numbers of distortion parameters for the three distortion types are three, six, and one, respectively. The modified MPENet must also predict the sequential order in which the blur and noise distortions were added to the image, because the blur distortion can reduce the perceived noise strength. The modified MPENet is shown in Figure 3, in which branch (a) predicts the sequential order and branch (b) predicts the ten distortion parameters. Note that (a) and (b) are fed the same three feature vectors, which are obtained from the three preceding average pooling layers. In addition, the modified MPENet takes the RGB color image as input, instead of the luminance image, because channel-independent, gray-scale, or generalized AWGN may be added across the three channels.
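
For illustration, the following Python sketch (a simplified stand-in for the practical degradation model of [1], under one fixed blur-noise ordering; not the authors' training code) shows how the ten parameters enter the degradation: three from the symmetric 2×2 blur covariance, six from the symmetric 3×3 noise covariance, and one JPEG quality factor.

```python
import io
import numpy as np
from PIL import Image
from scipy.ndimage import convolve

def gaussian_kernel(cov, size=21):
    """Isotropic/anisotropic blur kernel from a 2x2 covariance matrix
    (3 free parameters, since the matrix is symmetric)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    pts = np.stack([xx, yy], axis=-1)
    k = np.exp(-0.5 * np.einsum('...i,ij,...j', pts, np.linalg.inv(cov), pts))
    return k / k.sum()

def degrade(img, cov_blur, cov_noise, q):
    """Blur -> correlated Gaussian noise -> JPEG (one possible ordering)."""
    x = img.astype(np.float64) / 255.0
    k = gaussian_kernel(cov_blur)
    x = np.stack([convolve(x[..., c], k) for c in range(3)], axis=-1)
    # general-case noise: 3x3 symmetric channel covariance (6 free parameters)
    n = np.random.multivariate_normal(np.zeros(3), cov_noise, x.shape[:2])
    x = np.clip(x + n, 0.0, 1.0)
    buf = io.BytesIO()
    Image.fromarray((x * 255).astype(np.uint8)).save(buf, 'JPEG', quality=q)  # 1 parameter
    buf.seek(0)
    return np.array(Image.open(buf))

# e.g.: degrade(np.array(Image.open('ref.png').convert('RGB')),
#               np.array([[4.0, 1.5], [1.5, 1.0]]), 1e-3 * np.eye(3), 40)
```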

 

(2) Visual results on different distortion types/intensities

Ref-IRT+ was tested on three distortion combinations (blur + JPEG, noise + JPEG, and blur + noise + JPEG) at three distortion intensities (mild, moderate, and severe); thus, nine distortion scenarios were considered in the test. Figure 4 shows the visual results of Ref-IRT+ on sample distorted images generated from the pristine images in the CUFED5 dataset. The first column shows the input images; the second and third columns show the results produced by the first stage (Ref-IRT-I+) and the final stage of Ref-IRT+, respectively. As can be observed, the reference image is less likely to help when images are corrupted by noise and JPEG compression. Also, when images are only mildly distorted, the reference image might be unnecessary, because a deep CNN model alone can be good enough to recover mildly-distorted image content.

 

[Figure: three groups of results — Blur + JPEG, Noise + JPEG, and Blur + Noise + JPEG — each shown at Mild, Moderate, and Severe intensities; within each case, the columns show the Input, Ref-IRT-I+ (first stage), and Ref-IRT+ (final stage) images.]

Figure 4. Visual results of Ref-IRT+ tested on sample distorted images generated from the pristine images in the CUFED5 dataset with different distortion types and intensities.

 

5. Test on real-world images

We also tested our algorithm on real-world images. To this end, images taken from the LIVE Challenge dataset and images captured by our own camera were used for testing. For the LIVE Challenge test, the reference images were randomly selected from the 127 pristine images in the LIVE [5], CSIQ [6], and CBSD68 [7] datasets. For the test on our own images, the target image was captured by a lower-quality web camera, and the reference image was captured by a higher-quality camera viewing a similar scene. Visual results of Ref-IRT+ on sample images from the LIVE Challenge dataset [2] are shown in Figure 5, and a visual comparison of Ref-IRT vs. other MDIR methods on an image captured by our own camera is shown in Figure 6. As can be observed, Ref-IRT+ is able to remove the blur, noise, and compression artifacts in real-world images thanks to training with the practical degradation model [1]. Moreover, by exploiting a reference image, Ref-IRT achieves better results than the other MDIR methods.

 

[Figure: two Input / Ref-IRT+ result pairs.]

Figure 5. Visual results of Ref-IRT+ tested on sample real-world images from the LIVE Challenge dataset.

 

 

[Figure: a grid of results — Input, RL-Restore, OWAN, HOWAN, RMBN, MEPS, DnCNN, DuRN, MIRNet, COLA-Net, SwinIR, Restormer, DoubleUNet, W-Net, StackUNet, TTSR, RefVAE, MASA, DATSR, and Ref-IRT.]

 

Figure 6. Visual comparison of Ref-IRT vs. other MDIR methods tested on a real-world image captured by using a web camera.

 

References

[1] K. Zhang, J. Liang, L. Van Gool, and R. Timofte, “Designing a practical degradation model for deep blind image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4791–4800.

[2] D. Ghadiyaram and A. C. Bovik, “Massive online crowdsourced study of subjective and objective picture quality,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372–387, 2016.

[3] Z. Zhang, Z. Wang, Z. Lin, and H. Qi, “Image super-resolution by neural texture transfer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7982–7991.

[4] Y. Jiang, K. C. Chan, X. Wang, C. C. Loy, and Z. Liu, “Robust reference-based super-resolution via C2-matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2103–2112.

[5] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, “Image and video quality assessment research at LIVE.” [Online]. Available: http://live.ece.utexas.edu/research/quality/.

[6] E. C. Larson and D. M. Chandler, “Most apparent distortion: full reference image quality assessment and the role of strategy,” Journal of Electronic Imaging, vol. 19, no. 1, p. 011006, 2010.

[7] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in IEEE International Conference on Computer Vision (ICCV), vol. 2, 2001, pp. 416–423.