Online Supplement
Reference-based Multi-stage Progressive Restoration for Multi-degraded Images
Yi Zhang, Qixue Yang, Damon M. Chandler, and Xuanqin Mou
This webpage serves as the online supplement to the paper "Reference-based Multi-stage Progressive Restoration for Multi-degraded Images", submitted to the IEEE Transactions on Image Processing.
Ref-IRT is a reference-based multi-degraded image restoration (MDIR) method that transfers similar textures from a reference image to the distorted image, so that the edge/texture/structure information lost during quality degradation can be supplemented and restored. The method consists of three successive stages. The first stage performs a preliminary restoration of the distorted image so that similar edges/textures can be located more reliably in the reference image; the second (primary restoration) stage improves performance by transferring similar edges/textures from the reference image to the distorted image to help recover the lost information; and the third stage performs the final restoration, transferring more accurate texture features to further enhance the restored image quality.
Due to the page-length limit of the journal article, this supplement first introduces the proposed XRIR dataset in more detail. We then describe the subjective study conducted to quantify the visual improvement achieved by different MDIR methods on multi-degraded images. Next, we report the performance of MPENet in predicting distortion parameters and present more details about Ref-IRT+, a modified version of Ref-IRT based on a practical degradation model [1]. Finally, we show visual results of different MDIR methods on real-world images taken from the LIVE Challenge dataset [2] and captured by our own camera.
[Download the code and dataset]
1. XJTU-referenced image restoration (XRIR) dataset
The XRIR dataset contains 200 high-resolution pristine images, each of which has a corresponding reference image. Among the 200 image pairs, 137 were carefully collected from the Internet and 63 were captured with our own camera. Compared with CUFED5 [3] and WR_SR [4], XRIR offers advantages in both the number of images and content diversity. Specifically, the 200 image pairs roughly cover 11 categories of image content: indoor, outdoor, building, landmark, animal, plant, ocean/lake, forest, mountain, human, and others (mainly man-made objects that do not belong to any of the aforementioned ten categories). Sample images from the different categories of XRIR are shown in Figure 1, and the dataset distribution is given in Table 1. The image resolutions in XRIR range from 1200×1600 pixels to 5874×3810 pixels. Thus, the dataset can serve as a benchmark for evaluating both reference-based image super-resolution and restoration methods.
[Figure 1 image grid omitted: input/reference image pairs for the categories indoor, outdoor, building, landmark, animal, plant, ocean/lake, forest, mountain, human, and others.]
Figure 1. Sample pristine-reference image pairs from different categories in the XRIR dataset.
Table 1. Common image statistics computed for different image categories in the XRIR dataset.

| Image category | indoor | outdoor | building | landmark | animal | plant | ocean/lake | forest | mountain | human | others |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Percentage of total images | 6.5% | 13% | 19.5% | 12.5% | 3.5% | 7.5% | 16.5% | 4% | 9% | 3% | 5% |
| L* Mean | 46.53 | 51.24 | 50.62 | 52.36 | 56.88 | 49.91 | 51.37 | 45.53 | 51.73 | 47.07 | 45.46 |
| L* Variance | 455.16 | 660.02 | 638.14 | 463.98 | 454.62 | 541.61 | 505.16 | 564.37 | 505.79 | 617.38 | 629.65 |
| L* Skewness | 0.26 | 0.04 | -0.02 | -0.18 | -0.24 | 0.19 | -0.22 | 0.49 | 0.18 | -0.18 | 0.32 |
| L* Kurtosis | 2.99 | 2.34 | 2.09 | 2.93 | 2.53 | 2.47 | 2.31 | 2.62 | 2.54 | 2.22 | 2.46 |
| L* -Slope | 1.57 | 1.36 | 1.42 | 1.41 | 1.45 | 1.37 | 1.38 | 1.42 | 1.40 | 1.55 | 1.43 |
| a* Mean | 2.83 | 0.15 | 0.00 | 3.58 | -8.90 | -2.60 | -0.75 | -3.50 | -2.27 | 5.15 | 7.20 |
| a* Variance | 47.83 | 31.74 | 36.19 | 32.45 | 51.25 | 119.13 | 29.08 | 50.47 | 43.62 | 164.12 | 121.49 |
| a* Skewness | 1.33 | 1.16 | 1.01 | 1.15 | 0.73 | 0.07 | 1.19 | 0.20 | 0.30 | 1.94 | 0.94 |
| a* Kurtosis | 15.95 | 17.39 | 13.54 | 14.78 | 15.80 | 4.68 | 12.91 | 7.39 | 9.38 | 12.47 | 8.61 |
| a* -Slope | 1.50 | 1.38 | 1.43 | 1.37 | 1.36 | 1.43 | 1.36 | 1.53 | 1.39 | 1.58 | 1.51 |
| b* Mean | 11.70 | 1.58 | 2.61 | 5.36 | -0.23 | 6.22 | -3.69 | 9.08 | 0.64 | 10.28 | 7.15 |
| b* Variance | 114.35 | 173.74 | 125.49 | 214.57 | 161.74 | 216.19 | 142.48 | 105.86 | 199.23 | 216.71 | 120.28 |
| b* Skewness | 0.17 | 0.39 | 0.43 | -0.09 | 1.08 | 0.61 | 0.68 | 0.76 | 0.24 | 0.32 | 0.39 |
| b* Kurtosis | 7.19 | 4.82 | 4.96 | 3.54 | 8.62 | 4.02 | 4.65 | 4.98 | 3.53 | 5.60 | 5.81 |
| b* -Slope | 1.59 | 1.51 | 1.53 | 1.50 | 1.51 | 1.50 | 1.48 | 1.59 | 1.50 | 1.64 | 1.53 |
We also report some common image statistics of our dataset. Specifically, each image was first converted from the RGB to the CIE L*a*b* color space (using the rgb2lab function in MATLAB, which assumes a D65 white point and sRGB input by default). Then, for each channel (L*, a*, and b*), the mean, variance, skewness, kurtosis, and the (negative) slope of the 1D-averaged magnitude spectrum were computed; a minimal sketch of this computation is given below. The histograms of these statistics computed over all dataset images are shown in Figure 2, where the x-axis represents the statistic value and the y-axis the corresponding probability. Observe that the mean, skewness, and slope values generally follow a Gaussian distribution, while the variance and kurtosis values generally follow a Weibull distribution. Moreover, the per-category averages of these statistics are presented in Table 1, in which each entry is the average statistic value computed over all images belonging to the same category. According to Table 1, the image categories differ mainly in the mean/skewness values of the a*/b* channels, suggesting that color statistics might be a good feature for distinguishing between categories in the XRIR dataset. In summary, the dataset contains a reasonable variety of mean luminances and chromaticities, levels of activity/contrast, levels of sparseness, and smooth vs. busy regions.
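For reproducibility, the following is a minimal sketch (not our released code) of how these per-channel statistics can be computed in Python with scikit-image and SciPy; the file path in the example is hypothetical.

```python
# Minimal sketch (not the released code) of the Table 1 statistics.
import numpy as np
from scipy.stats import skew, kurtosis
from skimage import io, color

def negative_spectral_slope(channel):
    """Negative slope of the 1D-averaged log magnitude spectrum vs. log radial frequency."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(channel)))
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2).astype(int)
    radial_mean = np.bincount(r.ravel(), weights=mag.ravel()) / np.bincount(r.ravel())
    freqs = np.arange(1, min(cy, cx))                      # skip the DC component
    slope, _ = np.polyfit(np.log(freqs), np.log(radial_mean[freqs]), 1)
    return -slope

def lab_channel_stats(img_path):
    lab = color.rgb2lab(io.imread(img_path))               # D65 white point, sRGB input
    stats = {}
    for i, name in enumerate(("L*", "a*", "b*")):
        ch = lab[..., i].astype(np.float64)
        stats[name] = {
            "mean": ch.mean(),
            "variance": ch.var(),
            "skewness": skew(ch.ravel()),
            "kurtosis": kurtosis(ch.ravel(), fisher=False),  # Pearson kurtosis (3 for a Gaussian)
            "-slope": negative_spectral_slope(ch),
        }
    return stats

# Example: print(lab_channel_stats("XRIR/pristine/0001.png"))
```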
[Figure 2 histogram panels omitted.]
Figure 2. Histograms of common image statistics computed for all 200 image pairs in the XRIR dataset.
2. Subjective study
Since Ref-IRT transfers similar textures from the reference to the target image, and deep-learning methods are prone to introducing texture artifacts that can be visually unpleasant, we conducted a subjective study to quantify the visual improvement achieved by the proposed method. To this end, 20 multi-degraded images were randomly selected from each of the three datasets (i.e., CUFED5 [3], WR_SR [4], and XRIR), resulting in 60 images in total. For each distorted image, 19 MDIR methods (see the paper for details) were applied to yield 19 restored images. During the test, for each of the 60 distorted images, the subjects were shown the 19 restored versions and asked to select the image(s) with the highest perceived quality. Specifically, if a subject was completely confident that one image was the best, that single image was selected and recorded; otherwise, the subject could select two or three images judged to have the best quality, but never more than three per trial. In each trial, the 19 restored images were presented in random order, one image at a time, and the subject did not know which image was produced by which method, in order to avoid potential bias. In total, 14 subjects of different ages (19-40 years) and genders participated in the study.
Results for the three groups of dataset images are shown in Table 2, in which each entry denotes how many times the restored image produced by applying the method in the row to the distorted image in the column was selected. The last column of Table 2 lists the average probability values, which indicate how likely each MDIR method is to produce the best-rated result (a minimal sketch of how these entries and probabilities can be tallied is given below). As can be observed, in most cases our method was selected more frequently than the others, demonstrating that images produced by our method are generally perceived to have higher visual quality.
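The following is a minimal sketch, under our own assumptions about the bookkeeping (one record per selected image, and the average probability taken as a method's total selections divided by all selections recorded for that dataset), of how the Table 2 entries can be tallied; the record format is hypothetical.

```python
# Minimal sketch (assumptions, not the study scripts) of tallying Table 2.
# A record is one (image_id, method) pair per selected image; the "Average"
# column is assumed to be a method's total selections divided by the total
# number of selections recorded for the dataset.
from collections import defaultdict

def tally(records, methods, image_ids):
    counts = {m: defaultdict(int) for m in methods}
    for image_id, method in records:
        counts[method][image_id] += 1
    total_selections = sum(sum(c.values()) for c in counts.values())
    table = {}
    for m in methods:
        row = [counts[m][i] for i in image_ids]
        table[m] = row + [sum(row) / total_selections]   # per-image counts + "Average"
    return table

# Example with made-up selections:
# tally([("004", "Ref-IRT"), ("004", "Ref-IRT"), ("005", "OWAN")],
#       methods=["OWAN", "Ref-IRT"], image_ids=["004", "005"])
```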
Table 2. Subjective results tested on sample images randomly selected from the CUFED5, WR_SR, and XRIR datasets.

CUFED5

| Image ID | '004' | '005' | '011' | '024' | '025' | '026' | '031' | '042' | '057' | '058' | '066' | '069' | '073' | '077' | '079' | '086' | '089' | '091' | '097' | '111' | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RL-Restore | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.006 |
| OWAN | 0 | 1 | 2 | 0 | 4 | 0 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0.043 |
| HOWAN | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.006 |
| RMBN | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.012 |
| MEPS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.009 |
| DnCNN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 |
| DuRN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.003 |
| MIRNet | 0 | 1 | 2 | 0 | 4 | 0 | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0.043 |
| COLA-Net | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.024 |
| SwinIR | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.012 |
| Restormer | 0 | 2 | 1 | 0 | 5 | 0 | 0 | 1 | 1 | 3 | 0 | 0 | 1 | 0 | 3 | 0 | 1 | 2 | 1 | 2 | 0.070 |
| DoubleUNet | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.003 |
| W-Net | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0.006 |
| StackUNet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.009 |
| TTSR | 0 | 0 | 4 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.024 |
| RefVAE | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.006 |
| MASA | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0.015 |
| DATSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.003 |
| Ref-IRT | 14 | 11 | 9 | 13 | 6 | 14 | 12 | 12 | 11 | 11 | 14 | 13 | 12 | 12 | 9 | 12 | 11 | 11 | 12 | 12 | 0.704 |
WR_SR

| Image ID | '002' | '004' | '014' | '016' | '019' | '024' | '026' | '028' | '029' | '030' | '032' | '037' | '039' | '053' | '058' | '060' | '061' | '063' | '067' | '079' | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RL-Restore | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 |
| OWAN | 0 | 4 | 5 | 4 | 0 | 5 | 4 | 0 | 2 | 0 | 2 | 0 | 4 | 1 | 0 | 0 | 3 | 3 | 0 | 1 | 0.089 |
| HOWAN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.012 |
| RMBN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 1 | 0 | 1 | 0 | 1 | 0.021 |
| MEPS | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0.007 |
| DnCNN | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0.014 |
| DuRN | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.014 |
| MIRNet | 0 | 3 | 2 | 3 | 5 | 4 | 3 | 2 | 3 | 3 | 0 | 3 | 1 | 3 | 1 | 0 | 4 | 7 | 0 | 1 | 0.113 |
| COLA-Net | 0 | 0 | 0 | 2 | 0 | 4 | 3 | 3 | 0 | 0 | 0 | 4 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0.045 |
| SwinIR | 0 | 0 | 0 | 1 | 0 | 2 | 5 | 3 | 2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 0.042 |
| Restormer | 1 | 4 | 2 | 1 | 4 | 2 | 2 | 2 | 1 | 5 | 0 | 0 | 2 | 1 | 2 | 0 | 7 | 5 | 0 | 1 | 0.099 |
| DoubleUNet | 0 | 0 | 0 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0.024 |
| W-Net | 0 | 1 | 0 | 4 | 5 | 5 | 1 | 1 | 3 | 2 | 0 | 2 | 0 | 2 | 1 | 2 | 1 | 5 | 0 | 0 | 0.082 |
| StackUNet | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.016 |
| TTSR | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0.021 |
| RefVAE | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.009 |
| MASA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0.009 |
| DATSR | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0.024 |
| Ref-IRT | 13 | 10 | 12 | 5 | 6 | 2 | 6 | 6 | 8 | 8 | 10 | 0 | 7 | 10 | 7 | 10 | 9 | 5 | 13 | 5 | 0.358 |
XRIR

| Image ID | '040' | '046' | '050' | '055' | '058' | '074' | '083' | '093' | '102' | '104' | '105' | '111' | '132' | '146' | '153' | '158' | '164' | '171' | '178' | '185' | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RL-Restore | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.005 |
| OWAN | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 3 | 2 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0.034 |
| HOWAN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.002 |
| RMBN | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0.029 |
| MEPS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.005 |
| DnCNN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.007 |
| DuRN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.000 |
| MIRNet | 0 | 4 | 0 | 2 | 6 | 2 | 3 | 3 | 5 | 6 | 2 | 3 | 1 | 3 | 2 | 3 | 1 | 0 | 1 | 4 | 0.124 |
| COLA-Net | 0 | 0 | 0 | 0 | 1 | 0 | 6 | 1 | 0 | 0 | 1 | 1 | 3 | 1 | 2 | 0 | 1 | 0 | 1 | 0 | 0.044 |
| SwinIR | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.019 |
| Restormer | 0 | 1 | 0 | 7 | 3 | 8 | 0 | 0 | 2 | 5 | 2 | 6 | 3 | 6 | 2 | 0 | 1 | 0 | 0 | 4 | 0.122 |
| DoubleUNet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0.010 |
| W-Net | 0 | 1 | 0 | 1 | 1 | 1 | 2 | 0 | 0 | 1 | 1 | 2 | 2 | 0 | 1 | 0 | 1 | 0 | 2 | 0 | 0.039 |
| StackUNet | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.007 |
| TTSR | 0 | 6 | 0 | 0 | 2 | 4 | 1 | 2 | 2 | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 1 | 0 | 1 | 0.061 |
| RefVAE | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.012 |
| MASA | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.017 |
| DATSR | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.015 |
| Ref-IRT | 13 | 10 | 14 | 12 | 5 | 7 | 6 | 5 | 8 | 8 | 4 | 7 | 10 | 4 | 9 | 13 | 13 | 12 | 12 | 12 | 0.448 |
3. Performance of MPENet
In Ref-IRT, MPENet is employed to predict the three distortion parameter values of a multi-degraded image, which are then used to add the same level of distortions to the reference image for better content/texture matching. Here, we evaluate the performance of MPENet in distortion parameter estimation. To this end, the PLCC/SROCC values between the estimated distortion parameters and the recorded ground-truth distortion parameters were calculated across the images of each dataset. Table 3 shows the PLCC/SROCC results computed for each distortion type in the three datasets (i.e., CUFED5 [3], WR_SR [4], and XRIR). A PLCC/SROCC value closer to one indicates a more accurate prediction.
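As a reference point, the following minimal sketch (our assumption, not the evaluation script used for the paper) shows how PLCC and SROCC can be computed with SciPy for one distortion type; the example values are made up.

```python
# Minimal sketch (not the paper's evaluation script) of the PLCC/SROCC
# computation behind Table 3, for one distortion type of one dataset.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def plcc_srocc(predicted, ground_truth):
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    plcc, _ = pearsonr(predicted, ground_truth)     # linear correlation
    srocc, _ = spearmanr(predicted, ground_truth)   # rank-order correlation
    return plcc, srocc

# Example with made-up blur-sigma predictions:
# print(plcc_srocc([1.1, 2.0, 2.9, 3.8], [1.0, 2.0, 3.0, 4.0]))
```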
Table 3. PLCC and SROCC values computed for each distortion type in the three testing datasets.

| Metric | CUFED5 Blur | CUFED5 Noise | CUFED5 JPEG | WR_SR Blur | WR_SR Noise | WR_SR JPEG | XRIR Blur | XRIR Noise | XRIR JPEG |
|---|---|---|---|---|---|---|---|---|---|
| PLCC | 0.980 | 0.999 | 0.999 | 0.966 | 0.999 | 0.999 | 0.967 | 0.999 | 0.998 |
| SROCC | 0.985 | 0.998 | 1.000 | 0.961 | 0.998 | 1.000 | 0.970 | 0.998 | 1.000 |
As observed in Table 3, all distortion parameter values are generally well predicted, although the performance on Gaussian blur is somewhat weaker. This is because the blur artifact can easily be masked by the subsequent noise and compression distortions. In addition, the non-strict boundary between a blurry image and its pristine version makes blur parameter estimation more difficult; for example, a pristine image that focuses on a specific object can also exhibit blur in the surrounding areas. Despite this inaccuracy, we believe the proposed MPENet is adequate for the distortion parameter estimation task, because the reference image only needs to receive similar (not identical) distortions to achieve a decent matching result.
4. More details about Ref-IRT+
(1) Network architecture of MPENet
To close the performance gap between Ref-IRT trained on synthesized distortions and real-world applications, we additionally trained our method on images corrupted by a practical degradation model [1]. In addition to JPEG compression, this degradation model considers different Gaussian blur kernels (i.e., isotropic and anisotropic Gaussian kernels) to generate the blur distortion, different Gaussian noise models (i.e., channel-independent additive white Gaussian noise (AWGN), gray-scale AWGN, and the general case) to generate the noise distortion, and different sequential orders in which the distortions are added to the image, thereby expanding the degradation space. A simplified sketch of such a degradation pipeline is given below.
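The sketch below is a simplified illustration under our own assumptions (it is not the exact pipeline of [1]): an isotropic/anisotropic Gaussian blur parameterized by a 2×2 covariance matrix, a general channel-correlated Gaussian noise parameterized by a 3×3 covariance matrix, and a JPEG round trip, applied in a configurable blur/noise order.

```python
# Simplified degradation sketch (assumptions, not the exact model of [1]).
import io
import numpy as np
from PIL import Image
from scipy.ndimage import convolve

def gaussian_kernel(cov, size=21):
    """2D Gaussian kernel from a 2x2 covariance matrix (anisotropic if off-diagonal != 0)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
    inv = np.linalg.inv(np.asarray(cov, dtype=float))
    k = np.exp(-0.5 * np.sum((pts @ inv) * pts, axis=1)).reshape(size, size)
    return k / k.sum()

def degrade(img, blur_cov, noise_cov, jpeg_quality, blur_first=True):
    """img: HxWx3 float array in [0, 1]; returns the degraded image in [0, 1]."""
    def blur(x):
        k = gaussian_kernel(blur_cov)
        return np.stack([convolve(x[..., c], k, mode='reflect') for c in range(3)], axis=-1)
    def noise(x):
        n = np.random.multivariate_normal(np.zeros(3), noise_cov, size=x.shape[:2])
        return x + n
    x = noise(blur(img)) if blur_first else blur(noise(img))
    # JPEG round trip with the chosen quality factor
    buf = io.BytesIO()
    Image.fromarray((np.clip(x, 0, 1) * 255).astype(np.uint8)).save(buf, 'JPEG', quality=jpeg_quality)
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float64) / 255.0

# Example: degrade(np.random.rand(64, 64, 3), blur_cov=[[4.0, 1.0], [1.0, 2.0]],
#                  noise_cov=(5 / 255) ** 2 * np.eye(3), jpeg_quality=60)
```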
Figure 3. Network architecture of the modified MPENet in Ref-IRT+.
To enable our approach to work with this new degradation model, the distortion parameter estimation (DPE) block in MPENet had to be modified, since more distortion parameters are required: (1) the 2×2 covariance matrix of the multivariate normal distribution used to generate the isotropic/anisotropic Gaussian kernel; (2) the 3×3 covariance matrix of the Gaussian noise model; and (3) the quality parameter of the JPEG compression. Since the two covariance matrices are symmetric, the numbers of distortion parameters for the three distortion types are three, six, and one, respectively, giving ten parameters in total (see the sketch after this paragraph for the mapping). MPENet also has to predict the sequential order in which the blur and noise distortions are added to the image, because the blur distortion can reduce the perceived noise strength. The modified MPENet is shown in Figure 3, in which branch (a) predicts the sequential order and branch (b) predicts the ten distortion parameters. Note that (a) and (b) are fed with the same three feature vectors, which are obtained from the three preceding average pooling layers. In addition, the modified MPENet takes the RGB color image as input, rather than the luminance image, because the channel-independent, gray-scale, or generalized AWGN is added to all three channels.
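The following minimal sketch reflects our reading of that parameter count (the exact ordering and scaling of the predicted vector are assumptions): packing and unpacking the ten parameters as the upper-triangular entries of the two symmetric covariance matrices plus the JPEG quality.

```python
# Minimal sketch (assumed parameterization, not the released code) of the ten
# distortion parameters: 3 for the symmetric 2x2 blur-kernel covariance,
# 6 for the symmetric 3x3 noise covariance, and 1 for the JPEG quality.
import numpy as np

def unpack_parameters(p):
    """p: length-10 vector -> (2x2 blur covariance, 3x3 noise covariance, JPEG quality)."""
    p = np.asarray(p, dtype=float)
    assert p.shape == (10,)
    blur_cov = np.array([[p[0], p[1]],
                         [p[1], p[2]]])                  # 3 unique entries
    noise_cov = np.array([[p[3], p[4], p[5]],
                          [p[4], p[6], p[7]],
                          [p[5], p[7], p[8]]])           # 6 unique entries
    return blur_cov, noise_cov, p[9]                     # + 1 JPEG quality parameter

def pack_parameters(blur_cov, noise_cov, jpeg_quality):
    """Inverse mapping: upper-triangular entries of the two covariances plus the quality."""
    b = np.asarray(blur_cov, dtype=float)
    n = np.asarray(noise_cov, dtype=float)
    return np.array([b[0, 0], b[0, 1], b[1, 1],
                     n[0, 0], n[0, 1], n[0, 2], n[1, 1], n[1, 2], n[2, 2],
                     float(jpeg_quality)])
```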
(2) Visual results on different distortion types/intensities
Ref-IRT+ was tested on three distortion combinations (blur + JPEG, noise + JPEG, and blur + noise + JPEG) at three distortion intensities (mild, moderate, and severe), giving nine distortion scenarios in total. Figure 4 shows the visual results of Ref-IRT+ on sample distorted images generated from the pristine images of the CUFED5 dataset. The first column shows the input images; the second and third columns show the results produced by the first stage (Ref-IRT-I+) and the final stage (Ref-IRT+), respectively. As can be observed, the reference image is less likely to help when images are corrupted by noise and JPEG compression. Also, when images are only mildly distorted, the reference image might be unnecessary, because a deep CNN model alone can be good enough to recover the mildly-distorted image content.
[Figure 4 image grid omitted: for each of Blur + JPEG, Noise + JPEG, and Blur + Noise + JPEG at mild, moderate, and severe intensities, the columns show the Input, Ref-IRT-I+ (first stage), and Ref-IRT+ (final stage) results.]
Figure 4. Visual results of Ref-IRT+ tested on sample distorted images generated from the pristine images in the CUFED5 dataset with different distortion types and intensities.
5. Test on real-world images
We also tested our algorithm on real-world images. To this end, images taken from the LIVE Challenge dataset [2] and images captured by our own camera were used for testing. For the LIVE Challenge test, the reference images were randomly selected from the 127 pristine images of the LIVE [5], CSIQ [6], and CBSD68 [7] datasets. For our own images, the target image was captured with a lower-quality web camera, and the reference image of a similar scene was captured with another, higher-quality camera.
Visual results of Ref-IRT+ on sample images from the LIVE Challenge dataset [2] are shown in Figure 5, and a visual comparison of Ref-IRT against the other MDIR methods on an image captured by our own camera is shown in Figure 6. As can be observed, Ref-IRT+ is able to remove the blur, noise, and compression artifacts in real-world images thanks to training with the practical degradation model [1]. Also, by exploiting a reference image, Ref-IRT achieves better results than the other MDIR methods.
[Figure 5 image grid omitted: pairs of Input images and the corresponding Ref-IRT+ results.]
Figure 5. Visual results of Ref-IRT+ tested on sample real-world images from the LIVE Challenge dataset.
[Figure 6 image grid omitted: the Input image and the results of RL-Restore, OWAN, HOWAN, RMBN, MEPS, DnCNN, DuRN, MIRNet, COLA-Net, SwinIR, Restormer, DoubleUNet, W-Net, StackUNet, TTSR, RefVAE, MASA, DATSR, and Ref-IRT.]
Figure 6. Visual comparison of Ref-IRT vs. other MDIR methods tested on a real-world image captured by using a web camera.
References
[1] K. Zhang, J. Liang, L. Van Gool, and R. Timofte, "Designing a practical degradation model for deep blind image super-resolution," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4791–4800.
[2] D. Ghadiyaram and A. C. Bovik, "Massive online crowdsourced study of subjective and objective picture quality," IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372–387, 2016.
[3] Z. Zhang, Z. Wang, Z. Lin, and H. Qi, "Image super-resolution by neural texture transfer," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 7982–7991.
[4] Y. Jiang, K. C. Chan, X. Wang, C. C. Loy, and Z. Liu, "Robust reference-based super-resolution via C2-matching," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2103–2112.
[5] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, "Image and video quality assessment research at LIVE," Online: http://live.ece.utexas.edu/research/quality/.
[6] E. C. Larson and D. M. Chandler, "Most apparent distortion: full-reference image quality assessment and the role of strategy," Journal of Electronic Imaging, vol. 19, no. 1, p. 011006, 2010.
[7] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in IEEE International Conference on Computer Vision (ICCV), vol. 2, 2001, pp. 416–423.