SRTGAN: Triplet Loss based Generative Adversarial Network for Real-World Super-Resolution

Dhruv Patel¹*

Abhinav Jain¹*

Simran Bawkar¹

Manav Khorasiya¹

Kalpesh Prajapati¹

Kishor Upla¹

Kiran Raja²

Raghavendra Ramachandra²

Christoph Busch²

Presented at the 7th International Conference on Computer Vision & Image Processing 2022

* denotes equal contribution
1 Sardar Vallabhbhai National Institute of Technology (SVNIT), Surat, India.
2 Norwegian University of Science and Technology (NTNU), Gjøvik, Norway.

[Paper]

[GitHub]

Motivation

Single Image Super-Resolution (SISR)

Triplet loss-based GAN for Real-world Super-Resolution (SRTGAN)

Overview

We propose a triplet loss-based patch GAN, comprising a generator trained in a multi-loss setting and a patch-based discriminator that assists it.
We implement a triplet-based adversarial GAN loss that exploits the information in the LR image by using it as a negative sample. This allows the patch-based discriminator to better differentiate between HR and LR images, and thereby improves the adversary.
The generator is trained on a fusion of content (pixel-wise L1), GAN (triplet-based), Quality Assessment (QA), and perceptual losses, which improves both the quantitative and the subjective quality of the SR results.
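
The idea of a triplet-style adversarial term and a weighted fusion of losses can be sketched as follows. This is a minimal illustration, not the exact formulation used in the paper: it assumes a generic hinge-style triplet loss over flattened patch scores, with the SR output as the anchor, the HR image as the positive, and the LR image as the negative. The function names, the margin, and the loss weights are all hypothetical.

```python
from typing import Sequence, Tuple


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized, flattened score maps."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score maps must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def triplet_gan_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # hypothetical margin, not taken from the paper
) -> float:
    """Generic triplet loss: pull the anchor towards the positive sample and
    push it at least `margin` further away from the negative sample."""
    pull = mean_abs_diff(anchor, positive)
    push = mean_abs_diff(anchor, negative)
    return max(pull - push + margin, 0.0)


def total_generator_loss(
    content: float,
    gan: float,
    qa: float,
    perceptual: float,
    weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),  # hypothetical
) -> float:
    """Weighted sum of the four loss terms named in the overview."""
    w_content, w_gan, w_qa, w_perceptual = weights
    return w_content * content + w_gan * gan + w_qa * qa + w_perceptual * perceptual


# Patch-level discriminator scores for the SR, HR, and LR images (made-up numbers).
d_sr, d_hr, d_lr = [0.6, 0.7], [0.9, 0.9], [0.1, 0.2]
gan_term = triplet_gan_loss(anchor=d_sr, positive=d_hr, negative=d_lr)
```

With this formulation the GAN term falls to zero once the SR scores are closer to the HR scores than to the LR scores by at least the margin, which is how the LR image contributes as a negative sample.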

Modules

Our proposed framework consists of two major components:

Generator: Trained in a multi-loss setting, it comprises three modules:
- LLIE: comprises a convolutional layer to extract low-level edge and structural information.
- HLIE: comprises 32 Residual-In-Residual (RIR) blocks and a convolutional layer to extract high-level information.
- SRRec: comprises an upsampling block and 2 convolutional layers to reconstruct spatial dimensions same as input.
Discriminator: A PatchGAN-based discriminator network that distinguishes foreground and background on patches of 70x70 pixels.
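
To show where a 70x70 patch size can come from, the sketch below computes the receptive field of a stack of convolutions. The layer configuration is an assumption: it is the commonly used PatchGAN layout (five 4x4 convolutions with strides 2, 2, 2, 1, 1), and the discriminator in this work may differ in its details. The names `receptive_field` and `PATCHGAN_70` are illustrative.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (in input pixels) of one output unit.

    `layers` lists (kernel_size, stride) pairs from input to output.
    """
    rf = 1    # a single output unit initially covers one pixel
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Assumed layout: the widely used 70x70 PatchGAN configuration.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(PATCHGAN_70))  # prints 70
```

Each value in the discriminator's output map therefore judges one 70x70 region of the input rather than the whole image.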

Generator Network

Discriminator Network

Quantitative Analysis

To compare the SR results of the proposed method with those of other state-of-the-art methods, we report PSNR and SSIM, which are the standard measures for the SR problem.
However, these metrics do not fully reflect quality as perceived by humans. We therefore report an additional metric, LPIPS, which is a deep-network-based full-reference perceptual quality assessment score. A lower LPIPS value indicates better visual quality.
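
For reference, PSNR follows directly from the mean squared error between the HR reference and the SR estimate. The sketch below uses the standard definition on flattened pixel lists; the function name and the 8-bit default peak value are illustrative choices, and it is not the evaluation script used for the reported results.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Four pixels, each off by 10 grey levels: MSE = 100, so PSNR = 10 * log10(255^2 / 100).
hr = [100, 100, 100, 100]
sr = [110, 90, 110, 90]
score = psnr(hr, sr)
```

Because PSNR depends only on pixel-wise error, two results with the same score can look very different, which is why a perceptual score such as LPIPS is reported alongside it.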

The quantitative comparison of the proposed and other existing SR methods on RealSR validation and DIV2KRK datasets

Qualitative Analysis

The comparison of the SR results obtained using the proposed and other state-of-the-art methods on RealSR validation dataset

The comparison of the SR results obtained using the proposed and other state-of-the-art methods on DIV2KRK dataset

Contact

If you have any questions, please reach out to any of the above-mentioned authors.