SRTGAN: Triplet Loss based Generative Adversarial Network for Real-World Super-Resolution

Dhruv Patel1*
Abhinav Jain1*
Simran Bawkar1
Manav Khorasiya1
Kalpesh Prajapati1
Kishor Upla1
Kiran Raja2
Raghavendra Ramachandra2
Christoph Busch2

Presented at the 7th International Conference on Computer Vision & Image Processing 2022
* denotes equal contribution
1 Sardar Vallabhbhai National Institute of Technology (SVNIT), Surat, India.
2 Norwegian University of Science and Technology (NTNU), Gjøvik, Norway.


    Real-world applications (surveillance, forensics, etc.) require high-resolution (HR) images, but these are often unavailable due to the limitations and cost of optical sensors, giving rise to the need for Single Image Super-Resolution (SISR).

    We tackle this problem by proposing SRTGAN, a triplet-loss-based GAN for real-world super-resolution, which exploits the information in the LR image through a triplet loss formulation and improves both the adversarial training and the perceptual quality of the generated images.

    The images below show a true LR image and the corresponding LR image bicubic-downsampled from the ground-truth HR.
    RealSR dataset
    DIV2KRK dataset


  • We propose a triplet-loss-based patch GAN: a generator trained in a multi-loss setting and assisted by a patch-based discriminator.
  • We implement a triplet-based adversarial GAN loss that exploits the information provided by the LR image (as the negative sample). This allows the patch-based discriminator to better differentiate between HR and LR images, thereby strengthening the adversarial training.
  • Training is performed on a fusion of content (pixel-wise L1), GAN (triplet-based), Quality Assessment (QA), and perceptual losses, leading to superior quantitative scores and subjective quality in the SR results.
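As a sketch of the triplet-based GAN loss idea (an illustrative formulation, not the paper's exact equations), the generator can be penalized with a standard triplet margin loss over discriminator outputs, using the HR image as the positive and the upsampled LR input as the negative:

```python
import torch
import torch.nn.functional as F

def triplet_gan_loss(d_sr, d_hr, d_lr, margin=1.0):
    """Sketch of a triplet-based generator loss: pull the discriminator's
    response to the SR output toward its response to the real HR image
    (positive) and away from its response to the upsampled LR input
    (negative). d_sr, d_hr, d_lr: patch-wise discriminator outputs of
    shape (N, C, H, W). Function name and shapes are assumptions.
    """
    a = d_sr.flatten(1)  # anchor: generated SR
    p = d_hr.flatten(1)  # positive: ground-truth HR
    n = d_lr.flatten(1)  # negative: upsampled LR
    return F.triplet_margin_loss(a, p, n, margin=margin)
```

When the SR output is indistinguishable from HR and far from LR, the loss is zero; a generator that still resembles the LR input is penalized up to the margin.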


Proposed Framework

Our proposed framework consists of 2 major components:

  • Generator: trained in a multi-loss setting and composed of 3 modules:
    • LLIE: a convolutional layer that extracts low-level edge and structural information.
    • HLIE: 32 Residual-In-Residual (RIR) blocks followed by a convolutional layer to extract high-level information.
    • SRRec: an upsampling block and 2 convolutional layers that upscale the features and reconstruct the SR output.

  • Discriminator: a PatchGAN-based discriminator network that classifies each 70×70 patch of the image as real or fake.
    (Figures: Generator and Discriminator network architectures)
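A minimal PyTorch sketch of the generator's three modules (module names follow the text above; the channel width, a block count of 4 instead of 32, the simplified residual blocks, and the PixelShuffle upsampler are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Simplified stand-in for one Residual-In-Residual (RIR) block.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class GeneratorSketch(nn.Module):
    """Sketch of the LLIE / HLIE / SRRec pipeline; layer counts and the
    global residual connection are assumptions, not the exact network."""
    def __init__(self, ch=64, n_blocks=4, scale=4):
        super().__init__()
        # LLIE: one conv for low-level edge/structural features
        self.llie = nn.Conv2d(3, ch, 3, padding=1)
        # HLIE: stack of residual blocks + conv for high-level features
        self.hlie = nn.Sequential(
            *[ResidualBlock(ch) for _ in range(n_blocks)],
            nn.Conv2d(ch, ch, 3, padding=1))
        # SRRec: upsampling block + 2 convs to reconstruct the SR image
        self.srrec = nn.Sequential(
            nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        f = self.llie(x)
        f = f + self.hlie(f)  # global residual skip (assumed)
        return self.srrec(f)
```

For a 4x model, a 16×16 LR input yields a 64×64 SR output.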

Quantitative Analysis

  • PSNR and SSIM, the standard measures for the SR problem, are computed to compare the SR results of the proposed method with those of other state-of-the-art methods.
  • However, these metrics do not fully capture quality as perceived by humans. We therefore also report LPIPS, a deep-network-based full-reference perceptual quality score; a lower LPIPS value indicates better visual quality.
  • The quantitative comparison of the proposed and other existing SR methods on the RealSR validation and DIV2KRK datasets is shown below.
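For reference, PSNR follows directly from the mean squared error between the SR result and the ground-truth HR image (LPIPS, by contrast, requires a pretrained network, e.g. via the `lpips` package). A minimal NumPy sketch:

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between an SR result and the
    ground-truth HR image (arrays of identical shape, 8-bit range by
    default). Illustrative helper, not the paper's evaluation code."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR means lower pixel-wise distortion, which is why it pairs naturally with the pixel-wise L1 content loss but can disagree with perceptual metrics such as LPIPS.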

Qualitative Analysis

    Comparison of the SR results obtained using the proposed and other state-of-the-art methods on the RealSR validation dataset.

    Comparison of the SR results obtained using the proposed and other state-of-the-art methods on the DIV2KRK dataset.



If you have any questions, please reach out to any of the authors listed above.