One model can implement multiple image modification tasks

One model can handle multiple image-modification tasks! The author of this project combines techniques such as the self-attention GAN and progressive training to achieve stunning colorization of old photos.

Image colorization, image enhancement, and the restoration of old photos are hot topics in computer vision; however, few studies use a single model to handle multiple such tasks well.

Recently, GitHub user Jason Antic released a project called DeOldify, a remarkable tool for colorizing and restoring old photos.

Address: https://github.com/jantic/DeOldify

Let's look at the results first:

Maria Anderson as the Fairy Fleur de farine and Lyubov Rabtsova as her page in the ballet “Sleeping Beauty” at the Imperial Theater, St. Petersburg, Russia, 1890.

Woman relaxing in her living room (1920, Sweden)

Medical Students pose with a cadaver around 1890

Surfer in Hawaii, 1890

Whirling Horse, 1898

Interior of Miller and Shoemaker Soda Fountain, 1899

Paris in the 1880s

Edinburgh from the sky in the 1920s

(For more example results, see the project link above.)

Technical details

This is a deep-learning-based model. Specifically, I combined the following methods:

Self-Attention Generative Adversarial Network (https://arxiv.org/abs/1805.08318). The generator is a pretrained U-Net, which I modified to add spectral normalization and self-attention. At first I tried to implement a Wasserstein GAN, but it didn't work; after switching to this version, everything was fine. I like the theory behind Wasserstein GANs, but they didn't succeed in practice, so I like Self-Attention GANs.
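To make this concrete, here is a minimal PyTorch sketch of a SAGAN-style self-attention block built from spectral-normalized 1×1 convolutions. It illustrates the technique and is not DeOldify's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial positions (arXiv:1805.08318)."""
    def __init__(self, channels):
        super().__init__()
        # Spectral-normalized 1x1 convolutions project features to query/key/value
        self.query = spectral_norm(nn.Conv1d(channels, channels // 8, 1))
        self.key   = spectral_norm(nn.Conv1d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv1d(channels, channels, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend, starts at zero

    def forward(self, x):
        b, c, h, w = x.size()
        flat = x.view(b, c, h * w)                    # (B, C, N) with N = H*W
        q = self.query(flat).permute(0, 2, 1)         # (B, N, C//8)
        k = self.key(flat)                            # (B, C//8, N)
        attn = F.softmax(torch.bmm(q, k), dim=-1)     # (B, N, N) attention map
        v = self.value(flat)                          # (B, C, N)
        out = torch.bmm(v, attn.permute(0, 2, 1))     # attend over all positions
        return self.gamma * out.view(b, c, h, w) + x  # residual connection
```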

The training structure is inspired by Progressive Growing of GANs (https://arxiv.org/abs/1710.10196), but it's not exactly the same. The main difference is that my version keeps the same number of layers throughout; I just incrementally increase the input size and adjust the learning rate to make sure each size transition succeeds. The end result is basically the same: faster, more stable training, and better generalization.
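The scheme might look roughly like the sketch below, where the architecture is fixed and only the input resolution and learning rate change between phases. Here `model`, `train_one_phase`, and the schedule values are hypothetical placeholders, not the project's actual settings:

```python
import torch
from torchvision import transforms

# (input size, learning rate) per phase -- illustrative values only
schedule = [(64, 1e-3), (128, 5e-4), (192, 2e-4)]

optimizer = torch.optim.Adam(model.parameters(), lr=schedule[0][1])

for size, lr in schedule:
    resize = transforms.Resize((size, size))   # same data, rendered larger each phase
    for group in optimizer.param_groups:
        group['lr'] = lr                       # lower the rate as resolution grows
    train_one_phase(model, resize, optimizer)  # hypothetical training helper
```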

Two Time-Scale Update Rule (https://arxiv.org/abs/1706.08500). This is also very simple: just one-to-one generator/critic iterations and a higher learning rate for the critic.
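A minimal sketch of that update rule, assuming `generator`, `critic`, and `loader` are already defined, and using a WGAN-style score purely for illustration (the exact rates and loss form are assumptions, not the project's):

```python
import torch

# The critic gets a higher learning rate than the generator (illustrative values)
opt_gen    = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
opt_critic = torch.optim.Adam(critic.parameters(),    lr=4e-4, betas=(0.0, 0.9))

for real, gray in loader:
    # one critic step: score real images high, generated images low
    opt_critic.zero_grad()
    critic_loss = critic(generator(gray).detach()).mean() - critic(real).mean()
    critic_loss.backward()
    opt_critic.step()

    # one generator step: fool the critic
    opt_gen.zero_grad()
    gen_loss = -critic(generator(gray)).mean()
    gen_loss.backward()
    opt_gen.step()
```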

The generator loss consists of two parts. The first is a basic perceptual loss (or feature loss) based on VGG16, which essentially just biases the generator toward replicating the input image. The second is the critic's loss score. For the curious: perceptual loss alone is not enough to produce good results; it tends to just encourage a bunch of brown/green/blue, essentially gaming the test, and neural networks are great at doing that! The key realization here is that the GAN is effectively learning the loss function for you, which is a real step toward the ideal we're after in machine learning. You usually get better results when you hand something previously hand-coded by humans over to machine learning, and this project is a case in point.
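A rough sketch of such a two-part loss, assuming a trained `critic`; the VGG16 layer cutoff and the `adv_weight` blend factor below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# VGG16 up to an intermediate layer acts as a fixed feature extractor
vgg_features = vgg16(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def generator_loss(generated, target, critic, adv_weight=0.1):
    perceptual = F.l1_loss(vgg_features(generated), vgg_features(target))
    adversarial = -critic(generated).mean()  # reward images the critic scores highly
    return perceptual + adv_weight * adversarial
```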

What's amazing about this model is that it should be useful for a wide variety of image-modification tasks, and should do them well. The examples above come from the colorization model, but that's only one part of the pipeline; more tasks can be developed with the same model.

The next task I tackled with this model is "defade": restoring old images to make them look better. I've made an initial effort and, as I write this, I'm in the early stages of training. Basically, it trains the same model to reconstruct images, using exaggerated contrast/brightness adjustments to simulate faded photos and photos taken with old or broken equipment. I've already seen some promising results:
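The degradation step could be sketched like this with PIL, where the parameter ranges are illustrative assumptions rather than the project's actual augmentation settings:

```python
import random
from PIL import Image, ImageEnhance

def simulate_fading(img):
    """Degrade a clean photo so the model can learn to undo the damage."""
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.3, 0.8))    # flatten contrast
    img = ImageEnhance.Brightness(img).enhance(random.uniform(1.1, 1.6))  # wash it out
    return img

# Training pairs: input = simulate_fading(photo), target = photo
```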

How to start this project

That's the gist of this project: I want to use GANs to make old photos look better and, more importantly, to make the project useful.

This project was built with the Fast.AI library. Unfortunately it uses an older version, and I haven't upgraded it to the newer one yet. So the prerequisites are:

Old version of Fast.AI library: https://github.com/jantic/fastai

Fast.AI's existing dependencies: convenient requirements.txt and environment.yml files are already provided.

PyTorch 0.4.1 (spectral_norm is required, so you need the latest stable version).

Jupyter Lab

Tensorboard (i.e., install TensorFlow) and TensorboardX (https://github.com/lanpa/tensorboardX). Note that, by default, progress images are written to Tensorboard every 200 iterations, so you can always conveniently see what the model is doing; see the logging sketch after this list.

A BEEFY graphics card. Mine is a GeForce 1080 Ti (11 GB), and I really wish I had more than 11 GB. Both the U-Net and the critic are very large, so the bigger the card, the better.
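As for the Tensorboard logging mentioned above, a minimal TensorboardX sketch might look like this; `loader` and `output_image` stand in for the training loop's own objects, and the tag and directory names are illustrative:

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter('runs/deoldify')  # log directory is illustrative

for iteration, batch in enumerate(loader):
    # ... one training step on `batch` ...
    if iteration % 200 == 0:  # matches the default interval noted above
        writer.add_image('progress/colorized', output_image, iteration)
```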

For those who want to start converting their own images right away: well, you'll need to wait until I upload the pretrained weights. Once they're available, they can be referenced in the visualization notebooks. I use ColorizationVisualization.ipynb; you just need to replace colorizer_path=IMAGENET.parent/('bwc_rc_gen_192.h5') with the generator (colorizer) weights file I upload.

Assuming you're running the program on a GPU with enough memory (e.g., an 11 GB GeForce 1080 Ti), I'd keep the size around 500px. With less than 11 GB, you can downscale the images or try running on the CPU.
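If you need to shrink images first, a small helper along these lines would do; it's a generic sketch, not part of the project's code:

```python
from PIL import Image

def limit_size(img, max_side=500):
    """Downscale so the long side is at most max_side pixels."""
    scale = max_side / max(img.size)
    if scale < 1:
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.BICUBIC)
    return img
```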
