Generating Abstract Artwork with Python

Introduction

In this project we researched and created a portfolio of generative, abstract artwork using GANs (generative adversarial networks), a deep learning technique, with Python and PyTorch. We also used a range of supporting technologies, such as MASSIVE and Weights & Biases, to train and tune our models and produce the highest-quality artwork we could. All development was tracked with version control using Git and GitLab.

With no pre-existing knowledge of GANs, our project began with intensive research to build a core understanding of how GANs work, and specifically how they can be applied to generating images.

There are many great articles online covering the fundamentals of traditional GANs: generators, discriminators, and their corresponding loss functions. In this article we hope to go further and offer a perspective on some recently published GAN models, and on how we incorporated a few of them into our own models.

Figure 1: High Level Overview of GAN Model

Note: Created using Lucidchart

Specifically, we will give a high-level overview of ESRGANs (Enhanced Super-Resolution Generative Adversarial Networks) and DCGANs (Deep Convolutional Generative Adversarial Networks), and show how we integrated CANs (Creative Adversarial Networks) into our own models.

GAN Models

DCGAN

We came across DCGANs in the initial stages of our research. A DCGAN is an extension of a GAN that uses convolutional layers in the discriminator and convolutional-transpose layers in the generator (Inkawhich, 2022). The idea is to use convolutional neural networks (CNNs) for all downsampling and upsampling, which yields a more stable architecture; max pooling and fully connected layers are not used.

In summary, a DCGAN works by (Hui, 2016), as sketched in the code after this list:

  1. Replacing all max pooling with strided convolutions

  2. Using transposed convolutions for upsampling

  3. Eliminating fully connected layers

  4. Applying batch normalisation everywhere except the generator's output layer and the discriminator's input layer

  5. Using ReLU activations in the generator

  6. Using LeakyReLU activations in the discriminator.
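
Here is a minimal PyTorch sketch of a generator and discriminator following these guidelines. It assumes 64x64 RGB images and a 100-dimensional latent vector; these values and the layer widths are our assumptions, following the PyTorch DCGAN tutorial rather than any one of our final models:

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # Transposed convolutions upsample the latent vector
            # from 1x1 to 4x4, 8x8, 16x16, 32x32, then 64x64.
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # No batch norm on the output layer; tanh maps to [-1, 1].
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)

class Discriminator(nn.Module):
    def __init__(self, ndf=64, nc=3):
        super().__init__()
        self.main = nn.Sequential(
            # Strided convolutions downsample; no batch norm on the
            # input layer, and LeakyReLU throughout.
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # A single-logit convolutional output; no fully connected layers.
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.main(x).view(-1)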

ESRGAN

After producing our abstract artwork, we used a pre-existing ESRGAN model, published by Xintao Wang and his team in 2018, to increase its quality.

Figure 2: GAN Generated Abstract Art 

Note: The left-hand side is an image generated by the MDN team; the right-hand side shows the same image enhanced with the ESRGAN.

In essence, the ESRGAN model strives to achieve “consistently better visual quality” while also maintaining a “realistic and natural texture” (Wang et al., 2018, p. 1). 

Most importantly, the model focuses solely on relative realism: its discriminator effectively asks "whether one image is more realistic than the other", whereas a typical GAN asks "whether one image is real or fake" (Wang et al., 2018, p. 2).

To create the ESRGAN, the authors improve the adversarial and perceptual losses of the existing SRGAN (Super-Resolution Generative Adversarial Network) using techniques such as a relativistic GAN ("predict relative realness instead of absolute"), residual scaling, and smaller initialization (Wang et al., 2018, p. 4).
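
To make the relativistic idea concrete, here is a minimal sketch of the relativistic average losses described in the paper. The tensors real_logits and fake_logits are hypothetical raw discriminator outputs (before any sigmoid); this is our illustration of the technique, not the authors' code:

import torch
import torch.nn.functional as F

def relativistic_d_loss(real_logits, fake_logits):
    # Ask "is the real image more realistic than the average fake?"
    # rather than "is this image real?" (Wang et al., 2018).
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    loss_real = F.binary_cross_entropy_with_logits(
        real_rel, torch.ones_like(real_rel))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_rel, torch.zeros_like(fake_rel))
    return (loss_real + loss_fake) / 2

def relativistic_g_loss(real_logits, fake_logits):
    # The generator gets the symmetric objective: make its fakes look
    # more realistic than the average real image.
    real_rel = real_logits - fake_logits.mean()
    fake_rel = fake_logits - real_logits.mean()
    loss_real = F.binary_cross_entropy_with_logits(
        real_rel, torch.zeros_like(real_rel))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_rel, torch.ones_like(fake_rel))
    return (loss_real + loss_fake) / 2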

CAN 

Where a traditional GAN only distinguishes real images from fake ones, a CAN creates images by maximizing deviation from established art styles while minimizing deviation from the overall art distribution.

It generates variability by adjusting its parameters to deviate from the styles and designs present in the dataset. Because of this, the creativity and variability of the generated images is bounded by how far the model can deviate from that dataset (Elgammal et al., 2017).

Although the artwork this model produces is not truly novel, we adjusted its parameters to produce simple abstract artwork.

Figure 3: CAN Model Diagram

Note: Created using Lucidchart, modelled on Elgammal et al. (2017, p. 7)

The generator receives two signals from the discriminator:

Signal 1: The discriminator's classification of whether the image is art or not. The generator minimises this loss, so its images emulate art.

Signal 2: How confidently the discriminator can classify the generated art into established styles. The generator maximises this loss, pushing its images away from any one known style.

The discriminator is the most important model during training, as it accounts for both the "art distribution and art styles/classes (K)" (Elgammal et al., 2017, p. 9); the style component is learned through a K-way classification loss.
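
Under our reading of the paper, the generator's two-signal objective can be sketched as below. Here real_fake_logits and style_logits are hypothetical outputs of the discriminator's art/not-art head and its K-way style head, on a batch of generated images:

import torch
import torch.nn.functional as F

def can_generator_loss(real_fake_logits, style_logits):
    # Signal 1: look like art. A standard GAN loss asking the
    # discriminator's art/not-art head to output "art" (label 1).
    art_loss = F.binary_cross_entropy_with_logits(
        real_fake_logits, torch.ones_like(real_fake_logits))

    # Signal 2: be stylistically ambiguous. Cross-entropy between the
    # K-way style prediction and the uniform distribution over the K
    # styles; minimising it maximises the discriminator's confusion.
    log_probs = F.log_softmax(style_logits, dim=1)
    style_ambiguity_loss = -log_probs.mean(dim=1).mean()

    return art_loss + style_ambiguity_loss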

Figure 4: CAN Generated Abstract Artwork

Note: Produced by the team

Deep Learning Engineer Experiences

ALINA ERMOLAEVA 

My model is built upon DCGANs: it takes abstract images and resizes them to 64x64 pixels. Other sizes, such as 128x128 pixels, were also tested, but they increased training time immensely; 64x64 pixels was therefore the preferred size, as it allowed more results to be generated while varying other hyperparameters. Like any other GAN, this model implements a generator and a discriminator. My discriminator uses convolutional neural networks (CNNs) as its basic building block, while the generator converts a latent tensor into an image using transposed convolutions (ConvTranspose2d from PyTorch).

The discriminator's training relies on the binary cross-entropy loss function, which measures how well it differentiates between real and generated images by comparing each predicted probability to the actual label, which is either 0 (generated) or 1 (real).

Both the generator and discriminator were trained at the same time using the Adam optimiser, an optimisation algorithm that adapts the network's weights and per-parameter learning rates in order to reduce the losses.
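
As a rough sketch, one alternating training step might look like the following. It reuses the Generator and Discriminator classes sketched earlier; the learning rate and betas are the common DCGAN defaults, not necessarily the exact values this model used:

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
netG, netD = Generator().to(device), Discriminator().to(device)
criterion = nn.BCELoss()  # binary cross-entropy on the discriminator's probabilities
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
optD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real_images):
    b = real_images.size(0)
    real_images = real_images.to(device)
    noise = torch.randn(b, 100, 1, 1, device=device)  # 100-dim latent, as above
    fake_images = netG(noise)

    # Discriminator step: real images are labelled 1, generated images 0.
    optD.zero_grad()
    loss_d = criterion(netD(real_images), torch.ones(b, device=device)) + \
             criterion(netD(fake_images.detach()), torch.zeros(b, device=device))
    loss_d.backward()
    optD.step()

    # Generator step: try to make the discriminator output 1 on fakes.
    optG.zero_grad()
    loss_g = criterion(netD(fake_images), torch.ones(b, device=device))
    loss_g.backward()
    optG.step()
    return loss_d.item(), loss_g.item()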

As training proceeds, the model learns from the images it has previously generated, and the quality of each successive batch of images improves.

Conclusion

We appreciate all the guidance and support from the entire MDN team. A huge thanks to Will Maclean, the Deep Learning Lead, for overseeing the entire project throughout the semester, and an honourable mention to Kamron for assisting, researching and supporting the team during the initial stages of the project.

REFERENCES

Elgammal, A., Liu, B., et al. (2017). CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms. https://arxiv.org/pdf/1706.07068.pdf

Hui, J. (2016). GAN — DCGAN (Deep convolutional generative adversarial networks). Retrieved from Medium: https://jonathan-hui.medium.com/gan-dcgan-deep-convolutional-generative-adversarial-networks-df855c438f

Inkawhich, N. (2022). DCGAN Tutorial. Retrieved from PyTorch: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

Wang, X., Yu, K., et al. (2018). ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. https://openaccess.thecvf.com/content_ECCVW_2018/papers/11133/Wang_ESRGAN_Enhanced_Super-Resolution_Generative_Adversarial_Networks_ECCVW_2018_paper.pdf
