In previous articles I discussed the opportunities for machine learning in digital asset management (DAM). Most of those articles focused on leveraging machine intelligence to gain insights into asset content, and on feeding those insights into the digital asset management, publishing, and analytics processes. Highlighted examples included automatically extracting metadata from an image and detecting the location of a face to perform better cropping.
A different area is using machine learning to generate or extend digital assets. A recent article in the New York Times touched on this. The unique challenge in using machine learning for this purpose is that it is hard to test whether the algorithm has actually done a good job.
For example, it is relatively easy to verify whether the discovered metadata of an image is correct, and you could easily create a training set with labeled responses for the network. For a generated image, it is harder to determine whether the result is as good as a real image. Often there is not just one good answer; in many cases there are many, or even infinitely many, good options.
To address this, the concept of generative adversarial networks (GANs) was developed. A GAN consists of two machine learning algorithms (neural networks): one that generates an image, and another that assesses whether the image is real or generated. The two algorithms compete with each other. Instead of learning from a labeled training set, each network is trained against the other, allowing for unsupervised learning.
Generative Adversarial Networks (GANs)
GANs were first introduced by Ian Goodfellow et al. in a paper published in 2014. The publication of this paper generated a lot of momentum, and we have since seen some great ideas for applying GANs to digital assets.
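The adversarial loop described above can be illustrated with a deliberately tiny sketch: a one-parameter-pair "generator" and a logistic-regression "discriminator" trained against each other on 1D Gaussian data. All names, hyperparameters, and model shapes here are illustrative assumptions, not taken from any of the referenced repositories; real GANs use deep convolutional networks in place of these linear toys.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w):
    """Map noise z to 'fake' samples; output distribution is N(w[1], w[0]**2)."""
    return z * w[0] + w[1]

def discriminator(x, v):
    """Probability that x is a real sample (logistic regression)."""
    return 1.0 / (1.0 + np.exp(-(x * v[0] + v[1])))

def sample_real(n):
    """'Real' data: a 1D Gaussian the generator must learn to imitate."""
    return rng.normal(4.0, 1.25, n)

w = np.array([1.0, 0.0])   # generator parameters (scale, shift)
v = np.array([0.1, 0.0])   # discriminator parameters (slope, bias)
lr = 0.01

for step in range(2000):
    z = rng.normal(0.0, 1.0, 32)
    fake = generator(z, w)
    real = sample_real(32)

    # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)),
    # with gradients derived by hand for the logistic model.
    d_real = discriminator(real, v)
    d_fake = discriminator(fake, v)
    v += lr * np.array([
        np.mean((1 - d_real) * real) - np.mean(d_fake * fake),
        np.mean(1 - d_real) - np.mean(d_fake),
    ])

    # Generator step: gradient ascent on log D(fake) (the non-saturating loss).
    d_fake = discriminator(fake, v)
    g = (1 - d_fake) * v[0]   # d log D(fake) / d fake
    w += lr * np.array([np.mean(g * z), np.mean(g)])

# After training, the generator's output mean should have drifted toward 4.0.
gen_mean = float(np.mean(generator(rng.normal(0.0, 1.0, 1000), w)))
```

Note that neither network ever sees a labeled example; the generator's only learning signal is how well it fools the discriminator, which is exactly the unsupervised setup the article describes.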
Some interesting examples are:
Generate high-resolution images from low-resolution versions. Use machine learning to reconstruct a high-resolution image from a lower-resolution original (super-resolution). An example code repository: https://github.com/tensorlayer/SRGAN
Transform images. Automatically transform an image into a related image; for example, turn a horse into a zebra, or a summer landscape into a winter landscape. An example code repository: https://github.com/junyanz/CycleGAN
Transform styles. Transform one image into another based on a provided style image. An example code repository: https://github.com/luanfujun/deep-photo-styletransfer
Generate images. Automatically create real-looking images. The generation of “celebrity” images was one of the highlighted areas in the referenced New York Times article, and the video below highlights some of the output from that work. An example code repository: https://github.com/facebook/eyescream
Generate video frames. Use still photos to generate a video, or extend a video by generating additional frames based on prior content. The image below, from a Facebook post, outlines the difference between the results computed by a traditional neural network and by a GAN (the adversarial GDL network). An example code repository: https://github.com/cvondrick/videogan
Improve image manipulation. Support the image editing process to make edits look more natural. An example code repository: https://github.com/junyanz/iGAN
Generate images based on text. Automatically generate an image from a provided textual description. A good summary of the techniques used for this is in this Microsoft post. An example code repository: https://github.com/hanzhanggit/StackGAN
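The image-transformation examples above (such as CycleGAN) share an idea worth spelling out: two generators are trained in opposite directions, and a cycle-consistency penalty ensures that a horse turned into a zebra can be turned back into (roughly) the same horse. Below is a minimal sketch of that penalty only; the placeholder linear "generators" are illustrative stand-ins for the deep convolutional networks a real CycleGAN would use.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should recover x, and G(F(y)) should recover y."""
    return float(np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y)))

# Placeholder "generators": a perfectly invertible pair of linear maps,
# so the cycle loss is zero. Real generators only approximate this.
G = lambda x: 2.0 * x + 1.0     # domain X -> domain Y
F = lambda y: (y - 1.0) / 2.0   # domain Y -> domain X

x = np.linspace(-1.0, 1.0, 5)   # toy "images" from domain X
y = G(x)                        # toy "images" from domain Y
loss = cycle_consistency_loss(x, y, G, F)
```

This term is added to the usual adversarial losses of both generators; without it, a generator could map every input to any plausible-looking output and still fool its discriminator, losing the correspondence between input and output.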
GANs & Digital Asset Management
The examples above show some interesting new possibilities, and many of them will have a benefit in a DAM context. Improved asset editing tools built into the DAM are an obvious first step. The digital asset management system could also start incorporating a cognitive ability to interpret more abstract commands (e.g. ‘make this person look happy’) to enhance assets.
Going beyond the manual tools, the DAM itself might gain the ability to automatically optimize and improve photography by leveraging machine learning.
A frequent challenge for many clients is scaling asset production. When assets can be generated on demand, the ability to generate (related) images might become a built-in function of digital asset management systems. This will open up new opportunities where asset production is currently a barrier: richer personalized online experiences, with unique image variations for each individual, become a possibility.
Another interesting aspect is digital rights management for assets. If an original photo is used to feed the neural network, do its rights apply to the output as well?
Online shopping sites will be able to further enhance their ability to create rich immersive imagery and video to highlight products. Again, this can be connected to the visitor, creating a more individualized product catalog.
With the ability to generate unique, natural-looking variations of images, each image can become its own “natural” QR code, providing another way to identify and track a visitor.
In the end, the concept of stock photos might evolve as well. With these new capabilities, high-quality stock photos can potentially be generated on demand, allowing us to move away from overused standard photos.