We have been having a great time lately experimenting with AI art and music generation tools, mostly through Google Colab and Python. It’s endless fun to create with these tools using unique text prompts as input. We are working to integrate them into our design process, and our designers have already been using them as inspiration for custom UI panels. One of our goals is to create music videos. This is all still in its infancy, but it is growing rapidly and becoming very powerful.

2001: A Space Odyssey visualized as a single image via VQGAN

Big Mac from the year 1750 imagined with Midjourney

VQGAN+CLIP

VQGAN+CLIP is a text-to-image neural network that generates stunning animations and images from just text prompts. The implementation of VQGAN+CLIP that we used takes in a string of keyframed text prompts like this:

'Apple': {10: 1, 20: 0}, 'Orange': {10: 0, 20: 1}

Each prompt maps frame numbers to weights, where 1 is full strength and 0 is off. This prompt tells the animation to show apples at full strength at frame 10 and fade them out by frame 20, while fading in oranges over the same frames.
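To make the keyframe format concrete, here is a minimal sketch of how such a prompt string could be read and turned into per-frame weights. It assumes the weights are linearly interpolated between keyframes and held constant outside them; the notebook we used may blend differently. The names parse_prompts and weight_at are made up for this illustration and are not part of any VQGAN+CLIP implementation.

```python
import ast


def parse_prompts(prompt_string):
    """Turn "'Apple': {10: 1, 20: 0}, ..." into {prompt: {frame: weight}}."""
    # Wrapping the string in braces makes it a valid Python dict literal.
    return ast.literal_eval("{" + prompt_string + "}")


def weight_at(keyframes, frame):
    """Weight of one prompt at `frame` (assumption: linear interpolation)."""
    frames = sorted(keyframes)
    # Before the first keyframe or after the last one, hold the end value.
    if frame <= frames[0]:
        return float(keyframes[frames[0]])
    if frame >= frames[-1]:
        return float(keyframes[frames[-1]])
    # Otherwise blend between the two keyframes surrounding this frame.
    for start, end in zip(frames, frames[1:]):
        if start <= frame <= end:
            t = (frame - start) / (end - start)
            return keyframes[start] + t * (keyframes[end] - keyframes[start])


prompts = parse_prompts("'Apple': {10: 1, 20: 0}, 'Orange': {10: 0, 20: 1}")
for frame in (10, 15, 20):
    print(frame, {text: weight_at(kf, frame) for text, kf in prompts.items()})
```

Under that linear assumption, both prompts sit at a weight of 0.5 at frame 15, which is the halfway point of the cross-fade.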

VQGAN+CLIP Chronological Animations

Since this model allows you to keyframe your text prompts, we wanted to see if we could create an animation with a chronological story. To create the animation below, we gave the model a string of chronological prompts like this:

'sunrise ': {1: 1, 90: 0, 720: 0},'mid-day blue sky': {1: 0, 90: 1, 180: 0, 720: 0},'evening sunset': {1: 0, 90: 0, 180: 1, 270: 0, 720: 0},'starts in the night sky': {1: 0, 180: 0, 270: 1, 360: 0, 720: 0},'early morning sunrise': {1: 0, 270: 0, 360: 1, 450: 0, 720: 0},'summer day clear sky': {1: 0, 360: 0, 450: 1, 540: 0, 720: 0},'bright sunset': {1: 0, 450: 0, 540: 1, 630: 0, 720: 0},'night time sky': {1: 0, 540: 0, 630: 1, 720: 1}

As you can see, this creates the feeling of days and nights passing.

Creating the text prompts in this format can be time-consuming, so we created a tool that spaces the prompts evenly across the total number of frames and then outputs the string for you.
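For illustration, here is a rough sketch of what a helper like that might look like. It is not our actual tool: the function name build_prompt_string and its parameters are hypothetical, and the spacing rule (each prompt peaks one step after the previous one, with the last prompt held until the final frame) is inferred from the example string above.

```python
def build_prompt_string(prompts, total_frames):
    """Spread prompts evenly over total_frames and return the keyframe string."""
    step = total_frames // len(prompts)
    if step < 2:
        raise ValueError("total_frames must be at least twice the number of prompts")

    # Prompt i peaks at frame i * step; frame numbering starts at 1.
    peaks = [max(1, i * step) for i in range(len(prompts))]

    entries = []
    for i, text in enumerate(prompts):
        # Every prompt is anchored at weight 0 on the first and last frame...
        keyframes = {1: 0, total_frames: 0}
        # ...is still at 0 when the previous prompt peaks...
        if i > 0:
            keyframes[peaks[i - 1]] = 0
        # ...and is back at 0 when the next prompt peaks.
        if i < len(prompts) - 1:
            keyframes[peaks[i + 1]] = 0
        else:
            # The last prompt has nothing to fade into, so hold it to the end.
            keyframes[total_frames] = 1
        keyframes[peaks[i]] = 1
        ordered = dict(sorted(keyframes.items()))
        entries.append(f"{text!r}: {ordered}")
    return ", ".join(entries)


print(build_prompt_string(["sunrise", "mid-day blue sky", "night time sky"], 270))
```

Passing eight prompts and 720 frames gives the same 90-frame spacing as the day-and-night string above.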

DALL-E

DALL-E is a neural network by OpenAI that creates incredibly accurate images from text captions. While VQGAN+CLIP produces more abstract images of the given prompts, DALL-E is able to create both abstract and near-photorealistic images in just about any style you can think of.

Here we gave the prompt “3D logo of the word ‘M1’”.

Here we asked for “a photorealistic photo of a squid wearing a shirt with the word ‘M1’ standing on the streets of Chicago”.

A squid head on a human body wearing a shirt that says "M1" created by DALL-E

And here we prompted “an abstract painting of squids swimming through the streets of Chicago”.

But if we want to change the style, we can simply ask for “a 3D render of squids swimming through the streets of Chicago”.

3D Ken Burns Effect

While the AI tools mentioned above are great for generating new content, the 3D Ken Burns Effect is able to bring new life to existing content. Using this technique, you can turn a single still image into an animated three-dimensional space. Below are a few examples where we applied this effect to landscape images created in DALL-E.

Audio AI Tools

Music Transformer

Music Transformer is a neural network created by Google's Magenta team. This model is able to generate music and export the notes as a MIDI file. Below is an example of a MIDI file created by Music Transformer and performed in FL Studio.

GANSynth

GANSynth is another AI tool created by Google's Magenta team. This model uses generative adversarial networks to generate audio from a given MIDI file, and it can also interpolate the audio across multiple timbres. Below is an example track created with GANSynth using the Music Transformer MIDI file mentioned above. As you can hear, GANSynth creates the audio for the given progression while interpolating between various piano and vocal timbres.

After creating multiple progression tracks using GANSynth, I brought them into FL Studio to create a song for our VQGAN+CLIP animations.

Author: Joey Morello

AI Art Experiments - Part 1