This week in multimodal AI art (09/04 - 15/04)

Follow multimodalart on Twitter, come hang on our Discord and consider supporting on Patreon

Some people said that after the insane last two weeks, this week was 'boring' for the space. I have to disagree: although there were no big papers or giant breakthroughs, it was a very eventful week for the open source world.

Text-to-Image synthesizers:

- GLID-3 XL Filtered LAION-400M (code+model, Kaggle Notebook)

by Jack000, Kaggle Notebook by Lite

GLID-3, reported here two weeks ago, now has an XL iteration: CompVis' latent diffusion model fine-tuned on a LAION-400M dataset filtered to remove watermarked and blurry images.

- ruDALLE Arbitrary Aspect Ratio (code+model, colab)

ruDALLE is a DALLE replication trained on Russian-language data; this new model allows people to generate amazingly coherent images beyond the square format. The author is also exploring generating images at arbitrarily large resolutions 👀

- DALLE Mega (W&B Training page)

Boris Dayma, famous for the DALLE Mini replication, is giving his DALLE 1 replication a mega training run: a 2.6B-parameter model trained on a dataset self-curated by the author. This process may take around 1 month.

- CLOOB Latent Diffusion fine-tuned models

Leveraging CLOOB Latent Diffusion's easy and fast fine-tuning code (reported here before), two fine-tuned models were released: Danbooru (anime image board web scrape, GitHub)

WikiArt (open collection of public domain art, Google Colab) (by Jonathan Whitaker)

CLOOB Latent Diffusion is also getting better with the newly released LAION-5B KL autoencoder by @rivershavewings, which allows a wider range of datasets to be trained and fine-tuned on it.

- RQ-VAE now works with less VRAM (GitHub)

Two weeks ago we reported on RQ-VAE; one of its drawbacks was being VRAM hungry, requiring 32GB+ of VRAM. Some modifications to the code now enable it to run on simpler machines. Colab and Spaces coming soon!

- MidJourney deployed v2 model (Apply to their beta)

MidJourney is a restricted-access AI art platform, currently in beta, with a secret sauce for guiding models to stunningly good quality results. They released a new version of their model which, after some tuning, is now consistently producing results the community deems even better than before. (Apply for their beta here)

- Disco Diffusion v5.2 with VR mode (Colab)

The beloved Disco Diffusion notebook got an upgrade: it can now generate left and right images for each frame, which can be hooked up to a VR headset. A "warp" version of the notebook that improves init videos has also been released by devdef.

- 2D Animation Enabled JAX CLIP Guided Diffusion v2.7 (Colab)

Huemin's amazing JAX CLIP Guided Diffusion edit now has a 2D animation mode!

New CLIP and CLIP-like models:

- LiT - Locked-image Tuning (GitHub, Blog post)

by Google

LiT is a new technique that brings CLIP-like capabilities (the ability to match image-text pairs) to already pre-trained image models: the pre-trained image encoder is kept locked (frozen) while a text encoder is trained to align with it.
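
To make that concrete, here is a minimal sketch of the locked-image tuning recipe: a frozen, pre-trained image tower paired with a trainable text tower, aligned with a CLIP-style contrastive loss. It assumes PyTorch, with a torchvision ResNet-50 standing in for the pre-trained image encoder and a toy Transformer as the text encoder; the real LiT models use Google's own backbones and training setup, so treat this purely as an illustration of the concept, not the paper's implementation.

```python
# Minimal sketch of the Locked-image Tuning (LiT) idea (illustrative only):
# a frozen pre-trained image encoder + a trainable text encoder, aligned
# with a CLIP-style contrastive loss. Backbone choice and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class LiTSketch(nn.Module):
    def __init__(self, embed_dim=256, vocab_size=10_000):
        super().__init__()
        # Locked image tower: pre-trained weights, frozen during tuning.
        self.image_encoder = resnet50(pretrained=True)
        self.image_encoder.fc = nn.Identity()  # expose the 2048-d features
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        # Trainable text tower (a tiny Transformer stand-in).
        self.token_emb = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Projection of image features into the shared embedding space.
        self.image_proj = nn.Linear(2048, embed_dim)
        # Learnable temperature, initialised like CLIP's log(1/0.07).
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, images, token_ids):
        with torch.no_grad():  # the image tower stays locked
            image_features = self.image_encoder(images)
        image_emb = F.normalize(self.image_proj(image_features), dim=-1)
        text_features = self.text_encoder(self.token_emb(token_ids)).mean(dim=1)
        text_emb = F.normalize(text_features, dim=-1)
        return image_emb, text_emb


def contrastive_loss(image_emb, text_emb, logit_scale):
    # Symmetric InfoNCE: image i should match caption i within the batch.
    logits = logit_scale.exp() * image_emb @ text_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```

The only thing separating this from standard CLIP-style training is that the image tower never receives gradients: everything learned during tuning lives in the text tower and the projection head.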

Learning AI Art:

- AIAIArt course (GitHub, Discord)

AIAIArt is a free and open source AI art course by Jonathan Whitaker. There are live classes on Twitch for the next few Saturdays at 4 PM UTC, and all previous classes stay recorded and available as Google Colabs at the GitHub link. This Saturday (16/04) the class will cover transformers for image generation, the theory behind models such as DALLE 1. Check out their Discord for more information.