And I thought last week couldn't be toppled with RQ-VAE, GLID-3, LAION-5B, StyleGAN XL, Make-a-Scene, CLIPMatrix and more. Couldn't have been MORE wrong! What can I say about this insane rhythm of multimodal ai art releases? Enjoy the ride, I guess.
Dall-E 2, Latent Diffusion LAION 400M*, KNN Diffusion, CLOOB Guided Latent Diffusion YFCC CFG*
Video Diffusion, Text2Live, TATS: Time Agnostic VQGAN*
* code released
The newest iteration of Dall-E is out. With results that baffled the community, the new model takes the CLIP Guided Diffusion approach to the next level. No code or model released. There's a waiting list to use their closed demo.
Animal helicopter chimeras generated with DALL·E 2: pic.twitter.com/5b8a9iq3k9
— Aditya Ramesh (@model_mechanic) April 7, 2022
Who wants to start a band?
— multimodal ai art (@multimodalart) April 5, 2022
"The album cover of Agressive Kittens" [Latent Diffusion LAION_400M] pic.twitter.com/VUcsj2srJ8
the milky way on a milk bottle - Latent Diffusion LAION-400M pic.twitter.com/hytg6bap9P
— multimodal ai art (@multimodalart) April 4, 2022
KNN-Diffusion: Image Generation via Large-Scale Retrieval
— Aran Komatsuzaki (@arankomatsuzaki) April 7, 2022
Achieves SotA in human evaluations and outperforms GLIDE by using retriever + diffusion. https://t.co/WSxzbmqjNx pic.twitter.com/jGJTTb4OEj
Another model to try: CLOOB-conditioned latent diffusion. Github: https://t.co/KHALGF1OsC
— Johnowhitaker_Art (@JohnowhitakerA) April 3, 2022
A quick notebook I made to generate images with the pretrained model: https://t.co/hZbFOaCx9q
Thanks to CLOOB it trains without captions! Looking forward to some fine-tuning experiments :) pic.twitter.com/3Zb4rUPo4k
By Google. A text-to-video model based on a new "video diffusion approach" that can generate short video snippets.
Video Diffusion Models
— hardmaru (@hardmaru) April 8, 2022
Amazing video samples generated from a text-conditioned video diffusion model. https://t.co/2aMO4HMHBc https://t.co/6d6I1qYH94 pic.twitter.com/LCwyLKCj4e
Text2LIVE: Text-Driven Layered Image and Video Editing
— AK (@ak92501) April 7, 2022
abs: https://t.co/kpR1mOVate
project page: https://t.co/HG83iuwBPr pic.twitter.com/39QeAof24N
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
— AK (@ak92501) April 8, 2022
abs: https://t.co/ZHHIgEXRJX
project page: https://t.co/XXQk3LCSk0 pic.twitter.com/PL8sMJBx5D