Some people said that after the insane last two weeks, this week was 'boring' for the space. I have to disagree. Although there were no big papers or giant breakthroughs, it was a very eventful week for the open source world.
- GLID-3 XL Filtered LAION-400M*
- ruDALLE Arbitrary Aspect Ratio*
- Dall-E Mega started training*
- CLOOB Latent Diffusion fine tuned datasets*
- RQ-VAE works with less VRAM*
- MidJourney new v2 model
- Disco Diffusion v5.2 with VR mode
- Jax CLIP Guided Diffusion v2.7 Huemin Edit with animation mode
* code released
GLID-3, reported here two weeks ago, is now on its XL iteration and has released a version of CompVis' LAION-400M dataset filtered to remove watermarked and blurry images.
Generate any arbitrary aspect ratio images using the ruDALLE models! https://t.co/nQbJPy7Hz1
— Alex Shonenkov (@AVShonenkov) April 14, 2022
Colab & Kaggle notebooks are available now pic.twitter.com/8lJ9OqIdVX
Got a bit scared as I had trouble running inference on dalle-mega (such a big model) 😱
— Boris Dayma (@borisdayma) April 15, 2022
Finally was able to! 2.5 days of training
Progress: 8% pic.twitter.com/ghr33bMQy7
I've released my danbooru CFG model for CLOOB Conditioned Latent Diffusion, available at the link below: https://t.co/5huujUrA9g pic.twitter.com/lLkqrkXWF7
— John David Pressman (@jd_pressman) April 12, 2022
Wikiart (open source collection of public domain art, Google Colab) (by Jonathan Whitaker)
"[Spring|Summer|Autumn|Winter] landscape"
— Johnowhitaker_Art (@JohnowhitakerA) April 7, 2022
No other prompts. This is what it's good at. All the same seed so you can see roughly similar layouts. pic.twitter.com/rcetpRbtEh
CLOOB Latent Diffusion is also getting better thanks to the newly released LAION 5b KL autoencoder by @rivershavewings, which allows training and fine-tuning on a wider range of datasets.
Two weeks ago we reported on RQ-VAE. One of its drawbacks was that it was VRAM hungry, requiring 32GB+ of VRAM. Some modifications to the code now enable it to run on more modest machines. Colab and Spaces coming soon!
MidJourney is a restricted-access AI art platform, currently in beta, with a secret sauce for guiding its models to stunningly good results. They released a new version of their model which, after some tuning, now consistently produces results the community deems even better than before. (Apply for their beta here)
A persian miniature painting of a mecha-robot
— multimodal ai art (@multimodalart) April 16, 2022
[@midjourney diffusion v2 (left), v1 (right)] pic.twitter.com/yyQl8yxtYZ
The beloved Disco Diffusion model got an upgrade: it can now generate left and right images for each frame, which can be hooked up to a VR headset. devdef has also released a "warp" version of the Notebook that improves init videos.
Huemin's amazing JAX CLIP Guided Diffusion edit now has a 2D animation mode!
LiT is a new model that brings CLIP-like capabilities (ability to match image-text pairs) to already pre-trained image models.
Remember our LiT🔥 (Locked-image Tuning) paper?
— Lucas Beyer (@giffmana) April 14, 2022
We have just released:
- An in-browser live demo: https://t.co/FEF63hwnTc ; it's addictive, share your results #lit_demo!
- Blog: https://t.co/EFb7JAOXiE
- Code&models: https://t.co/s5LmS1Z9LU
- Colab: https://t.co/SdKoHqPcJX
1/ https://t.co/hGmMet2KWU
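For readers wondering what "matching image-text pairs" looks like in practice, here is a minimal sketch of the matching step only. It is not LiT's code and it does not use any real model: the embeddings are tiny made-up vectors, and the function names (`similarity_matrix`, `best_caption`) are hypothetical. It assumes, as is typical for CLIP-like models, that images and texts are embedded into the same space and compared by cosine similarity.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Cosine similarity of every image embedding against every text embedding."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def softmax(row, temperature=0.1):
    """Turn one row of similarities into a probability distribution over texts."""
    scaled = [x / temperature for x in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def best_caption(image_embs, text_embs):
    """For each image, return the index of the most similar text."""
    sims = similarity_matrix(image_embs, text_embs)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]

# Toy, hand-written embeddings (assumption: a real model would produce these).
captions = ["a photo of a dog", "a photo of a cat", "a diagram"]
image_embs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.95]]
text_embs = [[1.0, 0.0, 0.1], [0.1, 0.1, 1.0], [0.0, 1.0, 0.0]]

for idx, match in enumerate(best_caption(image_embs, text_embs)):
    print(f"image {idx} -> {captions[match]}")
```

With a real model, the image vectors would come from the pre-trained image encoder and the text vectors from the text encoder trained to line up with it; only the comparison step is shown here.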
AIAIART is a free and open source AI art course by Jonathan Whitaker. Synchronous classes run for the next few Saturdays at 4 PM UTC on Twitch. All previous classes remain available as recordings and Google Colabs at the GitHub link. This Saturday (16/04) they will cover transformers for image generation, the theory behind models such as DALLE 1. Check out their Discord for more information.
Training text-to-img models is fun! Even when they don't work, the outputs are at least more interesting than a bad accuracy figure or garbled text.
— Johnowhitaker_Art (@JohnowhitakerA) April 13, 2022
Want to know the details of this particular bad idea? You'll have to wait for lesson 6 of AIAIART to come out this weekend ;) pic.twitter.com/ggd90la54R