Coin World News Report:
Stability AI may be starting its path to redemption. After the disappointing release of SD3 Medium, they are back with two new models: Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Turbo.
In a statement, Stability said, “In June, we released Stable Diffusion 3 Medium, the first public version of the Stable Diffusion 3 series. This version did not fully meet our standards or the expectations of our community.”
In an official blog post, they further explained, “Instead of quickly fixing the issues based on valuable community feedback, we took the time to further develop a version that advances our mission of transforming visual media.”
Before rushing to write this breaking news, we generated some test images ourselves, and the results were very promising, especially for the base model.
The SD 3.5 series is designed to run on consumer-grade systems, even fairly low-end ones, making advanced image generation more accessible than ever. Stability has clearly heard the complaints about the previous version, and this release is expected to be much better, so much so that its featured image is a woman lying on grass, a cheeky nod to SD3 Medium's infamous failures at that very prompt.
It wouldn't be the first time the company has bounced back from a rough launch.
Image: Stability AI
Another significant aspect of this release is the new licensing model. Stable Diffusion 3.5 adopts a more lenient license, allowing both commercial and non-commercial use. Small businesses and individuals with revenues below $1 million can use and build on these models for free.
Companies with higher revenues must contact Stability to negotiate a fee. For comparison, Black Forest Labs offers its low-end Flux Schnell for free, its mid-tier Flux Dev free for non-commercial use only, and its SOTA model Flux Pro as a closed-source service (for reference, Flux is generally considered the best open-source image generator currently, at least in the post-SDXL era).
What are the benefits of Stable Diffusion 3.5?
Stability AI has released three versions of Stable Diffusion 3.5, each catering to different needs:
Stable Diffusion 3.5 Large: The flagship, with 8 billion parameters, designed to deliver top-notch image quality and strong prompt adherence. It is tailored for professional use, particularly at 1-megapixel resolution, but can handle a wide variety of styles and visual formats.
Stable Diffusion 3.5 Large Turbo: For those willing to trade a bit of quality for speed, this distilled version of the Large model is the go-to choice. It generates high-quality images in just four steps, versus the roughly 30 steps the regular SD3.5 needs. This makes it comparable to Flux Schnell.
Stable Diffusion 3.5 Medium: Coming soon, this 2.5-billion-parameter model is optimized for consumer hardware. It offers a middle ground for users who need solid performance at resolutions between 0.25 and 2 megapixels without sacrificing customizability.
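For context on what those parameter counts mean in practice, here is a rough back-of-envelope estimate of the VRAM needed just to hold the weights (text encoders, VAE, and activations add more on top). The parameter counts come from Stability's announcement; the rest is simple arithmetic:

```python
# Back-of-envelope VRAM needed just to store the model weights.
# Bytes per parameter depend on the dtype: 2 for fp16/bf16, 1 for fp8.
def weight_vram_gb(params_billion: float, bytes_per_param: int) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

large_fp16 = weight_vram_gb(8.0, 2)    # SD 3.5 Large at 16-bit
large_fp8 = weight_vram_gb(8.0, 1)     # fp8-quantized variant
medium_fp16 = weight_vram_gb(2.5, 2)   # SD 3.5 Medium at 16-bit

print(f"Large fp16:  {large_fp16:.1f} GB")   # ~14.9 GB
print(f"Large fp8:   {large_fp8:.1f} GB")    # ~7.5 GB
print(f"Medium fp16: {medium_fp16:.1f} GB")  # ~4.7 GB
```

This is why the 8B Large model strains entry-level GPUs at 16-bit precision, and why fp8 quantization roughly halves the footprint.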
These models are more flexible, allowing users to fine-tune them according to specific creative needs. If you’re concerned about whether your consumer-grade GPU can handle it, Stability AI has got you covered. Our own tests showed that Large Turbo can output images in about 40 seconds on an RTX 2060 with 6GB VRAM.
The non-quantized full-fat version takes over 3 minutes on the same low-end hardware, but that’s the price for quality.
Improvements under the hood
Stability AI is catching up to Flux, the preferred model for customization. To improve the user experience, Stability has reimagined the behavior of SD 3.5. Stability said, “When developing the models, we prioritized customization to provide a flexible foundation. To achieve this, we integrated query key normalization into the transformer blocks, stabilizing the model training process and simplifying further fine-tuning and development.”
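Query-key normalization itself is a small change: normalize the query and key vectors before the attention dot product so the logits cannot blow up during training. A toy numpy sketch of the idea (not Stability's actual implementation, which sits inside the model's transformer blocks with learned parameters):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Scale each vector to unit RMS along the last axis
    # (learned gains omitted for brevity).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    # Normalizing Q and K bounds the dot-product logits, which keeps
    # the softmax well-behaved even for large activations. That is the
    # training-stability property QK normalization provides.
    q, k = rms_norm(q), rms_norm(k)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)) * 100   # deliberately huge activations
k = rng.normal(size=(4, 8)) * 100
v = rng.normal(size=(4, 8))
out = qk_norm_attention(q, k, v)    # still finite and well-scaled
```

Without the normalization, activations this large would saturate the softmax and produce unstable gradients; with it, the attention output stays tame.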
In other words, you can now adjust and improve these models more easily than before, whether you’re an artist looking to create custom styles or a developer building AI-driven applications. Stability even shared a LoRA training guide to help get things started faster.
LoRA (Low-Rank Adaptation) is a technique for fine-tuning models to focus on specific concepts, be it a style or a subject, without retraining the entire large base model.
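The math behind LoRA is compact enough to show directly: the frozen base weight W is augmented with a low-rank update B·A, and only the small factors A and B are trained. A toy numpy sketch (sizes are illustrative, not SD3.5's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8     # illustrative sizes only

W = rng.normal(size=(d_out, d_in))       # frozen base weight, never updated
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # zero-init: training starts at W

def lora_forward(x):
    # Base path plus the scaled low-rank update; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

print("trainable:", r * (d_in + d_out), "vs full fine-tune:", d_out * d_in)
```

Even in this toy case the trainable parameter count drops from 4,096 to 512, which is why LoRAs are small enough to train on consumer GPUs and share as lightweight files.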
Image: The same generation without a LoRA vs. with a LoRA affecting the subject's pose. Image source: Jose Lanz
Of course, flexibility comes with some trade-offs. The models are now so creative that Stability warns, “A lack of specific prompts may increase uncertainty in the output, and aesthetic levels may vary.”
If you’re still skeptical about Stable Diffusion 3.5 and its “uncertainty” gives you pause, here is some reassurance: it supports “negative prompts,” meaning you can include instructions about what the model should not do. This is a great boon for anyone who wants to improve text and image generation with minimal effort.
It’s also a good addition for those who want more control over their generations. Moreover, it seems to handle prompts in the old SDXL style well. In fact, in some respects, SD3.5's prompting style is closer to MidJourney than to Flux, letting users get good results without needing a linguistics degree.
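Mechanically, negative prompts in diffusion samplers are typically implemented via classifier-free guidance: at each denoising step, the sampler nudges the prediction toward the positive prompt and away from the negative one. A toy sketch of that combination step (simplified; real pipelines apply this to noise predictions every step):

```python
import numpy as np

def guided_noise(cond_pred, neg_pred, guidance_scale=7.0):
    # Classifier-free guidance: move the denoiser's prediction toward
    # the positive prompt and away from the negative one. With an empty
    # negative prompt, neg_pred is the plain unconditional prediction.
    return neg_pred + guidance_scale * (cond_pred - neg_pred)

# Toy 2-D "predictions" to show the direction of the push.
cond = np.array([1.0, 0.0])
neg = np.array([0.0, 1.0])
print(guided_noise(cond, neg, guidance_scale=2.0))  # [ 2. -1.]
```

The higher the guidance scale, the harder the sampler pushes away from whatever the negative prompt describes, which is why overly aggressive scales can produce oversaturated or distorted images.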
Aside from customization, Stable Diffusion 3.5 has made progress in other areas:
Prompt adherence: In following user input, the Large model now rivals much larger models and leads the image-generator field. Stability claims SD 3.5 Large beats Flux.1 Dev in prompt adherence, though it still trails in aesthetic quality.
Image quality: We're talking about output that rivals far more resource-hungry models without the same GPU memory demands. In Stability's own benchmarks, Flux.1 Dev comes out slightly ahead, but SD 3.5 Large is more efficient and consumes fewer resources. SD 3.5 Large Turbo is on par with Flux.1 Schnell in both adherence and quality.
Style diversity: Stable Diffusion 3.5 can handle a wide range of styles, whether 3D renders, photorealistic images, line art, or painterly styles. It covers a broader stylistic range than Flux, at least in our quick tests.
And yes, it's worth mentioning that the model is uncensored. SD3.5 Large can produce nudity and other NSFW content without much difficulty, though results are imperfect. The model has not been deliberately restricted, giving users ample creative freedom (though optimal results may require fine-tuning and careful prompting).
Heavy censorship was a major criticism of SD3 at launch and was blamed for its notorious trouble with anatomy. We can confirm SD3.5 can generate NSFW images; it isn't on the level of the best Flux fine-tunes, but it is comparable to the base Flux model.
But fair warning: although SD3.5 is powerful, NSFW furry artists shouldn't expect a Pony Diffusion-style fine-tune anytime soon, if ever. The creators of the most popular and powerful NSFW models have confirmed they have no interest in fine-tuning SD3.5, opting instead for AuraFlow as their base; they may consider Flux once that work is complete.
For the tinkering enthusiasts, ComfyUI now supports Stable Diffusion 3.5, allowing local inference using a workflow built on its native nodes. There are plenty of example workflows available. If you're short on VRAM but want the full SD3.5 experience, Comfy has released an experimental fp8 scaled version of the model that reduces memory usage.
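The fp8 trick is essentially scaled low-precision storage: keep one floating-point scale per tensor and store the weights in a small numeric format. The sketch below illustrates the principle with int8 (actual fp8 checkpoints use a float8 format, but the scale-then-round idea is the same):

```python
import numpy as np

def quantize_scaled(w, bits=8):
    # Per-tensor symmetric quantization: store one float scale plus
    # small integers. Real fp8 checkpoints store float8 values instead
    # of int8, but rely on the same scale-then-round principle.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_scaled(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"memory: {w.nbytes} -> {q.nbytes} bytes, max abs error {err:.4f}")
```

The storage drops fourfold versus fp32 (twofold versus fp16) while the reconstruction error stays below half a quantization step, which is why quantized checkpoints look nearly identical in practice.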
What’s next?
On October 29th, we'll get Stable Diffusion 3.5 Medium, and shortly after, Stability promises to release ControlNets for SD 3.5.
ControlNets promise to bring advanced control features tailored for professional use cases, likely taking SD3.5's capabilities to a new level. If you want to learn more about them, you can read our guide covering them for SD 1.5.
In short, ControlNets let users do things like select poses for subjects, play with depth maps, reimagine scenes based on doodles, and more.
So, is Stable Diffusion 3.5 a Flux killer? Not quite, but it's starting to look like a contender. Some users will still nitpick, especially after the SD3 Medium debacle. But with better anatomical handling, clearer licensing, and significant gains in prompt adherence and output quality, it's hard to argue this isn't a big step forward. Stability AI is learning from past mistakes and moving toward a future where advanced AI tools are more accessible to everyone.