Introduction:
Just as the SD community had lost faith in SD3 and begun shifting to Flux, SD3.5 finally launched and, surprisingly, it didn’t disappoint. To recap: the much-anticipated SD3 Medium weights were released on June 12, 2024, and generated a lot of buzz. Although the initial test image ("girl on the lawn") drew some laughs, SD3 showed significant improvements over SDXL in prompt adherence and text generation. However, it still struggled to generate realistic human figures, a critical need for many users, which kept it from gaining a foothold. On July 5, StabilityAI promised an "improved SD3 Medium," but after multiple delays, user expectations faded. Then, on August 1, 2024, Flux launched with better prompt adherence and human rendering; it quickly became the new favorite and pulled the SD community's attention toward Flux.
Four months later, SD3.5 Large made an unexpected comeback, despite users having largely migrated to Flux. Initial reactions were skeptical, but over the following days, feedback revealed that SD3.5's image quality was surprisingly good. In my own tests, SD3.5’s color rendering and lighting effects approached the standards of Midjourney, showcasing substantial potential.
In this article, let’s take a closer look at SD3.5's capabilities!
1. Setting Up SD3.5 – A Quick Guide:
●Go to stabilityai/stable-diffusion-3.5-large (https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main), download the sd3.5_large.safetensors model file by clicking the download arrow, and place it in the ComfyUI directory under models/checkpoints.
●Go to the text_encoders folder (https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main/text_encoders) and download clip_g, clip_l, and t5xxl (choose fp8 or fp16 based on your GPU capacity; if you have 16GB+ VRAM, opt for fp16 for better quality), then place them in the models/clip directory.
●Download the official ComfyUI workflow (https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/SD3.5L_example_workflow.json), load it in ComfyUI, select your downloaded models, and adjust the parameters. (I recommend limiting it to 24 steps, as SD3.5's CFG is sufficient and doesn't require the 40 steps suggested in the official workflow.)
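For readers who prefer scripting the downloads, the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not an official installer: it assumes the `huggingface_hub` package is installed, that the encoder files in the repository are named as listed in the code (check the repository listing, since those exact file names are my assumption), and that ComfyUI uses its default `models/checkpoints` and `models/clip` folders. The repository may also require accepting its license and logging in with a Hugging Face token first. The function names are hypothetical.

```python
from pathlib import Path

REPO_ID = "stabilityai/stable-diffusion-3.5-large"


def build_download_plan(vram_gb: float) -> list[tuple[str, str]]:
    """Return (file in repo, ComfyUI subdirectory) pairs for SD3.5 Large."""
    # Rule of thumb from the guide: fp16 T5 with 16 GB+ VRAM, fp8 otherwise.
    # The exact file names below are assumptions; verify them in the repo.
    t5 = "t5xxl_fp16.safetensors" if vram_gb >= 16 else "t5xxl_fp8_e4m3fn.safetensors"
    return [
        ("sd3.5_large.safetensors", "models/checkpoints"),
        ("text_encoders/clip_g.safetensors", "models/clip"),
        ("text_encoders/clip_l.safetensors", "models/clip"),
        (f"text_encoders/{t5}", "models/clip"),
    ]


def download_all(comfyui_root: str, vram_gb: float) -> None:
    """Download each planned file and copy it into the ComfyUI folder tree."""
    import shutil

    # Imported lazily so the plan can be inspected without the package.
    from huggingface_hub import hf_hub_download

    for repo_file, subdir in build_download_plan(vram_gb):
        # Downloads into the local Hugging Face cache and returns the path.
        cached = hf_hub_download(repo_id=REPO_ID, filename=repo_file)
        target_dir = Path(comfyui_root) / subdir
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(cached, target_dir / Path(repo_file).name)


if __name__ == "__main__":
    # Print the plan only; call download_all("/path/to/ComfyUI", 24) to fetch.
    for repo_file, subdir in build_download_plan(vram_gb=24):
        print(f"{repo_file} -> {subdir}")
```

Copying from the cache keeps the files flat inside `models/clip` instead of recreating the `text_encoders/` subfolder there.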
Now you can start generating images! Note, however, that all images in this article were generated on SeaArt, using the following workflows:
Flux text to image: https://www.seaart.ai/workFlowAppDetail/cqofr0le878c73dckp1g
SD 3.5 text to image: https://www.seaart.ai/workFlowAppDetail/cscbdlte878c73amh300
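For local ComfyUI runs, the 24-step recommendation from the setup guide can also be applied to the downloaded workflow file with a small script instead of editing the node by hand. The sketch below rests on two assumptions that should be checked against your own file: that the workflow is saved in ComfyUI's UI format (a top-level "nodes" list whose entries carry "type" and "widgets_values"), and that the sampler is a KSampler node whose widget order is seed, control_after_generate, steps, cfg, sampler_name, scheduler, denoise. The function name and the demo values are hypothetical.

```python
def set_sampler_steps(workflow: dict, steps: int) -> int:
    """Set the step count on every KSampler node; return how many changed."""
    changed = 0
    for node in workflow.get("nodes", []):
        if node.get("type") != "KSampler":
            continue
        values = node.get("widgets_values", [])
        # Assumed widget order: seed, control_after_generate, steps, cfg, ...
        if len(values) > 2:
            values[2] = steps
            changed += 1
    return changed


if __name__ == "__main__":
    # Tiny stand-in workflow with made-up values, not the official file.
    demo = {
        "nodes": [
            {"type": "KSampler",
             "widgets_values": [0, "randomize", 40, 4.5, "euler", "normal", 1.0]},
            {"type": "CLIPTextEncode", "widgets_values": ["a cat"]},
        ]
    }
    print(set_sampler_steps(demo, 24), demo["nodes"][0]["widgets_values"])
```

To use it on the real file, load SD3.5L_example_workflow.json with `json.load`, call `set_sampler_steps(workflow, 24)`, and write the result back with `json.dump`. If the function returns 0, the workflow uses a different sampler node and the assumption above does not hold.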
2. Image Generation Comparison
➢ Photorealistic Themes
1. Professional Photography / Cinematic Portraits
Prompt: "a cinematic front view photo of a slim white male dryad emerging from a tree, his eyes closed with his head lowered, facing the viewer with his back on the tree. His arms and chest are made of green branches and white flowers, his hair made of brown vines and branches, his body fused with the tree trunk, his skin covered in moss and leaf, his shoulders and collar bone resembling pale human skin. The photo is taken with a 35mm lens capturing the essence of golden hour."
In this round, SD3.5 outperformed Flux in color and lighting, even approaching the quality of Midjourney V6.1. While Flux adhered better to details like “front view,” “golden hour,” and “hair made of branches,” SD3.5’s overall aesthetics were superior, which is rare for open-source models. Balancing aesthetics against detail is a common challenge: even user-tuned Flux outputs struggle to match the original Midjourney results. Additionally, for some complex prompts, good results could only be obtained through the paid official SD channel on Replicate; local runs struggled to reproduce them, even after adjusting CFG or using Dynamic Thresholding.
Prompt: "full color portrait photo of a 20yo woman laying underwater, sunlight casting realistic water caustics on her face and body, she's wearing a gauze white dress."
In this round, SD3.5 showed better results overall, despite not being perfect. Previous versions of SD, as well as Flux, struggled with “water caustics,” which only Midjourney and Ideogram could handle well. Although SD3.5's water caustics weren’t flawless, they were more realistic than Flux's, whose image looked as if the character were in an ordinary environment with water effects added artificially. Additionally, Flux’s clavicle shadows were overly rigid, while SD3.5's interpretation was closer to a natural underwater scene.
Prompt: "a top-down close-up photo of a slim attractive woman playing the piano, the camera focusing on her hands, she is wearing a red and black plaid skirt, the piano is shiny and black. The photo is taken in a bright room with soft diffused sunlight."
In this round, SD3.5 and Flux were evenly matched. Flux excelled in hand structure, a well-known strength, but its fingers were overly uniform. In contrast, SD3.5 delivered more natural skin details, though the skin color was slightly too red, making the overall look somewhat off. On the piano keys, Flux demonstrated better precision in spacing and alignment. SD3.5's color and lighting remained aesthetically appealing, but the right side of its piano was incomplete, which hurt the final output.
2. Casual / Amateur Style Portraits
Prompt: "Amateur phone photo of a slim attractive white man, wearing unbuttoned long-sleeve white dress shirt and black pants. He is sitting on the front edge of bed, hand running through his hair, while looking at viewer. The photo was taken in a bedroom at night, the bedroom is dimly lit with warm color tone, the only light source is the bedside table lamp. The photo was posted to reddit in 2012. The image is grainy jpeg with motion blur and soft focus, a snapshot taken by amateur with deep focus, added digital sharpening and blurry, diffused poor lighting."
In this round, SD3.5 fell short across the board, failing to meet most of the prompt requirements. The generated figure did not appear slim, wasn't positioned at the front edge of the bed, and did not show the hand running through the hair. The lighting was also off, lacking the expected dim ambiance with only the table lamp as the light source. Overall, the image quality was closer to that of SDXL, even featuring the infamous “legs sunk into the bed” issue typical of SDXL. In short, this test highlighted SD3.5's inability to capture the casual, amateur vibe that the prompt demanded.
Prompt: "amateur side view photo of a slim white woman, she is cooking in kitchen and wearing a white apron, underneath the apron is a plain grey tshirt, she is looking at the food in the steel sauce pot, her head lowered, holding a wooden spoon in her right hand. The photo was taken from her left side by an amateur, taken with a smartphone in 2015, in a modern kitchen with soft diffused indoor lighting at night."
Again, SD3.5 struggled significantly in this round, especially with the hand rendering, which resulted in awkward, unrealistic shapes. Even Flux faced issues with achieving proper depth of field for this prompt. However, SD3.5's output was noticeably weaker overall, with unrealistic hair volume, inaccurate side eye rendering, and an incorrect pot handle shape—all failing to meet the prompt's expectations.
3. Architecture / Objects / Animals
Prompt: "low angle close-up photo of the Eiffel Tower, on a sunny day in Paris, center composition."
In this round, SD3.5’s performance was relatively weak, as it struggled with structural accuracy and details, which are crucial in architectural rendering. Flux, on the other hand, managed to produce a better-defined image, accurately capturing the architectural details and overall composition. This made the round a clear victory for Flux, as SD3.5's attempt fell short on nearly every level.
Prompt: "a photo of a seal bicolor ragdoll cat, it is facing camera, standing on its hind legs on a blue pillow, holding out one paw, wearing a wizard hat and a purple wizard robe, casting spells with its paw, silver sparkles swirling around its paw. The photo is taken in a spring garden in the morning, with bright diffused natural lighting."
SD3.5 finally managed to pull ahead in this round. While both models delivered similar quality, Flux suffered from an exaggerated depth-of-field effect, while SD3.5 produced a more balanced image that closely matched the prompt's details. Though SD3.5's cat paws were not perfectly rendered, it did deliver the overall aesthetic and key elements of a point ragdoll. Flux, however, misunderstood the “seal bicolor ragdoll cat” descriptor, missing the specific breed details. SD3.5’s “silver sparkles” were also more accurate than Flux’s, which came out golden.
Prompt: "Photograph of a majestic cake adorned with intricate fondant decorations inspired by ocean waves. The whole cake has a base color of modest dark blue, surrounded by swirls of light blue layers shaped like ocean waves, the layers closing in from bottom to top, forming a curved shape like a blooming rosebud. On the outer side of the cake, pink and purple corals decorating the bottom of ocean wave fondant, resembling a beautiful tiara. The photo is taken in a room with simple dark background."
Here, SD3.5 shone with its aesthetic prowess, displaying striking gradients of blue-green hues, even though it did not fully adhere to the prompt regarding coral placement and rosebud shape. Its color transitions were appealing, creating a visually pleasing output. In contrast, Flux adhered more closely to the rosebud shape, but the execution of the ocean waves was less refined and lacked the desired aesthetics. Additionally, the corals appeared misplaced in the final rendering. While SD3.5's image had a minor issue with the purple ribbon at the cake base, this could easily be corrected with post-editing.
➢ Art Styles
1. Manga / Anime / Doodles
Prompt: "1980s Retro manga style illustration of a slim young white man, with messy wavy light brown hair and fair skin, his head tilted to the side, his face clean-shaved. The image only portrays his face and chest. He is wearing a long clothing made of white waterlily flowers and green leaves, the thick layers of leaf covering his whole body, contrasting his blue eyes, a laurel leaf wreath on his head, while he looks at viewer. He is in a summer forest at dusk, soft diffused sunlight shining on him."
Both SD3.5 and Flux struggled to accurately depict the “1980s Retro manga” style. SD3.5 produced an image resembling generic modern illustration, while Flux's output also missed the mark, deviating from classic 1980s manga aesthetics. However, Flux better interpreted prompt elements like “long clothing made of white waterlily flowers and green leaves,” and delivered a more appropriate background. The training data for both models seems to lack sufficient material on this specific art style, making it difficult to declare a clear winner.
Prompt: "Vector clipart of a fluffy orange cat sitting on an office chair, facing a computer moniter, its one paw placed on keyboard, one paw placed on mouse, turning to look at viewer, simple pale pink background, bold line style."
SD3.5 failed to deliver on this prompt. The cat wasn’t positioned correctly on the chair, nor did it have its paws on the keyboard and mouse as described. Moreover, the overall style did not match the “simple vector clipart” look that the prompt demanded. Flux, although generating a slightly cross-eyed cat, managed to meet most of the prompt requirements, making it the winner for this round.
Prompt: "Crayon drawing of a chubby white duck on top of a tubby orange cat on top of a small capybara. All three animals stacked vertically, on the grass of a sunny garden."
In this test, SD3.5 captured the “Crayon drawing” aesthetic more convincingly. Although its image leaned towards a rough pencil or marker sketch rather than true crayon, it kept the rough, spontaneous, playful hand-drawn feel the prompt called for. Flux, in contrast, generated overly clean lines and lacked the naive charm of a crayon drawing. SD3.5’s rougher interpretation made it the clear winner here.
2. Other Art Styles
Prompt: "Retro 16bit pixel game art of a grumpy penguin with wings, facing the viewer while holding a large board that says "IT'S PENGUIN, NOT PENGWING", sitting on ice in antarctica, the image is nostalgic and pixelated with vibrant colors."
In this test, SD3.5 and Flux each had their strengths. Flux had a more pleasant color scheme, but SD3.5's output was closer to the classic retro pixel art feel: larger pixel blocks, brighter colors, and higher contrast, making it more authentic to classic 16-bit games. While both models handled the prompt relatively well, the results were ultimately a tie, as each had its own advantages.
Prompt: "3D animation movie scene of a grumpy penguin with wings, facing the viewer while holding a large board that says "IT'S PENGUIN, NOT PENGWING", sitting on ice in antarctica, DreamWorks style."
Both SD3.5 and Flux succeeded in generating 3D animated scenes that matched the prompt, but each had notable flaws. SD3.5 rendered detailed wings, but the face appeared awkward, reducing the overall realism. Flux’s penguin design was closer to DreamWorks' “Madagascar Penguins” style but had issues with the wing connections. While SD3.5 had peculiar hand-like details on the wings, it was Flux’s more recognizable styling that took the upper hand here.
Prompt: "A renaissance oil painting of a strange creature that resembles a chimera of fish and cat, the creature's upper body is a white angora cat, its lower body looks like tropical fish with iridescent fish scales. The creature is swimming in the sea, the water is dark blue colored."
In this final test, SD3.5 delivered a solid performance, better interpreting the chimera concept and providing a more natural depiction of the cat's upper body. Though Flux’s fish body rendered well with authentic oil painting textures, its cat rendering was overly exaggerated and lacked natural proportions. SD3.5 also successfully depicted the “dark blue” water color, which Flux missed. This round went to SD3.5 for its better alignment with the prompt.
That’s it for the SD3.5 vs. Flux review! If you have different insights or more examples of generated images, feel free to share your own reviews and join the conversation!