Tag: image-generation
26 discussions across 10 posts tagged "image-generation".
AI Signal - April 28, 2026
- A developer shares optimized training settings for LTX2.3 LoRA training on an RTX 5090, cutting training time to 7 hours while avoiding temporal collapse and maintaining accuracy. The detailed configuration walkthrough offers practical guidance for video model fine-tuning — the kind of community knowledge-sharing that keeps local experimentation accessible.
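The post's exact hyperparameters aren't reproduced here; as a purely illustrative sketch (every name and value below is a hypothetical placeholder, not the poster's settings), a video-LoRA training config typically balances rank, precision, and clip length against VRAM:

```python
# Illustrative only: these names and values are hypothetical placeholders,
# not the settings from the post.
lora_config = {
    "rank": 32,                   # LoRA rank: capacity vs. VRAM trade-off
    "learning_rate": 1e-4,
    "batch_size": 1,
    "gradient_accumulation": 4,   # larger effective batch without extra VRAM
    "frames_per_clip": 33,        # shorter clips ease temporal memory pressure
    "mixed_precision": "bf16",
}

# Gradient accumulation multiplies the effective batch size without
# increasing peak memory use.
effective_batch = lora_config["batch_size"] * lora_config["gradient_accumulation"]
print(f"effective batch size: {effective_batch}")
```

Keeping the per-step batch at 1 and accumulating gradients is the usual way to fit video fine-tuning on a single consumer GPU.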
AI Signal - April 21, 2026
- Systematic comparison of image generation models (Klein 9b distilled, Zetachroma development version, and others) using identical prompts to evaluate which performs best on particular themes and which comes closest to Midjourney quality. Workflows are embedded in the images for reproducibility. This is valuable empirical model comparison that goes beyond benchmark scores.
AI Signal - April 14, 2026
- Free Open-Source Tool to Instantly Rig and Animate Your Illustrations (Also With Mesh Deform) r/StableDiffusion Score: 1226
The `see-through` model — released the week prior — decomposes a single static anime image into 23 separate layers for rigging. The author built an open-source tool on top of it that handles mesh deformation and animation, eliminating the need for expensive manual rigging. This makes professional-quality 2D character animation accessible without specialized software or large budgets. 0.98 upvote ratio on 81 comments.
- Forget About VAEs? SenseNova's NEO-unify Achieves 31.5 PSNR Without an Encoder — Native Image Gen Is Coming r/StableDiffusion Score: 247
SenseNova's NEO-unify model operates directly on pixels without the conventional CLIP + VAE + diffusion architecture that has defined image generation since Stable Diffusion 1.0. It achieves 31.5 PSNR — a strong reconstruction quality score — eliminating the VAE bottleneck that causes color shift, detail loss, and latent space artifacts. If this architecture proves scalable, it could fundamentally change how image generation models are built.
- LTX-2.3's distilled model gets a v1.1 checkpoint with improved audio quality and refined visual aesthetics. Updated ComfyUI workflows are included. The 0.99 upvote ratio on 115 comments indicates this is a clean, uncontroversial improvement release. The companion post ([#29](/tags/29/)) provides a quantitative before/after comparison showing the audio mumbling issue from v1.0 is addressed.
- Baidu released ERNIE Image and ERNIE Image Turbo on HuggingFace (baidu/ERNIE-Image and baidu/ERNIE-Image-Turbo). A low score but 88 comments and a 0.99 upvote ratio suggest genuine community interest. Another Chinese lab entering the open image generation space, worth tracking as a comparison point against FLUX and SD3.
- Side-by-side video comparison using identical settings and seeds, showing v1.1's improved audio output over v1.0's mumbling first-stage results. Provides the empirical before/after that complements the official release announcement ([#22](/tags/22/)). Useful for practitioners deciding whether to upgrade.
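For context on the 31.5 PSNR figure in the NEO-unify item above: PSNR is a simple function of mean squared reconstruction error against the signal's peak value. A minimal pure-Python sketch (toy pixel lists, not the model's actual evaluation code):

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical signals: infinite PSNR
    return 10 * math.log10(max_val ** 2 / mse)

# A reconstruction that is off by 2 levels per pixel on an 8-bit scale:
print(round(psnr([10, 50, 200, 128], [12, 48, 202, 126]), 1))  # -> 42.1
```

Higher is better: for 8-bit images, values above roughly 30 dB are generally considered good reconstruction quality, which is why 31.5 PSNR without any VAE is notable.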
AI Signal - April 07, 2026
- ComfyUI's new low-VRAM optimizations enable FLUX.2 [dev] to run on consumer GPUs (RTX 4060 Ti 16GB). While slower than Klein (75s vs 15s), it achieves the best character consistency among open-weight image generation models.
- The ComfyUI-Flux2Klein-Enhancer node pack achieves exact character preservation without LoRA training by improving prompt adherence and style consistency, demonstrating that better node configurations alone can extend FLUX.2 Klein's capabilities.
- Ace-step v1.5 XL released with ComfyUI support in nightly builds. Multiple variants available (turbo, merge, SFT) optimized for different speed/quality tradeoffs in image generation workflows.
AI Signal - March 24, 2026
- A new 15B open-source audio-video model from GAIR claims to beat LTX 2.3, expanding the options for local video generation with synchronized audio.
AI Signal - March 17, 2026
- Showing real capability of LTX loras! Dispatch LTX 2.3 LORA with multiple characters + style r/StableDiffusion Score: 751
Impressive demonstration of LTX 2.3 LoRA training on 440 clips from the game Dispatch, achieving multi-character and style preservation in text-to-video generation. The training covered 6+ characters with distinct voices and the game's aesthetics. Shows progress in controllable video generation with LoRA fine-tuning.
AI Signal - March 10, 2026
- ComfyUI introduced App Mode (internally called "comfyui 1111"), which transforms complex workflows into simple, shareable UIs. Users can select input parameters and create web UI-like interfaces from any workflow. ComfyHub provides a centralized workflow repository, lowering the barrier to entry for non-technical users while preserving ComfyUI's node-based power for advanced users.
AI Signal - February 24, 2026
- Comprehensive comparison of Z-image Base, Z-image Turbo, and Flux 2 Klein across different prompt complexities and qualities. Tests both high-quality long prompts (overall generation quality) and short/low-quality prompts (creative gap-filling ability). Provides detailed visual comparisons and analysis of each model's strengths and weaknesses.
- Just with a single prompt and this result is insane for first attempt in Seedance 2.0 r/singularity Score: 2841
User generated impressive Transformers-style video (plane transforming into robot and attacking city) using Seedance 2.0 with single Chinese prompt. The video shows Hollywood-level visual effects, mechanical detail, physics simulation, and destruction effects—all from one text prompt. This demonstrates rapid progress in video generation quality and complexity.
- I created this time travel short scene using Seedance 2.0 in just one day for under $200. r/ChatGPT Score: 2129
The creator produced a polished time-travel short film using Seedance 2.0 in one day for under $200, demonstrating the accessibility of high-quality video generation for independent creators and rapid iteration. The speed and cost represent orders-of-magnitude improvements over traditional video production.
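Seed-matched, identical-prompt comparisons like the Z-image/Klein shoot-out above boil down to a small grid harness: every (model, prompt) cell is rendered with the same fixed seed so that differences come from the model alone. A hedged sketch, in which `generate` is a placeholder rather than a real pipeline call:

```python
from itertools import product

# Model and prompt-class names follow the comparison above; the generate()
# body is a stand-in, not an actual image-generation call.
models = ["Z-image Base", "Z-image Turbo", "Flux 2 Klein"]
prompts = {
    "long":  "highly detailed scene description with style and lighting notes",
    "short": "a cat",
}
SEED = 42  # one fixed seed everywhere, so outputs differ only by model

def generate(model, prompt, seed):
    # Placeholder: a real harness would load `model` and render `prompt`
    # with the fixed seed, saving one image per grid cell.
    return f"{model} | {prompt[:20]} | seed={seed}"

grid = {
    (model, kind): generate(model, prompt, SEED)
    for model, (kind, prompt) in product(models, prompts.items())
}
print(len(grid))  # one cell per (model, prompt class) -> 6
```

Embedding the workflow metadata in each saved image, as the posts above do, then makes every cell of the grid independently reproducible.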
AI Signal - February 10, 2026
- Qwen-Image-2.0 is out - 7B unified gen+edit model with native 2K and actual text rendering r/LocalLLaMA Score: 327
Qwen's new 7B image model combines generation and editing in a single pipeline with native 2K resolution and improved text rendering. Currently API-only but likely to receive open-weight release based on Qwen's track record with v1.
- Workflow for character headswapping in Stable Diffusion with minimal variables to adjust. The simplicity and effectiveness make it accessible for users wanting consistent character transfer across images.
- Video generation showing dramatic improvements in physics simulation, body dynamics, and cloth simulation. Marks a significant step forward from models that struggled with acrobatic movements and realistic physics.
- I asked AI to remodel my ugly apartment kitchen, then did it in real life...(photos) r/ChatGPT Score: 6255
Practical application of AI image generation for real-world design decisions, followed through to actual implementation. Demonstrates the practical utility of AI tools for visualization and planning.
- LoRA trained for Qwen-Image-Edit that converts photographic scenes into coloring book art with high precision. Created as part of a Tongyi Lab + ModelScope hackathon with a full training walkthrough available.
- Discussion lamenting the shift from artistic experimentation in early Stable Diffusion to the current focus on photorealism. Questions whether AI art has become over-trained and market-driven rather than exploratory.
AI Signal - February 03, 2026
- Qwen-Image2512 delivers exceptional realism and responds particularly well to LoRAs, yet receives less attention than ZIT or Klein in community discussions. Users report it excels at realistic image generation and general refining tasks, offering quality that rivals more hyped alternatives.
- While the community awaits Alibaba's Z-Image Edit, Meituan's LongCat ecosystem offers comparable image editing capabilities now. LongCat uses a larger vision-language encoder (Qwen 2.5-VL 7B vs Z-Image's Qwen 3 4B), enabling the model to actually see and understand images during editing rather than working from text descriptions alone.
- ComfyUI-CacheDiT delivers 1.4-1.6x speedup for Diffusion Transformer models through intelligent residual caching with zero configuration required. The optimization works transparently across DiT models with minimal quality impact, representing the kind of practical performance optimization that compounds across the ecosystem.
- Anima, a new anime-focused image generation model, shows impressive artist style recognition that users prefer over established alternatives like Illustrious or Pony. The model demonstrates strong prompt adherence and authentic style reproduction, though it's currently just a preview with the full trained version pending release.
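The residual-caching idea behind ComfyUI-CacheDiT (mentioned above) can be illustrated with a toy sketch. The class name, threshold, and stand-in block below are assumptions for illustration, not the extension's actual code: when a block's input barely changes between adjacent denoising steps, the cached residual is reused instead of recomputing the block.

```python
class CachedBlock:
    """Toy residual cache: reuse a block's residual when its input barely moves."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.prev_input = None
        self.cached_residual = None
        self.skipped = 0

    def _block(self, x):
        # Stand-in for an expensive DiT transformer block.
        return [v * 0.5 + 1.0 for v in x]

    def __call__(self, x):
        if self.prev_input is not None:
            # Relative L1 change of the input since the last full compute.
            diff = sum(abs(a - b) for a, b in zip(x, self.prev_input))
            norm = sum(abs(a) for a in self.prev_input) or 1.0
            if diff / norm < self.threshold:
                self.skipped += 1  # cheap path: add the cached residual
                return [a + r for a, r in zip(x, self.cached_residual)]
        out = self._block(x)
        self.cached_residual = [o - a for o, a in zip(out, x)]
        self.prev_input = list(x)
        return out

block = CachedBlock()
for step_input in ([1.0, 2.0, 3.0], [1.0, 2.0, 3.01], [1.0, 2.01, 3.01]):
    block(step_input)
print(block.skipped)  # -> 2: the two near-identical steps hit the cache
```

Real implementations in this family (DeepCache, TeaCache, and similar) decide per-block and per-step, but the core trade-off is the same: a tolerance threshold exchanges a small quality impact for skipped computation, which is where the reported 1.4-1.6x speedup comes from.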