• tal@lemmy.today
    8 months ago

    While it looks nice, I’m kind of bothered by fragmentation.

    There are several different Web frontends for Stable Diffusion that I’ve looked at, all of which have some nice attributes:

    • Automatic1111. This is pretty straightforward and widely used, and has an extension system with many extensions available.

    • Easy Diffusion. This supports queuing – not just setting up a batch, but adjusting settings and then firing off a new job while a previous one is still running.

    • ComfyUI. This provides a powerful interface directly onto the directed acyclic graph (DAG) of processing steps that frontends build behind the scenes. Working with it is slower, but it provides a lot more control over what’s happening (especially if one wants to build up a process, tweak earlier steps in it, and re-run them).

    • Now this ENFUGUE, which looks like it’s aiming to feel more like a traditional image-manipulation app – standard, non-AI image-manipulation settings appear to be treated as first-class citizens there, complete with menus.

    To some extent, these are mutually incompatible approaches, and that diversity isn’t bad…but I kind of regret that it leads to duplicated effort and to functionality in one frontend not being available in the others.

    I guess what I’d like to see is something like the following:

    1. An extension API that can generally be used with multiple frontends, so that someone writing an extension for one can have it be used elsewhere. Obviously, that won’t work for everything – some Automatic1111 extensions are tightly tied to the UI. But others – like a new “module” that takes a given set of inputs and produces a given set of outputs – could target some least-common-denominator API, maybe with the ability to provide per-frontend information to take advantage of features on a given frontend. (There’s a rough sketch of what I mean by a “module” contract after this list.)

    2. At least access to something like the directed acyclic graph that ComfyUI uses, so that an image can be built in multiple steps, but where it’s easy to go back, tweak an earlier step, and rerun the whole thing. I think that for many people, that isn’t an intuitive way to work…but many software packages use something like that internally, even if they present a different interface on top of it. Adobe Photoshop doesn’t show a DAG (well, not that I’m aware of – I haven’t used it in many years), but with layers one can build up a “process” and then go back and tweak earlier steps in it, without the user ever seeing anything graph-like. (The second sketch after this list shows the sort of graph structure I mean.)

    3. A sandbox for extensions. There are a lot of people writing new extensions, and I think it’d be nice to be able to run untrusted ones – particularly since it’d let Person/Company A, who owns a bunch of servers and parallel-processing hardware, let Person/Company B, who just wants to use that hardware, add their own extensions to the mix. If extensions represent a security hole, they can’t do that. As things stand, if someone wants to use, say, Midjourney as part of a workflow with custom steps, they have to shuffle stuff back and forth between their local system and Midjourney. I think that Stable Diffusion could do better. (The third sketch after this list shows one possible isolation approach.)

    4. Standardizing, as far as is practical, export and import of data. At the beginning, there wasn’t a lot to carry around – an image was entirely defined by a prompt, which is easily portable across frontends. But then say there’s inpainting happening, runs with different models in Automatic1111 img2img (manually) or ComfyUI, various extensions running… As this gets more complex and the workflow on a given image gets longer, people are going to want to save and restore state, and it’d be nice to do so, insofar as possible, without splitting things up among different frontends. My ideal would be something like a tarball or similar containing the DAG plus images for things like inpainting masks. (The last sketch below shows one possible shape for that.)
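
    To make #1 a bit more concrete, here’s a rough sketch of the kind of least-common-denominator “module” contract I’m imagining. Every name in it (ModuleSpec, run, frontend_hints, and so on) is made up for illustration – this isn’t any existing frontend’s API:

    ```python
    # Hypothetical sketch only – none of these names exist in any current frontend.
    # The idea: a frontend-agnostic description of one processing step, which thin
    # adapters for Automatic1111, ComfyUI, etc. could each wrap in their own way.
    from dataclasses import dataclass, field
    from typing import Any, Callable

    @dataclass
    class ModuleSpec:
        """Describes one processing step: what it consumes and what it produces."""
        name: str
        inputs: dict[str, type]                 # e.g. {"image": bytes, "scale": float}
        outputs: dict[str, type]                # e.g. {"image": bytes}
        run: Callable[[dict[str, Any]], dict[str, Any]]
        # Optional extras a specific frontend may use (UI placement, labels, ...).
        frontend_hints: dict[str, dict] = field(default_factory=dict)

    def upscale(args: dict[str, Any]) -> dict[str, Any]:
        # Placeholder body; a real module would call an actual upscaler here.
        return {"image": args["image"]}

    UPSCALE = ModuleSpec(
        name="simple-upscale",
        inputs={"image": bytes, "scale": float},
        outputs={"image": bytes},
        run=upscale,
        frontend_hints={"automatic1111": {"tab": "Extras"}},   # purely illustrative
    )
    ```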
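
    For #2, this is roughly the graph structure I’m picturing – not ComfyUI’s actual format, just an illustration of why a DAG makes “tweak an earlier step and re-run” cheap: each node says which upstream node feeds each of its inputs, so editing one node only dirties what’s downstream of it.

    ```python
    # Minimal sketch of the kind of graph I mean, not any frontend's real format.
    # Input values that name another node are edges; everything else is a literal.
    graph = {
        "load":    {"op": "load_image", "inputs": {}},
        "mask":    {"op": "paint_mask", "inputs": {"image": "load"}},
        "inpaint": {"op": "inpaint",    "inputs": {"image": "load", "mask": "mask",
                                                   "prompt": "a red barn"}},
        "upscale": {"op": "upscale",    "inputs": {"image": "inpaint", "scale": 2.0}},
    }

    def downstream(graph: dict, changed: str) -> set[str]:
        """Return the set of nodes that must be recomputed after `changed` is edited."""
        dirty = {changed}
        grew = True
        while grew:
            grew = False
            for name, node in graph.items():
                if name not in dirty and any(src in dirty
                                             for src in node["inputs"].values()
                                             if isinstance(src, str) and src in graph):
                    dirty.add(name)
                    grew = True
        return dirty

    # Tweaking the mask re-runs only mask -> inpaint -> upscale; the load step is reused.
    print(downstream(graph, "mask"))   # {'mask', 'inpaint', 'upscale'} (order may vary)
    ```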
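
    For #3, one (very rough) isolation style: run the untrusted extension as a separate process that only ever sees JSON on stdin/stdout, so it never gets in-process access to the host’s objects, paths, or keys. The “extension.py” here is hypothetical, and real sandboxing would also want OS-level isolation (containers, seccomp, resource limits, and so on):

    ```python
    # Sketch only: the untrusted extension is a separate process speaking JSON over pipes.
    import json
    import subprocess

    def run_untrusted_module(payload: dict) -> dict:
        """Hand the extension its inputs as JSON and read back its outputs."""
        proc = subprocess.run(
            ["python", "extension.py"],          # hypothetical untrusted extension script
            input=json.dumps(payload).encode(),
            capture_output=True,
            timeout=60,                          # don't let a bad extension hang the queue
            check=True,
        )
        return json.loads(proc.stdout)

    # Example (assumes extension.py exists and follows the same JSON convention):
    # result = run_untrusted_module({"op": "stylize", "image": "in.png", "strength": 0.6})
    ```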
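
    And for #4, one possible shape for that bundle: a tarball holding the graph as JSON plus any images it references (masks, init images). The layout here (“workflow.json”, an “assets/” directory) is just a convention I made up for the sketch:

    ```python
    # Sketch of a portable workflow bundle: graph JSON plus referenced images.
    import io
    import json
    import tarfile

    def export_workflow(path: str, graph: dict, assets: dict[str, bytes]) -> None:
        """Write the graph and its binary assets (e.g. inpainting masks) to one tarball."""
        with tarfile.open(path, "w:gz") as tar:
            blob = json.dumps(graph, indent=2).encode()
            info = tarfile.TarInfo("workflow.json")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))
            for name, data in assets.items():
                info = tarfile.TarInfo(f"assets/{name}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    def import_workflow(path: str) -> tuple[dict, dict[str, bytes]]:
        """Read the bundle back; any frontend could rebuild its own state from this."""
        graph, assets = {}, {}
        with tarfile.open(path, "r:gz") as tar:
            for member in tar.getmembers():
                data = tar.extractfile(member).read()
                if member.name == "workflow.json":
                    graph = json.loads(data)
                elif member.name.startswith("assets/"):
                    assets[member.name.removeprefix("assets/")] = data
        return graph, assets
    ```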