The issue with not having negative prompts.

M0oP0o@mander.xyz · 6 months ago


j4k3@lemmy.world · 6 months ago

Diffusion models do not parse natural language the way an LLM does. Almost all behavior that looks like NLP is illusory. You can get away with some things because of what is present in the training corpus. However, any time you use a noun, you are setting a weighted image priority. By repeating “shuttle” in this prompt, you’ve heavily biased the output toward featuring the shuttle regardless of the surrounding context. The model is not contextualising, it is ‘word weighting’. The words of the prompt are related to one another, but they are not conceptually connected.
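As a rough way to spot this in your own prompts, here is a toy sketch that counts repeated words. It does not reproduce anything the model computes; it only flags terms you have typed more than once. The function name, the small stopword list, and the example prompt are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list so filler words are not reported.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "with", "no"}

def repeated_terms(prompt: str, min_count: int = 2) -> dict[str, int]:
    """Return non-stopword terms that occur at least min_count times in the prompt."""
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {word: n for word, n in counts.items() if n >= min_count}

# Made-up prompt: even "no shuttle" adds one more occurrence of the word.
example = "a shuttle on the launch pad, shuttle engines, no shuttle in the sky"
print(repeated_terms(example))
```

Anything this reports is a candidate for the kind of heavy bias described above, including words you only wrote in order to exclude them.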

In an LLM there are special tokens that are used to dynamically ensure that the key points of the input are connected to the output, but this system is not present in image diffusion models.

To illustrate: I like to download LoRAs to use with offline models. I use a few tools to probe them and determine how they were made: the tags used with the training images, the base model, and the training settings. Around a third of the LoRAs I have downloaded contain natural language in the tags on their training images. For those LoRAs, the LoRA-related term I use when generating should be written in natural language.
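A minimal sketch of that kind of probing, assuming the LoRA is a `.safetensors` file. That format starts with an 8-byte little-endian header length followed by a JSON header, which may carry a `__metadata__` map of strings. The `ss_tag_frequency` key and its layout are an assumption based on what kohya-style trainers commonly write; many files will not have it, and this is not necessarily what any particular probing tool does.

```python
import json
import struct

def read_safetensors_metadata(path: str) -> dict:
    """Read the optional __metadata__ map from a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})

def top_tags(metadata: dict, limit: int = 10) -> list[tuple[str, int]]:
    """Most frequent training tags, if an ss_tag_frequency entry is present."""
    raw = metadata.get("ss_tag_frequency")
    if raw is None:
        return []
    totals: dict[str, int] = {}
    # Assumed layout: a JSON string shaped like {"dataset_dir": {"tag": count}}.
    for tags in json.loads(raw).values():
        for tag, count in tags.items():
            key = tag.strip()
            totals[key] = totals.get(key, 0) + int(count)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]

# Usage (hypothetical file name):
# meta = read_safetensors_metadata("my_lora.safetensors")
# print(top_tags(meta))
```

If the top entries read like sentences rather than short comma-style tags, that is a hint the training captions were natural language.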

This is the same principle that applies to any model. You should always ask yourself: how often does this terminology occur in the tags below an image? You might check out gelbooru or danbooru just to have a look at the tag system used there for all images. That is very similar to how training happens for the vast majority of imagery. It is very simplified overall.

The negative prompt is processed very differently from the positive one. If you look at the documentation for the tool you’re using, it might make some syntax available for creating a negative line, but the developers likely want you to use their API with a more advanced tool.
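To make the difference concrete, here is a hedged sketch of how many Stable Diffusion style pipelines treat the negative prompt under classifier-free guidance: its noise prediction stands in for the empty-prompt prediction, and each step is pushed away from it and toward the positive prediction. The plain lists below stand in for real noise tensors, and the function name and default scale are assumptions for illustration, not any specific tool's API.

```python
def guided_noise(cond: list[float], neg: list[float], scale: float = 7.5) -> list[float]:
    """Combine two noise predictions with classifier-free guidance.

    cond:  prediction conditioned on the positive prompt
    neg:   prediction conditioned on the negative (or empty) prompt
    scale: guidance scale; larger values move further from neg toward cond
    """
    return [n + scale * (c - n) for c, n in zip(cond, neg)]

# Toy numbers only: the result moves past cond, away from neg.
print(guided_noise([1.0, 0.0], [0.0, 0.0], scale=2.0))
```

Under this scheme the negative prompt is a separate conditioning input, which is why writing “no shuttle” in the positive prompt does not behave like putting “shuttle” in the negative one.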