Creating a dataset is the first and arguably the most important part of model training. A bad or unsuitable dataset causes bad results no matter what you change in the settings. This guide will teach you how to find or create a good dataset for different types of models.
What Is The Dataset?
The training dataset is the source of our models. To create a good dataset, it is essential to get the size, tagging, and editing right.
- Usually 20~40 images are enough for many types of LoRAs, but Checkpoint (Model) training requires far more images (a minimum of 100, which is also the maximum size of a SeaArt dataset). For now, it is recommended to focus only on training LoRAs to save time and credits. If you are going to create a detailed LoRA you may prefer to use more images, but remember: More Images Doesn't Mean Better LoRA. The best option is to choose varied images of the same content.
- A dataset can be collected from images on the web, or created with AI image generation. The dataset must not contain any wrong or bad parts, so using AI to create it is somewhat tricky. However, if you trust your skills and luck, you are free to create a dataset with image generation.
Selecting/Creating Images For Dataset
There are different types of LoRAs and they require different types of images, but a few important points apply to all datasets.
Can I use something that can't be generated with AI?
Images should be something that can be generated with AI (this does not include the main subject of the LoRA). For example, say you want to create a character LoRA of a game character. It's okay if the AI can't generate that character, but for the best possible results it must be able to generate everything else in that specific image. Character: Frieren (can't be generated with the model you want to use). The dataset must consist of images the AI could generate if it knew who Frieren is.
Examples
Image of Frieren standing and smiling at the viewer
- AI can create standing and smiling characters even if it's not Frieren, so the image can be used.
Portrait of Frieren with a sad expression
- AI can create a portrait of a woman with a sad expression even if it's not Frieren, so the image can be used.
Image of Frieren's upper body stuck in a mimic (chest monster)
- Many models can't create a proper image of a human stuck in a mimic even if it's not Frieren, so the image can't be used for models that can't generate it.
Let's say I want to create a style LoRA. Then it's not a problem if the AI can't generate the style, but it must be able to generate the subject.
Style: Dark, gothic style (can't be generated with the model you want to use.)
Examples
Illustration of a castle in a dark, gothic style
- An illustration of a castle can be generated with AI even if it's not in the style you want, so the image can be used.
Image of a half apple, half cat armored cyborg in a dark, gothic style
- Many models fail at generating a half apple, half cat armored cyborg even when it's not in the style we want, so the image can't be used.
It can sound absurd, but this is very important for correct tagging, and the best LoRAs come from correctly tagged datasets. The main point is: an image must not include any confusing, hard to generate part unless that part is the subject of your LoRA.
Selection of Dataset for Character LoRA
- Your character's style must be similar to the model you want to use. For example:
Anime style Naruto LoRA on an Anime Model: The dataset must be anime style images. Photorealistic Luffy LoRA on a Photorealistic Model: The dataset must include realistic images of the subject, not anime style.
- Usually 25-40 images of the character are enough if you are not going to create a LoRA with a lot of detail. Using too many images can cause overcooked LoRAs.
- Use images of the same character with a few different poses, angles, views, clothes, expressions, backgrounds etc. and combine these to get a varied set of images.
Example
Character: Luffy from One Piece
Pose: Standing, sitting, action poses (could be waving a hand, a peace sign, a wide smile with closed eyes or anything you desire, but it must be something that can be generated with AI)
Angle/View: Front view, upper-body shot, full-body shot, portrait, back view etc.
Clothes: Original clothes of Luffy, formal wear, t-shirt and jeans, suit etc.
Expressions (recommended only for portraits or close views): Smiling, angry, sad, confused, shy etc.
Background: City, beach, sea, ship, home etc.
Images
- upper-body shot, Luffy, smiling, standing, formal wear, beach
- upper-body shot, Luffy, sad, crying, waving to viewer, original clothes, ship
- full-body shot, Luffy, sitting, t-shirt and jeans, home ...
After finding or creating specific images from combinations of these different terms, you have a good dataset with varied images. This increases the flexibility and generation capability of the LoRA.
If you use a portrait in every image, the LoRA will push portrait shots and cause deformations at other angles. If you use Luffy's original clothes in every image, it will tend to create straw hat and red jacket style clothes every time, even when you don't want it to.
If you use a beach as the background in every image, it will generate images with a beach background even if you don't ask for it.
- The sweet spot is to use a term in 3-6 images at most. Angle/View terms can be used more often, for example:
Dataset with 30 images: 12 portrait shots, 4 beach backgrounds, 4 smiling expressions, 6 images with original clothes, 4 back views, 6 upper body, 8 full body, 6 sitting, 8~10 standing etc.
- Try to find or create different combinations, and don't always pair the same tags with one term. If you create six images of Luffy in his original clothes and all of them are back views, then you may get a back view when you try to generate him in his original clothes, or you may get his clothes (red jacket, straw hat) when generating a back view, even if you don't ask for it.
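The 3-6 images per term guideline above can be checked before training with a small script. The following is only a sketch: the tag lists, the `EXEMPT` set, and the function names are made up for illustration, and it assumes each image is described by one comma-separated tag line.

```python
from collections import Counter

# Hypothetical planned dataset: one comma-separated tag list per image.
planned_images = [
    "upper-body shot, Luffy, smiling, standing, formal wear, beach",
    "upper-body shot, Luffy, sad, crying, waving to viewer, original clothes, ship",
    "full-body shot, Luffy, sitting, t-shirt and jeans, home",
]

# Terms that may repeat more often: the character name and angle/view terms.
EXEMPT = {"Luffy", "upper-body shot", "full-body shot", "portrait", "back view"}


def term_counts(images):
    """Count how many images each term appears in."""
    counts = Counter()
    for tags in images:
        # A set avoids counting a term twice within one image.
        counts.update({t.strip() for t in tags.split(",") if t.strip()})
    return counts


def overused_terms(images, max_uses=6, exempt=EXEMPT):
    """Return terms used in more than max_uses images, ignoring exempt terms."""
    return {
        term: n
        for term, n in term_counts(images).items()
        if n > max_uses and term not in exempt
    }


if __name__ == "__main__":
    print(term_counts(planned_images).most_common(5))
    print(overused_terms(planned_images))
```

Any term reported by `overused_terms` is a candidate for swapping out in a few images, for example replacing some beach backgrounds with a city or a ship.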
Selection of Dataset for Style LoRA
- The base model you are going to use must be flexible or similar to the style you want to create.
- The subject must be something AI can create.
- Use varied images with different subjects but the same style.
- 30-40 images are enough for a style LoRA.
- 15-20 images, if the style of your images is strong and consistent. This can be used for styles like pixel art.
- 50+ images, if you are going to make a very detailed, flexible style LoRA that can be used on any subject. (Pick subjects that are as different as possible.)
Example for a 30-40 image dataset. Values can be adjusted depending on your total image count.
Note: All images must have the same style (the style you want the LoRA to create).
The image counts are my own suggestions, based on what people prioritize and generate with AI. You can change the values for different purposes, but if you are going to make a standard style LoRA, I highly recommend using similar image counts.
- 10-14 images of women (different characteristics, angles, views, poses, clothes, backgrounds)
- 8-10 images of men (different characteristics, angles, views, poses, clothes, backgrounds)
- 6 landscape/scenery images with at least 3 different places (without any main object)
- Various objects from different views. These can be a fruit, car, house, pen, fire or anything you desire. The important part is having the same style in all images. Good for big datasets to increase the flexibility of the LoRA.
Crop Mode
Cropping is important for bringing our dataset to the correct resolution. There are 3 different crop mode options and they have different effects.
Center Crop: Takes the center of the image as the reference point and crops your images to the correct size.
Focus Crop: Takes the main subject as the reference point and crops your images to the correct size.
No Crop: Doesn't crop your images. It can be used if the images are already the correct size.
- Focus Crop is recommended for character LoRAs if the images are not the correct size.
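To make the difference between Center Crop and Focus Crop concrete, here is a minimal sketch of how a crop box could be computed. It is not SeaArt's actual implementation: the function name `crop_box` and the `focus` parameter are assumptions, and the focus point (the position of the main subject) is expected to come from elsewhere, for example from a face detector or from picking it by hand.

```python
def crop_box(width, height, target_w, target_h, focus=None):
    """Return (left, top, right, bottom) of the largest box with the target
    aspect ratio, centered on `focus` (x, y) or on the image center."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Image is too wide: keep the full height, trim the sides.
        box_h = height
        box_w = round(height * target_ratio)
    else:
        # Image is too tall (or already matches): keep the full width.
        box_w = width
        box_h = round(width / target_ratio)

    # Center Crop uses the image center, Focus Crop uses the subject position.
    cx, cy = focus if focus is not None else (width / 2, height / 2)

    # Clamp so the box never leaves the image.
    left = min(max(round(cx - box_w / 2), 0), width - box_w)
    top = min(max(round(cy - box_h / 2), 0), height - box_h)
    return left, top, left + box_w, top + box_h


if __name__ == "__main__":
    # 1000x600 image cropped for a 512x512 training resolution.
    print(crop_box(1000, 600, 512, 512))                    # center crop
    print(crop_box(1000, 600, 512, 512, focus=(900, 300)))  # focus crop
```

The returned box uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the cropped region can then be resized to the training resolution. With a subject near the right edge, the center crop would cut the subject off while the focus crop keeps it inside the box.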
Resolution
- 512x512, 512x768 and 768x512 are recommended for Stable Diffusion 1.5 LoRAs. However, you can use 768x768 if you want to create a highly detailed LoRA.
- 1024x1024 is recommended for SDXL, Flux and Stable Diffusion 3.5 training.
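As a quick reference, the recommendations above can be written as a lookup table. This is only a sketch: the keys such as `"sd1.5"` are illustrative labels, not identifiers used by any tool, and `pick_resolution` is a hypothetical helper that chooses the listed resolution whose aspect ratio is closest to the source image.

```python
# Recommended training resolutions from this guide, keyed by base model.
# The key names are illustrative labels only.
RECOMMENDED_RESOLUTIONS = {
    "sd1.5": [(512, 512), (512, 768), (768, 512)],
    "sdxl": [(1024, 1024)],
    "flux": [(1024, 1024)],
    "sd3.5": [(1024, 1024)],
}


def pick_resolution(base_model, width, height):
    """Pick the recommended resolution closest in aspect ratio to the image."""
    options = RECOMMENDED_RESOLUTIONS[base_model]
    ratio = width / height
    return min(options, key=lambda wh: abs(wh[0] / wh[1] - ratio))


if __name__ == "__main__":
    # A tall 800x1200 source image for an SD 1.5 LoRA.
    print(pick_resolution("sd1.5", 800, 1200))
```

Picking the closest aspect ratio means less of the image is lost when it is cropped to the training resolution.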
Tagging The Dataset
Tagging is very important for better LoRAs. It can take some of your time, but the results will be worth it.
How does tagging work and what does it mean?
It can look confusing at first sight, but it's actually pretty similar to how we prompt in image generation. To explain it simply:
- We have an image in the dataset, but the AI doesn't know what it is.
- We add tags to the image in the dataset. Basically, we create a prompt without any particular order so the AI understands and learns which prompt (tags) created that image. This is also why images that can be generated with AI are recommended: if the AI is not capable of generating that image from the given tags, you will just teach it something it can't generate, which causes a lot of wrong or deformed generations.
- After that, the AI learns those tags together with the image so it can create results similar to its dataset.
Tagging Algorithm
We can set tags ourselves or use a tagging algorithm to make it faster and easier. There are two different tagging algorithms we use when creating a dataset.
BLIP
This system is closer to natural language. It uses short, incomplete sentences to describe our images. It's recommended for SDXL based models and photorealistic style models. It's the best automatic option for Flux and SD3.5 training, but tagging by yourself is probably better if you have enough time and patience.
Deepbooru
A tagging system that uses the tags of booru websites (Danbooru, Safebooru, e621 etc.) to identify our images. These are fixed terms used for tagging posts on those websites, and they form a pretty big database. It's recommended for SD1.5 based LoRA creation and especially for Anime/Furry/Cartoon based models.
Tagging Threshold
It can be set between 0 and 1, and adjusts the tag count and description level of the algorithm.
- Lower values create more tags and aim to describe every detail of the image. It sounds good, but it adds tons of unnecessary tags to the dataset, which is something we don't want.
- Higher values create fewer tags and only identify the general details of the image. Leaving details undefined in the tagging process can cause bad results.
- Using a value between 0.5-0.8 is recommended, depending on what you are making and how detailed you want the tags of the dataset images to be.
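The effect of the threshold can be sketched as a simple filter. This assumes the tagger gives every candidate tag a confidence score between 0 and 1 and keeps the tags at or above the threshold. The scores below are made up for illustration and `filter_tags` is a hypothetical helper, not part of any tagger.

```python
def filter_tags(scored_tags, threshold):
    """Keep tags whose confidence is at or above the threshold, highest first."""
    kept = [(tag, score) for tag, score in scored_tags.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in kept]


# Made-up confidence scores for a single image.
example_scores = {
    "1girl": 0.99,
    "elf": 0.91,
    "white hair": 0.86,
    "smile": 0.72,
    "staff": 0.55,
    "earrings": 0.31,
    "mole": 0.12,
}

if __name__ == "__main__":
    for threshold in (0.3, 0.5, 0.8):
        print(threshold, filter_tags(example_scores, threshold))
```

In this made-up example a threshold of 0.3 lets small details like earrings through, while 0.8 drops the pose and props entirely, which matches the trade-off described above.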
Editing Tags And Usage of Trigger Word
Unfortunately the auto-tagging system is not 100% correct, so we have to edit our dataset by removing unnecessary tags and adding important ones. The most essential part of a LoRA is its trigger words, because they hold the data we want to teach the AI.
Trigger Word
A trigger word must be used for the best working LoRAs. It's a very important part of the prompting phase.
The trigger word is the activation key of a LoRA. It basically stands for everything that is shown in the image but not written in its tags. The consistent parts of a LoRA must be identified by the trigger word, not by their own tags. For example, our character has yellow hair and we want to make a LoRA of her. The tag data must not include terms like yellow hair or blonde, so the AI can take them as a part of the trigger word.
How Should I Edit Tags?
- The first thing to do is remove unnecessary tags from our dataset. These can be tags of small details like a mole, a necklace etc. It mostly depends on what you prioritize and what you are trying to create.
- Secondly, we have to remove the tags of consistent details. If we want a blonde character LoRA, then remove terms like yellow hair and blonde. If we want an ink illustration style LoRA, then remove tags related to ink, illustration, art style etc.
- As the last step, add important tags that describe the image, but they must not be related to your LoRA's main subject.
For example, for a character LoRA add tags related to the background, lighting, blur, pose, and expression that describe the image; for a style LoRA add tags that describe the main subject of the image along with other details.
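The three editing steps above can be sketched as one small function. This is an illustration only: `edit_tags` and its parameters are hypothetical, the auto-generated tags are made up, and putting the trigger word at the front of the tag list is a choice made for this sketch, not a rule from the guide.

```python
def edit_tags(auto_tags, trigger_word, remove=(), add=()):
    """Drop unwanted tags, append useful ones, and put the trigger word first."""
    # Tags to drop: small details and consistent traits the trigger word covers.
    blocked = {t.lower() for t in remove} | {trigger_word.lower()}
    result = [trigger_word]
    for tag in list(auto_tags) + list(add):
        tag = tag.strip()
        # Skip empty, blocked, and duplicate tags.
        if tag and tag.lower() not in blocked and tag not in result:
            result.append(tag)
    return result


if __name__ == "__main__":
    # Made-up auto-tagger output for a character image.
    auto = ["1boy", "blonde hair", "blue eyes", "standing", "tree", "smile",
            "whisker markings"]
    print(edit_tags(
        auto,
        trigger_word="Naruto",
        remove=["blonde hair", "blue eyes", "whisker markings"],
        add=["outdoors", "smile"],
    ))
```

The consistent traits (hair color, eye color, face marks) disappear from the tag list, so they can only be associated with the trigger word, while pose, expression, and background stay as ordinary tags.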
End of Guide
Now your dataset has correctly prepared images with important tags that describe each image as a prompt does, so the AI will understand what created those images.
Also, your trigger word will be the placeholder of your LoRA's main subject (the tags missing from the images you used). This way, the AI will learn what your trigger word means and train itself on this topic.
Example
LoRA: Naruto Character LoRA
Image: Naruto standing in front of a tree and smiling
Tags: 1boy, standing, tree, smiling etc.
Trigger Word: Naruto
The trigger word is the placeholder for all the specific characteristics of Naruto, such as yellow hair, blue eyes, cheek marks etc.
Don't add these kinds of characteristics to the tags, so the AI will treat your Trigger Word as the activation key of these features.
That's actually why we train LoRAs: to make our Trigger Word an activation key and a meaningful word.