Model

In the context of Stable Diffusion and AI as a whole, a "model" refers to a computational framework designed to perform tasks like generating text or images, or making predictions based on input data. These models are trained on vast datasets and use complex algorithms to learn patterns, features, and relationships within the data. In Stable Diffusion and similar generative AI models, the focus is on creating or transforming media (e.g., images, text) in novel ways.

Types of Model

There are several types of model that you may run into. As AI is still developing rapidly, new types are being added all the time.

Base Model (Checkpoint)

Also referred to as a Checkpoint, a base model in the context of AI is the foundational neural network that has been pre-trained on a large dataset to learn a broad understanding of its domain (e.g., text, images). For Stable Diffusion models, the base model is trained to understand and generate images based on textual descriptions. This model serves as the starting point for further customization or fine-tuning, whether for specific tasks or to improve performance on certain types of data.
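
As a rough illustration, the sketch below loads a base model and generates an image from a text prompt using the Hugging Face diffusers library. The model ID and prompt are only examples, not a recommendation of a specific checkpoint.

```python
# Minimal sketch: loading a base model (checkpoint) with the Hugging Face
# diffusers library and generating an image from a text prompt.
# The model ID below is illustrative; any Stable Diffusion base model works.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example base model repository
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```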

LoRA (Low-Rank Adaptation)

LoRA is a technique for adapting a pre-trained model by training a small set of additional low-rank weight updates rather than modifying the full set of parameters, which makes it practical to specialize a model for a task without extensive retraining. In the context of generative models like Stable Diffusion, a LoRA can be trained on a smaller, specialized dataset to improve the model's ability to generate specific subjects or styles without compromising the overall knowledge learned during pre-training.
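
The sketch below shows one way a LoRA can be layered onto a base pipeline with the diffusers library; the LoRA directory, file name, and trigger word are illustrative placeholders.

```python
# Minimal sketch: applying a LoRA on top of a base Stable Diffusion pipeline
# using the Hugging Face diffusers library. The LoRA directory, file name,
# and trigger word are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the LoRA weights on top of the base model; the base checkpoint itself
# is left unchanged.
pipe.load_lora_weights("path/to/lora_dir", weight_name="my_style_lora.safetensors")

image = pipe("a castle, my_style").images[0]
image.save("castle_my_style.png")
```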

Textual Inversion

This concept involves training a model to associate specific, often novel, terms with particular visual concepts or styles that are not present in the base model's training data. In Stable Diffusion, textual inversion allows users to create custom "tokens" or "keywords" that can be used in prompts to generate images with unique attributes or styles learned from a smaller dataset. It effectively extends the model's vocabulary to include new, user-defined concepts.
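
As an illustration, the diffusers library exposes a helper for loading a textual-inversion embedding into a pipeline; the concept repository and its <cat-toy> trigger token below come from the public sd-concepts-library examples and are used purely as an example.

```python
# Minimal sketch: adding a textual-inversion embedding to a pipeline and
# using its trigger token in a prompt. The concept repo and <cat-toy> token
# are examples from the public sd-concepts-library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Register the new token and its learned embedding with the text encoder.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe("a photo of a <cat-toy> on a beach").images[0]
image.save("cat_toy_beach.png")
```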

Embedding

In AI, an embedding is a representation of data (e.g., words, images) as vectors in a high-dimensional space that captures the relationships between data points in a way that is meaningful for machine learning tasks. Embeddings allow models to understand and process complex input data (like text or images) by mapping it to vectors in a space where similar items are closer together. In Stable Diffusion, embeddings are crucial for interpreting textual prompts and translating them into visual elements in the generated images.
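
To make the idea concrete, the sketch below embeds two prompts with the CLIP text encoder used by Stable Diffusion v1.x and compares them. The similarity check is only there to show that related prompts land close together in the embedding space; Stable Diffusion itself feeds the per-token hidden states to its image model rather than a single pooled vector.

```python
# Minimal sketch: turning text into embedding vectors with the CLIP text
# encoder used by Stable Diffusion v1.x, then comparing two prompts.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def embed(prompt: str) -> torch.Tensor:
    tokens = tokenizer(prompt, padding="max_length", truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        # Use the pooled output as a single vector for the whole prompt.
        # (Stable Diffusion itself uses the per-token hidden states.)
        return text_encoder(**tokens).pooler_output[0]

a = embed("a photo of a dog")
b = embed("a picture of a puppy")
print(torch.cosine_similarity(a, b, dim=0).item())  # close to 1.0 for similar prompts
```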

File Types

Below are some of the most popular model file types in use today:

| Type | Extension | Notes | Popularity |
|---|---|---|---|
| Safetensors | .safetensors | Created to close the security hole in .ckpt; the format cannot execute code when loaded. | Currently the most popular model format on Civitai. |
| Checkpoint | .ckpt | A single-file version of a diffusion model, criticized for a potential code-execution vulnerability (the file can execute code if maliciously modified after creation). These appear in the Civitai download listing as PickleTensor files. | Somewhat distrusted; most people prefer Safetensors. |
| Diffusers | (folder of files) | The oldest of the formats listed here; a Diffusers model is a folder of multiple files that together make up the entire model. | Not available through Civitai, but often found on Hugging Face. |
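
The security difference between the two single-file formats comes down to how they are deserialized, as the sketch below illustrates. All file and folder paths are placeholders.

```python
# Minimal sketch contrasting how the formats above are loaded and why
# .safetensors is considered safer: torch.load relies on Python's pickle
# module, which can execute arbitrary code from a tampered .ckpt, while
# safetensors only deserializes raw tensor data. Paths are placeholders.
import torch
from safetensors.torch import load_file
from diffusers import StableDiffusionPipeline

# .safetensors: plain tensor data, no code execution on load.
safe_weights = load_file("model.safetensors")

# .ckpt: a pickled checkpoint; only load files from sources you trust.
ckpt = torch.load("model.ckpt", map_location="cpu")
weights = ckpt.get("state_dict", ckpt)

# Diffusers format: a folder of config and weight files loaded as a whole.
pipe = StableDiffusionPipeline.from_pretrained("path/to/diffusers_folder")
```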