Model

In the context of Stable Diffusion and AI as a whole, a "model" is a computational framework, in practice a neural network together with its learned parameters (weights), designed to perform tasks such as generating text or images, or making predictions from input data. Models are trained on very large datasets and learn patterns, features, and relationships within that data. In Stable Diffusion and similar generative AI models, the focus is on creating or transforming media (e.g., images, text) in novel ways.

Types of Model

When you start creating images with Stable Diffusion, there are several types of model that you will need to use. Below are the ones most commonly used in a typical Stable Diffusion setup to create new content:

Base Model (Checkpoint)

Also referred to as a checkpoint, a base model is the foundational neural network that has been pre-trained on a large dataset to learn a broad understanding of its domain (e.g., text, images). For Stable Diffusion, the base model is trained to understand textual descriptions and generate images from them. It serves as the starting point for further customization or fine-tuning, either for specific tasks or to improve performance on certain types of data.

Variational Autoencoder (VAE)

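A Variational Autoencoder learns to compress data into a compact "latent" representation and to reconstruct data from it. In Stable Diffusion, the VAE encodes images into the latent space in which the diffusion process runs and decodes the finished latents back into a viewable image, so the choice of VAE can affect colour and fine detail in the output. More generally, VAEs are widely used for image generation, anomaly detection, and learning compact representations in unsupervised learning tasks. They are particularly noted for their ability to generate new data points that are similar but not identical to the data they were trained on.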
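To give a feel for how compact the latent representation is, the sketch below computes the size of the latent for a given image size. The downscale factor of 8 and the 4 latent channels are assumptions based on the values used by the Stable Diffusion 1.x family of models; other models may use different values, and the function names are made up for this illustration.

```python
def latent_shape(width, height, downscale=8, latent_channels=4):
    """Return (channels, latent_height, latent_width) for a pixel image size.

    The defaults are assumptions matching Stable Diffusion 1.x style VAEs.
    """
    if width % downscale or height % downscale:
        raise ValueError("width and height must be multiples of the downscale factor")
    return (latent_channels, height // downscale, width // downscale)


def compression_ratio(width, height, downscale=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    channels, lat_h, lat_w = latent_shape(width, height, downscale, latent_channels)
    return (3 * width * height) / (channels * lat_h * lat_w)


shape = latent_shape(512, 512)
ratio = compression_ratio(512, 512)
print(shape, ratio)
```

Under these assumptions a 512 by 512 RGB image is represented by a 4 x 64 x 64 latent, which is why running diffusion in latent space is much cheaper than working on pixels directly.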

LoRA (Low-Rank Adaptation)

LoRA is a technique for adapting a pre-trained model by training a small number of additional parameters rather than changing the original weights, which makes it possible to specialize the model for specific tasks without extensive retraining. In the context of generative models like Stable Diffusion, a LoRA is trained on a smaller, specialized dataset to improve the model's ability to generate specific types of images or styles without compromising the overall knowledge learned during pre-training.
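The idea behind "low-rank" can be sketched in a few lines of plain Python. This is a toy illustration, not how any particular Stable Diffusion tool implements it: the frozen weight matrix is left alone, and a small update built from two thin matrices is added on top. The function names, the example matrices, and the `alpha / rank` scaling convention are assumptions chosen for the illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a); `weight` is not modified."""
    rank = len(lora_a)  # lora_a has shape (rank, in_features)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def lora_param_count(out_features, in_features, rank):
    """Number of trainable values in the two low-rank factors."""
    return rank * (out_features + in_features)


# Frozen 3x3 base weight (identity, purely for illustration).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Rank-1 factors: B is 3x1 and A is 1x3.
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, 0.5]]

adapted = apply_lora(base, B, A, alpha=1.0)
print(adapted)
print(lora_param_count(768, 768, 8), "trainable values vs", 768 * 768)
```

Because only the two thin matrices are stored, a LoRA file is typically far smaller than the base model it modifies.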

Textual Inversion (Embeddings)

Textual Inversions, also referred to as embeddings, teach a model to associate specific, often novel, terms with particular visual concepts or styles that are not present in the base model's training data. Rather than changing the model's weights, the technique learns a new embedding vector for each new term. In Stable Diffusion, textual inversion allows users to create custom "tokens" or "keywords" that can be used in prompts to generate images with unique attributes or styles specific to a smaller dataset. It effectively extends the model's vocabulary to include new, user-defined concepts.
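The "extended vocabulary" idea can be pictured with a toy lookup table. The class below is a hypothetical stand-in for a text encoder's token embedding table, not a real library API, and the token name and vector values are invented. In real textual inversion the new vector is learned from a handful of example images rather than typed in by hand.

```python
class ToyTextEncoder:
    """Hypothetical stand-in for a text encoder's token embedding table."""

    def __init__(self, vocab, embeddings):
        self.vocab = dict(vocab)  # token string -> row index
        self.embeddings = [list(v) for v in embeddings]  # one vector per token

    def add_token(self, token, vector):
        """Register a new token; existing rows are left unchanged."""
        if token in self.vocab:
            raise ValueError(f"token already exists: {token}")
        if len(vector) != len(self.embeddings[0]):
            raise ValueError("vector has the wrong dimension")
        self.vocab[token] = len(self.embeddings)
        self.embeddings.append(list(vector))

    def encode(self, prompt):
        """Look up one vector per whitespace-separated token."""
        return [self.embeddings[self.vocab[t]] for t in prompt.split()]


encoder = ToyTextEncoder(
    vocab={"a": 0, "photo": 1, "of": 2},
    embeddings=[[0.1, 0.0], [0.0, 0.2], [0.3, 0.3]],
)
# Made-up vector standing in for one learned by textual inversion.
encoder.add_token("<my-style>", [0.9, -0.4])
vectors = encoder.encode("a photo of <my-style>")
print(vectors)
```

The existing rows are never touched, which mirrors why an embedding file is tiny and can be shared independently of the base model.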

Other Common Models

A few of the other commonly encountered types of model in more advanced setups are:

ControlNet - most often used to guide the generation of a new image with details taken from an existing one, such as its edges, depth, or pose.
CodeFormer - face restoration and detail enhancement.
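HyperNetwork - a small auxiliary network applied alongside the base model to steer its output, typically towards a particular style.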
Upscalers - models that increase the resolution of an image while enhancing its quality and detail.

AI is still a rapidly developing field, with new technologies and capabilities added all the time.

Common File Types

Below are some of the most popular model file types which are in use today:


Type	Extension	Notes	Popularity
Safetensors	.safetensors	Created to close the security hole in .ckpt; the file holds only tensor data and cannot execute code when loaded.	Currently the most popular model file type on Civitai.
Checkpoint	.ckpt	A "one file" version of a diffusion model, criticized for a potential code vulnerability: a file that has been maliciously modified after creation can execute code when it is loaded.	Shown in Civitai downloads as PickleTensor files. Somewhat distrusted; most people prefer Safetensors.
Diffusers	(folder of files)	Oldest of the model types listed here; the model is stored as a folder structure with multiple files that together make up the entire model.	Not available through Civitai, but often found on Hugging Face.
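To illustrate why a Safetensors file cannot execute code, the sketch below builds a tiny file in memory and reads its header back using only the Python standard library. It follows the published layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes), but it is a simplified illustration under that assumption: the helper names and the tensor name are made up, and real files should be written and read with the official safetensors library.

```python
import json
import struct


def build_safetensors_bytes(tensors):
    """Serialize {name: (dtype, shape, raw_bytes)} into the safetensors layout."""
    header, payload, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        payload += raw
        offset += len(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # 8-byte little-endian unsigned length, then the JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_safetensors_header(blob):
    """Parse only the JSON header; no tensor data is decoded and no code runs."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len].decode("utf-8"))


# Two float32 values (8 bytes) stored under a made-up tensor name.
blob = build_safetensors_bytes(
    {"example.weight": ("F32", [2], struct.pack("<2f", 1.0, 2.0))}
)
header = read_safetensors_header(blob)
print(header)
```

Everything in the file is either JSON text or plain numbers. A .ckpt file, by contrast, is loaded through Python's pickle mechanism, which is able to run code embedded in the file.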