Clip Skip


Stable Diffusion's Text Encoder is the text transformer from CLIP, which is made up of 12 layers. Clip Skip specifies which layer's output to use, counted from the end: a Clip Skip of 1 uses the final layer (the default), while a Clip Skip of 2 sends the penultimate layer's output vectors to the U-Net's cross-attention blocks instead. Unless the base model you're training against was trained (or merged) with Clip Skip 2, you can use 1. SDXL does not benefit from Clip Skip 2.
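A minimal sketch of what Clip Skip does, using the Hugging Face transformers CLIP text encoder. The model id, the encode_prompt helper name, and the counting convention (Clip Skip 1 = last layer, Clip Skip 2 = penultimate layer, as in AUTOMATIC1111) are assumptions for illustration, not a definitive implementation.

import torch
from transformers import CLIPTextModel, CLIPTokenizer

# Assumed model id: the text encoder used by SD 1.x.
model_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

def encode_prompt(prompt, clip_skip=1):
    # Illustrative helper (not part of any library API).
    tokens = tokenizer(
        prompt,
        padding="max_length",
        max_length=tokenizer.model_max_length,  # 77 tokens
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = text_encoder(tokens.input_ids, output_hidden_states=True)
    # hidden_states[0] is the token embedding; hidden_states[-1] is the
    # output of the 12th (final) transformer layer. Clip Skip of N takes
    # the hidden state N layers from the end, so N=2 is the penultimate layer.
    hidden = out.hidden_states[-clip_skip]
    # The chosen hidden state is still passed through the encoder's final
    # layer norm before being handed to the U-Net's cross-attention blocks.
    return text_encoder.text_model.final_layer_norm(hidden)

cond = encode_prompt("a photo of a cat", clip_skip=2)
print(cond.shape)  # torch.Size([1, 77, 768])

Note that counting conventions vary between tools: some libraries count the number of layers skipped rather than the position from the end, so their "clip skip 1" can correspond to the Clip Skip 2 shown here. Check the documentation of the tool you are using.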