Clip Skip
The Text Encoder uses a model called "CLIP", which is built from 12 stacked transformer layers. Clip Skip specifies which layer's output to use, counted from the end: a Clip Skip of 2 sends the penultimate (11th) layer's output vector to the U-Net's cross-attention blocks instead of the final layer's. Unless the base model you're training against was trained (or merged) with Clip Skip 2, use 1. SDXL does not benefit from Clip Skip 2.
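Below is a minimal sketch (not the trainer's exact code) of what Clip Skip selects, using the Hugging Face `transformers` CLIP text encoder for SD 1.x. The prompt, model name, and the final layer-norm step are illustrative assumptions; check how your base model and trainer handle the skipped output.

```python
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

clip_skip = 2  # 1 = last layer, 2 = penultimate layer

tokens = tokenizer(
    "a photo of a cat",           # example prompt (assumption)
    return_tensors="pt",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
)
out = text_encoder(**tokens, output_hidden_states=True)

# hidden_states[0] is the token-embedding output; [1]..[12] are the 12 transformer layers.
# Counting from the end: index -1 is layer 12, -2 is layer 11 (the penultimate), etc.
hidden = out.hidden_states[-clip_skip]

# Some trainers re-apply the final layer norm to the skipped output; whether the
# base model expects this depends on how it was trained (assumption, verify for your setup).
hidden = text_encoder.text_model.final_layer_norm(hidden)

print(hidden.shape)  # (1, 77, 768) — this is what is fed to the U-Net's cross-attention
```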