Mixed Precision
Weights are normally stored as 32-bit floating-point values (FP32), but training in 16-bit precision yields considerable VRAM savings. LoRA can be trained successfully with FP16 (16-bit floating point). BF16 (bfloat16) is a format devised to provide the VRAM savings of FP16 while keeping the numeric range of FP32, but it may only work on recent GPU generations (e.g. NVIDIA Ampere / RTX 30-series and newer).
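A minimal PyTorch sketch of how mixed-precision training is typically wired up: the forward pass runs under autocast in FP16 or BF16 while the master weights stay in FP32, and a gradient scaler is enabled only for FP16. The toy model, shapes, and learning rate below are placeholders, not part of any particular trainer.

```python
import torch
from torch import nn

# Placeholder model; in practice this would be the network with LoRA modules attached.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Prefer BF16 when the GPU supports it (Ampere / RTX 30-series and newer),
# otherwise fall back to FP16 with loss scaling.
use_bf16 = torch.cuda.is_bf16_supported()
amp_dtype = torch.bfloat16 if use_bf16 else torch.float16
scaler = torch.cuda.amp.GradScaler(enabled=not use_bf16)

# Dummy batch for illustration.
x = torch.randn(8, 512, device="cuda")
target = torch.randn(8, 512, device="cuda")

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass and loss in reduced precision; weights remain FP32.
    with torch.autocast(device_type="cuda", dtype=amp_dtype):
        loss = nn.functional.mse_loss(model(x), target)
    # FP16 needs loss scaling to avoid gradient underflow; for BF16 the
    # scaler is disabled and these calls pass through unchanged.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

BF16 trades mantissa precision for the same exponent range as FP32, which is why it usually trains stably without loss scaling, whereas FP16's narrower range makes the scaler important.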