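Enables gradient checkpointing: only some intermediate values are saved during the forward pass, and the rest are recomputed when the gradients are calculated. This slows down training but uses less VRAM. It has no effect on the training results.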
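
The trade-off described above can be illustrated with a minimal sketch, assuming the option refers to the technique commonly called gradient checkpointing. The toy model (a chain of scalar `tanh` layers) and every name in it (`grads_plain`, `grads_checkpointed`, `segment`) are hypothetical and chosen only for illustration; this is not the trainer's actual implementation. The plain version stores every layer input, while the checkpointed version stores one input per segment and recomputes the rest during the backward pass.

```python
import math

def forward_layer(w, x):
    # One toy "layer": y = tanh(w * x)
    return math.tanh(w * x)

def backward_layer(w, x, grad_out):
    # Returns (gradient w.r.t. the input x, gradient w.r.t. the weight w)
    y = math.tanh(w * x)
    local = 1.0 - y * y
    return grad_out * w * local, grad_out * x * local

def grads_plain(weights, x0):
    # Standard backpropagation: keep the input of every layer in memory.
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = forward_layer(w, x)
    grad = 1.0
    grad_w = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grad, grad_w[i] = backward_layer(weights[i], inputs[i], grad)
    return grad_w, len(inputs)

def grads_checkpointed(weights, x0, segment):
    # Forward pass: keep only the input of each segment (the checkpoints).
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = forward_layer(w, x)
    grad = 1.0
    grad_w = [0.0] * len(weights)
    peak = 0
    # Backward pass: walk the segments from last to first.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        # Recompute this segment's layer inputs from its checkpoint.
        # This extra forward work is why training gets slower.
        inputs = []
        x = checkpoints[start]
        for i in range(start, end):
            inputs.append(x)
            x = forward_layer(weights[i], x)
        peak = max(peak, len(checkpoints) + len(inputs))
        for i in reversed(range(start, end)):
            grad, grad_w[i] = backward_layer(weights[i], inputs[i - start], grad)
    return grad_w, peak

weights = [0.5 + 0.1 * i for i in range(16)]
plain, plain_stored = grads_plain(weights, 0.3)
ckpt, ckpt_peak = grads_checkpointed(weights, 0.3, segment=4)
print(plain == ckpt)            # gradients match
print(plain_stored, ckpt_peak)  # values held in memory: all inputs vs. checkpoints plus one segment
```

In this sketch the recomputation repeats the same arithmetic on the same inputs, so the gradients come out the same, which mirrors the claim that the training results are unaffected. The memory saving in a real model depends on where the checkpoints are placed.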