Adafactor

Adafactor is an optimization algorithm for training deep learning models, particularly well suited to large-scale distributed settings. It is designed to address a key limitation of adaptive learning-rate methods such as Adam and RMSprop: the memory cost of storing full per-parameter second-moment statistics. Adafactor was introduced by Noam Shazeer and Mitchell Stern in their 2018 paper “Adafactor: Adaptive Learning Rates with Sublinear Memory Cost”.
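
The sublinear memory cost named in the paper's title comes from factoring the second-moment estimate: for an n×m weight matrix, Adafactor stores only per-row and per-column accumulators (n + m values) instead of a full n×m matrix, and reconstructs the estimate as a rank-1 outer product. The snippet below is a minimal, illustrative NumPy sketch of that factored update for a single weight matrix; it omits the paper's update clipping and relative step sizes, and the names (`adafactor_step`, `row_acc`, `col_acc`, `lr`) are hypothetical rather than taken from the paper or any library.

```python
import numpy as np

def adafactor_step(W, grad, row_acc, col_acc, t, lr=1e-2, eps=1e-30):
    """One illustrative Adafactor update for a 2-D parameter W.

    Instead of a full (n, m) second-moment matrix (as in Adam), only the
    per-row accumulator row_acc (n,) and per-column accumulator col_acc (m,)
    are kept, reducing optimizer memory from O(n*m) to O(n + m).
    """
    beta2 = 1.0 - t ** (-0.8)          # time-dependent decay, as in the paper
    sq = grad ** 2 + eps               # smoothed squared gradient
    row_acc[:] = beta2 * row_acc + (1 - beta2) * sq.mean(axis=1)
    col_acc[:] = beta2 * col_acc + (1 - beta2) * sq.mean(axis=0)
    # Rank-1 reconstruction of the second-moment estimate from the factors.
    v_hat = np.outer(row_acc, col_acc) / row_acc.mean()
    W -= lr * grad / np.sqrt(v_hat)
    return W

# Toy usage: a few steps on a random 4x3 weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
row_acc, col_acc = np.zeros(4), np.zeros(3)
for t in range(1, 4):
    grad = rng.normal(size=(4, 3))     # stand-in for a real gradient
    W = adafactor_step(W, grad, row_acc, col_acc, t)
```

In practice there is no need to hand-roll the update: ready-made implementations exist, for example the Adafactor optimizer shipped with the Hugging Face transformers library.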