Adafactor

Adafactor is an optimization algorithm for training [[Deep Learning|deep learning]] models, particularly well-suited to large-scale distributed settings. It is designed to address limitations of adaptive learning-rate methods such as [[Adam]] and [[Root Mean Square Propagation|RMSprop]], most notably the memory they require to store per-parameter optimizer statistics. Adafactor was introduced by Noam Shazeer and Mitchell Stern in their 2018 paper [https://arxiv.org/abs/1804.04235 “Adafactor: Adaptive Learning Rates with Sublinear Memory Cost”].
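The paper's central idea is a factored estimate of the squared-gradient statistics: for an n×m weight matrix, Adafactor stores only a per-row and a per-column accumulator (O(n + m) memory) and reconstructs the full second-moment estimate as a rank-1 outer product, instead of keeping the full n×m accumulator that Adam uses. The following is a minimal NumPy sketch of that factored update, not the complete algorithm: the function and variable names are illustrative, and it omits the paper's decaying β₂ schedule, update clipping, and relative step sizes.

<syntaxhighlight lang="python">
import numpy as np

def factored_second_moment_step(grad, row_acc, col_acc, beta2=0.999, eps=1e-30):
    """One simplified Adafactor-style update for a 2-D gradient `grad` (n x m).

    Only a length-n row accumulator and a length-m column accumulator are
    stored, so memory grows as O(n + m) rather than the O(n * m) required
    by a full second-moment matrix as in Adam.
    """
    sq = grad ** 2 + eps                                          # squared gradients
    row_acc = beta2 * row_acc + (1.0 - beta2) * sq.sum(axis=1)    # shape (n,)
    col_acc = beta2 * col_acc + (1.0 - beta2) * sq.sum(axis=0)    # shape (m,)
    # Rank-1 reconstruction of the per-element second-moment estimate.
    v_hat = np.outer(row_acc, col_acc) / row_acc.sum()
    return grad / np.sqrt(v_hat), row_acc, col_acc

# Toy usage: one update step for a random 4 x 3 weight-matrix gradient.
rng = np.random.default_rng(0)
grad = rng.normal(size=(4, 3))
row_acc, col_acc = np.zeros(4), np.zeros(3)
update, row_acc, col_acc = factored_second_moment_step(grad, row_acc, col_acc)
</syntaxhighlight>

In practice there is no need to implement this by hand; ready-made implementations exist in common deep-learning libraries (for example, the Hugging Face Transformers library provides an <code>Adafactor</code> optimizer class).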

== External Links ==

Please note that the content of external links is not endorsed or verified by us and may change without notice. Use at your own risk.