
Orthogonalized SGD and nested architectures for anytime neural networks

  • Chengcheng Wan*
  • Henry Hoffmann
  • Shan Lu
  • Michael Maire
  • *Corresponding author for this work
  • The University of Chicago

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

Abstract

We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant for both immediate prediction as well as refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.
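The projection idea in the abstract can be illustrated with a small sketch. It assumes, for illustration only, that "not interfering" means removing from each later output's gradient its components along the gradients of earlier outputs (a Gram-Schmidt pass over flattened gradients), and then taking a plain SGD step on the sum. The paper's actual procedure, including how it re-balances gradients, may differ. The names `orthogonalize` and `orthogonalized_sgd_step` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(grads: Sequence[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt over task gradients, ordered from earliest to latest output.

    Assumption (not from the paper): the earliest output has top priority,
    so its gradient is kept unchanged; each later gradient has its components
    along the earlier, already orthogonalized, directions removed.
    """
    basis: List[Vector] = []   # orthogonal directions claimed by earlier outputs
    result: List[Vector] = []
    for g in grads:
        r = list(g)
        for b in basis:
            bb = dot(b, b)
            if bb > eps:       # skip directions that are numerically zero
                coeff = dot(r, b) / bb
                r = [ri - coeff * bi for ri, bi in zip(r, b)]
        result.append(r)
        basis.append(r)
    return result


def orthogonalized_sgd_step(params: Sequence[float],
                            grads: Sequence[Sequence[float]],
                            lr: float = 0.1) -> Vector:
    """One SGD step on the sum of the orthogonalized task gradients."""
    combined = [sum(col) for col in zip(*orthogonalize(grads))]
    return [p - lr * c for p, c in zip(params, combined)]


if __name__ == "__main__":
    # Two outputs sharing two parameters; the later gradient overlaps the earlier one.
    early, late = [1.0, 0.0], [1.0, 1.0]
    print(orthogonalize([early, late]))
    print(orthogonalized_sgd_step([0.0, 0.0], [early, late]))
```

In this toy case the later gradient loses its component along the earlier one, so the update for the later output no longer changes the parameter direction that the earlier output depends on.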

Original language: English
Title of host publication: 37th International Conference on Machine Learning, ICML 2020
Editors: Hal Daume, Aarti Singh
Publisher: International Machine Learning Society (IMLS)
Pages: 9749-9759
Number of pages: 11
ISBN (electronic): 9781713821120
Publication status: Published - 2020
Externally published
Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: 13 Jul 2020 to 18 Jul 2020

Publication series

Name: 37th International Conference on Machine Learning, ICML 2020
PartF168147-13

Conference

Conference: 37th International Conference on Machine Learning, ICML 2020
Virtual, Online
Period: 13/07/20 to 18/07/20
