Behnam Neyshabur

Senior Staff Research Scientist & Team Lead

Google DeepMind, Blueshift Team

neyshabur(at)google.com
bneyshabur(at)gmail.com
Curriculum Vitae


I am a senior staff research scientist at Google DeepMind, co-leading the Blueshift team. Our team is based in the US and, as part of the Gemini team, focuses on improving Gemini's abilities to solve hard reasoning and planning problems in areas such as STEM. Before this, I was a postdoctoral researcher at New York University and a member of the Theoretical Machine Learning program at the Institute for Advanced Study (IAS) in Princeton. In summer 2017, I received my PhD in computer science from TTI-Chicago.

Current highlights:


We are hiring! Apply here and mention your location/team preference.

Virtual MLC office hours (open to everyone): The purpose of these office hours is to help with career choices, research directions, or anything else one might need help with. You can book them here.

Anonymous Feedback (open to everyone): Send me anonymous feedback through this form.

Academic Service: I have been serving as an Area Chair for the NeurIPS and ICLR conferences and as an Editorial Board member of the JMLR and TMLR journals.

(Co-)hosted interns (not considering interns at this point):

Publications (Google Scholar)

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context,
Gemini Team et al.
Technical Report, 2024.
[Tech Report]

Gemini: a family of highly capable multimodal models,
Gemini Team et al.
arXiv preprint, 2023.
[arXiv:2312.11805]

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models,
Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Peter J. Liu, James Harrison, et al.
arXiv preprint, 2023.
[arXiv:2312.06585]

Long Range Language Modeling via Gated State Spaces,
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2023.
[arXiv:2206.13947]

REPAIR: REnormalizing Permuted Activations for Interpolation Repair,
Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2023.
[arXiv:2211.08403] [code]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models,
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, et al.
Transactions on Machine Learning Research (TMLR), 2023.
[arXiv:2206.04615]

Teaching Algorithmic Reasoning via In-context Learning,
Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur*, Hanie Sedghi*.
arXiv preprint, 2022.
[arXiv:2211.09066]

Solving Quantitative Reasoning Problems with Language Models,
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur*, Guy Gur-Ari*, Vedant Misra*.
Neural Information Processing Systems (NeurIPS), 2022.
[arXiv:2206.14858] [Google AI Blog] [Sample Explorer]

Exploring Length Generalization in Large Language Models,
Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur.
Neural Information Processing Systems (NeurIPS), 2022 (oral).
[arXiv:2207.04901]

Block-Recurrent Transformers,
DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur.
Neural Information Processing Systems (NeurIPS), 2022.
[arXiv:2203.07852] [code]

Revisiting Neural Scaling Laws in Language and Vision,
Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai.
Neural Information Processing Systems (NeurIPS), 2022.
[arXiv:2209.06640]

Data Scaling Laws in NMT: The Effect of Noise and Architecture,
Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat.
The International Conference on Machine Learning (ICML), 2022.
[arXiv:2202.01994]

Convexifying Transformers: Improving optimization and understanding of transformer networks,
Tolga Ergen, Behnam Neyshabur, Harsh Mehta.
arXiv preprint, 2022.
[arXiv:2211.11052]

Layer-Stack Temperature Scaling,
Amr Khalifa, Michael C Mozer, Hanie Sedghi, Behnam Neyshabur, Ibrahim Alabdulmohsin.
arXiv preprint, 2022.
[arXiv:2211.10193]

Exploring the Limits of Large Scale Pre-training,
Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi.
International Conference on Learning Representations (ICLR), 2022 (spotlight).
[arXiv:2110.02095]

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks,
Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2022.
[arXiv:2110.06296] [code]

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance,
Saurabh Garg, Sivaraman Balakrishnan, Zachary C Lipton, Behnam Neyshabur, Hanie Sedghi.
International Conference on Learning Representations (ICLR), 2022.
[arXiv:2201.04234]

A Loss Curvature Perspective on Training Instability in Deep Learning,
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Dahl, Zachary Nado, Orhan Firat.
International Conference on Learning Representations (ICLR), 2022.
[arXiv:2110.04369]

The evolution of out-of-distribution robustness throughout fine-tuning,
Anders Andreassen, Yasaman Bahri, Behnam Neyshabur, Rebecca Roelofs.
Transactions on Machine Learning Research (TMLR), 2022.
[arXiv:2106.15831]

Deep Learning Through the Lens of Example Difficulty,
Robert JN Baldock, Hartmut Maennel, Behnam Neyshabur.
Neural Information Processing Systems (NeurIPS), 2021.
[arXiv:2106.09647]

Methods and Analysis of The First Competition in Predicting Generalization of Deep Learning,
Yiding Jiang, Parth Natekar, Manik Sharma, Sumukh K. Aithal, Dhruva Kashyap, Natarajan Subramanyam, Carlos Lassance, Daniel M. Roy, Gintare Karolina Dziugaite, Suriya Gunasekar, Isabelle Guyon, Pierre Foret, Scott Yak, Hossein Mobahi, Behnam Neyshabur, Samy Bengio.
NeurIPS 2020 Competition and Demonstration Track, PMLR, 2021.
[link]

When Do Curricula Work?,
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2021 (oral).
[arXiv:2012.03107] [code]

Sharpness-Aware Minimization for Efficiently Improving Generalization,
Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2021 (spotlight).
[arXiv:2010.01412] [code]

Understanding the Failure Modes of Out-of-Distribution Generalization,
Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2021.
[arXiv:2010.15775] [code]

The Deep Bootstrap: Good Online Learners are Good Offline Generalizers,
Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi.
International Conference on Learning Representations (ICLR), 2021.
[arXiv:2010.08127] [code]

Are wider nets better given the same number of parameters?,
Anna Golubeva, Behnam Neyshabur, Guy Gur-Ari.
International Conference on Learning Representations (ICLR), 2021.
[arXiv:2010.14495] [code]

Extreme Memorization via Scale of Initialization,
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2021.
[arXiv:2008.13363] [code]

Towards Learning Convolutions from Scratch,
Behnam Neyshabur.
Neural Information Processing Systems (NeurIPS), 2020.
[arXiv:2007.13657]

What is being transferred in transfer learning?,
Behnam Neyshabur*, Hanie Sedghi*, Chiyuan Zhang*.
Neural Information Processing Systems (NeurIPS), 2020.
[arXiv:2008.11687] [code]

NeurIPS 2020 Competition: Predicting Generalization in Deep Learning,
Yiding Jiang, Pierre Foret, Scott Yak, Daniel M Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur.
arXiv preprint, 2020.
[arXiv:2012.07976]

Fantastic Generalization Measures and Where to Find Them,
Yiding Jiang*, Behnam Neyshabur*, Hossein Mobahi, Dilip Krishnan, Samy Bengio.
International Conference on Learning Representations (ICLR), 2020.
[arXiv:1912.02178]

The intriguing role of module criticality in the generalization of deep networks,
Niladri Chatterji, Behnam Neyshabur, Hanie Sedghi.
International Conference on Learning Representations (ICLR), 2020 (spotlight).
[arXiv:1912.00528]

Observational Overfitting in Reinforcement Learning,
Xingyou Song, Yiding Jiang, Yilun Du, Behnam Neyshabur.
International Conference on Learning Representations (ICLR), 2020.
[arXiv:1912.02975]

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks,
Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro.
International Conference on Learning Representations (ICLR), 2019.
[arXiv:1805.12076] [code] [poster]

Predicting Protein-Protein Interactions through Sequence-based Deep Learning,
Somaye Hashemifar, Behnam Neyshabur, Aly Azeem Khan, Jinbo Xu.
Bioinformatics, 2018.
[link] [code]

Stronger Generalization Bounds for Deep Nets via a Compression Approach,
Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang.
The 35th International Conference on Machine Learning (ICML), 2018.
[arXiv:1802.05296]

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks,
Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro.
International Conference on Learning Representations (ICLR), 2018.
[arXiv:1707.09564]

Implicit Regularization in Deep Learning,
Behnam Neyshabur.
PhD Thesis, 2017.
[arXiv:1709.01953]

Exploring Generalization in Deep Learning,
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro.
Neural Information Processing Systems (NeurIPS), 2017.
[arXiv:1706.08947] [code]

Implicit Regularization in Matrix Factorization,
Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro.
Neural Information Processing Systems (NeurIPS), 2017 (spotlight).
[arXiv:1705.09280]

Stabilizing GAN Training with Multiple Random Projections,
Behnam Neyshabur, Srinadh Bhojanapalli, Ayan Chakrabarti.
arXiv preprint, 2017.
[arXiv:1705.07831] [code]

Corralling a Band of Bandit Algorithms,
Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire.
The 30th Conference on Learning Theory (COLT), 2017.
[arXiv:1612.06246]

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations,
Behnam Neyshabur*, Yuhuai Wu*, Ruslan Salakhutdinov, Nathan Srebro.
Neural Information Processing Systems (NeurIPS), 2016.
[arXiv:1605.07154]

Global Optimality of Local Search for Low Rank Matrix Recovery,
Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro.
Neural Information Processing Systems (NeurIPS), 2016.
[arXiv:1605.07221]

Data-Dependent Path Normalization in Neural Networks,
Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro.
International Conference on Learning Representations (ICLR), 2016.
[arXiv:1511.06747]

Path-SGD: Path-Normalized Optimization in Deep Neural Networks,
Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro.
Neural Information Processing Systems (NeurIPS), 2015.
[arXiv:1506.02617] [code]

Norm-Based Capacity Control in Neural Networks,
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro.
The 28th Conference on Learning Theory (COLT), 2015.
[arXiv:1503.00036]

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning,
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro.
International Conference on Learning Representations (ICLR) workshop track, 2015.
[arXiv:1412.6614] [ICLR poster]

On Symmetric and Asymmetric LSHs for Inner Product Search,
Behnam Neyshabur and Nathan Srebro.
The 32nd International Conference on Machine Learning (ICML), 2015.
[arXiv:1410.5518] [code]

Joint Inference of Tissue-specific Networks with a Scale Free Topology,
Somaye Hashemifar, Behnam Neyshabur, Jinbo Xu.
IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM), 2015.

Clustering, Hamming Embedding, Generalized LSH and the Max Norm,
Behnam Neyshabur, Yury Makarychev, Nathan Srebro.
The 25th International Conference on Algorithmic Learning Theory (ALT), 2014.
[arXiv:1405.3167] [slides]

Sparse Matrix Factorization: Simple rules for growing neural nets,
Behnam Neyshabur and Rina Panigrahy.
arXiv preprint, 2014.
[arXiv:1311.3315] [slides]

The Power of Asymmetry in Binary Hashing,
Behnam Neyshabur, Payman Yadollahpour, Yury Makarychev, Ruslan Salakhutdinov, Nathan Srebro.
Neural Information Processing Systems (NeurIPS), 2013.
[arXiv:1311.7662] [NeurIPS poster] [slides] [code]

NETAL: a new graph-based method for global alignment of protein-protein interaction networks,
Behnam Neyshabur, Ahmadreza Khadem, Somaye Hashemifar, Seyed Shahriar Arab.
Bioinformatics, 29(13): 1654-1662 (2013).
[link] [server] [code]

Other Activities