Shuxin Zheng


Deputy Director of Zhongguancun Institute of Artificial Intelligence

Research Interests

General artificial intelligence · Generative AI · Scientific intelligence
20 Publications · 4 Invited Talks

About

Dr. Shuxin Zheng graduated from the MSRA-USTC Joint PhD Program. He currently serves as Deputy Director of the Zhongguancun Institute of Artificial Intelligence, Director of the Strategic Planning Office at the Beijing Zhongguancun Academy, Enterprise Mentor at the Chinese Academy of Sciences, and Associate Editor of the journal AI for Science. Previously, he was a Principal Researcher at Microsoft Research and Head of Microsoft Scientific Foundation Models. He has won multiple world championships in international AI competitions and trained the largest scientific foundation model to date. Dr. Zheng has published over 20 first-author and corresponding-author papers in Nature family journals and at top international AI conferences, with more than 5,000 citations. He serves as a visiting lecturer at Tsinghua University, the Chinese Academy of Sciences, and the Microsoft AI Academy, where he has long taught courses such as "Fundamentals of Machine Learning Methods and Applications" and "Advanced Machine Learning."

Research Activities

Distributional Graphormer: beyond AlphaFold2

Graphormer: a general-purpose backbone for graph data

AI for Sustainability, Microsoft Research Summit

Modeling Lost Information in Signal Processing

Education

PhD in Computer Science

University of Science and Technology of China (USTC) & Microsoft Research Asia

2018

BSc in Computer Science

University of Science and Technology of China (USTC)

2014

Teaching Experience

Advanced Machine Learning

Graduate course · Institute of Computing Technology, Chinese Academy of Sciences · Spring 2021

Foundations of Machine Learning

Undergraduate general education course · Tsinghua University · Fall 2021

Advanced Machine Learning

Graduate course · Department of EE, Tsinghua University · Spring 2021

Advanced Machine Learning

In AI School · Microsoft · Fall 2020

Advanced Machine Learning

Graduate course · Department of EE, Tsinghua University · Fall 2019

Publications

Predicting equilibrium distributions for molecular systems with deep learning

Nature Machine Intelligence · 2024 · 140 citations

Do Transformers Really Perform Badly for Graph Representation?

Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) · 2021 · 1760 citations

On layer normalization in the transformer architecture

Proceedings of the 37th International Conference on Machine Learning (ICML) · 2020 · 1360 citations

Asynchronous stochastic gradient descent with delay compensation

Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR · 2017 · 406 citations

Invertible Image Rescaling

European Conference on Computer Vision (ECCV) · 2020 · 309 citations

Cross-Iteration Batch Normalization

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) · 2020 · 154 citations

One transformer can understand both 2d & 3d molecular data

International Conference on Learning Representations (ICLR) · 2022 · 134 citations

Deep learning for prediction of the air quality response to emission changes

Environmental Science & Technology · 2020 · 116 citations

Benchmarking graphormer on large-scale molecular modeling datasets

arXiv preprint · 2022 · 88 citations

How could Neural Networks understand Programs?

Proceedings of the International Conference on Machine Learning (ICML) · 2021 · 75 citations

Molecule generation for target protein binding with structural motifs

The Eleventh International Conference on Learning Representations (ICLR) · 2023 · 72 citations

The impact of large language models on scientific discovery: a preliminary study using gpt-4

arXiv preprint · 2023 · 70 citations

Your transformer may not be as powerful as you expect

Advances in Neural Information Processing Systems (NeurIPS) · 2022 · 65 citations

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Advances in Neural Information Processing Systems (NeurIPS) · 2021 · 60 citations

Scalable emulation of protein equilibrium ensembles with generative deep learning

Science · 2025 · 41 citations

π-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

Proceedings of the 7th International Conference on Learning Representations (ICLR) · 2018 · 40 citations

Invertible rescaling network and its extensions

International Journal of Computer Vision · 2023 · 36 citations

Quantized training of gradient boosting decision trees

Advances in Neural Information Processing Systems (NeurIPS) · 2022 · 33 citations

Modeling Lost Information in Lossy Image Compression

arXiv preprint · 2020 · 32 citations

Capacity control of relu neural networks by basis-path norm

Proceedings of the 33rd AAAI Conference on Artificial Intelligence · 2018 · 27 citations