My primary research explores "geometric primitives" for enhancing language model reasoning, improving neural architectures, discovering new scaling paradigms, and grounding generative vision models with physics.
I advocate "full-stack" ML development: I am fascinated by algorithms that bring together mathematical theory, hardware realization, and real-world scientific applications across modalities.
-
Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
International Conference on Machine Learning (ICML), 2025
We enhance the algorithmic reasoning ability of large language models by contextualizing positional encodings under equivariance constraints (a toy illustration of the equivariance idea appears below).
[Paper]
[Code]
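As a generic illustration of what an equivariance constraint on positional encoding buys (a toy sketch, not the paper's construction): with rotary-style encodings, attention scores depend only on relative position, so they are invariant under a global shift of all positions.

    import numpy as np

    def rotate(v, angle):
        # Rotary-style positional encoding: rotate a 2-D feature by angle.
        c, s = np.cos(angle), np.sin(angle)
        return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

    rng = np.random.default_rng(0)
    q, k = rng.normal(size=2), rng.normal(size=2)
    theta = 0.5  # rotation per position

    def score(m, n):
        # Attention score between a query at position m and a key at position n.
        return rotate(q, m * theta) @ rotate(k, n * theta)

    # Shifting both positions by the same offset leaves the score unchanged,
    # so scores depend only on the relative position n - m.
    assert np.allclose(score(3, 7), score(13, 17))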

-
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
International Conference on Neuro-symbolic Systems (NeuS), 2025, Disruptive Idea Award
We theoretically demonstrate that symbolic structures are inherent in the neural weight space and that gradient descent under geometric constraints can find them (an illustrative fact in this spirit appears below).
[Paper]
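To give a flavor of how geometric constraints interact with gradient-based training (a standard fact offered as illustration, not the paper's actual argument): if the loss $L$ is invariant under a group $G$ acting orthogonally on weight space, then the gradient is equivariant,
$$\nabla L(g \cdot w) = g \cdot \nabla L(w),$$
so for any $w$ in the fixed-point set $\mathrm{Fix}(G) = \{\, w : g \cdot w = w \ \text{for all } g \in G \,\}$ we get $g \cdot \nabla L(w) = \nabla L(w)$, i.e. $\nabla L(w) \in \mathrm{Fix}(G)$. Hence the gradient flow $\dot{w} = -\nabla L(w)$ never leaves $\mathrm{Fix}(G)$: structured subspaces of weight space are invariant under training.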
-
Polynomial Width is Sufficient for Set Representation with High-dimensional Features
International Conference on Learning Representations (ICLR), 2024
We theoretically show that polynomially many neurons suffice for set representation with the DeepSets architecture (a minimal DeepSets sketch appears below).
[Paper]
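For context, DeepSets represents a set by sum-pooling per-element embeddings, f(S) = rho(sum over x in S of phi(x)); the paper's question is how wide phi must be. A minimal sketch of the architecture (illustrative only; the widths here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(16, 4))  # phi: per-element embedding, width 16
    W2 = rng.normal(size=(1, 16))  # rho: decoder applied after pooling

    def deepsets(X):
        # X has one row per set element; the output is permutation-invariant.
        phi = np.maximum(W1 @ X.T, 0.0)   # embed each element (one ReLU layer)
        pooled = phi.sum(axis=1)          # sum-pooling erases element order
        return W2 @ pooled

    X = rng.normal(size=(5, 4))           # a set of 5 elements, dim-4 features
    assert np.allclose(deepsets(X), deepsets(X[::-1]))  # order-independent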
-
Learning to Grow Pretrained Models for Efficient Transformer Training
International Conference on Learning Representations (ICLR), 2023, Spotlight
This paper accelerates transformer training by reusing pretrained models through a learnable, linear, and sparse model growth operator (a toy growth operator is sketched below).
[MIT News]
[Paper]
[Code]
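To give the flavor of a linear growth operator (a toy, fixed Net2Net-style width expansion; the paper instead learns a sparse linear map from the small model's weights to the large model's): duplicating output neurons and splitting the fan-in of duplicated inputs widens a layer while preserving its function.

    import numpy as np

    def grow_width(W_small, expand):
        # Linearly map a small weight matrix to a wider one: duplicate output
        # neurons and split the fan-in of duplicated input features, so the
        # wider layer computes the same function at initialization.
        out_dim, in_dim = W_small.shape
        E_out = np.repeat(np.eye(out_dim), expand, axis=0)         # copy rows
        E_in = np.repeat(np.eye(in_dim), expand, axis=1) / expand  # split cols
        return E_out @ W_small @ E_in

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))
    W_big = grow_width(W, 2)        # a (6, 8) layer initialized from (3, 4)
    x = rng.normal(size=4)
    x_big = np.repeat(x, 2)         # input with each feature duplicated
    assert np.allclose(np.repeat(W @ x, 2), W_big @ x_big)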

-
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
International Conference on Learning Representations (ICLR), 2022
We prove that self-attention acts as no more than a low-pass filter and propose two simple yet effective methods to counteract the resulting over-smoothing (a numerical illustration appears below).
[Paper]
[Code]
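A quick numerical illustration of the low-pass claim (under simplifying assumptions: one fixed random softmax attention matrix, no skip connections or MLP blocks): a positive row-stochastic attention matrix has the all-ones vector as its dominant eigenvector, so repeated application suppresses every non-constant (high-frequency) component of the token features.

    import numpy as np

    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 8))
    A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax rows
    X = rng.normal(size=(8, 4))                                     # token features

    def high_freq_energy(X):
        # Norm of the non-constant part: features minus their mean over tokens.
        return np.linalg.norm(X - X.mean(axis=0, keepdims=True))

    for step in range(6):
        print(step, round(high_freq_energy(X), 4))  # shrinks toward 0
        X = A @ X                                   # one round of pure attention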

Keep thy heart with all diligence; for out of it are the issues of life.
My journey in computer programming began at age ten, when my father introduced me to Scratch.
During high school, I spent my leisure time developing myBase 7, a piece of database management software.
In college, I was fortunate to work with Prof. Jingyi Yu on 3D vision and computational imaging, and to study algebra with Prof. Manolis C. Tsakiris.
One summer, I also worked with Prof. Jianbo Shi on graph learning and spectral graph theory.
Starting out in computer science is never easy, and I am deeply grateful to those who had a profound impact on the early stages of my journey and guided me, as a novice, through the landscape of academia.