-
Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
International Conference on Machine Learning (ICML), 2025
We enhance the addressing ability of large language models by contextualizing positional encodings with equivariance constraints.
[Paper]
[Code]

-
Why Neural Network Can Discover Symbolic Structures with Gradient-based Training: An Algebraic and Geometric Foundation for Neurosymbolic Reasoning
International Conference on Neuro-symbolic Systems (NeuS), 2025, Disruptive Idea Award
We theoretically demonstrate that symbolic structures are inherent in neural weight space, and that gradient descent under geometric constraints can find these solutions.
[Paper]
-
Polynomial Width is Sufficient for Set Representation with High-dimensional Features
International Conference on Learning Representations (ICLR), 2024
We theoretically show that polynomially many neurons suffice for set representation with the DeepSets architecture (toy sketch below).
[Paper]
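A minimal DeepSets sketch in NumPy, with hypothetical random weights and illustrative sizes: it shows the sum-pooled architecture f(X) = rho(sum_i phi(x_i)) and its permutation invariance, not the paper's width bound.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L, n = 4, 16, 8                 # feature dim, hidden width, set size (illustrative)
W_phi = rng.normal(size=(d, L))    # phi: per-element encoder (one linear + ReLU layer)
W_rho = rng.normal(size=(L, 1))    # rho: decoder applied to the pooled embedding

def deepsets(X):
    h = np.maximum(X @ W_phi, 0.0)   # phi applied to every set element
    pooled = h.sum(axis=0)           # permutation-invariant sum pooling
    return pooled @ W_rho            # rho on the pooled representation

X = rng.normal(size=(n, d))
perm = rng.permutation(n)
print(np.allclose(deepsets(X), deepsets(X[perm])))  # True: element order does not matter
```
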
-
Learning to Grow Pretrained Models for Efficient Transformer Training
International Conference on Learning Representations (ICLR), 2023, Spotlight
This paper proposes to accelerate transformer training by reusing pretrained models via a learnable, linear, and sparse model growth operator (toy sketch below).
[MIT News]
[Paper]
[Code]
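A hedged sketch of the growth idea, not the paper's exact operator: initialize a larger layer's weights as a linear map of a smaller pretrained layer's weights. The expansion matrices here are random placeholders, where the paper learns (sparse) ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d_small, d_large = 256, 512                      # hidden widths, illustrative
W_small = rng.normal(size=(d_small, d_small))    # one pretrained layer's weight matrix
# Width-expansion factors: in the paper's spirit these are learned and kept sparse;
# here they are random placeholders just to show the shapes.
A = rng.normal(size=(d_large, d_small)) / np.sqrt(d_small)
B = rng.normal(size=(d_large, d_small)) / np.sqrt(d_small)

W_large = A @ W_small @ B.T                      # grown initialization for the larger layer
print(W_large.shape)                             # (512, 512)
```
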

-
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
International Conference on Learning Representations (ICLR), 2022
We prove that self-attention is no more than a low-pass filter, and propose two simple yet effective methods to counteract excessive smoothing (toy sketch below).
[Paper]
[Code]
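A toy illustration of the low-pass claim, assuming a single fixed random attention matrix (the paper's analysis is carried out in the Fourier domain and is far more general): applying attention repeatedly shrinks each token's deviation from the mean token, i.e. the high-frequency component of the representations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16
X = rng.normal(size=(n, d))                      # token representations
scores = rng.normal(size=(n, n))                 # placeholder attention logits
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-stochastic attention

for layer in range(5):
    X = A @ X                                    # one attention mixing step
    hf = X - X.mean(axis=0, keepdims=True)       # high-frequency (non-DC) component
    print(layer, np.linalg.norm(hf))             # norm shrinks layer after layer
```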

Keep thy heart with all diligence; for out of it are the issues of life.
My journey in computer programming began at age ten, when my father introduced me to Scratch.
In high school, I spent my leisure time developing the database management software myBase 7.
In college, I was fortunate to work with Prof. Jingyi Yu on 3D vision and computational imaging,
and to study algebra under Prof. Manolis C. Tsakiris.
I also worked with Prof. Jianbo Shi on graph learning and spectral graph theory.
It is always hard to be a beginner; I would like to express my sincere gratitude to those who guided me through the novice village of academia.