Graph Convolutional Networks
Graph convolutional networks (GCNs) have attracted considerable attention in representation learning, generalizing the power of CNNs to data organized in graph structures (e.g., citation networks, 3D meshes). We organize our research directions around the following challenges:
1) GCNs suffer from the over-smoothing problem. Hoang et al. attribute this to constant low-pass filtering. We further generalize this challenge to that of achieving full filter representation power, and we show that merely adopting second-order polynomial filters can overcome the filtering deficiency, in contrast to the elaborate techniques employed in GCNII (Chen et al.).
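A second-order polynomial filter of the kind mentioned above can be sketched in a few lines. This is a minimal NumPy illustration, not our actual model: the toy path graph, the normalized adjacency as the shift, and the coefficients `theta` (which would be trainable in practice) are all illustrative assumptions.

```python
import numpy as np

# Toy graph: 4-node path graph with self-loops added
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)                     # add self-loops
d = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(d, d))   # symmetric normalization

def poly_filter2(x, theta):
    """Second-order polynomial graph filter:
    y = theta0 * x + theta1 * (A_hat x) + theta2 * (A_hat^2 x).
    Unlike a single low-pass A_hat, the coefficients can realize
    band- or high-pass responses as well."""
    return theta[0] * x + theta[1] * (A_hat @ x) + theta[2] * (A_hat @ (A_hat @ x))

x = np.array([1.0, -1.0, 1.0, -1.0])        # a high-frequency toy signal
y = poly_filter2(x, [0.5, -1.0, 0.5])       # one hypothetical coefficient setting
```

With `theta = [1, 0, 0]` the filter reduces to the identity, illustrating that the polynomial family strictly contains (and extends) the constant low-pass case.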
2) GCNs fail to benefit from depth, even with versatile layers in the spectral domain. However, we managed to attain the opposite result on synthetic data with linear-only filtering. This phenomenon motivates us to study the role of nonlinearity in GCNs. At first glance, element-wise functions (e.g., ReLU) can be adopted directly; however, from the perspective of Graph Signal Processing (GSP), they are not shift-invariant. We are therefore motivated to search for a nonlinear activation better suited to GCNs, one that lets them benefit from shift-invariance.
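The shift-invariance failure can be checked directly: an operator is shift-invariant in the GSP sense only if it commutes with the graph shift S, i.e., sigma(S x) = S sigma(x) for all signals x. A minimal sketch, assuming an unweighted path-graph adjacency as the shift operator (the graph and signal are illustrative choices; note that sign mixing in S is what breaks commutativity here):

```python
import numpy as np

# Graph shift: adjacency of a 3-node path graph (mixes neighbor values)
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

relu = lambda v: np.maximum(v, 0.0)

x = np.array([3.0, -1.0, -1.0])

lhs = relu(S @ x)   # shift first, then apply ReLU -> [0, 2, 0]
rhs = S @ relu(x)   # apply ReLU first, then shift -> [0, 3, 0]
```

Since `lhs != rhs`, ReLU does not commute with this graph shift: the negative neighbor contributions are aggregated before clipping on one side and clipped before aggregation on the other.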
3) The selection of the graph shift operator remains an open question. The commonly applied graph Laplacian matrix plays a vital role in GSP and GCNs, as it characterizes the frequencies of graph signals. However, more algebraic perspectives (Deri et al.) regard it only as a shift function on the graph topology. We note that models such as GAT contain trainable aggregators, which correspond to shifts other than the Laplacian matrix. Hence, we naturally question whether the graph Laplacian is the right choice for extending the graph spectrum.
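The frequency-characterizing role of the Laplacian mentioned above can be illustrated concretely: its eigenvalues act as graph frequencies and its eigenvectors form the graph Fourier basis. A minimal sketch on a toy path graph (the graph and signal are illustrative):

```python
import numpy as np

# Combinatorial Laplacian L = D - A of a 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# L is symmetric, so eigh gives real frequencies in ascending order
# and an orthonormal graph Fourier basis U.
freqs, U = np.linalg.eigh(L)

# A perfectly smooth (constant) signal concentrates all of its
# spectral energy in the zero-frequency mode.
x_smooth = np.ones(4)
x_hat = U.T @ x_smooth        # graph Fourier transform
```

The smallest eigenvalue is 0 and the constant signal has nonzero energy only in that mode, matching the intuition that Laplacian eigenvalues measure signal variation over the graph.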
Neural Representation and Optimization
The multi-layer perceptron (MLP), a proven universal approximator, has the potential to encode any digital object, such as images, 3D surfaces/volumes, and human avatars, as an implicit function. This latent representation can empower reconstruction algorithms built on a differentiable forward model and stochastic gradient descent, with the Neural Radiance Field (NeRF) being a successful instance of this idea. Our research lies in the broader field of neural representation and reconstruction in computer graphics, computational imaging, and bioinformatics. Specifically, we explore applications in transmission electron microscopy (TEM), non-line-of-sight (NLOS) imaging, and depth from defocus.
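The core recipe above, encoding a signal as a coordinate-to-value MLP and reconstructing it by gradient descent on a differentiable forward model, can be sketched end to end. This toy NumPy example uses an identity forward model (direct observation of a 1D signal); the signal, network size, learning rate, and step count are all illustrative assumptions, not our actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Object" to encode: a 1D signal sampled on a coordinate grid.
coords = np.linspace(-1.0, 1.0, 64)[:, None]   # (64, 1) input coordinates
signal = np.sin(np.pi * coords)                # (64, 1) target values

# Tiny coordinate MLP f_theta: R -> R, one tanh hidden layer.
W1 = rng.normal(0.0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.3, (32, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    h = np.tanh(coords @ W1 + b1)              # forward pass
    pred = h @ W2 + b2                         # identity forward model
    err = pred - signal
    # Manual backprop for the mean-squared reconstruction loss
    g_pred = 2.0 * err / len(coords)
    gW2 = h.T @ g_pred;        gb2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h**2)
    gW1 = coords.T @ g_h;      gb1 = g_h.sum(axis=0)
    # Gradient descent updates
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float((err**2).mean())
```

Replacing the identity forward model with a differentiable renderer (NeRF) or a physics-based image-formation model (TEM, NLOS, defocus) turns the same loop into a reconstruction algorithm.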
Moreover, we are studying how to interact with implicit representations, imposing additional structural correlations and constraints to facilitate reconstruction. Since we regard the MLP as a universal interpolator, we are also conducting theoretical research on the mechanism behind its representation capacity.
3D Pose Estimation and Understanding
Complex poses, data annotation, and motion blur remain challenges for 3D pose estimation. We observe that current learning-based models suffer significantly from the domain gap in both human motions (e.g., competitive diving vs. walking) and surrounding scenes (e.g., indoor vs. in-the-wild). Specifically, a pose estimator trained on a dedicated dataset (e.g., Diving48) has advantages in both accuracy and efficiency; however, such datasets with high-quality 3D annotations for challenging poses (e.g., competitive sports) are usually hard to acquire. Exploiting synthetic data is a possible workaround, but it has not reached comparable performance. To this end, we intend to employ semi-supervised inverse rendering to extract pose and appearance representations separately, and to apply content-based domain adaptation to close the gap between virtual and real-captured data in these two representation domains, respectively. In this way, we can obtain large-scale datasets with realistic human images and automatically annotated ground truth.