Yongming Rao

	NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo Yi Wei , Shaohui Liu, Yongming Rao, Wang Zhao, Jiwen Lu , Jie Zhou IEEE International Conference on Computer Vision (ICCV), 2021 Oral Presentation [arXiv] [Code] [Project page] [Video] We present a new multi-view depth estimation method that utilizes both conventional SfM reconstruction and learning-based priors over the recently proposed neural radiance fields (NeRF).
	Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification Yongming Rao, Guangyi Chen, Jiwen Lu , Jie Zhou IEEE International Conference on Computer Vision (ICCV), 2021 [arXiv] [Code] We propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process.
	Structure-Preserving Image Super-Resolution Cheng Ma , Yongming Rao, Jiwen Lu , Jie Zhou IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF: 24.31), 2021 [arXiv] [Code] We propose to learn a neural structure extractor unsupervisedly to extract structural patterns in images and use it to supervise SR models.
	Towards Interpretable Deep Metric Learning with Structural Matching Wenliang Zhao, Yongming Rao*, Ziyi Wang, Jiwen Lu , Jie Zhou IEEE International Conference on Computer Vision (ICCV)*, 2021 [arXiv] [Code] We present a deep interpretable metric learning (DIML) that adopts a structural matching strategy to explicitly aligns the spatial embeddings by computing an optimal matching flow between feature maps of the two images.
	Group-aware Contrastive Regression for Action Quality Assessment Xumin Yu, Yongming Rao*, Wenliang Zhao, Jiwen Lu , Jie Zhou IEEE International Conference on Computer Vision (ICCV)*, 2021 [arXiv] [Code] We propose a new contrastive regression (CoRe) framework to learn the relative scores by pair-wise comparison, which highlights the differences between videos and guides the models to learn the key hints for action quality assessment.
	PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds Yi Wei , Ziyi Wang, Yongming Rao, Jiwen Lu , Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2021 [arXiv] [Code] We present point-voxel correlation fields for 3D scene flow estimation which migrates the high performance of RAFT and provides a solution to build structured all-pairs correlation fields for unstructured point clouds.
	Multi-Proxy Wasserstein Classifier for Image Classification Benlin Liu , Yongming Rao, Jiwen Lu , Jie Zhou , Cho-Jui Hsieh AAAI Conference on Artificial Intelligence (AAAI), 2021 [PDF] We present a new Multi-Proxy Wasserstein Classifier to imporve the image classification models by calculating a non-uniform matching flow between the elements in the feature map of a sample and multiple proxies of a class using optimal transport theory.
	Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification? Guangyi Chen , Yongming Rao, Jiwen Lu , Jie Zhou European Conference on Computer Vision (ECCV), 2020 [PDF] We show temporal coherence plays a more critical role than temporal motion for video-based person re-identification and develop a Adversarial Feature Augmentation (AFA) to highlight temporal coherence.
	MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation Benlin Liu , Yongming Rao, Jiwen Lu , Jie Zhou , Cho-Jui Hsieh European Conference on Computer Vision (ECCV), 2020 [arXiv] We boost the performance of CNNs by learning soft targets for shallow layers via meta-learning.
	Structure-Preserving Super Resolution with Gradient Guidance Cheng Ma , Yongming Rao, Yean Cheng, Ce Chen, Jiwen Lu , Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020 [arXiv] [Code] We propose to leverage gradient information as an extra supervision signal to restore structures while generating natural SR images.
	Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation Cheng Ma , Zhenyu Jiang , Yongming Rao, Jiwen Lu , Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020 [arXiv] [Code] We propose a deep face super-resolution method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation respectively
	COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis Yansong Tang , Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu , Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019 [arXiv] [Project Page] [Annotation Tool] COIN is the largest and most comprehensive instructional video analysis dataset with rich annotations.
	Learning Discriminative Aggregation Network for Video-based Face Recognition and Person Re-identification Yongming Rao, Jiwen Lu , Jie Zhou International Journal of Computer Vision (IJCV, IF: 6.07), 2019 [PDF] [Code] We propose a discriminative aggregation network (DAN) method for video-based face recognition and person re-identification, which aims to integrate information from video frames for feature representation effectively and efficiently.
	Learning Globally Optimized Object Detector via Policy Gradient Yongming Rao, Dahua Lin , Jiwen Lu , Jie Zhou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018 Spotlight Presentation [PDF] [Supplement] We propose a simple yet effective method to learn globally optimized detector for object detection by directly optimizing mAP using the REINFORCE algorithm.
	Runtime Neural Pruning Ji Lin , Yongming Rao, Jiwen Lu , Jie Zhou Conference on Neural Information Processing Systems (NeurIPS), 2017 [PDF] [Code] We propose a Runtime Neural Pruning (RNP) framework which prunes the deep neural network dynamically at the runtime.
	Learning Discriminative Aggregation Network for Video-Based Face Recognition Yongming Rao, Ji Lin , Jiwen Lu , Jie Zhou IEEE International Conference on Computer Vision (ICCV), 2017 Spotlight Presentation [PDF] [Code] [Supplement] We propose a discriminative aggregation network (DAN) method for video face recognition, which aims to integrate information from video frames effectively and efficiently.
	Attention-aware Deep Reinforcement Learning for Video Face Recognition Yongming Rao, Jiwen Lu , Jie Zhou IEEE International Conference on Computer Vision (ICCV), 2017 [PDF] We propose an attention-aware deep reinforcement learning (ADRL) method for video face recognition, which aims to discard the misleading and confounding frames and find the focuses of attentions in face videos for person recognition.
	V-tree: Efficient KNN Search on Moving Objects with Road-Network Constraints Bilong Shen, Ying Zhao, Guoliang Li, Weimin Zheng, Yue Qin, Bo Yuan, Yongming Rao IEEE International Conference on Data Engineering (ICDE), 2017 [PDF] We propose a new tree structure for moving objects kNN search with road-network constraints, which can be used in many real-world applications like taxi search.