Research
I'm interested in computer vision, neural architecture search, generative models, and 3D vision. My previous research mainly focused on automatically designing efficient and effective neural networks, while I am now focusing on generative models. Representative papers are highlighted.
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen,
Iro Laina,
Andrea Vedaldi,
ECCV, 2024
arXiv /
bibtex /
project page /
code
We introduce Direct Gaussian Editor (DGE), a novel method for fast 3D editing. We cast 3D editing as a two-stage process: the first stage achieves multi-view consistent 2D editing, and the second stage performs precise 3D fitting.
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen,
Junyu Xie,
Iro Laina,
Andrea Vedaldi,
CVPR, 2024
arXiv /
bibtex /
project page /
code /
demo
We present SHAP-EDITOR, a method for fast 3D editing that learns a universal editing function applicable to different objects within one second.
Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen,
Iro Laina,
Andrea Vedaldi,
WACV, 2024
arXiv /
bibtex /
project page /
code /
demo
We present a training-free method for controlling the layout of images generated by large pre-trained text-to-image models by guiding their cross-attention patterns.
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni,
Houwen Peng,
Minghao Chen,
Songyang Zhang,
Gaofeng Meng,
Jianlong Fu,
Shiming Xiang,
Haibin Ling,
ECCV, 2022 (Oral Presentation)
arXiv /
bibtex /
code
A new framework adapting language-image foundation models to general video recognition.
Searching the Search Space of Vision Transformer
Minghao Chen,
Kan Wu,
Bolin Ni,
Houwen Peng,
Bei Liu,
Jianlong Fu,
Hongyang Chao,
Haibin Ling,
NeurIPS, 2021
arXiv /
bibtex /
code
We propose to search for the optimal search space of vision transformers using the AutoFormer training strategy.
Rethinking and Improving Relative Position Encoding for Vision Transformer
Kan Wu,
Houwen Peng,
Minghao Chen,
Jianlong Fu,
Hongyang Chao,
ICCV, 2021
arXiv /
bibtex /
code
A new relative position encoding method dedicated to 2D images that models directional relative distances.
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen,
Houwen Peng,
Jianlong Fu,
Haibin Ling,
ICCV, 2021
arXiv /
bibtex /
code
A once-for-all, one-shot architecture search framework dedicated to searching vision transformer architectures.
One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
Minghao Chen,
Houwen Peng,
Jianlong Fu,
Haibin Ling,
CVPR, 2021
arXiv /
bibtex /
code
We present a novel one-shot neural architecture search method that finds optimal architectures for model ensembles.
Services
Reviewer
CVPR 2022, 2023, 2024; ECCV 2022; ICCV 2023; NeurIPS 2023, 2024; ICLR 2024; WACV 2024, 2025; 3DV 2024; ACM MM 2021, 2022
Teaching Assistant
- COMS 4246 Algorithms for Data Science, Columbia University, Department of Computer Science, Fall 2019
- COMS 4731 Computer Vision, Columbia University, Department of Computer Science, Fall 2019
This website template is borrowed from Jon Barron. Thanks!