Research
I'm interested in computer vision, neural architecture search, generative models, and 3D vision. My previous research focused mainly on automatically designing efficient and effective neural networks, while I am now focusing on generative models. Representative papers are highlighted.
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
Minghao Chen,
Roman Shapovalov,
Tom Monnier,
Jianyuan Wang,
David Novotny,
Iro Laina,
Andrea Vedaldi,
arXiv, 2024
arXiv /
bibtex /
project page /
video
We introduce PartGen, a novel method for compositional, part-level 3D generation and reconstruction from various modalities, including text, images, and 3D models. PartGen also enables 3D part editing.
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen,
Iro Laina,
Andrea Vedaldi,
ECCV, 2024
arXiv /
bibtex /
project page /
code
We introduce Direct Gaussian Editor (DGE), a novel method for fast 3D editing. We treat 3D editing as a two-stage process: the first stage achieves multi-view consistent 2D editing, and the second stage performs precise 3D fitting.
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen,
Junyu Xie,
Iro Laina,
Andrea Vedaldi,
CVPR, 2024
arXiv /
bibtex /
project page /
code /
demo
We present SHAP-EDITOR, a method for fast 3D editing. We propose to learn a universal editing function that can be applied to different objects within one second.
Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen,
Iro Laina,
Andrea Vedaldi,
WACV, 2024  
arXiv /
bibtex /
project page /
code /
demo
We present a method for controlling the layout of images generated by large pre-trained text-to-image models by guiding the cross-attention patterns.
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni,
Houwen Peng,
Minghao Chen,
Songyang Zhang,
Gaofeng Meng,
Jianlong Fu,
Shiming Xiang,
Haibin Ling,
ECCV, 2022   (Oral Presentation)
arXiv /
bibtex /
code
A new framework that adapts language-image foundation models to general video recognition.
Searching the Search Space of Vision Transformer
Minghao Chen,
Kan Wu,
Bolin Ni,
Houwen Peng,
Bei Liu,
Jianlong Fu,
Hongyang Chao,
Haibin Ling,
NeurIPS, 2021
arXiv /
bibtex /
code
We propose to search for the optimal search space of vision transformer models using the AutoFormer training strategy.
Rethinking and Improving Relative Position Encoding for Vision Transformer
Kan Wu,
Houwen Peng,
Minghao Chen,
Jianlong Fu,
Hongyang Chao,
ICCV, 2021
arXiv /
bibtex /
code
A new relative position encoding method dedicated to 2D images, considering directional relative distance modeling.
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen,
Houwen Peng,
Jianlong Fu,
Haibin Ling,
ICCV, 2021
arXiv /
bibtex /
code
A once-for-all, one-shot architecture search framework dedicated to vision transformer search.
One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
Minghao Chen,
Houwen Peng,
Jianlong Fu,
Haibin Ling,
CVPR, 2021
arXiv /
bibtex /
code
We present a novel one-shot neural architecture search method that finds optimal architectures for model ensembles.
Services
Reviewer
CVPR 2022, 2023, 2024, 2025; ECCV 2022, 2024; ICCV 2023; NeurIPS 2023, 2024, 2025; ICML 2025; ICLR 2024, 2025; WACV 2024, 2025; 3DV 2024; ACM MM 2021, 2022
Teaching Assistant
- COMS 4246 Algorithms for Data Science, Columbia University, Department of Computer Science, Fall 2019
- COMS 4731 Computer Vision, Columbia University, Department of Computer Science, Fall 2019
This website template is borrowed from Jon Barron. Thanks!