Minghao Chen

Hi, I am currently a third-year DPhil student at the Visual Geometry Group, Oxford, advised by Prof. Andrea Vedaldi and Dr. Iro Laina.

This summer, I am doing a research scientist internship at Meta GenAI in London!

Before that, I was a Ph.D. student at Stony Brook University, supervised by Prof. Haibin Ling, from 2020 to 2022. I interned at MSR Asia, working with Dr. Houwen Peng, from 2020 to 2021. I received my M.S. from Columbia University in 2020 and my B.S. from Beihang University in 2018.

I am always open to new opportunities and collaborations. Feel free to contact me!

Email / Google Scholar / GitHub

Research

I'm interested in computer vision, neural architecture search, generative models, and 3D vision. My previous research was mainly about automatically designing efficient and effective neural networks, while I am now focusing on generative models. Representative papers are highlighted.

DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen, Iro Laina, Andrea Vedaldi
ECCV, 2024
arXiv / bibtex / project page / code

We introduce Direct Gaussian Editor (DGE), a novel method for fast 3D editing. We treat 3D editing as a two-stage process: the first stage focuses on achieving multi-view consistent 2D editing, and the second stage is dedicated to precise 3D fitting.

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi
CVPR, 2024
arXiv / bibtex / project page / code / demo

We present SHAP-EDITOR, a method for fast 3D editing. We propose to learn a universal editing function that can be applied to different objects within one second.

Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen, Iro Laina, Andrea Vedaldi
WACV, 2024
arXiv / bibtex / project page / code / demo

We present a method for controlling the layout of images generated by large pre-trained text-to-image models by guiding the cross-attention patterns.

Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling
ECCV, 2022 (Oral Presentation)
arXiv / bibtex / code

A new framework adapting language-image foundation models to general video recognition.

Searching the Search Space of Vision Transformer
Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling
NeurIPS, 2021
arXiv / bibtex / code

We propose to search for the optimal search space of vision transformer models using the AutoFormer training strategy.

Rethinking and Improving Relative Position Encoding for Vision Transformer
Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao
ICCV, 2021
arXiv / bibtex / code

New relative position encoding methods dedicated to 2D images, which take directional relative distance modeling into account.

AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
ICCV, 2021
arXiv / bibtex / code

A once-for-all, one-shot architecture search framework dedicated to vision transformers.

One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
CVPR, 2021
arXiv / bibtex / code

We present a novel one-shot neural architecture search method that finds optimal architectures for model ensembles.

Services

Reviewer

CVPR 2022, 2023, 2024; ECCV 2022; ICCV 2023; NeurIPS 2023, 2024; ICLR 2024; WACV 2024, 2025; 3DV 2024; ACM MM 2021, 2022

Teaching Assistant

  • COMS 4246 Algorithms for Data Science, Columbia University, Department of Computer Science, Fall 2019
  • COMS 4731 Computer Vision, Columbia University, Department of Computer Science, Fall 2019

This website template is borrowed from Jon Barron. Thanks!