AutoPartGen: Autoregressive 3D Part Generation and Discovery

Minghao Chen¹,², Jianyuan Wang¹,², Roman Shapovalov², Tom Monnier²,
Hyunyoung Jung², Dilin Wang², Rakesh Ranjan², Iro Laina¹, Andrea Vedaldi¹,²

¹Visual Geometry Group, University of Oxford ²Meta AI

tl;dr: We introduce AutoPartGen, a pipeline that generates compositional 3D objects in an autoregressive manner. AutoPartGen can operate automatically or guided by 2D masks. We also show that AutoPartGen can be applied to scene and city generation.

Abstract

We introduce AutoPartGen, a model that generates objects composed of 3D parts in an autoregressive manner. This model can take as input an image of an object, 2D masks of the object's parts, or an existing 3D object, and generate a corresponding compositional 3D reconstruction. Our approach builds upon 3DShape2VecSet, a recent latent 3D representation with powerful geometric expressiveness. We observe that this latent space exhibits strong compositional properties, making it particularly well-suited for part-based generation tasks. Specifically, AutoPartGen generates object parts autoregressively, predicting one part at a time while conditioning on previously generated parts and additional inputs, such as 2D images, masks, or 3D objects. This process continues until the model decides that all parts have been generated, thus determining automatically the type and number of parts. The resulting parts can be seamlessly assembled into coherent objects or scenes without requiring additional optimization. We evaluate both the overall 3D generation capabilities and the part-level generation quality of AutoPartGen, demonstrating that it achieves state-of-the-art performance in 3D part generation.

Overview

AutoPartGen generates parts autoregressively. At each step, a 3D latent diffusion model generates the next part, conditioned on the previously generated parts \( z^{(1,\dots,k)} \), the overall object \( \tilde z \), and, optionally, an image \(I\) of the object and an image \( J^{(k)} \) of the part. The latent representation is 3DShape2VecSet, and the diffusion model is a Diffusion Transformer (DiT).
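The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Conditioning` container, the `generate_parts` function, the `Sampler` interface, the `max_parts` cap, and the use of `None` as the stop signal are all hypothetical names and choices introduced here, and `toy_sampler` is a stand-in for the latent diffusion model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Stand-in for a latent code; a real 3DShape2VecSet latent is a set of vectors.
Latent = List[float]


@dataclass
class Conditioning:
    object_latent: Latent               # \tilde z: the overall object
    image: Optional[object] = None      # I: optional image of the object
    part_images: Sequence[object] = ()  # J^(k): optional per-part images or masks


# Hypothetical sampler interface: returns the next part latent, or None to stop.
Sampler = Callable[[List[Latent], Conditioning, Optional[object]], Optional[Latent]]


def generate_parts(sample_next_part: Sampler,
                   cond: Conditioning,
                   max_parts: int = 32) -> List[Latent]:
    """Generate parts one at a time until the sampler signals completion."""
    parts: List[Latent] = []
    for k in range(max_parts):
        # Use the k-th part image if one was supplied, otherwise no part guidance.
        part_image = cond.part_images[k] if k < len(cond.part_images) else None
        # Each step sees every part generated so far plus the shared conditioning.
        next_part = sample_next_part(parts, cond, part_image)
        if next_part is None:  # assumed stop signal: all parts have been generated
            break
        parts.append(next_part)
    return parts


def toy_sampler(previous, cond, part_image):
    """Toy stand-in for the diffusion model: emits three parts, then stops."""
    if len(previous) >= 3:
        return None
    return [cond.object_latent[0] + len(previous)]


parts = generate_parts(toy_sampler, Conditioning(object_latent=[0.0]))
print(len(parts))  # 3
```

The number of parts is not passed in anywhere: the loop ends when the sampler decides to stop, which mirrors how the model determines the part count on its own. The `max_parts` cap is only a safety bound added for this sketch.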

City Generation

Select a city from the gallery below to view its combined and exploded meshes.

Instructions: Click on any image to load the corresponding mesh pair.
Explode: Use the explode slider to separate parts in the right mesh.
Autoregressive part generation: Use the autoregressive slider to control the generation process of the parts.
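For readers unfamiliar with exploded views, the sketch below shows one common way to compute them: each part is translated away from the object's overall centroid in proportion to a slider value. This is an assumption about how such a viewer might work, not a description of this page's implementation. The `centroid` and `explode` functions and the `amount` parameter are hypothetical, and every part is assumed to contain at least one point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def explode(parts: Sequence[Sequence[Point]], amount: float) -> List[List[Point]]:
    """Shift each part away from the object centroid; amount=0 leaves parts in place."""
    center = centroid([p for part in parts for p in part])
    exploded = []
    for part in parts:
        c = centroid(part)
        # Offset grows with the part's distance from the object center.
        offset = tuple(amount * (c[i] - center[i]) for i in range(3))
        exploded.append([tuple(p[i] + offset[i] for i in range(3)) for p in part])
    return exploded


two_parts = [[(-1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
print(explode(two_parts, 1.0))  # [[(-2.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
```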

Scene Generation

Select a scene from the gallery below to view its combined and exploded meshes.

Image to Parts

Select an image from the gallery below to compare combined and exploded meshes.

Object to Parts

Select an object from the gallery below to compare combined and exploded meshes.

Mask to Parts

Select an image and mask pair from the gallery below to compare combined and exploded meshes.

BibTeX

@article{chen2025autopartgen,
  title={AutoPartGen: Autoregressive 3D Part Generation and Discovery},
  author={Minghao Chen and Jianyuan Wang and Roman Shapovalov and Tom Monnier and Hyunyoung Jung and Dilin Wang and Rakesh Ranjan and Iro Laina and Andrea Vedaldi},
  journal={arXiv preprint arXiv:2507.13346},
  year={2025}
}