DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model

1 Dept. of Electrical and Computer Engineering, 2 Interdisciplinary Program in Artificial Intelligence
Seoul National University, Korea
* These authors contributed equally to this work

3D NeRF models generated from (a), (b) the text "a hamburger" and (c), (d) a reference image along with the text "a yellow fire hydrant", using our DITTO-NeRF and the prior methods Stable-Dreamfusion and NeuralLift-360.

Abstract

The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or a text prompt. However, 3D objects reconstructed with state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and requiring long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline that generates a high-quality 3D NeRF model from a text prompt or a single image. DITTO-NeRF first constructs a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view, and then iteratively reconstructs the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially, outer-boundary (OB) angles later), and masks (object to background boundary) so that high-quality information from the IB can be propagated into the OB. DITTO-NeRF outperforms state-of-the-art methods in terms of fidelity and diversity, both qualitatively and quantitatively, with much faster training times than prior image-to-3D and text-to-3D methods such as Dreamfusion and NeuralLift-360.
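The progressive scheme described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' implementation: the function names, the linear interpolation, the 30-degree IB half-angle, and the 64 to 256 pixel resolution range are all assumptions chosen for illustration, and the mask progression (object to background boundary) is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    step: int
    resolution: int         # render resolution in pixels (hypothetical values)
    max_azimuth_deg: float  # half-width of the sampled azimuth range around the frontal view


def progressive_schedule(num_steps, ib_half_angle_deg=30.0, low_res=64, high_res=256):
    """Grow resolution and angular range linearly over training steps.

    Assumption: training starts inside the in-boundary (IB) range at low
    resolution and ends covering the full 360 degrees at high resolution.
    """
    stages = []
    for step in range(num_steps):
        t = step / max(num_steps - 1, 1)  # progress in [0, 1]
        resolution = int(round(low_res + t * (high_res - low_res)))
        max_azimuth = ib_half_angle_deg + t * (180.0 - ib_half_angle_deg)
        stages.append(Stage(step, resolution, max_azimuth))
    return stages


def is_in_boundary(azimuth_deg, ib_half_angle_deg=30.0):
    """Return True if a camera azimuth (0 = frontal view) lies in the IB range."""
    return abs(azimuth_deg) <= ib_half_angle_deg


if __name__ == "__main__":
    for stage in progressive_schedule(5):
        print(stage)
```

In a sketch like this, views sampled within the IB range would be supervised by the partial 3D object built from the frontal image, while views outside it would rely on the inpainting diffusion model; how the actual method schedules these quantities is not specified on this page.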

Video

Convert a single image into 3D output

Cabin

Reference image

Diet-NeRF

NeuralLift-360

Ours (DITTO-NeRF)

Apple

Reference image

Diet-NeRF

NeuralLift-360

Ours (DITTO-NeRF)

Convert a text prompt into 3D output

Suitcase

Stable-Dreamfusion

Latent-NeRF

Ours (DITTO-NeRF)

Astronaut

Stable-Dreamfusion

Latent-NeRF

Ours (DITTO-NeRF)

Additional results