Publications
Here are my publications (including preprints).
2024
- Hayeon Kim, Gwanghyun Kim, Hoigi Seo, Dong Un Kang, and 1 more author. In European Conference on Computer Vision (ECCV), 2024.
Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often yielded human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes prior limitations, generating exquisite higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness using existing pretrained diffusion models. BeyondScene employs a staged and hierarchical approach to initially generate a detailed base image focusing on crucial elements in instance creation for multiple humans and detailed descriptions beyond token limit of diffusion model, and then to seamlessly convert the base image to a higher-resolution output, exceeding training image size and incorporating details aware of text and instances via our novel instance-aware hierarchical enlargement process that consists of our proposed high-frequency injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in terms of correspondence with detailed text descriptions and naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models without costly retraining.
- Hayeon Kim, Dongwon Park, and Se Young Chun. In European Conference on Computer Vision (ECCV), 2024.
Recently, pre-trained model and efficient parameter tuning have achieved remarkable success in natural language processing and high-level computer vision with the aid of masked modeling and prompt tuning. In low-level computer vision, however, there have been limited investigations on pre-trained models and even efficient fine-tuning strategy has not yet been explored despite its importance and benefit in various real-world tasks such as alleviating memory inflation issue when integrating new tasks on AI edge devices. Here, we propose a novel efficient parameter tuning approach dubbed contribution-based low-rank adaptation (CoLoRA) for multiple image restorations along with effective pre-training method with random order degradations (PROD). Unlike prior arts that tune all network parameters, our CoLoRA effectively fine-tunes small amount of parameters by leveraging LoRA (low-rank adaptation) for each new vision task with our contribution-based method to adaptively determine layer by layer capacity for that task to yield comparable performance to full tuning. Furthermore, our PROD strategy allows to extend the capability of pre-trained models with improved performance as well as robustness to bridge synthetic pre-training and real-world fine-tuning. Our CoLoRA with PROD has demonstrated its superior performance in various image restoration tasks across diverse degradation types on both synthetic and real-world datasets for known and novel tasks.
2023
- Hayeon Kim, Hoigi Seo, Gwanghyun Kim, and Se Young Chun. arXiv, 2023.
The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or from a text prompt. However, the 3D objects reconstructed using state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited, yielding 3D samples with low diversity per prompt and long synthesis times. To address these challenges, we propose DITTO-NERF, a novel pipeline to generate a high-quality 3D NeRF model from a text prompt or a single image. Our DITTO-NERF consists of constructing a high-quality partial 3D object for limited in-boundary (IB) angles using the given or text-generated 2D image from the frontal view and then iteratively reconstructing the remaining 3D NeRF using an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scales (low to high resolution), angles (IB angles initially to outer-boundary (OB) later) and masks (object to background boundary) in our DITTO-NERF so that high-quality information on IB can be propagated into OB. Our DITTO-NERF outperforms state-of-the-art methods in terms of fidelity and diversity qualitatively and quantitatively with much faster training times than prior arts on image / text-to-3D such as Dreamfusion and NeuralLift-360.