About
The project supports research on image editing through requests expressed in natural language. Until now, image editing has required some degree of mastery of image editing tools. Letting users edit an image simply by describing the desired alteration in unconstrained natural language will open image editing to many more users, and it is an interesting research challenge in its own right.
The Challenge
The main challenge has two parts: first, to understand what is being requested; and second, to perform the requested edit without altering anything else in the image.
The Solution
The solution combines existing systems in a novel, modular way. The natural language request is interpreted by a generative large language model, while the edited image is generated by an image diffusion model. A distinguishing feature of our approach is that no model training is necessary: the underlying systems are used as-is, so the results will improve as better versions of these systems are developed.
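The modular design described above can be illustrated with a minimal sketch. It is not the project's actual code: `EditSpec`, `edit_image`, `toy_interpreter` and `toy_editor` are hypothetical names introduced here, and the two toy functions are simple stand-ins for a pretrained large language model and a pretrained image diffusion model. The point is only that each stage sits behind a plain function interface, so either model could be swapped for a better one without any training.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in image type for this sketch: a grid of pixel values.
Image = List[List[int]]


@dataclass(frozen=True)
class EditSpec:
    """Hypothetical structured form of an edit request."""
    target: str  # the part of the image to change
    change: str  # the change to apply to that part


# Each stage is just a callable, so any pretrained model can be plugged in.
Interpreter = Callable[[str], EditSpec]
Editor = Callable[[Image, EditSpec], Image]


def edit_image(image: Image, request: str,
               interpret: Interpreter, apply_edit: Editor) -> Image:
    """Run the two stages in sequence: understand the request, then edit."""
    spec = interpret(request)
    return apply_edit(image, spec)


def toy_interpreter(request: str) -> EditSpec:
    # Assumption: stands in for a call to a pretrained language model.
    # It only splits a request written as "<target> -> <change>".
    target, _, change = request.partition("->")
    return EditSpec(target=target.strip(), change=change.strip())


def toy_editor(image: Image, spec: EditSpec) -> Image:
    # Assumption: stands in for a pretrained diffusion model.
    # It returns an unmodified copy, leaving the input image untouched.
    return [row[:] for row in image]


if __name__ == "__main__":
    original = [[0, 1], [2, 3]]
    edited = edit_image(original, "sky -> make it sunset orange",
                        toy_interpreter, toy_editor)
    print(toy_interpreter("sky -> make it sunset orange"))
    print(edited)
```

Because `edit_image` depends only on the two callables, replacing either stand-in with a wrapper around a newer model would leave the rest of the pipeline unchanged.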
Services Provided
INCD provided computation resources under the project “Language Driven Image Design with Diffusion”, funded by FCT (2022.15880.CPCA.A1).
Impact
The INCD resources were used to run some of the experiments reported in: R. Santos, J. Silva, and A. Branco, 2024, “Leveraging LLMs for On-the-fly Instruction Guided Image Editing”, in Proceedings of EPIA 2024 (International Conference on Artificial Intelligence).
Partners Involved
- PORTULAN CLARIN – Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT (PINFRA/22117/2016)
- ACCELERAT.AI – Multilingual Intelligent Contact Centers, funded by IAPMEI (C625734525-00462629)
- IMPROMPT – Image Alteration with Language Prompts, funded by FCT (CPCA-IAC/AV/590897/2023)