Google’s artificial intelligence (AI) photo editing tool lets users describe desired modifications in natural language.
Artificial intelligence and machine learning have long been focal points of Google’s endeavors, a commitment underscored by the keynote address at I/O 2023. Among the most captivating applications of this technology is image creation, and Google’s efforts have materialized in the form of Imagen—a text-to-image generation tool similar to Midjourney and DALL-E 2. Now, Google is presenting its research on Imagen Editor, a tool that allows localized photo editing through textual prompts and basic sketches.
Google’s Imagen utility is already proficient at generating images entirely from textual prompts. However, if the results do not meet your expectations, you typically have to restructure your prompt, refine it, and make another attempt with the image generator. This is because Imagen does not currently support editing specific elements of a generated image that you may be dissatisfied with. To address this limitation, Google has recently shared research on Imagen Editor and EditBench. Although both are still research prototypes, Imagen Editor can guide edits using text prompts, while EditBench benchmarks the quality of those edits.
Rather than creating new images from prompts, Imagen Editor takes a photo to be edited, a text prompt defining the desired changes, and a masked region indicating where the edits should be applied. As a result, the edits are confined to the defined region and tailored to the provided prompt, and the results are exceptionally realistic and natural.
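The mask-constrained workflow above can be sketched in a few lines of NumPy. This is purely illustrative: the `edited` array below is a stand-in for the diffusion model's output (which is not public), and `apply_masked_edit` is a hypothetical helper showing only the compositing step that keeps changes inside the masked region.

```python
import numpy as np

def apply_masked_edit(image, mask, edited):
    """Composite the model's edited output into the original image,
    but only inside the masked region; pixels outside the mask
    are left untouched."""
    mask3 = mask[..., np.newaxis]  # broadcast the 2-D mask over RGB channels
    return np.where(mask3, edited, image)

# Toy 4x4 RGB image, a 2x2 masked region, and a placeholder "edit".
image = np.zeros((4, 4, 3), dtype=np.uint8)          # original photo
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                                # region the user marked
edited = np.full((4, 4, 3), 255, dtype=np.uint8)     # stand-in model output

result = apply_masked_edit(image, mask, edited)
```

In the real system the model also conditions its generation on the text prompt and the surrounding pixels; this snippet only shows why edits cannot leak outside the user-drawn region.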
Example masked regions and results from Imagen Editor include a bouquet of red flowers, two trees, an “Imagen Editor” sign, a bush with green leaves, and a bush without leaves.
Technically referred to as inpainting, the process employed by Google’s new tool is akin to image restoration—in simpler terms, Google AI crossed with Adobe Photoshop’s Content-Aware Fill. The researchers have developed new encoders for Imagen Editor and have also incorporated an object detection module into the AI to compensate for incomplete or inaccurate masks.
The research also encompasses a tool called EditBench, which evaluates the results of text-guided inpainting. This benchmark, based on a dataset of 240 images, assesses edits made to both natural and AI-generated images using parameters such as the modified objects, their attributes (shape, size, number), and their suitability for the scene. Google observed that object masking enhances image-text alignment, making Imagen Editor superior to alternatives such as DALL-E 2 and Stable Diffusion in all categories assessed by EditBench.
Regrettably, Google has cited concerns about the responsible use of AI as its reason for not releasing Imagen Editor to the general public. The company recently proposed a framework to ensure the responsible development of AI, and it hopes to establish certain limits before granting people access to tools like Imagen Editor. On a positive note, EditBench is available in its entirety, free of charge, to aid further AI research. Meanwhile, there is optimism that the base model, Imagen, will soon be integrated into Gboard.