Imagen 2 – Google DeepMind


Our most advanced text-to-image technology

Imagen 2 is our most advanced text-to-image diffusion technology, delivering high-quality, photorealistic output that is closely aligned and consistent with the user's prompt. It can generate more lifelike images by using the natural distribution of its training data, rather than adopting a pre-programmed style.

Imagen 2’s powerful text-to-image technology is available to developers and Cloud customers via the Imagen API in Google Cloud Vertex AI.
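As a rough illustration of what calling the Imagen API on Vertex AI could look like, the sketch below only builds the URL and JSON body for a request and sends nothing. The URL follows the general Vertex AI pattern for publisher-model `predict` endpoints. The model version string `imagegeneration@005`, the `sampleCount` parameter, and the helper name `build_imagen_request` are assumptions made for illustration and should be checked against the current Imagen API reference. Authentication (an OAuth access token in the request header) is not shown.

```python
import json


def build_imagen_request(project_id, location, prompt, sample_count=1,
                         model="imagegeneration@005"):
    """Return (url, json_body) for a Vertex AI predict call; nothing is sent.

    The model version and the 'sampleCount' field are assumptions here,
    not values confirmed by this article.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model}:predict"
    )
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count},
    }
    return url, json.dumps(body)


# Hypothetical project and region, used only to show the call shape.
url, body = build_imagen_request("my-project", "us-central1", "a red bicycle")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any billable call is made.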

The Google Arts & Culture team is also deploying our Imagen 2 technology in… Cultural Icons experience, allowing users to explore, learn, and test their cultural knowledge with the help of Google AI.

Improved image-caption understanding

Text-to-image models learn how to generate images that match a user’s prompt from details in the images and captions of their training datasets. But the quality of detail and accuracy in this pairing can vary widely for each image and caption.

To help create higher-quality, more accurate images that better align with the user prompt, further description was added to the image captions in the Imagen 2 training dataset, helping Imagen 2 learn different captioning styles and generalize to better understand a broad range of user prompts.

These improved image-caption pairings help Imagen 2 better understand the relationship between images and words – increasing its understanding of context and nuance.

Examples of Imagen 2's prompt understanding.

Generate more realistic images

Imagen 2's dataset and model advances have delivered improvements in several areas where text-to-image tools often struggle, including rendering realistic human hands and faces and keeping images free of distracting visual artifacts.

Examples of Imagen 2 generating realistic human hands and faces.

AI-generated images using the prompt “Flower”, ranging from lower aesthetic scores (left) to higher aesthetic scores (right).

Fluid style conditioning

Imagen 2’s diffusion-based techniques provide a high degree of flexibility, making it easier to control and adjust the style of an image. By providing reference style images together with a text prompt, we can condition Imagen 2 to generate new images that follow the same style.

Visualization of how Imagen 2 makes it easier to control output style using reference images alongside a text prompt.

Advanced inpainting and outpainting

Imagen 2 also enables image editing capabilities such as “inpainting” and “outpainting”. By providing a reference image and an image mask, users can generate new content directly within the original image (inpainting) or extend the original image beyond its borders (outpainting). This technology is planned for Google Cloud’s Vertex AI in the new year.

Imagen 2 can create new content directly in the original image using inpainting.

Imagen 2 can extend the original image beyond its borders using outpainting.
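To make the role of the image mask concrete, here is a minimal sketch that treats images as nested lists of grayscale values. It shows only how a mask selects which pixels may change and how outpainting can be framed as inpainting on an enlarged canvas. It is not Imagen 2's actual method: the `generated` argument stands in for model output, and both function names are hypothetical.

```python
def inpaint_composite(original, mask, generated):
    """Keep original pixels where mask is 0; take generated pixels where mask is 1."""
    return [
        [g if m else o for o, m, g in zip(o_row, m_row, g_row)]
        for o_row, m_row, g_row in zip(original, mask, generated)
    ]


def outpaint_canvas(original, pad, fill=0):
    """Pad the image by `pad` pixels on every side.

    Returns (canvas, mask), where mask is 1 on the new border (to be
    generated) and 0 over the original pixels (to be preserved).
    """
    height, width = len(original), len(original[0])
    new_h, new_w = height + 2 * pad, width + 2 * pad
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]
    for y in range(height):
        for x in range(width):
            canvas[y + pad][x + pad] = original[y][x]
            mask[y + pad][x + pad] = 0
    return canvas, mask


# Inpainting: only the masked pixel (top right) is replaced.
edited = inpaint_composite([[1, 2], [3, 4]], [[0, 1], [0, 0]], [[9, 9], [9, 9]])

# Outpainting: a 1x1 image becomes a 3x3 canvas with a border to fill in.
canvas, border_mask = outpaint_canvas([[5]], pad=1)
```

Here `edited` is `[[1, 9], [3, 4]]`, and `border_mask` is 1 everywhere except the center pixel, which is the region a generator would be asked to fill.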

Responsible by design

To help mitigate the potential risks and challenges of text-to-image technology, we have set robust guardrails in place, from design and development through to deployment in our products.

Imagen 2 is integrated with SynthID, our cutting-edge toolkit for watermarking and identifying AI-generated content, enabling allowlisted Google Cloud customers to add an imperceptible digital watermark directly into the pixels of an image without compromising image quality. The watermark remains detectable by SynthID even after modifications such as filters, cropping, or saving with lossy compression schemes.

Before we release capabilities to users, we conduct robust safety testing to minimize the risk of harm. From the outset, we invested in training data safety for Imagen 2 and added technical guardrails to limit problematic outputs such as violent, offensive, or sexually explicit content. We apply safety checks to the training data, to input prompts, and to system-generated outputs at generation time. For example, we apply comprehensive safety filters to avoid generating potentially problematic content, such as images of named individuals. As we expand the capabilities and launches of Imagen 2, we are also continually evaluating them for safety.

Acknowledgements

This work was made possible thanks to key research and engineering contributions from:

Aäron van den Oord, Ali Razavi, Benigno Uria, Çağlar Unlu, Charlie Nash, Chris Wolfe, Conor Durkan, David Ding, Dowd Gurney, Evgeni Gladchenko, Felix Riedel, Hang Qi, Jacob Kelly, Jakob Bauer, Jeff Donahue, Junlin Zhang, Mateusz Malinowski, Mikołaj Bińkowski, Pauline Luc, Robert Riacci, Robin Strudel, Sander Dieleman, Tobina Peter Igoe, Yaroslav Ganin, Zach Eaton-Rosen.

Thanks to: Ben Bariach, Dawn Bloxwich, Ed Hirst, Elspeth White, Gemma Jennings, Jenny Brennan, Komal Singh, Luis C. Cobo, Miaosen Wang, Nick Pizzuti, Nicole Brichtova, Nidhi Vyas, Nina Anderson, Norman Casagrande, Sasha Brown, Sven Gowal, Tulsi Doshi, Will Hawkins, Yelin Kim, Zahra Ahmed for driving delivery; Douglas Eck, Nando de Freitas, Oriol Vinyals, Eli Collins, Demis Hassabis for their advice.

Thanks also to the many others who contributed across Google DeepMind, including our partners at Google.
