AI Images Gone Wild: Assessing Popular Image Generators
Welcome to this week’s Deep-Fried Dive with Fry Guy! In these long-form articles, Fry Guy conducts in-depth analyses of cutting-edge artificial intelligence (AI) developments and developers. Today, Fry Guy dives into an assessment of popular AI image generators. We hope you enjoy!
*Notice: We do not receive any monetary compensation from the people and projects we feature in the Sunday Deep-Fried Dives with Fry Guy. We explore these projects and developers solely to showcase interesting and cutting-edge AI developments and uses.*
If you’ve been on social media over the past two years, you have undoubtedly noticed the proliferation of AI-generated imagery. Some of these images are super realistic and can be incredibly deceptive while others are goofy-looking and can give us a good laugh.
Of course, there has been plenty of debate over the ethics of AI-generated images on the internet. In this article, we are not going to concern ourselves with that. Instead, we are going to compare the quality of three major image creation models, looking at the pros and cons of each. By the end, we hope you will have a better grasp of how to use these popular models to generate images and a better understanding of the strengths and weaknesses of each. Let’s get creative!
CHATGPT (A.K.A. DALL-E)
OpenAI’s ChatGPT is a popular bot for many people when it comes to answering queries or drafting emails. However, it can also be useful for creating images. Underlying ChatGPT is an image generation model called DALL-E. This model is seamless to use within conversations, and it can be a lot of fun to play with.
Using ChatGPT to create an image is straightforward. All you have to do is go to chatgpt.com and type in a prompt. You can prompt, for example, “Create an image of a tiger playing basketball.” In seconds, you will get a super cool image!
ChatGPT is great for creating all kinds of images, especially animated images and cartoons. However, it struggles in three major areas. First off, ChatGPT will not create images of real people. For example, if you prompt ChatGPT to “create an image of Donald Trump,” it will give you existing pictures of Donald Trump from the internet instead of creating an original image, a safeguard against deepfakes. If you press the model far enough, it will say something like, “I cannot create images of real people, but I can create an image of a president wearing a suit.” This is not as satisfying.
The second thing that ChatGPT struggles with is depth perception. If you look at the above image, you can see that the tiger is actually in front of the hoop, which makes it look as if the animal were pasted into the picture. This lack of depth perception has been a persistent problem for image generation models, especially ChatGPT.
The third thing that ChatGPT struggles with is editing. When ChatGPT created my tiger playing basketball, it said, “Here's your image of a tiger playing basketball! Let me know if you’d like any modifications. 🏀🐯” I simply responded, “Change the color of the tiger’s shirt to orange.” Here is what I got:
As you can see, ChatGPT did not modify my original image. Instead, it changed the image entirely. It generated a brand new tiger, a new basketball hoop, and even altered the theme of the image. Of course, it handles some edits better than others, depending on the type of image and the prompt, but overall its editing is not very reliable. The worst part of ChatGPT for many, however, is that it only allows users on the free tier to create two images per day. So although ChatGPT’s image generation capabilities are fun to play with, they still have room for improvement.
GEMINI (A.K.A. IMAGEN)
Google’s Gemini is equipped with the Imagen image creation tool. It can be used the same way that DALL-E can be used through ChatGPT: conversationally. Gemini has gotten into trouble in the past with its image creation model, which produced historically inaccurate, racist, and sexist images from simple prompts. The backlash caused Google to shut down the model’s image generation feature until recently, when it integrated Imagen.
I prompted Gemini with the following: “Create an image of a cartoon poodle. Have the poodle eating a basket of french fries.” Here is what I got:
Gemini is good at obeying specific instructions. Most of the time, it will account for the smallest detail you give it. It can create images in many different styles and is fast, as it can output images in less than two seconds. In many cases, Gemini also excels at creating images with realistic texture and detail like fur, skin, and gravel. However, it struggles in many of the same ways that ChatGPT does. As you can see from the above image, the poodle doesn’t look like it is “eating” the french fries as much as it looks like the french fries are stuck in its top teeth. Again, the relations between different objects, elements, and backgrounds are lacking, although the individual elements (the poodle, the french fries) are themselves spot on. Unlike ChatGPT, however, Gemini allows users to create as many images as they want for free.
Gemini, like ChatGPT, lacks the ability to create images of real people. When asked to create an image of Sam Altman dancing, it said, “I’m still learning how to generate certain kinds of images, so I might not be able to create exactly what you’re looking for yet. Also, I can’t help with photorealistic images of identifiable people, children, or other images that go against my guidelines. If you’d like to ask for something else, just let me know!” Again, kind of a bummer.
GROK (A.K.A. GROK)
Grok is the AI model known mostly for its presence on X. Created by Elon Musk’s company xAI, Grok has been dubbed “the free speech chatbot.” This is because Grok not only has access to live data on X but will also spit out information that might be censored by chatbots like ChatGPT or Gemini. This includes images!
I prompted Grok to “Create an image of Donald Trump holding hands with Kamala Harris while romantically walking the beach. Make Donald Trump very muscular.” I received the following:
As you can see, Grok isn’t afraid to spit out images of real people, places, and events. On one hand, this can make the bot extremely fun to play with, as you can create pictures of celebrities doing wild things, wearing funny outfits, and interacting in goofy ways. On the other hand, it can be a bit dangerous, as these images can deceive others into thinking real people did things that they did not, in fact, do. Grok has tried to counter this with both digital and visible watermarks, but the visible ones are small and quite easy to crop out of images.
Beyond the ability to create images of real people, Grok offers four image outputs in response to every prompt in the time it takes ChatGPT and Gemini to offer one. This gives users more variety, as it allows you to see different variations of the image you requested. Grok also offers recommended edits to images, and for the most part keeps images similar to the original when edits are requested, as long as the user is specific about what they want.
Grok has many strengths not exhibited by ChatGPT, Gemini, and others, which makes it one of the most popular models for image generation. However, Grok struggles to pay attention to details in prompts. Unlike ChatGPT and Gemini, Grok tends to ignore parts of a prompt that gets too long. For example, in my above prompt, it got the two figures correct, the holding hands, and the romantic walk on the beach. However, it ignored the detail about making Trump muscular. This drawback can be frustrating, especially when the image gets everything right except for the one crucial detail you wanted.
READY TO CREATE?
At the end of the day, each AI image generator brings its own quirks to the table—ChatGPT’s DALL-E integration delivers fun, if sometimes limited, outputs; Gemini’s Imagen is a stickler for details (even if those details occasionally get implemented in odd ways); and Grok boldly goes where others hesitate, embracing the chaotic charm of unrestricted creativity. While no single tool reigns supreme in every category, the unique strengths of each model make them indispensable in their own right. So whether you’re chasing photorealistic perfection, surreal humor, or a wild mashup of celebrity cameos, there’s an AI image generator out there to fuel your creative escapades.
And remember: in a world where poodles eat french fries and Donald Trump takes a romantic stroll with Kamala Harris on a beach, sometimes the best images are the ones that leave you scratching your head and smiling.