
Behind Autonomous AI: An Inside Look at Auto-GPT with Its Key Developer

Welcome to this week’s Deep-fried Dive with Fry Guy! In these long-form articles, Fry Guy conducts an in-depth analysis of a cutting-edge AI development. Today, our dive is about Auto-GPT. We hope you enjoy!

Have you ever thought about what life might look like if you had your own personal helper, available 24/7 to accomplish tasks on your behalf? This is the idea behind Auto-GPT.

Auto-GPT is an experimental AI agent with the goal of accomplishing tasks assigned by a user. At its core, it works by splitting a problem into parts and attempting to solve it step by step under the guidance of the user.

The project was founded by Toran Bruce Richards and now boasts a team of over 300 volunteer developers who contribute to the MIT-licensed, open-source project. This growth has all occurred in the past few months, since development on the project began. To say this project has exploded in popularity is an understatement. It currently has more GitHub stars than Node.js and React Native, to name a few.


After starting up the software, the user provides a prompt to the AI algorithm, describing what kind of task they would like accomplished. This initiates what is called the “agent loop.” The agent loop involves planning what steps need to be taken to accomplish the given task and then executing those steps. Then, the AI will plan and execute again … and again. This process is looped until the AI reaches its goal.
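The plan-then-execute loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Auto-GPT's actual implementation: the planner here is a scripted stand-in for the LLM call, and the "execute" phase simply records each step.

```python
from typing import Callable, List

def agent_loop(task: str, plan_step: Callable[[str, List[str]], str],
               max_steps: int = 10) -> List[str]:
    """Plan-execute loop: ask the planner for the next step, execute it,
    and repeat until the planner signals completion or we hit max_steps."""
    history: List[str] = []
    for _ in range(max_steps):
        step = plan_step(task, history)  # "plan": decide the next step
        if step == "DONE":               # the planner signals the goal is reached
            break
        history.append(step)             # "execute" (stubbed: just record it)
    return history

# Scripted stand-in for an LLM-backed planner (illustrative only).
def scripted_planner(task: str, history: List[str]) -> str:
    script = ["search the web", "summarize findings", "write report", "DONE"]
    return script[len(history)]

steps = agent_loop("research topic X", scripted_planner)
```

The `max_steps` cap is a common safeguard in agent loops of this kind, since an LLM planner that never emits a completion signal would otherwise loop (and bill) indefinitely.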

Part of the agent loop is a step where Auto-GPT asks the user for clarification and feedback. This can optionally be turned off in “continuous mode,” where the AI completes the task without checking in along the way; however, this runs a higher risk of the task being completed poorly or inaccurately, and it can drive up API costs. Each loop can take anywhere from 5 to 20-30 seconds, depending on the number of steps the AI undertakes and available bandwidth.

The Auto-GPT team currently doesn’t do any data reporting, but they are working on implementing such processes to help detect bugs and improve the project. Because user prompts are not tracked, it’s not clear what Auto-GPT is being used for by the public. However, it is suspected that most use cases are prosumer cases, such as helping users write a massive amount of emails or doing research on a given subject and producing a report.

The tasks that can be done by Auto-GPT are up to the imagination of the user. They can be anything from reading or writing a file to interacting with an application programming interface (API) or spawning another agent to do sub-tasks. Some tasks work better than others as of now, but experimenting with more advanced and nuanced tasks is a key part of the project's experimental roadmap.
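One common way to expose capabilities like "read a file" or "write a file" to an agent is a command registry the agent can invoke by name. The sketch below is a hedged illustration of that pattern, with invented command names; it is not Auto-GPT's actual command system.

```python
import os
import tempfile
from typing import Callable, Dict

COMMANDS: Dict[str, Callable] = {}

def command(name: str):
    """Decorator that registers a function as a named agent command."""
    def register(fn: Callable) -> Callable:
        COMMANDS[name] = fn
        return fn
    return register

@command("read_file")
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

@command("write_file")
def write_file(path: str, text: str) -> str:
    with open(path, "w") as f:
        f.write(text)
    return f"wrote {len(text)} chars"

def execute(name: str, *args) -> str:
    """Dispatch a command chosen by the agent; unknown names fail softly."""
    if name not in COMMANDS:
        return f"unknown command: {name}"
    return COMMANDS[name](*args)
```

Failing softly on unknown command names matters here, because the names come from LLM output and may not always match a registered capability.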

Much of a task's success depends on how the user formulates the prompt. By default, Auto-GPT asks for a one-line prompt and generates an objective and goals from it, or the goals can be written manually (which tends to be more effective). For help writing more effective prompts, Auto-GPT has a dedicated page on its Discord server.
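The two prompting styles described above can be contrasted concretely. The field names below are assumptions loosely modeled on Auto-GPT's settings format, shown here as a Python structure for illustration; the task itself is invented.

```python
# Style 1: a one-line objective the tool expands into goals itself.
one_liner = "Research solar panel efficiency and write a short report."

# Style 2: manually written goals (tends to steer the agent more reliably).
manual_settings = {
    "ai_name": "ResearchGPT",
    "ai_role": "an agent that researches topics and writes reports",
    "ai_goals": [
        "Search for recent data on solar panel efficiency",
        "Summarize the three most relevant sources",
        "Write the summary to report.txt",
    ],
}
```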

People have been building plug-ins for a vast array of fascinating tasks, from working with Auto-GPT in Telegram to having AI read and respond to the user's text messages. Nicholas Tindle, a leading developer on Auto-GPT, said his favorite plug-in is one that tells the user how many astronauts are in space at any given time. He believes this sort of plug-in shows how Auto-GPT can allow developers to create things they never thought possible.
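The astronauts-in-space idea can be sketched against Open Notify's public `/astros.json` endpoint, which returns a JSON payload containing a `number` field. This is a hedged illustration of how such a plug-in might fetch its data, not the actual plug-in's code; the parsing is split from the network call so it can be checked offline.

```python
import json
from urllib.request import urlopen

ASTROS_URL = "http://api.open-notify.org/astros.json"  # Open Notify public API

def astronaut_count(payload: dict) -> int:
    """Extract the number of people currently in space from an
    Open Notify /astros.json response payload."""
    return int(payload["number"])

def fetch_astronaut_count(url: str = ASTROS_URL) -> int:
    """Network call, kept separate from the pure parsing logic."""
    with urlopen(url, timeout=10) as resp:
        return astronaut_count(json.load(resp))

# Example payload shape (abridged):
sample = {"message": "success", "number": 7, "people": []}
```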


Auto-GPT is free in and of itself, but the project calls out to various APIs that are not, the most prominent being OpenAI's. OpenAI charges the user for each API call, which can range from 3 to 15 cents depending on how much text is sent and received. These costs can add up quickly: each agent loop can involve 3-5 calls, so a single loop can cost anywhere from roughly 9 to 75 cents. For this reason, it's important for the end user to set up OpenAI API spending limits to mitigate this risk.
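A back-of-envelope check of those numbers makes the risk concrete: multiplying the per-call range by the calls-per-loop range gives the cost band for one loop.

```python
# Estimate the per-loop cost band from the figures quoted above:
# 3-15 cents per API call, 3-5 calls per agent loop.

def loop_cost_range(cost_per_call=(0.03, 0.15), calls_per_loop=(3, 5)):
    """Return the (min, max) dollar cost of a single agent loop."""
    low = cost_per_call[0] * calls_per_loop[0]    # cheapest calls, fewest of them
    high = cost_per_call[1] * calls_per_loop[1]   # priciest calls, most of them
    return low, high

low, high = loop_cost_range()   # about $0.09 to $0.75 per loop
```

At the high end, a task that takes twenty loops could cost around fifteen dollars, which is why spending limits matter.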

Nicholas Tindle says the Auto-GPT team is currently working on ways to lower the cost for users, such as enabling the user to change API endpoints, so users can have access to cheaper tools. Prompt flexibility is another focus of the Auto-GPT team. As prompting gets significantly easier and large language model (LLM) providers compete for a stake in the game, the number of unsuccessful prompts will likely diminish, decreasing the cost of Auto-GPT usage.
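The "change API endpoints" idea usually amounts to pointing the same client at a different OpenAI-compatible provider by swapping the base URL. The sketch below illustrates that pattern; the URLs, key placeholder, and client shape are assumptions for illustration, not Auto-GPT's actual configuration.

```python
class LLMClient:
    """Minimal client whose provider is chosen purely by base URL."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def completion_endpoint(self) -> str:
        # Same request path regardless of provider; only the host changes.
        return f"{self.base_url}/v1/chat/completions"

# Default provider vs. a hypothetical cheaper, self-hosted alternative.
default = LLMClient("https://api.openai.com", "sk-placeholder")
cheaper = LLMClient("https://my-self-hosted-llm.example/", "local-key")
```

Because many providers mimic OpenAI's request path, swapping the host while keeping the path is often enough to redirect traffic to a cheaper backend.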


Currently, Tindle believes one of the biggest issues with Auto-GPT is its convoluted codebase. The plug-in architecture of Auto-GPT was originally designed without much context on what people would want to swap out. The software currently operates on a system that allows the code's execution flow to be interrupted and changed at a multitude of points (about 15 or so). As a result, the flow can respond to many different things, and the expectations of what can be responded to are complicated.
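The "interruption points" described above are commonly implemented as named hooks: the core flow publishes events at fixed points, and plug-ins register callbacks that can inspect or rewrite the data passing through. This is a generic sketch of that pattern with invented hook names, not Auto-GPT's actual plug-in API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

HOOKS: Dict[str, List[Callable]] = defaultdict(list)

def register(point: str, fn: Callable) -> None:
    """Attach a plug-in callback to a named interruption point."""
    HOOKS[point].append(fn)

def run_hooks(point: str, value):
    """Let each plug-in at this point transform the value in turn."""
    for fn in HOOKS[point]:
        value = fn(value)
    return value

# A plug-in that rewrites the prompt before it is sent to the model.
register("pre_prompt", lambda p: p + " Be concise.")

prompt = run_hooks("pre_prompt", "Summarize this article.")
```

With a dozen-plus such points, each able to mutate shared state, it is easy to see how the interactions between plug-ins become hard to reason about, which is the coupling problem the re-architecture aims to fix.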

Tindle said this makes the process difficult for developers who want to build on top of the project. “Building a plug-in is, at best, hard. Swapping out our LLM provider is basically impossible.” He said the team is working to resolve this, as they have “just recently merged a core of a new re-architected system that is easier to contribute to, easier to swap components out of, and overall less tightly coupled into a spaghetti ball and more into a well-interfaced system.”

The current install process, which involves a complicated local download, is also non-trivial for the average user. Tindle doesn't see this as much of a long-term issue, however, as there is already a waitlist for a simpler web user interface. Tindle said the team is currently working on a lot of pieces to make that possible. The main part, he said, was the re-architecture mentioned previously, “to be able to swap out the user-facing portions (which is right now a command-line application) with something more web-facing, like an API.”


The potential of Auto-GPT lies in the ability of the project to make a variety of existing processes more productive and efficient.

Auto-GPT has massive potential for both developers and consumers. On the developer side, the re-architecture will make it easier to build projects on top of the platform without wrestling with extraneous issues. For consumers, Tindle sees massive potential for continued growth on the prosumer side, as a tool to help with research and workflows. Long term, he thinks the potential is up to the imagination of individuals who can find a way to develop plug-ins for their tasks.

A non-experimental version of this project has yet to be released, but with the massive growth of the contributing team and the growing popularity of the project, this seems like a very real possibility in the future. As one of the top 25 largest projects on GitHub (that's all projects, not just AI projects), Auto-GPT is continuing to soar in development and popularity. As Tindle says, “We have ridden a rocket to the top, and we are hoping it doesn't stop any time soon.”