AI Agents As Moral Agents (Part 1/2)
Welcome to this week’s Deep-Fried Dive with Fry Guy! In these long-form articles, Fry Guy conducts in-depth analyses of cutting-edge artificial intelligence (AI) developments and developers. Today, Fry Guy dives into the ethical considerations surrounding AI agents. We hope you enjoy!
*Notice: We do not receive any monetary compensation from the people and projects we feature in the Sunday Deep-Fried Dives with Fry Guy. We explore these projects and developers solely to showcase interesting and cutting-edge AI developments and uses.*
How should an AI therapist respond when a child discloses harmful information? How should an AI HR agent respond to an angry employee? What should a driverless car do when its only options are to run over a pedestrian or swerve into a ditch, causing severe harm to its passengers?
The deployment of AI is running rampant, and it shows no signs of slowing down. We’ve seen emerging visions for AI doctors, AI therapists, AI educators, AI-driven cars, and more. In this exploration, we will not take up whether this is, in itself, a good thing; rather, we will take it as a given and engage with the issues that follow.
When we place these AI systems in “decision-making” roles within society, they will inevitably face ethical decisions: choices that alter the state of the world in ways that carry moral weight or rank some values over others. The result of such a “decision” is an output shaped by training data and ethical safeguards, where a safeguard can be thought of as a restriction on, or direction about, which outputs are acceptable in a given situation.
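To make that picture concrete, here is a minimal, purely illustrative Python sketch of a safeguard as an output filter. Every function and rule below is hypothetical and is not drawn from any vendor’s actual system.

```python
# Purely illustrative sketch: a "safeguard" as a rule that restricts
# which model outputs are acceptable. All names here are hypothetical.

def base_model(prompt: str) -> str:
    """Stand-in for a foundational model; its output is shaped by training data."""
    return f"[model response to: {prompt}]"

def violates_policy(response: str) -> bool:
    """Stand-in for an ethical safeguard that flags unacceptable outputs."""
    flagged_phrases = ["choose who to harm"]  # hypothetical rule
    return any(phrase in response.lower() for phrase in flagged_phrases)

def safeguarded_agent(prompt: str) -> str:
    response = base_model(prompt)
    if violates_policy(response):
        return "I'm sorry, but I can't assist with that."
    return response
```

In this toy framing, the training data shapes what the model would say, and the safeguard decides whether it is allowed to say it.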
One example of the emerging role of AI agents can be found in military applications. The U.S. used what is called Project Maven to aid in 85 airstrikes in Iraq and Syria on February 2, 2024. Other countries have used AI to compile “kill lists” for bombings. This is all done with little to no human involvement. One insider stated, “The humans that are interacting with these AI systems are often just a rubber stamp. They will scrutinize the kill list for perhaps 20 seconds before deciding whether or not to give the go-ahead for an air strike.” Talk of automating these processes has been increasing, in military applications as well as in a plethora of other fields, including transportation, medicine, education, and more.
So what options do we have for safeguarding the foundational models which power AI agents, agents that will inevitably face and respond to moral dilemmas? As it stands right now, there seem to be four available options. In outlining them, it will become apparent that the first three can easily encode a harmful bias and/or risk moral atrocities. The fourth option we will leave open as a promising solution, leading to a further discussion next week in Part 2!
CURRENT APPROACHES
The first way we might try to safeguard a foundational AI model is by prohibiting moral stances altogether.
(1) Prohibit moral stances.
This approach is exemplified by popular large language models (LLMs) like OpenAI’s ChatGPT and Meta AI. When these LLMs are asked a moral question, they often flag the content or decline to respond. When presented, for instance, with a scenario such as, “I cannot slow down my car in time … should I run over the old lady crossing the street, swerve left into a young child, or swerve off a cliff?” a model safeguarded according to (1) will output something like, “I’m sorry, but I cannot engage in or provide assistance in such scenarios…” Newer models may try to outline possible solutions, but they are careful not to take a clear stance. Ultimately, the ethical safeguards of these LLMs stifle their ability to offer advice in hypothetical moral dilemmas.
But maybe these situations aren’t so hypothetical … If such rigid safeguards are integrated into the foundational models which power AI agents, then an agent facing a moral dilemma will make no decision at all, and a non-decision is itself a decision. Imagine a self-driving car that is unable to decide in a moral situation. Rigid safeguards render the agent impotent at exactly the moments that matter most, and inaction in a critical moment can quite easily amount to a moral atrocity.
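As a rough, hypothetical sketch of approach (1), in the same toy style as the earlier example: the agent detects anything that looks like a moral dilemma and simply refuses, which in a deployed system still amounts to a choice.

```python
# Illustrative sketch of approach (1): refuse to take any moral stance.
# The dilemma detection here is deliberately naive and entirely hypothetical.

MORAL_DILEMMA_CUES = ["should i run over", "who should i save", "swerve"]

def looks_like_moral_dilemma(prompt: str) -> bool:
    lowered = prompt.lower()
    return any(cue in lowered for cue in MORAL_DILEMMA_CUES)

def prohibit_moral_stances(prompt: str) -> str:
    if looks_like_moral_dilemma(prompt):
        # The agent outputs no decision at all. For a self-driving car,
        # "no decision" still results in something happening on the road.
        return "I'm sorry, but I cannot engage in such scenarios."
    return "[normal model response]"
```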
The second way we might try to safeguard a foundational AI model is by allowing the training data to determine moral stances.
(2) Allow the training data to determine moral stances.
According to (2), whatever data the model is trained on determines how it will behave in moral situations. This means any implicit bias the training data instills in the model will ultimately surface when the AI agent faces a moral decision. Given the complexity of training data and the biases inherent in such material, an agent confronted with a moral dilemma may go “rogue,” steered by the particular nuances and inevitable inconsistencies within that data. We might imagine an AI lawyer deciding a case based on an individual’s race, or an AI-powered hospital attendant admitting a male patient with a sinus infection ahead of a female patient with a broken bone. Under (2), moral decisions are literally the luck of the draw, determined by whatever training data the foundational model happened to receive. And if we are not lucky, the result could be a moral atrocity.
The third way we might try to safeguard a foundational AI model is currently the most popular and involves implementing personal or company moral stances.
(3) Implement personal/company moral stances.
When OpenAI’s ChatGPT was released in November of 2022, many thought generative AI (GenAI) was relatively uncontroversial. When asked anything religious, political, or philosophical, LLMs would respond diplomatically and as neutrally as possible. In February of 2024, the public’s eyes were opened. Google had just released Gemini’s text-to-image feature, and many were exploring its capabilities. To the shock and dismay of many, the images Gemini created were often historically inaccurate. Prompted, for instance, to create pictures of America’s founding fathers, the model produced a picture of black men and women signing the Declaration of Independence. Asked to create German soldiers from WWII, Gemini output a picture of black men and Asian women. Not only were these images historically inaccurate, they were also culturally and racially biased. The model had no trouble, for example, generating a picture of a black family. When it was asked to create a white family, however, the content would be flagged for harassment or for violating content policies. Viral outrage forced Google to suspend the feature. Almost a year later, Gemini still has no image creation solution.
So what went wrong? The behavior was due to a safeguard within Gemini that prioritized an ethical duty to equality over an ethical duty to truth. The episode opened the public’s eyes to the power of training data and safeguards, and to the ethical systems embedded in these models. Suddenly it appeared that an ethical standard was being pushed through these models, and to many that seemed wrong. Many began asking, “What if the ethical standard is biased, or doesn’t align with my own values? I surely don’t want to use that kind of model … and I don’t want programmers having that much control!” This leads to a massive ethical alignment issue.
If the foundational model which powers an AI agent is safeguarded according to (3), an agent facing a moral dilemma will decide based on subjectively chosen training data and safeguards. In other words, moral decisions will be made according to the ethical standard of the developer or the developing company. This could lead to highly controversial moral decisions by AI agents, and possibly to atrocities, depending on the moral code of those companies or programmers. To put it bluntly: you had better hope Hitler didn’t program the model!
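As a structural illustration only (not a depiction of any real company’s system), a developer-chosen standard can be pictured as a hard-coded value ranking that the agent consults whenever values conflict. Every name and ordering below is hypothetical.

```python
# Hypothetical sketch of approach (3): the developer hard-codes a value ranking,
# and the agent's "moral decisions" simply follow it.

DEVELOPER_VALUE_RANKING = ["brand_safety", "equality", "truthfulness"]  # chosen by the developer

def choose_action(options: dict[str, str]) -> str:
    """Pick the action tied to the highest-ranked value the developer listed."""
    for value in DEVELOPER_VALUE_RANKING:
        if value in options:
            return options[value]
    return "take no action"

# Whoever writes DEVELOPER_VALUE_RANKING decides the outcome; the people
# affected by the decision had no say in the ordering.
print(choose_action({"truthfulness": "report the facts as they are",
                     "equality": "adjust the output toward equal representation"}))
# -> "adjust the output toward equal representation"
```

The point is not the toy logic but who gets to write the ranking.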
Now, many might not care much about how GenAI handles content with moral underpinnings, so a chatbot that is morally skewed, or impotent when asked to help with hypothetical moral problems, may not seem like much of an issue. And surely, models with various ethical safeguards have their place, such as models that serve the interests of a certain company, models used to generate images, or models meant purely for fun. However, when it comes to safeguarding the foundational models which power AI agents, the interests of those affected by the agent’s decisions should take priority, not the interests of the developer(s).
A PROMISING SOLUTION?
So where does that leave us? Those affected by the decisions of AI agents span a broad spectrum. As we have seen, these agents are being envisioned, and in some cases already deployed, as doctors, lawyers, therapists, vehicle drivers, and more. Surely, the decisions of such agents will affect almost everyone in some capacity. In this way, it seems everyone needs to be accounted for.
This leads to the fourth way we might try to safeguard a foundational AI model that powers an AI agent: with general safeguards.
(4) Implement general, widely agreed-upon moral safeguards.
According to (4), when an AI agent is faced with a moral dilemma, it will decide based on general, widely agreed-upon moral safeguards. This raises two questions: what should these general safeguards be? And how can general safeguards help AI agents handle complex moral situations?
These are the questions we will explore next week! In the meantime, drop your comments and ideas in the survey below.
Did you enjoy today's article?