
"Humanity's Final Exam"

FryAI

Good morning, fry friends. It’s Thursday—so close to FRY-day we can practically taste it! Let’s fry up some AI insights to make it through this final stretch. 🍟

(The mystery link can lead to ANYTHING AI-related: tools, memes, articles, videos, and more…)

Today’s Menu

Appetizer: Introducing “Humanity’s Last Exam” 📝

Entrée: Commerce Dept. proposes mandatory AI reporting 🫡

Dessert: Google’s DataGemma looks to reduce AI hallucinations 😵‍💫

🔨 AI TOOLS OF THE DAY

👂 Earkick: Your personal mental health chatbot. → Check it out

🧳 Rakun Packing: Have AI help you pack for your next trip. → Check it out

💬 Scripe: Create viral LinkedIn posts in seconds. → Check it out

INTRODUCING “HUMANITY’S LAST EXAM” 📝

Q: What wild animal does well on exams, despite not studying?

A: The cheetah. 🐆

What’s going on? A group of technology experts has launched a global initiative called “Humanity's Last Exam,” seeking the most challenging questions to test AI systems.

Why? Current AI models have easily handled widely used benchmark tests, leading experts to seek more rigorous and nuanced challenges. For example, while models like OpenAI’s o1 excel at existing reasoning benchmarks, they still struggle with tasks requiring abstract reasoning and planning, such as pattern-recognition puzzles. Raising the bar would let emerging models be measured against a higher level of human expertise and reasoning.

How will this work? Organized by the non-profit Center for AI Safety (CAIS) and the startup Scale AI, “Humanity’s Last Exam” will feature 1,000 peer-reviewed, crowd-sourced questions, with monetary prizes for top contributors. The project aims to pinpoint when AI reaches expert-level capabilities and is designed to remain challenging as systems continue to improve. The organizers emphasize that the exam will exclude questions about weapons and other sensitive topics for ethical reasons.

“We desperately need harder tests for expert-level models to measure the rapid progress of AI.”

-Alexandr Wang, CEO of Scale AI

COMMERCE DEPT. PROPOSES MANDATORY AI REPORTING 🫡

More government, more problems? 🤷‍♀️

What happened? The U.S. Department of Commerce’s Bureau of Industry and Security (BIS) has proposed new mandatory reporting rules for developers of advanced AI models and cloud service providers.

Why? These regulations aim to bolster national defense by requiring companies to report on AI development activities, cybersecurity measures, and red-teaming tests, which assess risks like AI misuse in cyberattacks or dangerous weapons creation.

What are the concerns? As with most regulations, tech companies are pushing back on these mandates. Rules like this may increase operational costs and potentially slow innovation, since businesses will likely need to expand compliance teams and enhance data-management practices to meet the reporting requirements. Analysts have therefore raised questions about the balance between maintaining national security and encouraging innovation: the regulations are designed to safeguard against AI risks, but they may also stifle creativity in the tech industry, a tension the U.S. must navigate carefully moving forward. We don’t want to fall behind, but we don’t want to forge ahead recklessly either.

GOOGLE’S DATAGEMMA LOOKS TO REDUCE AI HALLUCINATIONS 😵‍💫

Q: What kind of shoes does Google wear?

A: Re-boots. 🥾

What’s up? Google is developing DataGemma, a model aimed at reducing hallucinations, especially on sensitive topics that inform decision-making.

How does this work? To combat hallucinations, DataGemma uses two retrieval-based approaches: Retrieval-Interleaved Generation (RIG) and Retrieval-Augmented Generation (RAG). With RIG, the model checks statistical claims against Data Commons as it generates its answer; with RAG, it retrieves relevant Data Commons information before composing a response. Both techniques help the model pull accurate, relevant information from Data Commons to minimize hallucinations, and early research shows promising improvements in factual accuracy, making these models more reliable for users seeking trustworthy AI-driven insights.

“AI hallucinations are incorrect or misleading results that AI models generate. These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model.”

What is Data Commons? Data Commons is a publicly available knowledge graph containing over 240 billion rich data points across hundreds of thousands of statistical variables. It sources this public information from trusted organizations like the United Nations (UN), the World Health Organization (WHO), Centers for Disease Control and Prevention (CDC), and Census Bureaus.
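For readers curious what retrieval-augmented generation actually looks like, here’s a minimal Python sketch of the general RAG pattern: fetch grounded facts first, then hand them to the model as context. The tiny in-memory fact store, the keyword-overlap retriever, and the generate_answer() placeholder are all hypothetical stand-ins for illustration only, not Google’s actual DataGemma or Data Commons API.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern.
# The in-memory "fact store" stands in for Data Commons, and generate_answer()
# stands in for a real language-model call. Not Google's actual DataGemma API.

from dataclasses import dataclass

@dataclass
class Fact:
    topic: str
    statement: str

# Hypothetical stand-in for Data Commons: a few grounded statistics.
FACT_STORE = [
    Fact("world population", "The UN estimated the world population at roughly 8.0 billion in 2023."),
    Fact("us life expectancy", "CDC data put U.S. life expectancy at 77.5 years in 2022."),
]

def retrieve(query: str, store: list[Fact], k: int = 1) -> list[Fact]:
    """Return the k facts whose topics share the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(store, key=lambda f: len(q_words & set(f.topic.split())), reverse=True)
    return scored[:k]

def generate_answer(query: str, context: list[Fact]) -> str:
    """Placeholder for a model call: a real system would pass the retrieved
    facts to the LLM as grounding context and generate an answer from them."""
    grounding = " ".join(f.statement for f in context)
    return f"Q: {query}\nGrounded on: {grounding}"

if __name__ == "__main__":
    question = "What is the current world population?"
    facts = retrieve(question, FACT_STORE)   # retrieve first...
    print(generate_answer(question, facts))  # ...then generate from the facts
```

The design point is simply that the model never answers from memory alone: every response is composed on top of retrieved, verifiable data, which is what makes the approach effective against hallucinations.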

TASTE-TEST THURSDAY 🍽️

Do you think the government should get more involved or less involved in AI?

(Leave a comment explaining your answer, and we might feature it tomorrow with the results!)


HAS AI REACHED SINGULARITY? CHECK OUT THE FRY METER BELOW:

What do ya think of this latest newsletter?
