HLE Quiz – Simplified Version
🧠 Can You Beat AI? Test Your Intelligence Against Grok 4, Gemini & More! 🧠

Humanity’s Last Exam

The Ultimate Human vs AI Intelligence Challenge

Think You’re Smarter Than AI? Prove It!

Humanity’s Last Exam (HLE) features expert-level questions spanning mathematics, physics, biology, humanities, computer science, and more. Current AI models achieve modest accuracy: Grok 4 leads at 25.4%, followed by Gemini 2.5 Pro at 21.6% and OpenAI’s o3 at 20.3%.

2,500+
Expert Questions
25.4%
AI Leader (Grok 4)
500+
Institutions
1,000+
Contributors

🎯 Choose Your Challenge Level

📊 How You Compare to Leading AI Models

Model/System        Accuracy   Notes
🧠 Your Score       ?          Human Intelligence
🤖 Grok 4           25.4%      Current leader
🤖 Gemini 2.5 Pro   21.6%      Google’s flagship
🤖 OpenAI o3        20.3%      Latest reasoning model
🤖 Claude 4 Opus    10.7%      Anthropic’s flagship
🎲 Random Chance    ~20%       Pure guessing

🏆 Current AI Leaderboard

See how today’s most advanced AI models perform on the Humanity’s Last Exam benchmark

  • Grok 4 (Leader) – 25.4%
  • Gemini 2.5 Pro – 21.6%
  • OpenAI o3 – 20.3%
  • o4-mini – 18.1%
  • Claude 4 Opus – 10.7%
  • Random Chance – ~20%

Updated July 2025 • Source: lastexam.ai

🧠 What is HLE?

Humanity’s Last Exam is a comprehensive AI benchmark featuring 2,500+ expert-level questions across 100+ academic subjects. Created by nearly 1,000 global experts from 500+ institutions, it represents the most challenging test for both artificial and human intelligence.

🎯 Why Take This Quiz?

Test your intelligence directly against leading AI models like Grok 4, Gemini 2.5 Pro, and OpenAI’s o3. This is your chance to see how human cognition compares to the world’s most advanced artificial intelligence systems across diverse knowledge domains.

🏅 Challenge Level

These are PhD-level questions spanning mathematics, physics, biology, computer science, philosophy, linguistics, chess, poker, and specialized trivia. Even subject matter experts find these challenging – any score above 30% is impressive!

Intelligent Selection

Our quiz uses smart algorithms to ensure diverse question selection across all knowledge domains. Whether you choose 3, 5, or 10 questions, you’ll get a balanced mix that truly tests your breadth of knowledge.
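A minimal sketch of how such category-balanced selection could work (the question schema, function name, and round-robin logic here are illustrative assumptions, not the quiz’s actual implementation): small quizzes draw each question from a different category, while larger quizzes cycle through categories for a proportional spread.

```python
import random
from collections import defaultdict

def select_questions(pool, n):
    """Pick n questions with a balanced spread of categories.

    `pool` is a list of dicts with "category" and "text" keys
    (a hypothetical schema; the real quiz's data model may differ).
    """
    n = min(n, len(pool))

    # Group questions by category.
    by_cat = defaultdict(list)
    for q in pool:
        by_cat[q["category"]].append(q)
    cats = list(by_cat)

    picked = []
    if n <= len(cats):
        # Small quiz: guarantee variety with one question
        # from each of n distinct categories.
        for cat in random.sample(cats, n):
            picked.append(random.choice(by_cat[cat]))
    else:
        # Larger quiz: round-robin over shuffled categories
        # for a roughly proportional distribution.
        random.shuffle(cats)
        i = 0
        while len(picked) < n:
            cat = cats[i % len(cats)]
            remaining = [q for q in by_cat[cat] if q not in picked]
            if remaining:
                picked.append(random.choice(remaining))
            i += 1
    return picked
```

With a 3-question quiz over a pool spanning five categories, this sketch returns three questions from three distinct categories; with 10 questions it distributes picks evenly across all five.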

🚀 Did You Know? Even the world’s most advanced AI models struggle with these questions. The current leader, Grok 4, achieves only 25.4% accuracy – barely better than random guessing. Can your human intelligence do better?

🤔 Frequently Asked Questions

What is Humanity’s Last Exam (HLE)?

HLE is a comprehensive AI benchmark featuring 2,500+ expert-level questions across 100+ subjects including mathematics, physics, biology, computer science, philosophy, and more. Created by nearly 1,000 global experts from 500+ institutions, it’s designed to be the most challenging academic test for both AI and humans.

How do current AI models perform on HLE?

Even the most advanced AI models struggle with HLE. The current leader is Grok 4 at 25.4%, followed by Gemini 2.5 Pro at 21.6% and OpenAI’s o3 at 20.3%. These low scores highlight the significant gap between current AI capabilities and expert-level human knowledge.

What makes HLE questions so challenging?

HLE questions are expert-level, require deep understanding across multiple domains, cannot be easily answered through internet searches, and test genuine reasoning rather than memorization. They’re designed by subject matter experts to push the boundaries of both human and artificial intelligence.

How does the quiz ensure question diversity?

Our intelligent selection algorithm ensures you get a balanced mix of questions across different categories including STEM fields, humanities, arts, games, and cultural knowledge. Small quizzes guarantee variety, while larger ones provide proportional distribution across all subject areas.

What’s a good human score on this quiz?

Given that the world’s best AI models only achieve 20-25% accuracy, any score above 30% would be impressive for a human. Remember, these are PhD-level questions across multiple disciplines that even experts in their specific fields find challenging.

Can I use this to test my knowledge against AI?

Absolutely! This quiz gives you a direct comparison with leading AI models like Grok 4, Gemini 2.5 Pro, and OpenAI’s o3. It’s a unique opportunity to see how human intelligence performs on the same benchmark used to evaluate cutting-edge AI systems.

Are the questions really from the official HLE dataset?

Yes, all questions are sourced from the official Humanity’s Last Exam dataset. To improve user experience, some questions have been converted from short-answer format to multiple choice format while preserving the original question content and difficulty. The benchmark results shown are from the official lastexam.ai leaderboard.

How often are the AI model results updated?

We update our AI model comparison table regularly based on the official HLE leaderboard. The current results reflect the state of AI capabilities as of July 2025, with Grok 4 leading at 25.4% accuracy.

Who created this quiz tool?

This interactive quiz was developed by DataClysm, Perth’s leading agentic AI consultancy. We specialize in building autonomous AI systems for mining, logistics, infrastructure, and healthcare enterprises across Western Australia, and created this tool to demonstrate the current capabilities gap between human and artificial intelligence.

Is this quiz suitable for educational purposes?

Yes! Educators and students can use this quiz to explore the current state of AI capabilities, discuss the nature of intelligence, and examine challenging problems across multiple academic disciplines. It’s an excellent tool for AI literacy and critical thinking development.

Can I share my results?

Absolutely! We encourage sharing your results on social media to challenge friends and colleagues. The quiz includes built-in sharing options for X (Twitter), LinkedIn, and Facebook to help spread awareness about the current state of human vs artificial intelligence.

🤖 Perth’s Leading Agentic AI Consultancy

DataClysm builds enterprise-grade agentic AI systems that autonomously perceive, reason, and act within complex business environments. We specialize in transforming chaotic data landscapes into intelligent, actionable insights for mining, logistics, infrastructure, and healthcare enterprises across Western Australia.

Unlike traditional analytics firms, DataClysm creates truly autonomous AI agents that integrate with your existing enterprise stacks (Azure/AWS) to deliver continuous value through advanced dashboards, automated reporting, and orchestrated decision support.

🚀 Discover Our AI Solutions →

📢 Share the Challenge!

Think your friends and colleagues can beat AI? Challenge them to take the Humanity’s Last Exam quiz!

Academic Citation

This quiz is based on the official Humanity’s Last Exam dataset. If you use this tool for research or educational purposes, please cite:

Phan, L., Gatti, A., Han, Z., Li, N., et al. (2025). Humanity’s Last Exam. arXiv preprint arXiv:2501.14249. Available at: https://lastexam.ai

For more information about the benchmark methodology and results, visit the official HLE website at lastexam.ai.

🔬 Research Applications

The Humanity’s Last Exam benchmark is being used by researchers worldwide to:

  • Evaluate AI model capabilities across diverse knowledge domains
  • Study the gap between human and artificial intelligence
  • Develop new training methodologies for large language models
  • Assess the current state of artificial general intelligence (AGI)
  • Create educational tools for AI literacy and critical thinking