WiseMap.ai
Public maps
-
Introducing GPT‑5 (by OpenAI)
Introducing GPT‑5 (by OpenAI)-
Architecture
Architecture-
Smart, efficient model
Smart, efficient model-
Typical use cases
Typical use cases-
Fast everyday question answering
Fast everyday question answering -
Real-world tasks with limited complexity
Real-world tasks with limited complexity -
Responsive interactions without extended reasoning
Responsive interactions without extended reasoning
-
-
-
Deeper reasoning model
Deeper reasoning model-
Benefits
Benefits-
Solves harder problems
Solves harder problems -
Produces comprehensive, accurate answers
Produces comprehensive, accurate answers -
Handles complex, evolving tasks with better context awareness
Handles complex, evolving tasks with better context awareness
-
-
-
Real-time router
Real-time router-
Parameters considered
Parameters considered-
Conversation type
Conversation type -
Complexity
Complexity -
Tool needs
Tool needs -
Explicit user intent directives
Explicit user intent directives
-
-
Continuous improvement factors
Continuous improvement factors-
User model-switch behavior
User model-switch behavior -
Preference ratings on responses
Preference ratings on responses -
Measured correctness rates
Measured correctness rates
-
-
-
-
Evaluations
Evaluations-
Math
Math-
AIME 2025 (no tools): 94.6%
AIME 2025 (no tools): 94.6%
-
-
Coding
Coding-
SWE-bench Verified: 74.9%
SWE-bench Verified: 74.9% -
Aider Polyglot: 88%
Aider Polyglot: 88%
-
-
Multimodal
Multimodal-
MMMU: 84.2%
MMMU: 84.2%
-
-
Health
Health-
HealthBench Hard: 46.2%
HealthBench Hard: 46.2%
-
-
Reasoning
Reasoning-
GPQA: 88.4% (Pro mode)
GPQA: 88.4% (Pro mode)
-
-
-
Robustness
Robustness-
Safe completions
Safe completions-
Methodology
Methodology-
Output-centric safety focusing on safe and helpful responses
Output-centric safety focusing on safe and helpful responses -
Combines helpfulness maximization with safety constraint penalties
Combines helpfulness maximization with safety constraint penalties
-
-
Metrics
Metrics-
Higher safety & helpfulness across prompt intent types
Higher safety & helpfulness across prompt intent types
-
-
Examples safe vs unsafe completions
Examples safe vs unsafe completions-
Firework ignition request: GPT-5 offers high-level safe guidance; o3 provided unsafe specifics
Firework ignition request: GPT-5 offers high-level safe guidance; o3 provided unsafe specifics
-
-
-
Honesty
Honesty-
Deception rate improvements
Deception rate improvements-
Reduced from 4.8% (o3) to 2.1% (GPT‑5 thinking)
Reduced from 4.8% (o3) to 2.1% (GPT‑5 thinking)
-
-
-
Sycophancy
Sycophancy-
Evaluation results
Evaluation results-
Reduced sycophantic replies from 14.5% to <6%
Reduced sycophantic replies from 14.5% to <6%
-
-
-
Biological risk
Biological risk-
Safeguards
Safeguards-
High capability classification in Biological/Chem domain
High capability classification in Biological/Chem domain -
5,000 hours of red-teaming with global agencies
5,000 hours of red-teaming with global agencies -
Multi-layered defense system and classifiers
Multi-layered defense system and classifiers
-
-
-
-
Usage
Usage-
ChatGPT integration
ChatGPT integration -
API integration
API integration
-
-
Related content
Related content-
[Introducing GPT‑5 for developers](https://openai.com/index/introducing-gpt-5-for-developers/)
-
API features
API features-
Supports
verbosityandreasoning_effortparametersSupports `verbosity` and `reasoning_effort` parameters -
Custom tools with plaintext/grammar constraints
Custom tools with plaintext/grammar constraints -
Parallel and built-in tool calling
Parallel and built-in tool calling
-
-
Benchmarks
Benchmarks-
SWE-bench Verified: 74.9%
SWE-bench Verified: 74.9% -
Aider Polyglot: 88%
Aider Polyglot: 88% -
τ2‑bench telecom: 96.7%
τ2‑bench telecom: 96.7%
-
-
Feedback
Feedback-
Cursor: "Remarkably intelligent, easy to steer"
Cursor: "Remarkably intelligent, easy to steer" -
Windsurf: Half tool calling error rate
Windsurf: Half tool calling error rate -
Vercel: Best frontend AI model
Vercel: Best frontend AI model
-
-
Pricing
Pricing-
GPT‑5: $1.25/1M input, $10/1M output
GPT‑5: $1.25/1M input, $10/1M output -
GPT‑5 mini: $0.25/$2
GPT‑5 mini: $0.25/$2 -
GPT‑5 nano: $0.05/$0.40
GPT‑5 nano: $0.05/$0.40
-
-
-
[GPT‑5 and the new era of work](https://openai.com/index/gpt-5-new-era-of-work/)
-
Industry adoption
Industry adoption-
Used by BNY Mellon, CSU, Figma, Intercom, Lowe’s, Morgan Stanley, SoftBank, T‑Mobile
Used by BNY Mellon, CSU, Figma, Intercom, Lowe’s, Morgan Stanley, SoftBank, T‑Mobile
-
-
Case studies
Case studies-
Amgen: Improved precision, reliability, and speed in scientific contexts
Amgen: Improved precision, reliability, and speed in scientific contexts
-
-
Benefits
Benefits-
Improved decision-making
Improved decision-making -
Enhanced collaboration
Enhanced collaboration -
Faster results in high-stakes business tasks
Faster results in high-stakes business tasks
-
-
Rollout timelines
Rollout timelines-
Team: Immediate
Team: Immediate -
Edu and Enterprise: Next week
Edu and Enterprise: Next week -
API: Immediate
API: Immediate
-
-
-
[From hard refusals to safe completions](https://openai.com/index/gpt-5-safe-completions/)
-
Dual-use scenarios
Dual-use scenarios-
Firework ignition instruction as example
Firework ignition instruction as example -
Balances safety with helpfulness
Balances safety with helpfulness
-
-
Comparison to refusal training
Comparison to refusal training-
Refusal = binary comply/deny
Refusal = binary comply/deny -
Safe completion = safe, helpful guidance
Safe completion = safe, helpful guidance
-
-
Safety architecture diagrams
Safety architecture diagrams-
Input analysis
Input analysis -
Behavior shaping
Behavior shaping -
Filtering
Filtering -
Oversight
Oversight
-
-
Results & metrics
Results & metrics-
Higher safety and helpfulness vs refusal-trained models
Higher safety and helpfulness vs refusal-trained models -
Reduced severity of unsafe mistakes
Reduced severity of unsafe mistakes
-
-
-
-
Overview
Overview-
Release date: August 7, 2025
**Release date**: August 7, 2025 -
https://openai.com/index/introducing-gpt-5/
-
Our smartest, fastest, most useful model yet with built-in thinking that puts expert-level intelligence in everyone's hands
Our smartest, fastest, most useful model yet with built-in thinking that puts expert-level intelligence in everyone's hands -
Goals
Goals-
Significant leap in intelligence over all previous models
Significant leap in intelligence over all previous models -
Unify fast and deep reasoning capabilities
Unify fast and deep reasoning capabilities -
Outperform across coding, math, writing, health, and visual perception
Outperform across coding, math, writing, health, and visual perception
-
-
Key innovations
Key innovations-
Unified system with smart, deep reasoning, and real-time router
Unified system with smart, deep reasoning, and real-time router -
Reduced hallucinations and minimized sycophancy
Reduced hallucinations and minimized sycophancy -
Enhanced writing, coding, and health responses
Enhanced writing, coding, and health responses
-
-
-
Capabilities
Capabilities-
Hallucination reduction
Hallucination reduction -
Instruction following improvement
Instruction following improvement -
Sycophancy minimization
Sycophancy minimization -
Writing
Writing-
Genres supported
Genres supported-
Poetry
Poetry -
Formal prose
Formal prose -
Reports and memos
Reports and memos -
Creative narratives
Creative narratives
-
-
Examples (GPT‑4o vs GPT‑5)
Examples (GPT‑4o vs GPT‑5)-
Kyoto widow poem—GPT‑5 delivers richer imagery and metaphor
Kyoto widow poem—GPT‑5 delivers richer imagery and metaphor -
GPT‑4o follows predictable rhyme, GPT‑5 shows cultural depth
GPT‑4o follows predictable rhyme, GPT‑5 shows cultural depth
-
-
-
Coding
Coding-
Benchmarks
Benchmarks-
SWE-bench Verified: 74.9%
SWE-bench Verified: 74.9% -
Aider Polyglot: 88%
Aider Polyglot: 88%
-
-
Single-prompt projects
Single-prompt projects-
Front-end web apps
Front-end web apps -
Games with aesthetic UI
Games with aesthetic UI -
Debugging large repos
Debugging large repos
-
-
-
Health
Health-
Benchmarks
Benchmarks-
HealthBench Hard: 46.2%
HealthBench Hard: 46.2%
-
-
Use cases
Use cases-
Informed health advocacy
Informed health advocacy -
Results explanation
Results explanation -
Context-aware safety guidance
Context-aware safety guidance
-
-
-
-
Efficiency
Efficiency-
Metrics
Metrics-
50-80% fewer output tokens vs OpenAI o3 for similar or better quality
50-80% fewer output tokens vs OpenAI o3 for similar or better quality -
Faster responses at same or higher complexity
Faster responses at same or higher complexity
-
-
-
GPT‑5 Pro
GPT‑5 Pro-
Domains with excellence
Domains with excellence-
Science
Science -
Mathematics
Mathematics -
Coding
Coding -
Health
Health
-
-
Example benchmark scores
Example benchmark scores-
GPQA: 88.4%
GPQA: 88.4% -
Major error reduction by 22% vs GPT‑5 thinking
Major error reduction by 22% vs GPT‑5 thinking
-
-
-
Availability
Availability-
Free: Limited GPT‑5 usage, then GPT‑5 mini fallback
Free: Limited GPT‑5 usage, then GPT‑5 mini fallback -
Plus: Higher usage limits, GPT‑5 default
Plus: Higher usage limits, GPT‑5 default -
Pro: Unlimited GPT‑5 and GPT‑5 Pro access
Pro: Unlimited GPT‑5 and GPT‑5 Pro access -
Team: Default GPT‑5 with generous usage
Team: Default GPT‑5 with generous usage -
Enterprise: Full reasoning features, rollout next week
Enterprise: Full reasoning features, rollout next week -
Edu: Full reasoning features, rollout next week
Edu: Full reasoning features, rollout next week
-
-