Introducing GPT‑5 (by OpenAI)

By twidi

- Architecture
  - Smart, efficient model
    - Typical use cases
      - Fast everyday question answering
      - Real-world tasks with limited complexity
      - Responsive interactions without extended reasoning
  - Deeper reasoning model
    - Benefits
      - Solves harder problems
      - Produces comprehensive, accurate answers
      - Handles complex, evolving tasks with better context awareness
  - Real-time router
    - Parameters considered
      - Conversation type
      - Complexity
      - Tool needs
      - Explicit user intent directives
    - Continuous improvement factors
      - User model-switch behavior
      - Preference ratings on responses
      - Measured correctness rates
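
The routing parameters listed under "Real-time router" can be pictured with a small rule-based sketch. This is only an illustration: the source describes the router as something that keeps improving from user signals, not as a fixed rule set, and the `Request` fields, the model labels, the `"analysis"` conversation type, and the `threshold` value are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Placeholder labels for the two models in the outline (not real model IDs).
FAST_MODEL = "smart-efficient"
DEEP_MODEL = "deeper-reasoning"


@dataclass
class Request:
    conversation_type: str      # hypothetical label, e.g. "chat" or "analysis"
    complexity: float           # hypothetical difficulty estimate in [0.0, 1.0]
    needs_tools: bool           # whether the task is expected to call tools
    wants_deep_reasoning: bool  # explicit user intent directive


def route(req: Request, threshold: float = 0.6) -> str:
    """Pick a model from the four parameters the outline lists (assumed rules)."""
    if req.wants_deep_reasoning:   # explicit user intent wins
        return DEEP_MODEL
    if req.needs_tools:            # assumption: tool use goes to the deeper model
        return DEEP_MODEL
    if req.complexity >= threshold:
        return DEEP_MODEL
    # Assumption: analysis-style conversations escalate at a lower complexity.
    if req.conversation_type == "analysis" and req.complexity >= threshold / 2:
        return DEEP_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    print(route(Request("chat", 0.1, False, False)))
    print(route(Request("chat", 0.1, False, True)))
```

In a learned router, the continuous improvement factors (model switches, preference ratings, measured correctness) would adjust these decisions over time; the fixed `threshold` here merely stands in for that.
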
- Evaluations
  - Math
    - AIME 2025 (no tools): 94.6%
  - Coding
    - SWE-bench Verified: 74.9%
    - Aider Polyglot: 88%
  - Multimodal
    - MMMU: 84.2%
  - Health
    - HealthBench Hard: 46.2%
  - Reasoning
    - GPQA: 88.4% (Pro mode)
- Robustness
  - Safe completions
    - Methodology
      - Output-centric safety focusing on safe and helpful responses
      - Combines helpfulness maximization with safety constraint penalties
    - Metrics
      - Higher safety and helpfulness across prompt intent types
    - Examples: safe vs. unsafe completions
      - Firework ignition request: GPT‑5 offers high-level safe guidance; o3 provided unsafe specifics
  - Honesty
    - Deception rate improvements
      - Reduced from 4.8% (o3) to 2.1% (GPT‑5 thinking)
  - Sycophancy
    - Evaluation results
      - Reduced sycophantic replies from 14.5% to <6%
  - Biological risk
    - Safeguards
      - Classified as High capability in the biological and chemical domain
      - 5,000 hours of red-teaming with global agencies
      - Multi-layered defense system and classifiers
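
The "helpfulness maximization with safety constraint penalties" idea under "Safe completions" can be made concrete with a toy scoring function. The outline gives no formula, so the linear form, the `penalty_weight`, and the candidate scores below are all assumptions made purely for illustration; they are not OpenAI's training objective.

```python
def reward(helpfulness: float, violation: float, penalty_weight: float = 5.0) -> float:
    """Toy objective: helpfulness minus a weighted safety-violation penalty."""
    return helpfulness - penalty_weight * violation


def best_response(candidates: dict, penalty_weight: float = 5.0) -> str:
    """Return the candidate name with the highest toy reward."""
    return max(candidates, key=lambda name: reward(*candidates[name], penalty_weight))


# Hypothetical (helpfulness, violation) scores for a dual-use prompt.
CANDIDATES = {
    "hard_refusal": (0.1, 0.0),      # safe but unhelpful
    "safe_completion": (0.7, 0.0),   # high-level guidance, no unsafe specifics
    "unsafe_specifics": (0.9, 0.5),  # most detailed, but violates the constraint
}

if __name__ == "__main__":
    print(best_response(CANDIDATES))
```

With the penalty in place, the safe completion scores above both the refusal and the unsafe answer; setting `penalty_weight=0` removes the constraint and the unsafe answer wins, which is the behavior the penalty is meant to prevent.
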
- Usage
  - ChatGPT integration
  - API integration
- Related content
  - [Introducing GPT‑5 for developers](https://openai.com/index/introducing-gpt-5-for-developers/)
    - API features
      - Supports `verbosity` and `reasoning_effort` parameters
      - Custom tools with plaintext/grammar constraints
      - Parallel and built-in tool calling
    - Benchmarks
      - SWE-bench Verified: 74.9%
      - Aider Polyglot: 88%
      - τ2‑bench telecom: 96.7%
    - Feedback
      - Cursor: "Remarkably intelligent, easy to steer"
      - Windsurf: Half the tool-calling error rate
      - Vercel: Best frontend AI model
    - Pricing
      - GPT‑5: $1.25/1M input, $10/1M output
      - GPT‑5 mini: $0.25/1M input, $2/1M output
      - GPT‑5 nano: $0.05/1M input, $0.40/1M output
  - [GPT‑5 and the new era of work](https://openai.com/index/gpt-5-new-era-of-work/)
    - Industry adoption
      - Used by BNY Mellon, CSU, Figma, Intercom, Lowe’s, Morgan Stanley, SoftBank, T‑Mobile
    - Case studies
      - Amgen: Improved precision, reliability, and speed in scientific contexts
    - Benefits
      - Improved decision-making
      - Enhanced collaboration
      - Faster results in high-stakes business tasks
    - Rollout timelines
      - Team: Immediate
      - Edu and Enterprise: Next week
      - API: Immediate
  - [From hard refusals to safe completions](https://openai.com/index/gpt-5-safe-completions/)
    - Dual-use scenarios
      - Example: firework ignition instructions
      - Balances safety with helpfulness
    - Comparison to refusal training
      - Refusal = binary comply/deny
      - Safe completion = safe, helpful guidance
    - Safety architecture diagrams
      - Input analysis
      - Behavior shaping
      - Filtering
      - Oversight
    - Results and metrics
      - Higher safety and helpfulness vs refusal-trained models
      - Reduced severity of unsafe mistakes
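
To make the `verbosity` and `reasoning_effort` parameters and the per-token pricing from the developer post more tangible, here is a minimal sketch that builds a request payload and estimates its cost without calling any API. The helper names `build_request` and `estimate_cost` are hypothetical, and the payload layout, the model identifiers, and the allowed parameter values are assumptions that should be checked against the official API reference; only the prices come from the outline above.

```python
# Assumed allowed values; verify against the API reference.
VERBOSITY = {"low", "medium", "high"}
REASONING_EFFORT = {"minimal", "low", "medium", "high"}

# USD per 1M tokens as (input, output), taken from the pricing list above.
# The model identifier strings are assumptions.
PRICES_PER_1M = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}


def build_request(prompt: str, model: str = "gpt-5",
                  verbosity: str = "medium",
                  reasoning_effort: str = "medium") -> dict:
    """Build a request payload (assumed layout); nothing is sent over the network."""
    if verbosity not in VERBOSITY:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    if reasoning_effort not in REASONING_EFFORT:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    return {
        "model": model,
        "input": prompt,
        "text": {"verbosity": verbosity},
        "reasoning": {"effort": reasoning_effort},
    }


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call from the listed per-1M-token prices."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    print(build_request("Summarize this changelog.", verbosity="low",
                        reasoning_effort="minimal"))
    print(estimate_cost("gpt-5", 200_000, 50_000))
```

For example, 200,000 input tokens and 50,000 output tokens on GPT‑5 come to $0.25 + $0.50 = $0.75 at the listed prices.
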
- Overview
  - **Release date**: August 7, 2025
  - Source: https://openai.com/index/introducing-gpt-5/
  - OpenAI's description: "Our smartest, fastest, most useful model yet with built-in thinking that puts expert-level intelligence in everyone's hands"
  - Goals
    - Significant leap in intelligence over all previous models
    - Unify fast and deep reasoning capabilities
    - Outperform across coding, math, writing, health, and visual perception
  - Key innovations
    - Unified system with a smart, efficient model, a deeper reasoning model, and a real-time router
    - Reduced hallucinations and minimized sycophancy
    - Enhanced writing, coding, and health responses
- Capabilities
  - Hallucination reduction
  - Instruction following improvement
  - Sycophancy minimization
  - Writing
    - Genres supported
      - Poetry
      - Formal prose
      - Reports and memos
      - Creative narratives
    - Examples (GPT‑4o vs GPT‑5)
      - Kyoto widow poem: GPT‑5 delivers richer imagery and metaphor
      - GPT‑4o follows a predictable rhyme scheme; GPT‑5 shows cultural depth
  - Coding
    - Benchmarks
      - SWE-bench Verified: 74.9%
      - Aider Polyglot: 88%
    - Single-prompt projects
      - Front-end web apps
      - Games with aesthetic UI
      - Debugging large repos
  - Health
    - Benchmarks
      - HealthBench Hard: 46.2%
    - Use cases
      - Informed health advocacy
      - Results explanation
      - Context-aware safety guidance
- Efficiency
  - Metrics
    - 50-80% fewer output tokens vs OpenAI o3 for similar or better quality
    - Faster responses at same or higher complexity
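
As a quick worked example of the "50-80% fewer output tokens" figure, the sketch below converts a baseline output length into the implied token range. The function name and the 10,000-token baseline are hypothetical; only the 50% and 80% bounds come from the outline.

```python
def reduced_token_range(baseline_tokens: int,
                        low_pct: int = 50,
                        high_pct: int = 80) -> tuple[int, int]:
    """Return (fewest, most) output tokens after a low_pct to high_pct reduction."""
    fewest = baseline_tokens * (100 - high_pct) // 100  # largest reduction
    most = baseline_tokens * (100 - low_pct) // 100     # smallest reduction
    return fewest, most


if __name__ == "__main__":
    # Hypothetical baseline: an o3 answer that used 10,000 output tokens.
    print(reduced_token_range(10_000))  # (2000, 5000)
```

So an answer that took 10,000 output tokens on o3 would take roughly 2,000 to 5,000 tokens under the stated reduction.
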
- GPT‑5 Pro
  - Domains with excellence
    - Science
    - Mathematics
    - Coding
    - Health
  - Example benchmark scores
    - GPQA: 88.4%
    - 22% fewer major errors than GPT‑5 thinking
- Availability
  - Free: Limited GPT‑5 usage, then GPT‑5 mini fallback
  - Plus: Higher usage limits, GPT‑5 default
  - Pro: Unlimited GPT‑5 and GPT‑5 Pro access
  - Team: Default GPT‑5 with generous usage
  - Enterprise: Full reasoning features, rollout next week
  - Edu: Full reasoning features, rollout next week
