LLM Artificial Analysis Index >08.2025<

👉 https://artificialanalysis.ai

🏢 OpenAI (USA)

  • GPT‑4o: OpenAI’s flagship multimodal model, reportedly ~1.8 trillion parameters (unconfirmed estimate), with a ~128K token context window. High performance, fast inference, and multimedia support (explodingtopics.com, codingscape.com).
  • GPT‑4.1: A follow-up to GPT‑4 (released April 2025) with a 1 million token context window, optimized for reasoning, vision, and coding workloads.
  • GPT‑5 (in development): Expected release imminently (as of August 6, 2025), promising enhanced reasoning and “test‑time compute” capabilities.
  • GPT‑OSS‑120B & GPT‑OSS‑20B: Open‑weight models released August 5, 2025. The 20B model runs on consumer laptops (~16 GB VRAM); the 120B model fits on a single high‑end GPU (~80 GB). Both perform comparably to OpenAI’s proprietary o3‑mini and o4‑mini models; a local‑inference sketch follows this list.
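
For anyone who wants to try the smaller release locally, here is a minimal sketch using the Hugging Face transformers pipeline. The repo id "openai/gpt-oss-20b" and the hardware notes are assumptions based on the release description above, not verified specifics, and a recent transformers/accelerate install is assumed.

```python
# Minimal local-inference sketch for the smaller open-weight release.
# Assumptions: the checkpoint is published as "openai/gpt-oss-20b" on Hugging Face,
# and transformers + accelerate are installed and recent enough to load it.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed repo id
    device_map="auto",           # spread weights across available GPU/CPU memory
    torch_dtype="auto",          # keep the checkpoint's native dtype
)

messages = [{"role": "user", "content": "In one sentence, what is an open-weight model?"}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```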

🌐 Google DeepMind / Google

  • Gemini 2.5 Pro: Released June 2025 with a ~1 million token context window, full multimodal support (text, image, video, audio), and top-tier benchmark performance.
  • Gemini 2.5 Flash / Flash‑Lite: Cost- and latency-optimized variants with the same massive context window and multimodal capabilities.
  • Gemma 3 / Gemma 3n (open): Google’s open models (1B to 27B parameters), multilingual and multimodal, designed for on-device deployment; a quantized loading sketch follows this list.
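
To illustrate the on-device angle, the sketch below loads a small Gemma 3 checkpoint with 4-bit quantization to cut the memory footprint. The repo id "google/gemma-3-1b-it" and the quantization settings are illustrative assumptions; bitsandbytes and a CUDA-capable GPU are assumed.

```python
# Hedged sketch: 4-bit quantized loading of a small open Gemma 3 checkpoint,
# the kind of setup used for constrained / on-device-style deployments.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-3-1b-it"  # assumed Hugging Face repo id (gated; license acceptance required)
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,  # 4-bit weights via bitsandbytes
    device_map="auto",
)

inputs = tokenizer("Translate 'good morning' into German:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```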

🤖 Anthropic

  • Claude 3.7 Sonnet (released February 2025) and Claude Opus 4 (released May 2025): Proprietary, multimodal, 200K token context window, focused on safety, conversational chat, and reasoning tasks.

🏭 DeepSeek‑AI (China)

  • DeepSeek‑V3 and DeepSeek‑R1 (R1‑0528): Open‑source (open‑weight), ~671B total parameters with ~37B active per token, 128K token context window. Excellent on coding, math, and reasoning benchmarks; V3 released December 2024, R1 in May 2025. An API sketch follows below.
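
DeepSeek’s weights can be self-hosted, but the hosted endpoint is also reachable through an OpenAI-compatible client. The base URL and model aliases below ("deepseek-chat" for V3, "deepseek-reasoner" for R1) are assumptions drawn from DeepSeek’s public API documentation and are shown only as a sketch.

```python
# Sketch: calling DeepSeek through an OpenAI-compatible client.
# Assumptions: base_url and model names as documented by DeepSeek; replace the key placeholder.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder, not a real key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed alias for the R1 reasoning model
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
)
print(resp.choices[0].message.content)
```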

🇦🇪 Technology Innovation Institute (UAE)

  • Falcon 180B: Open‑weight 180B parameter model released under TII’s Falcon‑180B licence (Apache‑2.0‑based), strong multilingual and coding benchmarks, widely used in open deployments.

☁️ Alibaba Cloud

  • Qwen 3‑235B: Released April–July 2025, open‑weight (Apache‑2.0), a MoE model with ~22B active parameters (the broader Qwen 3 family also includes dense variants), strong performance on Chinese/English and reasoning tasks.
  • Other Qwen models include Qwen 2.5 (dense and multimodal), up to ~72B parameters.

🔬 Mistral AI (France)

  • Mistral Large 2: Open‑weight ~123B parameter model with a 128K token context window, known for coding prowess and efficiency. Released July 2024; a research license is available.
  • Pixtral Large: Multimodal variant (124B + image encoder), research license.

🧪 Other Notable Models

  • Nemotron‑4 (NVIDIA): ~340B parameters, open‑weight under NVIDIA license; enterprise focus, less widely adopted than Falcon or LLaMA.
  • Gemma 3 family (Google open): Highly popular small‑to‑medium models with multilingual and multimodal performance, deployed widely via Hugging Face.
  • MiniMax‑Text‑01 (MiniMax): ~456B‑parameter open‑source model released in early 2025.

✅ Summary Table

| Company / Model | Proprietary or Open‑Weight/Source | Approx. Params | Context Window | Highlights |
| --- | --- | --- | --- | --- |
| OpenAI – GPT‑4o | Proprietary | ~1.8T (est.) | ~128K | Multimodal general-purpose flagship |
| OpenAI – GPT‑4.1 | Proprietary | Not disclosed | ~1M tokens | Reasoning and vision across domains |
| OpenAI – GPT‑OSS‑120B/20B | Open‑Weight (Apache 2.0) | 120B / 20B | ~128K | First open‑weight models from OpenAI since 2019 |
| Google – Gemini 2.5 Pro | Proprietary | Not disclosed | ~1M tokens | Multimodal, extreme context, reasoning leader |
| Google – Gemma 3 / 3n | Open‑Source | 1–27B | ~128K | Lightweight, multilingual, on-device capable |
| Anthropic – Claude Opus 4 / 3.7 Sonnet | Proprietary | Not disclosed | ~200K | Safety-first, complex reasoning and chat-oriented |
| DeepSeek – V3 / R1 | Open‑Weight / Open Source | 671B (37B active) | ~128K | Strong benchmarks, efficient sparse architecture |
| Alibaba – Qwen 3‑235B | Open‑Weight (Apache 2.0) | 235B / ~22B active | ~262K | Multilingual, large context, Chinese/English |
| TII – Falcon 180B | Open‑Weight (TII licence, Apache‑2.0‑based) | 180B | ~2K | Efficient, widely adopted open model |
| Mistral – Large 2 / Pixtral | Open‑Weight / Research License | 123–124B | ~128K | Coding optimization, multimodal via Pixtral |
| NVIDIA – Nemotron‑4 | Open‑Weight / NVIDIA license | ~340B | ~8K | Enterprise-grade open model |
| MiniMax – MiniMax‑Text‑01 | Open‑Source | 456B | ~4M tokens | Large open-text model with unique context length |
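
To make the comparison easier to manipulate, the snippet below encodes a few rows of the table as plain Python data and ranks them by context window. The figures are the approximate values quoted in this post, not authoritative specifications.

```python
# Rank a handful of models from the summary table by quoted context window.
models = [
    {"name": "OpenAI GPT-4o",           "open": False, "context_tokens": 128_000},
    {"name": "OpenAI GPT-4.1",          "open": False, "context_tokens": 1_000_000},
    {"name": "Google Gemini 2.5 Pro",   "open": False, "context_tokens": 1_000_000},
    {"name": "Anthropic Claude Opus 4", "open": False, "context_tokens": 200_000},
    {"name": "DeepSeek V3 / R1",        "open": True,  "context_tokens": 128_000},
    {"name": "Alibaba Qwen 3-235B",     "open": True,  "context_tokens": 262_000},
    {"name": "MiniMax-Text-01",         "open": True,  "context_tokens": 4_000_000},
]

for m in sorted(models, key=lambda m: m["context_tokens"], reverse=True):
    kind = "open-weight" if m["open"] else "proprietary"
    print(f"{m['name']:26s} {kind:12s} {m['context_tokens']:>9,} tokens")
```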


🌟 Takeaways

  • Most powerful proprietary: OpenAI GPT‑4o, GPT‑4.1, Gemini 2.5 Pro, Claude Opus 4.
  • Most significant open source / open‑weight: DeepSeek V3/R1, Qwen 3‑235B, Mistral Large 2, Falcon 180B, Nemotron‑4, and Gemma 3.
  • Highest context windows: MiniMax‑Text‑01 (~4M tokens), followed by GPT‑4.1 and Gemini 2.5 Pro (~1M tokens).
  • Accessibility: the GPT‑OSS models and the open‑weight releases from DeepSeek, Qwen, and Mistral (including Pixtral) can be downloaded in full and fine‑tuned; see the LoRA sketch below.
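
As a concrete illustration of that fine-tuning capability, here is a minimal LoRA sketch using Hugging Face peft. The base checkpoint ("Qwen/Qwen2.5-0.5B-Instruct") and the hyperparameters are placeholders chosen for illustration, not recommendations from this index.

```python
# Minimal LoRA setup on a small open-weight checkpoint; only the adapter weights train.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed small open-weight checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention-projection targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of the base parameters is trainable

# From here, the wrapped model drops into a standard transformers Trainer / TRL SFTTrainer loop.
```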

BEST OPEN SOURCE

| Model | Company | ~Params (Active) | License | Why It Stands Out |
| --- | --- | --- | --- | --- |
| DeepSeek‑V3 / R1 | DeepSeek‑AI | 671B (37B active) | MIT (fully open) | Top benchmark scores; ultra-efficient; widely replicable (GitHub, Financial Times) |
| LLaMA 4 | Meta AI | ~400B total / 17B active (MoE) | Meta License | Broad community use, flexible deployment (Instaclustr, Wikipedia) |
| Falcon 180B | TII (UAE) | 180B (dense) | TII licence (Apache‑2.0‑based) | Strong multilingual capabilities, open access (Bestarion, Reddit) |
| Qwen 3‑235B | Alibaba Cloud | 235B total / ~22B active (MoE) | Apache‑2.0 | Competitive in multi-language tasks; commercial-grade quality (Exploding Topics, Bestarion) |
| DBRX | Databricks / Mosaic ML | ~132B total / 36B active (MoE) | Databricks Open Model License | Excellent reasoning/math performance benchmarks (Wikipedia, Exploding Topics) |
| Nemotron‑4 | NVIDIA | ~340B | NVIDIA license | Open release by major tech company; still niche adoption (Wikipedia, Bestarion) |

BIGGEST

| Rank | Model | Params (approx.) | License |
| --- | --- | --- | --- |
| 1 | GPT‑4o | ~1.8 T (est.) | Proprietary (API) |
| 2 | PanGu‑Σ | ~1.085 T | Proprietary |
| 3 | DeepSeek‑V3 / R1 | 671 B (37 B active) | Open source |
| 4 | Llama 4 | ~400 B total (MoE) | Open source |
| 5 | GLM‑4.5 | ~355 B | Open (MIT) |
| 6 | Grok 3 | ~314 B | Proprietary |
| 7 | Qwen 3 | 235 B | Open source |
| 8 | Mistral Large 2 | ~123 B | Open / research |
| 9 | gpt‑oss‑120B | 117 B | Open (Apache 2.0) |
| … | Others | Various larger / smaller | Mixed |