ai/glm-5-safetensors

Verified Publisher

By Docker

Updated 2 months ago

744B MoE language model with 40B active params for reasoning, coding, and agentic tasks (FP8)


ai/glm-5-safetensors repository overview

GLM-5

GLM-5 is a large-scale Mixture-of-Experts (MoE) language model designed for complex systems engineering and long-horizon agentic tasks. Developed by Z.ai, this model represents a significant advancement in scaling and efficiency, featuring 744B total parameters with 40B active parameters during inference. The model integrates DeepSeek Sparse Attention (DSA) to reduce deployment costs while maintaining exceptional long-context capabilities.

GLM-5 was trained on 28.5 trillion tokens and leverages an innovative asynchronous reinforcement learning infrastructure called slime to bridge the gap between competence and excellence in pre-trained models. The model delivers state-of-the-art performance among open-source models on reasoning, coding, and agentic tasks, achieving results competitive with leading frontier models across a wide range of academic benchmarks.

This FP8-quantized version provides an optimized deployment option, maintaining model quality while significantly reducing memory requirements and computational costs for practical applications.
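As a rough back-of-envelope check on those savings (an illustrative estimate, not an official figure), weight storage scales with bytes per parameter:

```python
# Rough weight-memory estimate for a 744B-parameter model (illustrative
# only; real deployments also need KV cache, activations, and runtime
# overhead on top of the weights).
TOTAL_PARAMS = 744e9  # 744B total parameters

def weight_gib(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return params * bytes_per_param / 2**30

bf16 = weight_gib(TOTAL_PARAMS, 2.0)  # BF16: 2 bytes per parameter
fp8 = weight_gib(TOTAL_PARAMS, 1.0)   # FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16:,.0f} GiB")
print(f"FP8 weights:  ~{fp8:,.0f} GiB")
```

The FP8 figure is in the same ballpark as the artifact size listed in the tag summary below; the exact size differs because some tensors are typically kept at higher precision.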


Characteristics

| Attribute | Value |
|---|---|
| Provider | Z.ai |
| Architecture | GlmMoeDsaForCausalLM (MoE with DeepSeek Sparse Attention) |
| Total Parameters | 744B (40B active) |
| Training Data | 28.5T tokens |
| Languages | English, Chinese |
| Input modalities | Text |
| Output modalities | Text |
| Context Length | 128K tokens (up to 202K with tools) |
| License | MIT |
| Quantization | FP8 |

Using this model with Docker Model Runner

docker model run ai/glm-5-safetensors

For more information, check out the Docker Model Runner docs.
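Docker Model Runner also exposes an OpenAI-compatible HTTP API, so the model can be called programmatically. The sketch below builds (but does not send) a chat-completions request; the base URL assumes Model Runner's default host TCP port of 12434, which is an assumption about your setup and may differ, so check the docs linked above.

```python
import json
import urllib.request

# Assumed default for Docker Model Runner's host-side OpenAI-compatible
# endpoint; verify the port and path against your own configuration.
BASE_URL = "http://localhost:12434/engines/v1"

def chat_request(prompt: str, model: str = "ai/glm-5-safetensors") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Explain Mixture-of-Experts routing in two sentences.")
# Against a running Model Runner, the request would be sent with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```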

Benchmarks

GLM-5 demonstrates exceptional performance across reasoning, coding, and agentic tasks, achieving best-in-class results among open-source models:

Reasoning & Mathematical Tasks
| Benchmark | GLM-5 | GLM-4.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (xhigh) |
|---|---|---|---|---|---|---|---|
| HLE | 30.5 | 24.8 | 25.1 | 31.5 | 28.4 | 37.2 | 35.4 |
| HLE (w/ Tools) | 50.4 | 42.8 | 40.8 | 51.8 | 43.4 | 45.8 | 45.5 |
| AIME 2026 I | 92.7 | 92.9 | 92.7 | 92.5 | 93.3 | 90.6 | - |
| HMMT Nov. 2025 | 96.9 | 93.5 | 90.2 | 91.1 | 91.7 | 93.0 | 97.1 |
| IMOAnswerBench | 82.5 | 82.0 | 78.3 | 81.8 | 78.5 | 83.3 | 86.3 |
| GPQA-Diamond | 86.0 | 85.7 | 82.4 | 87.6 | 87.0 | 91.9 | 92.4 |
Coding Benchmarks
| Benchmark | GLM-5 | GLM-4.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (xhigh) |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 77.8 | 73.8 | 73.1 | 76.8 | 80.9 | 76.2 | 80.0 |
| SWE-bench Multilingual | 73.3 | 66.7 | 70.2 | 73.0 | 77.5 | 65.0 | 72.0 |
Agentic Tasks
| Benchmark | GLM-5 | GLM-4.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (xhigh) |
|---|---|---|---|---|---|---|---|
| Terminal-Bench 2.0 (Terminus 2) | 56.2 / 60.7 | 41.0 | 39.3 | 50.8 | 59.3 | 54.2 | 54.0 |
| Terminal-Bench 2.0 (Claude Code) | 56.2 / 61.1 | 32.8 | 46.4 | - | 57.9 | - | - |
| CyberGym | 43.2 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
| BrowseComp | 62.0 | 52.0 | 51.4 | 60.6 | 37.0 | 37.8 | - |
| BrowseComp (w/ Context Manage) | 75.9 | 67.5 | 67.6 | 74.9 | 67.8 | 59.2 | 65.8 |
| BrowseComp-Zh | 72.7 | 66.6 | 65.0 | 62.3 | 62.4 | 66.8 | 76.1 |
| τ²-Bench | 89.7 | 87.4 | 85.3 | 80.2 | 91.6 | 90.7 | 85.5 |
| MCP-Atlas (Public Set) | 67.8 | 52.0 | 62.2 | 63.8 | 65.2 | 66.6 | 68.0 |
| Tool-Decathlon | 38.0 | 23.8 | 35.2 | 27.8 | 43.5 | 36.4 | 46.3 |
| Vending Bench 2 | $4,432.12 | $2,376.82 | $1,034.00 | $1,198.46 | $4,967.06 | $5,478.16 | $3,591.33 |

Considerations

  • Resource Requirements: While FP8 quantization significantly reduces memory footprint, the model still requires substantial computational resources (8 GPUs recommended for inference with tensor parallelism)
  • Language Focus: The model is optimized primarily for English and Chinese; performance on other languages may be limited
  • Long-Context Optimization: Best performance is achieved with tasks that can leverage the model's extensive context window capabilities
  • Agentic Use Cases: The model is specifically designed for complex, long-horizon agentic tasks and may be over-engineered for simple text generation use cases
  • Tool Integration: For optimal performance on complex tasks, consider using the model with tool-calling capabilities enabled
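To make the tool-integration point concrete, the sketch below shows what an OpenAI-style function-calling request body could look like when sent to Model Runner's OpenAI-compatible API. The `get_weather` tool and its schema are purely hypothetical illustrations, not built-in capabilities.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format;
# `get_weather` is an illustrative name, not a built-in tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "ai/glm-5-safetensors",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```

When the model decides to use a tool, the response carries a `tool_calls` entry whose arguments the client executes before feeding the result back as a `tool`-role message.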
Generated by

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.

Tag summary

| Field | Value |
|---|---|
| Content type | Model |
| Digest | sha256:f606751ff |
| Size | 704.3 GB |
| Last updated | 2 months ago |

docker model pull ai/glm-5-safetensors

Pulls last week: 292