ai/qwen3.6-safetensors

Verified Publisher

By Docker

Updated 17 days ago

Multimodal LLM with 35B parameters for coding, agentic tasks, and vision-language understanding


ai/qwen3.6-safetensors repository overview

Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is a multimodal large language model developed by Qwen (Alibaba Cloud) that combines vision and language understanding with advanced reasoning capabilities. Built on direct feedback from the community, this model prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.

Following the February 2025 release of the Qwen3.5 series, Qwen3.6 is the first open-weight variant with substantial upgrades in agentic coding and thinking preservation. The model handles frontend workflows and repository-level reasoning with greater fluency and precision. A key addition is an option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.

With 35 billion total parameters and 3 billion activated parameters through its Mixture of Experts architecture, Qwen3.6-35B-A3B delivers state-of-the-art performance across coding benchmarks, agent tasks, multimodal understanding, and general reasoning while maintaining efficient inference characteristics.


Characteristics

| Attribute | Value |
|---|---|
| Provider | Qwen (Alibaba Cloud) |
| Architecture | Qwen3_5MoeForConditionalGeneration (Mixture of Experts) |
| Languages | English, Chinese, and multilingual |
| Input modalities | Text, Image, Video |
| Output modalities | Text |
| License | Apache 2.0 |
| Context Length | 262,144 tokens natively, extensible to 1,010,000 tokens |
| Parameters | 35B total, 3B activated |

Using this model with Docker Model Runner

docker model run ai/qwen3.6-safetensors

For more information, check out the Docker Model Runner docs.
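Once the model is running, Docker Model Runner exposes an OpenAI-compatible HTTP API. The sketch below assumes the default local endpoint `http://localhost:12434/engines/v1` and standard chat-completions semantics — verify both against your local configuration before use.

```python
import json
import urllib.request

# Assumed default endpoint for Docker Model Runner's OpenAI-compatible API;
# check the host/port against your local setup.
BASE_URL = "http://localhost:12434/engines/v1"

def build_chat_request(prompt: str, model: str = "ai/qwen3.6-safetensors") -> dict:
    """Build an OpenAI-style chat-completions payload for this model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,   # settings this card suggests for agent tasks
        "top_p": 0.95,
    }

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires the model to be running locally):
# print(chat("Summarize this repository's build steps."))
```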

Benchmarks

Benchmark Overview

Coding Agent Benchmarks
| Benchmark | Qwen3.5-27B | Gemma4-31B | Qwen3.5-35B-A3B | Gemma4-26B-A4B | Qwen3.6-35B-A3B |
|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 52.0 | 70.0 | 17.4 | 73.4 |
| SWE-bench Multilingual | 69.3 | 51.7 | 60.3 | 17.3 | 67.2 |
| SWE-bench Pro | 51.2 | 35.7 | 44.6 | 13.8 | 49.5 |
| Terminal-Bench 2.0 | 41.6 | 42.9 | 40.5 | 34.2 | 51.5 |
| Claw-Eval (Avg) | 64.3 | 48.5 | 65.4 | 58.8 | 68.7 |
| Claw-Eval (Pass^3) | 46.2 | 25.0 | 51.0 | 28.0 | 50.0 |
| SkillsBench (Avg5) | 27.2 | 23.6 | 4.4 | 12.3 | 28.7 |
| QwenClawBench | 52.2 | 41.7 | 47.7 | 38.7 | 52.6 |
| NL2Repo | 27.3 | 15.5 | 20.5 | 11.6 | 29.4 |
| QwenWebBench | 1068 | 1197 | 978 | 1178 | 1397 |

General Agent Benchmarks
| Benchmark | Qwen3.5-27B | Gemma4-31B | Qwen3.5-35B-A3B | Gemma4-26B-A4B | Qwen3.6-35B-A3B |
|---|---|---|---|---|---|
| TAU3-Bench | 68.4 | 67.5 | 68.9 | 59.0 | 67.2 |
| VITA-Bench | 41.8 | 43.0 | 29.1 | 36.9 | 35.6 |
| DeepPlanning | 22.6 | 24.0 | 22.8 | 16.2 | 25.9 |
| Tool Decathlon | 31.5 | 21.2 | 28.7 | 12.0 | 26.9 |
| MCPMark | 36.3 | 18.1 | 27.0 | 14.2 | 37.0 |
| MCP-Atlas | 68.4 | 57.2 | 62.4 | 50.0 | 62.8 |
| WideSearch | 66.4 | 35.2 | 59.1 | 38.3 | 60.1 |

Knowledge Benchmarks
| Benchmark | Qwen3.5-27B | Gemma4-31B | Qwen3.5-35B-A3B | Gemma4-26B-A4B | Qwen3.6-35B-A3B |
|---|---|---|---|---|---|
| MMLU-Pro | 86.1 | 85.2 | 85.3 | 82.6 | 85.2 |
| MMLU-Redux | 93.2 | 93.7 | 93.3 | 92.7 | 93.3 |
| SuperGPQA | 65.6 | 65.7 | 63.4 | 61.4 | 64.7 |
| C-Eval | 90.5 | 82.6 | 90.2 | 82.5 | 90.0 |

STEM & Reasoning Benchmarks
| Benchmark | Qwen3.5-27B | Gemma4-31B | Qwen3.5-35B-A3B | Gemma4-26B-A4B | Qwen3.6-35B-A3B |
|---|---|---|---|---|---|
| GPQA | 85.5 | 84.3 | 84.2 | 82.3 | 86.0 |
| HLE | 24.3 | 19.5 | 22.4 | 8.7 | 21.4 |
| LiveCodeBench v6 | 80.7 | 80.0 | 74.6 | 77.1 | 80.4 |
| HMMT Feb 25 | 92.0 | 88.7 | 89.0 | 91.7 | 90.7 |
| HMMT Nov 25 | 89.8 | 87.5 | 89.2 | 87.5 | 89.1 |
| HMMT Feb 26 | 84.3 | 77.2 | 78.7 | 79.0 | 83.6 |
| IMOAnswerBench | 79.9 | 74.5 | 76.8 | 74.3 | 78.9 |
| AIME 26 | 92.6 | 89.2 | 91.0 | 88.3 | 92.7 |

Vision-Language Benchmarks
| Benchmark | Qwen3.5-27B | Claude-Sonnet-4.5 | Gemma4-31B | Gemma4-26B-A4B | Qwen3.5-35B-A3B | Qwen3.6-35B-A3B |
|---|---|---|---|---|---|---|
| MMMU | 82.3 | 79.6 | 80.4 | 78.4 | 81.4 | 81.7 |
| MMMU-Pro | 75.0 | 68.4 | 76.9 | 73.8 | 75.1 | 75.3 |
| MathVista (mini) | 87.8 | 79.8 | 79.3 | 79.4 | 86.2 | 86.4 |
| ZEROBench_sub | 36.2 | 26.3 | 26.0 | 26.3 | 34.1 | 34.4 |
| RealWorldQA | 83.7 | 70.3 | 79.3 | 75.6 | 81.9 | 82.1 |


Model Architecture

Qwen3.6-35B-A3B features an advanced Mixture of Experts (MoE) architecture:

  • Total Parameters: 35B with 3B activated per token
  • Hidden Dimension: 2048
  • Number of Layers: 40
  • Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
  • Gated DeltaNet:
    • Number of Linear Attention Heads: 32 for V and 16 for QK
    • Head Dimension: 128
  • Gated Attention:
    • Number of Attention Heads: 16 for Q and 2 for KV
    • Head Dimension: 256
    • Rotary Position Embedding Dimension: 64
  • Mixture of Experts:
    • Number of Experts: 256
    • Number of Activated Experts: 8 Routed + 1 Shared
    • Expert Intermediate Dimension: 512
  • Token Embedding: 248,320 (Padded)
  • Context Length: 262,144 tokens natively, extensible to 1,010,000 tokens
  • Multi-Token Prediction (MTP): Trained with multi-step prediction
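As a rough sanity check, the activated parameter count can be approximated from the dimensions listed above. The sketch below ignores norms, router weights, biases, and gating projections, and assumes an untied output head, so treat the total as order-of-magnitude only.

```python
# Back-of-envelope activated-parameter estimate from the listed dimensions.
# Ignores norms, router weights, biases, and gating projections (assumption).

hidden = 2048
layers = 40
deltanet_layers = 30     # 10 blocks x 3 Gated DeltaNet layers
attention_layers = 10    # 10 blocks x 1 Gated Attention layer
vocab = 248_320

# MoE: 8 routed + 1 shared expert active, each with gate/up/down matrices.
active_experts = 9
expert_inter = 512
moe = layers * active_experts * 3 * hidden * expert_inter

# Gated DeltaNet: 16 QK heads and 32 V heads, head dim 128 (in/out projections).
qk_dim, v_dim = 16 * 128, 32 * 128
deltanet = deltanet_layers * (2 * hidden * qk_dim + 2 * hidden * v_dim)

# Gated Attention: 16 Q heads, 2 KV heads, head dim 256 (Q, K, V, O projections).
q_dim, kv_dim = 16 * 256, 2 * 256
attn = attention_layers * (hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden)

# Input embedding plus an assumed untied output head.
embed = 2 * vocab * hidden

total_active = moe + deltanet + attn + embed
print(f"~{total_active / 1e9:.1f}B activated parameters")
```

The result lands near the advertised 3B activated parameters, with the active MoE experts contributing roughly 1.1B and the dense (always-active) DeltaNet, attention, and embedding weights making up the rest.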

Key Features

Agentic Coding

Qwen3.6 excels at handling frontend workflows and repository-level reasoning with greater fluency and precision. The model demonstrates state-of-the-art performance on coding agent benchmarks including SWE-bench, Terminal-Bench, and various frontend development tasks.

Thinking Preservation

A new option to retain reasoning context from historical messages enables more coherent multi-turn interactions, streamlining iterative development and reducing computational overhead.
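One way to picture thinking preservation is a message history that keeps the assistant's reasoning alongside its answers instead of stripping it between turns. The sketch below assumes an OpenAI-style message format with a separate `reasoning_content` field on assistant turns; that field name is an assumption for illustration, not confirmed by this card.

```python
# Hypothetical multi-turn history that retains prior reasoning.
# The "reasoning_content" field name is an assumption, not a confirmed API.

def append_turn(history: list, user_msg: str, answer: str, reasoning: str) -> list:
    """Append one exchange, keeping the assistant's reasoning in history."""
    history.append({"role": "user", "content": user_msg})
    history.append({
        "role": "assistant",
        "content": answer,
        "reasoning_content": reasoning,  # retained rather than discarded
    })
    return history

history = []
append_turn(history, "Refactor utils.py", "Done, see the diff.",
            "The module has two duplicated helpers; merge them first.")
# The next turn can build on that reasoning instead of re-deriving it,
# which is the overhead reduction the card describes.
append_turn(history, "Now add tests", "Added test_utils.py.",
            "Tests should cover the merged helper's edge cases.")
```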

Multimodal Understanding

Native support for image and video inputs alongside text, enabling comprehensive visual understanding tasks including document processing, chart analysis, and visual question answering.

Extended Context

With native support for 262K tokens and extensibility to over 1 million tokens, the model can process entire codebases, long documents, and complex multi-turn conversations.
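The hybrid layout also keeps long-context memory modest: only the 10 Gated Attention layers accumulate a growing key/value cache, while the Gated DeltaNet layers carry a fixed-size recurrent state. A rough estimate at the native context length, assuming fp16 (2-byte) cache entries:

```python
# Rough KV-cache size at the native 262,144-token context, assuming fp16
# cache entries (an assumption; actual serving precision may differ).
# Only the 10 Gated Attention layers cache K/V; DeltaNet state is constant.

attention_layers = 10
kv_heads = 2
head_dim = 256
context = 262_144
bytes_per_value = 2  # fp16

# K and V each: layers x kv_heads x head_dim x context x bytes.
kv_cache = attention_layers * kv_heads * head_dim * 2 * context * bytes_per_value
print(f"{kv_cache / 2**30:.1f} GiB")  # ~5.0 GiB for a full native context
```

A full native-context cache of around 5 GiB is small for a 262K window; a dense model with attention in all 40 layers would need roughly four times as much.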

Considerations

  • The model is optimized for coding and agent tasks, particularly excelling in repository-level operations and frontend development
  • Multimodal capabilities enable processing of images and videos alongside text inputs
  • Best performance is achieved with appropriate temperature settings (temp=1.0, top_p=0.95 for agent tasks; temp=0.6 for deterministic coding tasks)
  • The Mixture of Experts architecture provides efficient inference by activating only 3B of the 35B parameters per token
  • Extended context support (up to 1M tokens) requires appropriate memory allocation and may impact inference speed
  • Tool calling and function execution capabilities are built-in, making it well-suited for agent applications
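The sampling guidance above can be expressed as request parameters. A minimal sketch, assuming an OpenAI-compatible serving API (the parameter names follow that convention):

```python
# Suggested sampling settings from the considerations above, expressed as
# OpenAI-style request parameters (an assumption about the serving API).

AGENT_SETTINGS = {"temperature": 1.0, "top_p": 0.95}   # agent tasks
CODING_SETTINGS = {"temperature": 0.6}                 # deterministic coding

def sampling_for(task: str) -> dict:
    """Pick sampling settings for 'agent' tasks, else deterministic coding."""
    return AGENT_SETTINGS if task == "agent" else CODING_SETTINGS
```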
Generated by

This model card was automatically generated using cagent-action. Want to learn more about Docker Model Runner? Check out the project repository: https://github.com/docker/model-runner.

Tag summary

| Field | Value |
|---|---|
| Content type | Model |
| Digest | sha256:16c6b3d4c |
| Size | 67 GB |
| Last updated | 17 days ago |

docker model pull ai/qwen3.6-safetensors:35B-A3B

Pulls (last week): 1,639