🎉 Kimi K2 Thinking now available on Hugging Face

Kimi K2 Thinking: Open Agentic Intelligence

Experience a trillion-parameter Mixture-of-Experts model built for deep reasoning, long-horizon tool use, and native INT4 efficiency. Deploy Kimi K2 Thinking with a 256K-token context window and state-of-the-art benchmark scores.

⚡ Native INT4 quantization for faster, lighter deployments

Why Teams Choose Kimi K2 Thinking

Unlock deep reasoning, long-context planning, and native INT4 speed with the latest Moonshot AI model.

Deep Thinking & Tool Orchestration

End-to-end trained to interleave chain-of-thought reasoning with autonomous tool calling across hundreds of steps without drift.
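For a concrete sense of the pattern, here is a minimal sketch of one reasoning-and-tool-calling loop against an OpenAI-compatible endpoint. The base URL, served model id, and the get_weather tool are illustrative assumptions, not part of the official quickstart.

```python
import json
from openai import OpenAI

# Point the standard OpenAI client at any OpenAI-compatible server
# (e.g. a local vLLM/SGLang instance); URL and key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A single illustrative tool; real agents register many more.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not shipped with the model
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stubbed tool result; a real deployment would call an actual service.
    return json.dumps({"city": city, "condition": "sunny", "temp_c": 21})

messages = [{"role": "user", "content": "What's the weather in Beijing right now?"}]

while True:
    resp = client.chat.completions.create(
        model="moonshotai/Kimi-K2-Thinking",  # assumed served model id
        messages=messages,
        tools=tools,
    )
    msg = resp.choices[0].message
    messages.append(msg)  # keep the assistant turn in the running context
    if not msg.tool_calls:
        print(msg.content)  # no further tool calls: the model answers directly
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),  # feed the result back for the next step
        })
```

Long-horizon runs are this same loop continued for more iterations, with the model deciding at each step whether to call another tool or produce the final answer.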

Native INT4 Quantization

Quantization-aware training enables lossless INT4 inference, cutting GPU memory use and generation latency while preserving top-tier accuracy.

Stable Long-Horizon Agency

Maintains coherent, goal-driven behavior across 200–300 tool invocations for research, coding, and automation workflows.

Benchmark-Proven Reasoning

Sets new highs on Humanity's Last Exam (HLE), AIME25, BrowseComp, and other agentic search benchmarks, with consistent multi-step performance.

Open Tooling Ecosystem

Runs on vLLM, SGLang, and KTransformers with Hugging Face distribution and Moonshot-compatible APIs.
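For self-hosting, a minimal offline-inference sketch with vLLM might look like the following; the repo id matches the Hugging Face release, while the parallelism and sampling settings are assumptions that depend on your hardware and the official deployment guides. SGLang and KTransformers have analogous entry points.

```python
# Minimal offline-inference sketch with vLLM. tensor_parallel_size is an
# assumption about the available GPUs, not a recommended setting.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-K2-Thinking",  # Hugging Face repo id
    tensor_parallel_size=8,               # assumed multi-GPU node
    trust_remote_code=True,
)

params = SamplingParams(temperature=1.0, max_tokens=1024)
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the benefits of native INT4 inference."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```

Serving the same checkpoint behind vLLM's or SGLang's OpenAI-compatible server exposes the kind of endpoint used in the tool-calling sketch above.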

Enterprise-Friendly License

Released under a Modified MIT License, enabling commercial deployments with clear third-party notices and dedicated support channels.

Kimi K2 Thinking FAQ

Have another question? Reach out to the Moonshot AI team for deployment guidance.