Experience a trillion-parameter Mixture-of-Experts model built for deep reasoning, long-horizon tool use, and native INT4 efficiency. Deploy Kimi K2 Thinking with 256K context and state-of-the-art benchmark scores.
⚡ Native INT4 quantization for faster, lighter deployments
Unlock deep reasoning, long-context planning, and native INT4 speed with the latest Moonshot AI model.
End-to-end trained to interleave chain-of-thought reasoning with autonomous tool calling across hundreds of steps without drift.
Quantization-aware training delivers INT4 latency gains with no measurable accuracy loss, cutting GPU memory footprint while preserving top-tier benchmark performance.
Maintains coherent, goal-driven behavior across 200–300 tool invocations for research, coding, and automation workflows.
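The long-horizon tool use described above can be sketched as a simple agent loop. This is a minimal illustration, not Moonshot's implementation: `call_model` and the `search` tool are simulated stand-ins, and the step cap mirrors the 200–300 invocation horizon cited here.

```python
def search(query: str) -> str:
    """Simulated tool: returns a canned result for the query."""
    return f"results for {query!r}"

TOOLS = {"search": search}

def call_model(history: list[dict]) -> dict:
    """Simulated model: requests one tool call, then finishes."""
    tool_turns = sum(1 for m in history if m["role"] == "tool")
    if tool_turns < 1:
        return {"type": "tool_call", "name": "search",
                "args": {"query": "INT4 quantization"}}
    return {"type": "final", "content": "done"}

def agent_loop(task: str, max_steps: int = 300) -> str:
    """Interleave model reasoning with tool calls until a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # cap mirrors the 200-300 step horizon
        action = call_model(history)
        if action["type"] == "final":
            return action["content"]
        result = TOOLS[action["name"]](**action["args"])
        history.append({"role": "tool", "content": result})
    return "step budget exhausted"
```

In a real deployment the model, not a stub, decides when to call a tool and when to stop; the loop structure stays the same.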
Sets new highs on HLE, AIME25, BrowseComp, and agentic search benchmarks with consistent multi-step performance.
Runs on vLLM, SGLang, and KTransformers with Hugging Face distribution and Moonshot-compatible APIs.
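Since the supported runtimes expose OpenAI-compatible endpoints, a request can be shaped as below. This is a hedged sketch using only the standard library: the endpoint URL and the served model id `kimi-k2-thinking` are illustrative assumptions, so check your server's configuration for the actual values.

```python
import json

# Assumed local endpoint for a vLLM or SGLang server exposing the
# OpenAI-compatible chat completions route (illustrative, not official).
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "kimi-k2-thinking",  # assumed served model id
    "messages": [
        {"role": "user", "content": "Summarize INT4 quantization."}
    ],
    "max_tokens": 512,
}

body = json.dumps(payload)
# Send `body` with any HTTP client, e.g. urllib.request.Request(
#     BASE_URL, data=body.encode(),
#     headers={"Content-Type": "application/json"})
```

The same payload works against Moonshot-compatible hosted APIs by swapping the base URL and adding an authorization header.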
Released under a Modified MIT license, enabling commercial deployment with clear third-party notices and support channels.
Have another question? Reach out to the Moonshot AI team for deployment guidance.