News

This is largely because current LLMs often struggle with complex code, multi-step logic, and abstract tasks, frequently exhibiting logical leaps, disorganized steps, and irrelevant ...
On complex mathematical benchmarks, researchers calculated K2 Think's average scores on AIME24, AIME25, HMMT25, ...
On benchmark evaluations, K2 Think leads all other open-source models in competitive math performance. It scored 90.8 on AIME 2024, 81.2 on AIME 2025, and 73.8 on HMMT 2025, according to benchmarks ...