On April 10, ByteDance's Doubao large model team officially open-sourced Multi-SWE-bench, the first multilingual SWE dataset, which can be used to evaluate and improve large models' ability to “automatically fix bugs”. Building on SWE-bench, Multi-SWE-bench for the first time covers seven mainstream programming languages beyond Python, making it a genuine evaluation benchmark for “full-stack engineering.” All of the data comes from GitHub issues, and the dataset took nearly a year to build, with the aim of measuring and improving the advanced programming capabilities of large models as accurately as possible.

Zhitongcaijing · 04/10/2025 06:33