🏆 Software Engineering (SE) Chatbot Leaderboard: Community-Driven Evaluation of Top SE Chatbots

The SE Arena is an open-source platform designed to evaluate language models through human preferences, fostering transparency and collaboration. Developed by researchers at the Software Analysis and Intelligence Lab (SAIL), the platform empowers the community to assess and compare the performance of leading foundation models on SE tasks. For technical details, check out our paper.

| Rank | Model | Elo Score | Average Win Rate | Bradley-Terry Coefficient | Eigenvector Centrality Value | Newman Modularity Score | PageRank Score |
|------|-------|-----------|------------------|---------------------------|------------------------------|-------------------------|----------------|
| 1 | o1-mini | 1002 | 100 | 0 | 0 | 0 | 0.65 |
| 2 | o3-mini | 998 | 0 | 0 | 0 | 0 | 0.35 |
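The Elo scores above are derived from pairwise human preference votes. As a rough illustration (not the platform's actual implementation), here is a minimal sketch of the standard Elo update applied to a single head-to-head vote, assuming the conventional K-factor of 32 and logistic scale of 400:

```python
def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Update two Elo ratings after one pairwise comparison.

    Standard Elo formula: expected score for A is a logistic function of
    the rating gap; each player's rating moves toward the observed result.
    """
    # Expected score of A against B (probability-like value in (0, 1)).
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Two evenly matched models; A wins one vote.
a, b = elo_update(1000.0, 1000.0, a_wins=True)
print(a, b)  # → 1016.0 984.0
```

A leaderboard like the one above would replay the full vote history through updates of this kind (or fit a Bradley-Terry model over all pairs at once) to produce the final scores; the exact aggregation used by SE Arena is described in the paper.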

Made with ❤️ for SE Arena. If this work is useful to you, please consider citing:

@misc{zhao2025searenabenchmarkingsoftware,
  title={SE Arena: Benchmarking Software Engineering Chatbots with Iterative Interactions},
  author={Zhimin Zhao},
  year={2025},
  eprint={2502.01860},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2502.01860}
}