🏆 Software Engineering (SE) Chatbot Leaderboard: Community-Driven Evaluation of Top SE Chatbots
The SE Arena is an open-source platform designed to evaluate language models through human preference, fostering transparency and collaboration. Developed by researchers at the Software Analysis and Intelligence Lab (SAIL), the platform lets the community assess and compare leading foundation models on SE tasks. For technical details, see our paper.
| Rank | Model | Elo Score | Average Win Rate | Bradley-Terry Coefficient | Eigenvector Centrality Value | Newman Modularity Score | PageRank Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | o1-mini | 1002 | 100 | 0 | 0 | 0 | 0.65 |
| 2 | o3-mini | 998 | 0 | 0 | 0 | 0 | 0.35 |
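To make the first two leaderboard columns concrete, here is a minimal sketch of how an Elo score and an average win rate could be derived from pairwise votes. It is not the platform's actual implementation: the starting rating of 1000, the K-factor of 4, the `(winner, loser)` vote format, and the function names are all assumptions, chosen only because a single win then yields the 1002 / 998 values shown above. Ties are ignored for simplicity.

```python
from collections import defaultdict

INITIAL_RATING = 1000.0  # assumption: every model starts at 1000
K_FACTOR = 4.0           # assumption: chosen to match the 1002 / 998 values above


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_elo(votes):
    """votes: iterable of (winner, loser) model-name pairs (hypothetical format)."""
    ratings = defaultdict(lambda: INITIAL_RATING)
    for winner, loser in votes:
        exp_win = expected_score(ratings[winner], ratings[loser])
        delta = K_FACTOR * (1.0 - exp_win)  # winner's actual score is 1
        ratings[winner] += delta
        ratings[loser] -= delta             # zero-sum update
    return dict(ratings)


def average_win_rate(votes):
    """Percentage of decided comparisons that each model won."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: 100.0 * wins[model] / games[model] for model in games}


if __name__ == "__main__":
    votes = [("o1-mini", "o3-mini")]  # one vote: o1-mini preferred over o3-mini
    print(compute_elo(votes))
    print(average_win_rate(votes))
```

Under these assumptions, one vote in favor of o1-mini is enough to produce the Elo and win-rate columns of the two rows above. The graph-based columns (Bradley-Terry, eigenvector centrality, Newman modularity, PageRank) are not covered by this sketch.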
Made with ❤️ for SE Arena. If this work is useful to you, please consider citing:
@misc{zhao2025searenabenchmarkingsoftware,
title={SE Arena: Benchmarking Software Engineering Chatbots with Iterative Interactions},
author={Zhimin Zhao},
year={2025},
eprint={2502.01860},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2502.01860}}
⚔️ Software Engineering (SE) Arena: Explore and Test the Best SE Chatbots with Long-Context Interactions
📜 How It Works
- Blind Comparison: Submit an SE-related query to two anonymous chatbots randomly selected from up to 33 top models, including OpenAI-o3, Grok-2, Gemini-2.0, Claude-3.7, Deepseek-r1, Mistral-large, Llama-3.3, Qwen-2.5, and others.
- Interactive Voting: Engage in multi-turn dialogues with both chatbots and compare their responses. You can continue the conversation until you confidently choose the better model.
- Fair Play Rules: Votes are counted only if chatbot identities remain anonymous. Revealing a chatbot's identity disqualifies the session.
Note: Due to budget constraints, responses that take longer than 90 seconds to generate will be discarded.
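The rules above can be sketched in a few lines. This is an illustration only, not the arena's code: the function names, the uniform random sampling, and the boolean `identity_revealed` flag are assumptions; only the 90-second limit and the anonymity rule come from the text.

```python
import random

MAX_RESPONSE_SECONDS = 90  # from the note above


def pick_anonymous_pair(models, rng=None):
    """Pick two distinct models at random (assumed uniform) for a blind comparison."""
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    rng = rng or random.Random()
    left, right = rng.sample(models, 2)  # sample() never repeats an element
    return left, right


def keep_response(generation_seconds):
    """Responses that take longer than the limit are discarded."""
    return generation_seconds <= MAX_RESPONSE_SECONDS


def vote_counts(identity_revealed):
    """A vote counts only if neither chatbot's identity was revealed."""
    return not identity_revealed


if __name__ == "__main__":
    pool = ["Llama-3.3", "Qwen-2.5", "Mistral-large"]  # a few names from the list above
    print(pick_anonymous_pair(pool))
    print(keep_response(42.0), vote_counts(identity_revealed=False))
```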
Please sign in first to vote!
Once signed in, your votes will be recorded securely.
Terms of Service
Users are required to agree to the following terms before using the service:
- The service is a research preview. It only provides limited safety measures and may generate offensive content.
- It must not be used for any illegal, harmful, violent, racist, or sexual purposes.
- Please do not upload any private information.
- The service collects user dialogue data, including both text and images, and reserves the right to distribute it under a Creative Commons Attribution (CC-BY) or a similar license.