Background
I'm a developer from ByteDance (China), specializing in building AI agents for autonomous Android GUI operation. We've been using the AndroidWorld benchmark to evaluate our agent, which is powered by ByteDance's Doubao Visual Model, and recently achieved a score of ~95 points using your official evaluation metrics.
We would like to submit this result to the official AndroidWorld leaderboard (AndroidWorld Leaderboard) to share it with the open-source community and allow comparison with other state-of-the-art results.
Questions
Could you please advise us on the following so that we can complete the submission correctly:
- What is the official workflow for submitting or updating scores on the AndroidWorld leaderboard (e.g., a PR, edit access to the spreadsheet, or verification through this issue)?
- What specific materials do we need to provide to validate our score (e.g., full evaluation logs, reproducible test scripts, environment configuration details, a per-task performance breakdown)?
- Are there any standard formats or metric calculation rules we need to follow to stay consistent with existing leaderboard entries?
Additional Details
- Organization: ByteDance Inc. (China)
- Agent Model: Doubao Visual Model (for Android GUI automation)
- Benchmark Score: ~95 points (evaluated on the latest AndroidWorld task suite)
- Evaluation Environment: Aligned with the setup in AndroidWorld's README.md (Android 13 emulator, Python 3.10, dependencies installed via requirements.txt)
We're ready to provide any additional verification materials you need. Thank you for maintaining this valuable benchmark; it has been critical to our agent's development!