Autark — Experiments

This repository contains the supplemental material for the paper Autark: A Serverless Toolkit for Prototyping Urban Visual Analytics Systems. It accompanies the agentic development evaluation reported in Section 5.3 of the paper, in which an AI coding agent (Claude Code, Opus 4.6) was asked to implement five urban visual analytics (VA) tasks of increasing complexity under two conditions:

  • autark — the agent was given Autark's documentation as its primary context and instructed to use only Autark's API.
  • general — the agent was explicitly told not to use Autark, and could freely choose any general-purpose libraries available online.

Each task was run independently, with no conversation history carried between trials. Both conditions shared the same model configuration, max-turns budget, and stop criterion: the generated project had to compile, build, and serve without errors before the trial could end.
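As a rough illustration of that stop criterion, the check below sketches what "compile, build, and serve without errors" amounts to for an npm-based web project; the commands, port, and timing are assumptions for illustration, not the exact logic in run_trials.sh.

# hedged sketch of the build/serve stop criterion (assumed npm project layout)
set -e
npm install                                   # dependencies must resolve
npm run build                                 # project must compile and build cleanly
npm run dev & SERVER_PID=$!                   # project must serve
sleep 5
curl --fail --silent http://localhost:5173 > /dev/null   # dev-server port is an assumption
kill "$SERVER_PID"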

Repository contents

.
├── README.md                # this file
├── supplemental.pdf         # compiled supplemental document
├── run_trials.sh            # the experiment driver script
├── metrics.csv              # per-trial code metrics + per-app and global averages
└── trials/                  # one folder per app, containing prompts and outputs
    ├── app1-subway-accessibility/
    │   ├── prompt-autark.md
    │   ├── prompt-general.md
    │   └── t<N>/
    │       ├── autark/{meta.json, log.jsonl, output/}
    │       └── general/{meta.json, log.jsonl, output/}
    ├── app2-noise-pollution/
    ├── app3-noise-scatterplot/
    ├── app4-street-network/
    └── app5-subway-picking/
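
Each meta.json sits at trials/<app>/t<N>/<condition>/meta.json. As an illustration of navigating this layout (not a script shipped with the repository), the snippet below lists every trial's metadata file and counts trials per condition:

# illustrative only; path depths follow the tree above
find trials -name meta.json | sort
for cond in autark general; do
  n=$(find trials -mindepth 3 -maxdepth 3 -type d -name "$cond" | wc -l)
  echo "$cond: $n trials"
done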

Running the experiment

The experiment can be reproduced with run_trials.sh:

# all apps, both conditions
./run_trials.sh

# one app, both conditions
./run_trials.sh app1-subway-accessibility

# one app, one condition
./run_trials.sh app1-subway-accessibility autark

The script creates a fresh t<N> trial directory under each app, runs Claude Code inside it, and writes a log.jsonl (full stream-json transcript) and a meta.json (model, duration, timestamp). Every prompt-{autark,general}.md file is concatenated with a shared appendix of common instructions (OSM querying guidelines, console logging requirements, and the build/serve validation loop) before being sent to the model. The full text of that appendix is in run_trials.sh.
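
For orientation, the sketch below shows the shape of a single trial as described above (prompt concatenation, agent invocation, metadata capture). It is not a copy of run_trials.sh; the CLI flags, the appendix file name, and the turn budget are assumptions for illustration.

# illustrative shape of one trial; see run_trials.sh for the real logic
app=app1-subway-accessibility; cond=autark; trial=t1
dir="trials/$app/$trial/$cond"
mkdir -p "$dir/output"
cat "trials/$app/prompt-$cond.md" appendix.md > "$dir/prompt-full.md"   # appendix.md is a hypothetical name
(
  cd "$dir/output"
  claude -p "$(cat ../prompt-full.md)" \
    --output-format stream-json \
    --max-turns 100 > ../log.jsonl          # flags and turn budget assumed, not taken from the script
)
# meta.json (model, duration, timestamp) is written alongside log.jsonl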

Metrics

metrics.csv contains one row per trial plus per-app averages (trial=avg) and a global average row (app=ALL, trial=avg). All metrics were computed by static analysis of the final source tree the agent left in each trial's output/ directory after passing the validation loop. The number of trials per (app, condition) pair is uneven — some pairs were re-run while iterating on prompt wording. The averages reported in Section 5.3 of the paper correspond to the ALL, avg rows.
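
Those averages can be pulled out of metrics.csv directly; assuming app and trial are the first two columns (the file's own header is authoritative), something like:

# print the header plus the global-average row used in Section 5.3
head -n 1 metrics.csv
awk -F, '$1 == "ALL" && $2 == "avg"' metrics.csv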
