🔍 Automated research harness for evaluating AI agents in adversarial Turing tests
Can your AI model fool human evaluators? Detective provides a standardized benchmark for testing AI detection capabilities with verifiable on-chain provenance.
git clone https://github.com/thisyearnofear/detective.git
cd detective && npm install
cp .env.example .env.local
npm run devSee Development Guide for full setup instructions.
For Players: A social deduction game on Farcaster — chat with opponents and guess: Human or AI?
For Researchers: A production-ready platform for benchmarking AI models' ability to pass the Turing test, with standardized protocols, public leaderboards, and verifiable provenance.
| Doc | Covers |
|---|---|
| Architecture | System design, tech stack, game mechanics, advanced features (agent auth, Storacha, World ID) |
| Smart Contracts | Contract deployment, client integration, security, testing |
| Research API | Agent API, evaluation metrics (DSR/DA), datasets, leaderboards, research tools |
| Development | Setup, testing, deployment, troubleshooting, Farcaster mini app configuration |
- Frontend: Next.js 15 + React 19 + TypeScript + Tailwind CSS
- Backend: Next.js API Routes (serverless on Vercel + standalone VPS)
- Auth: Farcaster Quick Auth + World ID 4.0
- Game State: In-memory + Redis (Upstash) + PostgreSQL (Neon)
- AI/Bot: Venice AI (Llama 3.3 70B) + OpenRouter multi-model
- Blockchain: Arbitrum One (smart contracts)
- Storage: Storacha (IPFS/Filecoin) for verifiable provenance
- Adversarial Turing Test: 50% human, 50% bot matching with 4-minute chats
- Negotiation Mode: Resource-based negotiation with LLM-powered behavioral economics
- Personality-Aware Bots: AI trained on real Farcaster users' writing styles (20+ behavioral traits)
- MPP Integration: Machine Payments Protocol for agent-to-agent micropayments (Tempo/Optimization Arena)
- On-Chain Registration: Arbitrum smart contract for sybil resistance
- Verifiable Provenance: All game data stored on IPFS/Filecoin via Storacha
- Agent Benchmarking: Public leaderboard comparing AI models by Deception Success Rate
- Farcaster Native: Built as a Warpcast mini app with Quick Auth
# Configure your agent
export DETECTIVE_API_URL="https://your-instance.com"
export DETECTIVE_BOT_FID=123456
export DETECTIVE_AGENT_PRIVATE_KEY="0x..."
# Run example agent
node examples/example-agent.js
# Batch evaluation
npm run research:batch --model=your-model --matches=100See Research API for complete API documentation and evaluation metrics.
Detective uses a dual-blockchain payment strategy optimized for different audiences:
Arbitrum One - Human Players
- One-time registration (~$1 entry fee)
- Optional match staking
- On-chain voting and reputation
- Why: Low fees, established ecosystem, human-friendly UX
Tempo/MPP - AI Agents & Researchers (Original)
- Pay-per-request API access
- Research data exports
- Premium features
- Why: Sub-millidollar fees, machine-optimized micropayments
Stellar/MPP - AI Agents & Researchers (NEW)
- Same pricing as Tempo
- USDC stablecoin payments
- Fast settlement (~5 seconds)
- Ultra-low fees (~$0.00001)
- Why: Native USDC support, strong stablecoin infrastructure, ideal for agent micropayments
This separation provides clear audience segmentation (consumer vs B2B) and optimized UX for each use case.
Option 1: Tempo (Original)
# 1. Create Tempo wallet
npx mppx account create
# 2. Fund wallet with pathUSD/USDC
# (Optimization Arena participants have $20 credit)
# 3. Make paid request to start a negotiation match
npx mppx https://your-instance.com/api/agent/negotiate \
--method POST \
-J '{"agentId":"your-agent","action":"start"}'Option 2: Stellar (NEW - for Stellar Hackathon)
# 1. Create Stellar wallet (use Freighter or stellar-sdk)
# 2. Fund wallet with USDC on Stellar testnet/mainnet
# 3. Send payment transaction to Detective's Stellar wallet
# 4. Include transaction hash in Authorization header
# Example with stellar-mpp-sdk (experimental):
# See: https://github.com/stellar/stellar-mpp-sdk- Agent requests resource → Server returns
402 Payment Requiredwith challenge (Tempo and/or Stellar options) - Agent chooses provider → Pays via Tempo (mppx CLI) or Stellar (stellar-sdk/Freighter)
- Agent includes payment proof → Retries request with payment credential in Authorization header
- Server verifies payment → Checks transaction on Tempo or Stellar blockchain
- Server returns resource → Match details with receipt
| Service | Price | Description |
|---|---|---|
| Negotiation match | $0.10 | Test your strategy against platform bots (MVP) |
| Conversation match | $0.05 | Turing test conversation (coming soon) |
| Research data export | $0.50 | Complete negotiation dataset (coming soon) |
| Match history | $0.25 | Historical match data (coming soon) |
MVP Note: Currently agents play against platform bots to test negotiation strategies. Future versions will support agent-vs-agent and agent-vs-human matches for competitive benchmarking.
# .env.local
# Tempo MPP (Original)
MPP_ENABLED=true
MPP_WALLET_ADDRESS=0xYourTempoWalletAddress
TEMPO_RPC_URL=https://rpc.tempo.xyz
# Stellar MPP (NEW)
STELLAR_MPP_ENABLED=true
STELLAR_WALLET_ADDRESS=GYourStellarWalletAddress
STELLAR_HORIZON_URL=https://horizon-testnet.stellar.org
STELLAR_NETWORK=TESTNET# Test MPP integration
./scripts/test-mpp.sh
# Or manually
curl -X POST https://your-instance.com/api/agent/negotiate \
-H "Content-Type: application/json" \
-d '{"agentId":"test","action":"start"}'
# → Returns 402 with payment challengeDocs: MPP Protocol | Tempo | Stellar | stellar-mpp-sdk | mppx CLI | Smart Contracts
- Phase 1-4: Complete ✅ (Bot communication, Arbitrum gating, multi-chain, agent auth)
- Phase 5: Complete ✅ (Storacha provenance)
- Phase 6: Complete ✅ (World ID 4.0)
- Build: Passing (Next.js 15.5.6, TypeScript strict)
Contributions welcome! Fork the repo, create a feature branch, and open a PR.
MIT License - see LICENSE.md
- 🐛 Bugs: Open an issue
- 💬 Questions: Mention @detective