Live demo: scraper.propertywebbuilder.com
From the team behind PropertyWebBuilder — the open-source real estate platform.
A real estate listing extraction platform with a public API, export pipeline, webhook-ready haul workflows, admin diagnostics, and Chrome extensions. Given a property listing URL (or pre-rendered HTML), it returns structured data such as title, price, coordinates, images, and 70+ normalized fields.
Built with Astro (SSR mode), TypeScript, and Cheerio.
PropertyWebScraper currently ships with 109 named portal mappings across 75 countries, plus a generic fallback mapping for browser-rendered HTML captured from unsupported hosts.
- Canonical support metadata lives in astro-app/src/lib/services/portal-registry.ts
- The public catalog is exposed in the app at /sites
- The API catalog is exposed at /public_api/v1/supported_sites
Examples of covered portals include Rightmove, Zoopla, OnTheMarket, Idealista, Fotocasa, Realtor.com, Redfin, Zillow, Funda, Daft.ie, Domain, RealEstate.com.au, Hemnet, SeLoger, Immobiliare.it, RealEstateIndia, and many more.
Support tiers:
- core — strong direct URL support and ongoing fixture coverage
- experimental — available but with lower confidence or narrower fixture history
- manual-only — best used with browser-rendered HTML rather than direct URL fetches
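The tiers lend themselves to a simple string union. The sketch below is illustrative only — the real registry shape lives in astro-app/src/lib/services/portal-registry.ts and may differ; the sample entries and the PortalEntry fields here are assumptions:

```typescript
// Hypothetical, simplified view of a registry entry;
// the real shape in portal-registry.ts may differ.
type SupportTier = "core" | "experimental" | "manual-only";

interface PortalEntry {
  slug: string;
  country: string;
  tier: SupportTier;
}

// Illustrative sample data, not the real catalog.
const catalog: PortalEntry[] = [
  { slug: "uk_rightmove", country: "GB", tier: "core" },
  { slug: "es_idealista", country: "ES", tier: "core" },
  { slug: "in_realestateindia", country: "IN", tier: "experimental" },
];

// Pick only portals considered safe for direct URL fetching.
function directFetchPortals(entries: PortalEntry[]): string[] {
  return entries.filter((e) => e.tier === "core").map((e) => e.slug);
}

console.log(directFetchPortals(catalog)); // ["uk_rightmove", "es_idealista"]
```

Modeling the tier as a union type lets callers exhaustively handle all three cases at compile time.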
The project includes a Manifest V3 Chrome extension that makes extraction available with one click on any supported listing page.
- Badge indicator — green check on supported sites
- Haul collections — browse multiple listings, then view them all on a single results page
- Property card popup — image, price, stats, quality grade
- Copy to clipboard — JSON or listing URL
- No API key required — uses anonymous haul collections
Install (dev mode): open chrome://extensions/ → enable Developer mode → Load unpacked → select the chrome-extensions/property-scraper/ folder.
See the full Chrome Extension documentation for architecture details and configuration.
The extraction engine takes fully-rendered HTML and a source URL, then applies configurable JSON mappings (CSS selectors, script JSON paths, regex patterns, JSON-LD, flight data paths) to extract structured property data. No browser automation or JS rendering happens inside the engine itself — the caller provides the HTML.
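To make the mapping idea concrete, here is a minimal sketch of a mapping-driven extractor that supports only the regex-pattern rule type. It is not the project's actual mapping schema or engine API — the field names, interface, and sample HTML are all hypothetical:

```typescript
// Minimal sketch: apply named regex patterns to caller-provided HTML.
// The real engine also supports CSS selectors, script JSON paths,
// JSON-LD, and flight data paths; this shows only the regex case.
interface RegexMapping {
  [field: string]: RegExp;
}

function extract(html: string, mapping: RegexMapping): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const [field, pattern] of Object.entries(mapping)) {
    const m = html.match(pattern);
    out[field] = m ? m[1].trim() : null; // first capture group holds the value
  }
  return out;
}

// Hypothetical markup and mapping, for illustration only.
const html = `<h1 class="title">2 bed flat in Soho</h1><span id="price">£650,000</span>`;
const mapping: RegexMapping = {
  title: /<h1 class="title">([^<]+)<\/h1>/,
  price: /<span id="price">([^<]+)<\/span>/,
};

console.log(extract(html, mapping));
// { title: "2 bed flat in Soho", price: "£650,000" }
```

The key design point carries over to the real engine: the caller supplies already-rendered HTML, so the extraction step itself stays fast, deterministic, and free of browser automation.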
- User browses supported listing pages — extension badge turns green
- Click the extension icon to extract the current listing
- Results are collected into an anonymous haul — no login required
- A shareable results page shows all collected listings with comparison data
cd astro-app
npm install
npm run dev

The dev server starts at http://localhost:4321. You can extract a listing via the web UI or the API.
POST /extract/url
Content-Type: application/x-www-form-urlencoded
url=https://www.rightmove.co.uk/properties/168908774
POST /extract/html
Content-Type: application/x-www-form-urlencoded
url=https://www.rightmove.co.uk/properties/168908774&html=<html>...</html>
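Both extract endpoints take application/x-www-form-urlencoded bodies. A small sketch of building such a body with the standard URLSearchParams API, which handles percent-encoding of the URL and HTML payload (the fetch call is left commented out since it assumes a running dev server at localhost:4321):

```typescript
// Build the form-encoded body for POST /extract/url or /extract/html.
function buildExtractBody(url: string, html?: string): string {
  const params = new URLSearchParams({ url });
  if (html !== undefined) params.set("html", html); // only for /extract/html
  return params.toString();
}

const body = buildExtractBody(
  "https://www.rightmove.co.uk/properties/168908774",
  "<html>...</html>",
);

// To send it against a local dev server:
// await fetch("http://localhost:4321/extract/html", {
//   method: "POST",
//   headers: { "Content-Type": "application/x-www-form-urlencoded" },
//   body,
// });

console.log(body.startsWith("url=https%3A%2F%2Fwww.rightmove.co.uk")); // true
```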
GET /public_api/v1/listings?url=https://www.rightmove.co.uk/properties/168908774
GET /public_api/v1/supported_sites
GET /public_api/v1/health
POST /ext/v1/hauls # Create anonymous haul
GET /ext/v1/hauls/:id # Get haul summary
POST /ext/v1/hauls/:id/scrapes # Add extraction to haul
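The three haul endpoints chain naturally: create a haul, add extractions to it, then read back the summary. As a non-executing sketch of that sequence (the actual request and response schemas are in DESIGN.md; the payload shape below is an assumption):

```typescript
// Describe (not execute) the anonymous-haul workflow as a request sequence.
// Paths come from the endpoint list above; the body field is hypothetical.
interface ApiRequest {
  method: "GET" | "POST";
  path: string;
  body?: unknown;
}

function haulWorkflow(haulId: string, listingUrl: string): ApiRequest[] {
  return [
    { method: "POST", path: "/ext/v1/hauls" }, // 1. create an anonymous haul
    {
      method: "POST",
      path: `/ext/v1/hauls/${haulId}/scrapes`, // 2. add an extraction to it
      body: { url: listingUrl }, // hypothetical payload shape
    },
    { method: "GET", path: `/ext/v1/hauls/${haulId}` }, // 3. read the summary
  ];
}

const steps = haulWorkflow("abc123", "https://www.rightmove.co.uk/properties/168908774");
console.log(steps.map((s) => `${s.method} ${s.path}`));
```

This is the same flow the Chrome extension drives: since hauls are anonymous, no API key or login is needed at any step.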
See DESIGN.md for the full API endpoint reference and architecture details.
An MCP server (astro-app/mcp-server.ts) enables Claude Code to capture rendered HTML directly from Chrome via the MCP Bridge extension. Start it with:
npx tsx astro-app/mcp-server.ts

Run the test suite with:

cd astro-app
npx vitest run

property_web_scraper/
├── astro-app/ # Astro 5 SSR application (active development)
│ ├── src/lib/extractor/ # Core extraction pipeline
│ ├── src/lib/services/ # URL validation, auth, rate limiting
│ ├── src/pages/ # Astro pages and API endpoints
│ ├── test/ # Vitest tests and HTML fixtures
│ └── scripts/ # CLI utilities (capture-fixture)
├── chrome-extensions/ # Chrome extensions
│ ├── property-scraper/ # Public extension (one-click extraction popup)
│ └── mcp-bridge/ # Dev extension (WebSocket bridge to MCP server)
├── config/scraper_mappings/ # JSON mapping files per portal
│ └── archive/ # Legacy mappings (kept for reference)
├── app/ # Legacy Rails engine (see RAILS_README.md)
└── spec-archive/ # Archived Rails RSpec tests (not run in CI)
Each supported site has a JSON mapping file in config/scraper_mappings/ with a country-code prefix (e.g. uk_rightmove.json, es_idealista.json). These define CSS selectors, script JSON paths, regex patterns, and post-processing rules for extracting fields from that site's HTML.
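For illustration only, a mapping might pair field names with selector and post-processing rules along these lines — the actual JSON schema is defined by the engine and the files in config/scraper_mappings/, and every key and field name below is hypothetical:

```typescript
// Hypothetical mapping shape; the real schema in
// config/scraper_mappings/*.json may use different keys.
const exampleMapping = {
  portal: "uk_rightmove",
  fields: {
    title: { cssSelector: "h1" },
    price: { cssSelector: ".price", postProcess: ["stripCurrency", "toNumber"] },
    images: { scriptJsonPath: "window.PAGE_MODEL.propertyData.images" },
  },
};

// Mapping files follow a <country-code>_<portal>.json naming convention.
function isValidMappingName(name: string): boolean {
  return /^[a-z]{2}_[a-z0-9]+\.json$/.test(name);
}

console.log(isValidMappingName("uk_rightmove.json")); // true
console.log(isValidMappingName("rightmove.json")); // false (no country prefix)
```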
PropertyWebScraper is part of the PropertyWebBuilder ecosystem. These projects all use it as their extraction backend:
| Project | What it does | Stack |
|---|---|---|
| HomesToCompare | AI-powered side-by-side property comparisons with 11 analysis sections and Firestore sync | Astro, React, Firestore |
| HousePriceGuess | Gamified property price guessing with AI dossiers, 18+ white-label brands, and embeddable widgets | Astro, React, Tailwind |
| SinglePropertyPages | SaaS for dedicated property microsites with lead capture, analytics, and WYSIWYG editor | Astro, TypeScript |
| PropertySquares | 48-step first-time buyer journey across multiple markets | Astro, TypeScript |
Building a real estate project? PropertyWebScraper gives you structured listing data, support-tier metadata, export formats, and Chrome-extension capture flows, backed by a broad multi-country portal catalog. Open an issue to get your project listed here.
This project was originally a Ruby on Rails engine. The Rails code in app/ is kept for legacy purposes but is no longer under active development. See RAILS_README.md for details.
The easiest way to contribute is to add a scraper for a property portal in your country. We have a step-by-step guide in CONTRIBUTING.md that walks you through the process — no deep knowledge of the codebase required.
We also welcome bug fixes, test improvements, and documentation updates. See the open issues for ideas.
If you like this project, please star it and spread the word on Twitter, LinkedIn and Facebook.
Available as open source under the terms of the MIT License.
While scraping can sometimes be used as a legitimate way to access all kinds of data on the internet, it's also important to consider the legal implications. There are cases where scraping data may be considered illegal, or open you to the possibility of being sued.
This tool was created in part as a learning exercise and is shared in case others find it useful. If you decide to use it to scrape a website, it is your responsibility to ensure that what you are doing is legal.