Live demo: scraper.propertywebbuilder.com
From the team behind PropertyWebBuilder — the open-source real estate platform.
A real estate listing extraction platform with a public API, export pipeline, webhook-ready haul workflows, admin diagnostics, and Chrome extensions. Given a property listing URL (or pre-rendered HTML), it returns structured data such as title, price, coordinates, images, and 70+ normalized fields.
Built with Astro (SSR mode), TypeScript, and Cheerio.
PropertyWebScraper currently ships with 109 named portal mappings across 75 countries, plus a generic fallback mapping for browser-rendered HTML captured from unsupported hosts.
- Canonical support metadata lives in astro-app/src/lib/services/portal-registry.ts
- The public catalog is exposed in the app at /sites
- The API catalog is exposed at /public_api/v1/supported_sites
Examples of covered portals include Rightmove, Zoopla, OnTheMarket, Idealista, Fotocasa, Realtor.com, Redfin, Zillow, Funda, Daft.ie, Domain, RealEstate.com.au, Hemnet, SeLoger, Immobiliare.it, RealEstateIndia, and many more.
Support tiers:
- core — strong direct URL support and ongoing fixture coverage
- experimental — available but with lower confidence or narrower fixture history
- manual-only — best used with browser-rendered HTML rather than direct URL fetches
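The tiers lend themselves to a simple string union. The sketch below is illustrative only — the real registry shape lives in astro-app/src/lib/services/portal-registry.ts and may differ; the sample entries and the PortalEntry fields here are assumptions:

```typescript
// Hypothetical, simplified view of a registry entry;
// the real shape in portal-registry.ts may differ.
type SupportTier = "core" | "experimental" | "manual-only";

interface PortalEntry {
  slug: string;
  country: string;
  tier: SupportTier;
}

// Illustrative sample data, not the real catalog.
const catalog: PortalEntry[] = [
  { slug: "uk_rightmove", country: "GB", tier: "core" },
  { slug: "es_idealista", country: "ES", tier: "core" },
  { slug: "in_realestateindia", country: "IN", tier: "experimental" },
];

// Pick only portals considered safe for direct URL fetching.
function directFetchPortals(entries: PortalEntry[]): string[] {
  return entries.filter((e) => e.tier === "core").map((e) => e.slug);
}

console.log(directFetchPortals(catalog)); // ["uk_rightmove", "es_idealista"]
```

Modeling the tier as a union type lets callers exhaustively handle all three cases at compile time.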
The project includes a Manifest V3 Chrome extension that makes extraction available with one click on any supported listing page.
- Badge indicator — green check on supported sites
- Haul collections — browse multiple listings, then view them all on a single results page
- Property card popup — image, price, stats, quality grade
- Copy to clipboard — JSON or listing URL
- No API key required — uses anonymous haul collections
Install (dev mode): open chrome://extensions/ → enable Developer mode → Load unpacked → select the chrome-extensions/property-scraper/ folder.
See the full Chrome Extension documentation for architecture details and configuration.
The extraction engine takes fully-rendered HTML and a source URL, then applies configurable JSON mappings (CSS selectors, script JSON paths, regex patterns, JSON-LD, flight data paths) to extract structured property data. No browser automation or JS rendering happens inside the engine itself — the caller provides the HTML.
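To make the mapping idea concrete, here is a minimal sketch of a mapping-driven extractor that supports only the regex-pattern rule type. It is not the project's actual mapping schema or engine API — the field names, interface, and sample HTML are all hypothetical:

```typescript
// Minimal sketch: apply named regex patterns to caller-provided HTML.
// The real engine also supports CSS selectors, script JSON paths,
// JSON-LD, and flight data paths; this shows only the regex case.
interface RegexMapping {
  [field: string]: RegExp;
}

function extract(html: string, mapping: RegexMapping): Record<string, string | null> {
  const out: Record<string, string | null> = {};
  for (const [field, pattern] of Object.entries(mapping)) {
    const m = html.match(pattern);
    out[field] = m ? m[1].trim() : null; // first capture group holds the value
  }
  return out;
}

// Hypothetical markup and mapping, for illustration only.
const html = `<h1 class="title">2 bed flat in Soho</h1><span id="price">£650,000</span>`;
const mapping: RegexMapping = {
  title: /<h1 class="title">([^<]+)<\/h1>/,
  price: /<span id="price">([^<]+)<\/span>/,
};

console.log(extract(html, mapping));
// { title: "2 bed flat in Soho", price: "£650,000" }
```

The key design point carries over to the real engine: the caller supplies already-rendered HTML, so the extraction step itself stays fast, deterministic, and free of browser automation.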
- User browses supported listing pages — extension badge turns green
- Click the extension icon to extract the current listing
- Results are collected into an anonymous haul — no login required
- A shareable results page shows all collected listings with comparison data
cd astro-app
npm install
npm run dev

The dev server starts at http://localhost:4321. You can extract a listing via the web UI or the API.
POST /extract/url
Content-Type: application/x-www-form-urlencoded
url=https://www.rightmove.co.uk/properties/168908774
POST /extract/html
Content-Type: application/x-www-form-urlencoded
url=https://www.rightmove.co.uk/properties/168908774&html=<html>...</html>
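Both extract endpoints take application/x-www-form-urlencoded bodies. A small sketch of building such a body with the standard URLSearchParams API, which handles percent-encoding of the URL and HTML payload (the fetch call is left commented out since it assumes a running dev server at localhost:4321):

```typescript
// Build the form-encoded body for POST /extract/url or /extract/html.
function buildExtractBody(url: string, html?: string): string {
  const params = new URLSearchParams({ url });
  if (html !== undefined) params.set("html", html); // only for /extract/html
  return params.toString();
}

const body = buildExtractBody(
  "https://www.rightmove.co.uk/properties/168908774",
  "<html>...</html>",
);

// To send it against a local dev server:
// await fetch("http://localhost:4321/extract/html", {
//   method: "POST",
//   headers: { "Content-Type": "application/x-www-form-urlencoded" },
//   body,
// });

console.log(body.startsWith("url=https%3A%2F%2Fwww.rightmove.co.uk")); // true
```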
GET /public_api/v1/listings?url=https://www.rightmove.co.uk/properties/168908774
GET /public_api/v1/supported_sites
GET /public_api/v1/health
POST /ext/v1/hauls # Create anonymous haul
GET /ext/v1/hauls/:id # Get haul summary
POST /ext/v1/hauls/:id/scrapes # Add extraction to haul
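The three haul endpoints chain naturally: create a haul, add extractions to it, then read back the summary. As a non-executing sketch of that sequence (the actual request and response schemas are in DESIGN.md; the payload shape below is an assumption):

```typescript
// Describe (not execute) the anonymous-haul workflow as a request sequence.
// Paths come from the endpoint list above; the body field is hypothetical.
interface ApiRequest {
  method: "GET" | "POST";
  path: string;
  body?: unknown;
}

function haulWorkflow(haulId: string, listingUrl: string): ApiRequest[] {
  return [
    { method: "POST", path: "/ext/v1/hauls" }, // 1. create an anonymous haul
    {
      method: "POST",
      path: `/ext/v1/hauls/${haulId}/scrapes`, // 2. add an extraction to it
      body: { url: listingUrl }, // hypothetical payload shape
    },
    { method: "GET", path: `/ext/v1/hauls/${haulId}` }, // 3. read the summary
  ];
}

const steps = haulWorkflow("abc123", "https://www.rightmove.co.uk/properties/168908774");
console.log(steps.map((s) => `${s.method} ${s.path}`));
```

This is the same flow the Chrome extension drives: since hauls are anonymous, no API key or login is needed at any step.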
See DESIGN.md for the full API endpoint reference and architecture details.
An MCP server (astro-app/mcp-server.ts) enables Claude Code to capture rendered HTML directly from Chrome via the MCP Bridge extension. Start it with:
npx tsx astro-app/mcp-server.ts

Run the test suite with:

cd astro-app
npx vitest run

property_web_scraper/
├── astro-app/ # Astro 5 SSR application (active development)
│ ├── src/lib/extractor/ # Core extraction pipeline
│ ├── src/lib/services/ # URL validation, auth, rate limiting
│ ├── src/pages/ # Astro pages and API endpoints
│ ├── test/ # Vitest tests and HTML fixtures
│ └── scripts/ # CLI utilities (capture-fixture)
├── chrome-extensions/ # Chrome extensions
│ ├── property-scraper/ # Public extension (one-click extraction popup)
│ └── mcp-bridge/ # Dev extension (WebSocket bridge to MCP server)
├── config/scraper_mappings/ # JSON mapping files per portal
│ └── archive/ # Legacy mappings (kept for reference)
├── app/ # Legacy Rails engine (see RAILS_README.md)
└── spec-archive/ # Archived Rails RSpec tests (not run in CI)
Each supported site has a JSON mapping file in config/scraper_mappings/ with a country-code prefix (e.g. uk_rightmove.json, es_idealista.json). These define CSS selectors, script JSON paths, regex patterns, and post-processing rules for extracting fields from that site's HTML.
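For illustration only, a mapping might pair field names with selector and post-processing rules along these lines — the actual JSON schema is defined by the engine and the files in config/scraper_mappings/, and every key and field name below is hypothetical:

```typescript
// Hypothetical mapping shape; the real schema in
// config/scraper_mappings/*.json may use different keys.
const exampleMapping = {
  portal: "uk_rightmove",
  fields: {
    title: { cssSelector: "h1" },
    price: { cssSelector: ".price", postProcess: ["stripCurrency", "toNumber"] },
    images: { scriptJsonPath: "window.PAGE_MODEL.propertyData.images" },
  },
};

// Mapping files follow a <country-code>_<portal>.json naming convention.
function isValidMappingName(name: string): boolean {
  return /^[a-z]{2}_[a-z0-9]+\.json$/.test(name);
}

console.log(isValidMappingName("uk_rightmove.json")); // true
console.log(isValidMappingName("rightmove.json")); // false (no country prefix)
```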
PropertyWebScraper is part of the PropertyWebBuilder ecosystem. These projects all use it as their extraction backend:
| Project | What it does | Stack |
|---|---|---|
| HomesToCompare | AI-powered side-by-side property comparisons with 11 analysis sections and Firestore sync | Astro, React, Firestore |
| HousePriceGuess | Gamified property price guessing with AI dossiers, 18+ white-label brands, and embeddable widgets | Astro, React, Tailwind |
| SinglePropertyPages | SaaS for dedicated property microsites with lead capture, analytics, and WYSIWYG editor | Astro, TypeScript |
| PropertySquares | 48-step first-time buyer journey across multiple markets | Astro, TypeScript |
Building a real estate project? PropertyWebScraper gives you structured listing data, support-tier metadata, export formats, and Chrome-extension capture flows, backed by a broad multi-country portal catalog. Open an issue to get your project listed here.
This project was originally a Ruby on Rails engine. The Rails code in app/ is kept for legacy purposes but is no longer under active development. See RAILS_README.md for details.
The easiest way to contribute is to add a scraper for a property portal in your country. We have a step-by-step guide in CONTRIBUTING.md that walks you through the process — no deep knowledge of the codebase required.
We also welcome bug fixes, test improvements, and documentation updates. See the open issues for ideas.
If you like this project, please star it and spread the word on Twitter, LinkedIn and Facebook.
Available as open source under the terms of the MIT License.
While scraping can sometimes be used as a legitimate way to access all kinds of data on the internet, it's also important to consider the legal implications. There are cases where scraping data may be considered illegal, or open you to the possibility of being sued.
This tool was created in part as a learning exercise and is shared in case others find it useful. If you decide to use it to scrape a website, it is your responsibility to ensure that what you are doing is legal.