How to Build a B2B Lead Database with Web Scraping

B2B lead generation is one of the highest-ROI applications of web scraping. Instead of paying $0.50–$2.00 per lead from data brokers, you can build custom, targeted lead databases by extracting company and contact information directly from public sources — at a fraction of the cost.

In this guide, we'll walk through the entire pipeline: identifying data sources, building scrapers, structuring your output, and enriching your leads with verified emails and phone numbers.

TL;DR: The best B2B lead sources are Google Maps, LinkedIn (company pages), industry directories, and Crunchbase. Scraping these requires headless browsers, proxy rotation, and structured data pipelines. We offer pre-built lead databases and custom scraping — see our lead generation solutions.

Best Data Sources for B2B Leads

Not all sources are created equal. Here's where the highest-quality B2B data lives:

1. Google Maps / Google Business Profiles

Google Maps is arguably the best source for local business leads. Most listings include the company name, address, phone number, website URL, business hours, and review data, although individual fields can be missing. For service-based businesses (plumbers, dentists, restaurants, law firms), few other sources match its coverage.

Key data points: business name, address, phone, website, category, rating, review count, and operating hours.

2. LinkedIn Company Pages

LinkedIn provides rich company data: industry, employee count, headquarters location, founding year, and specialties. Personal profiles are harder to scrape ethically, but the portions of company pages that are visible without logging in contain valuable firmographic data.

3. Industry-Specific Directories

Every industry has its own directories: Clutch (agencies), G2 (SaaS), Houzz (contractors), Avvo (lawyers), Healthgrades (doctors). These directories are structured, paginated, and often easier to scrape than general platforms. They also provide niche-specific data like service categories, pricing tiers, and certifications.
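Because directory pages are structured and paginated, a crawl usually reduces to two steps per page: collect the listing links, then follow the "next" link. The sketch below uses only the standard library and assumes hypothetical markup (listing links with class "listing-name" and a pagination link with rel="next"); real directories use their own markup, so the selectors would need adjusting.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class DirectoryPageParser(HTMLParser):
    """Collects listing links and the next-page link from one directory page.

    Assumed (hypothetical) markup: <a class="listing-name" href="..."> for each
    company and <a rel="next" href="..."> for pagination.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.listings = []                  # list of {"name": ..., "url": ...}
        self.next_url: Optional[str] = None
        self._in_listing = False            # True while inside a listing link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        classes = (attr.get("class") or "").split()
        rels = (attr.get("rel") or "").split()
        if "listing-name" in classes:
            self._in_listing = True
            self.listings.append({"name": "", "url": urljoin(self.base_url, href)})
        elif "next" in rels:
            self.next_url = urljoin(self.base_url, href)

    def handle_data(self, data):
        # Text inside a listing link is the company name.
        if self._in_listing:
            self.listings[-1]["name"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_listing = False


def parse_directory_page(html: str, base_url: str):
    """Return (listings, next_url) for one page of directory HTML."""
    parser = DirectoryPageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.listings, parser.next_url
```

A crawl loop would call parse_directory_page on each fetched page and stop when next_url is None.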

4. Crunchbase & AngelList

For tech and startup leads, Crunchbase provides funding data, key personnel, company descriptions, and technology stacks. AngelList (now Wellfound) focuses on startup hiring and investment data.

5. Business Registries & Government Data

State business registries (Secretary of State filings), SEC EDGAR for public companies, and SBA databases provide legally registered business data including registered agents, business type, and filing dates.

What Data to Extract

Data Field	Source	Use Case
Company Name	All sources	Core identifier
Website URL	Google Maps, Directories	Email enrichment, tech stack analysis
Phone Number	Google Maps, Directories	Outbound calling
Email Address	Directories, Website scraping	Email outreach
Industry / Category	LinkedIn, Directories	Segmentation
Employee Count	LinkedIn, Crunchbase	Company size filtering
Location	All sources	Geo-targeting
Revenue Estimate	Crunchbase, ZoomInfo	Lead scoring
Technology Stack	BuiltWith, Wappalyzer	Tech-based targeting
Social Profiles	Website scraping	Multi-channel outreach

Technical Challenges

Rate Limiting & IP Blocking

Google Maps, LinkedIn, and most directories implement aggressive rate limiting. Exceeding request thresholds results in CAPTCHAs, temporary bans, or permanent IP blocks. Residential proxy rotation is essential — datacenter IPs get flagged almost immediately on these platforms.
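One common way to stay under request thresholds is to back off exponentially, with random jitter, whenever the server signals throttling. The sketch below is a minimal illustration, not a tuned policy: the status codes treated as throttling (429 and 503), the base delay, and the cap are all assumptions, and fetch stands in for whatever function actually performs the request.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Exponential backoff with full jitter.

    Delay i is drawn uniformly from [0, min(cap, base * 2**i)].
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(retries)]


def fetch_with_backoff(fetch: Callable[[], int], retries: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() until it returns a non-throttled status or retries run out.

    fetch is a hypothetical callable that performs one request and returns
    the HTTP status code. The last status seen is returned either way.
    """
    status = fetch()
    for delay in backoff_delays(retries):
        if status not in (429, 503):   # assumed throttling signals
            break
        sleep(delay)
        status = fetch()
    return status
```

Passing sleep as a parameter keeps the function easy to test without real waiting.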

JavaScript-Rendered Content

LinkedIn and Google Maps render most of their content dynamically with JavaScript. A plain HTTP request often returns a shell page that lacks the listing data, so you need a headless browser (Playwright, Puppeteer) to render the full DOM before extracting anything.
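A cheap way to decide when a headless browser is needed is to try a plain request first and render only if the expected content is missing. The sketch below assumes Playwright for Python is installed for the rendering path; the marker string, the User-Agent value, and the function names are illustrative choices, not part of any platform's API.

```python
from urllib.request import Request, urlopen


def needs_rendering(html: str, marker: str) -> bool:
    """True when static HTML lacks text we expect on the fully rendered page."""
    return marker not in html


def fetch_static(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw HTML with a plain HTTP request (no JavaScript executed)."""
    req = Request(url, headers={"User-Agent": "lead-research-bot/0.1"})  # hypothetical UA
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_rendered(url: str) -> str:
    """Render the page in headless Chromium and return the resulting DOM as HTML."""
    # Imported lazily so the rest of the module works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def fetch_html(url: str, marker: str) -> str:
    """Use the static response when it already contains the marker, else render."""
    html = fetch_static(url)
    return fetch_rendered(url) if needs_rendering(html, marker) else html
```

Rendering is far slower than a plain request, so falling back only when the marker is absent keeps throughput reasonable on sites that serve some pages statically.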

Anti-Bot Detection

Modern platforms use browser fingerprinting, behavioral analysis, and honeypot traps. Your scraper needs to simulate realistic browsing patterns — random delays, mouse movements, and varied viewport sizes.

Data Quality & Deduplication

Raw scraped data is messy. The same business might appear in Google Maps, Yelp, and an industry directory with slightly different names or addresses. You need robust deduplication logic — fuzzy matching on company names, address normalization, and domain-based matching.
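The three matching strategies above can be combined into a single duplicate check. This is a minimal sketch using only the standard library: the record keys (name, website, zip), the list of legal suffixes, and the 0.88 similarity threshold are assumptions to be tuned against real data, and ZIP equality stands in for full address normalization.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Assumed list of suffixes to ignore when comparing names; extend as needed.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "company", "incorporated", "limited"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def domain_of(url: str) -> str:
    """Extract a bare domain (no scheme, port, or leading www.) from a URL."""
    if not url:
        return ""
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    host = host.split(":")[0]
    return host[4:] if host.startswith("www.") else host


def is_duplicate(a: dict, b: dict, threshold: float = 0.88) -> bool:
    """Treat two records as the same business when their domains match, or
    when their normalized names are similar and they share a ZIP code."""
    da, db = domain_of(a.get("website", "")), domain_of(b.get("website", ""))
    if da and da == db:
        return True
    na, nb = normalize_name(a.get("name", "")), normalize_name(b.get("name", ""))
    if not na or not nb:
        return False
    similar = SequenceMatcher(None, na, nb).ratio() >= threshold
    same_zip = bool(a.get("zip")) and a.get("zip") == b.get("zip")
    return similar and same_zip
```

Comparing every pair of records is quadratic, so larger datasets usually group records by domain or ZIP first and run the fuzzy comparison only within each group.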

Data Enrichment Pipeline

Raw scraped data is just the starting point. A production lead database needs enrichment:

Email Discovery: Use the company domain to find email patterns (first.last@company.com) and verify them with SMTP validation
Phone Verification: Validate phone numbers are active and identify line type (mobile vs landline)
Social Profile Matching: Link company data to LinkedIn, Twitter, and Facebook profiles
Technology Detection: Scrape company websites to identify their tech stack (CMS, analytics, marketing tools)
Revenue Estimation: Cross-reference employee count, industry, and public data to estimate company revenue
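The email discovery step above can start from a short list of common address patterns. The sketch below only generates candidates; the pattern list is an assumption rather than an exhaustive set, names with non-ASCII letters are not handled, and every candidate should be treated as unverified until a separate verification step confirms it.

```python
import re
from typing import List

# Assumed common patterns, most likely first; real companies may use others.
PATTERNS = (
    "{first}.{last}",
    "{f}{last}",
    "{first}{last}",
    "{first}_{last}",
    "{first}",
)


def candidate_emails(full_name: str, domain: str) -> List[str]:
    """Build unverified candidate addresses from a contact name and company domain.

    Uses the first and last name tokens only; middle names are ignored.
    """
    parts = re.findall(r"[a-z]+", full_name.lower())
    if len(parts) < 2 or not domain:
        return []
    first, last = parts[0], parts[-1]
    return [
        pattern.format(first=first, last=last, f=first[0]) + "@" + domain.lower()
        for pattern in PATTERNS
    ]
```

For example, candidate_emails("Jane Doe", "example.com") starts with jane.doe@example.com and jdoe@example.com.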

Structuring Your Database

A well-structured lead database should include these normalized fields:

Company record: Name, domain, industry, sub-industry, employee range, revenue range, location (city, state, country, ZIP)
Contact record: Full name, title, email (verified/unverified), phone, LinkedIn URL
Metadata: Source, scrape date, last verified date, confidence score

Store the records in a relational database such as PostgreSQL, with indexes on industry, location, and company size for fast filtering. Export to CSV or JSON for CRM import.
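The record layout above maps naturally onto two tables plus a few indexes. To keep the sketch self-contained it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table names, column names, and types are illustrative assumptions, and a PostgreSQL version would use its own types (for example timestamps and booleans).

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE company (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    domain TEXT UNIQUE,
    industry TEXT,
    sub_industry TEXT,
    employee_range TEXT,
    revenue_range TEXT,
    city TEXT, state TEXT, country TEXT, zip TEXT,
    source TEXT, scraped_at TEXT, last_verified_at TEXT, confidence REAL
);
CREATE TABLE contact (
    id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES company(id),
    full_name TEXT, title TEXT, email TEXT,
    email_verified INTEGER NOT NULL DEFAULT 0,
    phone TEXT, linkedin_url TEXT
);
CREATE INDEX idx_company_industry ON company(industry);
CREATE INDEX idx_company_location ON company(country, state, city);
CREATE INDEX idx_company_size ON company(employee_range);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and indexes in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def export_csv(conn: sqlite3.Connection, industry: str) -> str:
    """Return companies in one industry as CSV text with a header row."""
    cur = conn.execute(
        "SELECT name, domain, industry, city, state FROM company "
        "WHERE industry = ? ORDER BY name",
        (industry,),
    )
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()
```

The UNIQUE constraint on domain doubles as a last line of defense against duplicate companies slipping through deduplication.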

Legal & Ethical Considerations

B2B lead scraping occupies a legal gray area. Key guidelines:

Only scrape publicly available data — never bypass login walls or CAPTCHA-protected areas
Respect robots.txt directives and rate-limit your requests
Comply with GDPR if processing personal data of people in the EU, and document a lawful basis such as legitimate interest
Follow CAN-SPAM and TCPA regulations when using scraped data for outreach
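In hiQ v. LinkedIn (2022), the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act. The ruling did not resolve contract claims: the district court later found that hiQ had breached LinkedIn's User Agreement, so a site's terms of service still matter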
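The robots.txt guideline above is straightforward to enforce in code with Python's standard urllib.robotparser module. In this sketch the user agent string and the example URLs are hypothetical, and the robots.txt text is passed in directly so the check can run without a network call; a real crawler would fetch the file from the target site first.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def allowed(rp: RobotFileParser, url: str,
            user_agent: str = "lead-research-bot") -> bool:
    """True when the parsed rules permit this user agent to fetch the URL."""
    return rp.can_fetch(user_agent, url)


def crawl_delay(rp: RobotFileParser,
                user_agent: str = "lead-research-bot") -> Optional[float]:
    """Seconds to wait between requests if the site declares a Crawl-delay."""
    return rp.crawl_delay(user_agent)
```

Checking allowed before every request, and sleeping for at least crawl_delay seconds between requests when one is declared, covers both halves of that guideline.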

The Managed Solution

Building and maintaining lead scraping infrastructure is a significant engineering investment. Proxies, anti-bot bypass, data cleaning, enrichment, and ongoing maintenance add up quickly.

At Crawl-Data, we provide ready-to-use B2B lead databases and custom scraping services:

✅ Pre-built lead databases from $29 — segmented by industry, location, and company size
✅ Custom Google Maps scraping for any category and location
✅ Directory scraping across 50+ industry-specific platforms
✅ Email enrichment and verification included
✅ Weekly or monthly data refreshes
✅ CSV, JSON, or direct CRM integration

Need B2B Lead Data?

Tell us your target industry, location, and company size. We'll deliver a custom lead database.
