Data Augmentation

Upwork | US | Not specified | intermediate | Score: 54
Data Mining, Data Scraping, Data Analysis, Data Entry, Data Extraction
# Upwork Posting: CA C-57 Well Driller Data Verification

## Job Title

Data Research Specialist: Verify 700+ California C-57 Well Driller Records (Websites + Service Counties)

## Project Summary

We run a public directory for California well drilling contractors. We already have an exported dataset from official CSLB-derived records and automated enrichment, but we need a meticulous human verification pass before launch.

Your job is to review each row and improve trust quality:

- confirm the correct company website (or mark no website),
- verify whether the entity is truly a well drilling contractor (not only consulting/testing),
- extract and normalize California service counties from credible evidence.

Accuracy matters more than speed.

## Scope

- Total rows: ~717 license records
- Dataset includes:
  - license and business identity fields,
  - website candidate fields (sometimes blank or low confidence),
  - extracted service areas (sometimes blank),
  - review queue hints and metadata.

## Source of Truth Rules

1. The contractor license list comes from California CSLB C-57 data.
2. Website/service-area verification must be based on:
   - official company website,
   - official branch/location pages,
   - clear service-area map/text on website.
3. Do not use third-party directory guesses as final truth unless corroborated by company-owned pages.

## Required Deliverables

Provide **three CSV files**:

1. `verified_websites.csv`
   - `license_number`
   - `business_name`
   - `final_website_url`
   - `website_status` (`verified_website` | `no_public_website` | `needs_escalation`)
   - `verification_confidence` (`high` | `medium` | `low`)
   - `evidence_url`
   - `notes`
2. `verified_service_areas.csv`
   - `license_number`
   - `business_name`
   - `area_type` (`county` or `city`)
   - `normalized_area` (e.g., `Fresno County`, `Bakersfield`)
   - `state` (`CA`)
   - `source_page_url`
   - `evidence_snippet` (short quote from page)
   - `verification_confidence` (`high` | `medium` | `low`)
   - `notes`
3. `exceptions_escalation.csv`
   - `license_number`
   - `business_name`
   - `issue_type` (`ambiguous_website`, `not_well_driller`, `conflicting_service_claims`, `insufficient_evidence`, etc.)
   - `details`
   - `recommended_next_step`

## Detailed Instructions

### 1) Website Verification

For each license:

- Confirm the website domain matches the business brand/name.
- Reject mismatched domains (different company with similar name).
- If multiple locations/phone numbers exist under same brand, keep as same brand unless clearly unrelated.
- If no reliable public website exists, set `website_status = no_public_website`.

### 2) Well Drilling Relevance Check

We only want actual well drilling contractors.

- Accept if evidence clearly indicates drilling/well drilling services.
- Flag and escalate if company appears to be only:
  - environmental consulting,
  - lab/testing only,
  - geotech consulting only,
  - unrelated trades.

### 3) Service Area Extraction

Priority order:

1. Explicit county list on website.
2. Service map that clearly corresponds to counties.
3. Explicit city list that can be mapped to counties.

Always prefer county-level coverage when possible.

### 4) County Normalization Rules (Critical)

- Normalize to official California county names with suffix `County`.
- Example: `fresno` = `Fresno County`.
- If website says “Bay Area,” do **not** guess all Bay counties; only include counties directly supported by page evidence.
- If map is vague, include only clearly inferable counties and set confidence to `medium` or `low`.

### 5) Evidence Requirement

Every verified website/service-area row must include:

- `source_page_url`
- short `evidence_snippet` from that page (or explicit note why unavailable)

No evidence = no acceptance.

## Quality Bar (Acceptance Criteria)

Work is accepted only if:

1. CSV format is valid and consistent (no broken delimiters/headers).
2. Greater than or equal to 98% of rows contain complete required fields for chosen status.
3. Random QA sample of 75 rows has:
   - less than or equal to 2 minor issues
   - 0 critical mismatches (wrong company site, wrong county assignment)
4. All uncertain records are cleanly routed into `exceptions_escalation.csv`.

## Suggested Workflow for Contractor

1. Work in batches of 100 rows.
2. Submit first 25-row pilot for calibration.
3. Incorporate feedback.
4. Continue with full dataset in checkpoints.

## What We Provide

- Base CSV export from our current database.
- Definitions and examples.
- Clarification on edge cases during project.

## Proposal Requirements

When applying, include:

1. Similar data-cleaning/lead-enrichment project examples.
2. Your QA method to prevent wrong website associations.
3. Expected turnaround time for 717 rows.
4. Confirmation you can deliver strict evidence-backed county normalization.

## Nice-to-Have Skills

- Experience with contractor/service directories
- Strong web research and entity resolution skills
- Ability to reason about geography and county boundaries
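
## Illustrative Sketch: County Normalization

The county normalization rules in section 4 could be sketched roughly as below. This is only an illustration under stated assumptions, not part of the provided tooling: the function name `normalize_county` and the abbreviated `CA_COUNTIES` set are hypothetical, and a real pass would load all 58 California counties.

```python
from typing import Optional

# Abbreviated, hypothetical lookup set; a real run would list all 58 counties.
CA_COUNTIES = {"Fresno", "Kern", "Tulare", "Madera", "Kings", "San Joaquin"}

# Case-insensitive index from lowercase name to official spelling.
_BY_LOWER = {name.lower(): name for name in CA_COUNTIES}


def normalize_county(raw: str) -> Optional[str]:
    """Return '<Name> County' for a recognized county, else None.

    Regional labels such as 'Bay Area' are not counties, so they return
    None instead of being expanded into a guessed list of counties.
    """
    # Collapse repeated whitespace and ignore case.
    text = " ".join(raw.strip().split()).lower()
    # Drop an existing 'county' suffix so both input forms match.
    if text.endswith(" county"):
        text = text[: -len(" county")]
    name = _BY_LOWER.get(text)
    return f"{name} County" if name else None
```

A `None` result signals that the value needs page evidence or belongs in `exceptions_escalation.csv`; it should not be silently dropped.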

Client

Spent: $2,953.23 | Rating: 5.0 | Verified