
Geospatial Data Engineer – Build Industrial Facility Dataset (OpenStreetMap + Python)

Upwork · US · Budget: Not specified · Intermediate
Skills: Python, Microsoft Excel, Data Science, Machine Learning, ETL Pipeline
Overview

We are building a facility intelligence dataset for industrial and logistics locations across the United States. The goal of this project is to extract industrial facilities (warehouses, manufacturing plants, distribution centers, logistics hubs) from OpenStreetMap, enrich them with business data, and produce a clean, structured dataset of facilities and operating companies. This dataset will ultimately support logistics market intelligence and supply chain analysis.

We are looking for an engineer with experience in:
- geospatial datasets
- OpenStreetMap / Overpass API
- Python data pipelines
- data enrichment workflows

This project is not simple web scraping. OSM data is messy, and facilities must be cleaned, deduplicated, and enriched properly.

Scope of Work

1. Extract Industrial Facilities from OpenStreetMap

Build a Python script using the Overpass API to extract facilities with tags such as:
- building=warehouse
- building=industrial
- industrial=manufacturing
- industrial=logistics
- landuse=industrial

The script should return:
- OSM ID
- coordinates
- geometry
- all relevant tags

Pilot regions will likely include Texas, California, and Midwest logistics hubs.

2. Clean and Normalize the Data

Raw OSM data contains significant noise. The pipeline should:
- remove irrelevant locations (scrap yards, utilities, empty industrial land, etc.)
- normalize tag structures
- convert OSM JSON into structured datasets
- handle nodes, ways, and relations correctly

Deduplication of facilities is important, since OSM can represent the same location in multiple ways.

3. Enrich Facilities with Business Data

Enrich each location with business information using APIs such as:
- Google Places
- business directories
- other enrichment sources where appropriate

Target fields: company, facility_type, industry, lat, lon, city, state.

Facility types should include categories like:
- manufacturing plant
- distribution center
- logistics hub
- warehouse
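To make the extraction step concrete, here is a minimal sketch of a possible approach, assuming the public Overpass endpoint and only the standard library; the tag filters come from the scope above, and the bounding box (a rough Dallas, TX area) is an illustrative placeholder, not part of the brief:

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public endpoint (assumption)

# Overpass QL union over the facility tags listed in the scope,
# restricted to a bounding box (south,west,north,east).
QUERY = """
[out:json][timeout:120];
(
  nwr["building"="warehouse"]({bbox});
  nwr["building"="industrial"]({bbox});
  nwr["industrial"="manufacturing"]({bbox});
  nwr["industrial"="logistics"]({bbox});
  nwr["landuse"="industrial"]({bbox});
);
out center;
""".format(bbox="32.6,-97.1,33.0,-96.5")  # placeholder Dallas-area bbox


def flatten(element: dict) -> dict:
    """Normalize a node/way/relation into one row: OSM ID, centroid, tags."""
    # Nodes carry lat/lon directly; `out center` attaches a centroid
    # to ways and relations under the "center" key.
    center = element.get("center", element)
    return {
        "osm_type": element["type"],
        "osm_id": element["id"],
        "lat": center.get("lat"),
        "lon": center.get("lon"),
        "tags": element.get("tags", {}),
    }


def fetch_facilities(query: str = QUERY, url: str = OVERPASS_URL) -> list[dict]:
    """POST the query to Overpass and flatten the response into rows."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        payload = json.load(resp)
    return [flatten(el) for el in payload.get("elements", [])]
```

The same `flatten` shape works for all three element types, which keeps the downstream cleaning and deduplication steps uniform.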
4. Deliver the Dataset and Pipeline

Deliverables should include:
- Python pipeline scripts
- documentation
- cleaned dataset

Preferred output formats: CSV, Parquet.

Technical Stack (Preferred)

- Python
- Pandas / GeoPandas
- Overpass API
- geospatial libraries
- API-based enrichment
- JSON processing

Experience with GIS, geospatial ETL pipelines, or location intelligence systems is a major plus.

Pilot Project

The initial phase will focus on 10–20 cities to validate the approach before scaling nationwide. If the pilot succeeds, we will expand to a full US dataset.

What We’re Looking For

Strong candidates will have experience with:
- geospatial data engineering
- OpenStreetMap datasets
- geospatial joins or spatial clustering
- facility deduplication
- enrichment pipelines
- handling API rate limits and scaling

To Apply

Please briefly answer the following:
1. Have you worked with OpenStreetMap datasets or the Overpass API before?
2. How would you deduplicate facilities when OSM represents the same location as nodes, ways, and polygons?
3. What enrichment sources would you use if Google Places does not return a business listing for a facility?

Project Details

- Pilot scope: 10–20 cities
- Estimated time: 20–40 hours
- Opportunity to expand to a nationwide dataset build if the pilot is successful.

Client

Spent: $4,042.25 · Rating: 4.8 · Verified