Geospatial Data Engineer – Build Industrial Facility Dataset (OpenStreetMap + Python)
Upwork · US · Budget not specified · Intermediate
Skills: Python · Microsoft Excel · Data Science · Machine Learning · ETL Pipeline
Overview
We are building a facility intelligence dataset for industrial and logistics locations across the United States.
The goal of this project is to extract industrial facilities (warehouses, manufacturing plants, distribution centers, logistics hubs) from OpenStreetMap, enrich them with business data, and produce a clean structured dataset of facilities and operating companies.
This dataset will ultimately support logistics market intelligence and supply chain analysis.
We are looking for an engineer with experience in:
Geospatial datasets
OpenStreetMap / Overpass API
Python data pipelines
Data enrichment workflows
This project is not simple web scraping. OSM data is messy and facilities must be cleaned, deduplicated, and enriched properly.
Scope of Work
1. Extract Industrial Facilities from OpenStreetMap
Build a Python script using the Overpass API to extract facilities with tags such as:
building=warehouse
building=industrial
industrial=manufacturing
industrial=logistics
landuse=industrial
The script should return:
OSM ID
coordinates
geometry
all relevant tags
Pilot regions will likely include Texas, California, and Midwest logistics hubs.
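As a sketch of this extraction step, a minimal Overpass pull might look like the following. The endpoint and Overpass QL syntax are standard; the pilot area ("Texas") and the `parse_elements` helper are illustrative choices, not a prescribed implementation:

```python
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public Overpass endpoint

# Overpass QL: pull the target tags inside one pilot admin area.
# "Texas" is an example region; swap in any pilot area name.
QUERY = """
[out:json][timeout:180];
area["name"="Texas"]["admin_level"="4"]->.a;
(
  nwr["building"="warehouse"](area.a);
  nwr["building"="industrial"](area.a);
  nwr["industrial"~"^(manufacturing|logistics)$"](area.a);
  nwr["landuse"="industrial"](area.a);
);
out tags center;
"""

def parse_elements(elements):
    """Flatten raw Overpass elements into rows with OSM ID, coordinates, and tags."""
    rows = []
    for el in elements:
        # Nodes carry lat/lon directly; `out center` adds a centroid to ways/relations.
        center = el.get("center", {})
        rows.append({
            "osm_id": f"{el['type']}/{el['id']}",
            "lat": el.get("lat", center.get("lat")),
            "lon": el.get("lon", center.get("lon")),
            "tags": el.get("tags", {}),
        })
    return rows

def fetch_facilities():
    resp = requests.post(OVERPASS_URL, data={"data": QUERY}, timeout=300)
    resp.raise_for_status()
    return parse_elements(resp.json()["elements"])
```

Using `out tags center` keeps the response small while still giving a representative point for every way and relation; full geometry can be fetched later with `out geom` for the facilities that survive cleaning.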
2. Clean and Normalize the Data
Raw OSM data contains significant noise.
The pipeline should:
Remove irrelevant locations (scrap yards, utilities, empty industrial land, etc.)
Normalize tag structures
Convert OSM JSON into structured datasets
Handle nodes, ways, and relations correctly
Deduplication of facilities is important, since OSM can represent the same location in multiple ways (e.g. as both a point node and a building polygon).
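One simple way to attack that duplication is a coordinate-snapping heuristic: collapse elements that land in the same small grid cell, preferring ways/relations (which carry footprint geometry) over bare nodes. This is a sketch, not a full solution; a production pipeline would likely use proper spatial clustering (GeoPandas spatial joins or DBSCAN). It assumes rows shaped like the extraction output above (`osm_id`, `lat`, `lon`, `tags`):

```python
def dedupe_facilities(rows, precision=4):
    """Collapse OSM elements that likely represent the same facility.

    Heuristic: round coordinates to `precision` decimal places (~11 m at 4)
    and keep one element per cell, ranking relations over ways over nodes
    so footprint geometry wins over bare points.
    """
    rank = {"relation": 0, "way": 1, "node": 2}
    best = {}
    for row in rows:
        if row["lat"] is None or row["lon"] is None:
            continue  # elements without a usable point can't be grid-snapped
        key = (round(row["lat"], precision), round(row["lon"], precision))
        kind = row["osm_id"].split("/")[0]
        if key not in best or rank[kind] < rank[best[key]["osm_id"].split("/")[0]]:
            best[key] = row
    return list(best.values())
```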
3. Enrich Facilities with Business Data
Enrich each location with business information using APIs such as:
Google Places
business directories
other enrichment sources if appropriate
Goal fields:
company
facility_type
industry
lat
lon
city
state
Facility types should include categories like:
manufacturing plant
distribution center
logistics hub
warehouse
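A minimal enrichment sketch using the Google Places Nearby Search endpoint might look like the following. The endpoint and parameters follow the public Places API, but the `GOOGLE_PLACES_KEY` environment variable and the keyword-to-category mapping are assumptions for illustration, not a definitive classifier:

```python
import os
import requests

PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

# Hypothetical keyword mapping onto the target facility_type categories.
FACILITY_KEYWORDS = {
    "distribution": "distribution center",
    "logistics": "logistics hub",
    "manufactur": "manufacturing plant",
}

def classify_facility(name, default="warehouse"):
    """Map a business name onto one of the goal facility_type categories."""
    lowered = name.lower()
    for keyword, label in FACILITY_KEYWORDS.items():
        if keyword in lowered:
            return label
    return default

def lookup_business(lat, lon, radius_m=150):
    """Find the business operating at a facility's coordinates."""
    resp = requests.get(PLACES_URL, params={
        "location": f"{lat},{lon}",
        "radius": radius_m,
        "key": os.environ["GOOGLE_PLACES_KEY"],  # assumed env var name
    }, timeout=30)
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None  # fall back to other directories / enrichment sources here
    name = results[0]["name"]
    return {"company": name, "facility_type": classify_facility(name)}
```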
4. Deliver the Dataset and Pipeline
Deliverables should include:
Python pipeline scripts
documentation
cleaned dataset
Preferred output formats:
CSV
Parquet
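Writing both formats from the goal schema is a one-liner each with pandas; this sketch assumes records shaped like the goal fields listed above (Parquet output needs pyarrow or fastparquet installed):

```python
import pandas as pd

GOAL_COLUMNS = ["company", "facility_type", "industry", "lat", "lon", "city", "state"]

def to_frame(records):
    """Project enriched records onto the goal schema (missing fields become NaN)."""
    return pd.DataFrame(records, columns=GOAL_COLUMNS)

def write_outputs(records, stem="facilities"):
    df = to_frame(records)
    df.to_csv(f"{stem}.csv", index=False)
    df.to_parquet(f"{stem}.parquet", index=False)  # requires pyarrow or fastparquet
    return df
```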
Technical Stack (Preferred)
Python
Pandas / GeoPandas
Overpass API
Geospatial libraries
API-based enrichment
JSON processing
Experience with GIS, geospatial ETL pipelines, or location intelligence systems is a major plus.
Pilot Project
Initial phase will focus on 10–20 cities to validate the approach before scaling nationwide.
If the pilot succeeds, we will expand to a full US dataset.
What We’re Looking For
Strong candidates will have experience with:
geospatial data engineering
OpenStreetMap datasets
geospatial joins or spatial clustering
facility deduplication
enrichment pipelines
handling API rate limits and scaling
To Apply
Please briefly answer the following:
Have you worked with OpenStreetMap datasets or Overpass API before?
How would you deduplicate facilities when OSM represents the same location as nodes, ways, and polygons?
What enrichment sources would you use if Google Places does not return a business listing for a facility?
Project Details
Pilot scope: 10–20 cities
Estimated time: 20–40 hours
Opportunity to expand to nationwide dataset build if pilot is successful.
Client: Spent $4,042.25 · Rating 4.8 · Verified