Data Engineer
Skills: ETL Pipeline, Python, Data Science
Data Engineer - AI-Powered Real Estate Feasibility & Investment Intelligence Platform
Project Overview
We are developing a large-scale, AI-powered real estate feasibility and investment intelligence platform designed to automate and enhance the evaluation of land and development opportunities.
The platform integrates diverse data sources, including:
Geospatial datasets (GIS layers, zoning maps, satellite imagery)
Real estate transaction and pricing data
Government planning and regulatory documents
Construction cost benchmarks and engineering data
Demographic and economic indicators
Historical feasibility studies and financial models
The system will transform fragmented and heterogeneous data into structured intelligence that supports:
Land-use recommendation
Investment performance forecasting (ROI, IRR, NPV, payback period)
Risk assessment and sensitivity modeling
Scenario comparison
Automated feasibility report generation
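As a rough illustration of the investment metrics named above, here is a minimal Python sketch computing NPV, payback period, and IRR for a hypothetical cash-flow series (the figures and function names are invented for illustration, not project code):

```python
# Illustrative only: the core investment metrics the platform forecasts,
# computed for a made-up five-period cash-flow series.

def npv(rate, cashflows):
    """Net present value; cashflows[0] is the initial (usually negative) outlay."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def payback_period(cashflows):
    """Number of periods until cumulative cash flow turns non-negative."""
    total = 0.0
    for t, cf in enumerate(cashflows):
        total += cf
        if total >= 0:
            return t
    return None  # outlay never recovered

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-7):
    """Internal rate of return via bisection on NPV(rate) = 0.

    Assumes the usual shape (negative outlay, positive inflows), so NPV
    is strictly decreasing in the rate and bisection is valid.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

flows = [-1_000_000, 300_000, 350_000, 400_000, 450_000]
print(npv(0.10, flows))        # NPV at a 10% discount rate
print(payback_period(flows))   # periods until the outlay is recovered
print(irr(flows))              # approximate internal rate of return
```

In practice these calculations would sit behind the platform's financial simulation layer; the sketch only shows the arithmetic the forecasting module automates.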
The Data Engineer will be responsible for building and managing the data infrastructure that powers the entire AI decision engine.
Role Overview
The Data Engineer will design, implement, and maintain scalable data pipelines and data architecture to support machine learning models, geospatial analytics, and financial simulations.
This role requires strong experience in handling both structured and unstructured data, building ETL/ELT workflows, and integrating multiple external data sources into a unified data environment.
Key Responsibilities
Data Architecture Design
Design the end-to-end data architecture supporting AI, analytics, and reporting modules.
Define data schemas for feasibility cases, financial projections, zoning constraints, and geospatial attributes.
Establish data modeling standards for structured and semi-structured data.
Implement data versioning strategies for feasibility studies and model outputs.
Data Ingestion and Integration
Build data pipelines for ingesting structured and unstructured data from multiple sources.
Integrate APIs from government datasets, GIS platforms, transaction portals, and financial benchmarks.
Design ingestion workflows for document-based inputs (PDFs, scanned reports, spreadsheets).
Implement data validation and cleansing processes.
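A validation-and-cleansing step of the kind described above might look like the following pandas sketch; the column names and rules are hypothetical, chosen only to show the split between clean and rejected rows:

```python
# Minimal validation/cleansing sketch (hypothetical schema): flag rows
# that fail basic rules before they enter the unified data environment.
import pandas as pd

RULES = {
    "parcel_id": lambda s: s.notna(),
    "sale_price": lambda s: s.gt(0),
    "latitude": lambda s: s.between(-90, 90),
    "longitude": lambda s: s.between(-180, 180),
}

def validate(df: pd.DataFrame):
    """Split a frame into (clean, rejected) according to RULES."""
    mask = pd.Series(True, index=df.index)
    for col, rule in RULES.items():
        mask &= rule(df[col]).fillna(False)
    return df[mask], df[~mask]

raw = pd.DataFrame({
    "parcel_id": ["A1", "A2", None],
    "sale_price": [250_000, -5, 410_000],
    "latitude": [24.7, 24.8, 24.9],
    "longitude": [46.7, 46.8, 46.9],
})
clean, rejected = validate(raw)
print(len(clean), len(rejected))  # one row passes, two are rejected
```

Real pipelines would typically use a dedicated validation framework (as noted under the stack expectations below) and route rejected rows to a quarantine table rather than dropping them.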
Geospatial Data Handling
Ingest and process GIS layers and zoning overlays.
Implement geospatial feature engineering pipelines.
Manage spatial indexing and coordinate systems.
Integrate satellite and map-based data with structured financial datasets.
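To make the zoning-overlay work above concrete, here is a hedged sketch of a point-in-polygon zoning lookup using Shapely (which appears in the stack expectations); the polygons, coordinates, and zone codes are invented, and production code would operate on real GIS layers with proper coordinate reference systems:

```python
# Illustrative zoning lookup: which zoning polygon contains a parcel
# centroid? Geometries and zone codes are hypothetical.
from shapely.geometry import Point, Polygon

# Two toy zoning polygons in an assumed projected coordinate system.
zones = {
    "R-1 residential": Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
    "C-2 commercial": Polygon([(10, 0), (20, 0), (20, 10), (10, 10)]),
}

def zone_for(parcel_centroid):
    """Return the zoning code whose polygon contains the parcel centroid."""
    for code, poly in zones.items():
        if poly.contains(parcel_centroid):
            return code
    return None

print(zone_for(Point(3, 4)))   # inside the residential polygon
print(zone_for(Point(15, 5)))  # inside the commercial polygon
```

At scale the same lookup would be done as a spatial join (GeoPandas `sjoin` or a PostGIS query against a spatial index) rather than a Python loop over polygons.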
ETL / ELT Development
Design automated ETL pipelines for transforming raw inputs into AI-ready datasets.
Build transformation workflows to support financial modeling and demand forecasting.
Implement data aggregation and feature engineering logic for machine learning models.
Ensure reliability and performance of recurring data refresh cycles.
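The aggregation and feature-engineering logic mentioned above can be sketched with pandas; the transaction columns and district names are hypothetical, and the output stands in for the AI-ready dataset an ML model would consume:

```python
# Illustrative feature-engineering step: aggregate raw transactions
# (hypothetical columns) into per-district features for modeling.
import pandas as pd

tx = pd.DataFrame({
    "district": ["North", "North", "South", "South", "South"],
    "sale_price": [300_000, 340_000, 210_000, 190_000, 200_000],
    "area_sqm": [120, 130, 100, 95, 98],
})

features = (
    tx.assign(price_per_sqm=tx["sale_price"] / tx["area_sqm"])
      .groupby("district")
      .agg(
          median_price=("sale_price", "median"),
          median_price_per_sqm=("price_per_sqm", "median"),
          n_transactions=("sale_price", "size"),
      )
      .reset_index()
)
print(features)
```

In the production pipeline a transformation like this would run as an orchestrated task (e.g., an Airflow DAG step) on each data refresh cycle, writing its output to the warehouse for the modeling layer.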
Mobile and Behavioral Data (If Applicable)
Design data ingestion strategies for anonymized mobility or movement datasets.
Structure time-series and spatial movement data for AI consumption.
Ensure privacy-preserving processing and compliance where required.
Data Quality and Governance
Implement data quality checks and anomaly detection.
Ensure consistency across integrated datasets.
Maintain documentation of data lineage and transformations.
Establish access control and governance standards.
Performance and Scalability
Optimize database performance for high-volume queries.
Implement indexing and caching strategies.
Design scalable cloud-native data architecture.
Ensure high availability and fault tolerance.
Collaboration
Work closely with AI engineers to prepare model-ready datasets.
Support backend engineers with optimized query structures.
Collaborate with domain experts to understand data semantics and contextualization.
Required Qualifications
Bachelor’s or Master’s degree in Data Engineering, Computer Science, Information Systems, or related field.
At least 4 years of professional experience in data engineering or data infrastructure roles.
Proven experience building production-grade data pipelines.
Strong experience working with structured and semi-structured datasets.
Experience handling large-scale datasets and multi-source integration.
Strong understanding of database optimization and data modeling principles.
Technical Stack Expectations
Programming and Data Processing
Python (advanced proficiency)
Pandas and NumPy
Experience with PySpark or distributed processing frameworks (preferred)
Databases and Storage
PostgreSQL (strongly preferred)
Experience with relational database design and optimization
Familiarity with spatial databases (PostGIS preferred)
Redis (caching)
Experience with data warehouses such as BigQuery or Snowflake (advantage)
Geospatial Technologies
GeoPandas
GDAL / Fiona
Shapely
Experience handling GIS datasets and spatial joins
Familiarity with ArcGIS or QGIS APIs (advantage)
ETL and Orchestration
Experience with workflow orchestration tools (e.g., Airflow or similar)
ETL pipeline design and automation
Experience with data validation frameworks
Cloud and Infrastructure
Experience with GCP, Azure, or AWS
Cloud storage and managed database services
Docker and containerized data services
CI/CD integration for data workflows
Security and Compliance
Understanding of data encryption standards
Role-based access controls
Data anonymization techniques (where required)
Familiarity with regulated data environments is a plus
Preferred Additional Competencies
Experience in real estate technology, fintech, or investment analytics platforms
Familiarity with financial modeling data structures
Experience working in AI-driven systems
Exposure to large document ingestion pipelines
Client: Verified · Spent: $70,443.83 · Rating: 5.0