Data Engineer

Upwork · GB · Budget not specified · Expert · Score: 69
ETL Pipeline · Python · Data Science
Data Engineer - AI-Powered Real Estate Feasibility & Investment Intelligence Platform

Project Overview

We are developing a large-scale AI-powered real estate feasibility and investment intelligence platform designed to automate and enhance the evaluation of land and development opportunities. The platform integrates diverse data sources, including:
- Geospatial datasets (GIS layers, zoning maps, satellite imagery)
- Real estate transaction and pricing data
- Government planning and regulatory documents
- Construction cost benchmarks and engineering data
- Demographic and economic indicators
- Historical feasibility studies and financial models

The system will transform fragmented, heterogeneous data into structured intelligence that supports:
- Land-use recommendation
- Investment performance forecasting (ROI, IRR, NPV, payback period)
- Risk assessment and sensitivity modeling
- Scenario comparison
- Automated feasibility report generation

The Data Engineer will be responsible for building and managing the data infrastructure that powers the entire AI decision engine.

Role Overview

The Data Engineer will design, implement, and maintain scalable data pipelines and data architecture to support machine learning models, geospatial analytics, and financial simulations. This role requires strong experience handling both structured and unstructured data, building ETL/ELT workflows, and integrating multiple external data sources into a unified data environment.

Key Responsibilities

Data Architecture Design
- Design the end-to-end data architecture supporting AI, analytics, and reporting modules.
- Define data schemas for feasibility cases, financial projections, zoning constraints, and geospatial attributes.
- Establish data modeling standards for structured and semi-structured data.
- Implement data versioning strategies for feasibility studies and model outputs.

Data Ingestion and Integration
- Build data pipelines for ingesting structured and unstructured data from multiple sources.
- Integrate APIs from government datasets, GIS platforms, transaction portals, and financial benchmarks.
- Design ingestion workflows for document-based inputs (PDFs, scanned reports, spreadsheets).
- Implement data validation and cleansing processes.

Geospatial Data Handling
- Ingest and process GIS layers and zoning overlays.
- Implement geospatial feature engineering pipelines (a minimal sketch follows the responsibilities list below).
- Manage spatial indexing and coordinate systems.
- Integrate satellite and map-based data with structured financial datasets.

ETL / ELT Development
- Design automated ETL pipelines that transform raw inputs into AI-ready datasets.
- Build transformation workflows to support financial modeling and demand forecasting.
- Implement data aggregation and feature engineering logic for machine learning models.
- Ensure reliability and performance of recurring data refresh cycles.

Mobile and Behavioral Data (If Applicable)
- Design data ingestion strategies for anonymized mobility or movement datasets.
- Structure time-series and spatial movement data for AI consumption.
- Ensure privacy-preserving processing and compliance where required.

Data Quality and Governance
- Implement data quality checks and anomaly detection.
- Ensure consistency across integrated datasets.
- Maintain documentation of data lineage and transformations.
- Establish access control and governance standards.

Performance and Scalability
- Optimize database performance for high-volume queries.
- Implement indexing and caching strategies.
- Design scalable, cloud-native data architecture.
- Ensure high availability and fault tolerance.
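To make the geospatial responsibilities concrete, here is a minimal sketch of a zoning-overlay join using the GeoPandas stack listed later in this posting. The file names, output columns, and EPSG code are illustrative assumptions, not project specifics; a real pipeline would pick a projection appropriate to the region before measuring areas.

```python
# Minimal sketch: join a zoning overlay onto a parcel layer and derive one
# engineered feature. File names and columns below are hypothetical.
import geopandas as gpd

# Load a parcel layer and a zoning overlay (any GDAL-readable format works).
parcels = gpd.read_file("parcels.geojson")
zoning = gpd.read_file("zoning.shp")

# Normalise both layers to a single projected CRS before joining or measuring.
# EPSG:3857 is only a placeholder; use a locally accurate projection for areas.
parcels = parcels.to_crs(epsg=3857)
zoning = zoning.to_crs(epsg=3857)

# Spatial join: attach zoning attributes to every intersecting parcel,
# then derive a simple engineered feature (parcel footprint area).
joined = gpd.sjoin(parcels, zoning, how="left", predicate="intersects")
joined["parcel_area_m2"] = joined.geometry.area

# Persist an AI-ready, geometry-free table for downstream feature pipelines.
joined.drop(columns="geometry").to_parquet("parcel_features.parquet")
```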
Collaboration
- Work closely with AI engineers to prepare model-ready datasets.
- Support backend engineers with optimized query structures.
- Collaborate with domain experts to understand data semantics and context.

Required Qualifications
- Bachelor's or Master's degree in Data Engineering, Computer Science, Information Systems, or a related field.
- Minimum of 4 years of professional experience in data engineering or data infrastructure roles.
- Proven experience building production-grade data pipelines.
- Strong experience working with structured and semi-structured datasets.
- Experience handling large-scale datasets and multi-source integration.
- Strong understanding of database optimization and data modeling principles.

Technical Stack Expectations

Programming and Data Processing
- Python (advanced proficiency)
- Pandas and NumPy
- Experience with PySpark or other distributed processing frameworks (preferred)

Databases and Storage
- PostgreSQL (strongly preferred)
- Experience with relational database design and optimization
- Familiarity with spatial databases (PostGIS preferred)
- Redis (caching)
- Experience with data warehouses such as BigQuery or Snowflake (advantage)

Geospatial Technologies
- GeoPandas
- GDAL / Fiona
- Shapely
- Experience handling GIS datasets and spatial joins
- Familiarity with ArcGIS or QGIS APIs (advantage)

ETL and Orchestration
- Experience with workflow orchestration tools (e.g., Airflow or similar)
- ETL pipeline design and automation
- Experience with data validation frameworks

Cloud and Infrastructure
- Experience with GCP, Azure, or AWS
- Cloud storage and managed database services
- Docker and containerized data services
- CI/CD integration for data workflows

Security and Compliance
- Understanding of data encryption standards
- Role-based access controls
- Data anonymization techniques (where required)
- Familiarity with regulated data environments is a plus

Preferred Additional Competencies
- Experience in real estate technology, fintech, or investment analytics platforms
- Familiarity with financial modeling data structures (an illustrative cash-flow metrics sketch follows below)
- Experience working in AI-driven systems
- Exposure to large document ingestion pipelines
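The project overview names ROI, IRR, NPV, and payback period as forecasting outputs. Below is a minimal, dependency-free Python sketch of how those metrics fall out of a single cash-flow series; the function names and the sample figures are illustrative assumptions, and a production system would typically rely on a vetted financial library rather than hand-rolled routines.

```python
# Hypothetical illustration of the feasibility metrics named in the brief,
# computed from one cash-flow list where flows[0] is the initial outlay.

def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value at a given discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows: list[float], lo: float = -0.99, hi: float = 10.0) -> float:
    """Internal rate of return via bisection on NPV(rate) = 0.
    Assumes a single sign change in the cash-flow series."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(lo, cash_flows) * npv(mid, cash_flows) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def payback_period(cash_flows: list[float]):
    """First period at which cumulative cash flow turns non-negative."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None

# Illustrative numbers only: 1.0M outlay, four years of net inflows.
flows = [-1_000_000, 250_000, 300_000, 350_000, 400_000]
roi = (sum(flows[1:]) + flows[0]) / -flows[0]
print(f"ROI={roi:.1%}  NPV@8%={npv(0.08, flows):,.0f}  "
      f"IRR={irr(flows):.1%}  payback={payback_period(flows)} yrs")
```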

Client

Spent: $70,443.83 · Rating: 5.0 · Verified