
AI + Data Automation Project: Build Enriched School Directory Dataset (2,400 Schools)

Upwork | US | Not specified | Intermediate
Data Scraping, Python, Web Scraping, Data Extraction, Automation, Artificial Intelligence, Data Processing
Project Overview

We are building a comprehensive online directory of schools in Indiana and are seeking a developer to help automate the enrichment of a base dataset. We currently have a spreadsheet of approximately 2,400 schools sourced from the Indiana Department of Education that includes:

• School name
• Address
• City / State / ZIP
• Website URL (present for ~95% of schools)
• Basic identifying information

The goal of this project is to programmatically enrich the dataset, generate standardized content for each school, and collect associated images so the data can be imported into a website directory. This is a fixed-scope automation project, not a manual research task.

⸻

Technical Context

To help clarify scope:

• A base spreadsheet of ~2,400 schools will be provided
• ~95% already include school website URLs
• Many school logos already exist on another website and can likely be reused
• The objective is to build an automated pipeline, not manual data entry
• Public datasets may be used to populate structured fields before scraping school websites

Developers may use any combination of:

• public education datasets
• web scraping
• LLM extraction
• automation pipelines

⸻

Scope of Work

1. Dataset Enrichment

Using school websites and publicly available education datasets, populate additional structured fields where available. Examples include:

• Grade levels served
• Enrollment size
• School type (public, charter, private)
• Academic highlights or indicators
• Student demographics (if available)
• Extracurricular programs
• Updated contact information

Public datasets (DOE, NCES, etc.) may be used to populate factual fields where possible. School websites should then be used to extract descriptive information.

⸻

2. AI Content Generation

Generate four original content sections (~180–220 words each) for each school using the collected information. Sections include:

1. School Overview
2. Academics & Performance
3. Student Life & Activities
4. Community & Student Body

Requirements:

• Content must be original and not copied from school websites
• Informational and neutral tone suitable for a directory
• Consistent formatting across schools

⸻

3. Image Collection

School Logos
Collect a square logo image for each school. Note: There is an existing website containing a large number of school logos already compiled, which can likely be used as a primary source.

Banner Images
Collect a landscape banner image representing the school (campus, building, students, etc.) where available. Images should be web quality and reasonably cropped.

⸻

4. Data Formatting

Final dataset should be structured and ready for import into a CMS directory. Preferred formats:

• Excel (.xlsx)
• CSV
• or structured JSON

Each school record should include:

• Original dataset fields
• Enriched data fields
• Four generated content sections
• Logo image or URL
• Banner image or URL

⸻

Dataset Size

Approximately 2,400 schools

⸻

Deliverables

1. Enriched dataset for all schools
2. Four generated content sections per school
3. Logo images or URLs
4. Banner images or URLs
5. Clean spreadsheet ready for CMS import

⸻

Timeline

Preferred completion within 2–4 weeks.

⸻

Ideal Experience

We are looking for developers experienced with:

• data scraping and automation
• dataset enrichment pipelines
• AI content generation workflows
• Python-based data processing

Experience with education datasets or directory websites is a plus.

⸻

Proposal Requirements

Please include:

1. A brief description of your technical approach
2. Estimated timeline
3. Fixed project cost
4. Example of a similar scraping or data automation project

⸻

Important

To confirm you have read the project description, please begin your proposal by answering this question:

Given that we already have website URLs for ~95% of schools, what approach would you use to efficiently extract structured information from those sites?

Generic proposals that do not answer this question will not be considered.
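
⸻

Illustrative Record Sketch

The record layout described under Data Formatting, together with the ~180–220 word target for each content section, could be sketched in Python roughly as follows. This is only a minimal sketch and not part of the project specification: the `SchoolRecord` class, every column name, and the two helper functions are hypothetical, since the listing names the field categories but not the actual column headers.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Hypothetical column names for the four generated content sections.
SECTION_FIELDS = (
    "school_overview",
    "academics_performance",
    "student_life_activities",
    "community_student_body",
)


@dataclass
class SchoolRecord:
    # Original dataset fields (names assumed for illustration)
    name: str
    address: str
    city: str
    state: str
    zip_code: str
    website_url: str = ""
    # A few example enriched fields (assumed subset)
    grade_levels: str = ""
    enrollment: str = ""
    school_type: str = ""
    # Four generated content sections
    school_overview: str = ""
    academics_performance: str = ""
    student_life_activities: str = ""
    community_student_body: str = ""
    # Image references
    logo_url: str = ""
    banner_url: str = ""


def section_length_issues(record, low=180, high=220):
    """Return the names of content sections whose word count is outside [low, high]."""
    issues = []
    for section in SECTION_FIELDS:
        word_count = len(getattr(record, section).split())
        if not low <= word_count <= high:
            issues.append(section)
    return issues


def to_csv(records):
    """Serialize records to CSV text with one column per dataclass field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SchoolRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

A check such as `section_length_issues(record)` could run before export so that sections outside the target length are flagged for regeneration rather than discovered after CMS import. The same records could be written to .xlsx or JSON instead, as the listing allows any of the three formats.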