Python Developer — Rule-Based Text Parser (No ML)

Upwork | United States | $700 fixed | Intermediate
Python, Python Script, Automation
I need a Python developer to debug and improve an existing rule-based text parser. The parser extracts verb|complement pairs from occupational task statements.

Example: "Direct, plan, or implement policies, objectives, or activities" should yield:
direct|policies, plan|policies, implement|policies, direct|objectives, plan|objectives, implement|objectives, direct|activities, plan|activities, implement|activities

The parser produces a Cartesian product of all leading verbs against all direct object complements.

Requirements:
* Pure Python only: no spaCy, no NLTK, no ML libraries of any kind
* All parsing must be deterministic and rule-based
* Existing codebase provided (approximately 400 lines of Python)
* Input: 19,530 structured task statements in Excel
* Output: verb|complement pairs in Excel

The problem: The current extractor has bugs producing wrong complements in edge cases. Rules exist but are not being applied correctly in all cases.

Deliverables:
* A corrected main extraction script for the bulk of task statements (currently v9)
* A second-pass script for the or-without-comma condition (approximately 3,055 tasks)
* A second-pass script for the in-location-tail condition (approximately 647 tasks)
* A test suite of known-correct cases verified against expected verb|complement output

All scripts produce accurate verb|complement extractions across the full corpus, with a test suite of known-correct cases verified against Reed-Kellogg sentence diagramming rules. All scripts pure Python, deterministic, no ML.

You must be comfortable with:
* Formal grammar and string parsing
* Regular expressions
* Debugging someone else's Python

Scripts MUST produce output conforming to Reed-Kellogg sentence diagramming rules. NO ML libraries of any kind.
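
For orientation, the Cartesian-product behaviour in the example could be sketched roughly as below. This is an illustration only, not the existing v9 code: the function name `extract_pairs` and the tiny verb lexicon `LEADING_VERBS` are hypothetical, and the sketch assumes the leading verbs can be recognised from a fixed word list and that complements are single coordinated items separated by commas, "or", or "and". It uses only the standard library.

```python
import re
from itertools import product

# Hypothetical, hand-maintained verb lexicon (an assumption for this sketch;
# the real parser would need a far larger list or different rules).
LEADING_VERBS = {"direct", "plan", "implement"}
CONJUNCTIONS = {"or", "and"}

# Separator between coordinated complements: a comma (optionally followed by
# "or"/"and"), or a bare "or"/"and" surrounded by whitespace.
SEPARATOR = re.compile(r"\s*,\s*(?:(?:or|and)\s+)?|\s+(?:or|and)\s+")


def extract_pairs(statement):
    """Return verb|complement pairs for one task statement."""
    # Tokenise into lowercase words and commas; other punctuation is dropped.
    tokens = re.findall(r"[a-z'-]+|,", statement.lower())

    # Collect leading verbs, skipping commas and conjunctions, and stop at
    # the first token that is neither (taken to start the complement list).
    verbs = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in LEADING_VERBS:
            verbs.append(token)
        elif token != "," and token not in CONJUNCTIONS:
            break
        i += 1

    # Everything after the leading verbs is treated as the complement list.
    rest = " ".join(tokens[i:])
    complements = [c.strip() for c in SEPARATOR.split(rest) if c.strip()]

    # Complements vary slowest, matching the ordering in the example above.
    return [f"{verb}|{comp}" for comp, verb in product(complements, verbs)]


if __name__ == "__main__":
    text = "Direct, plan, or implement policies, objectives, or activities"
    print(", ".join(extract_pairs(text)))
```

Run on the example statement, this yields the nine pairs listed above in the same order. It does not attempt the or-without-comma or in-location-tail conditions, which the listing treats as separate second-pass scripts.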