Python Developer — Rule-Based Text Parser (No ML)
Upwork, United States, $700 fixed, intermediate
Python, Python Script, Automation
I need a Python developer to debug and improve an existing rule-based text parser. The parser extracts verb|complement pairs from occupational task statements.
Example: "Direct, plan, or implement policies, objectives, or activities" should yield:
direct|policies, plan|policies, implement|policies, direct|objectives, plan|objectives, implement|objectives, direct|activities, plan|activities, implement|activities
The parser produces a Cartesian product of all leading verbs against all direct object complements.
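For illustration only, here is a minimal sketch of the pairing step described above. It is not the existing v9 code. The helper names `split_coordinated` and `pair_up` are hypothetical, and the boundary between the verb list and the complement list is supplied by hand, since locating that boundary is the job of the parser's rules.

```python
import re
from itertools import product

# Hypothetical helper: split a coordinated list such as "a, b, or c".
# Assumes items are joined only by commas and the words "or" / "and".
_SEPARATOR = re.compile(r",\s*(?:or|and)\s+|,\s*|\s+(?:or|and)\s+")


def split_coordinated(phrase):
    """Return the lower-cased items of a comma/or/and coordinated phrase."""
    return [p.strip().lower() for p in _SEPARATOR.split(phrase.strip()) if p.strip()]


def pair_up(verbs, complements):
    """Cartesian product of verbs and complements as 'verb|complement' strings.

    Complements vary slowest, matching the order of the example above
    (direct|policies, plan|policies, implement|policies, ...).
    """
    return [f"{verb}|{comp}" for comp, verb in product(complements, verbs)]


# The split point between verbs and complements is given manually here.
verbs = split_coordinated("Direct, plan, or implement")
complements = split_coordinated("policies, objectives, or activities")
pairs = pair_up(verbs, complements)
print(", ".join(pairs))
```

The sketch uses only the standard library (`re`, `itertools`), in line with the pure-Python requirement below.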
Requirements:
* Pure Python only — no spaCy, no NLTK, no ML libraries of any kind
* All parsing must be deterministic and rule-based
* Existing codebase provided (approximately 400 lines of Python)
* Input: 19,530 structured task statements in Excel
* Output: verb|complement pairs in Excel
The problem: the current extractor has bugs that produce wrong complements in edge cases. The rules exist but are not applied correctly in every case.
Deliverables:
* A corrected main extraction script for the bulk of task statements (currently v9)
* A second-pass script for the or-without-comma condition (approximately 3,055 tasks)
* A second-pass script for the in-location-tail condition (approximately 647 tasks)
* A test suite of known-correct cases verified against expected verb|complement output
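As a rough sketch of what the test suite deliverable could look like, the table-driven check below compares an extractor's output with known-correct pairs. The names `KNOWN_CASES` and `check_cases` are hypothetical, the extractor is passed in as any callable that takes a task statement and returns a list of pairs, and treating output order as significant is an assumption. The only case shown is the example from this posting.

```python
# Known-correct cases: (task statement, expected verb|complement pairs).
# Only the example from the posting is included; real cases would be added here.
KNOWN_CASES = [
    (
        "Direct, plan, or implement policies, objectives, or activities",
        [
            "direct|policies", "plan|policies", "implement|policies",
            "direct|objectives", "plan|objectives", "implement|objectives",
            "direct|activities", "plan|activities", "implement|activities",
        ],
    ),
]


def check_cases(extract, cases=KNOWN_CASES):
    """Run `extract` on every case and return the failures.

    Each failure is a (statement, expected, actual) tuple. An empty list
    means every case passed. Order of pairs is compared as well (assumption).
    """
    failures = []
    for statement, expected in cases:
        actual = list(extract(statement))
        if actual != expected:
            failures.append((statement, expected, actual))
    return failures


def report(failures):
    """Format failures as plain-text lines for review."""
    return [
        f"FAIL: {statement!r}\n  expected: {expected}\n  actual:   {actual}"
        for statement, expected, actual in failures
    ]
```

Calling `check_cases(my_extractor)` with the corrected extraction function (here called `my_extractor` as a placeholder) returns an empty list when all known cases pass, which makes it easy to rerun after each rule fix.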
All scripts must produce accurate verb|complement extractions across the full corpus, and the known-correct cases in the test suite must be verified against Reed-Kellogg sentence diagramming rules.
All scripts pure Python, deterministic, no ML.
You must be comfortable with:
* Formal grammar and string parsing
* Regular expressions
* Debugging someone else's Python
Scripts MUST produce output conforming to Reed-Kellogg sentence diagramming rules.
NO ML libraries of any kind.