
Senior AI Engineer for Private LLM Stack, Vision Analysis, and Guided Search Chat

Upwork · US · Budget: Not specified · Expert
Web Application · AI App Development · AI Development · Machine Learning · Artificial Intelligence · Natural Language Processing · React · Node.js
We are looking for a strong AI engineer / applied LLM developer to help build a private AI layer for a consumer-facing real estate platform.

This is not a toy chatbot project. We are building a serious AI-assisted product that combines:
• private LLM inference
• listing text extraction
• photo signal extraction
• structured output generation
• safe summary generation
• guided search chat
• deterministic scoring support
• future learning data capture

Important: This project is not about letting AI "wing it." The system has strict architecture boundaries.

AI must help with:
• signal extraction
• structured interpretations
• safe summaries
• conversational packaging

AI must not:
• invent listing data
• compute deterministic scores
• bypass the backend
• access restricted data through unauthorized paths

We already have a strong product direction, backend, and UX work underway. We need someone who can help design and implement the AI layer correctly.

What we are building:
A private AI architecture with:
• local / private model serving
• text extraction from broker remarks
• photo / vision interpretation for listings
• structured JSON outputs
• safe normalized listing objects
• guided search chat on top of deterministic MLS retrieval
• support for future learning datasets and model improvement
• a likely local model stack based on Llama
• Ollama for dev/testing
• vLLM for production serving

What we need from you:
• Help implement a private AI inference layer using open-weight models
• Work with local LLM and local vision model workflows
• Build or refine structured prompt pipelines for extraction and explanation
• Help integrate model serving into an existing backend/API stack
• Ensure outputs are deterministic-friendly and structured
• Support chat-style orchestration where AI interprets but does not own truth
• Help design for model versioning, prompt versioning, and output metadata
• Help set up a production-ready inference flow using vLLM
• Help with practical model selection, local testing, and deployment recommendations
• Keep the system private-first and architecture-disciplined

Likely work areas:
• local model inference
• Llama model setup
• Ollama for development
• vLLM for production serving
• JSON-constrained output generation (see the sketch after this section)
• prompt engineering for extraction tasks
• vision model integration
• safe object packaging for downstream chat
• backend integration
• future fine-tuning / learning dataset readiness

Must-have experience:
• open-weight LLM deployment
• Llama or a similar model family
• vLLM and/or Ollama
• structured prompt pipelines
• JSON output enforcement
• backend/API integration
• practical experience with model serving, not just notebook demos
• a strong understanding of inference vs. training vs. fine-tuning
• the ability to work within strict product/architecture constraints

Nice to have:
• multimodal / vision model experience
• experience with real estate, marketplace, search, or listing-style products
• experience with evaluation pipelines
• experience with prompt versioning and model metadata tracking
• experience with fine-tuning, LoRA, or training platforms
• experience with safe private deployments and sensitive-data workflows

Important architecture principles:
• the private layer handles sensitive data
• the frontier chat layer only sees safe normalized objects
• AI extracts signals and explains outputs
• deterministic scoring remains outside AI
• no hallucinated listings
• no direct AI ownership of product truth
• model outputs must be structured and reproducible enough for downstream systems
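Below is a minimal sketch of how the dev/prod split and JSON-constrained extraction described above could look in practice. It assumes Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1) in development and a vLLM OpenAI-compatible server in production; the model name, environment variables, and ListingSignals fields (taken from the example tasks below) are illustrative placeholders, not decided specifics.

```python
# Minimal sketch: one extraction pipeline, two serving targets.
# Assumptions: Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1)
# in dev; vLLM's OpenAI-compatible server (`vllm serve <model>`) in prod.
# Model names, env vars, and signal fields are illustrative placeholders.
import os

from openai import OpenAI
from pydantic import BaseModel


class ListingSignals(BaseModel):
    """Structured signals from broker remarks (fields from the example tasks)."""
    renovation_cues: list[str]
    design_recency: str
    natural_light: str
    curb_appeal: str
    modernization_gap: str


# Same client code in dev (Ollama) and prod (vLLM); only config changes.
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.environ.get("LLM_API_KEY", "not-needed-for-local"),
)


def extract_signals(broker_remarks: str) -> ListingSignals:
    response = client.chat.completions.create(
        model=os.environ.get("LLM_MODEL", "llama3.1:8b"),
        messages=[
            {"role": "system",
             "content": "Extract listing signals from the remarks. Respond with JSON only."},
            {"role": "user", "content": broker_remarks},
        ],
        # OpenAI-style JSON mode; both servers also offer stricter
        # schema-constrained decoding through server-specific options.
        response_format={"type": "json_object"},
        temperature=0,
    )
    # Validate before anything downstream sees it: the deterministic
    # scoring layer never consumes raw model text.
    return ListingSignals.model_validate_json(response.choices[0].message.content)
```

The point of this shape: the backend calls one function, the output is schema-validated before anything downstream touches it, and swapping Ollama for vLLM is a config change rather than a code change.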
Example tasks you may work on:
• set up a local text model for broker remark extraction
• set up a local vision model for listing photo interpretation
• return structured outputs like renovation cues, design recency, natural light, curb appeal, modernization gap
• generate safe summaries for downstream use
• support guided search chat that sits on top of server-side retrieval
• help create a clean AI service boundary the backend can call
• advise on the best deployment path for private inference
• help design storage of model name, prompt version, output metadata, and timestamps (see the sketch at the end of this posting)

What success looks like:
• local/private AI stack working cleanly
• text and vision outputs reliable and structured
• backend can call AI services predictably
• prompts are maintainable and versioned
• production serving path is clear
• chat layer is cleanly separated from sensitive raw data
• no architecture shortcuts that create compliance or data-boundary problems
• good technical judgment, not just model enthusiasm

To apply, please send:
1. Examples of private LLM or local model deployments you worked on
2. Which open-weight models you have actually deployed yourself
3. Your experience with vLLM, Ollama, and model-serving infrastructure
4. Your experience with multimodal / vision workflows
5. Examples of structured JSON extraction pipelines you built
6. Your experience integrating AI into an existing backend
7. Whether you have worked on products where AI had to stay inside strict business or compliance boundaries
8. Your availability over the next 4–8 weeks

Please also answer these questions:
1. If you were building a private AI layer for a product that cannot send raw sensitive text/images to frontier APIs, how would you structure the inference stack?
2. How would you separate deterministic business logic from AI-assisted interpretation?
3. What is your preferred setup for local development vs. production serving?
4. How would you enforce structured JSON outputs from both text and vision models?
5. What metadata would you store with each AI run so the system becomes more valuable over time?
6. When would you recommend fine-tuning versus staying with prompting only?
7. What mistakes do teams make when building chat on top of structured search systems?

Budget / engagement:
Open to a milestone-based or hourly structure depending on experience and fit. We care much more about real applied experience and architectural discipline than flashy buzzwords.

Important note: Please do not apply if your experience is mostly generic chatbot wrappers, low-code AI tools, or prompt-only demos without real deployment work. We are looking for someone who understands the difference between:
• model
• inference
• training
• serving
• architecture
• product constraints
and can help build this correctly.
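As one possible answer to the "help design storage of model name, prompt version, output metadata, and timestamps" task above, here is a rough sketch of a per-run record; every field name is an assumption, not an agreed schema.

```python
# Rough sketch of a per-run record covering model name, prompt version,
# output metadata, and timestamps. All field names are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AIRunRecord:
    run_id: str                 # unique id for this inference call
    model_name: str             # served model identifier
    model_version: str          # weight/build revision actually deployed
    prompt_id: str              # which prompt template was used
    prompt_version: str         # version of that template
    input_ref: str              # pointer to the input (listing id, image key); never raw sensitive data
    output: dict                # the validated structured output
    output_schema_version: str  # schema the output was validated against
    latency_ms: float           # serving latency for this call
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Capturing a record like this on every inference call is what later enables the evaluation pipelines, prompt regression checks, and fine-tuning dataset assembly that the posting flags as future goals.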

Client

Spent: $7,293.75 · Rating: 0.0 · Verified