
Type: Long-term contract
Location: Fully remote
We are partnering with a major media holding to transform how their teams work with thousands of legal and financial documents. The goal is to build an intelligent assistant for deep document analysis and interaction, moving far beyond simple classification.
Responsibilities:
Design, build, and optimize a production-grade Retrieval-Augmented Generation (RAG) system to ground LLM responses in the client's proprietary document repositories.
Adapt open-source or cloud-hosted LLMs to domain-specific tasks (legal/financial analysis) using techniques such as LoRA fine-tuning and prompt engineering.
Develop, containerize, and deploy robust API endpoints to integrate LLM capabilities seamlessly into the client's existing SaaS platforms and internal tools.
Architect and implement monitoring for LLM performance, cost, and latency; optimize inference pipelines using techniques like model quantization and caching.
Establish guardrail systems to evaluate output accuracy, mitigate hallucinations, and maintain audit trails of model decisions for compliance.
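For candidates unfamiliar with the term, the RAG responsibility above boils down to a retrieve-then-ground loop. The sketch below is purely illustrative: it uses toy bag-of-words vectors in place of a real embedding model and an in-memory list in place of a vector database such as pgvector or Weaviate.

```python
# Minimal sketch of the retrieve-then-ground pattern behind a RAG system.
# Toy embeddings only; production systems use a learned embedding model
# and a vector database (pgvector, Weaviate, Pinecone, etc.).
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the eventual LLM call in retrieved passages to curb hallucination."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In production the same shape holds, with the sorted-list lookup replaced by an approximate-nearest-neighbor query against the document repository's embeddings.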
Requirements:
Experience with NLP and Large Language Models (3+ years in production settings).
Experience with LLM frameworks and orchestration tools (LangChain, LlamaIndex, Hugging Face Transformers).
Practical knowledge of vector databases (Pinecone, Weaviate, pgvector) and embedding strategies.
Strong Python skills and experience building and deploying scalable APIs (5+ years in production settings).
Familiarity with cloud platforms (AWS, GCP, Azure) for MLOps.
Differentiators:
A proven track record of successfully deploying a RAG system or LLM-powered application to production.
Experience implementing evaluation frameworks and metrics for generative AI outputs.