AI-Native Foundation
RAG Architecture
Query → LLM Gateway (LangGraph) → Embed (E5-large) + Retrieve (OpenSearch) → Prompt + PII Filter → LLM → Response + Audit.
| Component | Technology |
|---|---|
| Embedding | E5-large (English), BGE-M3 (multilingual) |
| Vector Store | OpenSearch |
| Retriever | Hybrid dense+sparse |
| LLM | On-prem Llama 3.1 70B / Claude API |
| PII Filter | Custom regex + NER |
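The "Hybrid dense+sparse" retriever merges two ranked lists: a k-NN result set over E5-large embeddings and OpenSearch's BM25 keyword scores. A minimal sketch of one common merge strategy, Reciprocal Rank Fusion (RRF); the function name, sample IDs, and `k=60` constant are illustrative, not the project's actual implementation:

```python
# Illustrative RRF fusion of dense and sparse result lists.
def rrf_fuse(dense_ids, sparse_ids, k=60):
    """Merge two ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranked in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranked):
            # Each list contributes 1/(k + rank + 1); documents that rank
            # well in both lists accumulate the highest fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]    # e.g. from k-NN over E5-large vectors
sparse = ["d1", "d9", "d3"]   # e.g. from BM25 keyword match
fused = rrf_fuse(dense, sparse)
print(fused)  # d1 and d3 appear in both lists, so they rank first
```

RRF needs only ranks, not raw scores, which sidesteps calibrating BM25 scores against cosine similarities.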
MCP Endpoints
| Tool | Returns | Consent Required |
|---|---|---|
| get_profile | Profile subset | MCP_ACCESS |
| search_profiles | DTX_IDs (no PII) | ANALYTICS |
| get_segment_members | Sample DTX_IDs | ANALYTICS |
| resolve_identity | DTX_ID | MCP_ACCESS |
| get_segment_stats | Aggregate stats | None |
| create_segment | Segment ID | Admin only |
| get_identity_graph | Related entities | MCP_ACCESS |
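The consent column above maps naturally onto a gate in front of the tool dispatcher. A minimal sketch, assuming consent scopes are checked per call; the registry dict, `ConsentError`, and `call_tool` are illustrative scaffolding, and `create_segment` is omitted because its "Admin only" check is a role, not a consent scope:

```python
# Consent scopes per MCP tool, taken from the table above.
CONSENT_REQUIRED = {
    "get_profile": "MCP_ACCESS",
    "search_profiles": "ANALYTICS",
    "get_segment_members": "ANALYTICS",
    "resolve_identity": "MCP_ACCESS",
    "get_segment_stats": None,        # aggregate stats need no consent
    "get_identity_graph": "MCP_ACCESS",
}

class ConsentError(Exception):
    pass

def call_tool(tool, user_consents, handler, **kwargs):
    """Invoke an MCP tool only if the caller holds the required consent."""
    required = CONSENT_REQUIRED.get(tool)
    if required is not None and required not in user_consents:
        raise ConsentError(f"{tool} requires {required} consent")
    return handler(**kwargs)

# Aggregate stats pass without consent; profile access does not.
stats = call_tool("get_segment_stats", set(), lambda **kw: {"size": 120})
```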
LLM Integration
On-Prem: Llama 3.1 70B served via vLLM; no data leaves the infrastructure.
API: Claude/GPT-4, called with PII stripped; used for complex reasoning tasks.
Hybrid: on-prem for PII-touching tasks, API for general knowledge; a router decides per request.
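The hybrid router's core decision can be sketched in a few lines: prompts that appear to contain PII stay on the on-prem model, everything else may go to the external API. The regexes and route names here are illustrative; the real PII filter combines regex with NER, per the table above:

```python
import re

# Illustrative PII heuristics; the production filter also runs NER.
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),        # email address
    re.compile(r"\b\d{3}[-.\s]?\d{2}[-.\s]?\d{4}\b"),  # SSN-like number
]

def route(prompt: str) -> str:
    """Return 'on-prem' for PII-bearing prompts, 'api' otherwise."""
    if any(p.search(prompt) for p in PII_PATTERNS):
        return "on-prem"
    return "api"

print(route("Summarize retention trends for Q3"))    # general knowledge -> API
print(route("Why did jane.doe@example.com churn?"))  # PII -> on-prem
```

Failing closed (routing to on-prem whenever detection is uncertain) keeps false negatives from leaking PII to the API.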
AI Guardrails
- Input: Prompt injection detection, PII masking, rate limiting, consent verification
- Output: PII leak detection, hallucination checking, response limits, toxicity filtering
- Operational: Human-in-the-loop, approval workflows, full audit logging
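One output-side guardrail from the list above, sketched minimally: scan the LLM response for PII that slipped through, mask it, and flag the leak for the audit log. Patterns and the mask token are illustrative; hallucination checking and toxicity filtering would be separate passes:

```python
import re

# Illustrative output-side PII patterns.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def mask_pii(response: str) -> tuple[str, bool]:
    """Return (masked_response, leaked) so the audit log can record leaks."""
    leaked = bool(EMAIL.search(response) or PHONE.search(response))
    masked = EMAIL.sub("[REDACTED]", response)
    masked = PHONE.sub("[REDACTED]", masked)
    return masked, leaked

text, leaked = mask_pii("Contact john@corp.com or 555-123-4567.")
print(text)    # Contact [REDACTED] or [REDACTED].
print(leaked)  # True
```

Returning the leak flag alongside the masked text lets the audit trail distinguish clean responses from masked ones.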