This post documents the implementation of a Retrieval-Augmented Generation (RAG) system for gold market intelligence, running entirely on a homelab Kubernetes cluster with GPU acceleration.
The Goal

Build a self-hosted AI system that:

- Ingests gold market data from multiple sources (FRED, GoldAPI, RSS feeds)
- Stores embeddings in a vector database
- Provides natural language query capabilities using a local LLM
- Runs on an NVIDIA RTX 5070 Ti GPU

Architecture

┌─────────────────────┐    ┌─────────────────────┐    ┌─────────────────────┐
│   Data Ingestion    │───▶│  Embedding Service  │───▶│       Qdrant        │
│     (CronJobs)      │    │ (nomic-embed-text)  │    │   (Vector Store)    │
└─────────────────────┘    └─────────────────────┘    └──────────┬──────────┘
                                                                 │
┌─────────────────────┐    ┌─────────────────────┐               │
│    Query Service    │◀───│       Ollama        │◀──────────────┘
│   (RAG API + UI)    │    │   (Llama 3.1 8B)    │
└─────────────────────┘    └─────────────────────┘
           │                          │
           │                   ┌──────┴──────┐
           ▼                   │ RTX 5070 Ti │
     Web UI @ :80              │   (16GB)    │
                               └─────────────┘

Components

| Component | Purpose | Image |
|---|---|---|
| Ollama | LLM inference (Llama 3.1 8B) + embeddings (nomic-embed-text) | ollama/ollama |
| Qdrant | Vector database for storing embeddings | qdrant/qdrant |
| Data Ingestion | CronJobs fetching from FRED, GoldAPI, RSS | Custom Python/FastAPI |
| Embedding Service | Converts text to vectors, stores in Qdrant | Custom Python/FastAPI |
| Query Service | RAG pipeline + web UI | Custom Python/FastAPI |

Data Sources

| Source | Data | Schedule |
|---|---|---|
| FRED | Gold price history, CPI, Fed Funds Rate, 10Y Treasury, USD Index | Every 6 hours |
| GoldAPI.io | Real-time XAU/USD spot price | Hourly |
| RSS Feeds | Market news from Investing.com | Every 4 hours |

Implementation

Repository Structure

gold-intelligence/
├── .gitlab-ci.yml
├── services/
│   ├── data-ingestion/
│   │   ├── Dockerfile
│   │   ├── requirements.txt
│   │   └── src/
│   │       ├── main.py
│   │       └── collectors/
│   │           ├── fred.py
│   │           ├── gold_api.py
│   │           └── news_rss.py
│   ├── embedding-service/
│   │   ├── Dockerfile
│   │   ├── requirements.txt
│   │   └── src/
│   │       ├── main.py
│   │       ├── embedder.py
│   │       └── qdrant_client.py
│   └── query-service/
│       ├── Dockerfile
│       ├── requirements.txt
│       └── src/
│           ├── main.py
│           ├── rag_pipeline.py
│           ├── ollama_client.py
│           └── static/          # Web UI
├── helm/
│   ├── data-ingestion/
│   ├── embedding-service/
│   ├── query-service/
│   ├── ollama-values.yaml
│   └── qdrant-values.yaml
└── kubernetes/
    └── argocd/

Ollama Configuration

The key to GPU acceleration is the runtimeClassName: nvidia in the Helm values:
...
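With Ollama serving both the embedding model and the LLM, the query path from the architecture diagram (embed the question, fetch similar documents, prompt the model) can be sketched roughly as follows. This is a minimal illustration, not the contents of rag_pipeline.py: the function names are hypothetical, `top_k` is an in-memory stand-in for a Qdrant similarity search, and `embed` and `generate` are injected callables that, in a setup like this one, would presumably wrap calls to nomic-embed-text and Llama 3.1 8B.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Doc = Tuple[str, Vector]  # (text, embedding); hypothetical in-memory record


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Vector, docs: List[Doc], k: int = 3) -> List[str]:
    """Return the k most similar texts (stand-in for a vector store search)."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    docs: List[Doc],
    embed: Callable[[str], Vector],    # assumed: wraps the embedding model
    generate: Callable[[str], str],    # assumed: wraps the LLM
    k: int = 3,
) -> str:
    """Retrieve-then-generate: embed, search, build prompt, call the LLM."""
    query_vec = embed(question)
    contexts = top_k(query_vec, docs, k)
    return generate(build_prompt(question, contexts))
```

Passing `embed` and `generate` in as plain callables keeps the retrieval logic testable without a running cluster; swapping the in-memory list for a real vector store would only change `top_k`.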