Mochi is a self-hosted algorithmic trading backtest platform. It takes a stock ticker and date range, downloads historical market data, runs trading strategy simulations across multiple parameter combinations, aggregates the results with Trino, generates statistical graphs with R, and presents everything through a React dashboard. The entire pipeline runs on a homelab Kubernetes cluster orchestrated by Argo Workflows.
Architecture Overview
┌──────────────────────────────────────────────────────────────────────────┐
│ dashboard.minoko.life │
│ React + Vite + TypeScript │
│ S3 client (MinIO) ─── Backtest API client │
└──────────────┬───────────────────────────────┬───────────────────────────┘
│ │
▼ ▼
┌──────────────────────┐ ┌──────────────────────────┐
│ MinIO (S3 API) │ │ Backtest API (FastAPI) │
│ s3.minoko.life │ │ backtest-api.minoko.life│
│ 13 buckets │ │ POST /backtest │
└──────────┬───────────┘ └────────────┬─────────────┘
│ │
│ ┌───────────────────▼──────────────────┐
│ │ Argo Workflows │
│ │ workflows.minoko.life │
│ │ │
│ │ Phase 1: backtest-pipeline │
│ │ ┌────────┐ ┌──────────┐ ┌────────┐ │
│ │ │Polygon │→│Enhancer │→│Metadata│ │
│ │ │Python │ │Kotlin │ │Python │ │
│ │ └────────┘ └──────────┘ └───┬────┘ │
│ │ │ │
│ │ Phase 2: per scenario (x N) │ │
│ │ ┌──────────────────────────┐│ │
│ │ │mochi-trades (Java) ││ │
│ │ │ │ ││ │
│ │ │sync-partitions (Trino) ││ │
│ │ │ │ ││ │
│ │ │mochi-aggregate (Java) ││ │
│ │ │ │ ││ │
│ │ │ ┌──┴──┬──────┐ ││ │
│◀─────────────│──│ │years│stops │best- ││ │
│ results │ │ │ .r │ .r │traders.r││ │
│ │ │ └─────┴──────┴────┬────┘│ │
│ │ │ trade-extract (Kt) │ ││ │
│ │ │ py-trade-lens (Py) │ ││ │
│ │ │ trade-summary (Py) │ ││ │
│ │ └──────────────────────────┘│ │
│ └──────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ │
└────────▶│ Trino (Coordinator + Worker) │◀──────────┘
│ Hive Metastore + Postgres │
│ 100GB worker for aggregation │
└──────────────────────────────┘
Components
| Component | Language | Purpose | Image |
|---|---|---|---|
| mochi-dashboard | TypeScript/React | UI, S3 browsing, backtest submission | harbor.minoko.life/mochi/mochi-dashboard |
| backtest-api | Python/FastAPI | Accepts backtest requests, creates Argo Workflows | harbor.minoko.life/mochi/backtest-api |
| polygon | Python | Downloads historical data from Polygon.io API | harbor.minoko.life/mochi/polygon |
| trade-data-enhancer | Kotlin | Calculates ATR and technical indicators | harbor.minoko.life/mochi/trade-data-enhancer |
| data-metadata | Python | Generates scenario parameter combinations for Phase 2 | harbor.minoko.life/mochi/data-metadata |
| mochi-java | Java 21 | Core trading simulation engine and Trino aggregation | harbor.minoko.life/mochi/mochi-java |
| r-graphs | R | Statistical visualizations (years, stops, best traders) | harbor.minoko.life/mochi/r-graphs |
| trade-extract | Kotlin | Extracts individual trades from aggregated results | harbor.minoko.life/mochi/trade-extract |
| py-trade-lens | Python | Trade analysis and insights | harbor.minoko.life/mochi/py-trade-lens |
| trade-summary | Python | Final result summarization | harbor.minoko.life/mochi/trade-summary |
Five languages across ten containerized services.
Infrastructure
The platform runs on a Kubernetes 1.35 cluster with the following nodes:
| Node | Role | OS | Notes |
|---|---|---|---|
| k8s-master01 | Control plane | CentOS Stream 9 | |
| k8s-worker01 | Worker | CentOS Stream 9 | |
| k8s-worker02 | Worker | CentOS Stream 9 | |
| polycephala | Worker | Rocky Linux 10.1 | Primary workload node, large memory |
| minis-enp195s0 | Worker (DMZ) | Fedora 43 | |
Polycephala handles all backtest workloads. The Phase 2 pipeline requires up to 30GB JVM heap for the Java trading engine and a 100GB Trino worker for aggregation queries.
Supporting Services
| Service | Namespace | Purpose |
|---|---|---|
| MinIO | minio | S3-compatible object storage (13 buckets) |
| Trino | trino | SQL query engine replacing AWS Athena |
| Harbor | harbor | Private container registry |
| Argo Workflows | argo | Workflow orchestration |
| ArgoCD | argocd | GitOps deployment |
| Linkerd | cluster-wide | Service mesh |
| Nginx Ingress | ingress-nginx | TLS termination and routing |
Data Flow
Phase 1: Data Pipeline
A backtest request starts at the dashboard. The user specifies a ticker symbol (e.g., AAPL, C:XAUUSD), a date range, and strategy parameters (ATR periods, alpha, trade duration, timeout).
The backtest API receives the request, generates a unique group tag (e.g., grape-koala-20260127122439), uploads the parameters as JSON to MinIO, and creates an Argo Workflow from the backtest-pipeline template.
Phase 1 runs three steps sequentially:
1. polygon – Downloads minute, hour, and day resolution data from the Polygon.io API. Data is LZO-compressed and uploaded to mochi-prod-raw-historical-data.
2. trade-data-enhancer – Reads the raw data, calculates ATR (Average True Range) with configurable short and long periods, and writes enhanced data to mochi-prod-prepared-historical-data.
3. data-metadata – Analyzes the enhanced data and generates multiple scenario parameter combinations (stop loss ranges, limit ranges, tick offsets). Each scenario defines a different strategy configuration to test. The output is a JSON array of scenarios that triggers Phase 2.
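The ATR computed by the enhancer can be sketched as follows. Bars are (high, low, close) tuples; a simple average of true ranges is used here — the enhancer's exact smoothing method is an assumption:

```python
def true_ranges(bars):
    """True range per bar: the max of high-low, |high - prev_close|,
    and |low - prev_close|. The first bar has no previous close."""
    trs = []
    prev_close = None
    for high, low, close in bars:
        if prev_close is None:
            trs.append(high - low)
        else:
            trs.append(max(high - low, abs(high - prev_close), abs(low - prev_close)))
        prev_close = close
    return trs

def atr(bars, period):
    """ATR over the last `period` bars (simple average of true ranges;
    the real enhancer may use Wilder's smoothing instead)."""
    window = true_ranges(bars)[-period:]
    return sum(window) / len(window)

bars = [(10, 8, 9), (11, 9, 10), (12, 10, 11), (13, 11, 12)]
short = atr(bars, period=2)  # recent volatility
long_ = atr(bars, period=4)  # baseline volatility
```

Running the same series through both a short and a long period is what gives the strategy its two ATR inputs.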
Phase 2: Per-Scenario Processing
Phase 2 runs independently for each scenario generated by data-metadata. With parallelism: 1 at the workflow level, scenarios execute one at a time to stay within the node’s memory budget.
Each scenario follows this DAG:
mochi-trades Execute trading simulation (Java, 30GB heap)
│
sync-partitions Drop/recreate Hive tables, sync partition metadata
│
mochi-aggregate Run 20+ CTE aggregation query via Trino (100GB worker)
│
├── r-graphs-years Annual performance graphs
├── r-graphs-stops Stop loss analysis graphs
└── r-graphs-besttraders Top performer ranking
│
trade-extract Extract individual trades (Kotlin)
│
py-trade-lens Trade analysis (Python)
│
trade-summary Final summarization (Python)
The three R graph scripts run in parallel after aggregation. The remaining steps are sequential, with trade-extract gated on the best-traders graph completing.
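The gating above can be expressed as a dependency map. Step names come from the pipeline; the topological-order check is illustrative, not how Argo schedules the DAG:

```python
# Phase 2 steps mapped to the steps each one waits on.
DEPS = {
    "mochi-trades": [],
    "sync-partitions": ["mochi-trades"],
    "mochi-aggregate": ["sync-partitions"],
    "r-graphs-years": ["mochi-aggregate"],
    "r-graphs-stops": ["mochi-aggregate"],
    "r-graphs-besttraders": ["mochi-aggregate"],
    "trade-extract": ["r-graphs-besttraders"],  # gated on the best-traders graph
    "py-trade-lens": ["trade-extract"],
    "trade-summary": ["py-trade-lens"],
}

def topo_order(deps):
    """Return one valid execution order (Kahn's algorithm, assumes a DAG)."""
    remaining = {step: set(d) for step, d in deps.items()}
    order = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if not d)
        for step in ready:
            order.append(step)
            del remaining[step]
        for d in remaining.values():
            d.difference_update(ready)
    return order

order = topo_order(DEPS)
```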
Bucket Architecture
Each pipeline step reads from one bucket and writes to another:
| Bucket | Purpose | Written By |
|---|---|---|
| mochi-prod-backtest-params | Backtest configuration JSON | backtest-api |
| mochi-prod-raw-historical-data | Raw Polygon data (LZO compressed) | polygon |
| mochi-prod-prepared-historical-data | Enhanced data with indicators | trade-data-enhancer |
| mochi-prod-ticker-meta | Ticker metadata | data-metadata |
| mochi-prod-backtest-traders | Trader configurations | mochi-trades |
| mochi-prod-backtest-trades | Raw trade simulation results | mochi-trades |
| mochi-prod-aggregated-trades | Aggregated trade statistics | mochi-aggregate |
| mochi-prod-aggregated-trades-staging | Staging area for aggregation | mochi-aggregate |
| mochi-prod-summary-graphs | R-generated graphs and CSVs | r-graphs |
| mochi-prod-trade-extracts | Extracted trade data | trade-extract |
| mochi-prod-final-trader-ranking | Final ranked results | trade-summary |
| mochi-prod-live-trades | Live trading configurations | dashboard |
| mochi-prod-portfolio-tracking | Portfolio tracking data | dashboard |
Kubernetes Configuration
Argo Workflow Templates
Two WorkflowTemplates in the mochi-backtest namespace:
- backtest-pipeline – Phase 1 (sequential steps: init-params, polygon, trade-data-enhancer, data-metadata)
- phase2-scenario-pipeline – Phase 2 (DAG per scenario with all processing steps)
Each step in the workflow uses envFrom to load both the aws-credentials secret (MinIO access keys) and a step-specific ConfigMap. There are ten ConfigMaps, one per step, keeping configuration explicit:
init-params-env polygon-env trade-data-enhancer-env
data-metadata-env mochi-trades-env mochi-aggregate-env
r-graphs-env trade-extract-env py-trade-lens-env
trade-summary-env
Resource Requirements
The pipeline is memory-intensive. The heaviest steps:
| Step | CPU Request | Memory | Notes |
|---|---|---|---|
| mochi-trades | 4 | 30Gi | Java 21, 30GB JVM heap, parallel GC |
| mochi-aggregate | 4 | 30Gi | Java client for Trino aggregation |
| Trino worker | - | 100Gi | Single worker, 96GB JVM heap |
| r-graphs (each) | 2 | 16Gi | Three instances run in parallel |
A 10-year forex backtest (e.g., C:XAUUSD) generates enough data that the Trino aggregation query – with 20+ CTEs and 400+ execution stages – requires the full 100GB worker allocation.
Trino Setup
Trino replaces AWS Athena for the aggregation step. It runs as a coordinator + single worker in the trino namespace:
- Coordinator: Schedules queries, 500 max stages, phased execution policy
- Worker: 100Gi memory, 96GB JVM heap (`-Xmx96G`), G1GC
- Hive Metastore: Postgres-backed, manages table metadata for MinIO-hosted data
- Catalog: Hive connector with path-style access (`forcePathStyle`) pointing to `minio.minio.svc.cluster.local:9000`
The sync-partitions step in Phase 2 drops and recreates Hive external tables before each aggregation, pointing them at the S3 paths where mochi-trades wrote its output. This ensures Trino sees the latest partition data.
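The statements sync-partitions issues can be sketched like this. The schema, table, and location are hypothetical and the column list is elided; only `system.sync_partition_metadata` is Trino's real Hive-connector procedure (run with the hive catalog selected):

```python
def sync_statements(schema: str, table: str, location: str) -> list[str]:
    """Build the drop/recreate/sync sequence for one external table.
    '(...)' stands in for the real column list, which isn't shown here."""
    return [
        f"DROP TABLE IF EXISTS hive.{schema}.{table}",
        f"CREATE TABLE hive.{schema}.{table} (...) "
        f"WITH (external_location = '{location}')",
        # Trino's Hive procedure that discovers partitions on S3:
        f"CALL system.sync_partition_metadata('{schema}', '{table}', 'FULL')",
    ]

stmts = sync_statements(
    "mochi", "mochi_trades",
    "s3a://mochi-prod-backtest-trades/mochi_trades/run_id=grape-koala-20260127122439",
)
```

With `'FULL'` mode, partitions present on S3 but missing from the metastore are added and stale ones are dropped, so each aggregation query sees exactly what mochi-trades just wrote.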
CI/CD Pipeline
All components follow the same pattern:
- Code is pushed to a GitLab repository
- GitLab CI triggers a Kaniko build (using the in-cluster GitLab runner)
- The Docker image is pushed to Harbor (`harbor.minoko.life/mochi/<component>`)
- For the dashboard: ArgoCD detects the Helm chart change and syncs the deployment
The GitLab runner executes on Kubernetes with the kubernetes executor, running on polycephala with a disk-pressure toleration. Build cache is stored in a MinIO bucket (gitlab-runner-cache).
Dashboard Deployment
The dashboard has a Helm chart managed by ArgoCD:
mochi-dashboard/
├── helm/mochi-dashboard/
│ ├── Chart.yaml
│ ├── values.yaml # Ingress, resources, replicas
│ ├── values-prod.yaml # Production overrides
│ └── templates/
│ ├── deployment.yaml
│ ├── ingress.yaml
│ ├── service.yaml
│ └── serviceaccount.yaml
ArgoCD watches the HEAD of the GitLab repository and auto-syncs with self-heal enabled. Image tags are set as Helm parameters in the ArgoCD Application resource.
Dashboard
The React dashboard provides:
- Stock tree view – Browse backtest results by symbol, with nested strategies
- Strategy visualization – View graphs and statistics for a selected strategy
- Backtest submission – Ticker search, date range presets (1-30 years), ATR parameter configuration
- Live trades management – CRUD operations on per-broker trade configurations stored in MinIO
- Broker management – Add/remove brokers
- SEC filings viewer – Browse SEC filing data
The frontend communicates directly with MinIO via the AWS S3 SDK (using forcePathStyle: true and static credentials). Pre-signed URLs provide temporary access to graphs and CSV files. The backtest API is called directly for workflow submission.
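The reason for `forcePathStyle: true` can be shown by comparing the two S3 addressing schemes (the object key here is illustrative): path-style keeps every bucket under the single `s3.minoko.life` hostname and wildcard certificate, while virtual-hosted style would require a resolvable DNS name per bucket:

```python
def object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """S3 object URL under path-style vs virtual-hosted addressing."""
    if path_style:
        return f"https://{endpoint}/{bucket}/{key}"
    return f"https://{bucket}.{endpoint}/{key}"

path = object_url("s3.minoko.life", "mochi-prod-summary-graphs", "years.png", True)
hosted = object_url("s3.minoko.life", "mochi-prod-summary-graphs", "years.png", False)
# path   -> https://s3.minoko.life/mochi-prod-summary-graphs/years.png
# hosted -> https://mochi-prod-summary-graphs.s3.minoko.life/years.png
```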
Network Path
Browser → dashboard.minoko.life (nginx ingress, TLS)
→ s3.minoko.life (MinIO S3 API, TLS)
→ backtest-api.minoko.life (FastAPI, TLS)
All endpoints use the same Let’s Encrypt wildcard certificate (*.minoko.life).
Ticker Sanitization
Forex and crypto tickers from Polygon use colons in their symbols (e.g., C:XAUUSD for gold). Colons break Trino’s URI parsing because it interprets C: as a URI scheme like http:.
The fix sanitizes ticker symbols by replacing colons with underscores (C:XAUUSD -> C_XAUUSD) at multiple points:
| Component | Where |
|---|---|
| mochi-java | TickerSanitizer.java utility class |
| mochi-java | S3UploaderImpl.java (S3 partition paths) |
| mochi-java | TradeYearFetcher.java (S3 prefix lookups) |
| mochi-java | TrinoAggregator.java (SQL queries and output filenames) |
| data-metadata | job_placer.py (before passing to Phase 2) |
This ensures partition paths like mochi_trades/run_id=.../symbol=C_XAUUSD/... are URI-safe.
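The rule shared by TickerSanitizer.java and job_placer.py can be sketched in a few lines; the `partition_path` helper is hypothetical, modeled on the example path above:

```python
def sanitize_ticker(symbol: str) -> str:
    """Replace the colon so a prefix like 'C:' can't be parsed as a URI scheme."""
    return symbol.replace(":", "_")

def partition_path(run_id: str, symbol: str) -> str:
    # Hypothetical layout, modeled on the partition path in the text.
    return f"mochi_trades/run_id={run_id}/symbol={sanitize_ticker(symbol)}/"

p = partition_path("grape-koala-20260127122439", "C:XAUUSD")
# -> mochi_trades/run_id=grape-koala-20260127122439/symbol=C_XAUUSD/
```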
Lessons Learned
Trino memory requirements are hard to predict. The aggregation query with 20+ CTEs and 400+ stages failed repeatedly at lower memory configurations (48GB, 64GB, 80GB per worker). A single 100GB worker succeeded where two smaller workers could not, because the query’s intermediate state doesn’t distribute well.
NetworkPolicies catch you by surprise. MinIO had a default-deny policy with an allowlist of namespaces for port 9000. When adding the S3 API ingress (s3.minoko.life), requests from the nginx ingress controller were blocked because ingress-nginx wasn’t in the allowlist. The Linkerd service mesh proxy made this harder to diagnose – the error manifested as l5d-proxy-error: connect timed out rather than a clear connection refused.
ConfigMap-per-step keeps things manageable. With ten pipeline steps across two workflow phases, sharing ConfigMaps with per-step overrides would be fragile. Each step having its own ConfigMap makes it explicit what environment each container sees.
One scenario at a time. With mochi-trades needing 30GB and the Trino worker needing 100GB, running multiple scenarios in parallel would exceed the node’s memory. The workflow sets parallelism: 1 to process scenarios sequentially.