Mochi is a self-hosted algorithmic trading backtest platform. It takes a stock ticker and date range, downloads historical market data, runs trading strategy simulations across multiple parameter combinations, aggregates the results with Trino, generates statistical graphs with R, and presents everything through a React dashboard. The entire pipeline runs on a homelab Kubernetes cluster orchestrated by Argo Workflows.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐
│                         dashboard.minoko.life                           │
│                     React + Vite + TypeScript                           │
│              S3 client (MinIO) ─── Backtest API client                  │
└──────────────┬───────────────────────────────┬───────────────────────────┘
               │                               │
               ▼                               ▼
┌──────────────────────┐         ┌──────────────────────────┐
│   MinIO (S3 API)     │         │   Backtest API (FastAPI) │
│   s3.minoko.life     │         │   backtest-api.minoko.life│
│   13 buckets         │         │   POST /backtest          │
└──────────┬───────────┘         └────────────┬─────────────┘
           │                                  │
           │              ┌───────────────────▼──────────────────┐
           │              │         Argo Workflows               │
           │              │    workflows.minoko.life              │
           │              │                                      │
           │              │  Phase 1: backtest-pipeline           │
           │              │  ┌────────┐ ┌──────────┐ ┌────────┐ │
           │              │  │Polygon │→│Enhancer  │→│Metadata│ │
           │              │  │Python  │ │Kotlin    │ │Python  │ │
           │              │  └────────┘ └──────────┘ └───┬────┘ │
           │              │                              │      │
           │              │  Phase 2: per scenario (x N) │      │
           │              │  ┌──────────────────────────┐│      │
           │              │  │mochi-trades (Java)       ││      │
           │              │  │     │                    ││      │
           │              │  │sync-partitions (Trino)   ││      │
           │              │  │     │                    ││      │
           │              │  │mochi-aggregate (Java)    ││      │
           │              │  │     │                    ││      │
           │              │  │  ┌──┴──┬──────┐         ││      │
           │◀─────────────│──│  │years│stops │best-    ││      │
           │   results    │  │  │  .r │  .r  │traders.r││      │
           │              │  │  └─────┴──────┴────┬────┘│      │
           │              │  │  trade-extract (Kt) │    ││      │
           │              │  │  py-trade-lens (Py) │    ││      │
           │              │  │  trade-summary (Py) │    ││      │
           │              │  └──────────────────────────┘│      │
           │              └──────────────────────────────┘      │
           │                                                    │
           │         ┌──────────────────────────────┐           │
           └────────▶│  Trino (Coordinator + Worker) │◀──────────┘
                     │  Hive Metastore + Postgres    │
                     │  100GB worker for aggregation │
                     └──────────────────────────────┘

Components

| Component           | Language         | Purpose                                                 | Image                                        |
|---------------------|------------------|---------------------------------------------------------|----------------------------------------------|
| mochi-dashboard     | TypeScript/React | UI, S3 browsing, backtest submission                    | harbor.minoko.life/mochi/mochi-dashboard     |
| backtest-api        | Python/FastAPI   | Accepts backtest requests, creates Argo Workflows       | harbor.minoko.life/mochi/backtest-api        |
| polygon             | Python           | Downloads historical data from Polygon.io API           | harbor.minoko.life/mochi/polygon             |
| trade-data-enhancer | Kotlin           | Calculates ATR and technical indicators                 | harbor.minoko.life/mochi/trade-data-enhancer |
| data-metadata       | Python           | Generates scenario parameter combinations for Phase 2   | harbor.minoko.life/mochi/data-metadata       |
| mochi-java          | Java 21          | Core trading simulation engine and Trino aggregation    | harbor.minoko.life/mochi/mochi-java          |
| r-graphs            | R                | Statistical visualizations (years, stops, best traders) | harbor.minoko.life/mochi/r-graphs            |
| trade-extract       | Kotlin           | Extracts individual trades from aggregated results      | harbor.minoko.life/mochi/trade-extract       |
| py-trade-lens       | Python           | Trade analysis and insights                             | harbor.minoko.life/mochi/py-trade-lens       |
| trade-summary       | Python           | Final result summarization                              | harbor.minoko.life/mochi/trade-summary       |

Five languages across ten containerized services.

Infrastructure

The platform runs on a Kubernetes 1.35 cluster with the following nodes:

| Node           | Role          | OS               | Notes                               |
|----------------|---------------|------------------|-------------------------------------|
| k8s-master01   | Control plane | CentOS Stream 9  |                                     |
| k8s-worker01   | Worker        | CentOS Stream 9  |                                     |
| k8s-worker02   | Worker        | CentOS Stream 9  |                                     |
| polycephala    | Worker        | Rocky Linux 10.1 | Primary workload node, large memory |
| minis-enp195s0 | Worker (DMZ)  | Fedora 43        |                                     |

Polycephala handles all backtest workloads. The Phase 2 pipeline requires up to 30GB JVM heap for the Java trading engine and a 100GB Trino worker for aggregation queries.

Supporting Services

| Service        | Namespace     | Purpose                                    |
|----------------|---------------|--------------------------------------------|
| MinIO          | minio         | S3-compatible object storage (13 buckets)  |
| Trino          | trino         | SQL query engine replacing AWS Athena      |
| Harbor         | harbor        | Private container registry                 |
| Argo Workflows | argo          | Workflow orchestration                     |
| ArgoCD         | argocd        | GitOps deployment                          |
| Linkerd        | cluster-wide  | Service mesh                               |
| Nginx Ingress  | ingress-nginx | TLS termination and routing                |

Data Flow

Phase 1: Data Pipeline

A backtest request starts at the dashboard. The user specifies a ticker symbol (e.g., AAPL, C:XAUUSD), a date range, and strategy parameters (ATR periods, alpha, trade duration, timeout).

The backtest API receives the request, generates a unique group tag (e.g., grape-koala-20260127122439), uploads the parameters as JSON to MinIO, and creates an Argo Workflow from the backtest-pipeline template.
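
As a sketch of that submission path (the word pools, helper names, and JSON field names below are assumptions; only the tag shape, word-word-timestamp, and the params-as-JSON upload come from this post):

```python
# Hypothetical sketch of how backtest-api might mint a group tag and the
# parameter document it uploads to MinIO. Word pools and field names are
# illustrative, not the actual implementation.
import json
import random
from datetime import datetime, timezone

ADJECTIVES = ["grape", "amber", "quiet"]  # illustrative word pools
ANIMALS = ["koala", "otter", "heron"]

def make_group_tag(now=None):
    """Build a unique run identifier like grape-koala-20260127122439."""
    now = now or datetime.now(timezone.utc)
    words = f"{random.choice(ADJECTIVES)}-{random.choice(ANIMALS)}"
    return f"{words}-{now:%Y%m%d%H%M%S}"

def build_params_json(ticker, start, end, **strategy):
    """Serialize the request into the JSON document stored in
    mochi-prod-backtest-params before the Argo Workflow is created."""
    return json.dumps({"ticker": ticker, "from": start, "to": end, **strategy})
```

The API then submits a Workflow from the backtest-pipeline template with the group tag as a parameter, so every downstream step can locate its inputs by tag.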

Phase 1 runs three steps sequentially:

  1. polygon – Downloads minute, hour, and day resolution data from the Polygon.io API. Data is LZO-compressed and uploaded to mochi-prod-raw-historical-data.

  2. trade-data-enhancer – Reads the raw data, calculates ATR (Average True Range) with configurable short and long periods, and writes enhanced data to mochi-prod-prepared-historical-data.

  3. data-metadata – Analyzes the enhanced data and generates multiple scenario parameter combinations (stop loss ranges, limit ranges, tick offsets). Each scenario defines a different strategy configuration to test. The output is a JSON array of scenarios that triggers Phase 2.
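
The fan-out itself is essentially a cartesian product of parameter ranges. A minimal sketch (field names and example ranges are assumptions, not data-metadata's real output schema):

```python
# Minimal sketch of scenario generation: each combination of stop loss,
# limit, and tick offset becomes one Phase 2 scenario. Field names and
# ranges here are illustrative.
import itertools
import json

def generate_scenarios(stops, limits, tick_offsets):
    """Emit one scenario dict per parameter combination."""
    return [
        {"stop": stop, "limit": limit, "tick_offset": tick}
        for stop, limit, tick in itertools.product(stops, limits, tick_offsets)
    ]

scenarios = generate_scenarios([1.0, 1.5], [2.0, 3.0], [0, 5])
print(json.dumps(scenarios))  # a JSON array; each entry drives one Phase 2 run
```

Argo can then fan out over this JSON array (via withParam), invoking the Phase 2 pipeline once per element.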

Phase 2: Per-Scenario Processing

Phase 2 runs independently for each scenario generated by data-metadata. With parallelism: 1 at the workflow level, scenarios execute one at a time to stay within the node’s memory budget.
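
A hypothetical, abbreviated fragment of how that cap and the fan-out might sit in the Workflow spec (template and parameter names are assumptions; only parallelism: 1 is stated above):

```yaml
# Hypothetical, abbreviated Workflow fragment. Only parallelism: 1 is
# confirmed by the post; template and parameter names are assumptions.
spec:
  parallelism: 1                # at most one scenario's pod group at a time
  templates:
    - name: fan-out
      steps:
        - - name: scenario
            templateRef:
              name: phase2-scenario-pipeline
              template: main                  # entry template name assumed
            arguments:
              parameters:
                - name: scenario
                  value: "{{item}}"
            withParam: "{{workflow.parameters.scenarios}}"
```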

Each scenario follows this DAG:

mochi-trades          Execute trading simulation (Java, 30GB heap)
      │
sync-partitions       Drop/recreate Hive tables, sync partition metadata
      │
mochi-aggregate       Run 20+ CTE aggregation query via Trino (100GB worker)
      │
      ├── r-graphs-years         Annual performance graphs
      ├── r-graphs-stops         Stop loss analysis graphs
      └── r-graphs-besttraders   Top performer ranking
              │
         trade-extract           Extract individual trades (Kotlin)
              │
         py-trade-lens           Trade analysis (Python)
              │
         trade-summary           Final summarization (Python)

The three R graph scripts run in parallel after aggregation. The remaining steps are sequential, with trade-extract gated on the best-traders graph completing.

Bucket Architecture

Each pipeline step reads from one bucket and writes to another:

| Bucket                               | Purpose                           | Written By          |
|--------------------------------------|-----------------------------------|---------------------|
| mochi-prod-backtest-params           | Backtest configuration JSON       | backtest-api        |
| mochi-prod-raw-historical-data       | Raw Polygon data (LZO compressed) | polygon             |
| mochi-prod-prepared-historical-data  | Enhanced data with indicators     | trade-data-enhancer |
| mochi-prod-ticker-meta               | Ticker metadata                   | data-metadata       |
| mochi-prod-backtest-traders          | Trader configurations             | mochi-trades        |
| mochi-prod-backtest-trades           | Raw trade simulation results      | mochi-trades        |
| mochi-prod-aggregated-trades         | Aggregated trade statistics       | mochi-aggregate     |
| mochi-prod-aggregated-trades-staging | Staging area for aggregation      | mochi-aggregate     |
| mochi-prod-summary-graphs            | R-generated graphs and CSVs       | r-graphs            |
| mochi-prod-trade-extracts            | Extracted trade data              | trade-extract       |
| mochi-prod-final-trader-ranking      | Final ranked results              | trade-summary       |
| mochi-prod-live-trades               | Live trading configurations       | dashboard           |
| mochi-prod-portfolio-tracking        | Portfolio tracking data           | dashboard           |

Kubernetes Configuration

Argo Workflow Templates

Two WorkflowTemplates in the mochi-backtest namespace:

  • backtest-pipeline – Phase 1 (sequential steps: init-params, polygon, trade-data-enhancer, data-metadata)
  • phase2-scenario-pipeline – Phase 2 (DAG per scenario with all processing steps)

Each step in the workflow uses envFrom to load both the aws-credentials secret (MinIO access keys) and a step-specific ConfigMap. There are ten ConfigMaps, one per step, keeping configuration explicit:

init-params-env          polygon-env              trade-data-enhancer-env
data-metadata-env        mochi-trades-env         mochi-aggregate-env
r-graphs-env             trade-extract-env        py-trade-lens-env
trade-summary-env
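
A hypothetical fragment of one step's container spec (the secret and ConfigMap names are from this setup; the image tag and surrounding template fields are abbreviated):

```yaml
# Hypothetical fragment of one workflow step; image tag and surrounding
# fields abbreviated.
- name: mochi-trades
  container:
    image: harbor.minoko.life/mochi/mochi-java:latest   # tag assumed
    envFrom:
      - secretRef:
          name: aws-credentials     # MinIO access keys, shared by every step
      - configMapRef:
          name: mochi-trades-env    # step-specific configuration
```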

Resource Requirements

The pipeline is memory-intensive. The heaviest steps:

| Step            | CPU Request | Memory | Notes                               |
|-----------------|-------------|--------|-------------------------------------|
| mochi-trades    | 4           | 30Gi   | Java 21, 30GB JVM heap, parallel GC |
| mochi-aggregate | 4           | 30Gi   | Java client for Trino aggregation   |
| Trino worker    | –           | 100Gi  | Single worker, 96GB JVM heap        |
| r-graphs (each) | 2           | 16Gi   | Three instances run in parallel     |

A 10-year forex backtest (e.g., C:XAUUSD) generates enough data that the Trino aggregation query – with 20+ CTEs and 400+ execution stages – requires the full 100GB worker allocation.

Trino Setup

Trino replaces AWS Athena for the aggregation step. It runs as a coordinator + single worker in the trino namespace:

  • Coordinator: Schedules queries, 500 max stages, phased execution policy
  • Worker: 100Gi memory, 96GB JVM heap (-Xmx96G), G1GC
  • Hive Metastore: Postgres-backed, manages table metadata for MinIO-hosted data
  • Catalog: Hive connector with forcePathStyle pointing to minio.minio.svc.cluster.local:9000

The sync-partitions step in Phase 2 drops and recreates Hive external tables before each aggregation, pointing them at the S3 paths where mochi-trades wrote its output. This ensures Trino sees the latest partition data.
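
The exact DDL isn't shown here, but the step's shape is roughly three statements per table. In the sketch below, the catalog/schema/table names and column list are assumptions; sync_partition_metadata is a real procedure exposed by Trino's Hive connector:

```python
# Rough sketch of the statements sync-partitions would issue before each
# aggregation. Schema, table, and column names are assumptions; the bucket
# name and the Hive sync_partition_metadata procedure are real.
def sync_statements(run_id: str) -> list[str]:
    """Return the DDL for re-registering one run's output with Hive."""
    location = f"s3a://mochi-prod-backtest-trades/mochi_trades/run_id={run_id}/"
    return [
        "DROP TABLE IF EXISTS hive.mochi.mochi_trades",
        "CREATE TABLE hive.mochi.mochi_trades (\n"
        "  trade_ts timestamp,\n"
        "  pnl double,\n"
        "  symbol varchar\n"          # partition column goes last
        ") WITH (\n"
        "  format = 'PARQUET',\n"
        "  partitioned_by = ARRAY['symbol'],\n"
        f"  external_location = '{location}'\n"
        ")",
        "CALL hive.system.sync_partition_metadata('mochi', 'mochi_trades', 'FULL')",
    ]
```

The step would run these through the Trino coordinator (e.g. with the trino Python client), so the subsequent mochi-aggregate query sees freshly registered partitions.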

CI/CD Pipeline

All components follow the same pattern:

  1. Code is pushed to a GitLab repository
  2. GitLab CI triggers a Kaniko build (using the in-cluster GitLab runner)
  3. The Docker image is pushed to Harbor (harbor.minoko.life/mochi/<component>)
  4. For the dashboard: ArgoCD detects the Helm chart change and syncs the deployment

The GitLab runner uses the Kubernetes executor, scheduling build pods on polycephala with a disk-pressure toleration. Build cache is stored in a MinIO bucket (gitlab-runner-cache).

Dashboard Deployment

The dashboard has a Helm chart managed by ArgoCD:

mochi-dashboard/
├── helm/mochi-dashboard/
│   ├── Chart.yaml
│   ├── values.yaml          # Ingress, resources, replicas
│   ├── values-prod.yaml     # Production overrides
│   └── templates/
│       ├── deployment.yaml
│       ├── ingress.yaml
│       ├── service.yaml
│       └── serviceaccount.yaml

ArgoCD watches the HEAD of the GitLab repository and auto-syncs with self-heal enabled. Image tags are set as Helm parameters in the ArgoCD Application resource.

Dashboard

The React dashboard provides:

  • Stock tree view – Browse backtest results by symbol, with nested strategies
  • Strategy visualization – View graphs and statistics for a selected strategy
  • Backtest submission – Ticker search, date range presets (1-30 years), ATR parameter configuration
  • Live trades management – CRUD operations on per-broker trade configurations stored in MinIO
  • Broker management – Add/remove brokers
  • SEC filings viewer – Browse SEC filing data

The frontend communicates directly with MinIO via the AWS S3 SDK (using forcePathStyle: true and static credentials). Pre-signed URLs provide temporary access to graphs and CSV files. The backtest API is called directly for workflow submission.

Network Path

Browser → dashboard.minoko.life (nginx ingress, TLS)
       → s3.minoko.life (MinIO S3 API, TLS)
       → backtest-api.minoko.life (FastAPI, TLS)

All endpoints use the same Let’s Encrypt wildcard certificate (*.minoko.life).

Ticker Sanitization

Forex and crypto tickers from Polygon use colons in their symbols (e.g., C:XAUUSD for gold). Colons break Trino’s URI parsing because it interprets C: as a URI scheme like http:.

The fix sanitizes ticker symbols by replacing colons with underscores (C:XAUUSD -> C_XAUUSD) at multiple points:

| Component     | Where                                                   |
|---------------|---------------------------------------------------------|
| mochi-java    | TickerSanitizer.java utility class                      |
| mochi-java    | S3UploaderImpl.java (S3 partition paths)                |
| mochi-java    | TradeYearFetcher.java (S3 prefix lookups)               |
| mochi-java    | TrinoAggregator.java (SQL queries and output filenames) |
| data-metadata | job_placer.py (before passing to Phase 2)               |

This ensures partition paths like mochi_trades/run_id=.../symbol=C_XAUUSD/... are URI-safe.
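
The core of the fix is a one-line transformation. A Python equivalent of the TickerSanitizer logic (the Java/Kotlin originals live in mochi-java; the path-builder helper below is illustrative, not a real method):

```python
# Python equivalent of the colon sanitization described above. Only the
# colon-to-underscore rule comes from the post; partition_path is an
# illustrative helper.
def sanitize_ticker(ticker: str) -> str:
    """C:XAUUSD -> C_XAUUSD, so Trino never parses 'C:' as a URI scheme."""
    return ticker.replace(":", "_")

def partition_path(run_id: str, ticker: str) -> str:
    """Build a URI-safe S3 partition prefix for a run."""
    return f"mochi_trades/run_id={run_id}/symbol={sanitize_ticker(ticker)}/"
```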

Lessons Learned

Trino memory requirements are hard to predict. The aggregation query with 20+ CTEs and 400+ stages failed repeatedly at lower memory configurations (48GB, 64GB, 80GB per worker). A single 100GB worker succeeded where two smaller workers could not, because the query’s intermediate state doesn’t distribute well.

NetworkPolicies catch you by surprise. MinIO had a default-deny policy with an allowlist of namespaces for port 9000. When adding the S3 API ingress (s3.minoko.life), requests from the nginx ingress controller were blocked because ingress-nginx wasn’t in the allowlist. The Linkerd service mesh proxy made this harder to diagnose – the error manifested as l5d-proxy-error: connect timed out rather than a clear connection refused.

ConfigMap-per-step keeps things manageable. With ten pipeline steps across two workflow phases, sharing ConfigMaps with per-step overrides would be fragile. Each step having its own ConfigMap makes it explicit what environment each container sees.

One scenario at a time. With mochi-trades needing 30GB and the Trino worker needing 100GB, running multiple scenarios in parallel would exceed the node’s memory. The workflow sets parallelism: 1 to process scenarios sequentially.