Mochi is a self-hosted algorithmic trading backtest platform. It takes a stock ticker and date range, downloads historical market data, runs trading strategy simulations across multiple parameter combinations, aggregates the results with Trino, generates statistical graphs with R, and presents everything through a React dashboard. The entire pipeline runs on a homelab Kubernetes cluster orchestrated by Argo Workflows.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐
│                         dashboard.minoko.life                           │
│                     React + Vite + TypeScript                           │
│              S3 client (MinIO) ─── Backtest API client                  │
└──────────────┬───────────────────────────────┬───────────────────────────┘
               │                               │
               ▼                               ▼
┌──────────────────────┐         ┌──────────────────────────┐
│   MinIO (S3 API)     │         │   Backtest API (FastAPI) │
│   s3.minoko.life     │         │   backtest-api.minoko.life│
│   13 buckets         │         │   POST /backtest          │
└──────────┬───────────┘         └────────────┬─────────────┘
           │                                  │
           │              ┌───────────────────▼──────────────────┐
           │              │         Argo Workflows               │
           │              │    workflows.minoko.life              │
           │              │                                      │
           │              │  Phase 1: backtest-pipeline           │
           │              │  ┌────────┐ ┌──────────┐ ┌────────┐ │
           │              │  │Polygon │→│Enhancer  │→│Metadata│ │
           │              │  │Python  │ │Kotlin    │ │Python  │ │
           │              │  └────────┘ └──────────┘ └───┬────┘ │
           │              │                              │      │
           │              │  Phase 2: per scenario (x N) │      │
           │              │  ┌──────────────────────────┐│      │
           │              │  │mochi-trades (Java)       ││      │
           │              │  │     │                    ││      │
           │              │  │sync-partitions (Trino)   ││      │
           │              │  │     │                    ││      │
           │              │  │mochi-aggregate (Java)    ││      │
           │              │  │     │                    ││      │
           │              │  │  ┌──┴──┬──────┐         ││      │
           │◀─────────────│──│  │years│stops │best-    ││      │
           │   results    │  │  │  .r │  .r  │traders.r││      │
           │              │  │  └─────┴──────┴────┬────┘│      │
           │              │  │  trade-extract (Kt) │    ││      │
           │              │  │  py-trade-lens (Py) │    ││      │
           │              │  │  trade-summary (Py) │    ││      │
           │              │  └──────────────────────────┘│      │
           │              └──────────────────────────────┘      │
           │                                                    │
           │         ┌──────────────────────────────┐           │
           └────────▶│  Trino (Coordinator + Worker) │◀──────────┘
                     │  Hive Metastore + Postgres    │
                     │  100GB worker for aggregation │
                     └──────────────────────────────┘

Components

| Component           | Language         | Purpose                                                 | Image                                        |
|---------------------|------------------|---------------------------------------------------------|----------------------------------------------|
| mochi-dashboard     | TypeScript/React | UI, S3 browsing, backtest submission                    | harbor.minoko.life/mochi/mochi-dashboard     |
| backtest-api        | Python/FastAPI   | Accepts backtest requests, creates Argo Workflows       | harbor.minoko.life/mochi/backtest-api        |
| polygon             | Python           | Downloads historical data from Polygon.io API           | harbor.minoko.life/mochi/polygon             |
| trade-data-enhancer | Kotlin           | Calculates ATR and technical indicators                 | harbor.minoko.life/mochi/trade-data-enhancer |
| data-metadata       | Python           | Generates scenario parameter combinations for Phase 2   | harbor.minoko.life/mochi/data-metadata       |
| mochi-java          | Java 21          | Core trading simulation engine and Trino aggregation    | harbor.minoko.life/mochi/mochi-java          |
| r-graphs            | R                | Statistical visualizations (years, stops, best traders) | harbor.minoko.life/mochi/r-graphs            |
| trade-extract       | Kotlin           | Extracts individual trades from aggregated results      | harbor.minoko.life/mochi/trade-extract       |
| py-trade-lens       | Python           | Trade analysis and insights                             | harbor.minoko.life/mochi/py-trade-lens       |
| trade-summary       | Python           | Final result summarization                              | harbor.minoko.life/mochi/trade-summary       |

Five languages across ten containerized services.

Infrastructure

The platform runs on a Kubernetes 1.35 cluster with the following nodes:

| Node           | Role          | OS               | Notes                               |
|----------------|---------------|------------------|-------------------------------------|
| k8s-master01   | Control plane | CentOS Stream 9  |                                     |
| k8s-worker01   | Worker        | CentOS Stream 9  |                                     |
| k8s-worker02   | Worker        | CentOS Stream 9  |                                     |
| polycephala    | Worker        | Rocky Linux 10.1 | Primary workload node, large memory |
| minis-enp195s0 | Worker (DMZ)  | Fedora 43        |                                     |

Polycephala handles all backtest workloads. The Phase 2 pipeline requires up to 30GB JVM heap for the Java trading engine and a 100GB Trino worker for aggregation queries.

Supporting Services

| Service        | Namespace     | Purpose                                    |
|----------------|---------------|--------------------------------------------|
| MinIO          | minio         | S3-compatible object storage (13 buckets)  |
| Trino          | trino         | SQL query engine replacing AWS Athena      |
| Harbor         | harbor        | Private container registry                 |
| Argo Workflows | argo          | Workflow orchestration                     |
| ArgoCD         | argocd        | GitOps deployment                          |
| Linkerd        | cluster-wide  | Service mesh                               |
| Nginx Ingress  | ingress-nginx | TLS termination and routing                |

Data Flow

Phase 1: Data Pipeline

A backtest request starts at the dashboard. The user specifies a ticker symbol (e.g., AAPL, C:XAUUSD), a date range, and strategy parameters (ATR periods, alpha, trade duration, timeout).

The backtest API receives the request, generates a unique group tag (e.g., grape-koala-20260127122439), uploads the parameters as JSON to MinIO, and creates an Argo Workflow from the backtest-pipeline template.
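
As a sketch of that submission path (the word pools, helper names, and JSON field names below are assumptions; only the tag shape, word-word-timestamp, and the params-as-JSON upload come from this post):

```python
# Hypothetical sketch of how backtest-api might mint a group tag and the
# parameter document it uploads to MinIO. Word pools and field names are
# illustrative, not the actual implementation.
import json
import random
from datetime import datetime, timezone

ADJECTIVES = ["grape", "amber", "quiet"]  # illustrative word pools
ANIMALS = ["koala", "otter", "heron"]

def make_group_tag(now=None):
    """Build a unique run identifier like grape-koala-20260127122439."""
    now = now or datetime.now(timezone.utc)
    words = f"{random.choice(ADJECTIVES)}-{random.choice(ANIMALS)}"
    return f"{words}-{now:%Y%m%d%H%M%S}"

def build_params_json(ticker, start, end, **strategy):
    """Serialize the request into the JSON document stored in
    mochi-prod-backtest-params before the Argo Workflow is created."""
    return json.dumps({"ticker": ticker, "from": start, "to": end, **strategy})
```

The API then submits a Workflow from the backtest-pipeline template with the group tag as a parameter, so every downstream step can locate its inputs by tag.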

Phase 1 runs three steps sequentially:

  1. polygon – Downloads minute, hour, and day resolution data from the Polygon.io API. Data is LZO-compressed and uploaded to mochi-prod-raw-historical-data.

  2. trade-data-enhancer – Reads the raw data, calculates ATR (Average True Range) with configurable short and long periods, and writes enhanced data to mochi-prod-prepared-historical-data.

  3. data-metadata – Analyzes the enhanced data and generates multiple scenario parameter combinations (stop loss ranges, limit ranges, tick offsets). Each scenario defines a different strategy configuration to test. The output is a JSON array of scenarios that triggers Phase 2.
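
The fan-out itself is essentially a cartesian product of parameter ranges. A minimal sketch (field names and example ranges are assumptions, not data-metadata's real output schema):

```python
# Minimal sketch of scenario generation: each combination of stop loss,
# limit, and tick offset becomes one Phase 2 scenario. Field names and
# ranges here are illustrative.
import itertools
import json

def generate_scenarios(stops, limits, tick_offsets):
    """Emit one scenario dict per parameter combination."""
    return [
        {"stop": stop, "limit": limit, "tick_offset": tick}
        for stop, limit, tick in itertools.product(stops, limits, tick_offsets)
    ]

scenarios = generate_scenarios([1.0, 1.5], [2.0, 3.0], [0, 5])
print(json.dumps(scenarios))  # a JSON array; each entry drives one Phase 2 run
```

Argo can then fan out over this JSON array (via withParam), invoking the Phase 2 pipeline once per element.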

Phase 2: Per-Scenario Processing

Phase 2 runs independently for each scenario generated by data-metadata. With parallelism: 1 at the workflow level, scenarios execute one at a time to stay within the node’s memory budget.
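
A hypothetical, abbreviated fragment of how that cap and the fan-out might sit in the Workflow spec (template and parameter names are assumptions; only parallelism: 1 is stated above):

```yaml
# Hypothetical, abbreviated Workflow fragment. Only parallelism: 1 is
# confirmed by the post; template and parameter names are assumptions.
spec:
  parallelism: 1                # at most one scenario's pod group at a time
  templates:
    - name: fan-out
      steps:
        - - name: scenario
            templateRef:
              name: phase2-scenario-pipeline
              template: main                  # entry template name assumed
            arguments:
              parameters:
                - name: scenario
                  value: "{{item}}"
            withParam: "{{workflow.parameters.scenarios}}"
```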

Each scenario follows this DAG:

mochi-trades          Execute trading simulation (Java, 30GB heap)
      │
sync-partitions       Drop/recreate Hive tables, sync partition metadata
      │
mochi-aggregate       Run 20+ CTE aggregation query via Trino (100GB worker)
      │
      ├── r-graphs-years         Annual performance graphs
      ├── r-graphs-stops         Stop loss analysis graphs
      └── r-graphs-besttraders   Top performer ranking
              │
         trade-extract           Extract individual trades (Kotlin)
              │
         py-trade-lens           Trade analysis (Python)
              │
         trade-summary           Final summarization (Python)

The three R graph scripts run in parallel after aggregation. The remaining steps are sequential, with trade-extract gated on the best-traders graph completing.

Bucket Architecture

Each pipeline step reads from one bucket and writes to another:

| Bucket                               | Purpose                           | Written By          |
|--------------------------------------|-----------------------------------|---------------------|
| mochi-prod-backtest-params           | Backtest configuration JSON       | backtest-api        |
| mochi-prod-raw-historical-data       | Raw Polygon data (LZO compressed) | polygon             |
| mochi-prod-prepared-historical-data  | Enhanced data with indicators     | trade-data-enhancer |
| mochi-prod-ticker-meta               | Ticker metadata                   | data-metadata       |
| mochi-prod-backtest-traders          | Trader configurations             | mochi-trades        |
| mochi-prod-backtest-trades           | Raw trade simulation results      | mochi-trades        |
| mochi-prod-aggregated-trades         | Aggregated trade statistics       | mochi-aggregate     |
| mochi-prod-aggregated-trades-staging | Staging area for aggregation      | mochi-aggregate     |
| mochi-prod-summary-graphs            | R-generated graphs and CSVs       | r-graphs            |
| mochi-prod-trade-extracts            | Extracted trade data              | trade-extract       |
| mochi-prod-final-trader-ranking      | Final ranked results              | trade-summary       |
| mochi-prod-live-trades               | Live trading configurations       | dashboard           |
| mochi-prod-portfolio-tracking        | Portfolio tracking data           | dashboard           |

Kubernetes Configuration

Argo Workflow Templates

Two WorkflowTemplates in the mochi-backtest namespace:

  • backtest-pipeline – Phase 1 (sequential steps: init-params, polygon, trade-data-enhancer, data-metadata)
  • phase2-scenario-pipeline – Phase 2 (DAG per scenario with all processing steps)

Each step in the workflow uses envFrom to load both the aws-credentials secret (MinIO access keys) and a step-specific ConfigMap. There are ten ConfigMaps, one per step, keeping configuration explicit:

init-params-env          polygon-env              trade-data-enhancer-env
data-metadata-env        mochi-trades-env         mochi-aggregate-env
r-graphs-env             trade-extract-env        py-trade-lens-env
trade-summary-env
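
A hypothetical fragment of one step's container spec (the secret and ConfigMap names are from this setup; the image tag and surrounding template fields are abbreviated):

```yaml
# Hypothetical fragment of one workflow step; image tag and surrounding
# fields abbreviated.
- name: mochi-trades
  container:
    image: harbor.minoko.life/mochi/mochi-java:latest   # tag assumed
    envFrom:
      - secretRef:
          name: aws-credentials     # MinIO access keys, shared by every step
      - configMapRef:
          name: mochi-trades-env    # step-specific configuration
```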

Resource Requirements

The pipeline is memory-intensive. The heaviest steps:

| Step            | CPU Request | Memory | Notes                               |
|-----------------|-------------|--------|-------------------------------------|
| mochi-trades    | 4           | 30Gi   | Java 21, 30GB JVM heap, parallel GC |
| mochi-aggregate | 4           | 30Gi   | Java client for Trino aggregation   |
| Trino worker    | –           | 100Gi  | Single worker, 96GB JVM heap        |
| r-graphs (each) | 2           | 16Gi   | Three instances run in parallel     |

A 10-year forex backtest (e.g., C:XAUUSD) generates enough data that the Trino aggregation query – with 20+ CTEs and 400+ execution stages – requires the full 100GB worker allocation.

Trino Setup

Trino replaces AWS Athena for the aggregation step. It runs as a coordinator + single worker in the trino namespace:

  • Coordinator: Schedules queries, 500 max stages, phased execution policy
  • Worker: 100Gi memory, 96GB JVM heap (-Xmx96G), G1GC
  • Hive Metastore: Postgres-backed, manages table metadata for MinIO-hosted data
  • Catalog: Hive connector with forcePathStyle pointing to minio.minio.svc.cluster.local:9000

The sync-partitions step in Phase 2 drops and recreates Hive external tables before each aggregation, pointing them at the S3 paths where mochi-trades wrote its output. This ensures Trino sees the latest partition data.
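
The exact DDL isn't shown here, but the step's shape is roughly three statements per table. In the sketch below, the catalog/schema/table names and column list are assumptions; sync_partition_metadata is a real procedure exposed by Trino's Hive connector:

```python
# Rough sketch of the statements sync-partitions would issue before each
# aggregation. Schema, table, and column names are assumptions; the bucket
# name and the Hive sync_partition_metadata procedure are real.
def sync_statements(run_id: str) -> list[str]:
    """Return the DDL for re-registering one run's output with Hive."""
    location = f"s3a://mochi-prod-backtest-trades/mochi_trades/run_id={run_id}/"
    return [
        "DROP TABLE IF EXISTS hive.mochi.mochi_trades",
        "CREATE TABLE hive.mochi.mochi_trades (\n"
        "  trade_ts timestamp,\n"
        "  pnl double,\n"
        "  symbol varchar\n"          # partition column goes last
        ") WITH (\n"
        "  format = 'PARQUET',\n"
        "  partitioned_by = ARRAY['symbol'],\n"
        f"  external_location = '{location}'\n"
        ")",
        "CALL hive.system.sync_partition_metadata('mochi', 'mochi_trades', 'FULL')",
    ]
```

The step would run these through the Trino coordinator (e.g. with the trino Python client), so the subsequent mochi-aggregate query sees freshly registered partitions.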

CI/CD Pipeline

All components follow the same pattern:

  1. Code is pushed to a GitLab repository
  2. GitLab CI triggers a Kaniko build (using the in-cluster GitLab runner)
  3. The Docker image is pushed to Harbor (harbor.minoko.life/mochi/<component>)
  4. For the dashboard: ArgoCD detects the Helm chart change and syncs the deployment

The GitLab runner uses the Kubernetes executor, scheduling build pods on polycephala with a disk-pressure toleration. Build cache is stored in a MinIO bucket (gitlab-runner-cache).

Dashboard Deployment

The dashboard has a Helm chart managed by ArgoCD:

mochi-dashboard/
├── helm/mochi-dashboard/
│   ├── Chart.yaml
│   ├── values.yaml          # Ingress, resources, replicas
│   ├── values-prod.yaml     # Production overrides
│   └── templates/
│       ├── deployment.yaml
│       ├── ingress.yaml
│       ├── service.yaml
│       └── serviceaccount.yaml

ArgoCD watches the HEAD of the GitLab repository and auto-syncs with self-heal enabled. Image tags are set as Helm parameters in the ArgoCD Application resource.

Dashboard

The React dashboard provides:

  • Stock tree view – Browse backtest results by symbol, with nested strategies
  • Strategy visualization – View graphs and statistics for a selected strategy
  • Backtest submission – Ticker search, date range presets (1-30 years), ATR parameter configuration
  • Live trades management – CRUD operations on per-broker trade configurations stored in MinIO
  • Broker management – Add/remove brokers
  • SEC filings viewer – Browse SEC filing data

The frontend communicates directly with MinIO via the AWS S3 SDK (using forcePathStyle: true and static credentials). Pre-signed URLs provide temporary access to graphs and CSV files. The backtest API is called directly for workflow submission.

Network Path

Browser → dashboard.minoko.life (nginx ingress, TLS)
       → s3.minoko.life (MinIO S3 API, TLS)
       → backtest-api.minoko.life (FastAPI, TLS)

All endpoints use the same Let’s Encrypt wildcard certificate (*.minoko.life).

Ticker Sanitization

Forex and crypto tickers from Polygon use colons in their symbols (e.g., C:XAUUSD for gold). Colons break Trino’s URI parsing because it interprets C: as a URI scheme like http:.

The fix sanitizes ticker symbols by replacing colons with underscores (C:XAUUSD -> C_XAUUSD) at multiple points:

| Component     | Where                                                   |
|---------------|---------------------------------------------------------|
| mochi-java    | TickerSanitizer.java utility class                      |
| mochi-java    | S3UploaderImpl.java (S3 partition paths)                |
| mochi-java    | TradeYearFetcher.java (S3 prefix lookups)               |
| mochi-java    | TrinoAggregator.java (SQL queries and output filenames) |
| data-metadata | job_placer.py (before passing to Phase 2)               |

This ensures partition paths like mochi_trades/run_id=.../symbol=C_XAUUSD/... are URI-safe.
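
The core of the fix is a one-line transformation. A Python equivalent of the TickerSanitizer logic (the Java/Kotlin originals live in mochi-java; the path-builder helper below is illustrative, not a real method):

```python
# Python equivalent of the colon sanitization described above. Only the
# colon-to-underscore rule comes from the post; partition_path is an
# illustrative helper.
def sanitize_ticker(ticker: str) -> str:
    """C:XAUUSD -> C_XAUUSD, so Trino never parses 'C:' as a URI scheme."""
    return ticker.replace(":", "_")

def partition_path(run_id: str, ticker: str) -> str:
    """Build a URI-safe S3 partition prefix for a run."""
    return f"mochi_trades/run_id={run_id}/symbol={sanitize_ticker(ticker)}/"
```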

Lessons Learned

Trino memory requirements are hard to predict. The aggregation query with 20+ CTEs and 400+ stages failed repeatedly at lower memory configurations (48GB, 64GB, 80GB per worker). A single 100GB worker succeeded where two smaller workers could not, because the query’s intermediate state doesn’t distribute well.

NetworkPolicies catch you by surprise. MinIO had a default-deny policy with an allowlist of namespaces for port 9000. When adding the S3 API ingress (s3.minoko.life), requests from the nginx ingress controller were blocked because ingress-nginx wasn’t in the allowlist. The Linkerd service mesh proxy made this harder to diagnose – the error manifested as l5d-proxy-error: connect timed out rather than a clear connection refused.

ConfigMap-per-step keeps things manageable. With ten pipeline steps across two workflow phases, sharing ConfigMaps with per-step overrides would be fragile. Each step having its own ConfigMap makes it explicit what environment each container sees.

One scenario at a time. With mochi-trades needing 30GB and the Trino worker needing 100GB, running multiple scenarios in parallel would exceed the node’s memory. The workflow sets parallelism: 1 to process scenarios sequentially.