Techulus Cloud Architecture
Overview
Techulus Cloud is a stateless container deployment platform built around three core principles:
- Workloads are disposable: containers can be killed and recreated at any time.
- Two node types: proxy nodes handle public traffic, worker nodes run containers.
- Networking is private-first: services communicate over a WireGuard mesh, with public exposure routed through proxy nodes.
Tech Stack
| Component | Choice | Rationale |
|---|---|---|
| Control Plane | Next.js (full-stack) | Single deployment with React frontend and API routes |
| Database | Postgres + Drizzle | Simple, low operational overhead, easy backup |
| Background Jobs | Inngest (self-hosted) | Durable workflows, retries, event-driven orchestration |
| Server Agent | Go | Single binary that shells out to Podman |
| Container Runtime | Podman | Docker-compatible, daemonless, bridge networking with static IPs |
| Reverse Proxy | Traefik | Automatic HTTPS via Let’s Encrypt, runs on proxy nodes only |
| Private Network | WireGuard | Full mesh coordinated by the control plane |
| Service Discovery | Built-in DNS | Agent serves .internal domains |
| Agent Communication | Pull-based HTTP | Agent polls expected state and receives leased commands through status reports |
Node Types
| Type | Traefik | Public Traffic | Containers |
|---|---|---|---|
| Proxy | Yes | Handles TLS termination | Yes |
| Worker | No | None | Yes |
- Proxy nodes handle incoming public traffic, terminate TLS using HTTP-01 ACME, and route requests to containers over WireGuard.
- Worker nodes run containers only and have no public exposure.
Architecture Diagram
Agent State Machine
The agent uses a two-state machine to prevent race conditions during reconciliation.
IDLE State
- Poll the control plane every 10 seconds for expected state.
- Compare expected state versus actual state for containers, DNS, Traefik, and WireGuard.
- If no drift exists, send a status report and remain in IDLE.
- If drift is detected, snapshot expected state and transition to PROCESSING.
Traefik drift detection only applies on proxy nodes.
PROCESSING State
- Stop polling and work from the expected-state snapshot.
- Apply one change at a time with verification.
- Re-check drift after every change.
- Transition back to IDLE once drift is resolved.
- Force a return to IDLE after 5 minutes if reconciliation stalls.
- Always send a status report before returning to IDLE.
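The two states above can be sketched as a small transition function. This is a minimal illustration, not the agent's actual code (the agent is written in Go): the `Agent` class, the `tick` method, and the dictionary comparison standing in for drift detection are all hypothetical. Only the 10-second poll interval and the 5-minute timeout come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

POLL_INTERVAL_SECONDS = 10            # IDLE polling interval from the text
PROCESSING_TIMEOUT_SECONDS = 5 * 60   # forced return to IDLE from the text


class State(Enum):
    IDLE = "IDLE"
    PROCESSING = "PROCESSING"


@dataclass
class Agent:
    state: State = State.IDLE
    snapshot: Optional[dict] = None   # expected state frozen on entering PROCESSING
    processing_started: float = 0.0
    reports_sent: int = 0

    def send_status_report(self) -> None:
        # Placeholder: a real agent would report to the control plane here.
        self.reports_sent += 1

    def tick(self, now: float, expected: dict, actual: dict) -> None:
        """Advance the state machine by one step (hypothetical helper)."""
        if self.state is State.IDLE:
            if expected == actual:
                # No drift: report and stay in IDLE.
                self.send_status_report()
                return
            # Drift: freeze the expected state and start reconciling.
            self.snapshot = dict(expected)
            self.processing_started = now
            self.state = State.PROCESSING
            return

        # PROCESSING: newer expected state is ignored, only the snapshot counts.
        timed_out = now - self.processing_started >= PROCESSING_TIMEOUT_SECONDS
        if timed_out or self.snapshot == actual:
            self.send_status_report()  # always report before returning to IDLE
            self.snapshot = None
            self.state = State.IDLE
```

Working from a frozen snapshot means a new expected state that arrives mid-reconciliation cannot change the target; it is picked up on the next IDLE poll.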
Drift Detection
The agent uses hash comparisons for deterministic drift detection:
- Containers: missing, orphaned, wrong state, or image mismatch.
- DNS: hash of sorted records versus current DNS config.
- Traefik: hash of sorted routes versus current Traefik config on proxy nodes.
- WireGuard: hash of sorted peers versus current wg0.conf.
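Hashing a sorted, canonical form is what makes the comparison deterministic: two configurations with the same entries in a different order produce the same digest. The sketch below assumes records are simple dictionaries and uses SHA-256 over JSON. The source does not specify the hash function or serialization, so both are illustrative choices, and `config_hash` and `has_drift` are hypothetical names.

```python
import hashlib
import json


def config_hash(items: list[dict]) -> str:
    """Return a digest that does not depend on the order of the records."""
    # Serialize each record with sorted keys, then sort the serialized records.
    canonical = sorted(json.dumps(item, sort_keys=True) for item in items)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def has_drift(expected: list[dict], current: list[dict]) -> bool:
    """Drift exists when the two digests differ."""
    return config_hash(expected) != config_hash(current)
```

For example, two DNS record lists containing the same records in opposite order report no drift, while removing one record does.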
Container Reconciliation Order
- Stop orphan containers with no deployment ID.
- Start containers in created or exited state.
- Deploy missing containers.
- Redeploy containers with wrong state or image mismatch.
- Update DNS records.
- Update Traefik routes on proxy nodes.
- Update WireGuard peers.
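The container-related steps of this order can be sketched as a planning function that classifies each container and emits actions in the listed sequence. The `Container` type, the `plan_container_actions` function, and the shape of `expected` (deployment ID mapped to image) are assumptions made for illustration. The DNS, Traefik, and WireGuard steps are omitted, as are containers whose deployment ID is not in the expected state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    name: str
    deployment_id: Optional[str]  # None marks an orphan
    image: str
    state: str                    # e.g. "running", "created", "exited"


def plan_container_actions(
    expected: dict[str, str], actual: list[Container]
) -> list[tuple[str, str]]:
    """Return (action, target) pairs in reconciliation order."""
    stop, start, redeploy = [], [], []
    seen = set()
    for c in actual:
        if c.deployment_id is None:
            stop.append(("stop", c.name))            # step 1: orphans
            continue
        seen.add(c.deployment_id)
        if c.deployment_id not in expected:
            continue                                 # out of scope for this sketch
        if c.image != expected[c.deployment_id]:
            redeploy.append(("redeploy", c.deployment_id))  # image mismatch
        elif c.state in ("created", "exited"):
            start.append(("start", c.name))          # step 2: startable
        elif c.state != "running":
            redeploy.append(("redeploy", c.deployment_id))  # wrong state
    # Step 3: expected deployments with no container at all.
    deploy = [("deploy", d) for d in expected if d not in seen]
    return stop + start + deploy + redeploy
```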
Rollout Stages
pending -> pulling -> starting -> healthy -> dns_updating -> traefik_updating -> stopping_old -> running
| Stage | Description |
|---|---|
| pending | Deployment created and waiting for an agent |
| pulling | Agent is pulling the container image |
| starting | Container started and waiting for health checks |
| healthy | Health check passed, or no health check is configured |
| dns_updating | DNS records are being updated |
| traefik_updating | Traefik routes are being updated |
| stopping_old | Old deployment containers are being stopped |
| running | Deployment is complete and serving traffic |
Special states:
- unknown: the agent stopped reporting this deployment and the container may still exist.
- stopped: the container was explicitly stopped.
- failed: the deployment failed, such as during health checks.
- rolled_back: rollout failed and reverted to the previous deployment.
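The linear progression can be modeled as an ordered list with a lookup for the next stage. This is only a sketch: the stage and state names come from the table and list above, while `next_stage` and the treatment of special states as terminal for the rollout are assumptions.

```python
from typing import Optional

ROLLOUT_STAGES = [
    "pending",
    "pulling",
    "starting",
    "healthy",
    "dns_updating",
    "traefik_updating",
    "stopping_old",
    "running",
]
SPECIAL_STATES = {"unknown", "stopped", "failed", "rolled_back"}


def next_stage(current: str) -> Optional[str]:
    """Return the stage after `current`, or None when the rollout cannot advance."""
    if current in SPECIAL_STATES:
        return None  # assumption: special states end the linear rollout
    index = ROLLOUT_STAGES.index(current)  # raises ValueError for unrecognized names
    if index + 1 < len(ROLLOUT_STAGES):
        return ROLLOUT_STAGES[index + 1]
    return None  # "running" is the final stage
```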
Networking
IP Address Scheme
| Range | Purpose |
|---|---|
| 10.100.X.1 | WireGuard IP for server X |
| 10.200.X.2-254 | Container IPs on server X |
X is the server subnet ID from 1 to 255.
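As a worked example of the scheme, the addresses for a given subnet ID can be derived with Python's standard `ipaddress` module. The function names are hypothetical; the ranges come directly from the table above.

```python
import ipaddress


def _check(subnet_id: int) -> None:
    if not 1 <= subnet_id <= 255:
        raise ValueError("subnet ID must be between 1 and 255")


def wireguard_ip(subnet_id: int) -> ipaddress.IPv4Address:
    """WireGuard address of server X: 10.100.X.1."""
    _check(subnet_id)
    return ipaddress.IPv4Address(f"10.100.{subnet_id}.1")


def container_ips(subnet_id: int) -> list[ipaddress.IPv4Address]:
    """Assignable container addresses on server X: 10.200.X.2 through 10.200.X.254."""
    _check(subnet_id)
    return [ipaddress.IPv4Address(f"10.200.{subnet_id}.{host}") for host in range(2, 255)]
```

Each server therefore has room for 253 container addresses, since `.1` is used as the bridge gateway.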
WireGuard Mesh
Each server gets a /24 subnet for routing:
- Server 1: 10.100.1.0/24 with WireGuard IP 10.100.1.1
- Server 2: 10.100.2.0/24 with WireGuard IP 10.100.2.1
Every server peers with every other server. AllowedIPs includes both WireGuard and container subnets:
AllowedIPs = 10.100.2.0/24, 10.200.2.0/24
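To illustrate how the peer list for one server follows from the set of subnet IDs, the sketch below builds the AllowedIPs value for every other server in the mesh. The helper names are hypothetical, and keys and endpoints are left out because the source does not describe how they are distributed.

```python
def allowed_ips(peer_id: int) -> str:
    """AllowedIPs for a peer: its WireGuard subnet plus its container subnet."""
    return f"10.100.{peer_id}.0/24, 10.200.{peer_id}.0/24"


def peers_for(local_id: int, server_ids: list[int]) -> dict[int, str]:
    """Map each remote server's subnet ID to its AllowedIPs value."""
    return {sid: allowed_ips(sid) for sid in sorted(server_ids) if sid != local_id}


def mesh_link_count(server_count: int) -> int:
    """Number of tunnels in a full mesh of n servers: n * (n - 1) / 2."""
    return server_count * (server_count - 1) // 2
```

Including the container subnet in AllowedIPs is what lets a container on one server reach a container IP on another: traffic to `10.200.2.0/24` is routed into the tunnel for server 2.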
Container Network
Each server has a Podman bridge network:
podman network create \
--driver bridge \
--subnet 10.200.1.0/24 \
--gateway 10.200.1.1 \
--disable-dns \
techulus
Containers receive static IPs assigned by the control plane:
podman run -d \
--name service-deployment \
--network techulus \
--ip 10.200.1.2 \
--label techulus.deployment.id=<deployment-id> \
--label techulus.service.id=<service-id> \
traefik/whoami
DNS Resolution
Each agent runs a built-in DNS server for .internal domains:
- It listens on the container gateway IP, such as 10.200.1.1.
- It configures systemd-resolved to forward .internal queries.
- Records are pushed from the control plane through expected state.
Services resolve through .internal names with round-robin across replicas.
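Round-robin across replicas is commonly implemented by rotating the order of the returned addresses on each query. The sketch below shows that idea only. The `InternalResolver` class is hypothetical and is not the agent's DNS server, and whether the agent rotates answers or uses another strategy is not stated in the source.

```python
from collections import defaultdict


class InternalResolver:
    """Toy resolver that rotates replica IPs for .internal names."""

    def __init__(self, records: dict[str, list[str]]):
        # Copy the records so later changes by the caller do not affect us.
        self._records = {name: list(ips) for name, ips in records.items()}
        self._cursor = defaultdict(int)

    def resolve(self, name: str) -> list[str]:
        if not name.endswith(".internal"):
            return []  # only .internal names are served here
        ips = self._records.get(name, [])
        if not ips:
            return []
        start = self._cursor[name] % len(ips)
        self._cursor[name] += 1
        # Rotate so that successive queries lead with a different replica.
        return ips[start:] + ips[:start]
```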
Traefik on Proxy Nodes
Proxy nodes receive routes and certificates from the control plane:
- Routes live in /etc/traefik/dynamic/routes.yaml.
- Certificates live in /etc/traefik/dynamic/tls.yaml.
- Routes map subdomain.example.com to container IPs over WireGuard.
- TLS certificates are managed centrally by the control plane.
- /.well-known/acme-challenge/* is routed back to the control plane for ACME validation.
Worker nodes do not run Traefik.
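As a rough illustration of what a generated routes file might contain, the sketch below builds a structure in the shape of Traefik's file-provider dynamic configuration (`http.routers` and `http.services` with `loadBalancer.servers`). The `build_routes` function, the router naming scheme, and the container port `80` are assumptions. The actual contents of `routes.yaml` are not shown in the source, and serializing the dictionary to YAML is left out.

```python
def build_routes(routes: dict[str, list[str]], port: int = 80) -> dict:
    """Build a Traefik-style dynamic config from hostname -> container IPs."""
    routers, services = {}, {}
    for host, ips in sorted(routes.items()):
        name = host.replace(".", "-")  # hypothetical naming scheme
        routers[name] = {"rule": f"Host(`{host}`)", "service": name}
        services[name] = {
            "loadBalancer": {
                # Container IPs are reachable over the WireGuard mesh.
                "servers": [{"url": f"http://{ip}:{port}"} for ip in ips]
            }
        }
    return {"http": {"routers": routers, "services": services}}
```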
Multiple Proxy Nodes
The platform supports geographically distributed proxy nodes with proximity steering:
- Users point custom domains to a single GeoDNS-managed hostname.
- GeoDNS routes clients to the nearest healthy proxy.
- Health checks fail over automatically when a proxy becomes unavailable.
- All proxies share the same TLS certificates from the control plane.
Example:
Proxy US: 1.2.3.4
Proxy EU: 5.6.7.8
Proxy SYD: 9.10.11.12
GeoDNS:
example.com -> lb.techulus.cloud
-> route client to nearest proxy
-> fail over when a proxy is unhealthy
Proximity-Aware Load Balancing
Within a proxy node, traffic is distributed using weighted round-robin:
- Local replicas on the same proxy server use weight 5.
- Remote replicas on other proxy servers use weight 1.
This weighting keeps most traffic on the local node whenever a local replica exists, while still allowing cross-node routing.
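The effect of the weights can be worked out directly: under weighted round-robin, each replica receives traffic in proportion to its weight. The helper below is a hypothetical calculation, not part of the platform, and it assumes requests are split exactly by weight.

```python
from fractions import Fraction


def local_share(local: int, remote: int, local_weight: int = 5, remote_weight: int = 1) -> Fraction:
    """Fraction of requests served by local replicas under weighted round-robin."""
    total = local * local_weight + remote * remote_weight
    if total == 0:
        raise ValueError("at least one replica is required")
    return Fraction(local * local_weight, total)
```

With one local and one remote replica, 5/6 of requests (about 83%) stay local. With one local and five remote replicas the split is even, so the local bias weakens as the number of remote replicas grows.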