Docker on Guillaume Delré

Eleven Out of Twelve

Sun, 17 May 2026 15:00:00 +0000

The composer.json in each service had this in its post-install-cmd section:

"post-install-cmd": [
    "bin/console cache:clear --env=prod",
    "bin/console doctrine:migrations:migrate --no-interaction"
]

post-install-cmd runs during composer install, which in the production Dockerfile runs during the image build. There is no database available during a Docker build. The migration command either failed silently, or connected to nothing, or was skipped by Doctrine when it couldn’t find a schema to compare against. In any case, it didn’t migrate anything.

This is a clean violation of Factor XII: admin processes (migrations, one-off scripts, console tasks) should run in the same environment as the application, against the actual production data. Running them at build time inverts the relationship. The image shouldn’t know about the database. The database should be there when the image needs it.

The move to the entrypoint

The migration command moved from composer.json to docker-entrypoint.sh. The shift looks small on a diff. The implications are not.

The entrypoint runs when the container starts, not when the image is built. The database is reachable. The entrypoint waits for it — up to 60 seconds, one attempt per second — before doing anything:

ATTEMPTS_LEFT_TO_REACH_DATABASE=60
until [ $ATTEMPTS_LEFT_TO_REACH_DATABASE -eq 0 ] || \
  DATABASE_ERROR=$(php bin/console dbal:run-sql -q "SELECT 1" 2>&1); do
    sleep 1
    ATTEMPTS_LEFT_TO_REACH_DATABASE=$((ATTEMPTS_LEFT_TO_REACH_DATABASE - 1))
done

if [ $ATTEMPTS_LEFT_TO_REACH_DATABASE -eq 0 ]; then
    echo "$DATABASE_ERROR"
    exit 1
fi

If the database doesn’t respond within 60 seconds, the container exits with an error and Kubernetes restarts it. Once the database is ready, the migration runs:

if [ "$( find ./migrations -iname '*.php' -print -quit )" ]; then
    php bin/console doctrine:migrations:migrate --no-interaction --all-or-nothing
fi

Two changes from the original command: --all-or-nothing ensures that if any migration in a batch fails, the entire batch rolls back. And the find guard skips the command entirely if there are no migration files — useful for services that don’t use Doctrine migrations at all.

This is genuinely better. The database is present. The migration runs in the real environment. The --all-or-nothing flag adds atomicity that the build-time version never had.
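
The wait loop above is specific to the database, but the underlying pattern is a bounded retry. As a minimal sketch, it could be factored into a helper so the retry logic can be exercised without a database. The name `retry` and its argument order are assumptions for illustration, not part of the actual entrypoint; unlike the original loop, it lets the command's output pass through instead of capturing it in DATABASE_ERROR.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Runs CMD up to MAX times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 once every attempt has failed.
# NOTE: `retry` is a hypothetical helper, not part of the original entrypoint.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_attempt=1
    while [ "$retry_attempt" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if [ "$retry_attempt" -lt "$retry_max" ]; then
            sleep "$retry_delay"
        fi
        retry_attempt=$((retry_attempt + 1))
    done
    return 1
}

# Usage in an entrypoint, with the same probe as the loop above:
#   retry 60 1 php bin/console dbal:run-sql -q "SELECT 1" || exit 1
```

Passing the probe as arguments keeps the budget (60 attempts, one per second) visible at the call site rather than buried in a counter variable.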

What it doesn’t solve

Two pods redeploying simultaneously both run the entrypoint. Both reach the database. Both find pending migrations. Both call doctrine:migrations:migrate.

Doctrine has a safeguard, though not a true lock: a doctrine_migration_versions table that records which migrations have run, which the command checks before applying anything. Under normal conditions this is fine: the second pod finds the table up to date and exits cleanly. The real failure modes are more specific: a migration long enough that the database lock times out before it completes, letting a second runner start the same migration before the first has finished; or a pod that crashes mid-migration before recording the version in the table, leaving the schema in an applied-but-unregistered state that the next pod will try to apply again.

The team’s position is explicit: a brief deployment downtime is acceptable. Application versions aren’t necessarily forward-compatible with older schema versions, so running N and N+1 simultaneously against the same database isn’t safe anyway. The deployment strategy is Recreate: all old pods are terminated before any new pods start. The migration runs on first startup, no overlap between versions. It works.

But “it works” and “it’s the right architecture” are different answers.

What would be different

Factor XII says admin tasks should run as “one-off processes”: a process that runs once, for a specific purpose, against the production environment. The entrypoint is not one-off: it runs every time a container starts, including restarts, scaling events, and pods rescheduled onto another node.

Three alternatives exist, each with a different answer to the question of ownership:

A Kubernetes init container runs before the main container starts, in the same pod. It could run the migration, exit, and let the main container start only after it succeeds. The migration is isolated from the application runtime. The downside: the init container is another image to build and maintain, and it runs on every pod start — so a 14-service platform starting simultaneously still has a potential race.

A Kubernetes Job runs once, on demand or triggered by a deployment pipeline. It can be made to run before any pods are updated — serial, isolated, with a clear success or failure signal. The race condition goes away. The complexity moves to the deployment process: the Job must complete before the Deployment rollout begins, and the CI pipeline must coordinate both.

A Helm hook is the same concept expressed declaratively in the Helm chart. A pre-upgrade hook runs the migration before the application pods are updated. It’s the most idiomatic Kubernetes answer. It also means the Helm chart is now responsible for running migrations — a decision that belongs to whoever owns the chart.

That last sentence is why the entrypoint hasn’t changed. Moving migrations out of the application means deciding that the deployment infrastructure — not the application itself — is responsible for the schema. It’s a governance question as much as a technical one, and governance questions take longer to resolve than code changes.
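
For comparison, the Job-based alternative moves the ordering into the pipeline. The following is a rough sketch of that coordination, not the platform's actual deployment script: the function name, the manifest files (`migrate-job.yaml`, `deployment.yaml`) and the resource names (`migrate`, `authentication`) are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical pipeline step: run the migration Job to completion, then roll out.
# All resource and file names are placeholders chosen for illustration.
deploy_with_migration() {
    job_name=$1          # e.g. migrate
    job_manifest=$2      # e.g. migrate-job.yaml
    app_manifest=$3      # e.g. deployment.yaml
    deployment=$4        # e.g. authentication

    # A finished Job is not re-run in place, so remove the previous one first.
    kubectl delete job "$job_name" --ignore-not-found || return 1
    kubectl apply -f "$job_manifest" || return 1

    # Block until the Job reports completion. A failed Job surfaces here
    # as a timeout, and the rollout is never started.
    if ! kubectl wait --for=condition=complete --timeout=300s "job/$job_name"; then
        echo "migration job did not complete; rollout skipped" >&2
        return 1
    fi

    kubectl apply -f "$app_manifest" || return 1
    kubectl rollout status "deployment/$deployment"
}

# Usage:
#   deploy_with_migration migrate migrate-job.yaml deployment.yaml authentication
```

The migration runs exactly once per deploy, and the application manifest is only applied after it succeeds. The cost is the one the text describes: the pipeline, not the application, now owns the sequencing.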

The honest end

The migration block in the entrypoint is two lines. Literally: the if [ "$( find ./migrations... )" ] guard, and the php bin/console doctrine:migrations:migrate that follows. Eleven other factors have clean resolutions. The cache moved to Redis. The logs go to stdout. The filesystem is an S3 bucket. The CI assembles production images from the same commit it tests. The secrets don’t travel in image layers.

Factor XII has an answer. It’s just not the final one.

The migrations run at startup, with a real database, with atomicity, with a bounded retry window. That’s better than running at build time against nothing. Whether they eventually move to a Job or a Helm hook is a conversation about who owns the schema — a question that a kubectl apply can’t answer.

Ready Is Not the Same as Started

Sun, 17 May 2026 10:00:00 +0000

The rolling deploy looked clean. A new pod started. Kubernetes saw the healthcheck pass — php -v returned zero — and began routing traffic to the new container.

For the next forty seconds — out of a possible sixty — that container was polling for the database.

Requests that landed on it during that window got errors. Not many — the window was short — but enough to show up as noise in the monitoring. The kind of noise that gets dismissed as a transient network issue and filed nowhere. The deploy succeeded. The pod eventually became ready. The mechanism that caused it was still there, waiting for the next deploy.

The entrypoint script does five things before FrankenPHP starts: copy a version file, verify the vendor directory, wait up to sixty seconds for the database, run pending migrations, install assets and set filesystem permissions. In Docker Compose, this is invisible. In Kubernetes, the gap becomes traffic.

The gap between started and ready

Kubernetes decides whether to send traffic to a pod by watching its readiness probe. A pod whose readiness probe passes receives requests. A pod whose readiness probe fails is removed from the load balancer rotation until it recovers. This is the mechanism that makes rolling deploys safe: Kubernetes doesn’t cut over to a new pod until that pod says it’s ready.

The compose.yaml defines a healthcheck on every service:

healthcheck:
    test: [ "CMD", "php", "-v" ]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 10s

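php -v succeeds the moment the PHP binary is present, which is true from the first millisecond of container life. The start_period: 10s does not postpone the checks: it is a grace period during which failures don’t count toward retries, and the first check runs one interval (30s) after the container starts. Since php -v cannot fail, that first check passes regardless. Meanwhile the entrypoint polling loop can run for up to sixty seconds before FrankenPHP even starts. The healthcheck reports healthy while the application may still be waiting for the database.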

The Dockerfile has a better signal:

HEALTHCHECK --start-period=60s CMD curl -f http://localhost:2019/metrics || exit 1

Port 2019 is Caddy’s built-in metrics server, embedded directly in FrankenPHP. The endpoint is Prometheus-compatible and only responds once Caddy’s HTTP stack is fully initialized and PHP workers are accepting connections. php -v exits in fifty milliseconds regardless of what the application is doing — it checks the binary, not the server. :2019/metrics only answers when the server is actually serving. It is also not an endpoint added just for the probe: every service in the platform already has it scraped by Prometheus, so the signal is live regardless of any healthcheck configuration.

That’s closer. But in Kubernetes, the HEALTHCHECK instruction is ignored entirely. Kubernetes uses its own probe configuration. Without explicit probe definitions in the Kubernetes manifests, there are no readiness checks — and a pod is considered ready the moment its container starts.

Which means: pod starts, entrypoint begins polling, Kubernetes routes traffic, application is not yet serving. Requests arrive at a container that isn’t ready to handle them.

Three signals, three questions

Kubernetes separates container lifecycle into three distinct questions, each with its own probe type:

startupProbe — “Has the application finished starting?” Fires repeatedly until it passes, then hands off to liveness. Prevents the liveness probe from killing a container that’s legitimately slow to initialize. For a container whose entrypoint can take sixty seconds, this is the right tool.

readinessProbe — “Is the application ready to handle requests?” Fails and passes throughout the container’s life. When it fails, the pod is removed from the load balancer. This is what makes a rolling deploy safe.

livenessProbe — “Is the application still alive?” If it fails, Kubernetes restarts the container. Meant to catch hung processes, not slow startups.

The sixty-second polling loop belongs in the startupProbe’s patience, not in application code:

startupProbe:
    httpGet:
        path: /metrics
        port: 2019
    failureThreshold: 12    # 12 attempts × 5s = 60s max
    periodSeconds: 5

Once the startupProbe passes, a readinessProbe on the same endpoint takes over, telling Kubernetes when the pod is safe to receive traffic, and a livenessProbe watches for hung processes. But the startupProbe is the one that absorbs the slow start. The entrypoint polling loop becomes redundant: its job was to keep the container alive while the database caught up. Without it, the application attempts to connect, fails, and the container exits. Kubernetes restarts the container, and the startupProbe resumes its retry cycle until the database responds and the application starts cleanly. The retry responsibility moves from inside the entrypoint to the orchestrator, which is exactly where it belongs.
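
The probe budget is plain arithmetic: the startup window is failureThreshold × periodSeconds. As a small sketch (the function name is an assumption for illustration), the threshold can be derived from the longest startup you are willing to tolerate, rounding up:

```shell
# failure_threshold MAX_STARTUP_SECONDS PERIOD_SECONDS
# Prints the smallest failureThreshold whose window covers MAX_STARTUP_SECONDS.
failure_threshold() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

failure_threshold 60 5   # prints 12, the value used in the startupProbe above
```

Rounding up matters when the window is not a multiple of the period: a 61-second budget with a 5-second period needs 13 attempts, not 12.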

The migration problem

The polling loop is the most visible issue, but the migrations create a subtler one.

With a rolling deploy and two replicas, Kubernetes starts a new pod while the old one still serves traffic. Both pods run the same entrypoint. Both reach doctrine:migrations:migrate.

Doctrine’s migration table tracks which migrations have already executed, so a completed migration won’t run twice. But if two pods start simultaneously and both see a pending migration, both attempt to run it at the same time. Whether that’s safe depends on the migration: additive schema changes are usually fine; destructive ones less so. And you don’t get to choose which ones run on a deploy that didn’t expect to coordinate. --all-or-nothing wraps migrations in a transaction and rolls back everything if one fails — it’s about atomicity within a single run, not coordination across processes.

The cleaner approach separates the two concerns into two init containers: one that waits for the database, one that runs migrations. The main container starts only after both complete:

initContainers:
    - name: wait-for-db
      image: authentication:latest
      command: ["php", "bin/console", "dbal:run-sql", "-q", "SELECT 1"]
    - name: migrate
      image: authentication:latest
      command: ["php", "bin/console", "doctrine:migrations:migrate", "--no-interaction", "--all-or-nothing"]

Both init containers reuse the application image. That’s not waste: they need the same PHP binary and the same environment wiring to reach the database and resolve the migration classes. A lighter purpose-built image would reduce startup overhead, but would require maintaining a separate PHP installation in sync with the main image.

Even with init containers, multiple pods starting simultaneously (initial deploy, after a node failure, or under autoscaling pressure) will each attempt to run migrations. Solving that properly, through a Helm pre-upgrade hook, a maxSurge: 0 strategy, or a separate migration Job, is a topic in itself. What matters here is that the entrypoint is the wrong place to host that decision: it can’t coordinate across pods, and it ties migration execution to application startup in a way that’s hard to untangle later. The question of which approach fits this codebase, and why the entrypoint hasn’t been replaced, gets its own treatment in the next article in this series.

Factor XII of the twelve-factor methodology — admin processes run in the same environment as the application — is satisfied either way. The question is whether “same environment” means “same entrypoint script” or “same image, separate process”. In Kubernetes, the latter is safer.

What the entrypoint’s real job is

Strip out the database wait (now a startupProbe or init container), the migrations (now an init container or Job), and the assets install (a build-time operation that belongs in the Dockerfile), and the entrypoint has one remaining job: start the application.

exec docker-php-entrypoint "$@"

Factor IX of the twelve-factor app asks for fast startup and graceful shutdown. A container whose startup takes sixty seconds because it’s waiting for external dependencies is not fast. It means rolling deploys are slow, recovery after a crash is slow, and horizontal scale-out creates a sixty-second gap before each new pod contributes.

Fast startup is not just a nice-to-have. It’s what makes the rest of the cloud model work. When a pod can start in seconds, the orchestrator can scale aggressively and recover quickly. When it takes a minute, you add headroom everywhere — longer probe timeouts, larger deployment windows, more conservative scaling policies — and the system becomes rigid.

The Docker Compose tax

The entrypoint accumulates these responsibilities for a reason. In Docker Compose, there is no init container concept. There is no startupProbe. Services declare depends_on, but without health conditions, that’s just startup ordering — not readiness. The entrypoint fills the gap.

This is not a design flaw. It’s a reasonable adaptation to the constraints of Docker Compose. The script works. It handles edge cases (the database timeout, unrecoverable errors, missing migrations directory). Someone tested it.

The issue is the assumption that the same script works equally well in Kubernetes. It runs. The application eventually starts. But it bypasses the probe system that makes Kubernetes deployments reliable, and it puts migration responsibility in a place where coordination across pods is difficult to reason about.

Several of the changes in this series (media storage, secrets in image layers, log handlers, service dependencies, CI environment parity, cache adapters) were changes to application code or configuration. This one is different. It requires the infrastructure to gain awareness of what “ready” means for this application, and it requires the entrypoint to give up responsibilities it currently owns.

That’s a harder conversation. But the startupProbe is waiting for it.

Fifteen Minutes Before the First Test

Sat, 16 May 2026 10:00:00 +0000

The pipeline had two stages that had nothing to do with code: provision and deprovision. Between them, in sequence, came phpunit, phpmetrics, and behat.

stages:
  - build
  - provision
  - phpunit
  - phpmetrics
  - behat
  - deprovision
  - deploy

Before the first assertion ran, fifteen minutes had passed. Terraform had cloned an infrastructure repository, authenticated to Azure, and applied a VM configuration. Ansible had connected to the new VM, installed PHP, configured the application, wired up a database and a Redis instance. Then the tests ran. Then Terraform destroyed what Ansible had built.

For every pipeline. From every branch. For every pull request, from open to merge.

What those fifteen minutes were missing

The provision stage set up two services: PostgreSQL and Redis. Three services that the application depended on in production were absent: RabbitMQ, MinIO, and Varnish.

RabbitMQ processed all asynchronous work — 56 consumers across 14 microservices. MinIO handled media storage. Varnish fronted the HTTP cache. In CI, none of them existed. Tests that exercised message queuing or file storage had two options: skip these paths, or leave them untested until staging. Varnish is a different case: tests hit the application directly and intentionally bypass the cache layer, so its absence in CI is a deliberate choice rather than a gap.

This is the problem Factor X describes as the environment gap. The gap here wasn’t a matter of configuration — it was structural. The VM was built by Ansible from a script in a separate repository. It wasn’t a container image. It wasn’t versioned alongside the application. If a branch modified the RabbitMQ message topology, there was no way to test that modification in CI. The topology change and the code that relied on it would only meet in staging.

The provisioning job itself, which chains Terraform and Ansible, is part of the problem:

launch_vm:
  stage: provision
  script:
    - git clone git@gitlab.internal:infra/ci-vm.git
    - cd ci-vm
    - az login --service-principal -u $ARM_CLIENT_ID ...
    - terraform apply -var "prefix=${CI_PIPELINE_ID}-vm" ...
    - sleep 45
    - ansible-playbook behat/test-env.yml ...

The sleep 45 is there because Ansible needs the VM to finish booting before it can connect. It’s not an oversight — it’s the minimum time a freshly provisioned VM needs before SSH works. It’s baked into the process.

What replaced it

The new pipeline has no provision stage. It has no deprovision stage. The environment is the images, and the images exist before the tests begin.

Each test job declares its dependencies as Docker services:

services:
  - name: $REGISTRY_URL/platform/rabbitmq:$CI_COMMIT_REF_SLUG
    alias: rabbitmq
  - name: $REGISTRY_URL/platform/minio:$CI_COMMIT_REF_SLUG
    alias: minio
  - name: redis:7.4.1
    alias: redis
  - name: $ARTIFACTORY_URL/postgresql:13
    alias: postgresql

The services start in parallel when the job begins. Before the test script runs, a before_script waits for all of them to be ready:

before_script:
  - $CI_PROJECT_DIR/dockerize
      -wait tcp://postgresql:5432
      -wait tcp://rabbitmq:5672
      -wait tcp://minio:9000
      -wait tcp://redis:6379
      -timeout 120s

From pipeline start to first assertion: ninety seconds — assuming images are already cached on the runner; a cold pull adds time, but becomes negligible once the pipeline has run once on a given branch.
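
For readers unfamiliar with dockerize, the wait step amounts to polling each TCP endpoint until all of them answer or a shared deadline expires. The following is a rough shell equivalent, assuming a netcat (`nc`) that supports `-z` is available on the runner. The function names are illustrative, and this is not a description of how dockerize is implemented internally (it checks endpoints one after the other, which is enough to show the idea).

```shell
#!/bin/sh
# Probe one endpoint. Assumes a netcat that supports -z (connect, send nothing).
tcp_open() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# wait_for_tcp TIMEOUT_SECONDS tcp://host:port [tcp://host:port ...]
# Returns 0 once every endpoint accepts connections,
# 1 if the shared deadline passes first.
wait_for_tcp() {
    wait_deadline=$(( $(date +%s) + $1 ))
    shift
    for wait_url in "$@"; do
        wait_target=${wait_url#tcp://}      # postgresql:5432
        wait_host=${wait_target%:*}         # postgresql
        wait_port=${wait_target##*:}        # 5432
        until tcp_open "$wait_host" "$wait_port"; do
            if [ "$(date +%s)" -ge "$wait_deadline" ]; then
                echo "timeout waiting for $wait_url" >&2
                return 1
            fi
            sleep 1
        done
    done
    return 0
}

# Usage mirroring the before_script above:
#   wait_for_tcp 120 tcp://postgresql:5432 tcp://rabbitmq:5672 \
#       tcp://minio:9000 tcp://redis:6379
```

An open TCP port is a weaker signal than an application-level check: a service can accept connections before it is ready to answer queries.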

What `$CI_COMMIT_REF_SLUG` means

The timing is the visible result. What produces it is more interesting: the image names.

$REGISTRY_URL/platform/rabbitmq:$CI_COMMIT_REF_SLUG is not the official RabbitMQ image from Docker Hub. It’s an image built by the same pipeline, from the same branch, at the same commit as the code being tested. The RabbitMQ image carries the topology: a definitions.json with every exchange, every queue, every binding, every dead-letter configuration — versioned in git alongside the application that depends on them.

If a branch modifies the messaging topology, the CI pipeline builds a new RabbitMQ image that includes those modifications, then runs the tests against it. The topology change and the code that relies on it are tested together, at the same commit, before anything reaches staging.

The same logic applies to MinIO, as described in the first article in this series: the MinIO image carries preloaded test fixtures. The CI environment doesn’t need a setup step to populate storage. The state is built in.

The test runner itself follows the same pattern. Each job uses a debug variant of the application image — built from the same branch, same commit — with the test dependencies included:

image: $REGISTRY_URL/platform/$service:$CI_COMMIT_REF_SLUG-debug

The whole environment assembles from artifacts built at the same point in the git history.
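
Outside CI, for instance when pulling the same images onto a workstation, the tag has to be derived by hand. The sketch below mimics GitLab's documented slug rules: lowercase, anything outside a-z and 0-9 replaced by a hyphen, truncated to 63 characters, leading and trailing hyphens removed. The function name and the sample branch are illustrative, not taken from the actual pipeline.

```shell
# ref_slug REF_NAME
# Approximates how GitLab derives CI_COMMIT_REF_SLUG from a branch or tag name.
ref_slug() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/[^a-z0-9]/-/g' \
        | cut -c1-63 \
        | sed -e 's/^-*//' -e 's/-*$//'
}

branch="feature/RabbitMQ_topology"
echo "platform/rabbitmq:$(ref_slug "$branch")"
# prints platform/rabbitmq:feature-rabbitmq-topology
```

Because the slug is a pure function of the branch name, every image built from a branch carries the same predictable tag, which is what lets the test job reference them without any lookup.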

What this required dropping

Behat and the provisioned VM were coupled. The Behat test suite ran against an HTTP server on the VM; removing the VM meant removing Behat.

That turned out not to be the obstacle it looked like. The Behat suite lived in a separate repository, required the VM to run, and had accumulated significant maintenance overhead. PHPUnit, running inside the application container with Docker services, covered the same scenarios through a more direct path: functional tests exercising the HTTP layer, unit tests for individual components, suites organized per feature area and generated dynamically into parallel CI jobs.

The BDD layer went away. The test coverage stayed — and could now run against the actual services.

Factor X, applied

Factor X is often read as “use the same database locally as in production.” That’s the simplest version. The deeper version is about the gap between what you test and what you ship.

The gap in the old pipeline was wide: a manually configured VM, missing key services, rebuilt from scratch on every run. The gap in the new pipeline is narrow: the CI assembles the environment from the same images as production, built from the same commit as the code under test.

The fifteen minutes of Terraform and Ansible were not just slow. They were building something that wasn’t what production ran, every time, before any test could begin. The ninety seconds of docker pull build exactly what production runs — and the tests that follow are testing that, not an approximation of it.

What Survives the Build

Thu, 14 May 2026 15:00:00 +0000

At some point during a cloud migration audit, someone ran this:

docker run --rm <image> php -r "var_dump(require '.env.local.php');"

The output showed everything that composer dump-env prod had compiled into the image at build time. Which meant it showed everything that had been in the .env file when the image was built. Which meant it showed these, among others:

INFLUXDB_INIT_ADMIN_TOKEN=
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=admin123
BLACKFIRE_CLIENT_ID=
BLACKFIRE_CLIENT_TOKEN=
BLACKFIRE_SERVER_ID=
BLACKFIRE_SERVER_TOKEN=
NGROK_AUTHTOKEN=replace-me-optionnal

Twenty-five variables in total. Every credential that had accumulated in the root .env over three years, now permanent in an image layer.

How `dump-env` works

composer dump-env prod is a legitimate Symfony optimization. Instead of parsing .env files on every request, the runtime loads a pre-compiled PHP array from .env.local.php. Faster and simpler.

The problem is what it reads. The Dockerfile copies the repository into the image with COPY . ./, .env included. Then dump-env prod reads that file and compiles every variable into .env.local.php. The image ships with a frozen snapshot of the credentials that were in .env at build time.

Docker layers are immutable archives. Even if a subsequent step removed .env from the container filesystem, the layer containing it would still exist inside the image. docker save produces a tarball of every layer; extracting any file from any point in the build history is straightforward. The credentials are invisible at runtime. They are not gone.

Factor V calls this out directly: a build artifact should be environment-agnostic, with config arriving at the release step from outside. Once credentials are compiled in, the image is no longer portable. You can’t promote it across environments. You build twice and hope the second build behaves like the first.

How twenty-five variables accumulate

Before tracing how this gets fixed, it’s worth understanding how it happened.

The BLACKFIRE_* tokens are the easy case to understand. A team member sets up profiling, needs to share the configuration, and the repository is already open to everyone. One line in .env is the path of least resistance. The InfluxDB and Grafana credentials follow the same logic — shared tooling, shared repo, one commit.

Then there are the variables that reveal a different kind of drift. In some of the service-level .env files:

APP__RATINGS__SERIALS='{"brand1":{"fr":"12345"},...}'  # ~40 lines of JSON
APP__YOUTUBE__CREDENTIALS='{"brand1":{"client_id":"xxx","refresh_token":"yyy"},...}'

Audience measurement serial numbers. YouTube API refresh tokens per brand. These aren’t secrets in the Blackfire sense. They’re business data — the kind of values that vary between brands and environments, that someone decided to version in .env because they behaved like configuration and .env was where configuration lived.

Twenty-five variables is the sum of incremental decisions, none of which felt wrong in isolation. The problem is structural: when .env is the only answer available, everything starts looking like it belongs there.

Where things actually belong

Emptying the file required answering one question for each variable: where does this actually belong?

The answers revealed three categories that the team had never explicitly named:

Static config lives in code. Business rules, routing logic, Symfony parameter files — anything that doesn’t vary between deployments. A change requires a rebuild. The JSON blobs for audience measurement serials turned out not to be static config at all: they were queried from a dedicated Config service at runtime. They had no business being in a file.

Environment config varies between deployments: hostnames, connection strings, third-party credentials. This is what Factor III means by “config in environment variables” — real OS-level variables injected by the runtime, never files that travel with the code. In Kubernetes, this becomes a ConfigMap for non-sensitive values and a Kubernetes Secret for credentials. The choice for secrets management was SOPS — credentials are encrypted and committed to git, rather than stored in an external vault like Azure Key Vault or HashiCorp Vault. A vault trades simplicity for auditability: automatic rotation, centralized audit logs, workload identity-based access with no key to protect. SOPS trades those capabilities for a simpler operational model — no external service to query at deploy time, secrets travel through the normal code review process, git history serves as the audit trail. The accepted downsides are manual rotation and the responsibility of protecting the decryption key itself. For the team’s scale, the tradeoff was deliberate.

Dynamic config changes without a deployment: editorial parameters, per-brand thresholds, content moderation settings. It belongs in a database, managed through the application’s Config service. Some of what had accumulated in .env files was this category all along, passing as static defaults because it changed rarely enough that nobody noticed.

Once the categories had names, the variables sorted themselves. The root .env ended at four lines:

DOMAIN=platform.127.0.0.1.sslip.io
XDEBUG_MODE=off
SERVER_NAME=:80
APP_ENV=dev

Safe defaults. Nothing sensitive. dump-env prod now compiles empty strings; real values arrive at runtime from Kubernetes.

The PostgreSQL image

The PostgreSQL image used in CI has a hardcoded password:

FROM postgres:15
ENV POSTGRES_PASSWORD=admin123

This looks like the same problem. It isn’t, because the threat model is different. The CI database is ephemeral — it exists for the duration of a pipeline run, contains no real data, and runs in an isolated network. A hardcoded password on a throwaway test database is an acceptable risk, not a policy exception.

In production, the question doesn’t arise: the platform uses Azure Database for PostgreSQL Flexible Server, a managed service. There is no Docker image. Credentials arrive via Helm chart injection, never touching a layer.

What survives the build now

The image that ships to production now contains a guarantee: var_dump(require '.env.local.php') returns only empty strings and safe defaults. The credentials aren’t there because they were never put there — they arrive at runtime, from outside.

That’s the responsibility boundary dump-env had been quietly erasing: the image is the application, the runtime is the environment. They should not know each other’s secrets.
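
One way to keep that boundary from eroding again is to check the .env file before the image is built. This is a minimal sketch, assuming that variable names containing TOKEN, PASSWORD, SECRET or CREDENTIALS identify credentials. Both that heuristic and the function name are illustrative choices, not something the team is known to use.

```shell
#!/bin/sh
# list_suspect_vars FILE
# Prints the name of every variable whose name suggests a credential
# and whose value is not empty. The name patterns are an illustrative guess.
list_suspect_vars() {
    # The "|| [ -n ... ]" keeps a last line that lacks a trailing newline.
    while IFS='=' read -r var_name var_value || [ -n "$var_name" ]; do
        case "$var_name" in
            '#'*)
                continue    # skip commented-out lines
                ;;
            *TOKEN*|*PASSWORD*|*SECRET*|*CREDENTIALS*)
                if [ -n "$var_value" ]; then
                    printf '%s\n' "$var_name"
                fi
                ;;
        esac
    done < "$1"
}

# Possible use in CI, before `composer dump-env prod`:
#   [ -z "$(list_suspect_vars .env)" ] || { echo "credentials in .env" >&2; exit 1; }
```

Run against the variables listed earlier, it would flag GF_SECURITY_ADMIN_PASSWORD and NGROK_AUTHTOKEN and ignore the empty Blackfire entries. It is a tripwire, not a scanner: a quoted empty value such as '' counts as non-empty, and a secret stored under an innocuous name passes unnoticed.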

Building a self-hosted homelab with Docker Compose and Traefik

Tue, 17 Feb 2026 00:00:00 +0000

For years I wanted a homelab. A place of my own to host development tools, monitor my machines, run home automation, and experiment without risking breaking anything important. The idea is simple. Getting it running, a bit less so.

Back then, Kubernetes didn’t exist yet. Options for running multiple services on a single machine came down to bash scripting, hand-written Nginx configs, and a lot of coffee. Tutorials on “homelab for humans” were nowhere to be found.

This tutorial is what I wish I had found back then. It’s been running for several years now. Not without evolving: services added, others dropped, choices revisited. But the foundation is there, stable — and that’s what success looks like in self-hosting.

The setup: ten self-hosted web services on a local machine, accessible from a browser via readable URLs, without touching DNS configuration, without renting a VPS, without managing TLS certificates. The ingredient that makes it possible: sslip.io, a public DNS service that encodes the IP directly in the domain name. service.192.168.1.10.sslip.io resolves to 192.168.1.10, with zero configuration, from any machine on the local network.

This tutorial is aimed at someone who knows Docker but is starting from scratch on self-hosted service orchestration.

Philosophy and architecture choices
The building blocks
Step-by-step setup
Adding a new service
Patterns and conventions
Common pitfalls
Conclusion
References

1. Philosophy and architecture choices

Goal

Run multiple web services on a local machine, accessible from a browser via readable URLs, without touching DNS configuration, without renting a VPS, without managing TLS certificates.

Why Docker Compose and not something else?

Docker Compose is the right level of complexity for a personal homelab. Kubernetes is too heavy for a single machine. Docker Swarm is in decline. Compose is simple, readable, versionable, and sufficient for dozens of services.

Why Traefik and not Nginx Proxy Manager?

Nginx Proxy Manager (NPM) is a graphical interface for configuring Nginx as a reverse proxy. Routes are stored in a database and configured through a UI.

Traefik automatically reads Docker container labels and generates its configuration on the fly. When a container starts with the right labels, Traefik discovers it and creates the route immediately, without restarting, without opening any UI.

This “configuration as code” approach has two major advantages:

A service’s configuration lives in its compose.yaml, in the same place as everything else.
Adding a service requires no changes to Traefik.

Why Dockge and not Portainer?

Portainer is a full Docker management tool: images, volumes, networks, individual containers… powerful but complex.

Dockge is focused on a single thing: managing Docker Compose stacks. Its UI is minimal and intuitive. For a homelab where everything is managed through Compose, it’s sufficient and much more pleasant to use.

Why sslip.io?

Web services need a hostname (e.g. dozzle.myserver.local) for Traefik to route correctly. The usual options:

Edit /etc/hosts on every machine: tedious, not shareable.
Set up a local DNS server (Pi-hole, AdGuard): requires additional infrastructure.
Buy a domain and configure DNS: costs money and time.

sslip.io is a public DNS service that automatically resolves <name>.<IP>.sslip.io to <IP>. Example: dozzle.192.168.1.10.sslip.io resolves to 192.168.1.10. Nothing to configure: the DNS works everywhere without touching anything.
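To make the encoding concrete, here is a minimal Python sketch that extracts the IPv4 address embedded in such a hostname. It only mirrors the naming scheme locally; it is not sslip.io's own code, the function name embedded_ip is made up for the illustration, and IPv6 is left out.

```python
import ipaddress
from typing import Optional


def embedded_ip(hostname: str, suffix: str = "sslip.io") -> Optional[str]:
    """Return the IPv4 address embedded in a sslip.io-style hostname, or None."""
    host = hostname.rstrip(".").lower()
    if not host.endswith("." + suffix):
        return None
    labels = host[: -len(suffix) - 1].split(".")
    candidates = []
    # Dotted form: the last four labels are the octets (dozzle.192.168.1.10).
    if len(labels) >= 4:
        candidates.append(".".join(labels[-4:]))
    # Dashed form: the last label carries the whole address (dozzle.192-168-1-10).
    candidates.append(labels[-1].replace("-", "."))
    for candidate in candidates:
        try:
            return str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue
    return None
```

Calling embedded_ip("dozzle.192.168.1.10.sslip.io") gives back the address the browser will end up talking to, which is handy when checking that a stack's .env points at the right machine.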

2. The building blocks

The shared Docker network

All services and Traefik must share the same Docker network so Traefik can communicate with them. This network is called traefik and is created once:

docker network create traefik

It is an external network (created outside any Compose file). Each compose.yaml declares it as external:

networks:
    traefik:
        external: true

Why external rather than internal to a Compose file? Because multiple independent stacks all need to connect to it. A network internal to a Compose file is only accessible to services within that file.

Traefik: the reverse proxy

Traefik listens on port 80 and routes HTTP requests to the right container based on the Host header.

Its main configuration lives in stacks/traefik/docker/traefik/traefik.yaml:

api:
    dashboard: true
    insecure: true

entryPoints:
    web:
        address: :80
    ping:
        address: :8082

providers:
    docker:
        endpoint: unix:///var/run/docker.sock
        exposedByDefault: false

log:
    level: INFO

global:
    sendAnonymousUsage: false

exposedByDefault: false is important: Traefik ignores all containers by default. A container must explicitly opt in with the label traefik.enable: true. This prevents accidentally exposing services.

The ping entrypoint on port 8082 is dedicated to health checks. Separating it from the web entrypoint prevents health check requests from appearing in access logs.

To access the Docker daemon, Traefik mounts the socket:

volumes:
    - /var/run/docker.sock:/var/run/docker.sock

Dockge: the stack manager

Dockge runs inside a container itself (the compose.yaml at the root of the repo). It needs two things:

Access to the Docker socket to manage the other containers.
Access to the stack directories to read and edit compose.yaml files.

The critical point is the stack mount. Dockge launches stacks by passing absolute paths to the Docker daemon. These paths must be identical inside the Dockge container and on the host. The solution:

volumes:
    - ${PWD}/stacks:${PWD}/stacks
environment:
    DOCKGE_STACKS_DIR: ${PWD}/stacks

${PWD} is a shell variable resolved at docker compose up time. It equals the current directory. If Dockge is launched from /home/user/homelab, the stacks folder will be mounted at /home/user/homelab/stacks on both sides. This is the only way to prevent Docker from creating ghost directories in the wrong place.

Practical consequence: always run docker compose up -d from the root of the repo.
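A small preflight check can catch the wrong-directory mistake before Docker does. The sketch below is an illustration in Python, not part of the repository: the helper names are hypothetical, and it assumes the repo root is recognizable by a compose.yaml file sitting next to a stacks/ folder.

```python
from pathlib import Path


def is_repo_root(directory: Path) -> bool:
    """True when `directory` holds Dockge's compose.yaml and the stacks/ folder."""
    return (directory / "compose.yaml").is_file() and (directory / "stacks").is_dir()


def stacks_mount(directory: Path) -> str:
    """Volume entry that ${PWD}/stacks:${PWD}/stacks expands to from `directory`."""
    stacks = directory.resolve() / "stacks"
    # Same absolute path on the host side and on the container side.
    return f"{stacks}:{stacks}"
```

Running is_repo_root(Path.cwd()) before docker compose up -d is enough to refuse a launch from the wrong place.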

Dockge’s persistent data (configuration, history) lives in a named volume created in advance:

docker volume create homelab_dockge_data

Declared as external in compose.yaml, a volume created in advance survives docker compose down -v. A volume that Compose creates itself, named or anonymous, is removed by that command.

3. Step-by-step setup

Step 1: clone and configure

git clone <repository-url> homelab
cd homelab

Find the machine’s local IP:

hostname -I | awk '{print $1}'
# e.g.: 192.168.1.10

Create and edit the root .env:

cp .env.example .env
# Edit .env:
# IP=192.168.1.10
# DOMAIN=sslip.io
# COMPOSE_PROJECT_NAME=dockge  ← important, see conventions section

Step 2: Docker prerequisites

docker network create traefik
docker volume create homelab_dockge_data

Step 3: start Dockge

echo "STACKS_DIR=$(pwd)/stacks" >> .env
docker compose up -d

Dockge is accessible at http://<IP>:5001. It is exposed directly on port 5001, not through Traefik (Traefik is not running yet at this point). Create an admin account on first launch.

Step 4: configure the stacks

For each directory in stacks/, copy the .env.example:

for stack in stacks/*/; do
    cp "${stack}.env.example" "${stack}.env"
done

Then edit each .env to set IP and DOMAIN to the same values as in step 1. The COMPOSE_PROJECT_NAME value is pre-filled with the folder name — do not change it (see conventions section).

For filebrowser, also set FILEBROWSER_ROOT to the local path to expose.
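The manual edit in this step can be scripted. Here is a possible Python sketch that rewrites IP and DOMAIN in the text of an .env.example while leaving every other line, including COMPOSE_PROJECT_NAME, untouched. The function name render_env is hypothetical and the repository does not ship such a helper.

```python
def render_env(example: str, ip: str, domain: str) -> str:
    """Return the .env text with IP and DOMAIN overridden, other lines untouched."""
    overrides = {"IP": ip, "DOMAIN": domain}
    lines = []
    for line in example.splitlines():
        key, separator, _ = line.partition("=")
        if separator and key.strip() in overrides:
            # Replace only the value; the key keeps its name.
            lines.append(f"{key.strip()}={overrides[key.strip()]}")
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"
```

Feed it the content of each stacks/*/.env.example and write the result to the matching .env.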

Step 5: start the stacks from Dockge

From the Dockge interface (http://<IP>:5001), in this order:

1. Traefik first

Traefik must be running before the other services. Without Traefik, routes don’t exist and services are unreachable via their URL.

After starting, verify Traefik is healthy:

docker ps --filter name=traefik

2. The other stacks in any order

Each stack automatically registers itself with Traefik via its Docker labels. Traefik discovers new containers in real time.

3. Homepage last

Homepage reads Docker labels from all running containers at startup to build the dashboard. Starting it last ensures it discovers all active services from the first launch.

4. Adding a new service

Here is the compose.yaml template for any new service:

services:
    myservice:
        image: vendor/myservice:latest
        restart: unless-stopped
        healthcheck:
            test: ["CMD-SHELL", "wget -qO- http://127.0.0.1:<PORT>/ || exit 1"]
            interval: 30s
            timeout: 10s
            retries: 3
            start_period: 10s
        labels:
            # Homepage - auto-discovery in dashboard
            homepage.group: tools
            homepage.name: My Service
            homepage.icon: https://cdn.jsdelivr.net/gh/selfhst/icons/webp/myservice.webp
            homepage.href: http://${COMPOSE_PROJECT_NAME}.${IP}.${DOMAIN}

            # Traefik - HTTP routing
            traefik.enable: true
            traefik.http.routers.myservice.entrypoints: web
            traefik.http.routers.myservice.rule: Host(`${COMPOSE_PROJECT_NAME}.${IP}.${DOMAIN}`)
            traefik.http.services.myservice.loadbalancer.server.port: <PORT>
        networks:
            - traefik

networks:
    traefik:
        external: true

And the associated .env.example:

COMPOSE_PROJECT_NAME=myservice
IP=127.0.0.1
DOMAIN=sslip.io

The folder name determines the subdomain. If the folder is called myservice, the service will be accessible at http://myservice.<IP>.<DOMAIN>. That’s it.
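To preview the URL a stack will get, the same ${VAR} substitution can be reproduced with Python's string.Template. This is only a sketch of what Compose interpolation produces for this one template; parse_env and service_url are hypothetical helpers, and the parser ignores quoting and other .env subtleties.

```python
from string import Template

# Same pattern as the Host(...) rule and homepage.href in the template.
HOST_TEMPLATE = Template("${COMPOSE_PROJECT_NAME}.${IP}.${DOMAIN}")


def parse_env(text: str) -> dict:
    """Read KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def service_url(env: dict) -> str:
    """URL the service answers on; raises KeyError if a variable is missing."""
    return "http://" + HOST_TEMPLATE.substitute(env)
```

A missing IP or DOMAIN raises a KeyError here, which is arguably friendlier than a router rule with an empty segment.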

To find services worth adding, selfh.st is an excellent resource: it’s a catalog of self-hosted software organized by category (media, security, productivity, monitoring…), with a description, screenshot, and GitHub link for each. The site also publishes a weekly newsletter on new releases.

Checklist for a new service

Create stacks/<name>/compose.yaml
Create stacks/<name>/.env.example with COMPOSE_PROJECT_NAME=<name>
Copy .env.example to .env and fill in IP/DOMAIN
Check the port in the Traefik labels
Choose the Homepage group: infra, monitoring, tools
Find the icon on selfhst/icons
Add persistent data in a volume if needed
Start from Dockge and verify the container is healthy

5. Patterns and conventions

The `${COMPOSE_PROJECT_NAME}` variable

Docker Compose automatically sets COMPOSE_PROJECT_NAME to the stack folder name. We use it to build URLs dynamically:

traefik.http.routers.dozzle.rule: Host(`${COMPOSE_PROJECT_NAME}.${IP}.${DOMAIN}`)
homepage.href: http://${COMPOSE_PROJECT_NAME}.${IP}.${DOMAIN}

Advantage: no *_HOST variable to maintain in each .env. Renaming the folder automatically changes the subdomain.

Warning: in the .env, COMPOSE_PROJECT_NAME must be defined explicitly with the stack folder name. Without it, Docker Compose uses the current directory name at launch time, which can produce unexpected values depending on where the command is run from.
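This rule is easy to verify mechanically. The sketch below, in Python, lists the stacks whose .env does not declare a COMPOSE_PROJECT_NAME equal to the folder name. The function name is hypothetical and the layout (one folder per stack under stacks/, each with its own .env) is the one described in this tutorial.

```python
from pathlib import Path


def mismatched_stacks(stacks_dir: Path) -> list:
    """Return (folder, declared_name) pairs where the .env disagrees with the folder."""
    problems = []
    for stack in sorted(p for p in stacks_dir.iterdir() if p.is_dir()):
        env_file = stack / ".env"
        if not env_file.is_file():
            continue
        declared = None
        for line in env_file.read_text().splitlines():
            key, separator, value = line.partition("=")
            if separator and key.strip() == "COMPOSE_PROJECT_NAME":
                declared = value.strip()
        # None means the variable is missing altogether.
        if declared != stack.name:
            problems.append((stack.name, declared))
    return problems
```

An empty list means every stack will get the subdomain its folder name suggests.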

Homepage groups

Services are organized into three groups in the dashboard:

Group	Services
`infra`	Traefik, Dockge, Watchtower, Homepage
`monitoring`	Dozzle, Glances, Uptime Kuma
`tools`	FileBrowser, IT-Tools, Stirling PDF

This grouping is specific to this homelab, not an enforced convention. Homepage accepts any value for homepage.group: you can create as many groups as needed and name them however you like (media, home-automation, dev…). The dashboard reorganizes automatically.

Health checks

All services have a health check. This is crucial because Traefik silently ignores unhealthy containers: a service with a failing health check will not appear in routing, even with traefik.enable: true.

Three edge cases encountered in practice:

1. localhost does not always resolve to 127.0.0.1

In some minimal images, localhost is not resolved. Use 127.0.0.1 explicitly:

test: ["CMD-SHELL", "wget -qO- http://127.0.0.1:8080/ || exit 1"]

2. Images without a shell (scratch-based)

Images based on scratch (e.g. Dozzle) do not contain /bin/sh. CMD-SHELL fails. Use the embedded binary:

test: ["CMD", "/dozzle", "healthcheck"]

3. Images without wget or curl

Some Node.js or JVM images ship without wget, and often without curl as well. Possible solutions:

If Node.js is available: node -e "require('http').get('http://127.0.0.1:PORT', r => process.exit(r.statusCode < 400 ? 0 : 1)).on('error', () => process.exit(1))"
If curl is available: curl -fs http://127.0.0.1:PORT/
If the app binary exposes a healthcheck subcommand: use it directly.
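The same idea works for images that ship a Python interpreter, which is an assumption to check image by image. A minimal sketch using only the standard library, with hypothetical function names:

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """True when the URL answers with a status below 400 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and 4xx/5xx answers (HTTPError).
        return False


def main(argv: list) -> int:
    """Exit code for a health check: 0 when healthy, 1 otherwise."""
    if len(argv) != 2:
        return 1
    return 0 if is_healthy(argv[1]) else 1
```

Saved as a script ending with sys.exit(main(sys.argv)) and mounted into the container, it can be called from the healthcheck test with the 127.0.0.1 URL of the service as its only argument.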

Data persistence

For services that have data (configuration, user accounts, database):

volumes:
    - ./docker/data:/path/in/container

The ./docker/ folder lives inside the stack directory and can be versioned, except for runtime data which goes in .gitignore.

Rule: add stacks/<name>/docker/ to .gitignore if the folder contains data that should not be committed (SQLite databases, uploads…).

Traefik label conventions

By convention, the name used in Traefik labels (traefik.http.routers.<name>) matches the Docker service name in compose.yaml. In practice, align it with the folder name:

stacks/it-tools/    →    service: ittools    →    traefik.http.routers.ittools.*

This is not a technical constraint from Traefik, just a readability convention.

6. Common pitfalls

Dockge: Stop then Start, not Restart

When a compose.yaml is modified from an IDE and the changes need to be applied, use Stop + Start from Dockge, not “Restart”. Restart restarts the existing container without re-reading the compose.yaml. Stop + Start recreates the container with the new configuration.

Modified labels: restart Homepage

Homepage reads Docker labels at startup. If homepage.group or homepage.name is changed for a service, Homepage won’t see it until it is restarted.

Container starts but is not routable

Check in order:

docker ps: is the container healthy? Traefik ignores unhealthy containers.
Is the container on the traefik network?

docker inspect <container> --format '{{json .NetworkSettings.Networks}}'

Is the label traefik.enable: true present?
Does the Host(...) rule match the URL being tested?
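The four checks can be run in one pass against the JSON that docker inspect prints. The Python sketch below works on one already-parsed container object; State.Health.Status, NetworkSettings.Networks and Config.Labels are fields of that output, while the function name and the wording of the messages are mine.

```python
def routing_problems(inspect: dict, expected_host: str, network: str = "traefik") -> list:
    """List the reasons why Traefik may not route to this container."""
    problems = []
    health = ((inspect.get("State") or {}).get("Health") or {}).get("Status")
    # No Health section means the container has no health check at all.
    if health is not None and health != "healthy":
        problems.append(f"health status is {health!r}")
    networks = (inspect.get("NetworkSettings") or {}).get("Networks") or {}
    if network not in networks:
        problems.append(f"not attached to network {network!r}")
    labels = (inspect.get("Config") or {}).get("Labels") or {}
    if labels.get("traefik.enable", "").lower() != "true":
        problems.append("label traefik.enable is not true")
    rules = [
        value
        for key, value in labels.items()
        if key.startswith("traefik.http.routers.") and key.endswith(".rule")
    ]
    if not any(f"`{expected_host}`" in rule for rule in rules):
        problems.append(f"no router rule matches host {expected_host!r}")
    return problems
```

docker inspect prints a JSON array, so pass its first element after json.loads. Labels appear there with variables already expanded, which makes a wrong IP or DOMAIN visible immediately.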

Mounting non-existent files under Docker Desktop / WSL

When Docker Desktop (WSL) mounts a file that does not yet exist on the host, it creates a directory instead. This ghost directory then blocks the mount of the actual file. Symptom: the container fails to start with a mount error.

Solution: ensure the file exists on the host before starting the container, or use a directory mount instead of a file mount.

Watchtower: Docker API too old

On some configurations, Watchtower tries to communicate with the daemon starting the negotiation at API v1.25 (its historical minimum). Recent versions of Docker reject this version. Symptom: the container restarts in a loop with client version 1.25 is too old. Minimum supported API version is 1.40.

Fix in the Watchtower compose.yaml:

environment:
    DOCKER_API_VERSION: "1.40"

1.40 is the value to use here because it is the minimum the daemon accepts, as stated in the error message. It is not your daemon's exact API version. If your error message quotes a different minimum, use that value instead. To check the actual API version of your daemon:

docker version --format '{{.Server.APIVersion}}'
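The reasoning behind the fix is a range check: the client's version has to fall between the daemon's minimum and its current API version. A tiny Python sketch of that comparison, with hypothetical helper names; versions are compared as integer tuples because comparing them as strings or floats gives wrong answers (1.9 versus 1.40, for instance).

```python
def api_tuple(version: str) -> tuple:
    """Turn '1.40' into (1, 40) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def client_accepted(client: str, daemon_minimum: str, daemon_current: str) -> bool:
    """True when the client's API version falls inside the daemon's supported range."""
    return api_tuple(daemon_minimum) <= api_tuple(client) <= api_tuple(daemon_current)
```

With a minimum of 1.40, a client announcing 1.25 is rejected and one announcing 1.40 is accepted, whatever the daemon's current version is above that.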

`${PWD}` in Dockge’s compose file

${PWD} is not a .env variable — it is a shell variable resolved at docker compose up time. It equals the current terminal directory. Running docker compose up -d from any other directory will produce a wrong value and break stack volume mounts.

This homelab is designed to run on a Linux machine or WSL. All commands have been tested on Ubuntu/WSL2 with Docker Desktop.

Conclusion

I’m well aware this tutorial doesn’t cover everything. We could have added authentication in front of each service, run the whole thing over HTTPS, set up a socket proxy to limit the Docker daemon’s exposure, or pinned precise image versions. But each of those points would have considerably lengthened the article and the complexity of the setup. The goal was to start with something functional and maintainable, not to build a fortress on day one.

The perfect homelab doesn’t exist. The one that runs, does.

guillaumedelre/homelab

Docker Compose homelab with Traefik — independent stacks, auto-configured dashboard, and zero DNS configuration using sslip.io.

References

Project	Link
sslip.io	sslip.io
selfh.st	selfh.st
Traefik	github.com/traefik/traefik
Dockge	github.com/louislam/dockge
Homepage	github.com/gethomepage/homepage
Dozzle	github.com/amir20/dozzle
Glances	github.com/nicolargo/glances
FileBrowser	github.com/gtsteffaniak/filebrowser
IT-Tools	github.com/CorentinTh/it-tools
Stirling PDF	github.com/Stirling-Tools/Stirling-PDF
Uptime Kuma	github.com/louislam/uptime-kuma
Watchtower	github.com/containrrr/watchtower
selfhst/icons	github.com/selfhst/icons

Local HTTPS with Traefik: traefik.me is dead, long live sslip.io

Thu, 17 Apr 2025 00:00:00 +0000

The setup seemed perfect. Point *.traefik.me at 127.0.0.1, download a wildcard certificate from the same domain, drop it into Traefik, and every local service gets a clean HTTPS URL with no IP in the address bar. No Let’s Encrypt rate limits, no mkcert to explain to teammates, no self-signed warnings to click through. Just https://myapp.traefik.me and a green padlock.

Then in March 2025, Let’s Encrypt revoked the certificate. The wildcard cert for traefik.me is gone and it’s not coming back.

What traefik.me was actually selling

traefik.me is a wildcard DNS resolver. Type anything.traefik.me and it resolves to 127.0.0.1. Type anything.10.0.0.1.traefik.me and it resolves to 10.0.0.1. No account, no configuration, no infrastructure to maintain. The DNS part still works fine, by the way.

The certificate was the bonus: a wildcard cert for *.traefik.me that pyrou, the maintainer, generated with Let’s Encrypt and distributed at https://traefik.me/cert.pem and https://traefik.me/privkey.pem. It was convenient precisely because it was shared: download, drop into Traefik, done.

Sharing a private key is why it died.

The CA/Browser Forum Baseline Requirements, section 9.6.3, require subscribers to “maintain sole control” over their private key. Distributing it to anyone who visits a URL is the exact opposite of sole control. Let’s Encrypt sent a notice, blocked future issuance for the domain, and revoked the existing certificate. Pyrou confirmed the situation and recommended mkcert as an alternative. The project will live on as a DNS resolver only.

The cert had already been revoked twice before 2025. Third time was the last.

sslip.io does the same thing, differently

sslip.io is also a wildcard DNS resolver, with one difference: the IP is encoded in the hostname rather than resolved from a fallback. 10-0-0-1.sslip.io resolves to 10.0.0.1. myapp.192-168-1-10.sslip.io resolves to 192.168.1.10. IPv6 works too.
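Going the other way, building a hostname from an address is a string substitution. The Python sketch below produces the dashed form used in this article. The function name is hypothetical, and the IPv6 branch assumes the colon-to-dash convention described in sslip.io's documentation, so double-check it before relying on it.

```python
import ipaddress


def sslip_hostname(name: str, ip: str, domain: str = "sslip.io") -> str:
    """Build <name>.<dashed-ip>.<domain> for an IPv4 or IPv6 address."""
    address = ipaddress.ip_address(ip)  # raises ValueError on garbage input
    separator = "." if address.version == 4 else ":"
    encoded = str(address).replace(separator, "-")
    return f"{name}.{encoded}.{domain}"
```

sslip_hostname("myapp", "192.168.1.10") gives myapp.192-168-1-10.sslip.io, the form used in the rest of this article.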

The infrastructure behind sslip.io is also more visible: three nameservers in Singapore, the US, and Poland, handling over 10,000 requests per second, with public monitoring. About 1,000 GitHub stars and active maintenance under the Apache 2.0 licence.

Strip away the certificate story and the comparison is pretty straightforward:

	traefik.me	sslip.io
DNS wildcard	yes	yes
Fallback to 127.0.0.1	yes	no
IPv6	no	yes
Wildcard certificate	revoked (formerly yes)	no
Infrastructure	opaque	documented
Project activity	stalled	active

traefik.me’s only remaining advantage is the 127.0.0.1 fallback: URLs without an IP segment. That matters if you really want myapp.traefik.me instead of myapp.127-0-0-1.sslip.io. Whether that difference is worth the infrastructure uncertainty is a short conversation.

mkcert fills the gap

mkcert creates a local certificate authority, installs it in the system trust store and whatever browsers it finds, then issues certificates signed by that CA. Browsers see a trusted chain. No warning, no click-through, no “proceed anyway”.

mkcert -install

That’s the one-time setup. After that, generating a certificate is one command:

mkcert "*.127-0-0-1.sslip.io"
# produces _wildcard.127-0-0-1.sslip.io.pem
#          _wildcard.127-0-0-1.sslip.io-key.pem

The limitation is that mkcert’s CA is local. Other machines on the network won’t trust it by default. For a solo dev setup that’s fine. For a shared team environment, you’d need to distribute the CA root, which is essentially the same operational problem traefik.me was trying to avoid, just smaller in scope.

The Traefik configuration

The setup is the same regardless of which DNS service you pick. Traefik needs the certificate mounted as a volume and a file provider pointing at the directory that holds the TLS configuration file.

# traefik/config/tls.yml
tls:
  certificates:
    - certFile: /certs/cert.pem
      keyFile: /certs/key.pem
  stores:
    default:
      defaultCertificate:
        certFile: /certs/cert.pem
        keyFile: /certs/key.pem

The key practice: run Traefik in its own Compose project, separate from the services it routes to. Each service project connects to Traefik through a shared external network. Start and stop services independently without touching the reverse proxy.

Start by creating the external network once:

docker network create traefik-public

traefik/compose.yml - Traefik alone, attached to the shared network:

services:
  traefik:
    image: traefik:v3
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./config:/etc/traefik/config
      - ./certs:/certs
    command:
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --providers.docker=true
      - --providers.docker.network=traefik-public
      - --providers.file.directory=/etc/traefik/config
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true

Copy the mkcert output into ./certs/, rename to cert.pem and key.pem, then:

docker compose -f traefik/compose.yml up -d

Traefik is up, listening on 80 and 443, watching Docker for new containers. Nothing is routed yet.

whoami/compose.yml - a service that joins the same network:

services:
  whoami:
    image: traefik/whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.127-0-0-1.sslip.io`)"
      - "traefik.http.routers.whoami.tls=true"
      - "traefik.http.routers.whoami.entrypoints=websecure"
    networks:
      - traefik-public

networks:
  traefik-public:
    external: true

docker compose -f whoami/compose.yml up -d

Traefik detects the new container via the Docker provider, reads its labels, and adds the route. https://whoami.127-0-0-1.sslip.io responds immediately. Bring whoami down and the route disappears. Traefik keeps running without noticing.

The external: true declaration is the load-bearing line. Without it, Compose creates a project-scoped network: Traefik and whoami end up on different networks and can’t reach each other, even though both are running. The external network is the shared bus every service project must explicitly opt into.

If you prefer traefik.me URLs, replace the mkcert command and the host label:

mkcert "*.traefik.me"

- "traefik.http.routers.whoami.rule=Host(`whoami.traefik.me`)"

The DNS fallback to 127.0.0.1 handles the rest.

What the traefik.me story actually teaches

The certificate distribution model was always fragile. A “public-private key pair” is a contradiction in terms. Every revocation was a warning that the next one could be permanent. Eventually it was.

The lesson isn’t specific to traefik.me. Any service that provides convenience by quietly removing a security boundary will eventually hit that boundary. mkcert is the right tool for this problem because it operates entirely within your own trust domain: you generate the CA, you install it, you issue the certificates. Nothing depends on a third party’s continued willingness to bend certificate issuance rules.

sslip.io solves the DNS part cleanly. mkcert solves the TLS part cleanly. They compose well. The traefik.me setup was simpler, for a while. Until it wasn’t.

From Vagrant to Docker Compose: a retrospective

Mon, 18 Apr 2022 00:00:00 +0000

I ran Vagrant for years. A Vagrantfile per project, a shared base box, a provision script that worked on Tuesday but not on Thursday. The promise was simple: reproducible environments for everyone on the team. The reality was more complicated.

The Vagrant years

The setup made sense at the time. One VM per project, provisioned with shell scripts or Ansible, shared via a versioned Vagrantfile. Onboarding was theoretically vagrant up and you’re done.

In practice, it was vagrant up, wait four minutes, watch the provision fail on a package that changed its download URL, fix it, reprovision, wait again. Vagrantfiles accumulated configuration over time: workarounds for specific machines, OS version pinning, memory tweaks for the team member whose laptop had 8GB. The files became historical documents nobody wanted to touch.

The VM itself was the other problem. Booting took time. Running took memory and CPU that could have gone to the application. File syncing between host and guest added latency that made PHP apps feel slower than they had any right to be. The overhead was significant for what was ultimately just “run a web server.”

We lived with it because everyone did. Vagrant was the standard for local PHP development, and the alternative (each developer managing their own LAMP stack) was clearly worse.

The project that changed the model

The shift wasn’t a decision we made. It was a project that arrived already containerized.

A new client project had a docker-compose.yml at the root, a Dockerfile, and a README that said docker compose up. We ran it. The containers started in seconds. PHP-FPM, nginx, PostgreSQL, Redis: all running, all networked, no provisioning step. Stop the containers, start them again, same state.

The contrast with our Vagrant setup was immediate. Not faster by a percentage: faster by a different order. And the Compose file was actually readable: each service, its image, its volumes, its environment variables, its dependencies. Compared to a provision script that SSHed into a VM and ran apt-get, this was legible.

We migrated everything. Not gradually, all at once, over a sprint. Every project got a docker-compose.yml. Every Vagrantfile was deleted. The transition was the most painful three weeks of infrastructure work I remember, and also the most clearly worth it.

What docker-compose actually changed

Beyond the speed, Compose changed the mental model. Vagrant abstracted a machine. Compose abstracted a set of processes. The distinction matters: with Compose, you can stop the database without stopping the application server, scale a worker service independently, swap the PostgreSQL image for a newer version without touching anything else.

The services declaration also replaced the VM provisioning problem entirely. If a new developer joins, they don’t run a provision script that may or may not work on their OS version. They run docker compose up and get the exact same images everyone else runs.

CI/CD got simpler too. The same docker-compose.yml that ran locally could run in the pipeline. The environment parity that Vagrant promised but rarely delivered was actually real with Compose.

The quiet deprecation

For years, the command was docker-compose: a separate binary, installed independently from Docker itself, written in Python, versioned independently. We used it, it worked, nobody thought much about it.

At some point a colleague mentioned that Docker had integrated Compose directly into the docker CLI. The new command was docker compose, no hyphen, Go rewrite, bundled with Docker Desktop. The old docker-compose binary was deprecated.

We had been using v1 for two years after v2 shipped. Our CI scripts, our Makefiles, our documentation all said docker-compose. Nothing had broken because Docker maintained the old binary for a long time. But the ecosystem had moved on quietly, and we’d missed it.

The migration was trivial: a hyphen removed from every script, a few aliases updated. The lesson was less trivial. Infrastructure tooling evolves without ceremony. The announcement happened, the blog posts were written, the deprecation notices were there. We just weren’t paying attention.

The actual retrospective

Looking back across Vagrant → docker-compose → docker compose, the pattern is less about the tools and more about the defaults.

Vagrant defaulted to “it works on my VM.” The overhead of sharing that VM was permanent.

Compose defaulted to “it works in these containers.” The images are the artifacts; the host machine is irrelevant.

The hyphen between docker and compose was always cosmetic. What mattered was the shift from provisioned machines to declarative services. That shift happened the day we ran a project someone else containerized and realized we never wanted to go back.

Controlling a USB missile launcher over HTTP with FastAPI and Docker

Tue, 21 Feb 2017 00:00:00 +0000

The rule was simple: whoever breaks the CI build owes the team a coffee. It worked fine for a while. Then someone suggested we needed something with more immediate feedback. Something physical. Something that fires.

A Dream Cheeky Thunder appeared on a desk shortly after. Four foam missiles, a USB cable, and a very clear team consensus: hook it to the cluster, wire it to the build pipeline, and let the CI decide who deserves a volley.

The launcher needed to respond to HTTP calls from anywhere on the network. No driver, no GUI, no manual aiming. Just an endpoint that makes it shoot in the direction of the guilty party’s desk.

This is the story of dream-cheeky-thunder.

No SDK, no docs, no problem

Dream Cheeky never published a protocol spec. The launcher speaks raw USB HID, and the only starting point was a vendored Python script from 2012 floating around in forum threads. Vendor ID 0x2123, product ID 0x1010, and a handful of control bytes that someone had reverse engineered years before.

That was enough. The protocol is simple: send a byte sequence to move the motors, send another to fire. The tricky part is that the launcher has no position feedback. No encoders, no limit switches beyond the physical hard stops at the extremes. You drive it blind.

From USB to HTTP

The CI pipeline needed to trigger the launcher over the network. A local script wasn’t going to cut it — the launcher had to be reachable from any machine on the cluster, including the build server. So: a REST API.

FastAPI was the obvious choice. The targeting flow from the CI side ends up being three HTTP calls:

curl -X POST http://localhost:8000/park      # reset to known position
curl -X POST http://localhost:8000/yaw/20    # rotate toward guilty desk
curl -X POST "http://localhost:8000/fire?shots=2"
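From a CI job written in Python, the same three calls could look like the sketch below. It relies only on the endpoints shown in the curl commands; the function names are hypothetical and the base URL is whatever host the launcher service runs on.

```python
import urllib.parse
import urllib.request


def targeting_requests(base_url: str, yaw: int, shots: int = 1) -> list:
    """Build the park, yaw and fire POST requests, in that order."""
    base = base_url.rstrip("/")
    query = urllib.parse.urlencode({"shots": shots})
    urls = [f"{base}/park", f"{base}/yaw/{yaw}", f"{base}/fire?{query}"]
    return [urllib.request.Request(url, method="POST") for url in urls]


def fire_at(base_url: str, yaw: int, shots: int = 1, timeout: float = 30.0) -> None:
    """Send the three requests sequentially; any HTTP error aborts the volley."""
    for request in targeting_requests(base_url, yaw, shots):
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
```

Keeping the request building separate from the sending makes the sequence testable without a launcher on the desk.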

The /park call matters more than it looks. Since the launcher has no position feedback, the server estimates the current angle by tracking how long the motors have been running. That estimate drifts. Bumping the hardware, interrupting a command, or just the imprecision of time-based tracking — they all accumulate. Parking drives both motors against the physical hard stops at full sweep, which guarantees alignment regardless of what the server thinks it knows. Skip it, and your aim is a guess.
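Time-based tracking boils down to dead reckoning: angle changes by speed multiplied by motor run time, clamped at the hard stops. The Python sketch below illustrates the idea and is not the project's actual code. The motor speed and travel limits are made-up calibration values, and parking is assumed to end at the minimum stop.

```python
from dataclasses import dataclass


@dataclass
class AxisEstimate:
    degrees_per_second: float  # assumed motor speed (hypothetical calibration)
    minimum: float             # angle at one hard stop
    maximum: float             # angle at the other hard stop
    angle: float = 0.0         # current estimate, not a measurement

    def move(self, direction: int, seconds: float) -> float:
        """Update the estimate after running the motor; direction is +1 or -1."""
        target = self.angle + direction * self.degrees_per_second * seconds
        # The hardware cannot go past its stops, so neither can the estimate.
        self.angle = max(self.minimum, min(self.maximum, target))
        return self.angle

    def park(self) -> float:
        """Reset the estimate to the known position reached against the stop."""
        self.angle = self.minimum
        return self.angle

    def plan(self, target: float) -> tuple:
        """Direction and run time needed to reach `target` from the estimate."""
        clamped = max(self.minimum, min(self.maximum, target))
        delta = clamped - self.angle
        return (1 if delta >= 0 else -1), abs(delta) / self.degrees_per_second
```

Every move adds a little error to angle; park is the only call that replaces the estimate with a known value, which is why it comes first in the sequence.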

The full API reference is in the repo. There’s also a web UI if you prefer clicking over curl.

Docker knows nothing about USB

Running this in a Docker container on the cluster was where the fun really started: containers don’t see USB devices by default.

The devices mount in compose.yaml exposes the USB bus to the container:

devices:
  - /dev/bus/usb:/dev/bus/usb

Not enough. First run came back with USBError: [Errno 13] Access denied. The device node is there inside the container, but it inherits permissions from the host, and on the host only root can open it by default.

The fix is a udev rule. Drop one file into /etc/udev/rules.d/, and udev sets the right group and permissions on the device node when the launcher is plugged in. After that, the container user can open it without needing elevated privileges. The rule ships with the project, and setup instructions are in the docs.

WSL2 made it interesting

Half the team runs Windows with Docker Desktop on WSL2. That’s where things got creative.

WSL2 has no access to USB devices by default: the Windows kernel holds them, and the devices mount alone does nothing because WSL2 simply doesn’t see the hardware. The fix is usbipd-win, which forwards the USB device from Windows into the WSL2 kernel over IP. Once that’s done, the Linux path works exactly the same: udev rule, devices mount, done.

The attachment doesn’t survive reboots, though. usbipd v4+ added a policy mechanism that automates reconnection, which killed the “it worked yesterday” mystery that had been annoying us for days.

What actually surprised us

Time-based positioning works well enough. No encoders meant we went in expecting the angle tracking to be basically useless. Turns out, parking before every sequence kept it accurate enough to reliably aim at a specific desk. Not millimeter precision, but foam missile precision is fine.

The devices mount is necessary but not sufficient. The permission error was confusing precisely because the device was clearly visible inside the container. The udev rule is the bit most tutorials quietly skip.

The coffee rule was never the same after this. Once the launcher was wired to the pipeline, broken builds suddenly became a lot more motivating to fix.

guillaumedelre/dream-cheeky-thunder

FastAPI + Docker + PyUSB — HTTP control for the Dream Cheeky Thunder USB missile launcher. Pull requests welcome, especially if you have a better angle calibration approach.