<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Symfony to the Cloud: Twelve Factors, Thirteen Services on Guillaume Delré</title><link>https://guillaumedelre.github.io/series/symfony-to-the-cloud/</link><description>Recent content in Symfony to the Cloud: Twelve Factors, Thirteen Services on Guillaume Delré</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 17 May 2026 15:00:00 +0000</lastBuildDate><atom:link href="https://guillaumedelre.github.io/series/symfony-to-the-cloud/index.xml" rel="self" type="application/rss+xml"/><item><title>Eleven Out of Twelve</title><link>https://guillaumedelre.github.io/2026/05/17/eleven-out-of-twelve/</link><pubDate>Sun, 17 May 2026 15:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/17/eleven-out-of-twelve/</guid><description>Part 8 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: Eleven factors resolved cleanly. The twelfth: Doctrine migrations in the entrypoint, waiting on a governance question that code alone can&amp;#39;t answer.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>The <code>composer.json</code> in each service had this in its <code>post-install-cmd</code> section:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-json" data-lang="json"><span style="display:flex;"><span><span style="color:#e6db74">&#34;post-install-cmd&#34;</span><span style="color:#960050;background-color:#1e0010">:</span> [
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bin/console cache:clear --env=prod&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#e6db74">&#34;bin/console doctrine:migrations:migrate --no-interaction&#34;</span>
</span></span><span style="display:flex;"><span>]
</span></span></code></pre></div><p><code>post-install-cmd</code> runs during <code>composer install</code>, which in the production Dockerfile runs during the image build. There is no database available during a Docker build. The migration command either failed silently, or connected to nothing, or was skipped by Doctrine when it couldn&rsquo;t find a schema to compare against. In any case, it didn&rsquo;t migrate anything.</p>
<p>This is a clean violation of <a href="https://12factor.net/admin-processes" target="_blank" rel="noopener noreferrer">Factor XII</a>
: admin processes — migrations, one-off scripts, console tasks — should run in the same environment as the application, against the actual production data. Running them at build time inverts the relationship. The image shouldn&rsquo;t know about the database. The database should be there when the image needs it.</p>
<h2 id="the-move-to-the-entrypoint">The move to the entrypoint</h2>
<p>The migration command moved from <code>composer.json</code> to <code>docker-entrypoint.sh</code>. The shift looks small on a diff. The implications are not.</p>
<p>The entrypoint runs when the container starts, not when the image is built. The database is reachable. The entrypoint waits for it — up to 60 seconds, one attempt per second — before doing anything:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>ATTEMPTS_LEFT_TO_REACH_DATABASE<span style="color:#f92672">=</span><span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">until</span> <span style="color:#f92672">[</span> $ATTEMPTS_LEFT_TO_REACH_DATABASE -eq <span style="color:#ae81ff">0</span> <span style="color:#f92672">]</span> <span style="color:#f92672">||</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>  DATABASE_ERROR<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>php bin/console dbal:run-sql -q <span style="color:#e6db74">&#34;SELECT 1&#34;</span> 2&gt;&amp;1<span style="color:#66d9ef">)</span>; <span style="color:#66d9ef">do</span>
</span></span><span style="display:flex;"><span>    sleep <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    ATTEMPTS_LEFT_TO_REACH_DATABASE<span style="color:#f92672">=</span><span style="color:#66d9ef">$((</span>ATTEMPTS_LEFT_TO_REACH_DATABASE <span style="color:#f92672">-</span> <span style="color:#ae81ff">1</span><span style="color:#66d9ef">))</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">done</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> $ATTEMPTS_LEFT_TO_REACH_DATABASE -eq <span style="color:#ae81ff">0</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    echo <span style="color:#e6db74">&#34;</span>$DATABASE_ERROR<span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>    exit <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>If the database doesn&rsquo;t respond within 60 seconds, the container exits with an error and Kubernetes restarts it. Once the database is ready, the migration runs:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span><span style="color:#66d9ef">if</span> <span style="color:#f92672">[</span> <span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">$(</span> find ./migrations -iname <span style="color:#e6db74">&#39;*.php&#39;</span> -print -quit <span style="color:#66d9ef">)</span><span style="color:#e6db74">&#34;</span> <span style="color:#f92672">]</span>; <span style="color:#66d9ef">then</span>
</span></span><span style="display:flex;"><span>    php bin/console doctrine:migrations:migrate --no-interaction --all-or-nothing
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">fi</span>
</span></span></code></pre></div><p>Two changes from the original command: <code>--all-or-nothing</code> ensures that if any migration in a batch fails, the entire batch rolls back. And the <code>find</code> guard skips the command entirely if there are no migration files — useful for services that don&rsquo;t use Doctrine migrations at all.</p>
<p>This is genuinely better. The database is present. The migration runs in the real environment. The <code>--all-or-nothing</code> flag adds atomicity that the build-time version never had.</p>
<h2 id="what-it-doesnt-solve">What it doesn&rsquo;t solve</h2>
<p>Two pods redeploying simultaneously both run the entrypoint. Both reach the database. Both find pending migrations. Both call <code>doctrine:migrations:migrate</code>.</p>
<p>Doctrine has a locking mechanism: a <code>doctrine_migration_versions</code> table that records which migrations have run, and the command checks it before applying. Under normal conditions this is fine: the second pod finds the table up to date and exits cleanly. The real failure modes are more specific: a migration long enough that the database lock times out before it completes, letting a second runner start the same migration before the first has finished; or a pod that crashes mid-migration before recording the version in the table, leaving the schema in an applied-but-unregistered state that the next pod will try to apply again.</p>
<p>The team&rsquo;s position is explicit: a brief deployment downtime is acceptable. Application versions aren&rsquo;t necessarily forward-compatible with older schema versions, so running N and N+1 simultaneously against the same database isn&rsquo;t safe anyway. The deployment strategy is Recreate: all old pods are terminated before any new pods start. The migration runs on first startup, no overlap between versions. It works.</p>
<p>But &ldquo;it works&rdquo; and &ldquo;it&rsquo;s the right architecture&rdquo; are different answers.</p>
<h2 id="what-would-be-different">What would be different</h2>
<p><a href="https://12factor.net/admin-processes" target="_blank" rel="noopener noreferrer">Factor XII</a>
 says admin processes should run in &ldquo;one-off processes.&rdquo; A process that runs once, for a specific purpose, against the production environment. The entrypoint is not one-off — it runs every time a container starts, including restarts, scaling events, and Kubernetes node movements.</p>
<p>Three alternatives exist, each with a different answer to the question of ownership:</p>
<p><strong>A Kubernetes init container</strong> runs before the main container starts, in the same pod. It could run the migration, exit, and let the main container start only after it succeeds. The migration is isolated from the application runtime. The downside: the init container is another image to build and maintain, and it runs on every pod start — so a 14-service platform starting simultaneously still has a potential race.</p>
<p><strong>A Kubernetes Job</strong> runs once, on demand or triggered by a deployment pipeline. It can be made to run before any pods are updated — serial, isolated, with a clear success or failure signal. The race condition goes away. The complexity moves to the deployment process: the Job must complete before the Deployment rollout begins, and the CI pipeline must coordinate both.</p>
<p><strong>A Helm hook</strong> is the same concept expressed declaratively in the Helm chart. A <code>pre-upgrade</code> hook runs the migration before the application pods are updated. It&rsquo;s the most idiomatic Kubernetes answer. It also means the Helm chart is now responsible for running migrations — a decision that belongs to whoever owns the chart.</p>
<p>That last sentence is why the entrypoint hasn&rsquo;t changed. Moving migrations out of the application means deciding that the deployment infrastructure — not the application itself — is responsible for the schema. It&rsquo;s a governance question as much as a technical one, and governance questions take longer to resolve than code changes.</p>
<h2 id="the-honest-end">The honest end</h2>
<p>The migration block in the entrypoint is two lines. Literally: the <code>if [ &quot;$( find ./migrations... )&quot; ]</code> guard, and the <code>php bin/console doctrine:migrations:migrate</code> that follows. Eleven other factors have clean resolutions. The cache moved to Redis. The logs go to stdout. The filesystem is an S3 bucket. The CI assembles production images from the same commit it tests. The secrets don&rsquo;t travel in image layers.</p>
<p>Factor XII has an answer. It&rsquo;s just not the final one.</p>
<p>The migrations run at startup, with a real database, with atomicity, with a bounded retry window. That&rsquo;s better than running at build time against nothing. Whether they eventually move to a Job or a Helm hook is a conversation about who owns the schema — a question that a <code>kubectl apply</code> can&rsquo;t answer.</p>
]]></content:encoded></item><item><title>Ready Is Not the Same as Started</title><link>https://guillaumedelre.github.io/2026/05/17/ready-is-not-the-same-as-started/</link><pubDate>Sun, 17 May 2026 10:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/17/ready-is-not-the-same-as-started/</guid><description>Part 7 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: The entrypoint script that works perfectly in Docker Compose has five jobs. In Kubernetes, each of those jobs belongs somewhere else.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>The rolling deploy looked clean. A new pod started. Kubernetes saw the healthcheck pass — <code>php -v</code> returned zero — and began routing traffic to the new container.</p>
<p>For the next forty seconds — out of a possible sixty — that container was polling for the database.</p>
<p>Requests that landed on it during that window got errors. Not many — the window was short — but enough to show up as noise in the monitoring. The kind of noise that gets dismissed as a transient network issue and filed nowhere. The deploy succeeded. The pod eventually became ready. The mechanism that caused it was still there, waiting for the next deploy.</p>
<p>The entrypoint script does five things before FrankenPHP starts: copy a version file, verify the vendor directory, wait up to sixty seconds for the database, run pending migrations, install assets and set filesystem permissions. In Docker Compose, this is invisible. In Kubernetes, the gap becomes traffic.</p>
<h2 id="the-gap-between-started-and-ready">The gap between started and ready</h2>
<p>Kubernetes decides whether to send traffic to a pod by watching its readiness probe. A pod whose readiness probe passes receives requests. A pod whose readiness probe fails is removed from the load balancer rotation until it recovers. This is the mechanism that makes rolling deploys safe: Kubernetes doesn&rsquo;t cut over to a new pod until that pod says it&rsquo;s ready.</p>
<p>The compose.yaml defines a healthcheck on every service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">healthcheck</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">test</span>: [ <span style="color:#e6db74">&#34;CMD&#34;</span>, <span style="color:#e6db74">&#34;php&#34;</span>, <span style="color:#e6db74">&#34;-v&#34;</span> ]
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">interval</span>: <span style="color:#ae81ff">30s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">timeout</span>: <span style="color:#ae81ff">10s</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">retries</span>: <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">start_period</span>: <span style="color:#ae81ff">10s</span>
</span></span></code></pre></div><p><code>php -v</code> succeeds the moment the PHP binary is present — which is true from the first millisecond of container life. The <code>start_period: 10s</code> gives ten seconds before checks begin. But the entrypoint polling loop runs for up to sixty seconds before FrankenPHP even starts. At second ten, the healthcheck passes. The application is still waiting for the database.</p>
<p>The Dockerfile has a better signal:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">HEALTHCHECK</span> --start-period<span style="color:#f92672">=</span>60s CMD curl -f http://localhost:2019/metrics <span style="color:#f92672">||</span> exit <span style="color:#ae81ff">1</span><span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>Port 2019 is Caddy&rsquo;s built-in metrics server, embedded directly in FrankenPHP. The endpoint is Prometheus-compatible and only responds once Caddy&rsquo;s HTTP stack is fully initialized and PHP workers are accepting connections. <code>php -v</code> exits in fifty milliseconds regardless of what the application is doing — it checks the binary, not the server. <code>:2019/metrics</code> only answers when the server is actually serving. It is also not an endpoint added just for the probe: every service in the platform already has it scraped by Prometheus, so the signal is live regardless of any healthcheck configuration.</p>
<p>That&rsquo;s closer. But in Kubernetes, the <code>HEALTHCHECK</code> instruction is ignored entirely. Kubernetes uses its own probe configuration. Without explicit probe definitions in the Kubernetes manifests, there are no readiness checks — and a pod is considered ready the moment its container starts.</p>
<p>Which means: pod starts, entrypoint begins polling, Kubernetes routes traffic, application is not yet serving. Requests arrive at a container that isn&rsquo;t ready to handle them.</p>
<h2 id="three-signals-three-questions">Three signals, three questions</h2>
<p>Kubernetes separates container lifecycle into three distinct questions, each with its own probe type:</p>
<p><strong>startupProbe</strong> — &ldquo;Has the application finished starting?&rdquo; Fires repeatedly until it passes, then hands off to liveness. Prevents the liveness probe from killing a container that&rsquo;s legitimately slow to initialize. For a container whose entrypoint can take sixty seconds, this is the right tool.</p>
<p><strong>readinessProbe</strong> — &ldquo;Is the application ready to handle requests?&rdquo; Fails and passes throughout the container&rsquo;s life. When it fails, the pod is removed from the load balancer. This is what makes a rolling deploy safe.</p>
<p><strong>livenessProbe</strong> — &ldquo;Is the application still alive?&rdquo; If it fails, Kubernetes restarts the container. Meant to catch hung processes, not slow startups.</p>
<p>The sixty-second polling loop belongs in the startupProbe&rsquo;s patience, not in application code:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">startupProbe</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">path</span>: <span style="color:#ae81ff">/metrics</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">port</span>: <span style="color:#ae81ff">2019</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">12</span>    <span style="color:#75715e"># 12 attempts × 5s = 60s max</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">5</span>
</span></span></code></pre></div><p>Once the startupProbe passes, a readinessProbe on the same endpoint takes over — telling Kubernetes when the pod is safe to receive traffic — and a livenessProbe watches for hung processes. But the startupProbe is the one that absorbs the slow start. The entrypoint polling loop becomes redundant: its job was to keep the container alive while the database caught up. Without it, the application attempts to connect, fails, and the container exits — Kubernetes restarts the pod, and the startupProbe maintains its retry cycle until the database responds and the application starts cleanly. The retry responsibility moves from inside the entrypoint to the orchestrator, which is exactly where it belongs.</p>
<h2 id="the-migration-problem">The migration problem</h2>
<p>The polling loop is the most visible issue, but the migrations create a subtler one.</p>
<p>With a rolling deploy and two replicas, Kubernetes starts a new pod while the old one still serves traffic. Both pods run the same entrypoint. Both reach <code>doctrine:migrations:migrate</code>.</p>
<p>Doctrine&rsquo;s migration table tracks which migrations have already executed, so a completed migration won&rsquo;t run twice. But if two pods start simultaneously and both see a pending migration, both attempt to run it at the same time. Whether that&rsquo;s safe depends on the migration: additive schema changes are usually fine; destructive ones less so. And you don&rsquo;t get to choose which ones run on a deploy that didn&rsquo;t expect to coordinate. <code>--all-or-nothing</code> wraps migrations in a transaction and rolls back everything if one fails — it&rsquo;s about atomicity within a single run, not coordination across processes.</p>
<p>The cleaner approach separates the two concerns into two init containers: one that waits for the database, one that runs migrations. The main container starts only after both complete:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">initContainers</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">wait-for-db</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">authentication:latest</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;php&#34;</span>, <span style="color:#e6db74">&#34;bin/console&#34;</span>, <span style="color:#e6db74">&#34;dbal:run-sql&#34;</span>, <span style="color:#e6db74">&#34;-q&#34;</span>, <span style="color:#e6db74">&#34;SELECT 1&#34;</span>]
</span></span><span style="display:flex;"><span>    - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">migrate</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">image</span>: <span style="color:#ae81ff">authentication:latest</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">command</span>: [<span style="color:#e6db74">&#34;php&#34;</span>, <span style="color:#e6db74">&#34;bin/console&#34;</span>, <span style="color:#e6db74">&#34;doctrine:migrations:migrate&#34;</span>, <span style="color:#e6db74">&#34;--no-interaction&#34;</span>, <span style="color:#e6db74">&#34;--all-or-nothing&#34;</span>]
</span></span></code></pre></div><p>Both init containers reuse the application image. That&rsquo;s not waste: they need the same PHP binary and the same environment wiring to reach the database and resolve the migration classes. A lighter purpose-built image would reduce startup overhead, but would require maintaining a separate PHP installation in sync with the main image.</p>
<p>Even with init containers, multiple pods starting simultaneously — initial deploy, after a node failure, or under autoscaling pressure — will each attempt to run migrations. Solving that properly — through a Helm pre-upgrade hook, a <code>maxSurge: 0</code> strategy, or a separate migration Job — is a topic in itself. What matters here is that the entrypoint is the wrong place to host that decision: it can&rsquo;t coordinate across pods, and it ties migration execution to application startup in a way that&rsquo;s hard to untangle later. The question of which approach fits this codebase — and why the entrypoint hasn&rsquo;t been replaced — gets its own treatment in <a href="/2026/05/17/eleven-out-of-twelve/">the next article in this series</a>
.</p>
<p>Factor XII of the <a href="https://12factor.net/admin-processes" target="_blank" rel="noopener noreferrer">twelve-factor methodology</a> — admin processes run in the same environment as the application — is satisfied either way. The question is whether &ldquo;same environment&rdquo; means &ldquo;same entrypoint script&rdquo; or &ldquo;same image, separate process&rdquo;. In Kubernetes, the latter is safer.</p>
<h2 id="what-the-entrypoints-real-job-is">What the entrypoint&rsquo;s real job is</h2>
<p>Strip out the database wait (now a startupProbe or init container), the migrations (now an init container or Job), and the assets install (a build-time operation that belongs in the Dockerfile), and the entrypoint has one remaining job: start the application.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-sh" data-lang="sh"><span style="display:flex;"><span>exec docker-php-entrypoint <span style="color:#e6db74">&#34;</span>$@<span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><p>Factor IX of the twelve-factor app asks for fast startup and graceful shutdown. A container whose startup takes sixty seconds because it&rsquo;s waiting for external dependencies is not fast. It means rolling deploys are slow, recovery after a crash is slow, and horizontal scale-out creates a sixty-second gap before each new pod contributes.</p>
<p>Fast startup is not just a nice-to-have. It&rsquo;s what makes the rest of the cloud model work. When a pod can start in seconds, the orchestrator can scale aggressively and recover quickly. When it takes a minute, you add headroom everywhere — longer probe timeouts, larger deployment windows, more conservative scaling policies — and the system becomes rigid.</p>
<h2 id="the-docker-compose-tax">The Docker Compose tax</h2>
<p>The entrypoint accumulates these responsibilities for a reason. In Docker Compose, there is no init container concept. There is no startupProbe. Services declare <code>depends_on</code>, but without health conditions, that&rsquo;s just startup ordering — not readiness. The entrypoint fills the gap.</p>
<p>This is not a design flaw. It&rsquo;s a reasonable adaptation to the constraints of Docker Compose. The script works. It handles edge cases (the database timeout, unrecoverable errors, missing migrations directory). Someone tested it.</p>
<p>The issue is the assumption that the same script works equally well in Kubernetes. It runs. The application eventually starts. But it bypasses the probe system that makes Kubernetes deployments reliable, and it puts migration responsibility in a place where coordination across pods is difficult to reason about.</p>
<p>Several of the changes in this series — <a href="/2026/05/14/the-ghost-of-the-ci-runner/">media storage</a>
, <a href="/2026/05/14/what-survives-the-build/">secrets in image layers</a>
, <a href="/2026/05/15/no-witnesses/">log handlers</a>
, <a href="/2026/05/15/the-host-that-hid-the-graph/">service dependencies</a>
, <a href="/2026/05/16/fifteen-minutes-before-the-first-test/">CI environment parity</a>
, <a href="/2026/05/16/the-cache-that-was-lying-to-us/">cache adapters</a>
 — were changes to application code or configuration. This one is different. It requires the infrastructure to gain awareness of what &ldquo;ready&rdquo; means for this application, and it requires the entrypoint to give up responsibilities it currently owns.</p>
<p>That&rsquo;s a harder conversation. But the startupProbe is waiting for it.</p>
]]></content:encoded></item><item><title>The Cache That Was Lying to Us</title><link>https://guillaumedelre.github.io/2026/05/16/the-cache-that-was-lying-to-us/</link><pubDate>Sat, 16 May 2026 15:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/16/the-cache-that-was-lying-to-us/</guid><description>Part 6 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: How a single config line blocked horizontal scaling across 13 Symfony microservices, and what the twelve-factor app had to say about it.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>The first time we ran two replicas of the same Symfony service behind a load balancer, everything looked fine. Health checks passed. Traffic split cleanly. Response times were good.</p>
<p>Then someone noticed the rate limiter was acting strange. Hit the API five times, get blocked. Hit it five more times on the next request, get through. Depending on which pod answered, you were a different person.</p>
<p>That was the cache talking. One config line, replicated across thirteen services, was blocking horizontal scaling entirely.</p>
<h2 id="one-config-file-thirteen-times">One config file, thirteen times</h2>
<p>We were preparing a platform of thirteen Symfony microservices to move to Kubernetes. The stack was already in good shape: FrankenPHP for the HTTP server, multi-stage Dockerfiles, a GitLab CI that pushed tagged images to a cloud registry. The pieces were there. We just needed to verify nothing would break when we started scaling pods horizontally.</p>
<p>A good checklist for that kind of audit is the <a href="https://12factor.net" target="_blank" rel="noopener noreferrer">twelve-factor app methodology</a> — twelve principles for building software that runs cleanly in cloud environments. Most factors were already covered without us doing anything deliberate about it.</p>
<p>Factor VII (port binding) came for free. FrankenPHP embeds Caddy directly into the PHP process. The container exposes its own HTTP endpoint, no Apache or Nginx to bolt on. The image is self-contained, which is exactly what the factor requires:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">HEALTHCHECK</span> --start-period<span style="color:#f92672">=</span>60s CMD curl -f http://localhost:2019/metrics <span style="color:#f92672">||</span> exit <span style="color:#ae81ff">1</span><span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>Factor II (dependencies) was handled by <code>composer.json</code> and the Dockerfile extensions. Factor X (dev/prod parity) was covered enough for our scope: same image, same backing services locally and in CI, which is the part that actually matters for what we were auditing.</p>
<p>Then I got to Factor VI.</p>
<h2 id="the-problem-with-it-works-on-one-server">The problem with &ldquo;it works on one server&rdquo;</h2>
<p>Factor VI says processes must share nothing. Nothing written to disk between requests, nothing in local memory that another instance can&rsquo;t see. If you need to persist state, put it in a backing service — a database, a cache cluster, a queue. The process itself stays disposable.</p>
<p>I opened <code>authentication/config/packages/cache.yaml</code>. Then <code>content/config/packages/cache.yaml</code>. Then <code>media/config/packages/cache.yaml</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">framework</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">cache.adapter.filesystem</span>
</span></span></code></pre></div><p>Thirteen services. Thirteen times, word for word.</p>
<p>Every instance of every service was writing its cache to the local filesystem. Which meant every pod had its own private cache, invisible to every other pod. When the load balancer sent a request to pod A, it got pod A&rsquo;s cached version of reality. Pod B had built its own. They might have been generated at different times, from different source data, or one of them might not have been built yet at all.</p>
<p>The rate limiter was the most visible symptom because it had a counter. But the same divergence affected every piece of data we were caching: serializer metadata, route collections, Doctrine result caches. Two users sending identical requests could get different responses depending on which node happened to pick up the connection.</p>
<h2 id="redis-was-already-there">Redis was already there</h2>
<p>This is the part that stings a little. Redis was already in the stack. Every service had it configured via SncRedisBundle:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e"># config/packages/snc_redis.yaml — present on all 13 services</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">snc_redis</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">clients</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">default</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#39;phpredis&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">alias</span>: <span style="color:#e6db74">&#39;default&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">dsn</span>: <span style="color:#e6db74">&#39;%env(IN_MEM_STORE__URI)%&#39;</span>
</span></span></code></pre></div><p>Factor IV of the twelve-factor app says backing services should be attached resources, interchangeable through configuration. Redis was exactly that: reachable via an environment variable, ready to be swapped for a managed instance in the cloud. The plumbing was done. We just weren&rsquo;t using it for the application cache.</p>
<p>Some services even had it right for specific pools. The rate limiter in the authentication service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">pools</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">rate_limiter.cache</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">adapter</span>: <span style="color:#ae81ff">cache.adapter.redis</span>
</span></span></code></pre></div><p>Which explains the inconsistency we saw first. The rate limit <em>count</em> went to Redis (shared across pods). The cache backing the rate limit <em>check</em> went to the filesystem (local to the pod). Two sources of truth, one invisible to the other.</p>
<p>The fix was one line per service:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">framework</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">cache</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">app</span>: <span style="color:#ae81ff">cache.adapter.redis</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">default_redis_provider</span>: <span style="color:#ae81ff">snc_redis.default</span>
</span></span></code></pre></div><p>Thirteen files. Thirteen identical changes. The kind of fix that makes you feel like you should have caught it earlier, except it&rsquo;s perfectly invisible when you&rsquo;re running a single instance.</p>
<h2 id="what-needs-to-move-to-redis">What needs to move to Redis</h2>
<p>The filesystem cache violated Factor VI (processes carry local state they shouldn&rsquo;t) and Factor VIII (you can&rsquo;t scale out without sharing that state). They&rsquo;re the same problem seen from two angles: VI describes what&rsquo;s wrong, VIII describes what you can&rsquo;t do because of it.</p>
<p>With a shared cache backend, a second pod is safe. The two pods build the same cache, see the same invalidations, agree on the same rate limits. You can add a third pod under load and remove it when traffic drops. The orchestrator handles it; the application doesn&rsquo;t need to know.</p>
<p>Without it, horizontal scaling is a liability. More pods means more divergence, more &ldquo;works on my machine&rdquo; bugs that are impossible to reproduce locally because local only runs one container.</p>
<p>Sessions had the same problem — and potentially a worse one. Twelve of the thirteen services were using <code>session.storage.factory.native</code> — which writes sessions to the filesystem by default. A user whose request lands on pod A gets a session tied to pod A. Their next request goes to pod B. Session gone, they&rsquo;re logged out. Only one service had <code>RedisSessionHandler</code> configured.</p>
<p>The partial mitigation is that most of the platform runs stateless JWT-based APIs, so session usage is limited. But &ldquo;limited&rdquo; isn&rsquo;t &ldquo;zero&rdquo;. The services that do create sessions — authentication flows, temporary state during OAuth handshakes — have a user-visible failure mode waiting for the second pod. Either those sessions get moved to Redis, or the code that creates them gets removed. Leaving them as-is is a decision that waits for the first user whose session disappears without explanation.</p>
<h2 id="the-other-kind-of-state">The other kind of state</h2>
<p>Redis fixes the cross-pod problem. FrankenPHP introduces a different one worth knowing about.</p>
<p>In the standard PHP-FPM model, each request forks a fresh process. Every in-memory object — every cached value, every computed result — dies with the response. The process is stateless by construction.</p>
<p>FrankenPHP has a worker mode that doesn&rsquo;t follow that model. In worker mode, a single PHP process boots once, loads the kernel, wires the container, and handles multiple successive requests without restarting. Request throughput improves: no autoloader cold start, no container rebuild per request, fewer allocations. The tradeoff is that the PHP process now has a lifecycle that spans requests.</p>
<p>For cache, this adds a wrinkle. An <code>array</code> adapter or APCu pool accumulates entries across requests on the same worker. A cache invalidation pushed to Redis reaches the other pods immediately — but doesn&rsquo;t clear what&rsquo;s sitting in a worker&rsquo;s in-process memory. Two requests on the same pod can see different things: one hits a warm in-memory entry, the next triggers a Redis fetch after the in-process entry expires.</p>
<p>The platform keeps worker mode disabled (<code>APP__WORKER_MODE__ENABLED=false</code>). It&rsquo;s available — the infrastructure is there, the flag is wired — but it&rsquo;s not active. The performance gain didn&rsquo;t justify the audit. Every cache pool would need to be verified against worker-mode semantics; every place where state leaks between requests would become a potential bug.</p>
<p>The conservative position: keep PHP stateless at the process level even when the runtime doesn&rsquo;t require it. Factor VI&rsquo;s shared-nothing principle applies not just to the filesystem — it applies to the process itself.</p>
<h2 id="what-was-already-working">What was already working</h2>
<p>To be fair to the codebase: the Symfony Scheduler was already using Redis for distributed locks:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-php" data-lang="php"><span style="display:flex;"><span>$schedule<span style="color:#f92672">-&gt;</span><span style="color:#a6e22e">lock</span>($this<span style="color:#f92672">-&gt;</span><span style="color:#a6e22e">lockFactory</span><span style="color:#f92672">-&gt;</span><span style="color:#a6e22e">createLock</span>(<span style="color:#e6db74">&#39;schedule_purge&#39;</span>));
</span></span></code></pre></div><p>In a multi-pod environment, you don&rsquo;t want five instances running the same purge job simultaneously. The lock prevents it. Redis makes the lock visible across pods. Whoever wrote the scheduler knew exactly what they were doing.</p>
<p>The same reasoning just hadn&rsquo;t propagated to the cache configuration — probably because when you&rsquo;re running a single instance, <code>cache.adapter.filesystem</code> is invisible. It works, it&rsquo;s fast, it requires zero configuration. The problem only appears at two.</p>
<h2 id="the-four-questions">The four questions</h2>
<p>Factor VI catches most applications off guard during a cloud migration. Not because developers don&rsquo;t know about stateless processes — they usually do — but because the filesystem is always there, and the problem stays hidden until you try to run a second instance.</p>
<p>Before scaling a Symfony service horizontally, four questions are worth answering:</p>
<ul>
<li>Where does the application cache go? (<code>cache.adapter.filesystem</code> needs to become <code>cache.adapter.redis</code>)</li>
<li>Where do sessions go? (<code>session.storage.factory.native</code> needs Redis — or remove sessions entirely if you&rsquo;re JWT-only)</li>
<li>Does anything write to <code>var/</code> at runtime that another pod would need to read?</li>
<li>Is anything in your code path that needs to be mutually exclusive across pods? (if yes, that&rsquo;s a job for the <a href="https://symfony.com/doc/current/components/lock.html" target="_blank" rel="noopener noreferrer">Symfony Lock component</a> backed by Redis, not a local mutex)</li>
</ul>
<p>If the answers all point to shared backing services, you&rsquo;re ready. If any of them points to the local filesystem, production will find the pod that built its cache three hours ago and serve it to the user who least expects it.</p>
]]></content:encoded></item><item><title>Fifteen Minutes Before the First Test</title><link>https://guillaumedelre.github.io/2026/05/16/fifteen-minutes-before-the-first-test/</link><pubDate>Sat, 16 May 2026 10:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/16/fifteen-minutes-before-the-first-test/</guid><description>Part 5 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: How a CI pipeline that provisioned an Azure VM per run — missing RabbitMQ, MinIO, and Varnish — became one that assembles the production environment from the same images it ships.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>The pipeline had two stages that had nothing to do with code: <code>provision</code> and <code>deprovision</code>. Between them, in sequence, came <code>phpunit</code>, <code>phpmetrics</code>, and <code>behat</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">stages</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">build</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">provision</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">phpunit</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">phpmetrics</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">behat</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">deprovision</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">deploy</span>
</span></span></code></pre></div><p>Before the first assertion ran, fifteen minutes had passed. Terraform had cloned an infrastructure repository, authenticated to Azure, and applied a VM configuration. Ansible had connected to the new VM, installed PHP, configured the application, wired up a database and a Redis instance. Then the tests ran. Then Terraform destroyed what Ansible had built.</p>
<p>For every pipeline. From every branch. For every pull request, from open to merge.</p>
<h2 id="what-those-fifteen-minutes-were-missing">What those fifteen minutes were missing</h2>
<p>The <code>provision</code> stage set up two services: PostgreSQL and Redis. Three services that the application depended on in production were absent: RabbitMQ, MinIO, and Varnish.</p>
<p>RabbitMQ processed all asynchronous work — 56 consumers across 14 microservices. MinIO handled media storage. Varnish fronted the HTTP cache. In CI, none of them existed. Tests that exercised message queuing or file storage had two options: skip these paths, or leave them untested until staging. Varnish is a different case: tests hit the application directly and intentionally bypass the cache layer, so its absence in CI is a deliberate choice rather than a gap.</p>
<p>This is the problem <a href="https://12factor.net/dev-prod-parity" target="_blank" rel="noopener noreferrer">Factor X</a>
 describes as the environment gap. The gap here wasn&rsquo;t a matter of configuration — it was structural. The VM was built by Ansible from a script in a separate repository. It wasn&rsquo;t a container image. It wasn&rsquo;t versioned alongside the application. If a branch modified the RabbitMQ message topology, there was no way to test that modification in CI. The topology change and the code that relied on it would only meet in staging.</p>
<p>The Ansible provisioning script itself is part of the problem:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">launch_vm</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">stage</span>: <span style="color:#ae81ff">provision</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">script</span>:
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">git clone git@gitlab.internal/infra/ci-vm.git</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">cd ci-vm</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">az login --service-principal -u $ARM_CLIENT_ID ...</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">terraform apply -var &#34;prefix=${CI_PIPELINE_ID}-vm&#34; ...</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">sleep 45</span>
</span></span><span style="display:flex;"><span>    - <span style="color:#ae81ff">ansible-playbook behat/test-env.yml ...</span>
</span></span></code></pre></div><p>The <code>sleep 45</code> is there because Ansible needs the VM to finish booting before it can connect. It&rsquo;s not an oversight — it&rsquo;s the minimum time a freshly provisioned VM needs before SSH works. It&rsquo;s baked into the process.</p>
<h2 id="what-replaced-it">What replaced it</h2>
<p>The new pipeline has no <code>provision</code> stage. It has no <code>deprovision</code> stage. The environment is the images, and the images exist before the tests begin.</p>
<p>Each test job declares its dependencies as Docker services:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">$REGISTRY_URL/platform/rabbitmq:$CI_COMMIT_REF_SLUG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alias</span>: <span style="color:#ae81ff">rabbitmq</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">$REGISTRY_URL/platform/minio:$CI_COMMIT_REF_SLUG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alias</span>: <span style="color:#ae81ff">minio</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">redis:7.4.1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alias</span>: <span style="color:#ae81ff">redis</span>
</span></span><span style="display:flex;"><span>  - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">$ARTIFACTORY_URL/postgresql:13</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">alias</span>: <span style="color:#ae81ff">postgresql</span>
</span></span></code></pre></div><p>The services start in parallel when the job begins. Before the test script runs, a <code>before_script</code> waits for all of them to be ready:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">before_script</span>:
</span></span><span style="display:flex;"><span>  - <span style="color:#ae81ff">$CI_PROJECT_DIR/dockerize</span>
</span></span><span style="display:flex;"><span>      -<span style="color:#ae81ff">wait tcp://postgresql:5432</span>
</span></span><span style="display:flex;"><span>      -<span style="color:#ae81ff">wait tcp://rabbitmq:5672</span>
</span></span><span style="display:flex;"><span>      -<span style="color:#ae81ff">wait tcp://minio:9000</span>
</span></span><span style="display:flex;"><span>      -<span style="color:#ae81ff">wait tcp://redis:6379</span>
</span></span><span style="display:flex;"><span>      -<span style="color:#ae81ff">timeout 120s</span>
</span></span></code></pre></div><p>From pipeline start to first assertion: ninety seconds — assuming images are already cached on the runner; a cold pull adds time, but becomes negligible once the pipeline has run once on a given branch.</p>
<h2 id="what-ci_commit_ref_slug-means">What <code>$CI_COMMIT_REF_SLUG</code> means</h2>
<p>The timing is the visible result. What produces it is more interesting: the image names.</p>
<p><code>$REGISTRY_URL/platform/rabbitmq:$CI_COMMIT_REF_SLUG</code> is not the official RabbitMQ image from Docker Hub. It&rsquo;s an image built by the same pipeline, from the same branch, at the same commit as the code being tested. The RabbitMQ image carries the topology: a <code>definitions.json</code> with every exchange, every queue, every binding, every dead-letter configuration — versioned in git alongside the application that depends on them.</p>
<p>If a branch modifies the messaging topology, the CI pipeline builds a new RabbitMQ image that includes those modifications, then runs the tests against it. The topology change and the code that relies on it are tested together, at the same commit, before anything reaches staging.</p>
<p>The same logic applies to MinIO, as described in the <a href="/2026/05/14/the-ghost-of-the-ci-runner/">first article in this series</a>
: the MinIO image carries preloaded test fixtures. The CI environment doesn&rsquo;t need a setup step to populate storage. The state is built in.</p>
<p>The test runner itself follows the same pattern. Each job uses a debug variant of the application image — built from the same branch, same commit — with the test dependencies included:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">image</span>: <span style="color:#ae81ff">$REGISTRY_URL/platform/$service:$CI_COMMIT_REF_SLUG-debug</span>
</span></span></code></pre></div><p>The whole environment assembles from artifacts built at the same point in the git history.</p>
<h2 id="what-this-required-dropping">What this required dropping</h2>
<p>Behat and the provisioned VM were coupled. The Behat test suite ran against an HTTP server on the VM; removing the VM meant removing Behat.</p>
<p>That turned out not to be the obstacle it looked like. The Behat suite lived in a separate repository, required the VM to run, and had accumulated significant maintenance overhead. PHPUnit, running inside the application container with Docker services, covered the same scenarios through a more direct path: functional tests exercising the HTTP layer, unit tests for individual components, suites organized per feature area and generated dynamically into parallel CI jobs.</p>
<p>The BDD layer went away. The test coverage stayed — and could now run against the actual services.</p>
<h2 id="factor-x-applied">Factor X, applied</h2>
<p><a href="https://12factor.net/dev-prod-parity" target="_blank" rel="noopener noreferrer">Factor X</a>
 is often read as &ldquo;use the same database locally as in production.&rdquo; That&rsquo;s the simplest version. The deeper version is about the gap between what you test and what you ship.</p>
<p>The gap in the old pipeline was wide: a manually configured VM, missing key services, rebuilt from scratch on every run. The gap in the new pipeline is narrow: the CI assembles the environment from the same images as production, built from the same commit as the code under test.</p>
<p>The fifteen minutes of Terraform and Ansible were not just slow. They were building something that wasn&rsquo;t what production ran, every time, before any test could begin. The ninety seconds of <code>docker pull</code> build exactly what production runs — and the tests that follow are testing that, not an approximation of it.</p>
]]></content:encoded></item><item><title>The Host That Hid the Graph</title><link>https://guillaumedelre.github.io/2026/05/15/the-host-that-hid-the-graph/</link><pubDate>Fri, 15 May 2026 15:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/15/the-host-that-hid-the-graph/</guid><description>Part 4 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: Thirteen services sharing six identical gateway variables. The config looked simple. The dependency graph was invisible — until Kubernetes asked where each service actually lived.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>Every service in the platform had these six variables:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>APP__GATEWAY__PRIVATE__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__GATEWAY__PRIVATE__PORT<span style="color:#f92672">=</span><span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>APP__GATEWAY__PRIVATE__SCHEME<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;http&#34;</span>
</span></span><span style="display:flex;"><span>APP__GATEWAY__PUBLIC__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__GATEWAY__PUBLIC__PORT<span style="color:#f92672">=</span><span style="color:#ae81ff">80</span>
</span></span><span style="display:flex;"><span>APP__GATEWAY__PUBLIC__SCHEME<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;http&#34;</span>
</span></span></code></pre></div><p>Thirteen services, six variables each, one value. Reading any service&rsquo;s configuration, the architecture looked flat. Everything talked to the same host. That was the whole picture.</p>
<p>It wasn&rsquo;t.</p>
<h2 id="how-the-gateway-worked">How the gateway worked</h2>
<p>The gateway sat in front of every service and handled all inter-service traffic. A service calling the content API would construct a request to <code>http://platform.internal/content/api/</code> — the gateway received it, identified the target from the URL path, and forwarded it to the right backend. Every inter-service HTTP client in <code>framework.yaml</code> followed the same pattern:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">content.client</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">base_uri</span>: <span style="color:#e6db74">&#34;%http_client.gateway.base_uri%/content/api/&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">headers</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">Host</span>: <span style="color:#e6db74">&#34;%env(APP__GATEWAY__PRIVATE__HOST)%&#34;</span>
</span></span></code></pre></div><p>The <code>http_client.gateway.base_uri</code> parameter was assembled from the GATEWAY vars. The gateway knew where each service ran. The services didn&rsquo;t need to know. From their perspective, everything was <code>platform.internal</code>.</p>
<p>This worked. For years, it worked well. Adding a service meant adding one DNS alias in the gateway config, not touching thirteen <code>.env</code> files. The gateway abstracted the topology. The services stayed decoupled from the infrastructure detail of who ran where.</p>
<h2 id="what-the-gateway-was-absorbing">What the gateway was absorbing</h2>
<p>The abstraction had a cost that didn&rsquo;t show up until you tried to read the system.</p>
<p>Looking at <code>content</code>&rsquo;s env file, you saw six gateway variables and nothing else about inter-service communication. To find out that <code>content</code> called <code>conversion</code>, <code>shorty</code>, and <code>media</code>, you had to read <code>framework.yaml</code>. To find out that <code>pilot</code> called ten external services, you had to trace through the HTTP clients one by one and count.</p>
<p>The number was ten. Authentication, bam, config, content, conversion, media, product, shorty, sitemap, social. Ten of the platform&rsquo;s thirteen services that <code>pilot</code> depended on at runtime, none of them visible from its configuration. Six variables said: talk to the gateway. They said nothing about the shape of what lay behind it.</p>
<p>That information existed — in the code, in the framework config, in the heads of the people who had built those integrations. It just didn&rsquo;t live anywhere you could read at a glance.</p>
<h2 id="what-kubernetes-made-explicit">What Kubernetes made explicit</h2>
<p>On-premise, the gateway was a single resolvable hostname. One DNS record, one set of variables, one place to update. Kubernetes doesn&rsquo;t work that way. Each service gets its own DNS name inside the cluster — <code>content.namespace.svc.cluster.local</code>, <code>conversion.namespace.svc.cluster.local</code>. Inter-service traffic goes directly, service to service, not through a shared gateway.</p>
<p>Moving to Kubernetes meant the gateway abstraction had to give way. Each service needed to know, concretely, where each of its dependencies lived. The six generic variables couldn&rsquo;t express that.</p>
<p>The refactor replaced them with per-target HOST variables — one per service dependency, named for the target:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># content/.env — content calls these four services</span>
</span></span><span style="display:flex;"><span>APP__CONFIG__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__CONVERSION__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__MEDIA__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__SHORTY__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># pilot/.env — ten service dependencies</span>
</span></span><span style="display:flex;"><span>APP__AUTHENTICATION__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__BAM__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__CONFIG__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__CONTENT__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__CONVERSION__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__MEDIA__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__PRODUCT__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__SHORTY__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__SITEMAP__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span><span style="display:flex;"><span>APP__SOCIAL__HOST<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;platform.internal&#34;</span>
</span></span></code></pre></div><p>Each HTTP client in <code>framework.yaml</code> got its own <code>base_uri</code> built from its target&rsquo;s HOST variable, and the <code>Host</code> header gave way to a <code>User-Agent</code> that identified the caller:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">content.client</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">base_uri</span>: <span style="color:#e6db74">&#34;%env(APP__HTTP__SCHEME)%://%env(APP__CONTENT__HOST)%:%env(APP__HTTP__PORT)%/content/api/&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">headers</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">User-Agent</span>: <span style="color:#e6db74">&#34;Platform Content - %semver%&#34;</span>
</span></span></code></pre></div><p>The change isn&rsquo;t cosmetic. In the old setup, the explicit <code>Host</code> header ensured requests reached the correct gateway virtual host regardless of URL resolution. In the new setup, each client points directly at its target&rsquo;s DNS name — the right <code>Host</code> is derived from the <code>base_uri</code> automatically. The header slot doesn&rsquo;t go empty: <code>User-Agent</code> now identifies the calling service, which surfaces in logs and distributed traces without any additional instrumentation.</p>
<h2 id="the-discomfort-of-legibility">The discomfort of legibility</h2>
<p><code>pilot</code>&rsquo;s env file went from nine gateway variables to ten service-specific HOST variables. The file got longer. The architecture didn&rsquo;t get simpler — the ten dependencies were there before and they&rsquo;re still there now. What changed is that they&rsquo;re readable.</p>
<p><a href="https://12factor.net/config" target="_blank" rel="noopener noreferrer">Factor III</a>
 says to store configuration in the environment. The old approach satisfied that literally: six variables, all in env files, none hardcoded. But variables that collapse the entire dependency graph into a single opaque hostname aren&rsquo;t really configuration — they&rsquo;re a shorthand that trades legibility for convenience. Factor III doesn&rsquo;t ask only that configuration be externalized — it implicitly assumes the externalized configuration remains informative.</p>
<p>The refactor didn&rsquo;t simplify anything. It made the complexity visible. <code>pilot</code>&rsquo;s ten HOST variables document, in the <code>.env</code> file itself, the ten services it depends on. A new team member reading that file learns something real about the architecture. The old file taught them that there was a gateway.</p>
<p>There&rsquo;s a version of this story where you read the final state and conclude the team did unnecessary work — they replaced six variables with ten, all pointing at the same host anyway. In local development, <code>platform.internal</code> still resolves to the same place. The functional behavior didn&rsquo;t change.</p>
<p>The change is in what the configuration communicates. In Kubernetes, the HOST values diverge: each target gets its own cluster-internal DNS name, different per environment. The variables now carry real information. The refactor prepared the config to be honest about a topology it had been quietly simplifying for years.</p>
]]></content:encoded></item><item><title>No Witnesses</title><link>https://guillaumedelre.github.io/2026/05/15/no-witnesses/</link><pubDate>Fri, 15 May 2026 10:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/15/no-witnesses/</guid><description>Part 3 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: A service crashed in production and left no logs behind. Here is why fingers_crossed and cloud deployments do not mix well.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>The service had crashed. We had the alert. We had the timestamp down to the second. We had <a href="https://grafana.com/oss/loki/" target="_blank" rel="noopener noreferrer">Loki</a> open and a query ready.</p>
<p>What we didn&rsquo;t have was any logs from the five minutes before the crash.</p>
<p>Promtail was running. It was healthy. It had been collecting logs from every other service without issue. But for this one, in the window that mattered, there was nothing. The service had crashed without leaving a trace.</p>
<h2 id="the-setup-that-looked-correct">The setup that looked correct</h2>
<p>The logging stack was reasonable. Each service wrote structured JSON to stdout using Monolog&rsquo;s logstash formatter:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">stdout</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">stream</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;php://stdout&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">level</span>: <span style="color:#e6db74">&#34;%env(MONOLOG_LEVEL__DEFAULT)%&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">formatter</span>: <span style="color:#e6db74">&#39;monolog.formatter.logstash&#39;</span>
</span></span></code></pre></div><p><a href="https://grafana.com/docs/loki/latest/" target="_blank" rel="noopener noreferrer">Promtail</a> collected container output via the Docker socket, parsed the JSON, extracted labels, pushed to Loki:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">scrape_configs</span>:
</span></span><span style="display:flex;"><span>    -
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">job_name</span>: <span style="color:#ae81ff">docker</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">docker_sd_configs</span>:
</span></span><span style="display:flex;"><span>            -
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">host</span>: <span style="color:#ae81ff">unix:///var/run/docker.sock</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">refresh_interval</span>: <span style="color:#ae81ff">5s</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">pipeline_stages</span>:
</span></span><span style="display:flex;"><span>            -
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">drop</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">older_than</span>: <span style="color:#ae81ff">168h</span>
</span></span><span style="display:flex;"><span>            -
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">json</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">expressions</span>:
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">level</span>: <span style="color:#ae81ff">level</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">msg</span>: <span style="color:#ae81ff">message</span>
</span></span><span style="display:flex;"><span>                        <span style="color:#f92672">service</span>: <span style="color:#ae81ff">service</span>
</span></span><span style="display:flex;"><span>            -
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">level</span>:
</span></span><span style="display:flex;"><span>                    <span style="color:#f92672">service</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">relabel_configs</span>:
</span></span><span style="display:flex;"><span>            -
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">source_labels</span>: [ <span style="color:#e6db74">&#39;__meta_docker_container_log_stream&#39;</span> ]
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">target_label</span>: <span style="color:#ae81ff">stream</span>
</span></span></code></pre></div><p>Two stages in that pipeline do more work than the others. The <code>json</code> stage extracts <code>level</code> and <code>service</code> from each log line; the <code>labels</code> stage immediately following promotes them to Loki index labels, making <code>{service=&quot;content&quot;, level=&quot;error&quot;}</code> a direct index lookup rather than a full-text scan across stored lines. The <code>stream</code> relabeling preserves whether a line came from stdout or stderr — a distinction that becomes queryable once Monolog sends errors to stderr and everything else to stdout. The <code>drop older_than: 168h</code> stage is a safety valve: if Promtail restarts after a long gap and replays buffered lines, anything older than seven days is discarded before reaching Loki.</p>
<p>In theory: logs go to stdout, Promtail reads stdout, logs appear in Loki. The <a href="https://12factor.net/logs" target="_blank" rel="noopener noreferrer">twelve-factor app methodology</a> describes exactly this model for Factor XI — treat logs as event streams, write to stdout, let the environment handle collection and routing.</p>
<p>The application had stdout. Promtail was reading stdout. What could go wrong.</p>
<h2 id="what-fingers_crossed-takes-with-it">What fingers_crossed takes with it</h2>
<p>In production, the <code>when@prod</code> block replaced the simple <code>stream</code> handler with something more sophisticated:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when@prod</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">monolog</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">handlers</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">main</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">fingers_crossed</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">action_level</span>: <span style="color:#ae81ff">error</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">handler</span>: <span style="color:#ae81ff">main_group</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">excluded_http_codes</span>: [<span style="color:#ae81ff">404</span>]
</span></span></code></pre></div><p>The <code>excluded_http_codes: [404]</code> line is itself a tell: without it, every 404 from a scanner or crawler triggers a full buffer flush, dumping megabytes of debug logs for malformed URLs. Someone had already learned that the hard way.</p>
<p><code>fingers_crossed</code> is a well-known Monolog pattern. The idea is elegant: don&rsquo;t flood production logs with debug noise, but if something goes wrong, retroactively show what happened before the error. The handler buffers every log record in memory. The moment it sees an <code>error</code>, it flushes the entire buffer to the nested handler — giving you the full context leading up to the failure.</p>
<p>The problem is what happens when the failure isn&rsquo;t a logged error. It&rsquo;s an OOM kill. A SIGKILL from the orchestrator. A segfault. A process that stops responding and gets forcibly terminated.</p>
<p>In those cases, <code>fingers_crossed</code> never reaches its <code>action_level</code>. The buffer exists, full of the last five minutes of activity, and it vanishes with the process. The logs were there. They were in memory. They died before reaching stdout.</p>
<p>Factor IX of the twelve-factor app talks about disposability: processes should start fast and stop gracefully. On a clean shutdown (SIGTERM), a well-behaved process finishes its current work and exits. But crashes are not clean shutdowns, and memory buffers are not crash-safe. The service had been disposable in the sense that we could restart it; it was not disposable in the sense that its exit was transparent.</p>
<h2 id="the-files-nobody-was-reading">The files nobody was reading</h2>
<p>There was a second problem, quieter but just as persistent.</p>
<p>Every service had a <code>main_group</code> handler that routed logs to two destinations in parallel:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">main_group</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">group</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">members</span>: [<span style="color:#ae81ff">main_file, stdout]</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">main_file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">stream</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;%kernel.logs_dir%/%kernel.environment%.log&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">formatter</span>: <span style="color:#e6db74">&#34;monolog.formatter.logstash&#34;</span>
</span></span></code></pre></div><p><code>var/log/prod.log</code> was being written on every service, in every environment, including production. The same content that went to stdout also went to a file inside the container. The file grew without rotation. The file was not accessible to Promtail (which read from the Docker socket, not from the container filesystem). The file consumed disk space. Nobody was reading it.</p>
<p>The audit channel was worse:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">audit_file</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">stream</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;%kernel.logs_dir%/audit.log&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">formatter</span>: <span style="color:#e6db74">&#39;monolog.formatter.line&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">audit</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#ae81ff">group</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">members</span>: [<span style="color:#ae81ff">audit_file, stderr]</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">channels</span>: [<span style="color:#e6db74">&#39;audit&#39;</span>]
</span></span></code></pre></div><p>Audit logs went to <code>stderr</code> (visible to Promtail) and to <code>audit.log</code> (not visible to Promtail). The format in the file was a plain line format, not the structured JSON that Promtail expected. In practice, the audit trail existed in two places: one queryable, one buried in a container directory that survived only as long as the container did.</p>
<h2 id="what-factor-xi-actually-requires">What Factor XI actually requires</h2>
<p>The eleventh factor is direct about this: an app should not concern itself with routing or storage of its output stream. It writes to stdout. Everything else is the environment&rsquo;s job.</p>
<p>That means no file handlers in production. Not as a backup. Not for audit trails. Not &ldquo;just in case&rdquo;. The moment an application starts managing files, it takes on responsibility for rotation, retention, disk space, and accessibility — none of which belong inside a container.</p>
<p>The fix for the file handlers is straightforward. In <code>when@prod</code>, remove every <code>*_file</code> handler and every group that includes one. The audit channel gets the same treatment: stderr only, structured JSON, no file:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">when@prod</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">monolog</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">handlers</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">stdout</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">stream</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;php://stdout&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#75715e"># defaults to &#34;warning&#34; — overridable per-deploy via env var for targeted debugging</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">level</span>: <span style="color:#e6db74">&#34;%env(default:default_log_level:MONOLOG_LEVEL__DEFAULT)%&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">formatter</span>: <span style="color:#e6db74">&#39;monolog.formatter.logstash&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">stderr</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">stream</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;php://stderr&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">level</span>: <span style="color:#ae81ff">error</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">formatter</span>: <span style="color:#e6db74">&#39;monolog.formatter.logstash&#39;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">main</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">group</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">members</span>: [<span style="color:#ae81ff">stdout]</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">channels</span>: [<span style="color:#e6db74">&#39;!event&#39;</span>, <span style="color:#e6db74">&#39;!http_client&#39;</span>, <span style="color:#e6db74">&#39;!doctrine&#39;</span>, <span style="color:#e6db74">&#39;!deprecation&#39;</span>, <span style="color:#e6db74">&#39;!audit&#39;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">audit</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">type</span>: <span style="color:#ae81ff">stream</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;php://stderr&#34;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">level</span>: <span style="color:#ae81ff">debug</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">formatter</span>: <span style="color:#e6db74">&#39;monolog.formatter.logstash&#39;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">channels</span>: [<span style="color:#e6db74">&#39;audit&#39;</span>]
</span></span></code></pre></div><p>stdout for the main channel. stderr for errors and audit. Nothing else. Promtail picks up both via the Docker socket. The container writes nothing to disk. And audit logs are now structured JSON, queryable in Loki alongside everything else.</p>
<h2 id="the-harder-question-about-fingers_crossed">The harder question about fingers_crossed</h2>
<p>The file handlers were easy. <code>fingers_crossed</code> is more nuanced.</p>
<p>The pattern solves a real problem: in a busy production service, logging everything at debug level creates noise and cost. <code>fingers_crossed</code> lets you capture context without paying for it unless something actually goes wrong. It is a reasonable tradeoff when the failure mode you&rsquo;re protecting against is an application-level error (an exception, a 500, a slow query).</p>
<p>It is not a reasonable tradeoff when the failure mode is a process crash. And in a Kubernetes environment, process crashes happen: OOM evictions, liveness probe failures, node pressure. Exactly the cases where you most need the logs.</p>
<p>One approach: keep <code>fingers_crossed</code> but reduce the buffer size. By default it keeps everything since the last reset. Set <code>buffer_size: 50</code> and you cap memory usage, which also limits what gets lost on crash. You won&rsquo;t have the full context, but you&rsquo;ll have the last fifty records. This patches the blast radius rather than removing the root cause: the opacity still depends on an error threshold that may never fire.</p>
<p>Another approach: accept that debug logs are expensive and raise the default log level in production. Then you don&rsquo;t need <code>fingers_crossed</code> at all — if info and above go directly to stdout, nothing is ever buffered.</p>
<p>The approach we landed on: drop <code>fingers_crossed</code>, raise the default level to <code>warning</code>, keep a debug override available via env var for targeted investigation. The logs we care about appear immediately. The ones we don&rsquo;t are never written. Nothing is buffered.</p>
<h2 id="crashes-dont-flush">Crashes don&rsquo;t flush</h2>
<p>Factor XI and Factor IX meet at the same point: a process dying mid-request. <a href="/2026/05/16/the-cache-that-was-lying-to-us/">another article in this series</a>
 described the illusion of a service that worked perfectly on one pod but quietly misbehaved on two. This is the same illusion, one layer up: a service that appeared to log correctly, until the moment it most needed to.</p>
<p>The rule for production Monolog is blunt: if it doesn&rsquo;t reach stdout or stderr before the process exits, it doesn&rsquo;t exist. A file handler inside a container is invisible to the log collector and dies with the pod. A <code>fingers_crossed</code> buffer is invisible to the log collector and dies with the process.</p>
<p>Production tends to create the conditions where you need logs the most — OOM pressure, cascading failures, bad deploys — and those are exactly the conditions where both of these patterns fail you simultaneously. Write to stdout, default to a level that doesn&rsquo;t require buffering, and make the override available for when you actually need to debug something. The logs will be there. They won&rsquo;t be waiting for an error threshold that never fires.</p>
]]></content:encoded></item><item><title>What Survives the Build</title><link>https://guillaumedelre.github.io/2026/05/14/what-survives-the-build/</link><pubDate>Thu, 14 May 2026 15:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/14/what-survives-the-build/</guid><description>Part 2 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: How committed .env files feed a build step that bakes credentials into Docker layers — and what it takes to empty the file down to four lines.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<p>At some point during a cloud migration audit, someone ran this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>docker run --rm &lt;image&gt; php -r <span style="color:#e6db74">&#34;var_dump(require &#39;.env.local.php&#39;);&#34;</span>
</span></span></code></pre></div><p>The output showed everything that <code>composer dump-env prod</code> had compiled into the image at build time. Which meant it showed everything that had been in the <code>.env</code> file when the image was built. Which meant it showed these, among others:</p>
<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv">INFLUXDB_INIT_ADMIN_TOKEN=&lt;influxdb-admin-token&gt;
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=admin123
BLACKFIRE_CLIENT_ID=&lt;blackfire-client-id&gt;
BLACKFIRE_CLIENT_TOKEN=&lt;blackfire-client-token&gt;
BLACKFIRE_SERVER_ID=&lt;blackfire-server-id&gt;
BLACKFIRE_SERVER_TOKEN=&lt;blackfire-server-token&gt;
NGROK_AUTHTOKEN=replace-me-optionnal
</code></pre><p>Twenty-five variables in total. Every credential that had accumulated in the root <code>.env</code> over three years, now permanent in an image layer.</p>
<h2 id="how-dump-env-works">How <code>dump-env</code> works</h2>
<p><code>composer dump-env prod</code> is a legitimate Symfony optimization. Instead of parsing <code>.env</code> files on every request, the runtime loads a pre-compiled PHP array from <code>.env.local.php</code>. Faster and simpler.</p>
<p>The problem is what it reads. The Dockerfile copies the repository into the image with <code>COPY . ./</code>, <code>.env</code> included. Then <code>dump-env prod</code> reads that file and compiles every variable into <code>.env.local.php</code>. The image ships with a frozen snapshot of the credentials that were in <code>.env</code> at build time.</p>
<p>Docker layers are immutable archives. Even if a subsequent step removed <code>.env</code> from the container filesystem, the layer containing it would still exist inside the image. <code>docker save &lt;image&gt;</code> produces a tarball of every layer; extracting any file from any point in the build history is straightforward. The credentials are invisible at runtime. They are not gone.</p>
<p><a href="https://12factor.net/build-release-run" target="_blank" rel="noopener noreferrer">Factor V</a>
 calls this out directly: a build artifact should be environment-agnostic, with config arriving at the release step from outside. Once credentials are compiled in, the image is no longer portable. You can&rsquo;t promote it across environments. You build twice and hope the second build behaves like the first.</p>
<h2 id="how-twenty-five-variables-accumulate">How twenty-five variables accumulate</h2>
<p>Before tracing how this gets fixed, it&rsquo;s worth understanding how it happened.</p>
<p>The <code>BLACKFIRE_*</code> tokens are the easy case to understand. A team member sets up profiling, needs to share the configuration, and the repository is already open to everyone. One line in <code>.env</code> is the path of least resistance. The InfluxDB and Grafana credentials follow the same logic — shared tooling, shared repo, one commit.</p>
<p>Then there are the variables that reveal a different kind of drift. In some of the service-level <code>.env</code> files:</p>
<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv">APP__RATINGS__SERIALS=&#39;{&#34;brand1&#34;:{&#34;fr&#34;:&#34;12345&#34;},...}&#39;  # ~40 lines of JSON
APP__YOUTUBE__CREDENTIALS=&#39;{&#34;brand1&#34;:{&#34;client_id&#34;:&#34;xxx&#34;,&#34;refresh_token&#34;:&#34;yyy&#34;},...}&#39;
</code></pre><p>Audience measurement serial numbers. YouTube API refresh tokens per brand. These aren&rsquo;t secrets in the Blackfire sense. They&rsquo;re business data — the kind of values that vary between brands and environments, that someone decided to version in <code>.env</code> because they behaved like configuration and <code>.env</code> was where configuration lived.</p>
<p>Twenty-five variables is the sum of incremental decisions, none of which felt wrong in isolation. The problem is structural: when <code>.env</code> is the only answer available, everything starts looking like it belongs there.</p>
<h2 id="where-things-actually-belong">Where things actually belong</h2>
<p>Emptying the file required answering one question for each variable: <em>where does this actually belong?</em></p>
<p>The answers revealed three categories that the team had never explicitly named:</p>
<p><strong>Static config</strong> lives in code. Business rules, routing logic, Symfony parameter files — anything that doesn&rsquo;t vary between deployments. A change requires a rebuild. The JSON blobs for audience measurement serials turned out not to be static config at all: they were queried from a dedicated Config service at runtime. They had no business being in a file.</p>
<p><strong>Environment config</strong> varies between deployments: hostnames, connection strings, third-party credentials. This is what <a href="https://12factor.net/config" target="_blank" rel="noopener noreferrer">Factor III</a>
 means by &ldquo;config in environment variables&rdquo; — real OS-level variables injected by the runtime, never files that travel with the code. In Kubernetes, this becomes a ConfigMap for non-sensitive values and a Kubernetes Secret for credentials. The choice for secrets management was SOPS — credentials are encrypted and committed to git, rather than stored in an external vault like Azure Key Vault or HashiCorp Vault. A vault trades simplicity for auditability: automatic rotation, centralized audit logs, workload identity-based access with no key to protect. SOPS trades those capabilities for a simpler operational model — no external service to query at deploy time, secrets travel through the normal code review process, git history serves as the audit trail. The accepted downsides are manual rotation and the responsibility of protecting the decryption key itself. For the team&rsquo;s scale, the tradeoff was deliberate.</p>
<p><strong>Dynamic config</strong> changes without a deployment: editorial parameters, per-brand thresholds, content moderation settings. It belongs in a database, managed through the application&rsquo;s Config service. Some of what had accumulated in <code>.env</code> files was this category all along, passing as static defaults because it changed rarely enough that nobody noticed.</p>
<p>Once the categories had names, the variables sorted themselves. The root <code>.env</code> ended at four lines:</p>
<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv">DOMAIN=platform.127.0.0.1.sslip.io
XDEBUG_MODE=off
SERVER_NAME=:80
APP_ENV=dev
</code></pre><p>Safe defaults. Nothing sensitive. <code>dump-env prod</code> now compiles empty strings; real values arrive at runtime from Kubernetes.</p>
<h2 id="the-postgresql-image">The PostgreSQL image</h2>
<p>The PostgreSQL image used in CI has a hardcoded password:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">FROM</span> <span style="color:#e6db74">postgres:15</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#66d9ef">ENV</span> POSTGRES_PASSWORD<span style="color:#f92672">=</span>admin123
</span></span></code></pre></div><p>This looks like the same problem. It isn&rsquo;t, because the threat model is different. The CI database is ephemeral — it exists for the duration of a pipeline run, contains no real data, and runs in an isolated network. A hardcoded password on a throwaway test database is an acceptable risk, not a policy exception.</p>
<p>In production, the question doesn&rsquo;t arise: the platform uses Azure Flexible Server, a managed PostgreSQL service. There is no Docker image. Credentials arrive via Helm chart injection, never touching a layer.</p>
<h2 id="what-survives-the-build-now">What survives the build now</h2>
<p>The image that ships to production now contains a guarantee: <code>var_dump(require '.env.local.php')</code> returns only empty strings and safe defaults. The credentials aren&rsquo;t there because they were never put there — they arrive at runtime, from outside.</p>
<p>That&rsquo;s the responsibility boundary <code>dump-env</code> had been quietly erasing: the image is the application, the runtime is the environment. They should not know each other&rsquo;s secrets.</p>
]]></content:encoded></item><item><title>The Ghost of the CI Runner</title><link>https://guillaumedelre.github.io/2026/05/14/the-ghost-of-the-ci-runner/</link><pubDate>Thu, 14 May 2026 10:00:00 +0000</pubDate><guid>https://guillaumedelre.github.io/2026/05/14/the-ghost-of-the-ci-runner/</guid><description>Part 1 of 8 in &amp;quot;Symfony to the Cloud: Twelve Factors, Thirteen Services&amp;quot;: How a CI runner&amp;#39;s home directory in production config revealed a storage architecture problem — and how Flysystem lazy adapters fixed it.</description><category>symfony-to-the-cloud</category><content:encoded><![CDATA[<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv">APP__COLD_STORAGE__FILESYSTEM_PATH=&#34;/home/jenkins-slave/share_media/media&#34;
APP__COLD_STORAGE__FILESYSTEM_PATH_CACHE=&#34;/home/jenkins-slave/share_media/media/cache&#34;
APP__COLD_STORAGE__RAW_IMAGE_PATH=&#34;/home/jenkins-slave/share_media/media_raw&#34;
APP__SHARE_STORAGE__FILESYSTEM_PATH=&#34;/home/jenkins-slave/share_storage&#34;
</code></pre><p>These lines were in the production <code>.env</code> of the media service. Not staging. Not a local override. Production, committed to the repository, read on every startup.</p>
<p>The paths end where you&rsquo;d expect: <code>/media</code>, <code>/share_storage</code>. They start somewhere more surprising: <code>/home/jenkins-slave</code>, the home directory of a CI runner from an old Jenkins setup.</p>
<h2 id="how-a-runners-home-directory-ends-up-in-production-config">How a runner&rsquo;s home directory ends up in production config</h2>
<p>The platform had grown from a single machine. One server ran everything — the application, the CI runner, the database, the file storage. Files moved between the app and the CI system via NFS: a directory mounted on the same host, accessible to both the containers and the runner.</p>
<p>The path <code>/home/jenkins-slave/share_media</code> was where the NFS share landed on that machine. When the team migrated to Docker Compose, the containers inherited the NFS mount. The path made it into the <code>.env</code> because the application needed to know where to find files. Nobody changed it because it worked. The mount was still there. The path was valid. The application started. Files appeared where they should.</p>
<p>Three years later, nobody thought about it at all. It was just how the media path was configured.</p>
<h2 id="what-kubectl-apply-found">What kubectl apply found</h2>
<p>The first <code>kubectl apply</code> for the media service ended with a pod stuck in CrashLoopBackOff. The container started. The entrypoint ran. The application tried to access <code>/home/jenkins-slave/share_media/media</code>. No such file or directory. No NFS mount. No runner.</p>
<p>The path didn&rsquo;t document a design decision. It documented the machine that happened to be running at the time the <code>.env</code> was written.</p>
<p>This is what <a href="https://12factor.net/backing-services" target="_blank" rel="noopener noreferrer">Factor IV</a>
 of the twelve-factor app is warning against. Backing services — storage, queues, databases — should be attached resources, configured via URL or connection string, interchangeable between environments without code changes. A filesystem path on a shared host is not a backing service. It&rsquo;s a physical assumption about the machine. When the machine changes, the assumption fails.</p>
<h2 id="the-path-was-the-symptom">The path was the symptom</h2>
<p>The obvious first step was removing the runner reference:</p>
<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv">APP__COLD_STORAGE__FILESYSTEM_PATH=&#34;/share_media/media&#34;
APP__SHARE_STORAGE__FILESYSTEM_PATH=&#34;/share_storage&#34;
</code></pre><p>Cleaner. No more CI references in a production config. Still not right. The application still assumed a POSIX filesystem — either a volume mount or a directory on the node. In Kubernetes, a volume shared between multiple pods requires a <code>ReadWriteMany</code> PersistentVolumeClaim. Most storage providers don&rsquo;t support it. Those that do tend to be slow and expensive. And even where it works, you&rsquo;ve replaced one shared filesystem assumption with another.</p>
<p>Renaming the path bought time. It didn&rsquo;t fix the problem.</p>
<p>The problem was that roughly twelve terabytes of images — originals and pre-generated derivatives in multiple formats — from multiple editorial brands — were treated as a directory. A directory can&rsquo;t be mounted cleanly across pods. A backing service can.</p>
<h2 id="flysystem-as-the-shape-of-the-solution">Flysystem as the shape of the solution</h2>
<p>The media service already had a Flysystem dependency. Three concrete adapters — local filesystem, AWS S3, Azure Blob — and one lazy adapter sitting on top:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#75715e"># config/packages/flysystem.yaml</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">flysystem</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">storages</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">media.storage.local</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">adapter</span>: <span style="color:#e6db74">&#39;local&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">options</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">directory</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">media.storage.aws</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">adapter</span>: <span style="color:#e6db74">&#39;aws&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">options</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">client</span>: <span style="color:#e6db74">&#39;aws_client_service&#39;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">bucket</span>: <span style="color:#e6db74">&#39;media&#39;</span>
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">streamReads</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">media.storage</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">adapter</span>: <span style="color:#e6db74">&#39;lazy&#39;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">options</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">source</span>: <span style="color:#e6db74">&#39;%env(APP__FLYSYSTEM_MEDIA_STORAGE)%&#39;</span>
</span></span></code></pre></div><p>All application code depends on <code>media.storage</code>. It doesn&rsquo;t know whether files live on the filesystem or in a cloud bucket. One environment variable determines which backend is active:</p>
<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv">APP__FLYSYSTEM_MEDIA_STORAGE=media.storage.aws   # production
APP__FLYSYSTEM_MEDIA_STORAGE=media.storage.local  # local fallback still available
</code></pre><p>The path is gone. The filesystem assumption is gone. What remains is a service name — an attached resource in the twelve-factor sense, configurable without rebuilding the image.</p>
<p>The same pattern extends to the thumbnail cache. <a href="https://github.com/liip/LiipImagineBundle" target="_blank" rel="noopener noreferrer">LiipImagine</a>
 generates resized images on demand; both the source originals and the generated cache go through separate Flysystem adapters:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">liip_imagine</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">loaders</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">default</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">flysystem</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">filesystem_service</span>: <span style="color:#e6db74">&#39;media.storage&#39;</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">default_cache</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">flysystem</span>:
</span></span><span style="display:flex;"><span>                <span style="color:#f92672">filesystem_service</span>: <span style="color:#e6db74">&#39;media.cache.storage&#39;</span>
</span></span></code></pre></div><p>Two environment variables, two buckets. The full pipeline — receive upload, store original, generate thumbnail, cache it — is cloud-portable without touching a line of PHP.</p>
<p>What this doesn&rsquo;t cover is moving the data. The lazy adapter changes one environment variable. Getting twelve terabytes from an NFS mount into an S3 bucket is a different project — a migration window, double-write during cutover, verification that nothing was missed.</p>
<h2 id="what-minio-makes-possible-in-ci">What Minio makes possible in CI</h2>
<p>Production uses S3. Local development uses <a href="https://min.io/" target="_blank" rel="noopener noreferrer">Minio</a>
, an S3-compatible object store that runs in a Docker container. The AWS adapter talks to Minio locally and to S3 in production. The application doesn&rsquo;t notice the difference:</p>
<pre tabindex="0"><code class="language-dotenv" data-lang="dotenv"># local/CI
APP__FLYSYSTEM_MEDIA_STORAGE=media.storage.aws
APP__MINIO_ENDPOINT=http://minio:9000
APP__MINIO_ACCESS_KEY=minioadmin
APP__MINIO_SECRET_KEY=minioadmin
</code></pre><p>The same code, the same adapter, a different endpoint. No mocking, no special test paths, no environment-specific branches.</p>
<p>But the CI configuration goes one step further. The Minio image used in the pipeline isn&rsquo;t the standard upstream one — it&rsquo;s a custom image built with test fixtures preloaded:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-dockerfile" data-lang="dockerfile"><span style="display:flex;"><span><span style="color:#66d9ef">FROM</span> <span style="color:#e6db74">minio/minio:latest</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#66d9ef">COPY</span> tests/fixtures/ /fixtures_media/<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>Every CI run starts with a Minio instance that already contains the data the test suite expects. No setup script, no seed command, no &ldquo;wait for fixtures to load&rdquo; step before tests begin. The initial state of the test environment is part of the build artifact.</p>
<p><a href="https://12factor.net/build-release-run" target="_blank" rel="noopener noreferrer">Factor V</a>
 applied to test infrastructure: the environment state is built, versioned, and immutable. The CI pipeline builds the Minio image from the same source and at the same commit as the application image. The test fixtures and the code that exercises them are always in sync.</p>
<h2 id="the-s3-tradeoff-honestly">The S3 tradeoff, honestly</h2>
<p>S3 introduces a latency cost that local storage doesn&rsquo;t have. The first bytes of a file take 10 to 30 milliseconds to arrive from S3 — that&rsquo;s the documented first-byte latency for the service, not a measurement on this specific workload.</p>
<p>At 300 requests per second, the reasoning for accepting it was this: most reads hit already-generated thumbnails in the S3-backed cache, not the original files. A freshly uploaded image pays the cold-miss penalty once, on the first thumbnail request. Everything after that is a cache hit. Whether the actual tail latency under load bore that reasoning out required performance testing that was tracked separately — the architecture decision and the validation were decoupled.</p>
<p>The tradeoff was accepted: predictable behavior across multiple pods, no shared-state problems, a storage layer that scales without coordination. The full measurement story belongs in the load test report, not here.</p>
<h2 id="the-ghost-leaves">The ghost leaves</h2>
<p>The path <code>/home/jenkins-slave</code> no longer appears in the configuration. But what it pointed to was a coupling that predated Docker, predated microservices, predated any conversation about cloud migration. The CI runner and the production application shared a filesystem because they lived on the same machine. Nobody designed it that way. It accumulated.</p>
<p>A <code>kubectl apply</code> error on a path that shouldn&rsquo;t have existed forced the question: why does this application assume a specific CI runner is present on the host? The answer was &ldquo;because it always has.&rdquo; That&rsquo;s not a reason. It&rsquo;s a history.</p>
<p>Renaming the path was a paper fix. Flysystem&rsquo;s lazy adapter was the actual answer — not because it&rsquo;s more elegant, but because it makes the storage backend a decision that belongs to the environment, not to the application. The container starts, reads one variable, connects to whatever is at the other end. It doesn&rsquo;t know whether that&rsquo;s a bucket in a data center or a container on a laptop.</p>
<p>The runner&rsquo;s home directory is gone from the config. What replaced it is a service name. That&rsquo;s the difference.</p>
]]></content:encoded></item></channel></rss>