Resolving Service Startup Order and Healthcheck Races

Your API container crashes on boot with a connection-refused error against the database, even though depends_on lists db. Plain depends_on waits for the dependency's container to start, not for the application inside it to be ready. This is the classic startup race in Multi-Service Orchestration with Compose, and the fix is to gate dependents on a healthcheck.

Diagnostic

The dependent service exits early or restart-loops while the dependency is still initializing:

#!/usr/bin/env bash
# trace the boot order
set -euo pipefail
docker compose up -d
docker compose logs api | head -n 20
docker compose ps --format '{{.Name}}\t{{.Status}}'
# BAD: api tried to connect before postgres finished initdb
api-1  | Error: connect ECONNREFUSED 172.28.0.3:5432
api-1  | at TCPConnectWrap.afterConnect [as oncomplete]
api-1 exited with code 1
NAME    STATUS
db-1    Up 2 seconds
api-1   Restarting (1) Less than a second ago

The dependency container is Up, but the service inside it has not finished starting: on first boot the Postgres image runs its init scripts and restarts the server before it accepts connections.
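
To make the gap between "started" and "ready" visible, container state and health can be read as two separate fields. The sketch below is one way to do that with docker inspect; the function names container_state and classify and the verdict labels are illustrative, not part of Docker or Compose.

```shell
#!/usr/bin/env bash
# Sketch: show container state and health as two separate facts.
set -euo pipefail

# Print "<state> <health>" for a container ID or name.
# "none" means no healthcheck is defined for that container.
container_state() {
  docker inspect --format \
    '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$1"
}

# Turn a state/health pair into a plain-language verdict (labels are illustrative).
classify() {
  case "$1 $2" in
    "running healthy")   echo "ready" ;;
    "running starting")  echo "started, not ready yet" ;;
    "running unhealthy") echo "started, probes failing" ;;
    "running none")      echo "started, readiness unknown" ;;
    *)                   echo "not running" ;;
  esac
}
```

A possible use is read -r state health < <(container_state "$(docker compose ps -q db)") followed by classify "$state" "$health". A db service without a healthcheck lands in "started, readiness unknown", which is exactly the condition plain depends_on accepts.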

Root cause

depends_on: [db] only orders container creation and start; it returns as soon as the container process launches. The database daemon then takes several more seconds to run init scripts, bind its socket, and accept connections. During that window the API connects, is refused, and exits. Without a readiness condition, the order is correct but the timing is not — a race.

Resolution

  1. Add a real healthcheck to the dependency so Compose can observe readiness, not just liveness.
# docker-compose.yml
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: app_db
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d app_db"]
      interval: 3s
      timeout: 5s
      retries: 5
      start_period: 10s
  1. Gate the dependent on condition: service_healthy so it does not start until the healthcheck passes. The migrate entry refers to the one-shot service defined in the next step.
# docker-compose.yml
services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
      migrate:
        condition: service_completed_successfully
    ports:
      - "${APP_PORT:-3000}:3000"
  1. For one-shot setup work (migrations, seeds) use a short-lived service and condition: service_completed_successfully so the API waits for it to exit cleanly.
# docker-compose.yml
services:
  migrate:
    build: .
    command: ["npm", "run", "db:migrate"]
    depends_on:
      db:
        condition: service_healthy
    restart: "no"
  1. Keep a defensive retry in the application or entrypoint for services that lack a native health endpoint, so a slow start degrades into a brief wait rather than a crash.
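#!/usr/bin/env bash
# entrypoint.sh: wait for the API's own dependency, then exec
# MAX_TRIES is an illustrative cap so a dead dependency fails the container instead of hanging.
set -euo pipefail
host="${DB_HOST:-db}"; port="${DB_PORT:-5432}"; tries=0
until nc -z "$host" "$port"; do
  tries=$((tries + 1))
  if [ "$tries" -ge "${MAX_TRIES:-60}" ]; then
    echo "gave up waiting for ${host}:${port}" >&2
    exit 1
  fi
  echo "waiting for ${host}:${port}..."
  sleep 1
done
exec "$@"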
  1. Bring the stack up with --wait so up itself blocks until every healthcheck passes or fails.
#!/usr/bin/env bash
set -euo pipefail
docker compose up -d --wait
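
The conditions in the steps above form a small dependency graph: db must be healthy before migrate runs, and both must finish their part before api starts. As an illustration only, the resulting order can be derived with tsort, where each input line "X Y" means X comes before Y. This is not how Compose schedules services internally, and the function name start_order is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the start order implied by the depends_on conditions.
set -euo pipefail

# Each line reads "X Y": X must be ready before Y may start.
start_order() {
  tsort <<'EOF'
db migrate
db api
migrate api
EOF
}

start_order
```

With these three edges there is only one valid ordering: db, then migrate, then api. Adding a service means adding one line per dependency edge, which is a quick way to spot an accidental cycle before Compose reports it.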

Expected output

[+] Running 3/3
 ✔ Container db-1       Healthy
 ✔ Container migrate-1  Exited (0)
 ✔ Container api-1      Healthy
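docker compose ps --format '{{.Name}}\t{{.Status}}'
# db-1       Up 12 seconds (healthy)
# api-1      Up 4 seconds (healthy)
# (healthy) on api-1 assumes the api service defines its own healthcheck; without one the status is just Up.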
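
When a script needs to assert this state for a single service, for example in CI, the check can be approximated by polling the health status. The sketch below assumes it runs from the project directory so that docker compose ps -q resolves the service; the function names service_health and wait_healthy and the 60-second default are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: block until one Compose service reports healthy.
set -euo pipefail

# Health status of one Compose service: healthy, unhealthy, starting, or none.
service_health() {
  local id
  id="$(docker compose ps -q "$1")"
  docker inspect --format \
    '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$id"
}

# Poll once per second until the service is healthy.
# Fails on unhealthy, on a missing healthcheck, or after the timeout.
wait_healthy() {
  local service="$1" timeout="${2:-60}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status="$(service_health "$service")"
    case "$status" in
      healthy)        return 0 ;;
      unhealthy|none) echo "$service: $status" >&2; return 1 ;;
    esac
    sleep 1
    waited=$((waited + 1))
  done
  echo "$service: still not healthy after ${timeout}s" >&2
  return 1
}
```

A call such as wait_healthy db 30 returns 0 once the probe passes. Treating "none" as a failure is a deliberate choice here: a service without a healthcheck can never prove readiness, so the script says so instead of waiting out the timeout.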

Prevention

  1. Require a healthcheck on every stateful dependency (databases, brokers, caches) and reject depends_on lists that lack a condition.
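#!/usr/bin/env bash
# bin/lint-depends.sh: fail if any depends_on entry lacks a readiness condition
set -euo pipefail
# docker compose config normalizes short-form depends_on to condition: service_started,
# so check the rendered config for that value instead of grepping for list items.
config="$(docker compose config)"
if grep -qE '^[[:space:]]+condition:[[:space:]]*service_started[[:space:]]*$' <<<"$config"; then
  echo "Found depends_on with condition: service_started; use service_healthy or service_completed_successfully." >&2
  exit 1
fi
echo "depends_on conditions OK"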
  1. Tune start_period to the dependency's real cold-start time so early failing probes during init do not count against retries.
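
To reason about these settings together, it helps to estimate how long a dependency may keep failing before Docker marks it unhealthy. The sketch below uses a rough upper bound: failures during start_period do not count, and after that each of the retries failures can cost up to one interval plus one timeout. The function name unhealthy_after is illustrative, and the result is an estimate, not an exact figure.

```shell
#!/usr/bin/env bash
# Sketch: rough time budget, in seconds, before a container is marked unhealthy.
set -euo pipefail

# Arguments are whole seconds: start_period interval timeout retries.
unhealthy_after() {
  local start_period="$1" interval="$2" timeout="$3" retries="$4"
  echo $(( start_period + retries * (interval + timeout) ))
}

# Values from the db healthcheck: start_period 10s, interval 3s, timeout 5s, retries 5.
unhealthy_after 10 3 5 5
```

For the db healthcheck in this guide the estimate is 50 seconds. If a cold start on the slowest target machine can take longer than that, raise start_period rather than retries, so that a genuine failure after startup is still detected quickly.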

macOS (Docker Desktop): containers run inside a Linux VM, so cold starts and probes can be slower than on native Linux; raise timeout and start_period slightly if probes flap on slower machines. WSL2: if the dependency never reports healthy, run the probe by hand with docker compose exec db pg_isready -U postgres -d app_db to see the actual failure before changing host settings. Apple Silicon (ARM64): use images that publish an arm64 variant; a probe binary built for another architecture fails regardless of whether it is invoked through CMD or CMD-SHELL.

Rollback

#!/usr/bin/env bash
set -euo pipefail
git checkout -- docker-compose.yml
docker compose down && docker compose up -d --wait