A teammate runs make bootstrap, it fails with an opaque Docker error, and they have no idea whether the cause is an old Node, a busy port, a stopped daemon, or a missing env var. A scripts/doctor.sh health check turns that guessing game into one actionable report, completing the README-driven automation loop.

Diagnostic

The symptom is a setup failure with no signal about which precondition broke. Confirm the gap — there is no single command that inventories the environment:

#!/usr/bin/env bash
set -euo pipefail
[ -x scripts/doctor.sh ] || echo "BAD: no scripts/doctor.sh health check present"

Expected BAD output when the check is missing:

BAD: no scripts/doctor.sh health check present

Without it, a developer sees only the downstream failure — for example Error response from daemon: Ports are not available: 0.0.0.0:5432 — and cannot tell that the real problem is a local Postgres already holding 5432.

Root cause

Bootstrap failures are confusing because the failing command is several steps removed from the actual unmet precondition. docker compose up fails on a busy port, but the message blames Docker, not the conflicting process; a build fails on an old toolchain, but the error is a cryptic native-module compile. Each precondition — tool version, free port, running daemon, present env var — has a clear, cheap test, but nobody runs all of them up front. A doctor script runs every test, collects failures instead of aborting on the first, and prints a remediation line per failure, so the developer fixes causes rather than chasing symptoms.

Resolution

  1. Start with strict mode and accumulate failures in a counter so one bad check does not hide the rest.
  2. Compare each tool's version against a pinned floor, not mere presence.
  3. Confirm the Docker daemon is reachable, not just installed.
  4. Probe each required port and name the process holding it.
  5. Diff .env keys against .env.example so missing config surfaces early.
  6. Exit non-zero with a summary so CI and humans both get a clear verdict.
#!/usr/bin/env bash
# scripts/doctor.sh — onboarding health check
set -euo pipefail

FAIL=0
note() { printf '  - %s\n' "$1"; FAIL=$((FAIL + 1)); }

# 1. Tool versions against pinned floors.
require_version() {
  local bin=$1 min=$2 got
  if ! command -v "$bin" >/dev/null 2>&1; then note "$bin not installed (need >= $min)"; return; fi
  got=$("$bin" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -1)
  if [ "$(printf '%s\n%s\n' "$min" "$got" | sort -V | head -1)" != "$min" ]; then
    note "$bin $got is older than required $min"
  fi
}
echo "Checking tools..."
require_version docker 24.0
require_version jq 1.6

# 2. Docker daemon reachable.
echo "Checking Docker daemon..."
docker info >/dev/null 2>&1 || note "Docker daemon not reachable — start Docker Desktop or 'systemctl start docker'"

# 3. Required ports free.
echo "Checking ports..."
for port in 3000 5432; do
  if lsof -iTCP:"$port" -sTCP:LISTEN -P -n >/dev/null 2>&1; then
    pid=$(lsof -tiTCP:"$port" -sTCP:LISTEN | head -1)
    note "port $port in use by PID $pid ($(ps -p "$pid" -o comm= 2>/dev/null || echo unknown)) — stop it or remap"
  fi
done

# 4. Required env vars present.
echo "Checking env contract..."
if [ -f .env.example ] && [ -f .env ]; then
  while IFS= read -r key; do
    [ -z "$key" ] && continue
    grep -qE "^${key}=" .env || note "missing env var '$key' in .env (declared in .env.example)"
  done < <(grep -vE '^\s*#|^\s*$' .env.example | cut -d= -f1)
else
  note ".env or .env.example missing — run 'make env'"
fi

# 5. Verdict.
if [ "$FAIL" -eq 0 ]; then
  echo "doctor: environment healthy"
else
  echo "doctor: $FAIL issue(s) found — fix the items above and re-run"
  exit 1
fi

Wire it into the Makefile so make doctor is the documented entry point:

.PHONY: doctor
doctor: ## Diagnose a broken local environment
	@./scripts/doctor.sh

Expected output

A healthy workstation prints a clean pass:

$ make doctor
Checking tools...
Checking Docker daemon...
Checking ports...
Checking env contract...
doctor: environment healthy

A broken one names each cause and its fix, then exits non-zero:

$ make doctor
Checking tools...
  - jq 1.5 is older than required 1.6
Checking Docker daemon...
Checking ports...
  - port 5432 in use by PID 8123 (postgres) — stop it or remap
Checking env contract...
  - missing env var 'DATABASE_URL' in .env (declared in .env.example)
doctor: 3 issue(s) found — fix the items above and re-run

Prevention

  1. Run make doctor as the last step of make bootstrap so a successful setup is also a verified one.
  2. Execute scripts/doctor.sh in CI against a clean checkout to keep the env contract honest, alongside catching missing env vars before container startup.
  3. Add new ports and tools to the script in the same commit that introduces them — gate this with the drift workflow in README-driven automation.

macOS (Docker Desktop): lsof and ps ship by default; sort -V for version comparison requires GNU coreutils — install with brew install coreutils and prefer gsort, or the floor check may misorder versions. WSL2: lsof does not see processes bound on the Windows side; a port held by a Windows app shows as free here. Cross-check with netsh interface portproxy show all from PowerShell. Apple Silicon (ARM64): no behavioral difference, but ensure jq and lsof are the native arm64 builds or they fail with exec format error under strict mode.

Rollback

#!/usr/bin/env bash
set -euo pipefail
git checkout -- scripts/doctor.sh Makefile   # revert the health check and its make target