Building an Onboarding Health-Check Script
Write a scripts/doctor.sh that verifies tool versions, free ports, a running daemon, and required env vars, printing actionable failures so onboarding self-diagnoses.
A teammate runs make bootstrap, it fails with an opaque Docker error, and they have no idea whether the cause is an old Node, a busy port, a stopped daemon, or a missing env var. A scripts/doctor.sh health check turns that guessing game into one actionable report, completing the README-driven automation loop.
Diagnostic
The symptom is a setup failure with no signal about which precondition broke. Confirm the gap — there is no single command that inventories the environment:
#!/usr/bin/env bash
set -euo pipefail
[ -x scripts/doctor.sh ] || echo "BAD: no scripts/doctor.sh health check present"
Expected BAD output when the check is missing:
BAD: no scripts/doctor.sh health check present
Without it, a developer sees only the downstream failure — for example Error response from daemon: Ports are not available: 0.0.0.0:5432 — and cannot tell that the real problem is a local Postgres already holding 5432.
Root cause
Bootstrap failures are confusing because the failing command is several steps removed from the actual unmet precondition. docker compose up fails on a busy port, but the message blames Docker, not the conflicting process; a build fails on an old toolchain, but the error is a cryptic native-module compile. Each precondition — tool version, free port, running daemon, present env var — has a clear, cheap test, but nobody runs all of them up front. A doctor script runs every test, collects failures instead of aborting on the first, and prints a remediation line per failure, so the developer fixes causes rather than chasing symptoms.
Resolution
- Start with strict mode and accumulate failures in a counter so one bad check does not hide the rest.
- Compare each tool's version against a pinned floor, not mere presence.
- Confirm the Docker daemon is reachable, not just installed.
- Probe each required port and name the process holding it.
- Diff
.envkeys against.env.exampleso missing config surfaces early. - Exit non-zero with a summary so CI and humans both get a clear verdict.
#!/usr/bin/env bash
# scripts/doctor.sh — onboarding health check
set -euo pipefail
FAIL=0
note() { printf ' - %s\n' "$1"; FAIL=$((FAIL + 1)); }
# 1. Tool versions against pinned floors.
require_version() {
local bin=$1 min=$2 got
if ! command -v "$bin" >/dev/null 2>&1; then note "$bin not installed (need >= $min)"; return; fi
got=$("$bin" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -1)
if [ "$(printf '%s\n%s\n' "$min" "$got" | sort -V | head -1)" != "$min" ]; then
note "$bin $got is older than required $min"
fi
}
echo "Checking tools..."
require_version docker 24.0
require_version jq 1.6
# 2. Docker daemon reachable.
echo "Checking Docker daemon..."
docker info >/dev/null 2>&1 || note "Docker daemon not reachable — start Docker Desktop or 'systemctl start docker'"
# 3. Required ports free.
echo "Checking ports..."
for port in 3000 5432; do
if lsof -iTCP:"$port" -sTCP:LISTEN -P -n >/dev/null 2>&1; then
pid=$(lsof -tiTCP:"$port" -sTCP:LISTEN | head -1)
note "port $port in use by PID $pid ($(ps -p "$pid" -o comm= 2>/dev/null || echo unknown)) — stop it or remap"
fi
done
# 4. Required env vars present.
echo "Checking env contract..."
if [ -f .env.example ] && [ -f .env ]; then
while IFS= read -r key; do
[ -z "$key" ] && continue
grep -qE "^${key}=" .env || note "missing env var '$key' in .env (declared in .env.example)"
done < <(grep -vE '^\s*#|^\s*$' .env.example | cut -d= -f1)
else
note ".env or .env.example missing — run 'make env'"
fi
# 5. Verdict.
if [ "$FAIL" -eq 0 ]; then
echo "doctor: environment healthy"
else
echo "doctor: $FAIL issue(s) found — fix the items above and re-run"
exit 1
fi
Wire it into the Makefile so make doctor is the documented entry point:
.PHONY: doctor
doctor: ## Diagnose a broken local environment
@./scripts/doctor.sh
Expected output
A healthy workstation prints a clean pass:
$ make doctor
Checking tools...
Checking Docker daemon...
Checking ports...
Checking env contract...
doctor: environment healthy
A broken one names each cause and its fix, then exits non-zero:
$ make doctor
Checking tools...
- jq 1.5 is older than required 1.6
Checking Docker daemon...
Checking ports...
- port 5432 in use by PID 8123 (postgres) — stop it or remap
Checking env contract...
- missing env var 'DATABASE_URL' in .env (declared in .env.example)
doctor: 3 issue(s) found — fix the items above and re-run
Prevention
- Run
make doctoras the last step ofmake bootstrapso a successful setup is also a verified one. - Execute
scripts/doctor.shin CI against a clean checkout to keep the env contract honest, alongside catching missing env vars before container startup. - Add new ports and tools to the script in the same commit that introduces them — gate this with the drift workflow in README-driven automation.
macOS (Docker Desktop):
lsofandpsship by default;sort -Vfor version comparison requires GNU coreutils — install withbrew install coreutilsand prefergsort, or the floor check may misorder versions. WSL2:lsofdoes not see processes bound on the Windows side; a port held by a Windows app shows as free here. Cross-check withnetsh interface portproxy show allfrom PowerShell. Apple Silicon (ARM64): no behavioral difference, but ensurejqandlsofare the native arm64 builds or they fail withexec format errorunder strict mode.
Rollback
#!/usr/bin/env bash
set -euo pipefail
git checkout -- scripts/doctor.sh Makefile # revert the health check and its make target