New hires used to be productive in twenty minutes; after a dependency or toolchain upgrade, first-run setup now takes an hour — a measurable time-to-first-PR regression you can attribute to a specific commit.

Diagnostic

Time a cold bootstrap (no caches) and compare it against the pre-upgrade baseline.

#!/usr/bin/env bash
set -euo pipefail
# cold-bootstrap-timer.sh — measure setup from a pristine state.
docker compose down -v >/dev/null 2>&1 || true
docker builder prune -af >/dev/null 2>&1 || true
rm -rf node_modules
start=$(date +%s)
make bootstrap >/dev/null
end=$(date +%s)
echo "cold_bootstrap_seconds=$((end - start))"

Expected BAD output — the cold setup time has roughly tripled versus the recorded baseline:

cold_bootstrap_seconds=3120
# baseline (pre-upgrade): cold_bootstrap_seconds=1080

Root cause

A dependency or toolchain upgrade slows onboarding when it changes how the dependency graph resolves or how caches are keyed. Common triggers: a major version bump pulls a transitive dependency that now compiles a native module from source (minutes of node-gyp/cffi build per cold install); a lockfile churn invalidates every entry so the package manager re-resolves the whole tree instead of replaying the lock; or a base-image/tool bump changes a cache key (hashFiles('**/package-lock.json'), a .tool-versions line, a Dockerfile layer) so every Docker layer and CI cache misses and rebuilds. The regression is invisible on a warm machine — the author's caches are already populated — and only appears on a cold clone, which is exactly the new-hire path. Bisecting cold-bootstrap time across the suspect commit range pinpoints the offending change.

Resolution

  1. Bisect the commit range with the cold-bootstrap timer as the test.
  2. Inspect what the bad commit changed in the lockfile or cache key.
  3. Restore deterministic resolution (commit the lockfile, pin the native dep to a prebuilt) and stable cache keys.
  4. Re-time a cold bootstrap to confirm the regression is gone.

Use git bisect run with the timer wrapped to fail above a threshold:

#!/usr/bin/env bash
set -euo pipefail
# bisect-test.sh — exit non-zero when cold setup exceeds the SLA (seconds).
SLA=1500
secs=$(./cold-bootstrap-timer.sh | sed 's/cold_bootstrap_seconds=//')
[ "$secs" -le "$SLA" ]
#!/usr/bin/env bash
set -euo pipefail
git bisect start HEAD <last-good-tag>
git bisect run ./bisect-test.sh   # prints the first commit that blew the SLA
git bisect reset

Once found, restore deterministic install and a prebuilt binary so cold installs stop compiling:

// package.json — pin a version with a prebuilt arm64/x86_64 binary and keep the lock authoritative
{
  "dependencies": {
    "sharp": "0.33.4"
  },
  "overrides": {
    "node-gyp": "10.1.0"
  }
}

Stabilize the Docker/CI cache key so a tool bump no longer busts every layer:

# .github/workflows/ci.yml — key on the lockfile only, restore on partial match
steps:
  - uses: actions/cache@v4
    with:
      path: |
        ~/.npm
        node_modules
      key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-node-

Expected output

A cold bootstrap after the fix returns to (or below) the baseline:

$ ./cold-bootstrap-timer.sh
cold_bootstrap_seconds=1015

The bisect run names the exact regressing commit, making the cause auditable:

$ git bisect run ./bisect-test.sh
4f2a9c1 is the first bad commit
    chore: bump image-lib to 5.x (drops prebuilt binaries)

Prevention

  1. Add a CI job that runs cold-bootstrap-timer.sh and fails when setup exceeds an SLA, so a future upgrade that slows onboarding is blocked at PR time.
  2. Commit lockfiles and prefer dependencies that ship prebuilt binaries for both architectures — coordinate with debugging works-on-my-machine runtime drift.
  3. Watch the dependency graph for newly introduced heavy transitive deps with mapping microservice dependencies for local dev.
# .github/workflows/onboarding-sla.yml
name: Onboarding Time Gate
on: [pull_request]
jobs:
  cold-setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Time a cold bootstrap
        run: |
          chmod +x ./cold-bootstrap-timer.sh ./bisect-test.sh
          ./bisect-test.sh

macOS (Docker Desktop): cold timings include VirtioFS sync overhead; record the baseline on the same OS you compare against, never mix host platforms. WSL2: measure on the Linux filesystem — node_modules extraction over /mnt/c inflates cold setup several-fold and pollutes the bisect signal. Apple Silicon (ARM64): a dep that dropped its arm64 prebuilt forces compilation locally but not on x86_64 CI; bisect on the developer architecture, not the runner's, to see the real regression.

Rollback

#!/usr/bin/env bash
set -euo pipefail
git revert <bad-commit> && npm ci   # revert the upgrade and restore the prior lockfile state