Your Check Script Was Built for a Human. Redesign It for an LLM.
When you run your check script, you scan 300 lines of test output in two seconds and skip to the failures. An LLM reads every character — and gets charged for all of it.
Most JavaScript projects have a check script that looks something like this:
bun run typecheck && bun run lint:check && bun run test:unit && bun run test:nuxt && bun run knip
Chain a few tasks with &&, let the output stream to your terminal, done. When you run it yourself, it works fine: you wait, skim the output, check the exit code. When something fails, the chain stops and shows you the error.
Now hand that same script to an LLM and ask whether the code is ready to ship. Every design choice that felt reasonable reveals itself as a liability.
The Sequential Trap
&& is a human convenience, not an engineering requirement. It means: run this, wait for it to succeed, then run the next thing. For a human watching a terminal, that is sensible; there is no point sitting through all five checks when the first one has already failed.
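Concretely, && short-circuits on the first failure:

# toy illustration: the second command runs only if the first exits 0
bun run typecheck && echo "typecheck passed, moving on"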
For an LLM running a check and waiting for the result, sequential means every independent step adds latency. Typecheck and unit tests don’t depend on each other. Running them in sequence when they could run in parallel is just burning time — and in an automated loop, time compounds. A check script that takes 45 seconds when everything passes is one that gets called less often.
The fix is a parallel runner that captures output per-job and reports only what’s relevant:
#!/bin/bash
set -eo pipefail

FAILED=()
declare -A JOB_LOGS
declare -A JOB_NAMES

# Start a check in the background and remember its pid, name, and log file.
run_check() {
  local name="$1"; shift
  local logfile
  logfile=$(mktemp)
  "$@" > "$logfile" 2>&1 &
  # $! is only meaningful after the job has been launched.
  JOB_NAMES[$!]="$name"
  JOB_LOGS[$!]="$logfile"
}

run_check "typecheck"  bun run typecheck
run_check "lint"       bun run lint:check
run_check "unit tests" bun run test:unit
run_check "nuxt tests" bun run test:nuxt
run_check "knip"       bun run knip

# Wait for every job; print captured output only for the ones that failed.
for pid in "${!JOB_NAMES[@]}"; do
  if ! wait "$pid"; then
    name="${JOB_NAMES[$pid]}"
    log="${JOB_LOGS[$pid]}"
    echo "=== FAILED: $name ==="
    cat "$log"
    FAILED+=("$name")
  fi
done

rm -f "${JOB_LOGS[@]}"
[ ${#FAILED[@]} -eq 0 ] && echo "All checks passed." || exit 1
Everything that can run in parallel does. The LLM gets an answer faster, with less to read.
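Wiring it in takes one line in package.json (the scripts/check.sh path is illustrative):

# save the runner as scripts/check.sh, then:
chmod +x scripts/check.sh
# package.json: "scripts": { "check": "./scripts/check.sh" }
bun run check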
The Verbosity Problem
The deeper issue is output design. CLI tools were built for humans scanning terminals. They print progress, warnings, stats, coverage percentages, summaries — because a human reading the output benefits from all of it.
An LLM doesn’t benefit from context it doesn’t need. It pays for it, token by token.
When all your tests pass, bun prints passing test names, timing per suite, coverage stats by file, and a final summary. On a project with a hundred tests, that’s potentially 200 lines the LLM has to consume to learn one bit of information: everything passed.
The right design for an LLM-facing tool: print nothing on success. Print everything on failure. A clean exit code is sufficient for the success case; when something breaks, give the LLM everything it needs to understand and fix it.
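If you want that behavior for a single command rather than the whole parallel runner, the same capture-and-discard pattern works as a standalone wrapper. A minimal sketch; the file name is illustrative, not a standard tool:

#!/bin/bash
# quiet-run.sh: silent on success, full output on failure.
# Usage: ./quiet-run.sh bun run test:unit
log=$(mktemp)
if "$@" > "$log" 2>&1; then
  rm -f "$log"            # success: the exit code says everything
else
  status=$?               # exit status of the failed command
  cat "$log"              # failure: hand over the full captured output
  rm -f "$log"
  exit "$status"
fi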
The parallel script above does this by default — output is captured per-job, discarded on success, and dumped on failure. Instead of a flood of green checkmarks followed by one red line, the LLM gets a concise report: what failed and why.
The Deeper Principle
This isn’t just a CI optimization. It’s a signal that adding a new kind of consumer to an existing system requires auditing the system’s outputs from that consumer’s perspective.
Your build tools have always had one reader in mind: a human who can scan, pattern-match, and tolerate noise. The new reader — an LLM running checks in an automated loop — is fast at some things and genuinely bad at others. It doesn’t scan. It reads. It processes every line at the same cost. It has no intuition for “this output is boilerplate, skip it.”
As LLMs become participants in development workflows, not just assistants you consult but agents that run checks, make readiness assessments, and propose fixes, the cost of poorly designed tooling compounds. Every tool that prints 200 lines when 5 would do is accruing debt, and not the kind you can keep ignoring.
The audit is simple: look at your dev loop through the eyes of a reader with perfect patience, no ability to skim, and a hard cost per token consumed. What would you change?
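A blunt way to start is to measure what a fully green run actually prints. (This assumes your checks are wired up behind a single check script.)

# lines an LLM would have to read just to learn "everything passed"
bun run check 2>&1 | wc -l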
For most teams, the answer starts with the check script.