Your frontend stays stateless. Your heavy workers don't.
herd sits in front of your OS processes — browsers, LLMs, AI agents.
It spawns workers on demand, routes every request to the exact memory state via X-Session-ID, and kills the process the instant the client disconnects.
Pre-compiled binaries on GitHub Releases. Verify checksums before production use.
The gap no one talks about
Every standard infrastructure tool is blind to half the picture.
Nginx knows the request. systemd knows the PID. Nothing knows both.
herd does.
A reverse proxy knows your HTTP session but not its OS process. A supervisor knows the PID but not its client. herd binds both: this PID exists only for this network stream. When the client drops, the compute is reclaimed—instantly, atomically, with no polling loop.
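The pattern is small enough to sketch. A hedged illustration of bind-then-reap — one process group per network stream — where `spawn_worker` and the disconnect hook are illustrative names, not herd's internals:

```python
# Sketch of the bind-then-reap pattern: one process group per
# network stream. Illustrative only -- not herd's actual code.
import os
import signal
import subprocess

def spawn_worker(cmd):
    # start_new_session gives the worker a fresh process group,
    # so it and every child it forks can be reaped in one call.
    return subprocess.Popen(cmd, start_new_session=True)

def on_client_disconnect(worker):
    # Negative-PID semantics: SIGKILL the whole group atomically.
    os.killpg(os.getpgid(worker.pid), signal.SIGKILL)

worker = spawn_worker(["sleep", "300"])
on_client_disconnect(worker)  # client dropped: reclaim compute now
```

No loop ever checks whether the worker is alive; the group kill is a single syscall triggered by the stream closing.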
Application-level heartbeat loops drift, starve under load, and introduce race conditions. herd runs every managed process in its own process group with pdeathsig set: when the session is invalidated, the whole group gets SIGKILL, and if herd itself dies, the kernel reaps the workers anyway—no timer, no poll, no grace-period bug to exploit.
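The pdeathsig half of that guarantee is a kernel feature, not library magic. A minimal Linux-only sketch (the constant and wiring here are illustrative of the mechanism, not of herd's source):

```python
# Minimal sketch of the pdeathsig mechanism (Linux-only).
# PR_SET_PDEATHSIG asks the kernel to signal this process the
# instant its parent dies -- no heartbeat thread required.
import ctypes
import signal
import subprocess
import sys

PR_SET_PDEATHSIG = 1  # constant from <sys/prctl.h>
libc = ctypes.CDLL("libc.so.6", use_errno=True)

def set_pdeathsig():
    # Runs in the child between fork() and exec(): if the
    # supervisor dies, the kernel SIGKILLs this worker.
    if libc.prctl(PR_SET_PDEATHSIG, signal.SIGKILL, 0, 0, 0) != 0:
        sys.exit(1)

worker = subprocess.Popen(["sleep", "300"], preexec_fn=set_pdeathsig)
```

Because the kernel delivers the signal, there is no window in which the supervisor has crashed but its workers linger.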
Playwright browser sessions, Ollama inference contexts, sandboxed code runners—workloads you can't just restart on every request. herd gives each one a session-scoped dead-man's switch: the process lives exactly as long as the client needs it, and not a millisecond longer.
The problem
You built an AI agent using Playwright and a local LLM. It works perfectly on the first run.
On the third run, your laptop fans are screaming, RAM is maxed at 24 GB, and you have
15 detached orphan processes running in the background because you hit
Ctrl+C too fast.
You kill them one by one. You run it again. Same thing.
What you tried instead
os/exec timeouts
Drift under load. A 5-second timeout becomes a 60-second timeout when the system is saturated. The process outlives your deadline.
Custom heartbeat pings
Race conditions. The ping thread and the crash handler both try to clean up. One wins. The other corrupts state or panics.
Reaching for Docker
Docker packages the app — it doesn't route traffic to hot memory. Mapping 50 concurrent WebSockets to 50 isolated Chromium instances means custom port-proxying, external TTL logic, and cleanup scripts Docker will never write for you.
Visual proof
✗ without herd — 23:47:09
```
PID   USER    %CPU  %MEM  COMMAND
4821  deploy  98.7  12.3  chromium [orphan]
4822  deploy  97.1  11.8  chromium [orphan]
4823  deploy  95.4  11.2  chromium [orphan]
4824  deploy  94.9  10.9  chromium [orphan]
4825  deploy  93.2  10.6  chromium [orphan]

[python3.11 crawler.py exited with SIGSEGV]
[parent PID 4820 is gone — children abandoned]

Mem used: 14.2 GB / 16 GB
OOM killer invoked at 23:47:31
System unresponsive.
```
✓ with herd — 23:47:09
```
herd[data-plane]  stream breach detected
                  session: sess_7f3a2b
                  parent:  PID 4820 (SIGSEGV)
                  action:  reaping orphan group
herd[reaper]      SIGKILL → PID 4821 ✓
herd[reaper]      SIGKILL → PID 4822 ✓
herd[reaper]      SIGKILL → PID 4823 ✓
herd[reaper]      SIGKILL → PID 4824 ✓
herd[reaper]      SIGKILL → PID 4825 ✓

Mem freed: 14.1 GB in 3ms
System nominal. Next session ready.
```
The herd way
Define your workers in herd.yaml. Connect from any language using a session header. Everything else is handled.
```python
# Warning: this is what you write without herd
import subprocess, signal, atexit, threading, time

procs = {}
lock = threading.Lock()

def cleanup():
    with lock:
        for sid, p in list(procs.items()):
            try:
                p.kill()
                del procs[sid]
            except Exception:
                pass

atexit.register(cleanup)
signal.signal(signal.SIGTERM, lambda s, f: cleanup())
signal.signal(signal.SIGINT, lambda s, f: cleanup())

def launch_worker(session_id):
    port = find_free_port()  # your problem
    p = subprocess.Popen(
        ["npx", "playwright", "run-server", "--port", str(port)],
    )
    wait_for_health(port)  # your problem
    with lock:
        procs[session_id] = p
    return port

def poll_health():  # your problem
    while True:
        time.sleep(5)
        for sid, p in list(procs.items()):
            if p.poll() is not None:
                cleanup_session(sid)

threading.Thread(target=poll_health, daemon=True).start()

# ...40 more lines of port proxying, reconnect logic...
```
```yaml
workers:
  browser:
    cmd: ["npx", "playwright", "run-server", "--port", "{{.Port}}"]
    min: 1
    max: 5
    ttl: "15m"
    reuse: false
    health_path: "/"
```
```python
# herd intercepts this, spawns a dedicated worker,
# routes the socket, and locks the PID to this session.
browser = await p.chromium.connect(
    "ws://localhost:8080/",
    headers={"X-Session-ID": "user-42"},
)
```
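The header is the entire client-side contract, so any HTTP library can target the same session-scoped worker. A sketch with stdlib urllib (the URL and port match the Playwright example; the actual request is left commented since it needs a running herd):

```python
# The session header is the whole client-side contract. Any HTTP
# client can pin a request to "user-42"'s dedicated worker.
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/",
    headers={"X-Session-ID": "user-42"},  # routes to this session's PID
)
# urllib.request.urlopen(req)  # would reach the session-scoped worker
```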
Dead-man's switch
When the WebSocket closes — intentional or not — herd SIGKILLs the entire Playwright process group, and pdeathsig guarantees the same outcome if herd itself dies. No polling. No timer. No orphan.
Runs locally today
herd's data plane is a standard HTTP/TCP reverse proxy. The herd.yaml you run locally is the same contract your workloads run on in production — routed across a distributed mesh without changing a line of application code. No lock-in. No rewrite.