
RUNNER: How I Built an Autonomous Development Board

Jan 16, 2026 · 11 min read

Last week, I handed a task to my tool: "Implement dark mode with system preference detection." Then I went to grab a coffee. When I came back, the task was done: code written, tested, working. Along the way, the tool had taken three iterations to fix an edge case I probably wouldn't have noticed for days.

This is RUNNER. A Kanban board that doesn't just manage tasks—it executes them autonomously.

The Problem

As software developers, we spend a significant portion of our time on repetitive tasks. Fixing bugs, implementing features, writing tests, refactoring code. Many of these tasks follow a similar pattern: analyze the existing code, plan the changes, implement, test, and iterate on failures until everything works. This is exactly the kind of work an LLM like Claude can handle well.

The idea came to me when I discovered RALPH by Frank Bria, a Claude integration tool built around similar concepts. I was immediately inspired, but I wanted something that fit my workflow exactly: a tool I can run locally on my Mac, that uses my own Claude subscription, and that gives me a visual overview of all my tasks. No cloud dependency, no additional costs, full control over the code and data.

What RUNNER Does

At its core, RUNNER is a Kanban board with three columns: Backlog, In Progress, and Done. Nothing special so far. The difference lies in what happens when a task is moved to "In Progress."

RUNNER then automatically starts Claude Code in the terminal, passes the task with its description and acceptance criteria, and lets Claude work. Claude analyzes the existing code, understands the architecture, writes the implementation, runs tests, and iterates on errors. All of this happens in real-time, and I can follow every step in the live log.
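To make the hand-off concrete, here is a minimal sketch of how a task could be turned into a prompt and a child process. It is not RUNNER's actual code: the `Task` struct, `buildPrompt`, and `agentCommand` are hypothetical names, and the binary name and `-p` argument in `main` are assumptions about how the agent might be launched non-interactively.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Task is a hypothetical shape for what the board stores per card.
type Task struct {
	Title       string
	Description string
	Criteria    []string
}

// buildPrompt renders a task as plain text for the agent.
func buildPrompt(t Task) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Task: %s\n\n%s\n\nAcceptance criteria:\n", t.Title, t.Description)
	for _, c := range t.Criteria {
		fmt.Fprintf(&b, "- %s\n", c)
	}
	return b.String()
}

// agentCommand prepares (but does not start) the child process.
// The binary and its arguments are parameters because the real
// invocation is an assumption here.
func agentCommand(bin string, args []string, prompt, workDir string) *exec.Cmd {
	full := append(append([]string{}, args...), prompt) // copy so the caller's slice is untouched
	cmd := exec.Command(bin, full...)
	cmd.Dir = workDir      // run inside the project checkout
	cmd.Stdout = os.Stdout // a real board would capture this for the live log
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	prompt := buildPrompt(Task{
		Title:       "Implement Dark Mode",
		Description: "Add a dark mode toggle to the settings area.",
		Criteria:    []string{"Toggle button visible in settings area"},
	})
	// "claude" and "-p" are placeholders for however the agent is launched.
	cmd := agentCommand("claude", []string{"-p"}, prompt, ".")
	fmt.Println(cmd.Args[:2])
	fmt.Print(prompt)
	// cmd.Run() would start the process; it is left out so the sketch
	// runs without the CLI installed.
}
```

Keeping prompt construction separate from process handling means the text sent to the agent can be inspected and tested without starting anything.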

When Claude is finished or gets stuck, the process stops automatically. I review the result, and if something is missing or incorrect, I can add a "Continue" prompt with one click and let Claude keep working. Sometimes it takes two or three iterations until everything is perfect. But that's okay, because that's exactly what the tool is designed for.

The Tech Stack

I deliberately kept the stack as simple as possible. After years with complex frameworks, build pipelines, and dependency hell, I wanted something that just works.

Backend: Go with the standard library, gorilla/websocket for real-time updates, and SQLite for data storage. No ORM, no complex frameworks, no external service dependencies. A single binary that does everything. Go compiles fast, cross-compiles to every platform I care about, and the standard library covers nearly everything a web application like this needs.

Frontend: Plain HTML, CSS, and JavaScript with jQuery. Yes, jQuery in 2026. No build step, no npm install, no 500MB node_modules, no Webpack, no Vite, no TypeScript compiler. I open the HTML file, change something, refresh the browser, done. The change is immediately visible.

Why this simplicity? Because RUNNER is developed with RUNNER itself. The more complex the build process, the more can go wrong when an LLM changes code. With this stack, Claude can modify a file, and the change is immediately visible. No compilation errors from forgotten dependencies, no build failures due to incompatible package versions, no TypeScript errors because some type doesn't match somewhere.

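// The entire backend setup in a few lines
package main

import (
    "log"
    "net/http"
)

func main() {
    // initDB and the handler constructors below live elsewhere in the project.
    db := initDB()
    defer db.Close()

    http.HandleFunc("/api/tasks", handleTasks(db))        // task collection
    http.HandleFunc("/api/tasks/", handleTaskByID(db))    // single task by ID
    http.HandleFunc("/ws", handleWebSocket)               // live updates
    http.Handle("/", http.FileServer(http.Dir("static"))) // frontend files

    log.Printf("RUNNER starting on :8080")
    // ListenAndServe blocks until the server fails; log.Fatal then exits.
    log.Fatal(http.ListenAndServe(":8080", nil))
}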

Key Features

Drag & Drop Kanban: Tasks can be moved between columns via drag & drop. Intuitive and fast. Anyone who has worked with Trello, Jira, or similar tools knows the paradigm.

Queue System: Multiple tasks can be placed in the "In Progress" queue. RUNNER processes them sequentially, one after another. This prevents merge conflicts and ensures that each task is based on the current state of the code. No parallel chaos, no inconsistent states.
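A sequential queue like this maps naturally onto a single worker goroutine reading from a channel. The following is a minimal sketch under that assumption, not RUNNER's actual implementation; `Queue`, `NewQueue`, `Enqueue`, and `Wait` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Queue runs submitted jobs one at a time, in submission order.
type Queue struct {
	jobs chan func()
	wg   sync.WaitGroup
}

// NewQueue starts the single worker that drains the queue.
func NewQueue(capacity int) *Queue {
	q := &Queue{jobs: make(chan func(), capacity)}
	go func() {
		// One worker goroutine guarantees that jobs never overlap.
		for job := range q.jobs {
			job()
			q.wg.Done()
		}
	}()
	return q
}

// Enqueue adds a job; it blocks if the buffer is full.
func (q *Queue) Enqueue(job func()) {
	q.wg.Add(1)
	q.jobs <- job
}

// Wait blocks until every enqueued job has finished.
func (q *Queue) Wait() { q.wg.Wait() }

func main() {
	q := NewQueue(16)
	var order []string
	for _, name := range []string{"dark mode", "shortcuts", "templates"} {
		name := name // capture the loop variable for older Go versions
		q.Enqueue(func() { order = append(order, name) })
	}
	q.Wait()
	fmt.Println(order) // prints [dark mode shortcuts templates]
}
```

Because only one goroutine ever runs a job, each task starts from the state the previous one left behind, which is the property the queue exists to provide.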

Live Log Streaming: While Claude works, RUNNER streams the output in real-time via WebSocket to the browser window. You see exactly what Claude is doing, which files are being analyzed, what code is being written, which tests are running. This gives you a sense of control, even when you're not actively intervening.
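The fan-out behind such a live log can be sketched with the standard library alone. In this hypothetical `Hub`, each subscriber channel stands in for one browser connection; in a real setup a goroutine per WebSocket connection would read from its channel and write to the socket. The names and the buffer size are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub fans out log lines to every subscribed listener.
type Hub struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func NewHub() *Hub { return &Hub{subs: make(map[chan string]struct{})} }

// Subscribe registers a listener with a small buffer.
func (h *Hub) Subscribe() chan string {
	ch := make(chan string, 64)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes a listener and closes its channel.
func (h *Hub) Unsubscribe(ch chan string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, ok := h.subs[ch]; ok {
		delete(h.subs, ch)
		close(ch)
	}
}

// Publish delivers a line to all listeners. If a listener's buffer
// is full the line is dropped for that listener, so a slow or
// sleeping browser tab cannot stall the process producing the log.
func (h *Hub) Publish(line string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- line:
		default:
		}
	}
}

func main() {
	hub := NewHub()
	a, b := hub.Subscribe(), hub.Subscribe()
	hub.Publish("running tests...")
	fmt.Println(<-a, "|", <-b)
}
```

Dropping lines for slow listeners is a design choice; a board that must never lose output would persist the log and let clients catch up instead.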

Trunk-Based Development: RUNNER works directly on the main or working branch. No feature branches, no complex merging, no cherry-picking. This might sound risky, but in practice it works well because:

Each task is processed in isolation and sequentially
There's a built-in rollback function
Most tasks are small and manageable
You can see the current state at any time and intervene

Rollback: If Claude breaks something (and it happens), you can return to the last known good state with one click. RUNNER automatically creates a Git checkpoint before each task, so every change the task made can be undone.
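One way a checkpoint-and-rollback pair could look is sketched below. It assumes the working tree is clean when the checkpoint is taken (a hard reset would otherwise discard uncommitted work), and the function names are hypothetical. How RUNNER actually records its checkpoints is not described here.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// git runs a git subcommand inside repoDir and returns its trimmed output.
func git(repoDir string, args ...string) (string, error) {
	cmd := exec.Command("git", append([]string{"-C", repoDir}, args...)...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("git %s: %w: %s", strings.Join(args, " "), err, out)
	}
	return strings.TrimSpace(string(out)), nil
}

// Checkpoint records the commit the task starts from.
func Checkpoint(repoDir string) (string, error) {
	return git(repoDir, "rev-parse", "HEAD")
}

// rollbackSteps lists the git invocations that restore a checkpoint.
func rollbackSteps(sha string) [][]string {
	return [][]string{
		{"reset", "--hard", sha}, // drop commits and tracked changes made since the checkpoint
		{"clean", "-fd"},         // remove untracked files and directories the task created
	}
}

// Rollback returns the repository to the checkpointed commit.
func Rollback(repoDir, sha string) error {
	for _, step := range rollbackSteps(sha) {
		if _, err := git(repoDir, step...); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	sha, err := Checkpoint(".")
	if err != nil {
		fmt.Println("no checkpoint possible here:", err)
		return
	}
	fmt.Println("checkpoint:", sha)
	// After a failed task: err = Rollback(".", sha)
}
```

Note that `git clean -fd` leaves ignored files alone, so build artifacts covered by `.gitignore` survive a rollback.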

GitHub Integration: After successful tasks, you can create PRs or push to the remote directly from RUNNER. One click, done. No terminal switching, no manual Git commands.

Light/Dark Mode: Because the eyes matter too. Automatic system preference detection included, with manual override if desired.

Keyboard Shortcuts: Quick task creation with Cmd+N, navigation between tasks with arrow keys, everything accessible without a mouse. For power users who want to keep their hands on the keyboard.

Mobile Responsive: The board also works on phones and tablets. Practical when you want to create tasks or check progress on the go. Touch gestures for drag & drop, adapted layouts for small screens.

The Typical Workflow

Here's what a typical flow looks like:

1. Create a task: I describe what needs to be implemented and define clear acceptance criteria. The more precise the description, the better the result. Vague tasks lead to vague results.

Title: Implement Dark Mode

Description:
Implement a dark mode toggle in the settings area.
The mode should be switchable via button and 
automatically detect the system preference on first load.

Acceptance Criteria:
- Toggle button visible in settings area
- User preference stored in LocalStorage
- Automatic detection of system preference as default
- Smooth CSS transition when switching (300ms)
- All UI elements must be readable in both modes
- Toggle state persists across browser refresh

2. Move task to queue: Drag & drop to "In Progress." Or right-click and "Add to Queue" if you want to prepare multiple tasks.

3. RUNNER works: Claude analyzes the existing code, understands the structure, plans the implementation, writes the code, tests. On errors, Claude iterates automatically. I can watch or do something else.

4. Review: I look at the result. Code looks good? Done. Something missing? Continue with additional context like "The toggle should also have a hover effect" or "The transition is too abrupt, make it smoother."

5. Commit & Push: When satisfied, one click to commit, optionally directly as a PR. RUNNER automatically generates a meaningful commit message based on the task.
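As a rough illustration of step 5, a commit message can be derived mechanically from the task fields. This is only a sketch: the `Task` struct and `commitMessage` are hypothetical, the 72-character subject limit is a common convention rather than a Git rule, and RUNNER may well produce its messages differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Task is a hypothetical shape for the fields a commit message needs.
type Task struct {
	Title    string
	Criteria []string
}

// commitMessage builds a subject line from the title and, if there
// are acceptance criteria, a body that lists them.
func commitMessage(t Task) string {
	subject := strings.TrimSpace(t.Title)
	// Truncate long titles so the subject stays within 72 characters.
	if r := []rune(subject); len(r) > 72 {
		subject = string(r[:69]) + "..."
	}
	if len(t.Criteria) == 0 {
		return subject
	}
	var b strings.Builder
	// A blank line separates subject and body, as Git expects.
	b.WriteString(subject + "\n\nAcceptance criteria:\n")
	for _, c := range t.Criteria {
		b.WriteString("- " + c + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(commitMessage(Task{
		Title: "Implement Dark Mode",
		Criteria: []string{
			"Toggle button visible in settings area",
			"User preference stored in LocalStorage",
		},
	}))
}
```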

The Development Journey

RUNNER was originally called "GRINDER." The name was a play on the continuous processing of tasks, the grinding. But at some point the name felt too negative—too much like "fighting through" instead of "running through," too much like tedious work instead of elegant automation. RUNNER fit better: fast, direct, forward, positive.

The interesting thing about development: RUNNER was largely built with itself. Once the basic functionality was in place—a simple board with task management and Claude integration—I created new features as tasks and RUNNER implemented them. A self-reinforcing loop, bootstrapping at its best.

This led to some interesting situations. Once, a change Claude made to the queue logic stopped the queue from working. Ironically, I couldn't put the fix task in the queue because the queue was broken. Rollback, manual fix in the code, and on we went. You quickly learn to appreciate the rollback function.

Challenges along the way:

Branch Strategy: Initially, I wanted to use feature branches, as one does. This quickly led to chaos because Claude changed something in Branch A that was missing in Branch B. Merge conflicts, inconsistent states, confusion. Trunk-based development was the solution: one branch, sequential changes, clear state.

Parallel Tasks: Running multiple tasks in parallel sounded efficient at first. In practice, it ended in merge conflicts and inconsistent state. The sequential queue is a compromise that works better in practice than the theoretically more efficient parallelization. Predictability beats speed.

WebSocket Stability: Live updates sound simple, but reconnection handling, heartbeats, race conditions on state updates, and browser tab sleeping were surprisingly tricky. Many edge cases that you only discover when you actually use the tool.
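Reconnection handling usually comes down to a retry schedule. The sketch below shows capped exponential backoff, a common choice for this; it is an assumption, not a description of what RUNNER does. In the browser the same arithmetic would live in JavaScript, and it is shown in Go here only to match the backend. `backoff` and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the wait before reconnect attempt n (0-based):
// base doubled n times, never exceeding max.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max // stop early so large attempt counts cannot overflow
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	for n := 0; n < 6; n++ {
		fmt.Println(n, backoff(n, 500*time.Millisecond, 10*time.Second))
	}
}
```

With a 500ms base and a 10s cap, the waits are 500ms, 1s, 2s, 4s, 8s, and then 10s for every further attempt. Adding random jitter on top is common when many clients might reconnect at once.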

Learnings:

Simplicity wins: Every bit of complexity I removed made RUNNER more stable and maintainable. Less code means fewer bugs, less attack surface, less cognitive load.
Good prompts > complex logic: The quality of task descriptions determines the quality of results more than any clever algorithms. Investing time in better prompts pays off.
Sequential beats parallel: At least for my use case. Predictability is more important than speed. Better one task at a time, but reliably.
Dogfooding works: Building a tool that you use daily yourself automatically leads to better design decisions. You feel every pain point firsthand.

Results

After several weeks of intensive use, I can say concretely what has changed:

Time Savings: Tasks that used to cost me an hour of focused work now cost me 10-15 minutes, including review. Not because Claude types faster than me, but because I can do other things in the meantime. Answer emails, grab coffee, think about the next feature.

Mobile Working: I can create tasks and queue them from anywhere. On the way to work, during lunch break, in the evening on the couch. By the time I'm back at my computer, they're often already done. It feels almost like magic.

Focus on "What" Instead of "How": I spend more time thinking about what should be built and less time implementing it. This feels like a shift toward higher-value work. Product thinking instead of code-monkey work.

Learning Effect: Surprisingly, I also learn quite a bit from reviewing Claude's code. Different approaches, patterns I didn't know, sometimes more elegant solutions than I would have found myself. It's like pair programming with a very patient partner.

Less Context-Switching: Because I can have tasks processed asynchronously, I no longer have to interrupt myself in the middle of a task. I define the task, queue it, and continue working on what I'm currently doing.

What's Coming Next

RUNNER is far from finished. On the roadmap are several features that would make the tool even more useful:

Remote Access: Access from anywhere, not just on the local network. Probably via a tunnel service like Cloudflare Tunnel or Tailscale.
Push Notifications: A notification on my phone when a task is done or stuck, so I don't have to constantly check the board.
Task Templates: Predefined structures for recurring task types. Bug reports, feature requests, refactoring tasks, each with pre-filled acceptance criteria.
Multi-Project Support: Currently RUNNER is tied to one project. Managing multiple projects in parallel and switching between them would be the next level.
Analytics: How many tasks per day? Average processing time? Success rate? Data that helps understand and optimize your own workflow.
Open Source? I'm still considering it. The tool is very tailored to my workflow, but maybe it's still interesting for others as a starting point.

Conclusion

RUNNER has changed how I develop software. Not in the sense of "now AI does everything for me," but more in the sense of "I can focus on what really needs human decisions." The routine work—typing code, debugging off-by-one errors, writing boilerplate—I delegate to a system that is faster and often better at it than me.

Is this the future of software development? I don't know. But for me, right now, it works damn well. It feels like a productivity cheat code that, once discovered, you don't want to give up.

If you have questions, want to give feedback, or want to build something similar yourself and exchange ideas, feel free to reach out via X. I'm curious what others think of this approach.