My agentic workflow
How I run AI coding agents on remote machines and steer them from my phone. A tmux session per agent, the session picker to jump between them, images for bugs, and voice for prompts.
Most of my real work now happens while I’m not at a computer. An agent is running on a machine somewhere, working through a task, and I check on it the way you’d glance at something simmering on the stove. I can be on the couch, in a queue, or out on a walk, and still point an agent at a problem and watch it go. Here is the shape of that workflow and the pieces that make it hold together.
One machine is never enough, so put every machine on the tailnet
I don’t have one dev machine; I have a few. Two of them run at home around the clock: a Mac for the iOS builds and a Linux box for everything else. There are also a couple of VPS boxes in the cloud. Because the home machines stay on, one is always ready for an agent to pick up where I left off, and each agent lives on whichever machine fits its task.
The thing that makes “any machine” true is that they’re all on my Tailscale tailnet, and so is my phone. From TermRover I can open any of them by name, from anywhere, without exposing a single port. The location of the box stops being something I think about. It’s just another host in the list.
A session per agent, and a tap to switch
Each agent runs inside its own tmux session. One for the project I’m actively pushing on, one for a longer background task, one for a quick experiment. tmux keeps every one of them alive whether or not my phone is connected, which is the whole reason any of this is possible from a device that drops off the network constantly.
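For anyone who wants to see the tmux side of this spelled out, here is a minimal sketch in Python that only builds the commands involved. The host name `linux-box`, the session name `main`, the path `/srv/app`, and the agent command `my-agent` are all placeholders I made up for illustration, and this is not a description of what TermRover runs internally. It relies on two standard tmux behaviours: `new-session -d` starts a session detached, and `new-session -A` attaches to a session if it already exists instead of failing.

```python
import shlex


def new_agent_session_argv(session, workdir, agent_command):
    """Build the argv that starts one detached tmux session for one agent.

    -d  start detached, so the session lives on with no client connected
    -s  name of the session (one name per agent)
    -c  working directory the agent starts in
    """
    return ["tmux", "new-session", "-d", "-s", session, "-c", workdir, agent_command]


def remote_attach_argv(host, session):
    """Build the argv that attaches to a session on another machine over SSH.

    -t forces a terminal, which tmux needs. `new-session -A` attaches when the
    session exists and creates it otherwise. The host is whatever name the
    machine has on the tailnet (hypothetical here).
    """
    remote_command = "tmux new-session -A -s " + shlex.quote(session)
    return ["ssh", "-t", host, remote_command]


if __name__ == "__main__":
    # Print the commands rather than running them, so this is safe to try anywhere.
    print(shlex.join(new_agent_session_argv("main", "/srv/app", "my-agent")))
    print(shlex.join(remote_attach_argv("linux-box", "main")))
```

Passing these lists to `subprocess.run` would actually execute them. The detached start is the part that matters: the agent keeps running whether or not anything is attached.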
The friction used to be switching between them on a phone. Typing the tmux prefix on a touch keyboard is miserable. TermRover’s session picker turns that into one tap: I see my running sessions and jump straight to the one I want. In practice it feels less like SSH and more like flipping between apps, except every “app” is an agent chewing on a different problem.
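I don’t know how TermRover builds its picker internally, but the underlying idea can be sketched with plain tmux. The sketch below assumes the list comes from `tmux list-sessions` with a custom `-F` format using the standard variables `session_name`, `session_attached`, and `session_activity`, then sorts the sessions so the most recently active one comes first. The `Session` class and the sample output are made up for illustration.

```python
from dataclasses import dataclass

# Tab-separated format string for `tmux list-sessions -F`.
LIST_FORMAT = "#{session_name}\t#{session_attached}\t#{session_activity}"
LIST_ARGV = ["tmux", "list-sessions", "-F", LIST_FORMAT]


@dataclass
class Session:
    name: str
    attached: bool      # True if at least one client is attached
    last_activity: int  # seconds since the epoch, as reported by tmux


def parse_sessions(output):
    """Parse the output of LIST_ARGV into sessions, most recently active first."""
    sessions = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split from the right so only the last two fields are treated as numbers.
        name, attached, activity = line.rsplit("\t", 2)
        sessions.append(Session(name, int(attached) > 0, int(activity)))
    return sorted(sessions, key=lambda s: s.last_activity, reverse=True)


if __name__ == "__main__":
    # Hypothetical output for three agents; real output would come from LIST_ARGV.
    sample = (
        "main\t1\t1700000300\n"
        "background\t0\t1700000100\n"
        "experiment\t0\t1700000200\n"
    )
    for s in parse_sessions(sample):
        marker = "*" if s.attached else " "
        print(f"{marker} {s.name}")
```

A picker is then just this list rendered as buttons, with each tap mapped to an attach command for that session name.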
Talking to an agent with thumbs is the bottleneck
Once you can reach the agents, the real constraint is input. A phone keyboard is slow, and good agent prompts are not short. Two features carry most of the weight here.
Voice. For anything longer than a sentence, I dictate. Voice mode drops the transcribed text straight into the composer, where I can glance over it and send. Saying what I want out loud is far faster than thumbing it in, and a spoken paragraph turns out to be a perfectly good prompt.
Images. When the problem is visual, I send the agent an image. A screenshot of a broken layout, a photo of a whiteboard, an error dialog. “Fix this” with a picture attached beats three paragraphs trying to describe what’s on screen. This is the single biggest difference between steering an agent from a phone and just reading its output.
The loop, end to end
Put together, a normal session looks like this. I open TermRover, tap into the session where my main agent is waiting, and dictate what I want next. It gets to work. I detach and put the phone away. Later I glance back, read what it did by swiping through the scrollback, then screenshot anything that looks off and send it back with a correction. The agent is doing the typing and the running. I’m doing the deciding.
On Android there’s an extra turn of the loop I’ve written about separately: the agent can build the app and install it straight onto the phone in my hand, so even testing doesn’t need a desk.
Deciding, not babysitting
The point of all this isn’t to be more glued to the phone. It’s the opposite. Being able to reach any agent from anywhere only makes sense if reaching them is something I choose to do, not something I get pulled into. That’s a deliberate line, and it’s why TermRover doesn’t ping you every time an agent wants attention. I direct the work, then I get on with my day. The agent waits.