Using "applications" to make a smaller model more effective at bigger tasks.
Demo of an idea I had for my personal JARVIS, quickly put together with a vibe coded browser based display so I could easily see what the agent sees.
Giving the agent a limited scope, a view similar to that of a dedicated application. It has a limited number of actions it can take inside of these applications, with a dedicated clipboard and scratch pad for carrying information between these views as they are removed from context (other than a reference to their existence and a tool to return to them) when they leave. So far I only have 2 of these applications made, one that is functionally a text-only web browser for the model, and the other which is an interface for controlling computer under the agent's system (like my PC).
These applications (called workflows within the scripts) replace what used to be 20 different tools for the computer control app, and 3 tools for the web browsing app. The issue I was targeting with this was the tendency for the local models to fuck up URLs and other text that generally needs to be exact by creating menus that are navigated with a simple verb and a number (open 1, copy 2, etc).
The agent can open as many as it needs, and each one holds a persistent state, so if it leaves one and comes back to it, it is left in exactly the state it left it in. It can leave these applications and return to a much more general mode where it has access to the remaining ~100 or so tools (not all available at once, requestable in groups).
The task I gave it here was to find (what I expected to be) a rare part for my Project car the agent was designed to help me manage. It knows a good few sources for the harder to find parts from our past conversations, so it ended up picking one it knew likely had them for the search.
This was designed specifically for use with gemma4 26b(unsloth QaT Q4_K_XL), but this test was run on gemma4 E4B(also unsloth QaT Q4_K_XL) to show that the system could work under a much smaller, less capable model. Interestingly, this model actually seems to perform better than the 26b under this workflow setup. I've noticed that the 26B seems to have an aversion to the dedicated planning tools I gave it (which put a persistent block of text containing it's "plan" for a task at the very tail of it's context each turn/tool round).
According to llama.cpp, model is getting between 70 and 85 t/s depending on MTP accuracy, and at a context of about 10k tokens by the end of this task, 800 t/s prefill. This is running on an RX6600XT with the vulkan backend.
I'm sure I'm not the first person to come up with something like this, and I'm certain it can be done better than I've done it here, but I wanted to share it in case the idea would be useful to anyone else building their own agent architecture like me. The key idea is that the agent only carries a limited toolset and a small amount of context into these apps. User input, a couple of agent-maintained (sometimes by harness, other times by the model itself) fields, and not much else get to come with it. It's full context is given back when it leaves the apps.