Agents that read across multiple sources now finish faster.
Why we changed the runner
When an LLM emits multiple tool calls in a single turn, it has already decided those calls are data-independent. It cannot use a result it has not seen yet, so read-only calls in that batch do not need to wait on each other.
The old datavessel agent runner did wait. It executed every tool call one at a time, even when the model had already grouped independent reads together. A turn with three reads paid the latency cost of all three end-to-end.
That mattered most for agents reading across connected data sources like Google Analytics, Search Console, Shopify, and similar APIs. Those calls often take 300ms–2s each. In multi-tool turns, serialization became the dominant avoidable latency.
What changed
Read tools now fan out concurrently. If the model emits three independent read calls in one turn, the runner starts them together and waits for the batch to complete.
This does not make any individual tool faster. It removes unnecessary waiting between independent calls. The result contract stays the same: results are returned to the model in the order they were emitted.
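The fan-out and the ordering guarantee can be sketched with Python's asyncio. This is a minimal illustration, not the datavessel runner itself: `run_read_batch`, `make_read`, and the simulated delays are hypothetical names and values chosen for the example.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical type: a read tool is a zero-argument coroutine function.
ReadCall = Callable[[], Awaitable[str]]


async def run_read_batch(calls: list[ReadCall]) -> list[str]:
    """Start every read at once and return results in emission order."""
    # asyncio.gather returns results in the order of its arguments,
    # no matter which coroutine finishes first.
    return list(await asyncio.gather(*(call() for call in calls)))


def make_read(name: str, delay_s: float) -> ReadCall:
    """Build a fake read tool that sleeps to simulate API latency."""
    async def call() -> str:
        await asyncio.sleep(delay_s)
        return name
    return call


async def demo_reads() -> tuple[list[str], float]:
    # Assumed delays; the slowest call is emitted first on purpose.
    calls = [
        make_read("analytics", 0.20),
        make_read("search", 0.05),
        make_read("shop", 0.10),
    ]
    start = time.perf_counter()
    results = await run_read_batch(calls)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(demo_reads())
    print(results, f"{elapsed:.2f}s")
```

The batch takes roughly as long as its slowest call rather than the sum of all three, and the result list still follows emission order even though "search" finishes first.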
Write tools still run strictly one at a time, in emission order. Two writes can be data-independent and still be order-sensitive: Slack messages should not land out of order, and HTTP calls may be expected to follow a queue. For writes, correctness wins over speed. Approval-gated writes are still gated.
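For contrast, here is a minimal sketch of the serial write path, under the same caveat: `run_write_batch` and `make_write` are hypothetical names, and approval gating is left out because this post does not describe how it works internally.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: a write tool is a zero-argument coroutine function.
WriteCall = Callable[[], Awaitable[str]]


async def run_write_batch(calls: list[WriteCall]) -> list[str]:
    """Run writes strictly one at a time, in emission order."""
    results = []
    for call in calls:
        # Awaiting inside the loop means the next write cannot start
        # until the previous one has finished.
        results.append(await call())
    return results


def make_write(label: str, delay_s: float, log: list[str]) -> WriteCall:
    """Build a fake write tool that records when it starts and ends."""
    async def call() -> str:
        log.append(f"start {label}")
        await asyncio.sleep(delay_s)
        log.append(f"end {label}")
        return label
    return call


async def demo_writes() -> list[str]:
    log: list[str] = []
    # The first write is slower, but it still lands first.
    await run_write_batch([
        make_write("first", 0.05, log),
        make_write("second", 0.01, log),
    ])
    return log


if __name__ == "__main__":
    print(asyncio.run(demo_writes()))
```

The log shows the second write starting only after the first has ended, which is the property that keeps messages from landing out of order.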
Failures also isolate within a read batch. If one tool fails, its error is returned to the model normally while the other tools still complete. This change does not add per-provider rate-limit handling. If a provider rate-limits a fan-out, the affected tool gets the rate-limit error.
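One way to get this kind of isolation in asyncio is `gather(..., return_exceptions=True)`, which hands back each failure as a value instead of cancelling the batch. The sketch below assumes that approach; `RateLimitError`, `run_isolated`, and the fake tools are hypothetical and stand in for whatever the runner actually uses.

```python
import asyncio
from typing import Awaitable


class RateLimitError(Exception):
    """Hypothetical error raised when a provider rate-limits a call."""


async def ok_read(name: str) -> str:
    await asyncio.sleep(0.01)
    return f"{name}: ok"


async def limited_read(name: str) -> str:
    await asyncio.sleep(0.01)
    raise RateLimitError(f"{name}: rate limited")


async def run_isolated(reads: list[Awaitable[str]]) -> list[str]:
    """Run reads concurrently; one failure does not stop the others."""
    # With return_exceptions=True, an exception is returned in the slot
    # of the call that raised it, and the other calls run to completion.
    outcomes = await asyncio.gather(*reads, return_exceptions=True)
    return [
        f"error: {outcome}" if isinstance(outcome, BaseException) else outcome
        for outcome in outcomes
    ]


async def demo_isolation() -> list[str]:
    return await run_isolated([
        ok_read("analytics"),
        limited_read("search"),
        ok_read("shop"),
    ])


if __name__ == "__main__":
    print(asyncio.run(demo_isolation()))
```

The failed call's error sits in its own slot, in emission order, next to the two successful results. Nothing here retries or backs off, matching the note above that rate-limit handling is out of scope.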
What we measured
We tested the change against a real Google Analytics agent, ga_traffic_health_check, over a 28-day period on GPT-5.5.
In one turn, the agent issued three concurrent runReport calls. Per api_usage_logs, all three calls started within 100ms of each other. Their execution times were 979ms, 1078ms, and 1065ms.
The total wall-clock time for the parallel batch was about 1.08 seconds. The same batch run serially would have taken about 3.12 seconds. That is a 65% reduction on the batch step itself: 1 second instead of 3.
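The arithmetic behind those figures, as a short check. It approximates the parallel wall-clock time by the slowest call, which ignores the small stagger between start times.

```python
# Execution times of the three runReport calls, in milliseconds.
durations_ms = [979, 1078, 1065]

# Serial execution pays for every call back to back.
serial_ms = sum(durations_ms)

# Concurrent execution is bounded by the slowest call
# (approximation: assumes all three start at the same instant).
parallel_ms = max(durations_ms)

reduction = 1 - parallel_ms / serial_ms

print(f"serial:    {serial_ms} ms")    # serial:    3122 ms
print(f"parallel:  {parallel_ms} ms")  # parallel:  1078 ms
print(f"reduction: {reduction:.0%}")   # reduction: 65%
```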
Where you will feel it
LLM thinking time dominates most runs, not tool execution. The full-run impact depends on how often an agent fans out.
For agents that issue 2–3 parallel reads per turn, end-to-end run time typically drops by about 2–5%. For research-heavy agents that issue 5–10 reads per turn, such as AEO scans, content audits, and multi-site research, the improvement is more substantial.
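To see why a large saving on the batch step becomes a small saving on the whole run, here is a back-of-the-envelope sketch. The 60-second total run time is an invented figure for illustration only, not a measurement; the batch times are the ones from the test above, and `end_to_end_saving` is a hypothetical helper.

```python
def end_to_end_saving(
    total_run_s: float,
    serial_batch_s: float,
    parallel_batch_s: float,
    fan_out_turns: int = 1,
) -> float:
    """Fraction of the original run time saved by parallelizing reads.

    total_run_s is the run time with serial tool execution, including
    LLM thinking time.
    """
    saved_s = (serial_batch_s - parallel_batch_s) * fan_out_turns
    return saved_s / total_run_s


# Assumption: a 60 s run with a single fan-out turn. Only the batch
# times (3.122 s serial, 1.078 s parallel) come from the measurement.
saving = end_to_end_saving(
    total_run_s=60.0,
    serial_batch_s=3.122,
    parallel_batch_s=1.078,
)
print(f"{saving:.1%}")
```

Under that assumed run length, about two seconds saved lands inside the 2–5% range. More fan-out turns, or more reads per turn, push the fraction higher.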
Simple Q&A flows that only ever issue one tool per turn see no change. There is nothing to parallelize.
What this does not change
This does not make any individual tool call faster. It does not reduce token cost. It does not change agent correctness. Results still return in emission order, and write approvals still behave the same way.
No API change, no new configuration — every datavessel agent that fans out gets it automatically.

