Open source for the agent spend era

Cost firewall for AI agents.

LOCO-Agent is the open-source scheduler, budget circuit breaker, and cost attribution layer for teams running agentic AI in production. Wrap the calls you already make. See who spent what. Decide who gets the next expensive slot.

Run it in 5 minutes Trace spend Read source

install pip install loco-agent

$ loco doctor
found: anthropic, openai, google-adk, langchain
suggested: shared scheduler with capacity=3

$ LOCO_LOG=pretty python production_agents.py
[ENQUEUE] security model=opus   team=soc     workflow=incident-review
[GRANT]   security score=0.91   budget=critical-path remaining
[WAIT]    growth   model=sonnet queue=17.0   reason=capacity
[FLAG]    support  mode=downgrade budget=exceeded
[ATTR]    soc      workflow=incident-review model=opus cost=5.0

$ python - <<'PY'
print(scheduler.metrics.attribution.cost_by_team())
PY
{'soc': 91.0, 'support': 38.0, 'growth': 19.0}

7framework adapters

4Dteam, workflow, model, agent

0required core deps

486tests on GitHub main

One allocatorQueue depth plus wait time decides who gets scarce LLM capacity next. Budget circuit breakerReject, alert, or mark work for downgrade before one agent drains the pool. Spend ledgerBreak down cost by team, workflow, model, session, and agent. Ops-native metricsPrometheus export and a Grafana dashboard for the queues behind your agents.

Built for builders who read the invoice.

Every agent framework makes it easier to call a model. LOCO is for the moment after that: when the demo becomes a system, traffic spikes, premium models get expensive, and someone asks where the tokens went.

01 / Own the scheduler

No black-box traffic cop.

LOCO is a Python library you run in your app. The load equation is documented, deterministic, testable, and small enough to understand.

02 / Label the work

Cost starts at the task.

Attach team, workflow, model, session, tenant, and outcome metadata where the agent actually does work.

03 / Govern before spend

Budgets are runtime policy.

Set limits per agent, team, or tenant, then reject, alert, or flag downgrade paths before the call becomes a surprise line item.

04 / Route by pressure

Priority rules do not survive bursts.

LOCO re-scores waiters on each release, so urgent work can climb while long-running batch jobs still make progress.

05 / Bring your framework

Adapters, not lock-in.

Anthropic, OpenAI, Google ADK, LangChain, CrewAI, Bedrock, AutoGen, and plain async Python compete for the same shared slots.

06 / Ship the dashboard

Ops should see the queue.

Export scheduler state, wait time, utilization, policy violations, trust scores, and cost attribution into the stack you already operate.

The CFO view and the SRE view are the same trace.

LOCO connects the dispatch decision to the spend story: who waited, which model ran, which budget was touched, and whether the outcome was worth the tokens.

Cost by team, workflow, model, agent, and session
Token-to-outcome tracking for ROI attribution
Trust scoring and multi-tenant isolation
Prometheus metrics plus an importable Grafana dashboard

LOCO-Agent Grafana dashboard showing cost, queue depth, wait time, utilization, trust scores, and policy panels

Same policy across the agent zoo.

Your LangChain batch job, ADK webhook handler, OpenAI assistant, and Anthropic analyst should not each invent their own concurrency, budget, and attribution rules.

Anthropic SDK

OpenAI SDK

Google ADK

LangChain

CrewAI

AWS Bedrock

AutoGen

Wrap one call. Keep control.

Start with loco.wrap() around any async LLM call. Add adapters, budgets, tenant pools, policy enforcement, and dashboards as the system grows.

Quick Start Adapters

import asyncio
import loco

async def call_llm(prompt: str):
    return await your_model_client.generate(prompt)

async def main():
    loco.configure(capacity=3, budget_mode="downgrade")
    loco.set_budget("support", max_cost=25.0)

    await loco.wrap(
        call_llm,
        agent_id="support",
        weight=2.0,
        prompt="summarize customer thread",
    )

    attr = loco.get_scheduler().metrics.attribution
    print(attr.cost_by_model())

asyncio.run(main())