You’ve instrumented your application and can see every LLM call. Now let’s add the safety rails that prevent a single bug from draining your budget.

Why Cost Control Matters

If you’ve ever woken up to a surprise $500 bill from OpenAI, you know the pain. Unlike traditional software, where costs are predictable, AI applications have a hidden danger: a single bug or bad prompt can drain your budget in minutes.

Imagine this: your AI agent gets stuck in a loop, calling GPT-4 thousands of times. By the time you notice, the damage is done. Traditional usage alerts from LLM providers arrive too late—sometimes hours after the spending happened.

Aden acts as a speed bump between your code and the LLM API. It tracks every call in real time and can automatically take action before costs spiral out of control.

The Four Actions

Aden gives you four levels of protection, from gentle warnings to hard stops:

Alert

Send notifications when spending reaches warning thresholds. Your code keeps running, but you know something needs attention.

Throttle

Add delays between requests to slow the burn rate. This gives you time to investigate without completely stopping your users.

Degrade

Automatically switch to cheaper models. Your users still get answers, and you save money.

Block

Hard stop when budget is exhausted. The most effective cost-cutting action—stops the bleeding immediately.

Configuring Budgets on the Server

The SDK enforces budgets that you configure on the Aden control server. When your application starts, it connects to the server and receives the current policy. All enforcement happens locally—no per-request latency added.

Budget Threshold Progression

Configure thresholds to trigger different actions as spending increases:
0% ─────────── 50% ─────────── 80% ─────────── 95% ─────────── 100%
   [ALLOW]       [ALERT]        [DEGRADE]       [THROTTLE]      [BLOCK]
                   │                │                │              │
                   ↓                ↓                ↓              ↓
              "Warning!"      Switch to        Slow down       Hard stop
                            gpt-4o-mini

Server-Side Configuration

Configure budgets via the Aden dashboard or API. Here’s an example policy with multiple thresholds:
Budget Policy
{
  "type": "global",
  "limit_usd": 100.00,
  "thresholds": [
    {"percent": 50, "action": "alert", "level": "warning"},
    {"percent": 80, "action": "degrade", "provider": "openai", "to_model": "gpt-4o-mini"},
    {"percent": 95, "action": "throttle", "delay_ms": 2000}
  ],
  "limit_action": "block"
}

What Happens in Your App

Once budgets are configured on the server, your SDK automatically enforces them:
// At $0 spent (0% of $100 budget)
await openai.chat.completions.create({ model: "gpt-4o", ... });
// → Uses gpt-4o ✓

// At $50 spent (50% of budget)
await openai.chat.completions.create({ model: "gpt-4o", ... });
// → Uses gpt-4o, triggers onAlert callback with "warning"

// At $80 spent (80% of budget)
await openai.chat.completions.create({ model: "gpt-4o", ... });
// → Automatically uses gpt-4o-mini instead (degraded)

// At $95 spent (95% of budget)
await openai.chat.completions.create({ model: "gpt-4o", ... });
// → Uses gpt-4o-mini, request delayed 2 seconds (throttled)

// At $100+ spent (100% of budget)
await openai.chat.completions.create({ model: "gpt-4o", ... });
// → Throws RequestCancelledError (blocked)
The SDK caches the policy locally and syncs with the server periodically. This means zero latency overhead on each request while still getting real-time budget updates.
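The enforcement loop behind this is a straightforward caching pattern. Here’s an illustrative sketch of the idea in TypeScript; the Policy shape, the /policy endpoint, and the 30-second interval are assumptions for illustration, not the SDK’s actual internals:

// Illustrative sketch of cache-and-sync enforcement (not aden-ts internals).
// The policy refreshes in the background; each request is checked against
// the cached copy, so no network hop is added per request.
type Threshold = { percent: number; action: "alert" | "degrade" | "throttle" | "block" };
type Policy = { limitUsd: number; thresholds: Threshold[] };

let cachedPolicy: Policy | null = null;

async function syncPolicy(serverUrl: string, intervalMs = 30_000): Promise<void> {
  // Hypothetical endpoint; the real sync protocol is internal to the SDK.
  const res = await fetch(`${serverUrl}/policy`);
  cachedPolicy = (await res.json()) as Policy;
  setTimeout(() => void syncPolicy(serverUrl, intervalMs), intervalMs);
}

function actionFor(spentUsd: number): Threshold["action"] | "allow" {
  if (!cachedPolicy) return "allow"; // fail open until the first sync completes
  const percent = (spentUsd / cachedPolicy.limitUsd) * 100;
  // The highest threshold at or below current spend applies.
  const matched = [...cachedPolicy.thresholds]
    .filter((t) => percent >= t.percent)
    .sort((a, b) => b.percent - a.percent)[0];
  return matched ? matched.action : "allow";
}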

Setting Up Alerts

Get notified before it’s too late. Alerts let you know when you’re approaching budget limits—not hours later, but in real time.

Server-Side Alert Policy

Configure alert thresholds via the Aden dashboard or API:
Alert Configuration
[
  {
    "trigger": "budget_threshold",
    "threshold_percent": 80,
    "level": "warning",
    "message": "Approaching budget limit"
  },
  {
    "trigger": "budget_threshold",
    "threshold_percent": 95,
    "level": "critical",
    "message": "Budget nearly exhausted"
  }
]

SDK Alert Handler

When alerts trigger, your onAlert callback receives the notification:
import { instrument } from "aden-ts";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL,
  sdks: { OpenAI },

  // This runs whenever an alert triggers.
  // (sendSlackMessage / sendPagerDutyAlert below stand in for your own
  // notification helpers.)
  onAlert: (alert) => {
    console.warn(`[${alert.level}] ${alert.message}`);

    if (alert.level === "warning") {
      sendSlackMessage(`Budget warning: ${alert.message}`);
    } else if (alert.level === "critical") {
      sendPagerDutyAlert(`URGENT: ${alert.message}`);
    }
  },
});

// Use your LLM normally - alerts trigger automatically
const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

Throttling Runaway Agents

Alerts are great, but what if you want the system to automatically slow down? Throttling adds delays to requests when approaching limits—your agent keeps working, just more slowly.

Server-Side Throttle Policy

Configure throttling thresholds via the Aden dashboard or API:
Throttle Configuration
[
  {
    "trigger": "budget_threshold",
    "threshold_percent": 90,
    "delay_ms": 2000
  },
  {
    "trigger": "budget_threshold",
    "threshold_percent": 95,
    "delay_ms": 5000
  }
]

What Happens

Normal operation:    Request → Response → Request → Response
                        ↓         ↓          ↓         ↓
                       0ms       0ms        0ms       0ms

At 90% budget:       Request → [2 sec wait] → Response → Request → [2 sec wait] → Response
                                    ↓                         ↓
                               Slower, but                Slower, but
                               still working              still working!
Throttling gives you time to investigate without completely stopping your users. Think of it like a speed limiter in a car.
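Under the hood, throttling amounts to an awaited delay before the request is forwarded. A minimal sketch of the pattern (illustrative, not the SDK’s actual code):

// Illustrative sketch of throttling (not aden-ts internals): wait for the
// delay_ms from the matched threshold, then forward the request as usual.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function throttledCall<T>(delayMs: number, call: () => Promise<T>): Promise<T> {
  if (delayMs > 0) await sleep(delayMs); // e.g. 2000ms once past 90% of budget
  return call();
}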

SDK Implementation

The SDK automatically enforces throttling. Your code stays the same:
import { instrument } from "aden-ts";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL,
  sdks: { OpenAI },
});

const openai = new OpenAI();

// At 90% budget, this request will automatically wait 2 seconds
// Your code doesn't change at all - Aden handles it
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Generate a report" }],
});

Automatic Model Degradation

This is one of the most powerful features. When approaching your budget, why pay premium prices? Aden can automatically switch to a cheaper model—your users still get answers, and you save money.

Server-Side Degradation Policy

Configure model degradation rules via the Aden dashboard or API:
Degradation Configuration
[
  {
    "provider": "openai",
    "from_model": "gpt-4o",
    "to_model": "gpt-4o-mini",
    "trigger": "budget_threshold",
    "threshold_percent": 50
  },
  {
    "provider": "anthropic",
    "from_model": "claude-3-5-sonnet-latest",
    "to_model": "claude-3-5-haiku-latest",
    "trigger": "budget_threshold",
    "threshold_percent": 80
  }
]

What Happens

Budget at 0-50%:     "gpt-4o"  →  Full power, highest quality
Budget at 50-100%:   "gpt-4o-mini"  →  Cheaper, still good for most tasks
Budget exceeded:     [blocked]  →  No more requests
The magic is that your code still asks for gpt-4o—Aden silently swaps it to the cheaper model behind the scenes.
Model degradation keeps your application running without any code changes. Your users get answers, your wallet stays happy.
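Conceptually, the swap is just a rewrite of the model field on the outgoing request whenever a degradation rule matches. A minimal sketch, reusing the rule shape from the configuration above (illustrative, not the SDK’s actual code):

// Illustrative sketch of model degradation (not aden-ts internals).
type DegradeRule = {
  provider: string;
  from_model: string;
  to_model: string;
  threshold_percent: number;
};

function applyDegradation(model: string, budgetPercent: number, rules: DegradeRule[]): string {
  const rule = rules.find(
    (r) => r.from_model === model && budgetPercent >= r.threshold_percent
  );
  return rule ? rule.to_model : model; // "gpt-4o" becomes "gpt-4o-mini" past 50%
}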

SDK Implementation

Your code stays the same—the SDK handles the model swap automatically:
import { instrument } from "aden-ts";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL,
  sdks: { OpenAI },
});

const openai = new OpenAI();

// Early in the month (0% budget used)
await openai.chat.completions.create({
  model: "gpt-4o",  // You ask for gpt-4o
  messages: [{ role: "user", content: "Quick question" }],
});
// → Uses gpt-4o ✓ (full quality)

// Later (50%+ budget used)
await openai.chat.completions.create({
  model: "gpt-4o",  // You still ask for gpt-4o
  messages: [{ role: "user", content: "Another question" }],
});
// → Automatically uses gpt-4o-mini instead (cheaper!)

Blocking: The Emergency Stop

Sometimes you need to just stop. When your budget is exhausted, Aden can block requests entirely. This is your emergency stop button—it stops the bleeding immediately.

Server-Side Budget Limit

Configure the action when budget is exhausted:
Budget with Block Action
{
  "type": "global",
  "limit_usd": 100.00,
  "limit_action": "block"
}
When the budget hits 100%, any new LLM request fails with a RequestCancelledError. The request never reaches OpenAI/Anthropic, so you don’t get charged.
Make sure your application handles RequestCancelledError gracefully. Show users a friendly message like “You’ve reached your usage limit” rather than crashing.

SDK Implementation

import { instrument, RequestCancelledError } from "aden-ts";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL,
  sdks: { OpenAI },
  getContextId: () => getCurrentUserId(),  // Track per-user (your own session helper)
});

const openai = new OpenAI();

try {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(response.choices[0].message.content);
} catch (error) {
  if (error instanceof RequestCancelledError) {
    // Budget exhausted - request blocked BEFORE reaching OpenAI.
    // (showUserMessage is a placeholder for your own UI notification.)
    showUserMessage(
      "You've reached your usage limit. " +
      "Upgrade your plan to continue using AI features."
    );
  } else {
    throw error;
  }
}

Granular Budgets

Instead of one big budget for your whole organization, set separate budgets for different customers, features, agents, or custom tags.

Why Granular Budgets Matter

Without granular budgets:
One big budget: $1000
├── Customer A uses $950 (they went wild!)
└── Customers B, C, D... all blocked (unfair!)

With Aden’s granular budgets:
Global budget: $1000 (safety net)
├── Customer A: $100 budget → blocked at $100 (fair!)
├── Customer B: $100 budget → still has full $100
├── Customer C: $50 budget  → smaller customer, smaller budget
└── ... each customer isolated

Server-Side Budget Configuration

Configure multiple budget types via the Aden dashboard or API:
Multi-Budget Configuration
[
  {
    "type": "global",
    "limit_usd": 1000.00,
    "limit_action": "block"
  },
  {
    "type": "customer",
    "match_value": "acme-corp",
    "limit_usd": 100.00,
    "limit_action": "block"
  },
  {
    "type": "feature",
    "match_value": "document-analysis",
    "limit_usd": 500.00,
    "limit_action": "degrade",
    "provider": "openai",
    "degrade_to": "gpt-4o-mini"
  }
]

Budget Types

Type        Matches On           Use Case
Global      All requests         Organization-wide safety net
Customer    metadata.customer    Multi-tenant SaaS apps
Agent       metadata.agent       Different AI agent tiers
Feature     metadata.feature     Chat vs. document analysis
Tag         metadata.tag         Projects, teams, anything else
When a request matches multiple budgets, all are validated and the most restrictive action wins.
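One way to picture “most restrictive wins”: rank the actions by severity and take the worst across every matched budget. The severity ordering below is an assumption based on the threshold progression shown earlier:

// Illustrative sketch of "most restrictive wins" across matched budgets.
// Assumed severity ordering: allow < alert < degrade < throttle < block.
const SEVERITY = ["allow", "alert", "degrade", "throttle", "block"] as const;
type Action = (typeof SEVERITY)[number];

function resolveAction(matchedActions: Action[]): Action {
  return matchedActions.reduce<Action>(
    (worst, a) => (SEVERITY.indexOf(a) > SEVERITY.indexOf(worst) ? a : worst),
    "allow"
  );
}

// e.g. the global budget says "alert" but acme-corp's budget says "block":
resolveAction(["alert", "block"]); // → "block"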

SDK Implementation

import { instrument, enterMeterContext } from "aden-ts";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL,
  sdks: { OpenAI },
});

const openai = new OpenAI();

// In your API route, set the context
app.post("/analyze-document", async (req, res) => {
  enterMeterContext({
    metadata: {
      customer: req.user.companyId,     // "acme-corp"
      feature: "document-analysis",
    },
  });

  // This request is validated against:
  // 1. Global budget ($1000)
  // 2. Acme-corp's customer budget ($100)
  // 3. Document-analysis feature budget ($500)
  // Most restrictive wins!

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: `Analyze: ${req.body.document}` }],
  });

  res.json({ analysis: response.choices[0].message.content });
});

Fail Open for Resilience

What happens if Aden’s control server goes down? By default, Aden “fails open”—requests proceed normally, ensuring your application stays available.
Fail-open means your users never see downtime due to Aden. You’ll get an alert about the connectivity issue, but your app keeps working.
import { instrument } from "aden-ts";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL,
  sdks: { OpenAI },

  failOpen: true,  // Default - requests proceed if server unreachable

  onAlert: (alert) => {
    if (alert.message.includes("server unreachable")) {
      console.warn("Aden server down - operating without cost controls");
      notifyOpsTeam("Aden server unreachable"); // your own paging helper
    }
  },
});

// Even if Aden's server is down, your app keeps working
const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

Quick Reference

Action      When to Use            What Happens
Alert       Early warning (80%)    Notification sent, request proceeds
Throttle    Slow down (90%)        Request delayed, then proceeds
Degrade     Save money (50%)       Cheaper model used automatically
Block       Hard stop (100%)       Request fails with error

Budget Type   Example           Use For
Global        $1000/month       Organization-wide safety net
Customer      $100/customer     Multi-tenant fairness
Agent         $200/agent-type   Tiered pricing
Feature       $500/feature      Feature-level tracking
Tag           Custom            Teams, projects
Always catch RequestCancelledError to handle blocked requests gracefully:
try {
  await openai.chat.completions.create({ ... });
} catch (error) {
  if (error instanceof RequestCancelledError) {
    showUserMessage("Usage limit reached");
  }
}

Next Steps