
Overview

The Aden SDK provides real-time cost control through the control server. You can set budgets, throttle requests, degrade to cheaper models, and block requests when limits are exceeded.

Control Actions

| Action | Effect | Use Case |
| --- | --- | --- |
| allow | Request proceeds normally | Within budget |
| block | Request rejected with error | Budget exhausted |
| throttle | Request delayed | Rate limiting |
| degrade | Switch to cheaper model | Approaching budget |
| alert | Proceed with notification | Warning threshold |

Setup

Connect to the control server:
import { instrument } from "aden";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL, // Optional, has default

  sdks: { OpenAI },

  // Track usage per user for individual budgets
  getContextId: () => getCurrentUserId(),

  // Handle alerts
  onAlert: (alert) => {
    console.warn(`[${alert.level}] ${alert.message}`);
    // Send to Slack, PagerDuty, etc.
  },

  // What to do if control server is unreachable
  failOpen: true, // Allow requests (default)
});
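The getContextId callback above assumes a getCurrentUserId helper. One way to provide it in Node.js is AsyncLocalStorage, so every call inside a request handler sees that request's user. This is a sketch of an application-side helper, not part of the SDK:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Per-request user context; getCurrentUserId reads whatever
// withUser established for the current async call chain.
const userContext = new AsyncLocalStorage<string>();

function getCurrentUserId(): string {
  return userContext.getStore() ?? "anonymous";
}

// Wrap each request handler so instrumented calls inside it
// resolve to that request's user.
function withUser<T>(userId: string, fn: () => T): T {
  return userContext.run(userId, fn);
}
```

In an HTTP server you would call withUser(userId, handler) at the top of each request so that budget tracking stays per-user even under concurrency.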

Budget Configuration

Set budgets via the control server API:

Per-User Budget

curl -X POST https://kube.acho.io/v1/control/policy/budgets \
  -H "Authorization: Bearer $ADEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "context_id": "user_123",
    "limit_usd": 10.00,
    "period": "monthly",
    "action_on_exceed": "block"
  }'
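The same call can be made from TypeScript with fetch (Node 18+). The endpoint and field names are taken from the curl example above; the helper itself is illustrative:

```typescript
// Field names mirror the curl payload above; values shown in the docs
// include "monthly" for period and "block"/"alert" for action_on_exceed.
interface BudgetPolicy {
  context_id?: string; // omit for a global budget
  limit_usd: number;
  period: string; // e.g. "monthly"
  action_on_exceed: string; // e.g. "block"
}

function budgetRequestBody(policy: BudgetPolicy): string {
  return JSON.stringify(policy);
}

async function createBudget(policy: BudgetPolicy): Promise<Response> {
  return fetch("https://kube.acho.io/v1/control/policy/budgets", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.ADEN_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: budgetRequestBody(policy),
  });
}
```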

Global Budget

curl -X POST https://kube.acho.io/v1/control/policy/budgets \
  -H "Authorization: Bearer $ADEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "limit_usd": 1000.00,
    "period": "monthly",
    "action_on_exceed": "alert"
  }'

Model Degradation

Automatically switch to cheaper models when approaching budget:
curl -X POST https://kube.acho.io/v1/control/policy/degradations \
  -H "Authorization: Bearer $ADEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "from_model": "gpt-4o",
    "to_model": "gpt-4o-mini",
    "trigger": "budget_threshold",
    "threshold_percent": 80,
    "context_id": "user_123"
  }'
When user_123 reaches 80% of their budget, gpt-4o requests are automatically served with gpt-4o-mini.

Local Cost Control

For testing or serverless environments, implement local control with beforeRequest:
import { instrument, BeforeRequestResult } from "aden";

// Track costs locally
const userCosts = new Map<string, number>();

await instrument({
  emitMetric: myEmitter,
  sdks: { OpenAI },

  getContextId: () => getCurrentUserId(),

  beforeRequest: async (request): Promise<BeforeRequestResult> => {
    const userId = getCurrentUserId();
    const currentCost = userCosts.get(userId) || 0;
    const estimatedCost = estimateCost(request.model, request.messages); // rough pre-request estimate

    // Block if this request would push the user over budget
    if (currentCost + estimatedCost >= 10.0) {
      return {
        action: "cancel",
        reason: "Monthly budget exceeded",
      };
    }

    // Degrade expensive models when approaching budget
    if (currentCost >= 8.0 && request.model === "gpt-4o") {
      return {
        action: "degrade",
        toModel: "gpt-4o-mini",
        reason: "Approaching budget limit",
      };
    }

    // Alert once when halfway through budget (checked before throttle,
    // which uses the same threshold and would otherwise shadow it)
    if (currentCost >= 5.0 && !hasAlertedUser(userId)) {
      return {
        action: "alert",
        level: "warning",
        message: "50% of monthly budget used",
      };
    }

    // Throttle during high-cost periods
    if (currentCost >= 5.0) {
      return {
        action: "throttle",
        delayMs: 1000,
      };
    }

    return { action: "proceed" };
  },
});
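For the userCosts map above to mean anything, it has to be updated as responses come back. A sketch, assuming a metric callback that receives per-request token counts (the real emitMetric payload may differ) and using illustrative per-1K rates:

```typescript
// Assumed metric shape; the real emitMetric payload may differ.
interface UsageMetric {
  contextId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// USD per 1,000 tokens (illustrative rates).
const COSTS_PER_1K: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.005, output: 0.015 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
};

const userCosts = new Map<string, number>();

// Accumulate the actual cost of each completed request per user.
function recordUsage(m: UsageMetric): void {
  const rates = COSTS_PER_1K[m.model];
  if (!rates) return;
  const cost = (m.inputTokens * rates.input + m.outputTokens * rates.output) / 1000;
  userCosts.set(m.contextId, (userCosts.get(m.contextId) ?? 0) + cost);
}
```

Remember to reset the map on your billing boundary (e.g. monthly), since the beforeRequest thresholds above assume a per-period total.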

BeforeRequestResult Types

type BeforeRequestResult =
  | { action: "proceed" }
  | { action: "throttle"; delayMs: number }
  | { action: "cancel"; reason: string }
  | { action: "degrade"; toModel: string; reason?: string; delayMs?: number }
  | { action: "alert"; level: "info" | "warning" | "critical"; message: string };

Handling Blocked Requests

When a request is blocked, a RequestCancelledError is thrown:
import { RequestCancelledError } from "aden";

try {
  await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof RequestCancelledError) {
    console.log("Request blocked:", error.reason);
    // Show user-friendly message
  }
  throw error;
}

Handling Degradation

When a model is degraded, the request proceeds with the cheaper model. You can detect this in responses:
const response = await openai.chat.completions.create({
  model: "gpt-4o", // Requested model
  messages: [{ role: "user", content: "Hello" }],
});

// Check actual model used
console.log(response.model); // Might be "gpt-4o-mini" if degraded

Alert Handling

Configure alert callbacks:
await instrument({
  // ...
  onAlert: async (alert) => {
    switch (alert.level) {
      case "critical":
        await pagerduty.trigger(alert.message);
        break;
      case "warning":
        await slack.postMessage("#llm-alerts", alert.message);
        break;
      case "info":
        console.log("[Info]", alert.message);
        break;
    }
  },
});

Rate Limiting

Implement rate limiting based on request count:
const requestCounts = new Map<string, number>();

await instrument({
  beforeRequest: async (request): Promise<BeforeRequestResult> => {
    const userId = getCurrentUserId();
    const count = (requestCounts.get(userId) || 0) + 1;
    requestCounts.set(userId, count);

    // Limit to 100 requests per minute
    if (count > 100) {
      return {
        action: "throttle",
        delayMs: 60000, // Wait 1 minute
      };
    }

    return { action: "proceed" };
  },
});

// Reset counts every minute
setInterval(() => requestCounts.clear(), 60000);
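The fixed-window reset above is simple but allows bursts of up to 2× the limit across a window boundary. A sliding-window limiter avoids that; this is a sketch with illustrative limit and window values, independent of the SDK:

```typescript
// Sliding-window limiter: tracks request timestamps per user and
// only counts those inside the last windowMs.
class SlidingWindowLimiter {
  private timestamps = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(userId: string, now: number = Date.now()): boolean {
    const recent = (this.timestamps.get(userId) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.limit) {
      this.timestamps.set(userId, recent);
      return false; // caller would return a "throttle" action here
    }
    recent.push(now);
    this.timestamps.set(userId, recent);
    return true;
  }
}
```

Inside beforeRequest, a false result maps naturally to { action: "throttle", delayMs: ... }.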

Fail-Open vs Fail-Closed

Configure behavior when the control server is unreachable:
await instrument({
  apiKey: process.env.ADEN_API_KEY,

  // Fail-open: Allow requests if server unreachable (default)
  failOpen: true,

  // Fail-closed: Block all requests if server unreachable
  // failOpen: false,
});
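The trade-off is availability versus spend: fail-open keeps your app working during a control-server outage but leaves budgets unenforced, while fail-closed guarantees enforcement at the cost of rejecting traffic. The assumed semantics can be sketched as:

```typescript
// Sketch of the assumed failOpen semantics: if the policy check itself
// fails (e.g. control server unreachable), failOpen decides the outcome.
async function checkWithFailurePolicy(
  check: () => Promise<boolean>, // true = request allowed
  failOpen: boolean,
): Promise<boolean> {
  try {
    return await check();
  } catch {
    // Server unreachable: allow if fail-open, block if fail-closed.
    return failOpen;
  }
}
```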

Cost Estimation

The SDK doesn’t calculate costs directly. Use the control server for accurate cost tracking, or implement local estimation:
// USD per 1,000 tokens
const COSTS_PER_1K: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.005, output: 0.015 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
  "claude-3-5-sonnet-latest": { input: 0.003, output: 0.015 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const rates = COSTS_PER_1K[model];
  if (!rates) return 0;
  return (inputTokens * rates.input + outputTokens * rates.output) / 1000;
}
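The beforeRequest example earlier estimated cost from the raw message list, while estimateCost takes token counts; bridging the two needs a rough token estimate. A common heuristic, an approximation rather than the provider's tokenizer, is about four characters per token:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Rough heuristic: ~4 characters per token for English text.
// For exact counts, use the provider's tokenizer (e.g. tiktoken).
function estimateInputTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```

estimateCost(model, estimateInputTokens(messages), 0) then gives a pre-request lower bound; output tokens are unknown until the response arrives.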

Next Steps