
Overview

The Aden SDK provides real-time cost control through the control server. You can set budgets, throttle requests, degrade to cheaper models, and block requests when limits are exceeded.

Control Actions

| Action | Effect | Use Case |
| --- | --- | --- |
| allow | Request proceeds normally | Within budget |
| block | Request rejected with error | Budget exhausted |
| throttle | Request delayed | Rate limiting |
| degrade | Switch to cheaper model | Approaching budget |
| alert | Proceed with notification | Warning threshold |

Setup

Connect to the control server:
import { instrument } from "aden";
import OpenAI from "openai";

await instrument({
  apiKey: process.env.ADEN_API_KEY,
  serverUrl: process.env.ADEN_API_URL, // Optional, has default

  sdks: { OpenAI },

  // Track usage per user for individual budgets
  getContextId: () => getCurrentUserId(),

  // Handle alerts
  onAlert: (alert) => {
    console.warn(`[${alert.level}] ${alert.message}`);
    // Send to Slack, PagerDuty, etc.
  },

  // What to do if control server is unreachable
  failOpen: true, // Allow requests (default)
});
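The getContextId callback above assumes a getCurrentUserId helper. One way to provide it in Node.js is AsyncLocalStorage, so every call inside a request handler sees that request's user. This is a sketch of an application-side helper, not part of the SDK:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Per-request user context; getCurrentUserId reads whatever
// withUser established for the current async call chain.
const userContext = new AsyncLocalStorage<string>();

function getCurrentUserId(): string {
  return userContext.getStore() ?? "anonymous";
}

// Wrap each request handler so instrumented calls inside it
// resolve to that request's user.
function withUser<T>(userId: string, fn: () => T): T {
  return userContext.run(userId, fn);
}
```

In an HTTP server you would call withUser(userId, handler) at the top of each request so that budget tracking stays per-user even under concurrency.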

Budget Configuration

Set budgets via the control server API:

Per-User Budget

curl -X POST https://kube.acho.io/v1/control/policy/budgets \
  -H "Authorization: Bearer $ADEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "context_id": "user_123",
    "limit_usd": 10.00,
    "period": "monthly",
    "action_on_exceed": "block"
  }'
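The same call can be made from TypeScript with fetch (Node 18+). The endpoint and field names are taken from the curl example above; the helper itself is illustrative:

```typescript
// Field names mirror the curl payload above; values shown in the docs
// include "monthly" for period and "block"/"alert" for action_on_exceed.
interface BudgetPolicy {
  context_id?: string; // omit for a global budget
  limit_usd: number;
  period: string; // e.g. "monthly"
  action_on_exceed: string; // e.g. "block"
}

function budgetRequestBody(policy: BudgetPolicy): string {
  return JSON.stringify(policy);
}

async function createBudget(policy: BudgetPolicy): Promise<Response> {
  return fetch("https://kube.acho.io/v1/control/policy/budgets", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.ADEN_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: budgetRequestBody(policy),
  });
}
```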

Global Budget

curl -X POST https://kube.acho.io/v1/control/policy/budgets \
  -H "Authorization: Bearer $ADEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "limit_usd": 1000.00,
    "period": "monthly",
    "action_on_exceed": "alert"
  }'

Model Degradation

Automatically switch to cheaper models when approaching budget:
curl -X POST https://kube.acho.io/v1/control/policy/degradations \
  -H "Authorization: Bearer $ADEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "from_model": "gpt-4o",
    "to_model": "gpt-4o-mini",
    "trigger": "budget_threshold",
    "threshold_percent": 80,
    "context_id": "user_123"
  }'
When user_123 reaches 80% of their budget, gpt-4o requests are automatically served with gpt-4o-mini.

Local Cost Control

For testing or serverless environments, implement local control with beforeRequest:
import { instrument, BeforeRequestResult } from "aden";

// Track costs locally
const userCosts = new Map<string, number>();

await instrument({
  emitMetric: myEmitter,
  sdks: { OpenAI },

  getContextId: () => getCurrentUserId(),

  beforeRequest: async (request): Promise<BeforeRequestResult> => {
    const userId = getCurrentUserId();
    const currentCost = userCosts.get(userId) || 0;
    const estimatedCost = estimateCost(request.model, request.messages); // rough pre-request estimate

    // Block if this request would push the user over budget
    if (currentCost + estimatedCost >= 10.0) {
      return {
        action: "cancel",
        reason: "Monthly budget exceeded",
      };
    }

    // Degrade expensive models when approaching budget
    if (currentCost >= 8.0 && request.model === "gpt-4o") {
      return {
        action: "degrade",
        toModel: "gpt-4o-mini",
        reason: "Approaching budget limit",
      };
    }

    // Alert once when halfway through budget (checked before throttle,
    // which uses the same threshold and would otherwise shadow it)
    if (currentCost >= 5.0 && !hasAlertedUser(userId)) {
      return {
        action: "alert",
        level: "warning",
        message: "50% of monthly budget used",
      };
    }

    // Throttle during high-cost periods
    if (currentCost >= 5.0) {
      return {
        action: "throttle",
        delayMs: 1000,
      };
    }

    return { action: "proceed" };
  },
});
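For the userCosts map above to mean anything, it has to be updated as responses come back. A sketch, assuming a metric callback that receives per-request token counts (the real emitMetric payload may differ) and using illustrative per-1K rates:

```typescript
// Assumed metric shape; the real emitMetric payload may differ.
interface UsageMetric {
  contextId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// USD per 1,000 tokens (illustrative rates).
const COSTS_PER_1K: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.005, output: 0.015 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
};

const userCosts = new Map<string, number>();

// Accumulate the actual cost of each completed request per user.
function recordUsage(m: UsageMetric): void {
  const rates = COSTS_PER_1K[m.model];
  if (!rates) return;
  const cost = (m.inputTokens * rates.input + m.outputTokens * rates.output) / 1000;
  userCosts.set(m.contextId, (userCosts.get(m.contextId) ?? 0) + cost);
}
```

Remember to reset the map on your billing boundary (e.g. monthly), since the beforeRequest thresholds above assume a per-period total.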

BeforeRequestResult Types

type BeforeRequestResult =
  | { action: "proceed" }
  | { action: "throttle"; delayMs: number }
  | { action: "cancel"; reason: string }
  | { action: "degrade"; toModel: string; reason?: string; delayMs?: number }
  | { action: "alert"; level: "info" | "warning" | "critical"; message: string };

Handling Blocked Requests

When a request is blocked, a RequestCancelledError is thrown:
import { RequestCancelledError } from "aden";

try {
  await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof RequestCancelledError) {
    console.log("Request blocked:", error.reason);
    // Show user-friendly message
  }
  throw error;
}

Handling Degradation

When a model is degraded, the request proceeds with the cheaper model. You can detect this in responses:
const response = await openai.chat.completions.create({
  model: "gpt-4o", // Requested model
  messages: [{ role: "user", content: "Hello" }],
});

// Check actual model used
console.log(response.model); // Might be "gpt-4o-mini" if degraded

Alert Handling

Configure alert callbacks:
await instrument({
  // ...
  onAlert: async (alert) => {
    switch (alert.level) {
      case "critical":
        await pagerduty.trigger(alert.message);
        break;
      case "warning":
        await slack.postMessage("#llm-alerts", alert.message);
        break;
      case "info":
        console.log("[Info]", alert.message);
        break;
    }
  },
});

Rate Limiting

Implement rate limiting based on request count:
const requestCounts = new Map<string, number>();

await instrument({
  beforeRequest: async (request): Promise<BeforeRequestResult> => {
    const userId = getCurrentUserId();
    const count = (requestCounts.get(userId) || 0) + 1;
    requestCounts.set(userId, count);

    // Limit to 100 requests per minute
    if (count > 100) {
      return {
        action: "throttle",
        delayMs: 60000, // Wait 1 minute
      };
    }

    return { action: "proceed" };
  },
});

// Reset counts every minute
setInterval(() => requestCounts.clear(), 60000);
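The fixed-window reset above is simple but allows bursts of up to 2× the limit across a window boundary. A sliding-window limiter avoids that; this is a sketch with illustrative limit and window values, independent of the SDK:

```typescript
// Sliding-window limiter: tracks request timestamps per user and
// only counts those inside the last windowMs.
class SlidingWindowLimiter {
  private timestamps = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(userId: string, now: number = Date.now()): boolean {
    const recent = (this.timestamps.get(userId) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    if (recent.length >= this.limit) {
      this.timestamps.set(userId, recent);
      return false; // caller would return a "throttle" action here
    }
    recent.push(now);
    this.timestamps.set(userId, recent);
    return true;
  }
}
```

Inside beforeRequest, a false result maps naturally to { action: "throttle", delayMs: ... }.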

Fail-Open vs Fail-Closed

Configure behavior when the control server is unreachable:
await instrument({
  apiKey: process.env.ADEN_API_KEY,

  // Fail-open: Allow requests if server unreachable (default)
  failOpen: true,

  // Fail-closed: Block all requests if server unreachable
  // failOpen: false,
});
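The trade-off is availability versus spend: fail-open keeps your app working during a control-server outage but leaves budgets unenforced, while fail-closed guarantees enforcement at the cost of rejecting traffic. The assumed semantics can be sketched as:

```typescript
// Sketch of the assumed failOpen semantics: if the policy check itself
// fails (e.g. control server unreachable), failOpen decides the outcome.
async function checkWithFailurePolicy(
  check: () => Promise<boolean>, // true = request allowed
  failOpen: boolean,
): Promise<boolean> {
  try {
    return await check();
  } catch {
    // Server unreachable: allow if fail-open, block if fail-closed.
    return failOpen;
  }
}
```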

Cost Estimation

The SDK doesn’t calculate costs directly. Use the control server for accurate cost tracking, or implement local estimation:
// USD per 1,000 tokens
const COSTS_PER_1K: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.005, output: 0.015 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
  "claude-3-5-sonnet-latest": { input: 0.003, output: 0.015 },
};

function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const rates = COSTS_PER_1K[model];
  if (!rates) return 0;
  return (inputTokens * rates.input + outputTokens * rates.output) / 1000;
}
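The beforeRequest example earlier estimated cost from the raw message list, while estimateCost takes token counts; bridging the two needs a rough token estimate. A common heuristic, an approximation rather than the provider's tokenizer, is about four characters per token:

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Rough heuristic: ~4 characters per token for English text.
// For exact counts, use the provider's tokenizer (e.g. tiktoken).
function estimateInputTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```

estimateCost(model, estimateInputTokens(messages), 0) then gives a pre-request lower bound; output tokens are unknown until the response arrives.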

Next Steps