> ## Documentation Index
> Fetch the complete documentation index at: https://docs.adenhq.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Testing Agents

> Validate agent behavior with goal-based and regression testing to build confidence and ensure production readiness

## Overview

Testing is the foundation of developer confidence. In Hive, comprehensive testing confirms three critical things:

* The agent meets success criteria defined by the goal
* Constraints are respected under normal and edge inputs
* Failure and escalation paths behave as expected

## Recommended Workflow

1. Generate or refine tests with the coding agent:

```bash theme={null}
claude> /hive-test
```

2. Run focused suites while iterating:

```bash theme={null}
PYTHONPATH=exports uv run pytest exports/your_agent/tests/ -v
```

3. Run goal-based checks before merge:

```bash theme={null}
uv run hive test-run exports/your_agent --goal your_goal_id
```

## Common Commands

### Run all tests for an agent

```bash theme={null}
PYTHONPATH=exports uv run pytest exports/your_agent/tests/ -v
```

### Run a single test

```bash theme={null}
PYTHONPATH=exports uv run pytest \
  exports/your_agent/tests/test_agent.py::test_happy_path -v
```

### Run goal-aware CLI test runner

```bash theme={null}
uv run hive test-run exports/your_agent --goal your_goal_id
```

### List generated tests

```bash theme={null}
uv run hive test-list exports/your_agent
```

### Debug a failing test

```bash theme={null}
uv run hive test-debug exports/your_agent test_constraint_budget_limit
```

## What to Test

### Goal Completion

* Primary success criteria are satisfied
* Weighted criteria do not regress across releases

### Constraints

* Hard constraints always fail safely
* Soft constraints emit warnings or fallback behavior

### Routing and Retries

* Conditional edges take the correct branch
* Retry loops terminate and do not stall the graph

### Human-in-the-Loop

* Pause/resume paths work
* Timeout and escalation behavior match requirements

## CI Example

```yaml theme={null}
name: Agent Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup
        run: ./quickstart.sh
      - name: Pytest
        run: PYTHONPATH=exports uv run pytest exports/*/tests/ -v --tb=short
```

## Best Practices

* Keep unit-level tests deterministic with mocked tool responses
* Add regression tests for every production failure you fix
* Treat constraints as mandatory API contracts, not optional hints
* Track test coverage across success, failure, retry, and HITL branches

> **Testing and Debugging**: Testing catches issues before production. Once your agent is live, [debugging](/building/debugging) tools help you diagnose and fix issues based on real-world behavior.

## Next Steps

<CardGroup cols={2}>
  <Card title="Build Your First Agent" icon="hammer" href="/building/first-agent">
    Create and export a new agent package
  </Card>

  <Card title="Human-in-the-Loop" icon="user" href="/building/human-in-the-loop">
    Add supervised decision points to sensitive actions
  </Card>

  <Card title="Debugging" icon="bug" href="/building/debugging">
    Diagnose and fix issues discovered in testing or production
  </Card>

  <Card title="Deployment" icon="rocket" href="/building/deployment">
    Deploy your tested agent to production
  </Card>
</CardGroup>
