
Overview

Aden provides built-in testing capabilities to validate your agents before deployment. Tests verify:
  • Goal completion - Does the agent achieve its objectives?
  • Edge cases - How does the agent handle unusual inputs?
  • Error handling - Does the agent recover from failures gracefully?
  • Human-in-the-loop - Do intervention points work correctly?

Quick Start

Use the Claude Code testing skill:
claude> /testing-agent
Or run tests manually:
PYTHONPATH=core:exports pytest exports/your_agent/tests/

Test Structure

Tests live alongside your agent:
exports/your_agent/
├── agent.json
├── tools.py
└── tests/
    ├── __init__.py
    ├── test_agent.py
    ├── test_nodes.py
    └── fixtures/
        ├── sample_inputs.json
        └── expected_outputs.json

Writing Tests

Basic Agent Test

import pytest
from framework.testing import AgentTestCase, MockLLM

class TestFAQAgent(AgentTestCase):
    agent_path = "exports/faq_agent"

    def test_answers_common_question(self):
        """Agent should answer common FAQ correctly."""
        result = self.run_agent({
            "question": "What is your refund policy?"
        })

        assert result.success
        assert "refund" in result.output["answer"].lower()
        assert result.output["confidence"] > 0.7

    def test_escalates_complex_question(self):
        """Agent should escalate questions it can't answer."""
        result = self.run_agent({
            "question": "Can you explain quantum entanglement?"
        })

        assert result.escalated
        assert result.escalation_node == "human_review"

Testing with Mock LLM

Control LLM responses for deterministic tests:
def test_with_mock_llm(self):
    mock_llm = MockLLM({
        "classify": {"category": "billing", "confidence": 0.95},
        "generate_response": {"answer": "Your bill is due on the 15th."}
    })

    result = self.run_agent(
        {"question": "When is my bill due?"},
        llm=mock_llm
    )

    assert result.output["answer"] == "Your bill is due on the 15th."

Testing Edge Paths

Verify specific edge conditions:
def test_high_risk_triggers_review(self):
    """High risk score should trigger human review."""
    mock_llm = MockLLM({
        "analyze_risk": {"risk_score": 0.9, "factors": ["unusual_amount"]}
    })

    result = self.run_agent(
        {"transaction": {"amount": 50000}},
        llm=mock_llm
    )

    # Verify the path taken
    assert "human_review" in result.nodes_executed
    assert result.paused_at == "human_review"

Testing Human-in-the-Loop

Simulate human responses:
def test_approval_flow(self):
    """Test the approval workflow."""
    result = self.run_agent(
        {"refund_amount": 500},
        human_responses={
            "manager_approval": {"approved": True, "notes": "Looks good"}
        }
    )

    assert result.success
    assert result.output["status"] == "refunded"

def test_rejection_flow(self):
    """Test rejection handling."""
    result = self.run_agent(
        {"refund_amount": 500},
        human_responses={
            "manager_approval": {"approved": False, "reason": "Suspicious activity"}
        }
    )

    assert not result.success
    assert result.output["status"] == "rejected"

Testing Timeouts

Verify timeout behavior:
def test_timeout_auto_rejects(self):
    """Timeout should auto-reject when configured."""
    result = self.run_agent(
        {"refund_amount": 500},
        human_responses={
            "manager_approval": "timeout"
        }
    )

    assert result.output["status"] == "rejected"
    assert result.output["reason"] == "approval_timeout"

Test Fixtures

Input Fixtures

// tests/fixtures/sample_inputs.json
{
  "common_questions": [
    {"question": "What is your refund policy?", "expected_category": "billing"},
    {"question": "How do I reset my password?", "expected_category": "account"},
    {"question": "What are your business hours?", "expected_category": "general"}
  ],
  "edge_cases": [
    {"question": "", "should_fail": true},
    {"question": "a".repeat(10000), "should_truncate": true}
  ]
}
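The layout above also lists expected_outputs.json. Its schema is not prescribed by the framework; one possible shape, pairing each case with the substring and confidence threshold you assert on (field names here are hypothetical), is:
// tests/fixtures/expected_outputs.json (hypothetical schema)
{
  "refund_policy": {"answer_contains": "refund", "min_confidence": 0.7},
  "password_reset": {"answer_contains": "reset", "min_confidence": 0.7},
  "business_hours": {"answer_contains": "hours", "min_confidence": 0.7}
}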

Parametrized Tests

import json
from pathlib import Path

import pytest
from framework.testing import AgentTestCase

def load_fixtures():
    fixtures = Path(__file__).parent / "fixtures" / "sample_inputs.json"
    return json.loads(fixtures.read_text())

class TestCommonQuestions(AgentTestCase):
    agent_path = "exports/faq_agent"

    @pytest.mark.parametrize("case", load_fixtures()["common_questions"])
    def test_common_questions(self, case):
        result = self.run_agent({"question": case["question"]})
        assert result.output["category"] == case["expected_category"]

Goal-Based Testing

Test against the agent’s defined goals:
def test_meets_success_criteria(self):
    """Agent should meet its 80% accuracy goal."""
    test_cases = load_fixtures()["test_set"]

    correct = 0
    for case in test_cases:
        result = self.run_agent({"question": case["question"]})
        if result.output["answer"] == case["expected_answer"]:
            correct += 1

    accuracy = correct / len(test_cases)
    assert accuracy >= 0.8, f"Accuracy {accuracy:.1%} below 80% goal"

Running Tests

All Tests

PYTHONPATH=core:exports pytest exports/your_agent/tests/ -v

Specific Test

PYTHONPATH=core:exports pytest exports/your_agent/tests/test_agent.py::TestFAQAgent::test_answers_common_question -v

With Coverage

PYTHONPATH=core:exports pytest exports/your_agent/tests/ --cov=exports/your_agent --cov-report=html

Test Configuration

Configure test behavior in pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["exports/*/tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
asyncio_mode = "auto"

[tool.coverage.run]
source = ["exports"]
omit = ["*/tests/*"]

CI Integration

Example GitHub Actions workflow:
name: Test Agents

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: ./scripts/setup-python.sh

      - name: Run tests
        run: |
          PYTHONPATH=core:exports pytest exports/*/tests/ -v --tb=short
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

Best Practices

Mock External Calls

Use mocks for LLMs and APIs to ensure deterministic tests
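MockLLM covers model calls; for tools that reach external APIs, the standard library's unittest.mock works as well. A minimal sketch, assuming your tools.py exposes a hypothetical lookup_billing_record function:
from unittest.mock import patch

def test_billing_lookup_is_mocked(self):
    """Stub the external billing API so the test never touches the network."""
    # your_agent.tools.lookup_billing_record is hypothetical -- patch whatever your tools.py exposes
    with patch("your_agent.tools.lookup_billing_record") as mock_lookup:
        mock_lookup.return_value = {"due_date": "2024-06-15", "balance": 42.00}

        result = self.run_agent({"question": "When is my bill due?"})

    mock_lookup.assert_called_once()
    assert result.success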

Test Edge Cases

Include empty inputs, long inputs, and malformed data
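For example, using the same run_agent helper (the exact failure behavior depends on how your agent validates input):
def test_empty_question_fails_gracefully(self):
    """An empty question should be rejected, not crash the graph."""
    result = self.run_agent({"question": ""})
    assert not result.success

def test_very_long_question_still_completes(self):
    """A 10,000-character question should still produce an answer."""
    result = self.run_agent({"question": "a" * 10_000})
    assert result.success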

Test All Paths

Verify each edge in your graph is exercised by tests
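A node-level approximation is to run your fixture set and collect nodes_executed across runs; the node names below are placeholders for the nodes defined in your agent.json:
def test_fixture_set_covers_all_nodes(self):
    """Every node in the graph should be exercised by at least one fixture case."""
    executed = set()
    for case in load_fixtures()["common_questions"]:
        result = self.run_agent({"question": case["question"]})
        executed.update(result.nodes_executed)

    expected_nodes = {"classify", "generate_response", "human_review"}  # placeholder node names
    assert expected_nodes <= executed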

Test HITL Flows

Simulate approval, rejection, and timeout scenarios
