## Three-Level Observability
Hive uses three levels of logging to give you the right amount of detail at each stage of debugging. Start at L1 for the big picture, then drill down as needed.

| Level | What It Captures | File | When to Use |
|---|---|---|---|
| L1 (Summary) | Run outcomes — success/failure, execution quality, attention flags | `summary.json` | Quick health check: did the agent succeed? |
| L2 (Details) | Per-node results — retries, verdicts, latency, attention reasons | `details.jsonl` | Investigating which node failed or why routing went wrong |
| L3 (Tool Logs) | Step-by-step execution — tool calls, LLM responses, judge feedback | `tool_logs.jsonl` | Deep debugging: exactly what the LLM saw, said, and did |
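As a rough sketch of an L1 health check in Python (the `status` and `attention_flags` field names are illustrative assumptions, not a documented schema):

```python
import json

# L1: summary.json is a single JSON object describing the whole run.
with open("summary.json") as f:
    summary = json.load(f)

# "status" and "attention_flags" are hypothetical field names;
# check your own log files for the actual keys.
if summary.get("status") != "success" or summary.get("attention_flags"):
    print("Run needs a closer look:", summary.get("attention_flags"))
```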
## Debugging Workflow
1. **Check the summary.** Look at `summary.json` for the run outcome. Did the agent succeed or fail? Are there attention flags?
2. **Identify the failing node.** If the run failed, check `details.jsonl` to find which node produced the error or unexpected result. Look for retries, timeouts, and attention reasons.
3. **Inspect tool-level execution.** For the failing node, examine `tool_logs.jsonl` to see the exact sequence of tool calls, LLM responses, and judge evaluations. (A scripted version of steps 2 and 3 is sketched after this list.)
4. **Reproduce interactively.** Use the TUI to run the agent again with the same input and watch execution in real time.
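Continuing the sketch above, steps 2 and 3 can be scripted against the JSONL files. The `node`, `status`, `retries`, `tool`, and `result` fields are assumed names for illustration:

```python
import json

def read_jsonl(path):
    """Load a newline-delimited JSON file into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Step 2: find the node(s) that failed or needed retries.
details = read_jsonl("details.jsonl")
failing = [d for d in details
           if d.get("status") != "success" or d.get("retries", 0) > 0]
for d in failing:
    print(d.get("node"), "status:", d.get("status"), "retries:", d.get("retries"))

# Step 3: pull the tool-level trace for the first failing node.
if failing:
    node = failing[0].get("node")
    trace = [t for t in read_jsonl("tool_logs.jsonl") if t.get("node") == node]
    for step in trace:
        print(step.get("tool"), "->", step.get("result"))
```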
## AI-Assisted Debugging
Hive gives you an AI-assisted experience for checking logs, surfacing high signal-to-noise insights. Instead of reading through raw log files, the debugging tools help you:

- Identify the root cause of failures from log patterns
- Suggest specific fixes based on common failure modes
- Highlight attention-worthy details you might otherwise miss
## Common Issues
### Agent fails on a specific tool call
Check L3 logs for the exact tool call and its response. Common causes: missing credentials, API rate limits, or unexpected input format. Verify your credential store has the right keys set up.
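As an illustrative way to pull failing calls out of the L3 log (the `tool` and `error` field names, and the rate-limit heuristics, are assumptions):

```python
import json

RATE_LIMIT_HINTS = ("429", "rate limit", "quota")

with open("tool_logs.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        err = entry.get("error")  # hypothetical field name
        if not err:
            continue
        # Crude triage: separate rate-limit errors from everything else.
        kind = "rate limit" if any(h in str(err).lower() for h in RATE_LIMIT_HINTS) else "other"
        print(f"{entry.get('tool')}: [{kind}] {err}")
```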
### Agent succeeds but produces wrong output
Check L2 logs for routing decisions — the agent may be taking the wrong path through the graph. Review your edge conditions and goal criteria to make sure they match your intent.
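A minimal sketch for auditing the path the agent took, assuming each `details.jsonl` record carries `node`, `edge_taken`, and `verdict` fields (illustrative names):

```python
import json

with open("details.jsonl") as f:
    for line in f:
        d = json.loads(line)
        # Print the path the agent actually took so you can compare it
        # against the route you expected through the graph.
        print(f"{d.get('node')} -> {d.get('edge_taken')} (verdict: {d.get('verdict')})")
```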
### Agent gets stuck in a retry loop
Check L2 for the retry count and L3 for why each attempt fails. The root cause is often an unreachable success condition in the node’s evaluation logic. Adjust the node’s constraints or add a fallback edge.
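Assuming per-node records expose a `retries` count (an illustrative field name), a quick way to spot the loop:

```python
import json
from collections import Counter

retry_counts = Counter()
with open("details.jsonl") as f:
    for line in f:
        d = json.loads(line)
        retry_counts[d.get("node")] += d.get("retries", 0)

# The looping node usually stands out at the top.
for node, count in retry_counts.most_common(5):
    print(node, count)
```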
### Human-in-the-loop node times out
Check the timeout configuration in your HITL node. Consider adjusting the timeout duration, changing the escalation policy, or adding an auto-approve fallback for low-risk decisions.
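The exact configuration syntax depends on your setup; purely as an illustrative sketch, the knobs mentioned above might look like:

```python
# Hypothetical HITL node settings; the names are illustrative, not Hive's actual schema.
hitl_config = {
    "timeout_seconds": 3600,               # how long to wait for a human decision
    "escalation_policy": "notify_oncall",  # what to do when the timeout fires
    "auto_approve": {
        "enabled": True,                   # fallback for low-risk decisions
        "max_risk_score": 0.2,             # only auto-approve below this threshold
    },
}
```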
### Agent costs more than expected
Check L1 for total token usage and L2 for per-node costs. Agents that retry frequently or use expensive models on simple tasks drive up costs. Review your model configuration and set budget limits.
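A sketch for attributing token usage by node, assuming `details.jsonl` records carry `input_tokens` and `output_tokens` counts (illustrative fields):

```python
import json
from collections import defaultdict

tokens_by_node = defaultdict(int)
with open("details.jsonl") as f:
    for line in f:
        d = json.loads(line)
        tokens_by_node[d.get("node")] += d.get("input_tokens", 0) + d.get("output_tokens", 0)

# Highest-consuming nodes first: candidates for a cheaper model or fewer retries.
for node, tokens in sorted(tokens_by_node.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {tokens} tokens")
```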
## Testing vs Debugging
Testing verifies your agent works correctly. Debugging investigates why it doesn't. Use them together:

- Before deployment: Write tests for your goal criteria and constraints
- After a failure: Use the debugging workflow above to diagnose and fix
- After a fix: Add a regression test so the same failure doesn't happen again (see the sketch below)
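What a regression test looks like depends on your harness; as a generic pytest-style sketch with a hypothetical `run_agent` entry point (not a documented Hive API):

```python
# test_regression.py
from my_agent import run_agent  # assumption: however your project invokes the agent

def test_previously_failing_input():
    # Pin the exact input that triggered the original failure.
    result = run_agent("summarize the Q3 report")
    assert result.status == "success"
    assert "Q3" in result.output  # assert on the behavior that used to be wrong
```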