Tutorial 8: AI Agents for Engineering
- Contributor
- Jun 4
Agents carry out multi-step tasks autonomously. That means bigger upside, and bigger risk if they run unchecked.
Step 1: What an Agent Is (10 min)
A regular AI: you ask; it answers.
An agent: you ask; it plans, executes, checks, iterates.
Example:
"Add a /users endpoint with CRUD."
Agent:
Reads existing routes
Designs schema
Writes the endpoint
Writes tests
Runs tests
Fixes failures
Commits
You review at the end.
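The plan, execute, check, iterate cycle above can be sketched as a small loop. This is a minimal illustration, not how any particular agent is implemented: `run_agent`, `AgentRun`, and the four callables (`plan`, `execute`, `check`, `fix`) are hypothetical names, and a real agent would back each one with LLM calls and tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentRun:
    """What the loop did, kept so a human can review it afterwards."""
    steps: List[str] = field(default_factory=list)
    succeeded: bool = False


def run_agent(
    task: str,
    plan: Callable[[str], List[str]],   # hypothetical: turns a task into steps
    execute: Callable[[str], None],     # hypothetical: performs one step
    check: Callable[[], bool],          # hypothetical: e.g. "do the tests pass?"
    fix: Callable[[], None],            # hypothetical: attempts a repair
    max_fixes: int = 3,
) -> AgentRun:
    """Plan, execute every step, then check and fix until the check passes."""
    run = AgentRun()
    for step in plan(task):
        execute(step)
        run.steps.append(step)

    fixes_left = max_fixes
    while not check():
        if fixes_left == 0:
            return run  # give up; succeeded stays False
        fix()
        run.steps.append("fix failures")
        fixes_left -= 1
    run.succeeded = True
    return run
```

The `max_fixes` cap is a design choice for the sketch: without a bound, an agent that cannot make the check pass would iterate (and spend tokens) indefinitely. The recorded `steps` list is what you would read when you review at the end.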
Step 2: When Agents Help (10 min)
Agents shine when:
Task is well-defined
Tools to execute are available (file edit, shell, git)
Iteration is straightforward
Risk of mistakes is bounded
✓ "Add a feature with tests" — bounded
✓ "Refactor X to Y across the codebase" — bounded
✓ "Find and fix lint errors" — bounded
✗ "Improve the system" — open-ended; agent flails
Step 3: Tools Available (10 min)
Modern agent tools (Claude Code, Aider, OpenHands, etc.) can:
Read / write files
Run shell commands
Execute tests
Search the codebase
Use the web
Call APIs
More tools = more autonomy.
Some agents have all of these; some have only a limited set. Pick by task.
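One way to picture "pick by task" is an allowlist: the agent only gets the tools a given task needs. The sketch below is illustrative only. The tool names and the `tools_for_task` helper are hypothetical, the tool bodies are stand-ins that return strings, and real agents name and configure their tools differently.

```python
from typing import Callable, Dict, Iterable

# Hypothetical tool registry. Each tool is a stand-in that just describes
# what it would do; a real tool would touch files, the shell, or the network.
ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda arg: f"read {arg}",
    "write_file": lambda arg: f"write {arg}",
    "run_shell": lambda arg: f"shell {arg}",
    "run_tests": lambda arg: f"tests {arg}",
    "search_code": lambda arg: f"search {arg}",
    "web_search": lambda arg: f"web {arg}",
    "call_api": lambda arg: f"api {arg}",
}


def tools_for_task(allowed: Iterable[str]) -> Dict[str, Callable[[str], str]]:
    """Return only the allowlisted tools; reject names that do not exist."""
    names = list(allowed)
    unknown = [name for name in names if name not in ALL_TOOLS]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    return {name: ALL_TOOLS[name] for name in names}


# A lint-fixing task needs files and tests, but no web or API access.
lint_tools = tools_for_task(["read_file", "write_file", "run_tests"])
```

Fewer tools means less autonomy, but also fewer ways for a run to go wrong.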
Step 4: Working With Claude Code (15 min)
claude
Tell it:
"Implement a /users CRUD endpoint matching the existing /products
pattern. Use Pydantic models. Add tests."
It:
Lists files (ls)
Reads /products impl
Writes /users files
Runs tests
Reports
You review the diff; merge.
Step 5: Start Small (10 min)
First time using an agent:
Small bounded task
One file ideally
Easy to verify
Not:
"Refactor the codebase"
"Add the new feature"
Small successes; build trust. Then scale.
Step 6: Watch the Action (15 min)
For first agent runs:
Watch each step
Catch hallucinations
Stop if it goes off track
Don't:
Walk away while it works
Trust it on day one
Accept the final result without reading
After 5-10 successful runs: more autonomy.
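The idea of watching every step at first and loosening up later can be sketched as an approval gate. This is a hypothetical helper, not a feature of any specific agent: `supervised_step`, `needs_approval`, and the `ask` callback are invented for illustration, and the threshold of 5 is simply the lower end of the 5-10 range above.

```python
from typing import Callable


def needs_approval(successful_runs: int, threshold: int = 5) -> bool:
    """Require a human OK for each step until enough runs have gone well."""
    return successful_runs < threshold


def supervised_step(
    description: str,
    action: Callable[[], None],
    ask: Callable[[str], bool],   # hypothetical: shows the step, returns True to proceed
    successful_runs: int,
) -> bool:
    """Run the action, pausing for approval first while trust is still low."""
    if needs_approval(successful_runs) and not ask(description):
        return False  # human stopped the step; nothing was executed
    action()
    return True
```

In an interactive session, `ask` could be as simple as `lambda d: input(f"{d}? [y/N] ") == "y"`. Lowering the gate does not replace reading the final diff.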
Step 7: Branch + Commits (10 min)
Always run agents on a branch:
git checkout -b agent-add-users-endpoint
Many agents commit as they go (Aider does by default; others commit when you ask them to).
Review the branch when done. Squash; merge to main.
Easy to throw away if the agent went wrong.
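If you script your agent runs, the branch workflow above can be wrapped in a couple of helpers. This is a sketch under assumptions: the function names and the `agent-` prefix convention are hypothetical, the base branch is assumed to be `main`, and the `run` parameter exists only so the git calls can be swapped out in tests.

```python
import re
import subprocess
from typing import Callable


def branch_name(task: str, prefix: str = "agent-") -> str:
    """Turn a task description into a branch name like agent-add-users-endpoint."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return prefix + slug


def start_agent_branch(task: str, run: Callable = subprocess.run) -> str:
    """Create and switch to a fresh branch for the agent to work on."""
    name = branch_name(task)
    run(["git", "checkout", "-b", name], check=True)
    return name


def discard_agent_branch(name: str, base: str = "main", run: Callable = subprocess.run) -> None:
    """Throw the agent's work away: return to the base branch, delete the branch."""
    run(["git", "checkout", base], check=True)
    run(["git", "branch", "-D", name], check=True)
```

`git branch -D` force-deletes even unmerged work, which is the point here: a bad agent run should cost you one command, not a cleanup session.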
Step 8: Sandboxing (10 min)
For risky tasks:
Run in container
Limited file access
No production credentials
Don't give an agent your prod DB password.
For local code work, running without a sandbox is usually fine.
For anything with network / data: think about blast radius.
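One cheap way to keep production credentials away from an agent is to launch it with a scrubbed environment. The sketch below is a partial measure under stated assumptions: `scrubbed_env` and the marker list are hypothetical, name matching will miss secrets stored under innocuous names or in files, and it does not replace a container.

```python
import os
from typing import Dict, Iterable, Mapping

# Hypothetical markers; extend to match how your team names secrets.
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY", "CREDENTIAL")


def scrubbed_env(env: Mapping[str, str], keep: Iterable[str] = ()) -> Dict[str, str]:
    """Copy env without variables whose names look like secrets.

    Names listed in `keep` survive, e.g. the one API key the agent itself needs.
    """
    kept = set(keep)
    return {
        name: value
        for name, value in env.items()
        if name in kept or not any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    }


# Environment to hand to the agent process instead of your full shell environment.
agent_env = scrubbed_env(os.environ)
```

The result can be passed as `env=agent_env` to `subprocess.run` when starting the agent, so anything it spawns inherits the reduced environment.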
Step 9: Cost of Agents (10 min)
Agents are token-hungry:
Multiple LLM calls per task
Reading many files
Iterating on failures
A single task can cost $1-$10 on frontier models.
For high-value tasks: worth it. For trivial work: overkill.
Budget your agent use.
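A back-of-the-envelope estimate shows where the cost comes from. Every number below is an assumption chosen for illustration (call count, tokens per call, and per-million-token prices); check your provider's current pricing before budgeting.

```python
def estimate_cost(
    calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimated dollars for one task: tokens times price, summed over calls."""
    input_cost = calls * input_tokens_per_call / 1_000_000 * input_price_per_mtok
    output_cost = calls * output_tokens_per_call / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Hypothetical task: 40 LLM calls, each re-reading about 20k tokens of context
# and writing about 1k tokens, at assumed prices of $3 and $15 per million tokens.
cost = estimate_cost(40, 20_000, 1_000, 3.0, 15.0)
print(f"${cost:.2f}")  # $3.00
```

Note that input tokens dominate in this example: the agent re-sends file contents and history on every call, which is why reading many files and iterating on failures drives the bill.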
Step 10: Verify Always (10 min)
Agents output:
Code (sometimes wrong)
Tests (sometimes weak)
Documentation (sometimes drifts from the code)
Read the diff. Run the tests. Smoke test the feature.
Auto-commit + auto-merge from an agent = bugs in main.
Always: human reviews before merge.
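To see why "the tests pass" is not enough, here is a contrived example of a weak test hiding a bug. The function and both tests are invented for illustration: `apply_discount` is deliberately wrong, and the weak test happens to use the one input where the bug is invisible.

```python
def apply_discount(price: float, percent: float) -> float:
    """Deliberately buggy: subtracts the percent as an absolute amount."""
    return price - percent


def weak_test() -> bool:
    # 10% off 100 is 90, and the buggy code also returns 100 - 10 = 90.
    return apply_discount(100.0, 10.0) == 90.0


def strong_test() -> bool:
    # 10% off 50 should be 45, but the buggy code returns 50 - 10 = 40.
    return apply_discount(50.0, 10.0) == 45.0


print(weak_test())    # True: the suite is green
print(strong_test())  # False: the bug is real
```

When reading an agent's diff, check what its tests actually assert, not only that they pass.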
What You Just Did
You covered AI agents: what they are, when they help, the tools they use, working with Claude Code, starting small, watching each run, branches, sandboxing, cost, and verification. Together these make for productive agentic use.
Common Failure Modes
Handing an agent an open-ended task. It wanders.
Walking away during an agent run. Hallucinations slip through.
Auto-merging agent output. Bugs land in main.
Giving an agent production credentials. The blast radius is unbounded.
Using agents for trivial tasks. Cost exceeds value.
Next Tutorial
Limits: Tutorial 9: When Not to Use AI.
Related reading
Keep learning. This article is part of the AI in Quality & Delivery path in the ShiftQuality Learning Center. Use AI in delivery — and evaluate it honestly — without the hype.


