Tutorial 8: AI Agents for Engineering

Contributor
Jun 4

Agents carry out multi-step tasks autonomously. That means a bigger upside, and a bigger risk if they run unchecked.

Step 1: What an Agent Is (10 min)

A regular AI: you ask; it answers.

An agent: you ask; it plans, executes, checks, iterates.

Example:

"Add a /users endpoint with CRUD."

Agent:

Reads existing routes
Designs schema
Writes the endpoint
Writes tests
Runs tests
Fixes failures
Commits

You review at the end.
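The plan, execute, check, iterate cycle can be sketched as a small loop. This is only an illustration of the control flow, not how any particular agent is built: `run_agent` and its callback parameters are hypothetical names, and the lambdas below stand in for the LLM and the tools.

```python
from typing import Callable, List


def run_agent(
    task: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], None],
    check: Callable[[], List[str]],
    fix: Callable[[str], None],
    max_iterations: int = 5,
) -> bool:
    """Plan and execute a task, then check and fix until clean or out of budget."""
    # Plan, then execute each planned step in order.
    for step in plan(task):
        execute(step)

    # Check and iterate: each round collects failures and attempts fixes.
    for _ in range(max_iterations):
        failures = check()
        if not failures:
            return True
        for failure in failures:
            fix(failure)

    # Out of iterations: report whether the last round of fixes worked.
    return not check()


# Toy run with stubs: one failing test that the stub "fix" resolves.
steps_run: List[str] = []
open_failures = ["test_create_user"]
succeeded = run_agent(
    task="Add a /users endpoint with CRUD.",
    plan=lambda task: ["read routes", "write endpoint", "write tests"],
    execute=steps_run.append,
    check=lambda: list(open_failures),
    fix=open_failures.remove,
)
print(succeeded, steps_run)
```

The `max_iterations` cap is the important design choice: without it, an agent that cannot fix a failure would loop (and spend tokens) indefinitely.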

Step 2: When Agents Help (10 min)

Agents shine when:

Task is well-defined
Tools to execute are available (file edit, shell, git)
Iteration is straightforward
Risk of mistakes is bounded

✓ "Add a feature with tests" — bounded
✓ "Refactor X to Y across the codebase" — bounded
✓ "Find and fix lint errors" — bounded
✗ "Improve the system" — open-ended; agent flails

Step 3: Tools Available (10 min)

Modern agent tools (Claude Code, Aider, OpenHands, etc.) can:

Read / write files
Run shell commands
Execute tests
Search the codebase
Use the web
Call APIs

More tools = more autonomy.

Some agents have all of these; some are limited. Pick by task.
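One way to picture "pick by task" is a per-task allow-list of tools. The sketch below is an assumption about how such a registry could look, not the configuration format of Claude Code, Aider, or OpenHands; the tool names and the `select_tools` and `call_tool` helpers are hypothetical.

```python
from typing import Callable, Dict, Iterable

Tool = Callable[[str], str]

# Hypothetical stand-ins for real tools; each just describes what it would do.
ALL_TOOLS: Dict[str, Tool] = {
    "read_file": lambda arg: f"would read {arg}",
    "write_file": lambda arg: f"would write {arg}",
    "run_shell": lambda arg: f"would run {arg}",
    "search_code": lambda arg: f"would search for {arg}",
    "web": lambda arg: f"would fetch {arg}",
}


def select_tools(allowed: Iterable[str]) -> Dict[str, Tool]:
    """Return only the tools enabled for this task."""
    names = list(allowed)
    unknown = [name for name in names if name not in ALL_TOOLS]
    if unknown:
        raise KeyError(f"unknown tools: {unknown}")
    return {name: ALL_TOOLS[name] for name in names}


def call_tool(tools: Dict[str, Tool], name: str, arg: str) -> str:
    """Run a tool, refusing anything outside the task's allow-list."""
    if name not in tools:
        raise PermissionError(f"tool {name!r} is not enabled for this task")
    return tools[name](arg)


# A lint-fixing task needs files and a shell, but no web access.
lint_tools = select_tools(["read_file", "write_file", "run_shell"])
print(call_tool(lint_tools, "run_shell", "ruff check ."))
```

Fewer enabled tools means less autonomy, but also fewer ways for a run to go wrong.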

Step 4: Working With Claude Code (15 min)

From your project directory, start Claude Code:

claude

Tell it:

"Implement a /users CRUD endpoint matching the existing /products
pattern. Use Pydantic models. Add tests."

It:

Lists files (ls)
Reads /products impl
Writes /users files
Runs tests
Reports

You review the diff; merge.

Step 5: Start Small (10 min)

First time using an agent:

Small bounded task
One file ideally
Easy to verify

Not:

"Refactor the codebase"
"Add the new feature"

Small successes; build trust. Then scale.

Step 6: Watch the Action (15 min)

For first agent runs:

Watch each step
Catch hallucinations
Stop if it goes off track

Don't:

Walk away while it works
Trust it on day one
Accept the final result without reading

After 5-10 successful runs: more autonomy.

Step 7: Branch + Commits (10 min)

Always run agents on a branch:

git checkout -b agent-add-users-endpoint

Most agents commit as they go.

Review the branch when done. Squash; merge to main.

Easy to throw away if the agent went wrong.
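The two ways a branch can end (squash and merge, or throw away) map to a few git commands. The helper below is a hypothetical sketch that only builds the command lists; it assumes the base branch is called main and leaves running them (for example with `subprocess.run`) to you.

```python
from typing import List


def finish_branch_commands(branch: str, keep: bool, base: str = "main") -> List[List[str]]:
    """Build the git commands to either squash-merge or discard an agent branch."""
    if keep:
        return [
            ["git", "checkout", base],
            # Stage all of the branch's changes as one pending commit.
            ["git", "merge", "--squash", branch],
            ["git", "commit", "-m", f"Squash merge of {branch}"],
        ]
    return [
        ["git", "checkout", base],
        # -D force-deletes the branch even though it was never merged.
        ["git", "branch", "-D", branch],
    ]


for command in finish_branch_commands("agent-add-users-endpoint", keep=False):
    print(" ".join(command))
```

Because the agent's work never touched the base branch, discarding a bad run is a single branch delete.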

Step 8: Sandboxing (10 min)

For risky tasks:

Run in container
Limited file access
No production credentials

Don't give an agent your prod DB password.

For local code work, running without a sandbox is usually fine.

For anything with network / data: think about blast radius.
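One cheap way to shrink the blast radius is to strip secrets from the environment before launching an agent process. This is a minimal sketch under stated assumptions: `scrubbed_env` is a hypothetical helper, the marker list is a guess at common naming patterns, and name-based filtering is a heuristic, not a substitute for a container.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Assumed naming patterns for secrets; adjust to your environment.
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY", "CREDENTIAL")


def scrubbed_env(
    env: Optional[Mapping[str, str]] = None,
    keep: Iterable[str] = (),
) -> Dict[str, str]:
    """Copy an environment, dropping variables whose names look like secrets."""
    source = os.environ if env is None else env
    keep_names = set(keep)
    clean: Dict[str, str] = {}
    for name, value in source.items():
        looks_sensitive = any(marker in name.upper() for marker in SENSITIVE_MARKERS)
        # Explicitly kept names survive even if they look sensitive.
        if name in keep_names or not looks_sensitive:
            clean[name] = value
    return clean


# Example environment with made-up variable names and values.
example_env = {
    "PATH": "/usr/bin",
    "PROD_DB_PASSWORD": "not-a-real-password",
    "AGENT_API_KEY": "not-a-real-key",
}
print(scrubbed_env(example_env, keep=["AGENT_API_KEY"]))
```

The result can be passed as the `env` argument of `subprocess.run` when starting the agent. The `keep` parameter exists because the agent usually still needs its own API key, just not yours for production.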

Step 9: Cost of Agents (10 min)

Agents are token-hungry:

Multiple LLM calls per task
Reading many files
Iterating on failures

A single task can cost roughly $1-$10 on frontier models.

For high-value tasks: worth it. For trivial work: overkill.

Budget your agent use.
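A back-of-the-envelope estimate shows how the cost adds up. Every number below is a placeholder assumption (call counts, token counts, and per-million-token prices), not a real price list; check your provider's current pricing before budgeting.

```python
def estimate_task_cost(
    calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the dollar cost of one agent task from token usage."""
    input_cost = calls * input_tokens_per_call * input_price_per_mtok / 1_000_000
    output_cost = calls * output_tokens_per_call * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder figures: 40 LLM calls, each re-reading about 30k tokens of context
# and writing about 1k tokens, at assumed prices of $3 and $15 per million tokens.
cost = estimate_task_cost(
    calls=40,
    input_tokens_per_call=30_000,
    output_tokens_per_call=1_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
)
print(f"${cost:.2f}")
```

In this example, input tokens dominate: the agent rereads files and context on every call, which is why long runs over many files get expensive.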

Step 10: Verify Always (10 min)

Agents output:

Code (sometimes wrong)
Tests (sometimes weak)
Documentation (sometimes drifts from the code)

Read the diff. Run the tests. Smoke test the feature.

Auto-commit + auto-merge from an agent = bugs in main.

Always: human reviews before merge.
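The checks above can be written down as an explicit merge gate. This is a hypothetical sketch, not a feature of any agent tool: `ReviewState`, `blocking_reasons`, and `can_merge` are illustrative names for a checklist you would fill in by hand or wire into CI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewState:
    diff_read: bool = False
    tests_passed: bool = False
    smoke_tested: bool = False
    human_approved: bool = False


def blocking_reasons(state: ReviewState) -> List[str]:
    """List every unmet condition that should block a merge."""
    reasons = []
    if not state.diff_read:
        reasons.append("diff not read")
    if not state.tests_passed:
        reasons.append("tests not passing")
    if not state.smoke_tested:
        reasons.append("feature not smoke tested")
    if not state.human_approved:
        reasons.append("no human approval")
    return reasons


def can_merge(state: ReviewState) -> bool:
    """Allow a merge only when nothing is blocking."""
    return not blocking_reasons(state)


# Green tests alone are not enough: the agent's own tests may be weak.
agent_only = ReviewState(tests_passed=True)
print(can_merge(agent_only), blocking_reasons(agent_only))
```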

What You Just Did

You covered what AI agents are, when they help, the tools they use, working with Claude Code, starting small, watching runs, branches, sandboxing, cost, and verification: the basics of productive agentic use.

Common Failure Modes

Handing the agent an open-ended task. It wanders.

Walking away during an agent run. Hallucinations slip through.

Auto-merging agent output. Bugs land in main.

Giving the agent production credentials. The blast radius is unbounded.

Using agents for trivial tasks. Cost exceeds value.

Next Tutorial

Limits: Tutorial 9: When Not to Use AI.

ShiftQuality