
Tutorial 8: AI Agents for Engineering

  • Contributor
  • Jun 4
  • 3 min read

Agents carry out multi-step tasks autonomously. The upside is bigger, and so is the risk if they run unchecked.

Step 1: What an Agent Is (10 min)

A regular AI: you ask; it answers.

An agent: you ask; it plans, executes, checks, iterates.

Example:

"Add a /users endpoint with CRUD."

Agent:

  • Reads existing routes

  • Designs schema

  • Writes the endpoint

  • Writes tests

  • Runs tests

  • Fixes failures

  • Commits

You review at the end.
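The plan, execute, check, iterate cycle can be sketched in a few lines. This is a toy loop, not how any particular agent is built: `run_agent`, `AgentRun`, and the four callables are hypothetical names, and a real agent would back them with an LLM and real tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentRun:
    """Record of what the loop did, so a human can review it at the end."""
    log: List[str] = field(default_factory=list)
    succeeded: bool = False


def run_agent(
    plan: Callable[[str], List[str]],
    execute: Callable[[str], None],
    check: Callable[[], List[str]],
    fix: Callable[[str], None],
    task: str,
    max_iterations: int = 3,
) -> AgentRun:
    """Plan once, execute every step, then check and fix until checks pass."""
    run = AgentRun()
    for step in plan(task):
        execute(step)
        run.log.append(f"executed: {step}")

    failures = check()
    attempts = 0
    # Iterate on failures, but give up after a bounded number of rounds.
    while failures and attempts < max_iterations:
        for failure in failures:
            fix(failure)
            run.log.append(f"attempted fix: {failure}")
        attempts += 1
        failures = check()

    run.succeeded = not failures
    return run
```

The `max_iterations` cap matters: without it, an agent that cannot fix a failure keeps trying (and spending) forever.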

Step 2: When Agents Help (10 min)

Agents shine when:

  • Task is well-defined

  • Tools to execute are available (file edit, shell, git)

  • Iteration is straightforward

  • Risk of mistakes is bounded

✓ "Add a feature with tests" — bounded
✓ "Refactor X to Y across the codebase" — bounded
✓ "Find and fix lint errors" — bounded
✗ "Improve the system" — open-ended; agent flails

Step 3: Tools Available (10 min)

Modern agent tools (Claude Code, Aider, OpenHands, etc.) can:

  • Read / write files

  • Run shell commands

  • Execute tests

  • Search the codebase

  • Use the web

  • Call APIs

More tools = more autonomy.

Some agents have all of these; some have only a subset. Pick by task.
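One way to picture "more tools = more autonomy" is a registry where each run is granted only some of the tools. The sketch below is an illustration only: `ALL_TOOLS`, `tools_for`, and `call_tool` are hypothetical names, and real agent tools expose their own permission settings.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable


def read_file(path: str) -> str:
    """Tool: return the text of a file."""
    return Path(path).read_text(encoding="utf-8")


def run_shell(command: str) -> str:
    """Tool: run a shell command and return its combined output."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr


# Hypothetical registry: every tool this toy agent could ever be given.
ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_shell": run_shell,
}


def tools_for(allowed: Iterable[str]) -> Dict[str, Callable[[str], str]]:
    """Return only the tools granted for this task."""
    requested = set(allowed)
    unknown = requested - set(ALL_TOOLS)
    if unknown:
        raise KeyError(f"unknown tools: {sorted(unknown)}")
    return {name: ALL_TOOLS[name] for name in sorted(requested)}


def call_tool(tools: Dict[str, Callable[[str], str]], name: str, argument: str) -> str:
    """Dispatch a tool call, refusing anything that was not granted."""
    if name not in tools:
        raise PermissionError(f"tool not granted: {name}")
    return tools[name](argument)
```

A read-only review task might get `tools_for({"read_file"})`, so a request to run a shell command fails instead of executing.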

Step 4: Working With Claude Code (15 min)

Start it from your project directory:

claude

Tell it:

"Implement a /users CRUD endpoint matching the existing /products
pattern. Use Pydantic models. Add tests."

It:

  • Lists files (ls)

  • Reads /products impl

  • Writes /users files

  • Runs tests

  • Reports

You review the diff, then merge.

Step 5: Start Small (10 min)

First time using an agent:

  • Small bounded task

  • One file ideally

  • Easy to verify

Not:

  • "Refactor the codebase"

  • "Add the new feature" (the whole thing, in one go)

Small successes; build trust. Then scale.

Step 6: Watch the Action (15 min)

For first agent runs:

  • Watch each step

  • Catch hallucinations

  • Stop if it goes off track

Don't:

  • Walk away while it works

  • Trust it on day one

  • Accept the final result without reading

After 5-10 successful runs, give it more autonomy.
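Watching each step amounts to putting an approval gate in front of every action. A minimal sketch, assuming the agent's actions can be represented as strings; `supervised_run` and `ask_human` are hypothetical names, and many agent tools ship their own confirmation prompts that do this for you.

```python
from typing import Callable, List, Optional, Tuple


def supervised_run(
    actions: List[str],
    approve: Callable[[str], bool],
    perform: Callable[[str], None],
) -> Tuple[List[str], Optional[str]]:
    """Perform actions one at a time, stopping at the first one not approved.

    Returns the actions that ran and the action that was rejected (or None).
    """
    done: List[str] = []
    for action in actions:
        if not approve(action):
            return done, action  # stop: the run went off track here
        perform(action)
        done.append(action)
    return done, None


def ask_human(action: str) -> bool:
    """Interactive approver: anything other than 'y' counts as a no."""
    return input(f"Run '{action}'? [y/N] ").strip().lower() == "y"
```

Pass `ask_human` as `approve` for early runs. Once trust is earned, a looser approver (for example, one that only stops on destructive commands) gives the agent more room.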

Step 7: Branch + Commits (10 min)

Always run agents on a branch:

git checkout -b agent-add-users-endpoint

The agent commits as it goes (most do).

Review the branch when it is done. Squash, then merge to main.

If the agent went wrong, the branch is easy to throw away.
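The branch workflow can be scripted. The sketch below only builds the git command lists, so you can read them before anything runs; the helper names are hypothetical and the base branch is assumed to be `main`.

```python
import re
import subprocess
from typing import List


def branch_name(task: str) -> str:
    """Turn a task description into a branch name like agent-add-users-endpoint."""
    slug = "-".join(re.findall(r"[a-z0-9]+", task.lower()))
    return f"agent-{slug}"


def start_commands(branch: str) -> List[List[str]]:
    """Create and switch to the agent's working branch."""
    return [["git", "checkout", "-b", branch]]


def accept_commands(branch: str, base: str = "main") -> List[List[str]]:
    """After review: squash the agent's commits into one commit on base."""
    return [
        ["git", "checkout", base],
        ["git", "merge", "--squash", branch],  # stages changes, does not commit
        ["git", "commit", "-m", f"Squash merge {branch}"],
    ]


def discard_commands(branch: str, base: str = "main") -> List[List[str]]:
    """The agent went wrong: switch back and delete the branch."""
    return [
        ["git", "checkout", base],
        ["git", "branch", "-D", branch],  # -D deletes even if unmerged
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(start_commands(branch_name("Add users endpoint")))` creates the branch used above, and `run_all(discard_commands(...))` removes it if the run is not worth keeping.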

Step 8: Sandboxing (10 min)

For risky tasks:

  • Run in container

  • Limited file access

  • No production credentials

Don't give an agent your prod DB password.

For local code work, running without a sandbox is usually fine.

For anything that touches the network or real data, think about blast radius.
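One small piece of this is keeping secrets out of the agent's environment. The sketch below strips credential-looking variables before launching a process. It is not a sandbox by itself (a container and limited file access still do the real isolation), and the marker list and function names are assumptions for illustration.

```python
import os
import subprocess
from typing import Dict, Iterable, Mapping, Sequence

# Assumed naming convention: variables containing these words hold secrets.
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY", "CREDENTIAL")


def scrubbed_env(
    env: Mapping[str, str],
    keep: Iterable[str] = (),
    markers: Sequence[str] = SENSITIVE_MARKERS,
) -> Dict[str, str]:
    """Copy env without credential-looking variables.

    Names listed in `keep` survive, e.g. the one API key the agent itself needs.
    """
    kept = set(keep)
    return {
        name: value
        for name, value in env.items()
        if name in kept or not any(marker in name.upper() for marker in markers)
    }


def run_without_secrets(
    command: Sequence[str], workdir: str, keep: Iterable[str] = ()
) -> subprocess.CompletedProcess:
    """Run a command in workdir with the scrubbed environment."""
    return subprocess.run(
        list(command), cwd=workdir, env=scrubbed_env(os.environ, keep), check=False
    )
```

Name matching is a blunt filter. A secret stored under an innocent-looking name gets through, so the safer default is never to load production credentials into the shell where agents run.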

Step 9: Cost of Agents (10 min)

Agents are token-hungry:

  • Multiple LLM calls per task

  • Reading many files

  • Iterating on failures

A single task can cost $1-$10 on frontier models.

For high-value tasks: worth it. For trivial work: overkill.

Budget your agent use.
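A back-of-the-envelope estimate makes the budget concrete. The prices and token counts below are made-up placeholders, not any provider's real rates; substitute the numbers from your own model's pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in dollars per million tokens (hypothetical values go here)."""
    input_per_million: float
    output_per_million: float


def task_cost(
    calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    pricing: Pricing,
) -> float:
    """Estimate dollars for one agent task: calls x tokens x price."""
    input_cost = calls * input_tokens_per_call * pricing.input_per_million / 1_000_000
    output_cost = calls * output_tokens_per_call * pricing.output_per_million / 1_000_000
    return input_cost + output_cost


def within_budget(cost: float, budget: float) -> bool:
    """True when the estimated cost fits the per-task budget."""
    return cost <= budget


# Placeholder example: 40 LLM calls, each re-reading about 20,000 tokens of
# context and writing about 1,000 tokens, at $3 / $15 per million tokens.
example = task_cost(40, 20_000, 1_000, Pricing(3.0, 15.0))
print(f"estimated cost: ${example:.2f}")
```

With these placeholder numbers the input side is 800,000 tokens ($2.40) and the output side is 40,000 tokens ($0.60), about $3 in total. Most of the cost comes from re-reading context on every call, which is why long runs over many files add up.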

Step 10: Verify Always (10 min)

Agents output:

  • Code (sometimes wrong)

  • Tests (sometimes weak)

  • Documentation (sometimes drifts from the code)

Read the diff. Run the tests. Smoke test the feature.

Auto-commit + auto-merge from an agent = bugs in main.

Always: human reviews before merge.
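The checks above can be written down as an explicit merge gate. This is a sketch of the idea only: `Review`, `blockers`, and `can_merge` are hypothetical names, and in practice the same rule is usually enforced through branch protection and required reviews.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    """What a human has actually done with the agent's branch."""
    diff_read: bool = False
    tests_passed: bool = False
    smoke_tested: bool = False
    human_approved: bool = False


def blockers(review: Review) -> List[str]:
    """List every check still missing before the branch may merge."""
    checks = [
        ("read the diff", review.diff_read),
        ("run the tests", review.tests_passed),
        ("smoke test the feature", review.smoke_tested),
        ("human approval", review.human_approved),
    ]
    return [label for label, ok in checks if not ok]


def can_merge(review: Review) -> bool:
    """Merge only when nothing is blocking."""
    return not blockers(review)
```

Passing tests alone never unlocks the merge here: `Review(tests_passed=True)` still reports three blockers, which is the point of keeping a human in the loop.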

What You Just Did

You covered AI agents: what they are, when they help, which tools they use, working with Claude Code, starting small, watching each step, branches, sandboxing, cost, and verification. That is the basis for productive agentic use.

Common Failure Modes

Handing the agent an open-ended task. It wanders.

Walking away during an agent run. Hallucinations slip through.

Auto-merging agent output. Bugs land in main.

Giving the agent production creds. Blast radius is unbounded.

Using agents for trivial tasks. Cost exceeds value.

Keep learning. This article is part of the AI in Quality & Delivery path in the ShiftQuality Learning Center. Use AI in delivery — and evaluate it honestly — without the hype.
