top of page

Tutorial 2: Prompt Engineering for Production

  • Contributor
  • Jun 14
  • 3 min read

Prompt engineering is real engineering when shipping to production. Not magic incantations; systematic technique.

Step 1: System Prompts (15 min)

Set persona / rules:

System: You are a senior customer support agent for ShiftQuality, 
a SaaS company. Be helpful, concise, and professional. 
Only answer based on the provided context. 
If unsure, say "I don't know."

For: consistent behavior across turns.

Step 2: Be Specific (15 min)

Vague:

Summarize the article.

Specific:

Summarize this article in 3-5 bullet points. 
Each bullet under 15 words. 
Focus on actionable takeaways.
Do not include personal opinions.

Clear constraints = predictable output.

Step 3: Few-Shot Examples (15 min)

Classify the sentiment as positive, negative, or neutral.

Example 1:
Input: "Love this product!"
Output: positive

Example 2:
Input: "Disappointed in the quality."
Output: negative

Example 3:
Input: "It arrived on Tuesday."
Output: neutral

Now classify:
Input: "{user_text}"
Output:

For task patterns: examples beat verbal description.

Step 4: Chain of Thought (15 min)

For reasoning tasks:

Think step-by-step.

Question: A train leaves at 2pm at 60mph. Another leaves at 3pm at 80mph going the same direction. When does the second catch up?

Reasoning:
Step 1: First train's head start = 1 hour × 60 mph = 60 miles
Step 2: Speed difference = 80 - 60 = 20 mph
Step 3: Time to close gap = 60 / 20 = 3 hours
Step 4: 3 hours after 3pm = 6pm

Answer: 6pm

Models reason better when told to think step-by-step.

Step 5: Structured Output (15 min)

Extract the following from the email:
- recipient_name: string
- subject_line: string
- urgency: one of "low", "medium", "high"
- next_action: string

Respond ONLY in JSON. No explanation.

Or use structured outputs API:

response = openai.chat.completions.create(
  model='gpt-4o',
  messages=[...],
  response_format={ 'type': 'json_schema', 'json_schema': {...} },
)

(Tutorial 3 deep.)

Step 6: Role Inversion / XML Tags (10 min)

<context>
{retrieved_docs}
</context>

<question>
{user_question}
</question>

<instructions>
Answer based on context. If not in context, say "I don't know."
</instructions>

Tags help model parse structure. Especially useful with long inputs.

Step 7: Avoid Negatives (10 min)

Bad: "Don't include personal opinions."
Good: "Stick to facts from the context."

LLMs trained on positive instructions better.

Negatives sometimes ignored.

Step 8: Temperature and Sampling (10 min)

response = openai.chat.completions.create(
  temperature=0,    # deterministic
  top_p=1.0,
  max_tokens=500,
)

For deterministic / extractive: temp 0.

For creative: 0.7-1.0.

Most production: temp 0-0.3.

Step 9: Iteration and Versioning (15 min)

Treat prompts as code:

  • Version in git

  • A/B test new versions

  • Eval on judgment set

  • Track metrics over time

PROMPT_V2 = """..."""

Tag deploys; rollback possible.

For mature LLM apps: prompt as code is the standard.

Step 10: Common Anti-Patterns (15 min)

  • One mega-prompt that does five things: split.

  • Examples not chosen carefully: bias model.

  • "Be creative" with temp 0: contradiction.

  • No system prompt: inconsistent behavior.

  • Use new model without re-testing prompts: regressions.

Each prompt: a hypothesis. Test.

What You Just Did

Prompt engineering: system prompts, specific, few-shot, chain of thought, structured output, XML tags, avoid negatives, temperature, iteration, anti-patterns.

Common Failure Modes

Vague prompts. Unpredictable.

No system prompt. No consistency.

Don't version. Drift; regressions.

Prompt for everything. Use schema / API features.

Skip evaluation. Believing prompts work without proof.

Next Tutorial

Related reading

Keep learning. This article is part of the AI in Quality & Delivery path in the ShiftQuality Learning Center. Use AI in delivery — and evaluate it honestly — without the hype.

bottom of page