Tutorial 2: Prompt Engineering for Production
- Contributor
- Jun 14
- 3 min read
Prompt engineering is real engineering when shipping to production. Not magic incantations; systematic technique.
Step 1: System Prompts (15 min)
Set persona / rules:
System: You are a senior customer support agent for ShiftQuality,
a SaaS company. Be helpful, concise, and professional.
Only answer based on the provided context.
If unsure, say "I don't know."
For: consistent behavior across turns.
Step 2: Be Specific (15 min)
Vague:
Summarize the article.
Specific:
Summarize this article in 3-5 bullet points.
Each bullet under 15 words.
Focus on actionable takeaways.
Do not include personal opinions.
Clear constraints = predictable output.
Step 3: Few-Shot Examples (15 min)
Classify the sentiment as positive, negative, or neutral.
Example 1:
Input: "Love this product!"
Output: positive
Example 2:
Input: "Disappointed in the quality."
Output: negative
Example 3:
Input: "It arrived on Tuesday."
Output: neutral
Now classify:
Input: "{user_text}"
Output:
For task patterns: examples beat verbal description.
Step 4: Chain of Thought (15 min)
For reasoning tasks:
Think step-by-step.
Question: A train leaves at 2pm at 60mph. Another leaves at 3pm at 80mph going the same direction. When does the second catch up?
Reasoning:
Step 1: First train's head start = 1 hour × 60 mph = 60 miles
Step 2: Speed difference = 80 - 60 = 20 mph
Step 3: Time to close gap = 60 / 20 = 3 hours
Step 4: 3 hours after 3pm = 6pm
Answer: 6pm
Models reason better when told to think step-by-step.
Step 5: Structured Output (15 min)
Extract the following from the email:
- recipient_name: string
- subject_line: string
- urgency: one of "low", "medium", "high"
- next_action: string
Respond ONLY in JSON. No explanation.
Or use structured outputs API:
response = openai.chat.completions.create(
model='gpt-4o',
messages=[...],
response_format={ 'type': 'json_schema', 'json_schema': {...} },
)
(Tutorial 3 deep.)
Step 6: Role Inversion / XML Tags (10 min)
<context>
{retrieved_docs}
</context>
<question>
{user_question}
</question>
<instructions>
Answer based on context. If not in context, say "I don't know."
</instructions>
Tags help model parse structure. Especially useful with long inputs.
Step 7: Avoid Negatives (10 min)
Bad: "Don't include personal opinions."
Good: "Stick to facts from the context."
LLMs trained on positive instructions better.
Negatives sometimes ignored.
Step 8: Temperature and Sampling (10 min)
response = openai.chat.completions.create(
temperature=0, # deterministic
top_p=1.0,
max_tokens=500,
)
For deterministic / extractive: temp 0.
For creative: 0.7-1.0.
Most production: temp 0-0.3.
Step 9: Iteration and Versioning (15 min)
Treat prompts as code:
Version in git
A/B test new versions
Eval on judgment set
Track metrics over time
PROMPT_V2 = """..."""
Tag deploys; rollback possible.
For mature LLM apps: prompt as code is the standard.
Step 10: Common Anti-Patterns (15 min)
One mega-prompt that does five things: split.
Examples not chosen carefully: bias model.
"Be creative" with temp 0: contradiction.
No system prompt: inconsistent behavior.
Use new model without re-testing prompts: regressions.
Each prompt: a hypothesis. Test.
What You Just Did
Prompt engineering: system prompts, specific, few-shot, chain of thought, structured output, XML tags, avoid negatives, temperature, iteration, anti-patterns.
Common Failure Modes
Vague prompts. Unpredictable.
No system prompt. No consistency.
Don't version. Drift; regressions.
Prompt for everything. Use schema / API features.
Skip evaluation. Believing prompts work without proof.
Next Tutorial
Structured: Tutorial 3: Structured Outputs.
Related reading
Keep learning. This article is part of the AI in Quality & Delivery path in the ShiftQuality Learning Center. Use AI in delivery — and evaluate it honestly — without the hype.


