What LLMs Get Wrong About Code (and How to Catch It)

  • ShiftQuality Contributor
  • Dec 6, 2025
  • 5 min read

The previous posts in this path covered what large language models are, prompting basics, local vs. cloud models, and what LLMs cannot do. This post applies that understanding to the most common developer use case: using LLMs to write, review, and debug code.

AI coding assistants such as Copilot, Claude, ChatGPT, and Cursor are, for many developers, the most useful AI tools available today. They write boilerplate, suggest implementations, explain unfamiliar code, and debug errors. They are also reliably wrong in specific, predictable ways. Understanding these failure patterns is the difference between using AI coding tools effectively and shipping bugs you trusted only because the AI presented them with confidence.

The Confidence Problem

LLMs generate text that is statistically likely given the prompt. They do not verify correctness. A function that looks syntactically perfect, follows naming conventions, and has sensible comments can still be logically wrong — and the LLM presents it with the same confidence as a function that is correct.

This is the fundamental challenge: the quality signal you rely on for human-written code (confident, well-structured presentation) is not a quality signal for AI-generated code. A human who writes confident code has usually tested it or has experience that supports the approach. An LLM that writes confident code has generated text that matches patterns from its training data. The confidence is not evidence of correctness.

The practice: treat every AI-generated code snippet as a draft from a junior developer who is good at syntax and bad at edge cases. Review it. Test it. Do not assume it is correct because it looks correct.
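To make this concrete, here is a hypothetical draft of the kind an assistant might produce. The function names are invented for illustration. The draft has a type hint, a docstring, and a sensible-sounding comment, and it is still wrong for century years divisible by 400. Comparing it against an independent oracle (here Python's standard calendar.isleap) catches what reading it did not.

```python
import calendar


def is_leap_year_draft(year: int) -> bool:
    """Return True if year is a leap year in the Gregorian calendar."""
    # Leap years are divisible by 4, except century years.
    return year % 4 == 0 and year % 100 != 0


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Test both versions against the standard library over a range of years.
draft_misses = [
    y for y in range(1600, 2401) if is_leap_year_draft(y) != calendar.isleap(y)
]
print(draft_misses)  # [1600, 2000, 2400]
assert all(is_leap_year(y) == calendar.isleap(y) for y in range(1600, 2401))
```

The draft looks finished; only the test shows that it fails for years like 2000.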

Hallucinated APIs

LLMs frequently invent APIs that do not exist. They generate calls to library functions with plausible names and sensible parameters — but the function does not exist in the library, or it exists with different parameters, or it exists in a different version than what you are using.

This happens because the LLM's training data includes many versions of many libraries, and the model does not track which functions exist in which versions. It generates code that looks like it should work based on patterns from similar libraries, similar function names, and similar parameter structures.

The defense: verify every import and every library call. Check that the function exists. Check that the parameters are correct for your version. If the code uses a library you are not deeply familiar with, check the documentation rather than trusting the generated code. This takes 30 seconds per function call and prevents hours of debugging a call that cannot possibly work.
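Part of this check can be scripted. The sketch below is a hypothetical helper, check_call, not a standard tool. It uses only importlib and inspect from Python's standard library to confirm that the module imports, that the function exists in the installed version, and that the keyword names appear in its signature. It can only catch names that do not exist. It cannot prove that a call is correct, so the documentation is still the final word.

```python
import importlib
import inspect


def check_call(module_name: str, func_name: str, kwarg_names=()) -> list[str]:
    """List problems with calling module_name.func_name(**kwargs) against
    the installed version. An empty list means no problems were found."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return [f"module {module_name!r} is not installed"]

    func = getattr(module, func_name, None)
    if func is None:
        return [f"{module_name}.{func_name} does not exist"]

    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some C-implemented functions expose no signature; check the docs.
        return [f"cannot inspect {module_name}.{func_name}; check the docs"]

    takes_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    problems = []
    for name in kwarg_names:
        param = params.get(name)
        if param is None and not takes_var_kwargs:
            problems.append(f"{func_name}() has no parameter {name!r}")
        elif param is None:
            problems.append(f"{name!r} would go to **kwargs; verify in the docs")
        elif param.kind is inspect.Parameter.POSITIONAL_ONLY:
            problems.append(f"{name!r} is positional-only in {func_name}()")
    return problems


# shutil has copytree(), not copy_tree(); copy() has no 'overwrite' flag.
print(check_call("shutil", "copy_tree"))
# ['shutil.copy_tree does not exist']
print(check_call("shutil", "copy", ["follow_symlinks", "overwrite"]))
# ["copy() has no parameter 'overwrite'"]
```

To see which version you are actually checking against, importlib.metadata.version("package-name") returns the installed distribution's version string.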

Edge Case Blindness

LLMs are trained on code that overwhelmingly handles the happy path. The training data contains many examples of functions that process valid input correctly and fewer examples of functions that handle null inputs, empty collections, integer overflow, race conditions, and malformed data.

The result: AI-generated code typically handles the normal case well and misses edge cases. A function that processes a list might not handle an empty list. A function that divides might not check for zero. A function that parses user input might not validate the format. The code works in testing — because testing usually starts with the happy path — and fails in production when edge cases appear.

The review checklist: for every AI-generated function, ask "what happens with null/empty input?", "what happens with the largest possible input?", "what happens with malformed input?", and "what happens if this is called concurrently?" These questions catch the edge cases the LLM missed.
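A minimal sketch of that review in Python. Here average_draft stands in for a hypothetical assistant-written function, and average is one possible reviewed version. Returning None for an empty list is a design choice (raising an error is equally valid). What matters is that the behavior is decided rather than accidental. The size and concurrency questions need their own tests and are not covered here.

```python
def average_draft(values):
    # A typical happy-path draft: correct for [1, 2, 3], crashes on [].
    return sum(values) / len(values)


def average(values):
    """Return the mean of values, or None if there are none.

    Raises ValueError for None and TypeError for non-numeric items, so
    malformed input fails loudly instead of producing a wrong answer.
    """
    if values is None:
        raise ValueError("values must not be None")
    items = list(values)  # also accepts generators and other iterables
    if not items:
        return None  # design choice: "no data" rather than an exception
    for item in items:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise TypeError(f"non-numeric item: {item!r}")
    return sum(items) / len(items)


# The checklist questions, turned into quick checks.
print(average([1, 2, 3]))  # 2.0
print(average([]))         # None
try:
    average_draft([])
except ZeroDivisionError:
    print("draft fails on empty input")
try:
    average(["1", "2"])  # e.g. strings parsed from user input
except TypeError as exc:
    print(exc)  # non-numeric item: '1'
```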

Security Blind Spots

AI-generated code inherits the security patterns (and anti-patterns) from its training data. If the training data contains many examples of SQL queries built with string concatenation, the model may generate string-concatenated SQL — which is a SQL injection vulnerability.
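A minimal sketch of the difference, using Python's built-in sqlite3 module with an in-memory table. The table, rows, and payload are invented for illustration. For a crafted input, the concatenated query returns every row. The parameterized query treats the same input as a plain string.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])
    return conn


def find_user_unsafe(conn, name):
    # Anti-pattern: user input is pasted into the SQL text itself.
    query = "SELECT name FROM users WHERE name = '" + name + "' ORDER BY name"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver binds name as data, never as SQL.
    query = "SELECT name FROM users WHERE name = ? ORDER BY name"
    return conn.execute(query, (name,)).fetchall()


conn = setup()
payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # [('alice',), ('bob',)]
print(find_user_safe(conn, payload))    # []
```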

The model does not understand security as a concept. It generates code that follows patterns. If the prompted context does not emphasize security, the generated code may follow convenient patterns (string concatenation, hard-coded credentials, disabled HTTPS verification) rather than secure patterns.
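Two of those convenient patterns, sketched in Python for review purposes. The environment variable name EXAMPLE_API_KEY and the helper names are placeholders. ssl.create_default_context() is the standard library's verifying default. The two lines that build the insecure context are the kind of "make the certificate error go away" fix that is worth flagging in review.

```python
import os
import ssl


def load_api_key(var_name: str = "EXAMPLE_API_KEY") -> str:
    """Read a secret from the environment instead of hard-coding it.
    EXAMPLE_API_KEY is a placeholder; use whatever your deployment defines."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


def tls_context() -> ssl.SSLContext:
    # create_default_context() enables certificate and hostname checks.
    return ssl.create_default_context()


def verification_disabled(ctx: ssl.SSLContext) -> bool:
    """Flag a context that skips certificate or hostname verification."""
    return ctx.verify_mode == ssl.CERT_NONE or not ctx.check_hostname


# The anti-pattern an assistant may suggest to silence a certificate error:
insecure = ssl.create_default_context()
insecure.check_hostname = False  # must be disabled before setting CERT_NONE
insecure.verify_mode = ssl.CERT_NONE

print(verification_disabled(tls_context()))  # False
print(verification_disabled(insecure))       # True
```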

The defense: apply the same security review to AI-generated code as to human-written code. Check for injection vulnerabilities, credential handling, input validation, and authentication/authorization logic. Do not assume the AI wrote secure code just because you asked it to write secure code — verify that the generated code actually follows secure patterns.

Outdated Patterns

LLMs have knowledge cutoffs and pattern preferences that may not reflect current best practices. The model might generate class-based React components instead of functional components with hooks. It might use deprecated API endpoints. It might follow patterns from an older version of a framework.

This is especially common with rapidly evolving tools and frameworks. Often the generated code still runs, but it uses approaches the community has moved away from for good reasons (performance, maintainability, security), and APIs that are deprecated today may be removed in a later release.

The defense: if the generated code uses a pattern you do not recognize or that seems outdated, check the current documentation. A question like "Is componentDidMount still the recommended way to run side effects in React?" takes 10 seconds to verify and keeps you from building on outdated foundations.
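In Python, one cheap automated backstop is to watch for deprecation warnings, which many libraries emit before removing an API. The sketch below records warnings around a single call. The helper name and the two "library" functions are hypothetical stand-ins for an old and a new API.

```python
import warnings


def emits_deprecation_warning(fn, *args, **kwargs) -> bool:
    """Call fn and report whether it issued a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't let default filters hide it
        fn(*args, **kwargs)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


# Hypothetical library functions standing in for an old and a new API.
def legacy_fetch(url):
    warnings.warn("legacy_fetch() is deprecated; use fetch()",
                  DeprecationWarning, stacklevel=2)
    return url


def fetch(url):
    return url


print(emits_deprecation_warning(legacy_fetch, "https://example.com"))  # True
print(emits_deprecation_warning(fetch, "https://example.com"))         # False
```

To apply the same check to a whole test run, Python's -W error::DeprecationWarning option (pytest accepts the same -W flag) turns every deprecation warning into an exception. This only catches APIs that actually emit warnings, so it complements reading the docs rather than replacing it.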

Using AI Coding Tools Effectively

Despite these limitations, AI coding tools provide genuine productivity gains when used correctly.

Use them for boilerplate. Repetitive code — API endpoint structure, data transfer objects, test setup, configuration files — is where AI assistants excel. The patterns are well-established, the edge cases are minimal, and the time saved is significant.

Use them for exploration. "How would I implement a rate limiter in Go?" gives you a starting point faster than searching documentation. The generated code may not be production-ready, but it provides a scaffold to build on and modify.

Use them for explanation. Paste unfamiliar code and ask the AI to explain what it does. LLMs are generally better at explaining code than at generating it, because an explanation mostly requires recognizing familiar patterns, which is what these models do well, rather than producing an implementation that must be correct in every detail. Explanations can still be wrong, so verify any claim you plan to act on.

Do not use them as a substitute for understanding. If you do not understand the generated code well enough to review it, you should not use it: code you cannot evaluate introduces risk you cannot assess. Either learn enough to review the code or do not use the generated version.

The Takeaway

AI coding assistants are useful tools with predictable failure modes: hallucinated APIs, edge case blindness, security blind spots, and outdated patterns. Knowing these failure modes transforms your relationship with the tools — from passively accepting generated code to actively reviewing it for the specific issues AI is likely to introduce.

The developer who uses AI tools most effectively is not the one who accepts the most suggestions. It is the one who reviews every suggestion with an eye for the patterns that AI gets wrong, catches the issues before they reach production, and uses the time saved on boilerplate to think more carefully about the problems that require human judgment.

Next in the "Intro to LLMs" learning path: This concludes the introductory LLM path. Continue your learning in the "ML Beyond Tutorials" path for intermediate content, or the "Building RAG Systems" path for hands-on LLM application development.
