Architecture Decisions You'll Regret

Contributor
May 8

Every architecture decision looks smart when you make it. You have a whiteboard, a set of requirements, and a team that agrees this is the right approach. Six months later, you are living with the consequences, and the whiteboard is long gone.

The previous posts in this path covered why architecture matters, which patterns help small teams, how to evaluate build-buy-borrow tradeoffs, and how to design for your first thousand users. This post is about the other side: the decisions that seemed right and turned out to be expensive. Not because the technology was bad, but because the reasoning was wrong.

These are not hypothetical examples. These are the kinds of mistakes that practitioners make — smart people, experienced people — because the failure mode is not ignorance. It is overconfidence about what the system will need before the system exists.

The Microservices That Should Have Been a Monolith

This is the most common regret in modern software architecture, and it happens the same way every time.

A team starts a new project. They have read the blog posts. They have seen the conference talks. They know that Netflix, Uber, and Amazon use microservices. They decide — before writing a line of business logic — that the system will be decomposed into services. User service. Order service. Notification service. Payment service. Each with its own database. Each deployed independently.

Twelve months later, they have five services that are always deployed together, because every feature touches at least three of them. They have a distributed monolith — all the complexity of microservices with none of the independence. The user service cannot change its schema without coordinating with the order service. The notification service cannot send an email without making three network calls that used to be function calls.

The deployment pipeline that was supposed to enable independent releases now requires orchestrating five services in the correct order. The debugging experience requires correlating logs across five systems instead of reading one stack trace. The team has spent more engineering time on service-to-service communication than on the product itself.

The mistake was not choosing microservices. It was choosing them before understanding the domain boundaries. Microservices make sense when you have clear, stable boundaries between parts of the system that change at different rates and scale independently. Those boundaries are almost impossible to identify before you have built the system and watched it evolve under real usage.

Start with a monolith. Decompose when you feel the pain that decomposition solves. Not before.
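
One way to keep the option of decomposing later, sketched here in Python: give each area of the domain a narrow in-process interface inside the monolith. All of the names below (`UserDirectory`, `InMemoryUsers`, `OrderService`, `place_order`) are hypothetical and exist only for illustration. Everything still ships as one deployable and a call is still a function call, but the seam is visible if a boundary ever proves stable enough to extract.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class User:
    user_id: int
    email: str


class UserDirectory(Protocol):
    """The only thing the order code is allowed to know about users."""

    def get_user(self, user_id: int) -> User: ...


class InMemoryUsers:
    """Concrete, in-process implementation: no network, no serialization."""

    def __init__(self) -> None:
        self._users = {1: User(1, "ada@example.com")}

    def get_user(self, user_id: int) -> User:
        return self._users[user_id]


class OrderService:
    """Order logic depends on the UserDirectory interface, not on its storage."""

    def __init__(self, users: UserDirectory) -> None:
        self._users = users
        self.confirmations = []

    def place_order(self, user_id: int, total_cents: int) -> str:
        user = self._users.get_user(user_id)  # a plain function call
        message = f"Order for {total_cents} cents confirmed to {user.email}"
        self.confirmations.append(message)
        return message


orders = OrderService(InMemoryUsers())
print(orders.place_order(1, 2500))
```

If the user code ever does need to become its own service, only the implementation behind `UserDirectory` has to change. Until then, the stack trace stays in one process.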

Premature Optimization and Its Cousins

"We need to handle a million concurrent users." No, you don't. You need to handle the twelve people in your beta.

Premature optimization is not just about code-level performance tricks like loop unrolling or caching strategies. It is about making the entire system more complex to handle problems that do not exist yet and may never exist.

A team builds an event-driven architecture with Kafka because "we might need to process high-throughput data streams later." They spend weeks configuring topics, partitions, consumer groups, and dead letter queues. The system processes forty events per minute. A PostgreSQL table with a polling query would have done the job in an afternoon.
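
A minimal sketch of the table-and-polling alternative. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. The `events` table, its columns, and the `poll_once` helper are assumptions made for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        processed_at TEXT  -- NULL until a worker has handled the row
    )
    """
)
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("signup:1",), ("signup:2",), ("invoice:7",)],
)
conn.commit()


def poll_once(conn, handle, batch_size=100):
    """Handle up to batch_size unprocessed events, oldest first."""
    rows = conn.execute(
        "SELECT id, payload FROM events "
        "WHERE processed_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for event_id, payload in rows:
        handle(payload)
        conn.execute(
            "UPDATE events SET processed_at = datetime('now') WHERE id = ?",
            (event_id,),
        )
    conn.commit()
    return len(rows)


seen = []
handled = poll_once(conn, seen.append, batch_size=2)
print(handled, seen)  # 2 ['signup:1', 'signup:2']
```

Run `poll_once` from a loop or a cron job. With several workers on PostgreSQL, the usual refinement is to claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` so that two workers never pick up the same event. That is still one table and one query.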

Another team implements a multi-region active-active database setup because "we need to be globally available." Their users are all in one time zone. The active-active replication introduces conflict resolution complexity that consumes a developer for three months. The latency improvement for their non-existent global users is irrelevant.

The pattern is always the same: solving tomorrow's problem with today's code, and paying for that complexity today. The cost is not just the initial implementation. It is the ongoing cognitive load of maintaining a system that is more complex than it needs to be. Every new team member has to understand the event bus, the replication strategy, the distributed transaction model — for a system that could have been a single database and a cron job.

Build for the load you have. Monitor so you know when the load changes. Scale when the metrics tell you to, not when your imagination does.

The Abstraction That Abstracted Too Much

There is a moment in every project where someone says, "We should build an abstraction layer so we can swap this out later." It sounds prudent. It is usually expensive.

A team wraps their database access in a custom ORM abstraction so they can "switch databases if needed." The abstraction prevents them from using database-specific features that would make their queries simpler and faster. They never switch databases. They spend two years working around the limitations of their own abstraction layer.

Another team builds a "provider-agnostic" cloud abstraction so they can "move between AWS and Azure." The abstraction covers the lowest common denominator of both platforms. They cannot use Lambda, or SQS, or any of the managed services that would eliminate entire categories of code they have to write and maintain. They never switch cloud providers. They pay the abstraction tax on every feature.

The wrong abstraction is worse than no abstraction. It adds a layer of indirection that obscures what the code actually does, prevents you from using the tools you already paid for, and solves a problem — the need to switch providers — that almost never materializes.

If you think you might need to swap something out later, do not build an abstraction now. Build a clean interface to the concrete thing you are using. If the day comes when you need to swap, the clean interface makes the migration manageable. The speculative abstraction makes everything worse in the meantime.
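
As a rough illustration of "a clean interface to the concrete thing", here is a sketch in Python with SQLite as the concrete database. `OrderStore` and its methods are hypothetical names. Callers speak in domain terms, and the SQL inside stays free to use whatever the real database offers.

```python
import sqlite3


class OrderStore:
    """Domain-level interface over one concrete database (SQLite here).

    Callers see place() and totals_by_customer(), never SQL. Nothing in
    this class pretends to be portable across databases.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
            "total_cents INTEGER NOT NULL)"
        )

    def place(self, customer: str, total_cents: int) -> int:
        """Insert one order and return its generated id."""
        cur = self._conn.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
        self._conn.commit()
        return cur.lastrowid

    def totals_by_customer(self) -> dict:
        """Return {customer: summed total_cents}."""
        rows = self._conn.execute(
            "SELECT customer, SUM(total_cents) FROM orders "
            "GROUP BY customer ORDER BY customer"
        )
        return dict(rows.fetchall())


store = OrderStore(sqlite3.connect(":memory:"))
store.place("acme", 1200)
store.place("acme", 800)
store.place("globex", 500)
print(store.totals_by_customer())  # {'acme': 2000, 'globex': 500}
```

There is no generic `Repository` or pluggable driver layer here. If a migration ever happens, the work is confined to the body of this one class, and the rest of the code keeps calling `place` and `totals_by_customer`.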

The Custom Solution for a Solved Problem

Authentication. Job scheduling. Full-text search. Email delivery. PDF generation. Rate limiting. Every one of these is a solved problem with mature, battle-tested solutions. Every one of them has been rebuilt from scratch by a team that believed their requirements were unique.

They are almost never unique.

A team builds a custom authentication system because "we need multi-tenant support and the existing solutions don't handle our use case." Three months and two security incidents later, they migrate to Auth0. The multi-tenant use case was supported all along — they just didn't read the documentation thoroughly enough.

Another team writes a custom job scheduler because "we need exactly-once execution guarantees." They discover that exactly-once execution is notoriously hard to guarantee in a distributed system, because a worker can crash after doing the work but before acknowledging it. Their custom scheduler ends up providing at-least-once execution, which is what most off-the-shelf schedulers already offer, with years of production hardening and none of the new technical debt.
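
To make the difference concrete: under at-least-once execution a job may run twice, and the common way to live with that is to make the job itself idempotent. The sketch below simulates this in plain Python. `Ledger`, `charge`, and `deliver_at_least_once` are hypothetical names, and the in-memory set stands in for what would normally be a table of applied job ids.

```python
class Ledger:
    """Records charges, ignoring repeats of a job it has already applied."""

    def __init__(self) -> None:
        self.balance_cents = 0
        self._applied_job_ids = set()

    def charge(self, job_id: str, amount_cents: int) -> bool:
        """Apply the charge once per job_id. Returns False for a duplicate."""
        if job_id in self._applied_job_ids:
            return False
        self._applied_job_ids.add(job_id)
        self.balance_cents += amount_cents
        return True


def deliver_at_least_once(jobs, handler):
    """Simulate a scheduler that delivers every job twice (the worst case)."""
    for job_id, amount in jobs:
        handler(job_id, amount)
        handler(job_id, amount)  # redelivery after a lost acknowledgement


ledger = Ledger()
deliver_at_least_once([("job-1", 500), ("job-2", 250)], ledger.charge)
print(ledger.balance_cents)  # 750, not 1500
```

In a real system, the record of applied job ids would need to be written in the same transaction as the effect itself. That is an assumption this sketch leaves out.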

The desire to build custom is not laziness. It is usually a combination of optimism about the difficulty of the problem and underestimation of the maturity of existing solutions. The real decision framework is simple: use the existing solution until you can articulate, with specifics, why it cannot work. "It doesn't do exactly what we want" is not a reason. "It cannot support our regulatory requirement for data residency in these three specific jurisdictions, and here is the section of the documentation that confirms the limitation" is a reason.

The Schema Designed for Flexibility

Flexible data models are a recurring source of regret. The impulse makes sense: we don't know exactly what data we'll need, so let's make the schema flexible enough to handle anything.

This shows up as the entity-attribute-value pattern, where every piece of data is stored as a key-value pair in a generic table. It shows up as the "just store everything as JSON" approach, where structured data gets dumped into a JSON column because defining columns felt too committal.

Both approaches trade query simplicity and data integrity for flexibility you rarely need. Want to find all orders over a hundred dollars? In a typed schema, that is a simple comparison. In an EAV table, it is a join and a cast and a prayer that every row stored the value as a number and not as a string with a dollar sign.
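
The difference is easy to see side by side. This sketch uses Python's built-in `sqlite3`. The table and column names (`orders`, `entities`, `entity_attrs`) are made up for illustration, and the third EAV row deliberately stores its total with a dollar sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Typed schema: the total is an integer number of cents.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    INSERT INTO orders (id, total_cents) VALUES (1, 25000), (2, 4000), (3, 12000);

    -- EAV schema: the same three orders as generic key-value rows.
    CREATE TABLE entities (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE entity_attrs (entity_id INTEGER, attr TEXT, value TEXT);
    INSERT INTO entities (id, kind) VALUES (1, 'order'), (2, 'order'), (3, 'order');
    INSERT INTO entity_attrs (entity_id, attr, value) VALUES
        (1, 'total', '250.00'),
        (2, 'total', '40.00'),
        (3, 'total', '$120.00');
    """
)

# Orders over a hundred dollars, typed schema: one comparison.
typed = conn.execute(
    "SELECT id FROM orders WHERE total_cents > 10000 ORDER BY id"
).fetchall()

# Same question, EAV schema: a join, a cast, and a prayer.
eav = conn.execute(
    "SELECT e.id FROM entities e "
    "JOIN entity_attrs a ON a.entity_id = e.id "
    "WHERE e.kind = 'order' AND a.attr = 'total' "
    "AND CAST(a.value AS REAL) > 100 ORDER BY e.id"
).fetchall()

print(typed)  # [(1,), (3,)]
print(eav)    # [(1,)]
```

Order 3 silently disappears from the EAV result because SQLite casts `'$120.00'` to 0. A stricter database such as PostgreSQL would reject the cast with an error instead. Either way, the typed schema never lets the bad value in.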

The JSON column approach is less extreme but has the same failure mode. You lose database-level validation unless you hand-write constraints against the JSON, you cannot index a field without an expression index, and you lose the self-documenting nature of a schema that tells new developers what data the system actually manages.

If you genuinely do not know what data you'll need, you are not ready to build the data layer. Figure out the data model first. A week spent understanding the domain is cheaper than a year spent querying around a flexible schema that makes everything harder.

How to Avoid the Regret

The common thread across all of these mistakes is the same: solving problems you don't have yet. Optimizing for a future that hasn't arrived. Building flexibility for changes that may never happen.

The antidote is not less thinking. It is more honest thinking. Before any architectural decision, ask three questions:

What problem am I solving right now? Not next quarter. Not at scale. Right now. If the answer is "well, eventually we might need to..." stop. Build for now.

What is the cost of being wrong? If you can reverse this decision in a week, make the simpler choice and move on. If reversing it takes six months, invest in understanding the problem before committing. The cost of reversing the decision should dictate the rigor of the analysis.

Am I copying someone else's solution? Netflix needed microservices because they have thousands of engineers working on hundreds of services that have to ship independently. You are not Netflix. Amazon needed a custom database because they process millions of transactions per second. You are not Amazon. The architecture that solves their problems at their scale is not the architecture that solves your problems at your scale.

The Takeaway

The best architecture decisions are boring. They use proven technology. They solve the problem as it exists today. They are simple enough that a new team member can understand the system in a week, not a quarter. They leave room to change direction without requiring a rewrite.

The decisions you regret are the ones that felt smart at the time — the clever abstractions, the premature decompositions, the optimizations for load you don't have. They felt like good engineering. They were engineering in service of imagination instead of evidence.

Build for what you know. Instrument so you learn. Change when the data tells you to. And when someone suggests adding complexity to handle a problem that doesn't exist yet, the right answer is almost always: not yet.

Next in the "Thinking Before You Build" learning path: This is the final post in the intermediate series. When you're ready for advanced territory, we'll cover distributed systems — the lies your diagrams tell and the truths your latency reveals.

ShiftQuality