MLOps for Real Teams: Building Organizations That Ship ML

Contributor
Sep 2, 2025

Updated: Jun 22

You can have the best ML pipeline, the most rigorous feature engineering, and the most thorough model validation, and still fail to ship ML to production. That is because shipping ML is primarily an organizational problem, not a technical one.

The gap between "model works in a notebook" and "model runs in production" is usually not about infrastructure. It's about people. Who owns the model after a data scientist trains it? Who monitors it? Who retrains it when data drifts? Who decides when a model should be replaced? Who's responsible when it makes a bad prediction that affects a customer?

MLOps answers these questions. Not with tools alone — with roles, processes, and culture.

The Handoff Problem

In most organizations, the ML workflow crosses team boundaries:

Data engineering builds the pipelines that extract, transform, and load data. Data science trains models using that data. Software engineering integrates the model into the application. Operations keeps the application running.

Each handoff is a potential failure point. The data engineer doesn't know what format the data scientist needs. The data scientist's model works on their machine but not in the software engineer's production environment. The software engineer deploys the model but doesn't know how to monitor it. Operations gets paged when the model fails and doesn't know how to fix it.

MLOps is fundamentally about reducing or eliminating these handoffs — not by merging all roles into one, but by creating shared practices, shared tools, and shared responsibility.

Team Structures That Work

The Embedded Model

Data scientists and ML engineers sit within product teams. They're close to the product context, understand user needs directly, and own their models end-to-end — from training to production monitoring.

Works when: ML is a core part of multiple product features, and each product area has enough ML work to justify dedicated ML people.

Fails when: ML best practices aren't shared across teams, leading to inconsistent tooling, duplicated infrastructure, and varying quality standards.

The Platform Model

A centralized ML platform team builds the infrastructure — feature stores, training pipelines, model serving, monitoring — and product teams use the platform to build and deploy their models.

Works when: The organization is large enough to justify a dedicated platform team, and the platform is mature enough to serve multiple teams without constant custom work.

Fails when: The platform team is disconnected from product needs and builds infrastructure that nobody uses, or when the platform becomes a bottleneck because every product request requires platform team involvement.

The Hub-and-Spoke Model

A small central team sets standards, builds shared tools, and provides expertise. ML practitioners in product teams follow the standards and use the shared tools, but own their own models.

Works when: You want consistent practices without the overhead of a full platform team. The central team is small (2-4 people) and focused on enablement rather than execution.

This model tends to work well for mid-size organizations. It balances standardization with autonomy, and it scales incrementally: the central team grows as demand grows.

Roles and Responsibilities

Who Trains the Model

The data scientist or ML engineer who understands the problem, the data, and the modeling technique. They own the experiment — choosing the approach, evaluating options, and producing a model that meets the quality bar.

Who Deploys the Model

This is where organizations differ, and the choice matters. Options:

Data scientists deploy their own models using standardized tooling (MLflow, SageMaker, Vertex AI). This is the fastest path but requires data scientists to have deployment skills.

ML engineers deploy models that data scientists hand off. The ML engineer handles production concerns — serving infrastructure, latency optimization, scaling. This creates a handoff but ensures production readiness.

Automated pipelines deploy models when they meet quality criteria. The CI/CD pipeline handles deployment based on evaluation results. This is the most mature approach and the goal to work toward.
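As a rough illustration of the automated option, a deployment gate can be a single function that the CI/CD pipeline calls after evaluation. The sketch below is only one way to express it: the `EvalResult` fields, the `should_deploy` name, and the thresholds are all assumptions made for the example, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Evaluation metrics for one model version (hypothetical schema)."""
    accuracy: float
    p95_latency_ms: float


def should_deploy(candidate: EvalResult, production: EvalResult,
                  min_accuracy_gain: float = 0.0,
                  max_latency_ms: float = 200.0) -> bool:
    """Return True if the candidate matches or beats production within the latency budget."""
    # The candidate must reach production accuracy plus the required gain.
    accurate_enough = candidate.accuracy >= production.accuracy + min_accuracy_gain
    # The candidate must stay within the (assumed) latency budget.
    fast_enough = candidate.p95_latency_ms <= max_latency_ms
    return accurate_enough and fast_enough


production = EvalResult(accuracy=0.91, p95_latency_ms=120.0)
candidate = EvalResult(accuracy=0.93, p95_latency_ms=140.0)
print(should_deploy(candidate, production))
```

Keeping the criteria in code means the quality bar is reviewed and versioned like any other change, rather than living in someone's head.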

Who Monitors the Model

The person or team that can act on monitoring signals. If model drift is detected, who retrains? If a model error affects users, who diagnoses it? The monitoring owner needs both the skills to understand the signals and the authority to take action.

In practice, model monitoring often falls through the cracks — data scientists think operations handles it, operations thinks data scientists handle it, and nobody's watching. Assign it explicitly.
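One lightweight way to make the assignment explicit is to keep an ownership record next to the models and fail a check when an entry is empty. The following is a minimal sketch; the registry contents, team names, and the `unowned_models` helper are hypothetical.

```python
from typing import Optional

# Hypothetical registry: model name -> team that owns monitoring (None = nobody assigned).
MONITORING_OWNERS: dict[str, Optional[str]] = {
    "churn-predictor": "growth-ml",
    "fraud-scorer": "risk-ml",
    "search-ranker": None,
}


def unowned_models(owners: dict[str, Optional[str]]) -> list[str]:
    """Return the models with no monitoring owner, sorted by name."""
    return sorted(name for name, owner in owners.items() if not owner)


missing = unowned_models(MONITORING_OWNERS)
if missing:
    print("Models with no monitoring owner:", ", ".join(missing))
```

Running a check like this in CI turns "nobody's watching" from a silent gap into a visible failure.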

Who Decides to Replace a Model

Model replacement decisions involve product, data science, and engineering. The data scientist knows whether a new model is better. The product team knows whether the change affects user experience. Engineering knows whether the deployment is safe.

This should be a defined process — not an ad hoc conversation. A model review board (even if it's just three people in a Slack channel) that reviews model changes before deployment catches issues that any individual role would miss.
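A defined process can be as simple as requiring a recorded sign-off from each of the three perspectives before a replacement goes out. The sketch below assumes hypothetical role names and helper functions; it shows the shape of the check, not a prescribed workflow.

```python
# Roles whose approval is required before a model is replaced (assumed names).
REQUIRED_ROLES = frozenset({"product", "data_science", "engineering"})


def missing_signoffs(approvals: dict[str, bool]) -> set[str]:
    """Return the required roles that have not approved the model change."""
    approved = {role for role, ok in approvals.items() if ok}
    return set(REQUIRED_ROLES - approved)


def can_replace_model(approvals: dict[str, bool]) -> bool:
    """A replacement may proceed only when every required role has approved."""
    return not missing_signoffs(approvals)


approvals = {"product": True, "data_science": True, "engineering": False}
print(can_replace_model(approvals), missing_signoffs(approvals))
```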

The Cultural Shift

From Notebooks to Production

Many data science teams are measured on experiment count, model accuracy, or research papers. None of these metrics measure production impact. A model that is never deployed delivers no business value, regardless of its accuracy score.

Shifting to production orientation means measuring what matters: models deployed, prediction latency, business metrics influenced by ML, and model reliability. This isn't about diminishing research — it's about connecting research to outcomes.

From Throw-It-Over-the-Wall to Shared Ownership

When a data scientist's job is "train the model" and an engineer's job is "deploy the model," neither owns the outcome. The data scientist isn't responsible for production performance. The engineer isn't responsible for model quality.

Shared ownership means the model has a single owner (or a small owning team) responsible for the full lifecycle: training, deployment, monitoring, retraining, and retirement. The owner may not do all the work personally, but they're accountable for the outcome.

From Manual to Automated

Every manual step in the ML lifecycle is a step that happens inconsistently, is forgotten during crunch time, and doesn't scale. Automating training, evaluation, deployment, and monitoring isn't about replacing people — it's about ensuring that the non-creative parts of the pipeline happen reliably every time.

The creative work — problem framing, feature design, model architecture decisions — stays with humans. The mechanical work — data validation, training execution, metric logging, deployment, monitoring — should be automated.
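Data validation is a good example of mechanical work that is easy to automate. The sketch below checks a batch of rows for missing fields and out-of-range values; the column names, bounds, and the `validate_batch` function are illustrative assumptions, and a real pipeline would likely use a dedicated validation library.

```python
from typing import Any

# Hypothetical expectations for an input batch; column names and bounds are illustrative.
REQUIRED_COLUMNS = ("user_id", "age", "country")
NUMERIC_RANGES = {"age": (0, 120)}


def validate_batch(rows: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        # Every required column must be present and non-null.
        for col in REQUIRED_COLUMNS:
            if row.get(col) is None:
                problems.append(f"row {i}: missing {col}")
        # Numeric columns must fall inside their expected range.
        for col, (low, high) in NUMERIC_RANGES.items():
            value = row.get(col)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return problems


batch = [
    {"user_id": 1, "age": 34, "country": "DE"},
    {"user_id": 2, "age": 150, "country": "US"},
    {"user_id": 3, "age": None, "country": "FR"},
]
for problem in validate_batch(batch):
    print(problem)
```

A pipeline step can refuse to start training when the returned list is non-empty, which makes the check happen every time rather than only when someone remembers.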

Getting Started

If your organization is at the "notebooks only" stage, here's a practical progression:

Phase 1: Standardize locally. Pick one tool for experiment tracking (MLflow) and one approach for versioning code and data (Git + DVC), and get the team using them consistently. These tools are open source, so the cost is mostly the team's time, and the habit builds foundational discipline.

Phase 2: Build a simple deployment path. A script or pipeline that takes a trained model and deploys it to a serving endpoint. Doesn't need to be fancy. Needs to be repeatable.

Phase 3: Add monitoring. Track prediction volume, latency, and data drift on deployed models. Set alerts for anomalies. This helps catch problems before users report them.

Phase 4: Automate the pipeline. Connect training, evaluation, and deployment into a CI/CD pipeline that runs on schedule or on trigger. Now models can be retrained and deployed without manual intervention.

Phase 5: Scale and optimize. Add feature stores, model registries, A/B testing for model changes, and automated retraining based on drift signals. This is where the organizational investment in ML infrastructure pays compounding returns.

Each phase builds on the previous one. Skip phases and you build on a foundation that can't support the weight.
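To make the data drift check in Phase 3 concrete, here is a minimal sketch using the Population Stability Index (PSI), one common drift measure among several. The article does not prescribe a specific metric, so the choice of PSI, the `psi` and `drift_alert` names, the bin count, and the 0.2 alert threshold (a frequently quoted rule of thumb) are all assumptions to tune for your own data.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if the reference sample has a single distinct value.
    width = (hi - lo) / bins or 1.0

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(score: float, threshold: float = 0.2) -> bool:
    """Flag drift when the PSI exceeds the (assumed) alert threshold."""
    return score > threshold


reference = [i / 100 for i in range(100)]        # training-time feature values
live = [0.5 + i / 200 for i in range(100)]       # live values shifted upward
print(drift_alert(psi(reference, live)))
```

A check like this runs per feature on a schedule; the alert should route to whoever owns monitoring for that model.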

Key Takeaway

MLOps is about organizing people and processes around ML, not just implementing tools. The handoff between data science, engineering, and operations is where most ML projects stall. Choose a team structure that matches your organization's size and ML maturity. Define clear ownership for training, deployment, monitoring, and replacement decisions. Shift the culture from notebooks to production, from handoffs to shared ownership, and from manual to automated. Start with standardization and build toward automation incrementally.

This completes the ML Engineering at Scale learning path. You've covered pipeline reliability, collaborative feature engineering, validation beyond accuracy, and organizational MLOps. The throughline: ML at scale is an organizational capability, not an individual skill.

ShiftQuality