LinuSo

10 Python Design Patterns Every Data Engineer Should Know

Ulrich Mueller
#python #design-patterns #data-engineering #architecture #pipelines

Why This Post

Design patterns aren’t just for enterprise Java or abstract CS interviews.
In data engineering, real-world patterns emerge constantly: ingestion flows, retry wrappers, configuration strategies, and orchestration logic.

This post captures the 10 Python design patterns I’ve seen work in production data systems—whether you’re building pipelines, APIs, or platforms.

Let’s make your code cleaner, safer, and easier to scale.


1. Factory Pattern – for Plug-and-Play Pipelines

Problem:

You need to dynamically instantiate objects based on config (e.g. connectors, models, writers).

Example:

# ParquetWriter and PostgresWriter are assumed to be defined elsewhere.
def get_writer(writer_type: str):
    if writer_type == "parquet":
        return ParquetWriter()
    elif writer_type == "postgres":
        return PostgresWriter()
    raise ValueError(f"Unknown writer type: {writer_type!r}")

Used in: ingestion systems, sink abstraction, plug-in connectors.
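
Once there are more than a handful of writers, the if/elif chain can be replaced with a lookup table. The following is a minimal sketch, assuming two stand-in writer classes and a `WRITERS` registry; both are illustrative and not part of any real library:

```python
class ParquetWriter:
    """Hypothetical stand-in for a real Parquet sink."""


class PostgresWriter:
    """Hypothetical stand-in for a real Postgres sink."""


# Registry mapping config values to writer classes (illustrative name).
WRITERS = {
    "parquet": ParquetWriter,
    "postgres": PostgresWriter,
}


def get_writer(writer_type: str):
    """Instantiate the writer registered under writer_type."""
    try:
        writer_cls = WRITERS[writer_type]
    except KeyError:
        raise ValueError(f"Unknown writer type: {writer_type!r}") from None
    return writer_cls()


writer = get_writer("parquet")
```

Adding a new sink then means adding one entry to the dict rather than another branch.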


2. Singleton – for Config, Logging, State Sharing

Problem:

You want a single instance of something: logger, config, stateful metric.

Example:

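class Config:
    _instance = None

    def __new__(cls):
        # Create and load the shared instance on first use only.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.load()
        return cls._instance

    def load(self):
        ...  # read settings from a file, env vars, etc.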

Used in: config loaders, secrets managers, context objects.
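
To see what the pattern buys you, here is a small sketch of the class being instantiated twice. The `load` body and the `settings` dict are assumptions made for illustration; a real loader might read a file or environment variables:

```python
class Config:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.load()
        return cls._instance

    def load(self):
        # Hypothetical loader; runs exactly once, on first instantiation.
        self.settings = {"env": "dev"}


a = Config()
b = Config()
a.settings["env"] = "prod"  # the change is visible through every reference
```

Both names point to the same object, so `load` is not repeated. Note that this version takes no lock, so two threads racing on the first call could each pass the `is None` check.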


3. Strategy Pattern – for Switching Logic

Problem:

Your algorithm changes based on data, environment, or runtime flags.

Example:

import statistics

class Strategy:
    def execute(self, data): raise NotImplementedError()

class MeanStrategy(Strategy):
    def execute(self, data): return sum(data) / len(data)

class MedianStrategy(Strategy):
    def execute(self, data): return statistics.median(data)

Used in: model selection, pricing logic, retry policies.
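
The strategy is usually picked from a runtime flag or config value. A minimal sketch of that selection step, assuming a hypothetical `STRATEGIES` lookup and `summarize` helper that are not in the original:

```python
import statistics


class MeanStrategy:
    def execute(self, data):
        return sum(data) / len(data)


class MedianStrategy:
    def execute(self, data):
        return statistics.median(data)


# Hypothetical lookup from a config flag to a strategy instance.
STRATEGIES = {"mean": MeanStrategy(), "median": MedianStrategy()}


def summarize(data, method="mean"):
    """Delegate to whichever strategy the caller selected."""
    return STRATEGIES[method].execute(data)


values = [1, 2, 3, 10]
mean_result = summarize(values, "mean")      # 4.0
median_result = summarize(values, "median")  # 2.5
```

The calling code never branches on the method name; swapping behavior is a matter of passing a different key.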


4. Command Pattern – for Wrapping Steps as Objects

Problem:

You want to log, queue, undo, or replay steps.

Example:

class Step:
    def __init__(self, name, fn): self.name = name; self.fn = fn
    def run(self): return self.fn()

Used in: Airflow DAGs, manual pipelines, orchestration UIs.
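
Because each step is an object, steps can be collected in a queue and their results recorded as they run. A short sketch, assuming two toy steps and a `history` list invented for illustration:

```python
class Step:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self):
        return self.fn()


# Hypothetical steps; real ones would extract, transform, load, and so on.
queue = [
    Step("extract", lambda: [1, 2, 3]),
    Step("count", lambda: 3),
]

history = []  # (name, result) pairs, usable for logging or replay
for step in queue:
    history.append((step.name, step.run()))
```

The same `queue` list could be re-run later to replay the work, since nothing about a step is consumed by running it.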


5. Adapter Pattern – for Legacy Integration

Problem:

You need to make a legacy class or API fit the interface your new code expects.

Example:

class LegacyDB:
    def fetch_data(self): ...

class Adapter:
    def __init__(self, legacy): self.legacy = legacy
    def get(self): return self.legacy.fetch_data()

Used in: data migration tools, unifying SDKs, cleaning third-party messes.
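
The payoff shows up in the calling code, which only ever sees the new method name. A minimal sketch, assuming a made-up payload and a hypothetical `load_rows` consumer:

```python
class LegacyDB:
    def fetch_data(self):
        # Hypothetical payload standing in for a real legacy call.
        return [{"id": 1}]


class Adapter:
    def __init__(self, legacy):
        self.legacy = legacy

    def get(self):
        return self.legacy.fetch_data()


def load_rows(source):
    # New code depends only on .get(), not on the legacy method name.
    return source.get()


rows = load_rows(Adapter(LegacyDB()))
```

Any other source that offers a `get` method can be passed to `load_rows` without changes.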


6. Decorator Pattern – for Retry, Logging, Timing

Problem:

You want to wrap existing logic with additional behavior.

Example:

import functools

def retry(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for attempt in range(3):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == 2:
                    raise  # last attempt failed: surface the error
    return wrapper

Used in: API clients, ETL wrappers, error reporting.
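
In practice a retry wrapper usually needs a configurable attempt count and a pause between tries. A sketch under those assumptions; the `attempts` and `delay` parameters and the `flaky` function are illustrative, not taken from any library:

```python
import functools
import time


def retry(attempts=3, delay=0.0):
    """Retry the wrapped function; re-raise once attempts run out."""
    def decorator(fn):
        @functools.wraps(fn)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"n": 0}


@retry(attempts=3)
def flaky():
    # Hypothetical function that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky()
```

Re-raising on the final attempt matters: a wrapper that swallows every exception returns `None` on total failure, which hides the error from the caller.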


7. Builder Pattern – for Config-Driven Component Assembly

Problem:

You need to build complex components from a config file or env vars.

Example:

# Pipeline is assumed to be defined elsewhere and to accept a list of steps.
class PipelineBuilder:
    def __init__(self): self.steps = []
    def add_step(self, fn): self.steps.append(fn); return self
    def build(self): return Pipeline(self.steps)

Used in: ingest pipelines, backtest configs, dataflows.
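
Returning `self` from `add_step` is what makes fluent chaining possible. The sketch below assumes a hypothetical `Pipeline` class that feeds each step the output of the previous one; the original does not say how `Pipeline` works:

```python
class Pipeline:
    # Hypothetical Pipeline: applies each step to the previous step's output.
    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data


class PipelineBuilder:
    def __init__(self):
        self.steps = []

    def add_step(self, fn):
        self.steps.append(fn)
        return self  # returning self enables chaining

    def build(self):
        return Pipeline(self.steps)


pipeline = (
    PipelineBuilder()
    .add_step(lambda rows: [r for r in rows if r is not None])  # drop nulls
    .add_step(lambda rows: [r * 2 for r in rows])               # scale values
    .build()
)
result = pipeline.run([1, None, 3])
```

A config-driven variant would loop over step names from a file and call `add_step` for each one.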


8. Observer Pattern – for Metrics, Alerts, Triggers

Problem:

You want to watch an event and notify listeners.

Example:

class Event:
    def __init__(self): self.subs = []
    def subscribe(self, fn): self.subs.append(fn)
    def fire(self):
        for fn in self.subs:  # plain loop instead of a throwaway list
            fn()

Used in: alerting, log hooks, watchdogs.
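
Listeners usually need to know what happened, not just that something happened. A sketch that forwards arguments to subscribers; passing arguments through `fire` is an extension of the example above, and the job name and message formats are made up:

```python
class Event:
    def __init__(self):
        self.subs = []

    def subscribe(self, fn):
        self.subs.append(fn)

    def fire(self, *args, **kwargs):
        # Forward the event payload to every subscriber, in subscription order.
        for fn in self.subs:
            fn(*args, **kwargs)


alerts = []
job_failed = Event()
job_failed.subscribe(lambda job: alerts.append(f"alert: {job} failed"))
job_failed.subscribe(lambda job: alerts.append(f"metric: {job}.failures += 1"))
job_failed.fire("nightly_load")
```

The code that fires the event knows nothing about alerting or metrics; new reactions are added by subscribing another callable.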


9. Chain of Responsibility – for Cleaning, Validating, Enriching

Problem:

You want to process something through a flexible, dynamic series of steps.

Example:

class Step:
    # next_step avoids shadowing the built-in next()
    def __init__(self, fn, next_step=None): self.fn = fn; self.next_step = next_step
    def run(self, data):
        result = self.fn(data)
        return self.next_step.run(result) if self.next_step else result

Used in: data cleaning chains, validation rules, enrichment flows.
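
Chains like this are linked back to front, so the last handler is created first. A small sketch, assuming a toy record with an `email` field and two hypothetical handlers:

```python
class Step:
    def __init__(self, fn, next_step=None):
        self.fn = fn
        self.next_step = next_step

    def run(self, data):
        result = self.fn(data)
        # Hand the result down the chain, or return it at the end.
        return self.next_step.run(result) if self.next_step else result


# Hypothetical record-level handlers, linked back to front.
enrich = Step(lambda rec: {**rec, "domain": rec["email"].split("@")[1]})
clean = Step(lambda rec: {**rec, "email": rec["email"].strip().lower()}, enrich)

record = clean.run({"email": "  Ada@Example.com "})
```

Here `clean` normalizes the address before `enrich` derives the domain from it, so reordering the links changes the result.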


10. Dependency Injection – for Clean Separation of Logic

Problem:

You want to decouple logic from concrete implementations (great for tests!).

Example:

class DataService:
    def __init__(self, db_client): self.db = db_client
    def get_user(self, user_id): return self.db.fetch(user_id)  # user_id avoids shadowing built-in id()

Used in: testable services, plug-and-play backends, modular jobs.
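
The testing benefit is easiest to see with a fake client. A minimal sketch, assuming a hypothetical in-memory `FakeDB` that offers the same `fetch` method a real client would:

```python
class DataService:
    def __init__(self, db_client):
        self.db = db_client

    def get_user(self, user_id):
        return self.db.fetch(user_id)


class FakeDB:
    """Hypothetical in-memory stand-in for a real database client."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self, key):
        return self.rows.get(key)


# The service is wired up with the fake; no database is needed.
service = DataService(FakeDB({42: {"name": "Ada"}}))
user = service.get_user(42)
```

In production the same `DataService` would receive a real client, with no change to its own code.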


Final Thoughts

You don’t need to memorize all of these.

But the patterns mindset is powerful. In my experience, these 10 patterns show up constantly when building real systems in Python.


Next up: “Fast, Clean, Reliable: How I Write Production-Grade Python” – my checklist for real-world quality.
