Data pipeline architecture diagram with monitoring points
Prompt
Create a clean technical diagram of a [pipeline_type] data pipeline showing:
- Source systems on the left (3-4 sources)
- Ingestion layer with streaming and batch paths
- Transformation/processing layer
- Storage layer (data lake + data warehouse)
- Serving layer (dashboards, APIs, ML models)
- Monitoring and alerting touchpoints at each stage
- Data quality checkpoints
Style: professional technical diagram with a [color_scheme] palette. Use standard data engineering icons. Show data flow direction with arrows. Include latency expectations at each stage.
Example output
A wide, horizontal technical architecture diagram titled "Customer 360 Data Pipeline" rendered in a clean, modern style against a white background. The diagram flows left-to-right across six vertical columns, connected by directional arrows.

**Column 1: Sources (leftmost)**

Four source boxes stacked vertically, each with a small icon:
- PostgreSQL database icon: "App Database (events, users)" — label: "~500K events/day"
- Stripe logo: "Stripe API (subscriptions, payments)" — label: "Webhook + daily batch"
- HubSpot logo: "HubSpot CRM (contacts, deals)" — label: "API pull every 6hrs"
- Cloud upload icon: "CSV Uploads (manual imports)" — label: "Ad-hoc"

**Column 2: Ingestion**

Two parallel paths diverge from the sources:
- Top path (green): "Real-time Stream" — Kafka/Redis Streams icon, label: "< 1 min latency"
- Bottom path (blue): "Batch Extract" — Airflow icon, label: "Daily at 6am UTC"

Both paths merge into the next column. A small red monitoring badge sits between the paths: "Monitor: event volume, schema drift, extraction failures"

**Column 3: Transformation**

A large central box labeled "Processing Layer" containing:
- "Data Quality Checks" (top section, amber background): schema validation, null checks, dedup
- "Transformations" (middle section, teal background): cleaning, joining, enrichment, aggregation
- "dbt Models" (bottom section): staging → intermediate → marts

Data quality checkpoint badge (red): "Gate: reject if >2% null rate on required fields"

Latency label: "Batch: 15-30 min | Stream: < 5 min"

**Column 4: Storage**

Two storage targets:
- Top: "Data Lake (S3)" — layered as Raw → Clean → Curated, using a lake icon with three nested layers in progressively darker shades of blue
- Bottom: "Data Warehouse (PostgreSQL)" — star schema icon showing fact_events at center connected to dim_customers, dim_products, dim_time

Monitoring badge: "Monitor: storage costs, query performance, freshness SLAs"

**Column 5: Serving**

Three output channels:
- Dashboard icon: "BI Dashboards (Metabase)" — label: "Exec, Product, Marketing dashboards"
- API icon: "Analytics API" — label: "REST endpoints for internal tools"
- Brain/ML icon: "ML Features" — label: "Churn prediction, recommendation engine"

**Column 6: Monitoring (rightmost, spanning full height)**

A vertical panel labeled "Observability" containing stacked monitoring tools:
- "Airflow" — DAG health, task duration
- "Great Expectations" — data quality test results
- "Grafana" — pipeline metrics dashboards
- "PagerDuty" — alerts for SLA breaches
- "Data Catalog" — lineage tracking, documentation

Three horizontal monitoring lines extend from this panel to touch each pipeline stage, labeled:
- Line 1 (top, red): "Freshness SLAs — alert if data >2hrs stale"
- Line 2 (middle, amber): "Quality scores — alert if <95% pass rate"
- Line 3 (bottom, blue): "Volume anomalies — alert if >30% deviation from baseline"

**Color scheme:** Navy blue (#1E293B) for infrastructure, teal (#0D9488) for transformation, indigo (#4F46E5) for storage, emerald (#059669) for serving, red (#EF4444) for monitoring touchpoints. Arrows use a medium gray with directional arrowheads. Each column has a subtle gray background band alternating with white to create visual separation.

**Footer:** "Data freshness SLA: Executive dashboard <2hrs, Product analytics <30min, ML features <15min" in small text.
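The ">2% null rate" quality gate described in the transformation column can be sketched in plain Python. This is a minimal illustration, not part of the prompt or any specific tool: the field names in `REQUIRED_FIELDS` and both function names are assumptions chosen for the example.

```python
# Illustrative sketch of a data-quality gate: reject a batch if any
# required field's null rate exceeds 2%. Field names are hypothetical.

REQUIRED_FIELDS = ["user_id", "event_type", "timestamp"]  # assumed schema
NULL_RATE_THRESHOLD = 0.02  # gate threshold from the diagram: 2%

def null_rates(rows, fields):
    """Return the fraction of missing (None/absent) values per field."""
    total = len(rows)
    return {
        f: sum(1 for r in rows if r.get(f) is None) / total
        for f in fields
    }

def passes_quality_gate(rows, fields=REQUIRED_FIELDS,
                        threshold=NULL_RATE_THRESHOLD):
    """True if every required field's null rate is at or below the threshold."""
    if not rows:
        return False  # treat an empty batch as a failure
    return all(rate <= threshold
               for rate in null_rates(rows, fields).values())

batch = [
    {"user_id": 1, "event_type": "click", "timestamp": "2024-01-01T00:00:00Z"},
    {"user_id": 2, "event_type": None,    "timestamp": "2024-01-01T00:01:00Z"},
]
print(passes_quality_gate(batch))  # event_type is 50% null, so the gate fails
```

In a real pipeline a tool like Great Expectations (shown in the Observability panel) would enforce this check declaratively; the sketch above only shows the arithmetic behind the gate.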