# Anatomy of a Pipeline YAML
## 1. Fundamental Structure

Every pipeline consists of these core components:

```text
pipeline.yaml
├── Metadata (name, version)
├── Inputs (data entry points)
├── Outputs (results delivery)
├── Nodes (processing units)
└── Flows (data highways)
```
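
Before examining each part, here is a minimal sketch assembling all five components using the conventions documented below. Every name in it (`source_image`, `processor`, `process.js`) is a placeholder, not a required identifier:

```yaml
# Minimal skeleton, illustrative only
name: "Example Pipeline"
version: "0.1.0"
description: "Smallest useful shape of a pipeline file"

inputs:
  source_image:
    type: image

outputs:
  result_image:
    type: image

nodes:
  processor:
    script: "process.js"
    inputs:
      data: { type: image }
    outputs:
      result: { to: result_image }   # Maps the node output to a pipeline output

flows:
  ingest:
    from: input.source_image         # Routes the pipeline input into the node
    to: processor.data
```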
## 2. Component Deep Dive

### 2.1 Metadata Section

```yaml
name: "Face Styler"        # Pipeline identifier
version: "1.0.2"           # Versioning (SemVer recommended)
description: "Transforms portraits into artistic styles"
```
Why it matters:

- `name` appears in the UI/API
- `version` enables change tracking
- `description` helps discovery
### 2.2 Inputs/Outputs System

**Inputs (Pipeline's API)**

```yaml
inputs:
  user_photo:                # Logical name
    type: image              # Data type constraint
    required: true           # Validation rule
    title: "Your Portrait"   # UI label
    default: null            # Fallback value
```
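
Optional inputs use the same fields. Here is a hedged sketch of an input that falls back to a default when the user omits it; the `style_strength` name and its values are invented for illustration:

```yaml
inputs:
  style_strength:            # Hypothetical optional input
    type: number
    required: false          # Pipeline still runs if omitted
    title: "Style Intensity"
    default: 0.8             # Used when no value is provided
```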
**Outputs (Results Interface)**

```yaml
outputs:
  styled_image:
    type: image
    title: "Artistic Version"
  analysis_report:
    type: json
```
Key Differences:

| Aspect     | Inputs            | Outputs          |
|------------|-------------------|------------------|
| Purpose    | Data ingestion    | Result delivery  |
| Mutability | User-provided     | Read-only        |
| Validation | Required/optional | Always generated |
### 2.3 Nodes Architecture

```yaml
nodes:
  face_detector:                 # Node ID
    category: "image_analysis"   # Functional group
    script: "detect_faces.js"    # Processing logic
    inputs:                      # Required data
      photo: { type: image }
    outputs:                     # Produced artifacts
      faces: { type: json }
```
Node Types (all three roles are sketched together after this list):

- Input processors: First data handlers
- Transformers: Data modifiers
- Terminals: Produce final outputs
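
A hedged sketch of the three roles side by side in one `nodes` section; the node IDs, scripts, and port names are invented for illustration:

```yaml
nodes:
  photo_loader:                  # Input processor: first data handler
    script: "load_photo.js"
    inputs:
      photo: { type: image }
    outputs:
      normalized: { type: image }
  style_applier:                 # Transformer: modifies data mid-stream
    script: "apply_style.js"
    inputs:
      normalized: { type: image }
    outputs:
      styled: { type: image }
  result_writer:                 # Terminal: produces a final output
    script: "write_result.js"
    inputs:
      styled: { type: image }
    outputs:
      final: { to: styled_image }
```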
### 2.4 Flow Mechanics

```yaml
flows:
  detection_to_styling:
    from: face_detector.faces    # Source node.output
    to: style_applier.faces      # Target node.input
    conditions:                  # Optional rules
      - min_faces: 1
```
Flow Types (a fan-out sketch follows this list):

- Linear: Sequential (A→B→C)
- Fan-out: One-to-many (A→B, A→C)
- Conditional: Branched (A→B if X else C)
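
A fan-out simply declares two flows from the same source output. Here is a sketch reusing the `face_detector` example; the `crop_tool` node and the flow names are hypothetical:

```yaml
flows:
  detection_to_styling:          # Branch 1: feed the styler
    from: face_detector.faces
    to: style_applier.faces
  detection_to_cropping:         # Branch 2: feed a hypothetical cropper
    from: face_detector.faces
    to: crop_tool.faces
```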
## 3. Execution Lifecycle

### 3.1 Startup Sequence

```yaml
start:
  nodes:
    - initial_processor          # Entry point
    - parallel_starter           # Concurrent init
```
### 3.2 Environment Setup

```yaml
environment:
  API_ENDPOINT:
    title: "Service URL"
    type: string
    scope: pipeline              # vs 'global' scope
```
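
To make the scope distinction concrete, here is a hedged sketch pairing a pipeline-scoped variable with a global one; the `LOG_LEVEL` name is invented for the example:

```yaml
environment:
  API_ENDPOINT:
    title: "Service URL"
    type: string
    scope: pipeline              # Visible only within this pipeline
  LOG_LEVEL:                     # Hypothetical shared setting
    title: "Log Verbosity"
    type: string
    scope: global                # Shared across all pipelines
```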
## 4. Real-world Example

```yaml
# document_processor.yaml
name: "PDF Analyzer"

inputs:
  pdf_file:
    type: file
    formats: [pdf]

outputs:
  text_content: string
  page_count: number

nodes:
  pdf_extractor:
    script: "pdf.js"
    inputs:
      document: { from: pdf_file }
    outputs:
      raw_text: { to: text_content }
      pages: { to: page_count }

flows:
  file_processing:
    from: input.pdf_file
    to: pdf_extractor.document
```
What happens when run:

- User uploads a PDF
- System routes the file to `pdf_extractor`
- Script processes the document
- Results populate both outputs
## 5. Design Principles

- **Modularity**: Nodes should be single-purpose. Example: separate `face_detector` and `style_applier` nodes instead of one combined node.
- **Discoverability**: Describe and tag pipelines so others can find them:

  ```yaml
  description: "Extracts text from scanned PDFs using OCR"
  tags: ["documents", "text-recognition"]
  ```

- **Error Resilience**: Validate inputs with constraints so bad data fails early (a fuller sketch follows this list):

  ```yaml
  inputs:
    photo:
      constraints:
        min_resolution: [512, 512]
  ```
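
Combining the validation fields seen so far (`required`, `formats`, `constraints`), a defensively specified input might look like the following; `max_size_mb` is a hypothetical constraint key shown only to illustrate the pattern:

```yaml
inputs:
  photo:
    type: image
    required: true
    formats: [jpg, png]
    constraints:
      min_resolution: [512, 512]
      max_size_mb: 10            # Hypothetical constraint key
```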
## 6. Anti-patterns to Avoid

❌ **Monolithic Nodes**

```yaml
# Bad: Does too much
nodes:
  mega_processor:
    script: "do_everything.js"
```

✅ **Preferred Approach**

```yaml
# Good: Separated concerns
nodes:
  preprocessor: {...}
  analyzer: {...}
  formatter: {...}
```
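
The separated nodes then compose through explicit flows, as described in section 2.4. A minimal sketch; the output and input port names here are invented:

```yaml
flows:
  clean_to_analyze:
    from: preprocessor.cleaned   # Hypothetical output port
    to: analyzer.data            # Hypothetical input port
  analyze_to_format:
    from: analyzer.findings
    to: formatter.content
```

Each hop is now individually testable and replaceable.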
## 7. Debugging Tips

- **Flow Visualization**:

  ```mermaid
  graph LR
    A[Input] --> B(Processor)
    B --> C[Output]
  ```

- **Validation Command**:

  ```bash
  pipeline validate my_pipeline.yaml
  ```

- **Inspection Points**: Insert a logging node (wired up in the sketch after this list):

  ```yaml
  nodes:
    debug_logger:
      script: "log_intermediate.js"
  ```