WaferGuard ML - Project Blueprint
Team: Milos, Hernan, Rolando, Oliver
Duration: 10-12 weeks (5-6 sprints)
Sprint Length: 2 weeks
System Architecture
flowchart TB
subgraph Data["Data Ingestion Layer"]
SECOM[(SECOM Sensor Data)]
WM811K[(WM811K Wafer Images)]
VAL[Data Validation]
end
subgraph ML["Machine Learning Pipeline"]
PRE[Preprocessing]
FE[Feature Engineering]
CNN[CNN Model]
AE[Autoencoder]
ENS[Ensemble Model]
end
subgraph INF["Inference & Alert System"]
API[FastAPI Endpoint]
CONF[Confidence Scoring]
ALERT[Alert Generator]
end
subgraph VIZ["Visualization Dashboard"]
STREAM[Streamlit Dashboard]
KPI[KPI Metrics]
GCAM[GradCAM Views]
HIST[Historical Trends]
end
subgraph MES["MES Integration"]
SIM[MES Simulator]
HOOK[Integration Hooks]
end
SECOM --> VAL
WM811K --> VAL
VAL --> PRE
PRE --> FE
FE --> CNN
FE --> AE
CNN --> ENS
AE --> ENS
ENS --> API
API --> CONF
CONF --> ALERT
API --> STREAM
ALERT --> STREAM
STREAM --> KPI
STREAM --> GCAM
STREAM --> HIST
API --> HOOK
SIM --> HOOK
Data Flow
flowchart LR
A[Raw Data] --> B[Preprocessing]
B --> C[Feature Engineering]
C --> D[Model Training]
D --> E[Validation]
E --> F[Deployment]
F --> G[Real-time Inference]
G --> H[Dashboard]
H --> I[Alerts]
style A fill:#e1f5fe
style D fill:#fff3e0
style H fill:#e8f5e9
style I fill:#ffebee
Epics Overview
Recommended Epic Structure
| Epic ID | Name | Description | Phase Alignment |
|---|---|---|---|
| KAN-1 | Data Pipeline & Foundation | Environment setup, data acquisition, EDA, preprocessing | Phase 1-2 |
| KAN-2 | Anomaly Detection (ML) Model | CNN + Autoencoder development, training, validation | Phase 3 |
| KAN-3 | MES Integration & API | FastAPI inference endpoints, MES simulation, alert system | Phase 4 |
| KAN-4 | Analytics + Dashboard | Streamlit dashboard, visualizations, KPIs, GradCAM | Phase 4 |
| KAN-5 | Governance & Audit-ability | Model versioning (MLflow), documentation, evaluation reports | Phase 5 |
Why This Structure?
Your original epics were solid. I recommend:
- Split the original KAN-1 into Data Pipeline (new KAN-1) and ML Model (new KAN-2)
- Move KAN-2 (MES Integration) to KAN-3
- Move KAN-3 (Analytics + Dashboard) to KAN-4
- Move KAN-4 (Governance) to KAN-5
This gives you a clear dependency chain: Data → Model → Integration → Dashboard → Documentation
Epic Dependency Flow
flowchart LR
KAN1[KAN-1<br/>Data Pipeline]
KAN2[KAN-2<br/>ML Model]
KAN3[KAN-3<br/>MES Integration]
KAN4[KAN-4<br/>Dashboard]
KAN5[KAN-5<br/>Governance]
KAN1 --> KAN2
KAN2 --> KAN3
KAN2 --> KAN4
KAN3 --> KAN4
KAN4 --> KAN5
KAN3 --> KAN5
style KAN1 fill:#bbdefb,color:#000000
style KAN2 fill:#c8e6c9,color:#000000
style KAN3 fill:#fff9c4,color:#000000
style KAN4 fill:#ffccbc,color:#000000
style KAN5 fill:#e1bee7,color:#000000
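The dependency edges above can be checked mechanically before sprint planning. A minimal sketch using Python's standard-library `graphlib` module (Python 3.9+, matching the recommended stack): `EPIC_DEPENDENCIES` is transcribed from the diagram, and `epic_order` is an illustrative helper, not part of any existing project code.

```python
from graphlib import TopologicalSorter

# Each epic maps to the set of epics it depends on (edges transcribed from the diagram).
EPIC_DEPENDENCIES = {
    "KAN-1": set(),
    "KAN-2": {"KAN-1"},
    "KAN-3": {"KAN-2"},
    "KAN-4": {"KAN-2", "KAN-3"},
    "KAN-5": {"KAN-3", "KAN-4"},
}


def epic_order(deps):
    """Return one valid start order; raises graphlib.CycleError if the graph loops."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(epic_order(EPIC_DEPENDENCIES))
```

Because KAN-4 depends on both KAN-2 and KAN-3, and KAN-5 on both KAN-3 and KAN-4, this graph admits exactly one order, which matches the epic numbering. Adding a back edge (for example, making KAN-1 depend on KAN-5) would make `static_order()` raise `graphlib.CycleError`.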
Sprint Timeline
gantt
title WaferGuard ML - Sprint Timeline
dateFormat YYYY-MM-DD
section KAN-1 Data Pipeline
Sprint 1 - Foundation :s1, 2026-01-27, 14d
Sprint 2 - Data Pipeline :s2, after s1, 14d
section KAN-2 ML Model
Sprint 3 - Model Dev :s3, after s2, 14d
Sprint 4 - Refinement :s4, after s3, 14d
section KAN-3 & KAN-4
Sprint 4 - API :s4b, after s3, 14d
Sprint 5 - Dashboard :s5, after s4, 14d
section KAN-5 Governance
Sprint 6 - Documentation :s6, after s5, 14d
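Because every sprint is chained with `after` and lasts `14d`, concrete dates follow directly from the 2026-01-27 kickoff in the chart; the Sprint 4 API track (`s4b`) runs in parallel with `s4`, so it shares Sprint 4's dates. A small sketch with `datetime`; `sprint_calendar` is a hypothetical helper and the sprint names are taken from the Sprint Breakdown headings.

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(days=14)  # "Sprint Length: 2 weeks"
SPRINT_NAMES = [
    "Foundation",
    "Data Pipeline",
    "ML Model Development",
    "Model Refinement + API",
    "Dashboard Development",
    "Evaluation & Documentation",
]


def sprint_calendar(first_start, names=SPRINT_NAMES, length=SPRINT_LENGTH):
    """Return (name, start, end) tuples; end is the sprint's last day (inclusive)."""
    calendar = []
    start = first_start
    for name in names:
        end = start + length - timedelta(days=1)
        calendar.append((name, start, end))
        start += length  # the next sprint begins the day after this one ends
    return calendar


if __name__ == "__main__":
    for name, start, end in sprint_calendar(date(2026, 1, 27)):
        print(f"{name:<28} {start} -> {end}")
```

Under these assumptions Sprint 6 runs from 2026-04-07 to 2026-04-20, so the full plan spans 12 weeks, the upper end of the stated 10-12 week duration.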
Sprint Breakdown
Sprint 1: Foundation (Week 1-2)
Epic: KAN-1 (Data Pipeline & Foundation) - 80% Done
| Task ID | Task | Owner | Status |
|---|---|---|---|
| KAN-8 | Set up Python Environment | Oliver | DONE ✅ |
| KAN-9 | Initialize Git Repo With Branching Strategy + Rules | Oliver | DONE ✅ |
| KAN-10 | Download SECOM Dataset + Verify Integrity | Oliver | DONE ✅ |
| KAN-11 | Download WM811K Image Dataset + Verify | Oliver | DONE ✅ |
| KAN-12 | Run EDA on SECOM Sensor Data | Rolando | IN REVIEW 🔄 |
| KAN-13 | Run EDA on WM811K Images | Oliver | DONE ✅ |
| KAN-14 | Data Quality Assessment Report | Unassigned | TO DO 📋 |
| KAN-15 | Set Up Jira Board with Epics/Stories | Rolando | DONE ✅ |
| KAN-46 | Research how the pixels are generated from an actual image | Milos | DONE ✅ |
| KAN-47 | Create slides for initial images EDA findings | Oliver | DONE ✅ |
Sprint 1 Goal: Development environment ready, datasets loaded, initial EDA complete
Sprint 2: Data Pipeline (Week 3-4)
Epic: KAN-1 (Data Pipeline & Foundation)
| Task ID | Task | Owner | Story Points | Status |
|---|---|---|---|---|
| KAN-1-010 | SECOM preprocessing pipeline (missing values, outliers) | TBD | 5 | TODO |
| KAN-1-011 | Feature engineering for sensor data | TBD | 5 | TODO |
| KAN-1-012 | WM811K image preprocessing (resize, normalize) | TBD | 3 | TODO |
| KAN-1-013 | Image augmentation strategy (rotation, flip) | TBD | 3 | TODO |
| KAN-1-014 | Train/validation/test split implementation | TBD | 2 | TODO |
| KAN-1-015 | Data loader classes (PyTorch/TensorFlow) | TBD | 5 | TODO |
| KAN-1-016 | Handle class imbalance (SMOTE/class weights) | TBD | 3 | TODO |
Sprint 2 Goal: Clean, processed datasets ready for model training
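KAN-1-010 and KAN-1-016 can be prototyped before choosing a framework. The sketch below drops sensor columns whose missing fraction exceeds a cutoff, median-imputes the remaining gaps, and computes class weights with the same formula as scikit-learn's `class_weight="balanced"` (`n_samples / (n_classes * count)`). The 50% cutoff and the helper names are illustrative assumptions, not decisions recorded in this plan; rows are lists of floats with `None` marking a missing reading.

```python
from collections import Counter
from statistics import median


def drop_sparse_columns(rows, max_missing=0.5):
    """Drop columns whose fraction of None values exceeds max_missing (assumed cutoff)."""
    n_rows = len(rows)
    keep = []
    for j in range(len(rows[0])):
        missing = sum(1 for row in rows if row[j] is None)
        if missing / n_rows <= max_missing:
            keep.append(j)
    return [[row[j] for j in keep] for row in rows], keep


def median_impute(rows):
    """Replace None with the column median of the observed values.

    Assumes every column has at least one observed value, which holds after
    drop_sparse_columns with max_missing < 1.
    """
    medians = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

With SECOM's labels (-1 = pass, 1 = fail; 1,463 passes and 104 failures), the fail class gets a weight of about 7.53 and the pass class about 0.54. In the real pipeline, scikit-learn's `SimpleImputer(strategy="median")` and `compute_class_weight` cover the same steps.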
Sprint 3: ML Model Development (Week 5-6)
Epic: KAN-2 (Anomaly Detection Model)
| Task ID | Task | Owner | Story Points | Status |
|---|---|---|---|---|
| KAN-2-001 | CNN architecture design for wafer classification | TBD | 5 | TODO |
| KAN-2-002 | Implement CNN model (PyTorch/TensorFlow) | TBD | 5 | TODO |
| KAN-2-003 | Autoencoder architecture for sensor anomaly | TBD | 5 | TODO |
| KAN-2-004 | Implement Autoencoder model | TBD | 5 | TODO |
| KAN-2-005 | Training pipeline with early stopping | TBD | 3 | TODO |
| KAN-2-006 | Hyperparameter tuning setup | TBD | 3 | TODO |
| KAN-2-007 | Baseline model benchmarking | TBD | 3 | TODO |
Sprint 3 Goal: Working CNN and Autoencoder models with baseline metrics
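KAN-2-005 calls for early stopping. Keras ships `tf.keras.callbacks.EarlyStopping` (with `patience` and `min_delta`), but core PyTorch has no built-in equivalent, so with the recommended PyTorch stack a small helper is typical. A framework-agnostic sketch; the class name and default values are assumptions.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs.

    Call step(val_loss) once per epoch. patience=5 and min_delta=0.0 are
    placeholder defaults to be tuned per model.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0
        self.epoch = -1

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        self.epoch += 1
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = self.epoch
            self.bad_epochs = 0  # improvement: reset the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a PyTorch loop, call `stopper.step(val_loss)` after each validation pass and save a checkpoint whenever `best_epoch` changes, so the final model is the best one seen rather than the last one trained.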
Sprint 4: Model Refinement + API (Week 7-8)
Epics: KAN-2 (Model), KAN-3 (MES Integration)
| Task ID | Task | Owner | Story Points | Status |
|---|---|---|---|---|
| KAN-2-008 | Model validation and cross-validation | TBD | 3 | TODO |
| KAN-2-009 | Ensemble model (CNN + Autoencoder) | TBD | 5 | TODO |
| KAN-2-010 | GradCAM implementation for explainability | TBD | 5 | TODO |
| KAN-3-001 | FastAPI inference endpoint design | TBD | 3 | TODO |
| KAN-3-002 | Implement /predict endpoint | TBD | 5 | TODO |
| KAN-3-003 | Confidence scoring and threshold logic | TBD | 3 | TODO |
| KAN-3-004 | MES simulation mock data generator | TBD | 3 | TODO |
| KAN-3-005 | Alert generation logic | TBD | 3 | TODO |
Sprint 4 Goal: Optimized models, working inference API
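KAN-3-003 and KAN-3-005 turn a model score into an API response. A minimal sketch, assuming the ensemble emits a score in [0, 1] that behaves like a calibrated probability: scores at or above the threshold are anomalies, and scores close to the threshold are routed to human review instead of raising an automatic alert. The `Decision` type, threshold, margin, and action names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    anomaly: bool
    confidence: float  # probability assigned to the returned label (assumes calibration)
    action: str        # "pass", "review", or "alert"


def decide(score, threshold=0.5, review_margin=0.1):
    """Map an ensemble anomaly score in [0, 1] to a label, confidence, and action.

    threshold and review_margin are placeholders; they would be tuned on
    validation data against the <5% false-positive-rate target.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    anomaly = score >= threshold
    confidence = score if anomaly else 1.0 - score
    if abs(score - threshold) < review_margin:
        action = "review"  # too close to the threshold to act automatically
    elif anomaly:
        action = "alert"
    else:
        action = "pass"
    return Decision(anomaly, confidence, action)
```

The review band is a design choice: it trades some operator workload for fewer spurious alerts near the decision boundary.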
Sprint 5: Dashboard Development (Week 9-10)
Epic: KAN-4 (Analytics + Dashboard)
| Task ID | Task | Owner | Story Points | Status |
|---|---|---|---|---|
| KAN-4-001 | Dashboard wireframes in Figma | TBD | 3 | TODO |
| KAN-4-002 | Streamlit app skeleton | TBD | 2 | TODO |
| KAN-4-003 | Real-time anomaly detection view | TBD | 5 | TODO |
| KAN-4-004 | Historical trend analysis charts | TBD | 5 | TODO |
| KAN-4-005 | Model confidence visualization | TBD | 3 | TODO |
| KAN-4-006 | GradCAM attention map display | TBD | 5 | TODO |
| KAN-4-007 | Production KPIs (OEE, defect rate) | TBD | 3 | TODO |
| KAN-4-008 | Alert notification panel | TBD | 3 | TODO |
| KAN-4-009 | Interactive filtering and drill-down | TBD | 5 | TODO |
Sprint 5 Goal: Functional dashboard with all visualizations
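KAN-4-007 lists OEE and defect rate. OEE is conventionally Availability × Performance × Quality. A sketch of both KPIs; the `ShiftStats` fields are hypothetical and would be populated from the MES simulator (KAN-3-004) or real MES data.

```python
from dataclasses import dataclass


@dataclass
class ShiftStats:
    planned_minutes: float      # planned production time
    run_minutes: float          # planned time minus downtime
    ideal_cycle_minutes: float  # ideal time to process one wafer
    total_wafers: int
    defective_wafers: int


def defect_rate(s):
    """Share of processed wafers flagged as defective."""
    return s.defective_wafers / s.total_wafers


def oee(s):
    """OEE = Availability x Performance x Quality."""
    availability = s.run_minutes / s.planned_minutes
    performance = (s.ideal_cycle_minutes * s.total_wafers) / s.run_minutes
    quality = (s.total_wafers - s.defective_wafers) / s.total_wafers
    return availability * performance * quality
```

For example, 480 planned minutes, 420 minutes of run time, a 1-minute ideal cycle, and 400 wafers with 8 defective give availability 0.875, performance about 0.952, quality 0.98, and OEE about 0.817.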
Sprint 6: Evaluation & Documentation (Week 11-12)
Epic: KAN-5 (Governance & Audit-ability)
| Task ID | Task | Owner | Story Points | Status |
|---|---|---|---|---|
| KAN-5-001 | Comprehensive model evaluation report | TBD | 5 | TODO |
| KAN-5-002 | Comparison vs baseline (SPC, manual) | TBD | 5 | TODO |
| KAN-5-003 | Model versioning documentation | TBD | 3 | TODO |
| KAN-5-004 | API documentation (OpenAPI/Swagger) | TBD | 3 | TODO |
| KAN-5-005 | User guide for dashboard | TBD | 3 | TODO |
| KAN-5-006 | Technical architecture documentation | TBD | 3 | TODO |
| KAN-5-007 | Final presentation slides | TBD | 5 | TODO |
| KAN-5-008 | Live demo preparation | TBD | 3 | TODO |
| KAN-5-009 | Code cleanup and refactoring | TBD | 3 | TODO |
Sprint 6 Goal: Complete documentation, presentation ready
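For KAN-5-002, the SPC baseline needs a concrete definition. One simple option, assumed here, is a Shewhart-style ±3σ rule per sensor, with limits estimated from in-control (passing) runs using the sample standard deviation; a classical individuals chart would instead estimate σ from the average moving range. Function names are illustrative.

```python
from statistics import mean, stdev


def control_limits(reference, k=3.0):
    """Per-sensor (lower, upper) limits: mean +/- k * sample std of in-control rows.

    Needs at least two reference rows. Constant sensors (std = 0) get zero-width
    limits and should be dropped first.
    """
    limits = []
    for j in range(len(reference[0])):
        column = [row[j] for row in reference]
        mu, sigma = mean(column), stdev(column)
        limits.append((mu - k * sigma, mu + k * sigma))
    return limits


def spc_flag(sample, limits):
    """Flag a wafer if any sensor reading falls outside its control limits."""
    return any(not (low <= x <= high) for x, (low, high) in zip(sample, limits))
```

Running `spc_flag` and the ML model on the same held-out test set gives a like-for-like comparison of false-positive rates and caught failures.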
Work Stream Assignments
| Work Stream | Specialist | Assistant | Primary Epics |
|---|---|---|---|
| Data Engineering & ML Pipeline | TBD | TBD | KAN-1 |
| Model Development & Training | TBD | TBD | KAN-2 |
| Dashboard & Integration | TBD | TBD | KAN-3, KAN-4 |
| Evaluation & Documentation | TBD | TBD | KAN-5 |
Team Structure
flowchart TB
subgraph Team["WaferGuard ML Team"]
subgraph WS1["Data Engineering"]
DE1[Specialist: TBD]
DE2[Assistant: TBD]
end
subgraph WS2["Model Development"]
MD1[Specialist: TBD]
MD2[Assistant: TBD]
end
subgraph WS3["Dashboard & Integration"]
DI1[Specialist: TBD]
DI2[Assistant: TBD]
end
subgraph WS4["Evaluation & Docs"]
ED1[Specialist: TBD]
ED2[Assistant: TBD]
end
end
WS1 -->|Data| WS2
WS2 -->|Models| WS3
WS3 -->|System| WS4
style WS1 fill:#bbdefb
style WS2 fill:#c8e6c9
style WS3 fill:#fff9c4
style WS4 fill:#e1bee7
ML Model Architecture
flowchart TB
subgraph Input["Input Data"]
IMG[Wafer Images<br/>WM811K]
SEN[Sensor Data<br/>SECOM]
end
subgraph CNN["CNN Pipeline"]
C1[Conv2D + ReLU]
C2[MaxPool]
C3[Conv2D + ReLU]
C4[MaxPool]
C5[Flatten]
C6[Dense + Softmax]
end
subgraph AE["Autoencoder Pipeline"]
E1[Encoder]
E2[Latent Space]
E3[Decoder]
E4[Reconstruction Error]
end
subgraph Ensemble["Ensemble Decision"]
COMB[Score Combination]
THRESH[Threshold Logic]
OUT[Final Prediction]
end
IMG --> C1 --> C2 --> C3 --> C4 --> C5 --> C6
SEN --> E1 --> E2 --> E3 --> E4
C6 --> COMB
E4 --> COMB
COMB --> THRESH --> OUT
style CNN fill:#e3f2fd
style AE fill:#f3e5f5
style Ensemble fill:#e8f5e9
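The diagram leaves Score Combination and Threshold Logic open. A sketch of one plausible scheme, under several assumptions: the CNN's defect probability is 1 - P(`none`) (WM811K's failure-type labels include a `none` class); the autoencoder's reconstruction error is calibrated against the 99th percentile of errors on normal validation samples; and the two scores are averaged with a tunable weight. It also assumes a wafer map and a sensor vector exist for the same wafer, which the two public datasets do not provide together, so in practice this would run on MES-simulated pairs. All names and constants are hypothetical.

```python
from statistics import quantiles


def calibrate_ae_threshold(normal_errors, pct=99):
    """Reconstruction-error cutoff: the pct-th percentile (1-99) of errors on normal samples."""
    return quantiles(normal_errors, n=100, method="inclusive")[pct - 1]


def cnn_defect_probability(softmax, none_index):
    """P(any defect pattern) = 1 - P('none') from the CNN's softmax output."""
    return 1.0 - softmax[none_index]


def ae_anomaly_score(error, threshold):
    """Map a reconstruction error to [0, 1); equals 0.5 exactly at the threshold (> 0)."""
    return error / (error + threshold)


def ensemble_score(p_defect, ae_score, w_cnn=0.5):
    """Weighted average of the two model scores; w_cnn would be tuned on validation data."""
    return w_cnn * p_defect + (1.0 - w_cnn) * ae_score
```

Because `ae_anomaly_score` returns 0.5 at the calibrated cutoff, a combined threshold of 0.5 treats "CNN says 50% defect" and "error at the 99th normal percentile" as equally suspicious; that symmetry is a modeling choice to revisit once validation data is available.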
Technology Stack (Recommended)
Core: Python 3.9+
ML Framework: PyTorch (recommended) or TensorFlow/Keras
Data: NumPy, Pandas, Scikit-learn, OpenCV
Dashboard: Streamlit (faster dev) or Plotly Dash
API: FastAPI
Tracking: MLflow
Explainability: SHAP, GradCAM
Version Control: Git + GitHub
Project Mgmt: Jira
Tech Stack Diagram
flowchart LR
subgraph Frontend["Frontend"]
ST[Streamlit]
FIG[Figma]
end
subgraph Backend["Backend"]
FA[FastAPI]
ML[MLflow]
end
subgraph ML_Stack["ML Stack"]
PT[PyTorch]
SK[Scikit-learn]
CV[OpenCV]
end
subgraph Data_Stack["Data"]
PD[Pandas]
NP[NumPy]
end
subgraph DevOps["DevOps"]
GIT[Git/GitHub]
JIRA[Jira]
end
Data_Stack --> ML_Stack --> Backend --> Frontend
DevOps -.-> Backend
DevOps -.-> ML_Stack
Key Datasets
| Dataset | Description | Size | Source |
|---|---|---|---|
| SECOM | Sensor data (590 features) | 1,567 samples (104 failures) | UCI ML Repository |
| WM811K | Wafer map images | 811,457 wafer maps (~173,000 labeled) | Kaggle |
Data Pipeline Overview
flowchart LR
subgraph Sources["Data Sources"]
S1[(SECOM<br/>UCI Repository)]
S2[(WM811K<br/>Kaggle)]
end
subgraph Processing["Processing"]
P1[Missing Value<br/>Imputation]
P2[Outlier<br/>Detection]
P3[Normalization]
P4[Image Resize<br/>& Augment]
end
subgraph Output["Processed Data"]
O1[Train Set<br/>70%]
O2[Val Set<br/>15%]
O3[Test Set<br/>15%]
end
S1 --> P1 --> P2 --> P3 --> O1 & O2 & O3
S2 --> P4 --> O1 & O2 & O3
style Sources fill:#e3f2fd
style Processing fill:#fff8e1
style Output fill:#e8f5e9
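The 70/15/15 split should be stratified by class, because SECOM's 104 failures among 1,567 runs are easy to under-sample in a purely random split. scikit-learn's `train_test_split(..., stratify=y)` applied twice achieves this; the dependency-free sketch below shows the same idea with a fixed seed for reproducibility. `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve each class's proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to test
    return train, val, test
```

With SECOM-sized labels this yields 1,097 / 235 / 235 indices, containing 73, 16, and 15 failures respectively.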
Success Criteria
| Metric | Target |
|---|---|
| CNN Accuracy | >90% |
| False Positive Rate | <5% |
| Inference Latency | <500ms |
| Dashboard Refresh | <2s |
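The table does not say how accuracy, false-positive rate, or latency are measured. A sketch of one reading: accuracy and FPR from a binary confusion matrix (anomalous = positive), and latency as a nearest-rank percentile of measured request times. Using the 95th percentile for the latency target, and the function names, are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """(TP, TN, FP, FN) for boolean labels where True means 'anomalous wafer'."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN): share of good wafers wrongly flagged."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile for integer pct (1-100); integer ceil avoids float error."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]
```

Accuracy alone can look good on imbalanced data: labelling every SECOM run as passing scores about 93.4% (1,463 of 1,567) with a 0% false-positive rate while catching no failures.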
Story Points by Epic
pie showData
title Story Points Distribution
"KAN-1 Data Pipeline" : 47
"KAN-2 ML Model" : 42
"KAN-3 MES Integration" : 17
"KAN-4 Dashboard" : 34
"KAN-5 Governance" : 33
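The pie totals can be cross-checked against the sprint tables. The sketch below transcribes the story points from the Sprint 2-6 tables and derives each task's epic from its ID prefix; `TASKS`, `PIE_TOTALS`, and the helpers are illustrative names. KAN-2 through KAN-5 match exactly, while the tables list only 26 points for KAN-1, so the other 21 presumably belong to the Sprint 1 tasks, whose points are not shown in the Sprint 1 table.

```python
from collections import Counter

# (task ID, story points) transcribed from the Sprint 2-6 tables.
TASKS = [
    ("KAN-1-010", 5), ("KAN-1-011", 5), ("KAN-1-012", 3), ("KAN-1-013", 3),
    ("KAN-1-014", 2), ("KAN-1-015", 5), ("KAN-1-016", 3),
    ("KAN-2-001", 5), ("KAN-2-002", 5), ("KAN-2-003", 5), ("KAN-2-004", 5),
    ("KAN-2-005", 3), ("KAN-2-006", 3), ("KAN-2-007", 3),
    ("KAN-2-008", 3), ("KAN-2-009", 5), ("KAN-2-010", 5),
    ("KAN-3-001", 3), ("KAN-3-002", 5), ("KAN-3-003", 3), ("KAN-3-004", 3),
    ("KAN-3-005", 3),
    ("KAN-4-001", 3), ("KAN-4-002", 2), ("KAN-4-003", 5), ("KAN-4-004", 5),
    ("KAN-4-005", 3), ("KAN-4-006", 5), ("KAN-4-007", 3), ("KAN-4-008", 3),
    ("KAN-4-009", 5),
    ("KAN-5-001", 5), ("KAN-5-002", 5), ("KAN-5-003", 3), ("KAN-5-004", 3),
    ("KAN-5-005", 3), ("KAN-5-006", 3), ("KAN-5-007", 5), ("KAN-5-008", 3),
    ("KAN-5-009", 3),
]

# Totals shown in the pie chart.
PIE_TOTALS = {"KAN-1": 47, "KAN-2": 42, "KAN-3": 17, "KAN-4": 34, "KAN-5": 33}


def points_by_epic(tasks):
    """Sum points per epic; the epic key is the task ID minus its last segment."""
    totals = Counter()
    for task_id, points in tasks:
        totals[task_id.rsplit("-", 1)[0]] += points
    return dict(totals)


def mismatches(computed, expected):
    """Epics whose pie total differs from the tabled tasks: {epic: (tables, pie)}."""
    return {
        epic: (computed.get(epic, 0), total)
        for epic, total in expected.items()
        if computed.get(epic, 0) != total
    }
```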
Inference Flow
sequenceDiagram
participant MES as MES System
participant API as FastAPI
participant Model as ML Model
participant DB as MLflow
participant Dash as Dashboard
participant Eng as Process Engineer
MES->>API: POST /predict (wafer data)
API->>Model: Load model from registry
Model->>DB: Get latest model version
DB-->>Model: Return model weights
Model->>Model: Run inference
Model-->>API: Prediction + confidence
API->>API: Apply threshold logic
alt Anomaly Detected
API->>Dash: Send alert
Dash->>Eng: Display notification
API-->>MES: Response (anomaly: true)
else Normal
API-->>MES: Response (anomaly: false)
end
API->>Dash: Update real-time view
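As drawn, each `/predict` call loads the model from the MLflow registry, which could make the <500ms latency target hard to meet if weights were fetched every time. One common pattern, sketched here under the assumption that the registry can report its latest version cheaply, is to keep the loaded model in memory and reload only when the version changes. `CachedModelLoader`, `get_latest_version`, and `load_version` are hypothetical stand-ins, not MLflow APIs.

```python
import threading
from typing import Any, Callable, Optional, Tuple


class CachedModelLoader:
    """Keep the current model in memory; reload only when the latest version changes."""

    def __init__(self, get_latest_version: Callable[[], str],
                 load_version: Callable[[str], Any]):
        self._get_latest_version = get_latest_version  # hypothetical registry lookup
        self._load_version = load_version              # hypothetical weight loader
        self._lock = threading.Lock()
        self._version: Optional[str] = None
        self._model: Any = None

    def get(self) -> Tuple[str, Any]:
        """Return (version, model), loading weights only on a version change."""
        latest = self._get_latest_version()
        with self._lock:  # stop concurrent requests from loading the same weights twice
            if latest != self._version:
                self._model = self._load_version(latest)
                self._version = latest
            return self._version, self._model
```

The version check still runs per request, mirroring the "Get latest model version" step, but the expensive weight load happens only when a new model is promoted.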
Sprint Workflow
stateDiagram-v2
[*] --> Planning: Sprint Start
Planning --> InProgress: Tasks Assigned
InProgress --> CodeReview: PR Created
CodeReview --> Testing: Approved
Testing --> Done: Tests Pass
CodeReview --> InProgress: Changes Requested
Testing --> InProgress: Tests Fail
Done --> [*]: Sprint End
note right of Planning: 2-hour session
note right of InProgress: Daily standups
note right of Done: Sprint review
Current Sprint Status
Active Sprint: Sprint 1 - Foundation
Sprint Start: TBD
Sprint End: TBD
Sprint 1 Progress
Meeting Schedule
| Meeting | Frequency | Duration | Day/Time |
|---|---|---|---|
| Sprint Planning | Every sprint (2 weeks) | 2 hours | Sprint Day 1 |
| Daily Standup | Daily | 15 min | TBD |
| Sprint Review | Every sprint (2 weeks) | 1.5 hours | Sprint Last Day |
| Sprint Retro | Every sprint (2 weeks) | 1 hour | Sprint Last Day |
Last Updated: 2026-01-23