CUE for Safety-Critical Infrastructure Validation

Published on February 6, 2026

God willing

Configuration errors cause over 50% of production outages. CUE catches a large class of these errors before deployment by checking configuration against declarative constraints with lattice-based semantics, which gives guarantees (order-independence, no silent overrides) that script-based validators do not provide. For safety-critical systems, where misconfiguration can harm people, CUE lets teams express safety requirements as executable constraints whose validation results can serve as evidence in certification work under ISO 26262, IEC 61508, and DO-178C.

Implementation Essentials

Aspect	Key Point
What it is	Constraint language for YAML/JSON validation with lattice-based semantics
Integration	CLI (`cue vet`), Go API, CI/CD native; complements OPA Rego
Timeline	8-12 weeks to production validation for pilot domain
Investment	Moderate training (constraint-based thinking); low infrastructure
Risk reduction	Early defect detection, guaranteed constraint enforcement, audit evidence

Business Outcomes

Reduced incident frequency: Configuration errors caught pre-deployment
Accelerated compliance: Automated evidence generation for safety certification
Competitive differentiation: Demonstrable safety engineering excellence
Lower certification cost: Formal verification properties reduce manual review

2. The Problem: When Infrastructure Configuration Becomes Safety-Critical

2.1 From Convenience to Consequence

Infrastructure as Code began as a convenience—version-controlled, repeatable deployments. For most organizations, it remains there: configuration errors cause downtime, financial loss, operational friction. But for safety-critical systems—autonomous vehicles, medical devices, industrial control—the same configuration errors can cause harm.

The YAML that configures a Kubernetes deployment for a recommendation engine and the YAML that configures a Kubernetes deployment for a surgical robot use identical syntax. The difference is not in the format but in the consequences of misconfiguration: resource starvation that degrades recommendations versus resource starvation that interrupts critical monitoring.

2.2 The YAML Validation Gap

YAML's design priorities—human readability, flexibility, minimal syntax—directly conflict with safety assurance requirements. Consider:

replicas: 3
resources:
  limits:
    memory: "128Mi"  # Is this sufficient? Safe? Tested?
  requests:
    memory: "64Mi"   # Must be ≤ limits, but who checks?

Traditional validation answers: syntactically valid? (yes, well-formed YAML). Schema valid? (maybe, if JSON Schema defines the fields). Safe? (unknown—safety is outside validation scope).

2.3 Why Traditional Tools Fall Short

Tool Category	Example	Limitation for Safety
Regex scanners	Early CloudFormation tools	Cannot parse structure; false positives/negatives
Imperative rules	TFLint, early Checkov	Order-dependent; don't compose; conflict detection late
JSON Schema	kubeval, many validators	No cross-field value comparisons; conditionals limited to `if`/`then`/`else`; `default` is an annotation that validators do not apply
OPA Rego	Gatekeeper	Typically enforced at admission time; pre-deployment static checks are secondary; policies are written as queries rather than composable value constraints

What safety-critical validation requires: mathematical guarantees that constraints hold, regardless of how configurations are composed or ordered.

3. CUE: A Technical Primer for Security Engineers

3.1 What Makes CUE Different

CUE is not a better schema language. It is a constraint programming language for configuration, with three distinctive properties:

Order-independence: A & B equals B & A. Always. No override surprises.

Monotonicity: Adding constraints can only make results more specific or fail. Never less specific.

Explicit failure: Incompatible constraints produce _|_ (bottom), with precise error location. Never silent override.

These properties emerge from lattice-based semantics, not implementation choice. They are provable and portable across CUE implementations.
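
A minimal sketch of the three properties follows. The field names (`a`, `b`, `port`, `conflict`) are hypothetical and chosen only for illustration; they do not come from any real schema.

```cue
// Order-independence: a and b list the same constraints in different orders
// and evaluate to the same value.
a: int & >=1024 & 8080
b: 8080 & >=1024 & int

// Monotonicity: each added constraint narrows port; none can widen it.
port: int
port: >=1024
port: <=65535

// Explicit failure: uncommenting the next line makes evaluation stop with an
// error, because 80 does not satisfy >=1024.
// conflict: 80 & >=1024
```

Running `cue eval` on a file like this should show `a` and `b` with the same value and `port` as a bounded integer range that is still waiting for a concrete value.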

3.2 Constraint Unification in Practice

Basic CUE validates what JSON Schema validates, then goes further:

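import (
    "strconv"
    "strings"
)

// Schema: what must be true
#Deployment: {
    replicas: int & >0 & <100                // Type + bounds
    image:    =~"^registry\\.company\\.io/"  // Pattern (dots escaped)
    resources: {
        // Simplification: quantities are restricted to whole mebibytes, e.g. "64Mi"
        requests: memory: =~"^[0-9]+Mi$"
        limits: memory:   =~"^[0-9]+Mi$"
        // Cross-field: limits ≥ requests, compared as numbers.
        // Comparing the strings directly would sort "128Mi" before "64Mi".
        _requestMi: strconv.Atoi(strings.TrimSuffix(requests.memory, "Mi"))
        _limitMi:   strconv.Atoi(strings.TrimSuffix(limits.memory, "Mi"))
        _limitMi:   >=_requestMi
    }
    // Cross-field: replicas affects other constraints
    if replicas > 1 {
        strategy: type: "RollingUpdate"
    }
}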

// Data: what is configured
myApp: #Deployment & {
    replicas: 3
    image: "registry.company.io/app:v1.2.3"
    resources: {
        requests: memory: "64Mi"
        limits: memory: "128Mi"  // Valid: 128Mi ≥ 64Mi
    }
}

The & operator is unification, not Boolean AND. It produces the most specific value satisfying both operands, or _|_ if none exists.
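
To make "most specific value" concrete, here is a small sketch in which two hypothetical sources of constraints (`platform` and `safety`, names invented for this example) are unified and then given a concrete value.

```cue
// Two teams contribute constraints on the same fields.
platform: {
    replicas: int & >=2
    region:   "eu-west-1" | "eu-central-1"
}
safety: {
    replicas: <=10
    region:   "eu-central-1"
}

// Unification keeps every constraint from both sides:
// replicas becomes the integer range 2 to 10, region becomes "eu-central-1".
merged: platform & safety

// Supplying a value inside the range succeeds.
deployed: merged & {replicas: 3}

// Supplying a value outside the range fails when uncommented.
// rejected: merged & {replicas: 50}
```

Writing `safety & platform` instead would give the same `merged` value, which is what allows constraints from different teams to be layered without agreeing on an evaluation order.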

3.3 YAML Integration at Scale

CUE's cue vet command validates existing YAML without conversion:

# Validate all YAML in directory against schema
cue vet -c deployment.cue -d '#Deployment' k8s/*.yaml

# Concrete (-c) requires all fields have values
# Violations: precise file:line:column in both schema and data

For CI/CD:

# .github/workflows/validate.yml
- name: CUE Validation
  run: cue vet -c -d '#Deployment' ./schemas/*.cue ./deployments/*.yaml
  # Non-zero exit on violation blocks deployment

4. Safety Engineering with CUE: Three Concrete Patterns

4.1 Pattern 1: SIL-Aware Kubernetes Resource Validation

Safety Integrity Levels (SIL in IEC 61508, ASIL in ISO 26262) require demonstrable implementation of safety mechanisms. CUE encodes this structurally; the example uses ASIL D:

// Base: ASIL-agnostic safety fundamentals
#SafetyBase: {
    runAsNonRoot: true
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    seccompProfile: type: "RuntimeDefault"
}

// ASIL D: highest integrity, most constraints
#ASIL_D: {
    // Embed rather than unify: #SafetyBase is a closed definition, so
    // `#SafetyBase & {replicas: ...}` would reject the added fields.
    #SafetyBase

    replicas: int & >=2  // Redundancy required
    // Anti-affinity: distribute across failure domains
    affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: [{
        labelSelector: matchLabels: app: string
        topologyKey: "topology.kubernetes.io/zone"
    }]
    // Health monitoring with tight thresholds
    livenessProbe: {
        initialDelaySeconds: int & <=10
        periodSeconds:       int & <=5
        failureThreshold:    int & <=3
    }
    // Resource guarantees for predictable performance
    resources: {
        requests: {cpu: string, memory: string}
        // Guaranteed QoS: requests == limits for both CPU and memory
        limits: {cpu: requests.cpu, memory: requests.memory}
    }
}

// Apply to critical workload
criticalService: #ASIL_D & {
    replicas: 3
    // ... other configuration
}

Validation guarantees: Any deployment claiming ASIL_D compliance must satisfy all structural requirements. Missing anti-affinity, excessive probe thresholds, or burstable QoS are caught before deployment, not in safety audit.
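
The following sketch shows what such a rejection looks like in isolation. `#CriticalProbe` and `#GuaranteedCPU` are hypothetical, trimmed-down stand-ins for the probe and QoS rules, defined here so the fragment can be evaluated on its own.

```cue
// Hypothetical stand-ins for two of the ASIL D rules.
#CriticalProbe: {
    periodSeconds:    int & <=5
    failureThreshold: int & <=3
}

#GuaranteedCPU: {
    requests: cpu: string
    limits: cpu:   requests.cpu
}

// Passes: every value lies inside its bound.
okProbe: #CriticalProbe & {periodSeconds: 5, failureThreshold: 2}

// Fails when uncommented: 30 does not satisfy <=5.
// slowProbe: #CriticalProbe & {periodSeconds: 30, failureThreshold: 2}

// Fails when uncommented: limits.cpu must equal requests.cpu,
// so "500m" and "1" are conflicting values (burstable QoS).
// burstable: #GuaranteedCPU & {requests: cpu: "500m", limits: cpu: "1"}
```

Each failing case is reported against the field that violates the rule, which is the kind of record a safety audit can trace back to a requirement.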

4.2 Pattern 2: Automated HAZOP Guide Word Enforcement

HAZOP guide words systematically identify hazardous deviations. CUE constraints can encode preventive patterns:

Guide Word	Hazard	CUE Prevention
NO	Required safety mechanism absent	Mandatory fields with no default (`!`)
MORE	Excessive resource allocation causing starvation	Upper bounds with safety margin
LESS	Insufficient redundancy	Minimum replica constraints
AS WELL AS	Unexpected capabilities from unknown fields	Closed schemas (definitions such as `#Deployment`, or `close()`, reject unknown fields; `{...}` would allow them)
OTHER THAN	Invalid operational mode	Exhaustive disjunctions with no default

Example—preventing "NO [safety monitoring]":

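import "list"

#MonitoredWorkload: {
    // The ! marks a required field: the data must supply it, and there is no default
    metricsPort!:    int & >1024 & <65536
    healthEndpoint!: =~"^/health"
    alertingRules!:  [...string] & list.MinItems(1)

    // Container ports exposed by the workload (simplified shape)
    ports!: [...{containerPort: int, ...}]

    // Derived: monitoring must be reachable, so metricsPort has to be one of
    // the exposed ports. Unifying with true turns false into a validation error.
    _monitoringValid: true & list.Contains([for p in ports {p.containerPort}], metricsPort)
}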
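
The other guide words in the table can be sketched the same way. `#GuardedWorkload`, its field names, and its bounds are hypothetical and only illustrate the shape of each constraint.

```cue
#GuardedWorkload: {
    // MORE: cap CPU millicores so one workload cannot starve its neighbours.
    cpuMillicores!: int & >0 & <=2000

    // LESS: require at least two replicas.
    replicas!: int & >=2

    // OTHER THAN: an exhaustive disjunction with no default (no * marker),
    // so the mode must be chosen explicitly from this set.
    mode!: "active" | "standby" | "maintenance"

    // AS WELL AS: definitions are closed, so a field that is not listed here
    // (for example `privileged: true`) is rejected.
}

// Passes: all required fields are present and within bounds.
ok: #GuardedWorkload & {cpuMillicores: 500, replicas: 3, mode: "active"}

// Fails when uncommented: "debug" is not one of the allowed modes.
// badMode: #GuardedWorkload & {cpuMillicores: 500, replicas: 3, mode: "debug"}
```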

4.3 Pattern 3: Fault Tree Cut Set Prevention in Terraform

Note: This pattern is architectural; specific implementations require HCL-to-CUE conversion.

Fault Tree Analysis identifies minimal cut sets—smallest combinations of failures causing system hazard. CUE can enforce structural prevention: diversity requirements that eliminate common cause failures.

import "list"

// For safety-critical redundancy: diverse implementations required
#DiverseRedundancy: {
    channels: [...#Channel] & list.MinItems(2)

    // All channels must have distinct implementations
    _implementations: [for c in channels {c.implementation}]
    _implementations: list.UniqueItems()

    // Zones must differ across channels
    _zones: [for c in channels {c.nodeSelector["topology.kubernetes.io/zone"]}]
    _zones: list.UniqueItems()
}

#Channel: {
    implementation: string  // e.g., "vendorA-v1.2", "vendorB-v3.4"
    // Labels that contain dots or slashes must be quoted
    nodeSelector: "topology.kubernetes.io/zone": string
}
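
A short usage sketch of the diversity rule follows. `#Redundant` is a hypothetical, reduced variant that checks implementations only, so the fragment stands on its own.

```cue
import "list"

// Reduced variant of the diversity rule: implementations must be unique.
#Redundant: {
    channels: [...{implementation: string}] & list.MinItems(2)
    _implementations: [for c in channels {c.implementation}]
    _implementations: list.UniqueItems()
}

// Passes: two channels with two different implementations.
diverse: #Redundant & {
    channels: [{implementation: "vendorA-v1.2"}, {implementation: "vendorB-v3.4"}]
}

// Fails when uncommented: both channels share one implementation, which is the
// common-cause failure the rule is meant to exclude.
// cloned: #Redundant & {
//     channels: [{implementation: "vendorA-v1.2"}, {implementation: "vendorA-v1.2"}]
// }
```

In fault tree terms, the rejected configuration is one where a single vendor defect would appear in every channel at once, collapsing a multi-event cut set into a single event.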

5. Implementation in Production Environments

5.1 Pipeline Integration Architecture

┌─────────────────┐     ┌─────────────┐     ┌─────────────────┐
│  Developer IDE  │────→│  Pre-commit │────→│   PR Validation │
│  (CUE LSP)      │     │  (cue vet)  │     │  (full schemas) │
└─────────────────┘     └─────────────┘     └─────────────────┘
                                                    │
┌─────────────────┐     ┌─────────────┐            │
│  Production     │←────│  Deployment │←───────────┘
│  (monitored)    │     │  Gate       │
└─────────────────┘     └─────────────┘
        │
        ↓
┌─────────────────┐
│  Incident:      │
│  CUE validation │
│  evidence for   │
│  safety case    │
└─────────────────┘

5.2 Constraint Library Organization

cue.mod/
└── pkg/
    ├── chokmah.io/safety/v1/      # Base safety patterns
    │   ├── asil.cue               # ASIL-A through D
    │   ├── sil.cue                # SIL 1-4 mappings
    │   └── redundancy.cue         # N-modular, voting
    ├── chokmah.io/security/v1/    # Security controls
    │   ├── podsecurity.cue        # PSS implementation
    │   ├── network.cue            # Zero-trust patterns
    │   └── rbac.cue               # Least-privilege
    └── chokmah.io/compliance/v1/  # Framework mappings
        ├── iso26262.cue           # Automotive
        ├── iec61508.cue           # Generic functional safety
        └── nist80053.cue          # US government
myapp/                             # Application constraints (outside cue.mod/; cue.mod/usr is reserved for overlays on pkg/)
└── deployment.cue                 # Uses chokmah.io/safety/v1

5.3 Measuring Safety Outcomes

Metric	Measurement	Target
Configuration defect escape rate	Defects found in production / total defects	<5% (vs. industry ~50% pre-validation)
Time to safety constraint violation detection	Commit to notification	<5 minutes (pre-commit ideal)
Safety case evidence automation	Validated constraints / total safety requirements	>80% for structural requirements
Constraint library coverage	Resources with CUE schemas / total resource types	100% for safety-critical types

6. Getting Started: A 30-Day Adoption Plan

6.1 Weeks 1-2: Pilot Selection and Team Enablement

Select pilot: Single application, Kubernetes-native, existing configuration error pain
Assemble team: 2-3 engineers with Go experience, safety engineering liaison
Training: CUE fundamentals (https://cuelang.org/docs/tutorials/), constraint-based thinking workshop
Initial schema: Port existing JSON Schema or develop from safety requirements

6.2 Weeks 3-4: Production Constraint Deployment

CI integration: cue vet in PR checks, blocking on violation
Developer experience: IDE plugins, pre-commit hooks
Constraint refinement: Based on initial feedback, false positive elimination
Documentation: Constraint purpose, safety rationale, example violations

6.3 Month 2+: Scaling and Optimization

Expand coverage: Additional resource types, cross-resource constraints
Organizational rollout: Training additional teams, constraint library governance
Advanced patterns: Template validation, differential analysis, safety case integration
Community contribution: Share patterns, engage with CUE safety-critical use case development

7. Resources and Expert Engagement

7.1 Open Source Tools and Libraries

Resource	URL	Purpose
CUE Language	https://cuelang.org	Core language and tooling
CUE Kubernetes	https://github.com/cue-labs/cue-api-machinery	K8s-specific patterns