CUE for Safety-Critical Configuration

Published on February 6, 2026

God willing.

CUE does one thing that JSON Schema and OPA cannot: it makes constraint composition provably order-independent and monotone. For a safety-critical system, that property is the difference between a validation framework you can reason about formally and one you have to test exhaustively. Start with a pilot, instrument the defect escape rate, and you will have the evidence your safety case needs.

Prologue

Configuration errors cause over 50% of production outages (Puppet State of DevOps 2023). CUE catches this class of error before deployment through constraint validation with formal guarantees that traditional tools do not offer. For safety-critical systems, where misconfiguration can harm people, CUE provides executable safety requirements with formal verification properties that support certification to ISO 26262, IEC 61508, and DO-178C.

This guide argues a specific claim: CUE's lattice-based semantics enables safety constraint verification properties that JSON Schema and OPA cannot provide. We demonstrate this across three patterns, two with executable code and one conceptual. The primary reader is a defense or automotive systems engineer choosing configuration validation tooling.

Summary

Aspect	Key Point
What it is	Constraint language for YAML/JSON validation with lattice-based semantics
Integration	CLI (`cue vet`), Go API, CI/CD native; complements OPA Rego
Timeline	4-8 weeks to pilot validation for one domain; 6+ months to full production rollout
Investment	Moderate training (constraint-based thinking); low infrastructure
Risk reduction	Pre-deployment defect detection, guaranteed constraint enforcement, audit evidence

Business Outcomes

Reduced incident frequency: Configuration errors caught pre-deployment
Accelerated compliance: Automated evidence generation for safety certification
Competitive differentiation: Demonstrable safety engineering excellence
Lower certification cost: Formal verification properties reduce manual review

2. The Problem: When Infrastructure Configuration Becomes Safety-Critical

2.1 From Convenience to Consequence

Infrastructure as Code began as a convenience: configuration that is version-controlled and repeatable. For most organizations, it stays there, and configuration errors cause downtime and financial loss. For safety-critical systems (autonomous vehicles, medical devices, industrial control), the same errors can cause harm.

The YAML that configures a Kubernetes deployment for a recommendation engine and the YAML that configures one for a surgical robot use identical syntax. The difference is not format but consequences of misconfiguration: resource starvation that degrades recommendations versus resource starvation that interrupts critical monitoring.

2.2 The YAML Validation Gap

YAML's design priorities (human readability, flexibility, and minimal syntax) directly conflict with safety assurance requirements. Consider:

replicas: 3
resources:
  limits:
    memory: "128Mi"  # Is this sufficient? Safe? Tested?
  requests:
    memory: "64Mi"   # Must be <= limits, but who checks?

Traditional validation answers three questions. Syntactically valid? Yes: it is well-formed YAML. Schema valid? Maybe, if a JSON Schema defines the fields. Safe? Unknown: safety is outside the scope of validation.

2.3 Why Traditional Tools Fall Short

Tool Category	Example	Limitation for Safety
Regex scanners	Early CloudFormation tools	Cannot parse structure; false positives/negatives
Imperative rules	TFLint, early Checkov	Order-dependent; don't compose; conflict detection late
JSON Schema	kubeval, many validators	Cannot compare values across fields; conditionals (if/then/else) are verbose; `default` is an annotation that validators do not apply
OPA Rego	Gatekeeper	Datalog-derived semantics; no lattice ordering; static analysis is secondary to runtime policy

OPA's Rego is powerful at runtime policy enforcement but lacks the formal monotonicity and order-independence guarantees that emerge from lattice semantics. See the CUE specification (cuelang.org/docs/references/spec/) versus OPA's policy language reference for the contrast.

What safety-critical validation requires: mathematical guarantees that constraints hold, regardless of how configurations are composed or ordered.

3. CUE: A Technical Primer for Systems Engineers

3.1 What Makes CUE Different

CUE is not a better schema language. It is a constraint programming language for configuration, with three distinctive properties:

Order-independence: A & B equals B & A. Always. No override surprises.

Monotonicity: Adding constraints can only make results more specific or fail. Never less specific.

Explicit failure: Incompatible constraints produce _|_ (bottom), with precise error location. Never silent override.

These properties emerge from lattice-based semantics, not implementation choice. They are provable and portable across CUE implementations.
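The three properties can be seen in a few lines of CUE. This is a minimal sketch; the names #A, #B, sameEitherWay, and replicas are illustrative and not from any library.

```cue
// Order-independence: #A and #B list the same constraints in a
// different order and describe the same set of values.
#A: int & >0 & <100
#B: <100 & >0 & int

// Unifying them in either order accepts the same concrete value.
sameEitherWay: (#A & #B) & (#B & #A) & 42

// Monotonicity: each additional declaration narrows the field.
replicas: #A    // any integer in 1..99
replicas: >=2   // now 2..99
replicas: 3     // now exactly 3

// Explicit failure: uncommenting the next line makes evaluation fail
// with a conflict between 3 and 5 instead of overriding silently.
// replicas: 5
```

Evaluating this file yields replicas: 3 regardless of the order in which the three replicas declarations appear.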

3.2 Constraint Unification in Practice

Basic CUE validates what JSON Schema validates, then goes further:

// Schema: what must be true
#Deployment: {
    replicas: int & >0 & <100           // type + bounds
    image: =~"^registry\\.company\\.io/"  // pattern (dots escaped so they match literally)
    resources: {
        requests: memory: =~"^[0-9]+Mi$"
        limits:   memory: =~"^[0-9]+Mi$"
        // Cross-field: limits must be at least as large as requests.
        // The values are strings, so >= would compare them lexically.
        // Parse the integers first (strings.TrimSuffix + strconv.Atoi),
        // or enforce the rule in a Go validator at the CI layer.
        // The line below is conceptual pseudocode:
        // limits.memory >= requests.memory
    }
    if replicas > 1 {
        strategy: type: "RollingUpdate"
    }
}

// Data: what is configured
myApp: #Deployment & {
    replicas: 3
    image: "registry.company.io/app:v1.2.3"
    resources: {
        requests: memory: "64Mi"
        limits:   memory: "128Mi"
    }
}

The & operator is unification, not Boolean AND. It produces the most specific value satisfying both operands, or _|_ if none exists.

Note on string-valued memory comparisons: "128Mi" and "64Mi" are strings, so comparing them directly orders them lexically ("128Mi" sorts before "64Mi"), not by quantity. Enforcing memory limit >= request therefore requires extracting the integers first: (a) inside CUE with the strings and strconv packages, (b) in a Go wrapper, or (c) at a CI step outside CUE. Do not treat the cross-field comment above as executable; it is architectural intent.
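One way to express the limit >= request rule inside CUE itself is to parse the integers into hidden helper fields. The sketch below assumes every quantity uses the Mi suffix, as the patterns in #Deployment require; #MemoryBounds and the helper field names are hypothetical. It also assumes your CUE version surfaces errors in hidden fields during cue vet, which is worth confirming with a deliberately failing input.

```cue
import (
	"strconv"
	"strings"
)

#MemoryBounds: {
	requests: memory: =~"^[0-9]+Mi$"
	limits: memory:   =~"^[0-9]+Mi$"

	// Hidden helpers: strip the "Mi" suffix and parse the digits.
	_requestMi: strconv.Atoi(strings.TrimSuffix(requests.memory, "Mi"))
	_limitMi:   strconv.Atoi(strings.TrimSuffix(limits.memory, "Mi"))

	// Cross-field rule: the parsed limit must be >= the parsed request.
	_limitMi: >=_requestMi
}

// Satisfies the rule: 128 >= 64.
ok: #MemoryBounds & {
	requests: memory: "64Mi"
	limits: memory:   "128Mi"
}

// Swapping the two values should be rejected, because 64 >= 128 does
// not hold. Left commented out so the file evaluates cleanly.
// bad: #MemoryBounds & {
// 	requests: memory: "128Mi"
// 	limits: memory:   "64Mi"
// }
```

Supporting other suffixes (Gi, Ki) would need a unit table and a multiplication step, which this sketch deliberately leaves out.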

3.3 YAML Integration at Scale

CUE's cue vet command validates existing YAML without conversion:

# Validate all YAML in directory against schema
cue vet -c deployment.cue -d '#Deployment' k8s/*.yaml
# -c (concrete) requires all fields to have values
# Violations: precise file:line:column in both schema and data

Directory layout matters. See Section 5.2 for the recommended library structure before wiring this into CI.

For CI/CD:

# .github/workflows/validate.yml
- name: CUE Validation
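  # Pass schema files and YAML data files explicitly; bare directories
  # are treated as CUE packages, so YAML inside them is not picked up.
  run: cue vet -c -d '#Deployment' ./schemas/*.cue ./deployments/*.yaml
  # Non-zero exit on violation blocks deployment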

4. Safety Engineering with CUE: Three Patterns

4.1 Pattern 1: SIL-Aware Kubernetes Resource Validation

Safety integrity levels (SIL in IEC 61508, ASIL in ISO 26262) require demonstrable implementation of safety mechanisms. CUE encodes this structurally; the example below uses ASIL D:

// Base: ASIL-agnostic safety fundamentals
#SafetyBase: {
    runAsNonRoot: true
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    seccompProfile: type: "RuntimeDefault"
}

// ASIL D: highest integrity, most constraints
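#ASIL_D: {
    // Embed the base. Writing #SafetyBase & {...} would be rejected,
    // because a closed definition does not allow new fields via &.
    #SafetyBase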
    replicas: >=2
    affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: [{
        labelSelector: matchLabels: app: string
        topologyKey: "topology.kubernetes.io/zone"
    }]
    livenessProbe: {
        initialDelaySeconds: <=10
        periodSeconds: <=5
        failureThreshold: <=3
    }
    resources: {
        requests: cpu: string
        limits: cpu: requests.cpu  // Guaranteed QoS: requests == limits
    }
}

criticalService: #ASIL_D & {
    replicas: 3
    // ... other configuration
}

Validation guarantees: Any deployment claiming ASIL_D compliance must satisfy all structural requirements. Missing anti-affinity, excessive probe thresholds, or burstable QoS are caught before deployment, not in a safety audit.
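A single schema can also select its constraints from a declared integrity level, which keeps the level and the mechanisms it demands in one place. The following is a sketch under assumptions: #Workload, the asil field, and the rule that only level D needs two or more replicas are all hypothetical and not taken from ISO 26262.

```cue
#Workload: {
	// Declared integrity level; QM means no safety requirement.
	asil: "QM" | "A" | "B" | "C" | "D"

	replicas: int & >=1

	// Hypothetical rule: level D workloads need redundancy.
	if asil == "D" {
		replicas: >=2
	}
}

// Accepted: level D with three replicas.
steering: #Workload & {asil: "D", replicas: 3}

// Accepted: a QM workload may run a single replica.
telemetry: #Workload & {asil: "QM", replicas: 1}

// Rejected if uncommented: level D with one replica conflicts with >=2.
// brake: #Workload & {asil: "D", replicas: 1}
```

The trade-off against separate definitions such as #ASIL_D is that the level becomes data that must itself be reviewed, since declaring a lower level silently relaxes the checks.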

4.2 Pattern 2: Automated HAZOP Guide Word Enforcement

HAZOP guide words systematically identify hazardous deviations. CUE constraints encode preventive patterns:

Guide Word	Hazard	CUE Prevention
NO	Required safety mechanism absent	Required fields with no default (`field!:`, or a non-concrete constraint checked with `cue vet -c`)
MORE	Excessive resource allocation causing starvation	Upper bounds
LESS	Insufficient redundancy	Minimum replica constraints
AS WELL AS	Unexpected capabilities from unknown fields	Closed schemas (definitions such as `#Name`, or `close()`, reject unknown fields; `...` opens a struct)
OTHER THAN	Invalid operational mode	Exhaustive disjunctions with no default

Example, preventing "NO [safety monitoring]":

import (
    "list"
)

#MonitoredWorkload: {
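    // A field with a constraint but no concrete value is incomplete, not _|_.
    // cue vet -c reports it as an error when the data omits it. In CUE v0.6
    // and later, writing the field as metricsPort!: marks it required.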
    metricsPort: int & >1024 & <65536
    healthEndpoint: string & =~"^/health"
    alertingRules: [...string] & list.MinItems(1)
}
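The remaining guide words can be combined in one schema. This is a sketch with hypothetical names (#Controller, mode, cpuMilli), and it uses the ! required-field marker, which assumes CUE v0.6 or later.

```cue
#Controller: {
	// OTHER THAN: only these modes exist, and there is no default.
	mode!: "normal" | "degraded" | "safe-stop"

	// LESS: at least two replicas.
	replicas!: int & >=2

	// MORE: CPU request capped at 4000 millicores.
	cpuMilli!: int & >0 & <=4000
}

// Satisfies every constraint.
good: #Controller & {mode: "normal", replicas: 2, cpuMilli: 500}

// Each line below would fail cue vet -c if uncommented:
// noMode:  #Controller & {replicas: 2, cpuMilli: 500}                       // NO: mode missing
// tooFew:  #Controller & {mode: "normal", replicas: 1, cpuMilli: 500}       // LESS
// tooMuch: #Controller & {mode: "normal", replicas: 2, cpuMilli: 9000}      // MORE
// badMode: #Controller & {mode: "turbo", replicas: 2, cpuMilli: 500}        // OTHER THAN
// extra:   #Controller & {mode: "normal", replicas: 2, cpuMilli: 500, privileged: true} // AS WELL AS
```

The AS WELL AS case needs no extra syntax: #Controller is a definition, so it is closed and the unknown privileged field is rejected.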

4.3 Pattern 3: Fault Tree Cut Set Prevention (Conceptual)

The following is architectural intent. Mapping it to executable Terraform HCL requires HCL-to-CUE conversion tooling.

Fault Tree Analysis identifies minimal cut sets, the smallest failure combinations that cause a system hazard. CUE can enforce structural diversity requirements that eliminate common-cause failures:

import (
    "list"
)

// Two or more channels required; all must have distinct implementations
#DiverseRedundancy: {
    channels: [...#Channel] & list.MinItems(2)
    // list.UniqueItems(xs) returns a bool, so unify it with true;
    // otherwise a false result would be stored without failing.
    _implCheck: true & list.UniqueItems([for c in channels {c.implementation}])
}

#Channel: {
    implementation: string  // e.g., "vendorA-v1.2", "vendorB-v3.4"
    nodeSelector: "topology.kubernetes.io/zone": string
}
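Applied to concrete data, the same idea can cover placement as well as implementation, since two diverse channels in one zone still share a failure cause. The sketch below is self-contained and uses hypothetical names (#Diverse, #Chan, zone, braking); it assumes errors in hidden fields are reported by your CUE version.

```cue
import "list"

#Chan: {
	implementation: string
	zone:           string
}

#Diverse: {
	channels: [...#Chan] & list.MinItems(2)

	// Unify with true so that a false result becomes a conflict.
	_implsUnique: true & list.UniqueItems([for c in channels {c.implementation}])
	_zonesUnique: true & list.UniqueItems([for c in channels {c.zone}])
}

// Accepted: two vendors, two zones.
braking: #Diverse & {
	channels: [
		{implementation: "vendorA-v1.2", zone: "zone-a"},
		{implementation: "vendorB-v3.4", zone: "zone-b"},
	]
}

// Rejected if uncommented: both channels run the same implementation.
// steering: #Diverse & {
// 	channels: [
// 		{implementation: "vendorA-v1.2", zone: "zone-a"},
// 		{implementation: "vendorA-v1.2", zone: "zone-b"},
// 	]
// }
```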

5. Implementation in Production Environments

5.1 Pipeline Integration Architecture

+-----------------+     +-------------+     +-----------------+
|  Developer IDE  |---->|  Pre-commit |---->|   PR Validation |
|  (CUE LSP)      |     |  (cue vet)  |     |  (full schemas) |
+-----------------+     +-------------+     +-----------------+
                                                    |
+-----------------+     +-------------+            |
|  Production     |<----|  Deployment |<-----------+
|  (monitored)    |     |  Gate       |
+-----------------+     +-------------+
        |
        v
+-----------------+
|  Incident:      |
|  CUE validation |
|  schema + vet   |
|  output stored  |
|  as audit trail |
|  for ISO 26262  |
|  Table 9 (SW    |
|  V&V evidence)  |
+-----------------+

ISO 26262 Part 6 tabulates software verification methods; the diagram above cites Table 9. CUE's cue vet output, combined with version-controlled schema files, can serve as documented evidence that structural safety requirements were checked at every deployment. Store schema files in git; store cue vet logs in your artifact repository. Cite the schema version and log hash in the safety case, together with the edition of the standard, because table numbering differs between editions.
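The audit record itself can be validated with CUE, so a malformed or incomplete evidence entry never reaches the safety case. Every field name below is hypothetical; the sketch only assumes that the schema version is a full git commit hash and that the log hash is SHA-256.

```cue
import "list"

// Hypothetical evidence record written by the deployment gate.
#VetEvidence: {
	// Git commit of the schema files used for this check.
	schemaCommit: =~"^[0-9a-f]{40}$"

	// SHA-256 of the stored cue vet log, hex encoded.
	vetLogSHA256: =~"^[0-9a-f]{64}$"

	// Outcome of the check; no default, so it must be stated.
	result: "pass" | "fail"

	// Safety requirement identifiers this check provides evidence for.
	requirementIDs: [...string] & list.MinItems(1)
}
```

Validating each record with cue vet -c against #VetEvidence rejects entries with a missing hash, a truncated commit, or no linked requirement.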

5.2 Constraint Library Organization

cue.mod/
+-- module.cue
+-- pkg/
    +-- chokmah.io/safety/v1/
    |   +-- asil.cue
    |   +-- sil.cue
    |   +-- redundancy.cue
    +-- chokmah.io/security/v1/
    |   +-- podsecurity.cue
    |   +-- network.cue
    |   +-- rbac.cue
    +-- chokmah.io/compliance/v1/
        +-- iso26262.cue
        +-- iec61508.cue
        +-- nist80053.cue
myapp/
+-- deployment.cue
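An application file would then import the shared constraints rather than copy them. This fragment is a sketch: it assumes the files under chokmah.io/safety/v1 declare package safety and that asil.cue exports a definition named #ASIL_D, neither of which the layout itself confirms.

```cue
package myapp

// The ":safety" suffix names the package explicitly, because the last
// element of the import path is "v1" rather than the package name.
import safety "chokmah.io/safety/v1:safety"

criticalService: safety.#ASIL_D & {
	replicas: 3
}
```

Versioning the import path (v1) lets a stricter v2 library be introduced per application instead of tightening every consumer at once.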

5.3 Measuring Safety Outcomes

Metric	Measurement	Target
Configuration defect escape rate	Defects found in production / total defects	<5% (for context, Puppet State of DevOps 2023 attributes over 50% of production outages to configuration errors)
Time to safety constraint violation detection	Commit to notification	<5 minutes (pre-commit ideal)
Safety case evidence automation	Validated constraints / total safety requirements	>80% for structural requirements
Constraint library coverage	Resources with CUE schemas / total resource types	100% for safety-critical types
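The escape-rate target can be checked mechanically once the two counts are recorded. The sketch below uses hypothetical names and hypothetical sample counts; it is an illustration of the arithmetic, not measured data.

```cue
#EscapeRate: {
	escaped: int & >=0             // configuration defects found in production
	total:   int & >0 & >=escaped  // all configuration defects found

	// The / operator yields a decimal fraction for integer operands.
	rate: escaped / total

	// Target from the table: below 5%.
	rate: <0.05
}

// Hypothetical quarter: 2 of 50 defects escaped, so rate is 0.04.
sample: #EscapeRate & {escaped: 2, total: 50}
```

With escaped set to 3 the rate becomes 0.06 and the unification fails, which turns a missed target into a visible pipeline error.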

6. Getting Started

6.1 Weeks 1-2: Pilot Selection and Team Enablement

Select pilot: Single application, Kubernetes-native, existing configuration error pain
Assemble team: 2-3 engineers with Go experience, safety engineering liaison
Training: CUE fundamentals (cuelang.org/docs/tutorials/), constraint-based thinking workshop
Initial schema: Port existing JSON Schema or derive from safety requirements

6.2 Weeks 3-4: Production Constraint Deployment

CI integration: cue vet in PR checks, blocking on violation
Developer experience: IDE plugins, pre-commit hooks
Constraint refinement: Based on initial feedback, false positive elimination
Documentation: Constraint purpose, safety rationale, example violations

6.3 Month 2 and Beyond: Scaling

Production rollout for a domain with diverse resource types typically takes 6+ months. Expect three phases: schema coverage expansion, organizational training, and safety case integration.

Expand coverage: Additional resource types, cross-resource constraints
Organizational rollout: Training additional teams, constraint library governance
Advanced patterns: Template validation, differential analysis
Safety case integration: Link schema commits to safety case work products

7. Resources

Resource	URL	Purpose
CUE Language	cuelang.org	Core language and tooling
CUE Spec (formal semantics)	cuelang.org/docs/references/spec/	Lattice semantics reference
CUE Kubernetes	github.com/cue-labs/cue-api-machinery	K8s-specific patterns
Puppet State of DevOps 2023	puppet.com/resources/state-of-devops-report	Configuration defect statistics