Cloud-native Collaborative CAD: Server-Authoritative Event Sourcing, Dual Representation, and Hybrid OT/CRDT

December 05, 2025 · 10 min read



Cloud-native architecture for real-time collaborative CAD: State model

Authoritative state and dual representation

The backbone of a modern collaborative CAD platform is a server-authoritative, event-sourced operation log that acts as the single source of truth. Rather than committing full document snapshots, every user action—dimension edits, feature reorders, mate changes—is persisted as an immutable operation with causality metadata. This furnishes an append-only provenance trail and makes branching, time-travel, and rollback natural. On top of the event stream, the system maintains a dual representation: a parametric history graph for intent and a geometric state for evaluation. The graph captures features, dependencies, constraints, and suppression states; the geometric state, evaluated by a B-Rep kernel, yields precise topology. To keep views fluid, a progressive tessellation representation supplies mesh LODs tuned to camera distance and importance, enabling coarse-to-fine rendering while fine geometry continues to resolve.

Hybrid concurrency, predictable merges

Different structures demand different concurrency controls. A hybrid OT/CRDT model reconciles edits predictably: Operational Transformation (OT) maintains the order of strictly linear constructs like the feature tree or mate solve order, while CRDT maps and sets support free-form, collaborative metadata such as comments, review tags, and sketch notes without locks. The event log stores high-level, intent-rich ops (edit dimension D12, suppress Pattern2, reorder after Fillet3) so merges remain meaningful across versions. Deterministic validators check pre- and post-conditions to catch invalid sequences early and offer repair suggestions. Together, these layers yield a state model that is resilient to concurrency, faithful to design intent, and optimized for both precise computation and interactive viewing.
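On the CRDT side, free-form metadata can ride on a last-writer-wins map: concurrent writes at the same timestamp break ties deterministically by actor id, so every replica converges without coordination. A hypothetical sketch:

```python
from typing import Any

class LWWMap:
    """Last-writer-wins map CRDT for free-form metadata (comments, tags).
    Ties on timestamp break deterministically by actor id, so replicas
    converge to the same value regardless of merge order."""
    def __init__(self) -> None:
        # key -> (timestamp, actor, value)
        self._entries: dict[str, tuple[int, str, Any]] = {}

    def set(self, key: str, value: Any, ts: int, actor: str) -> None:
        current = self._entries.get(key)
        if current is None or (ts, actor) > (current[0], current[1]):
            self._entries[key] = (ts, actor, value)

    def merge(self, other: "LWWMap") -> None:
        for key, (ts, actor, value) in other._entries.items():
            self.set(key, value, ts, actor)

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        return entry[2] if entry else None
```

Because `merge` is commutative, associative, and idempotent, replicas can exchange state in any order and still agree.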

Cloud-native architecture: Services and runtimes

Service topology and protocols

At runtime, the architecture separates concerns through microservices that speak gRPC/Protobuf internally and stream to browsers via QUIC/WebTransport. A realtime session service manages presence, live cursors, and voice/text channels; a modeling kernel service executes deterministic evaluations either on server HPC clusters or in client-side WASM sandboxes for latency-sensitive interactions. Storage/CDN layers deliver assets, progressive tessellations, and delta streams. AuthN/AuthZ is handled via OIDC, ACLs, and SCIM integration. For over-the-wire efficiency, a binary delta protocol (FlatBuffers or Cap’n Proto) encodes feature ops, tessellation tiles, and state diffs with versioned schemas. Priority lanes separate control signals from model ops and assets so that jitter in heavy downloads does not stall user interactions.

GPU acceleration and sandboxing

Graphics and compute are balanced across client and server. Client-side WebGPU accelerates display, selection, silhouette extraction, and GPU picking; server GPUs render offscreen thumbnails, preview frames for mobile thin-clients, and generate denoised ray-traced snapshots for document histories. Kernel processes are isolated via containers with cgroup quotas and, where kernels are shipped to clients, WASM+WASI with seccomp-like policies restrict syscalls and file access. This mix preserves intellectual property, improves fault isolation, and delivers interactive performance. The result is a scalable, protocol-efficient mesh of services that keep collaboration responsive, secure, and portable across devices.

Cloud-native architecture: Data flow and streaming

Delta-encoding and interest management

Large assemblies and feature-dense parts demand carefully curated data flow. Operations are sent as compact, delta-encoded feature ops that apply against the server’s event log and the client’s speculative cache. For rendering, assemblies stream progressively using spatial tiles and Level of Detail (LOD), with interest management derived from view frustums, distance, and user focus. Critical subassemblies—those being edited, mated, or dimensioned—receive higher priority, while peripheral content trickles in. This approach avoids monolithic downloads and gets designers productive within seconds, with fidelity increasing as bandwidth permits.
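One plausible scoring function for this interest management, assuming priority grows with screen coverage, shrinks with distance, and is boosted for actively edited content (the weights are illustrative):

```python
def tile_priority(distance: float, screen_coverage: float,
                  in_frustum: bool, focused: bool) -> float:
    """Higher score = stream sooner. Tiles outside the frustum trickle in
    last; subassemblies being edited, mated, or dimensioned get a boost."""
    if not in_frustum:
        return 0.0
    score = screen_coverage / (1.0 + distance)
    if focused:
        score *= 4.0                # critical subassembly multiplier (assumed)
    return score
```

The streamer then sorts pending tiles by this score each frame, so fidelity rises first where the user is actually working.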

Incremental rebuilds and stable topology

Geometry updates should be incremental. A per-feature invalidation graph determines the minimal subgraph to re-evaluate, supporting cached partial results and out-of-order compilation where dependencies allow. With incremental rebuilds, a suppressed pattern doesn’t trigger a full model recompute; only downstream dependents reflow. Crucially, topology is surfaced with persistent naming and stable IDs anchored to geometric signatures (faces, edges, vertices) so that annotations, PMI, and downstream merges remain attached even as features reorder. If a rename or topological change threatens stability, heuristics and signature matching preserve intent, and diffs remain tractable.
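The invalidation step reduces to a reachability query over the dependency graph. A sketch, where `deps` maps each feature to its direct dependents:

```python
def dirty_set(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Minimal re-evaluation set: the changed feature plus everything
    transitively downstream of it. Features not in this set keep their
    cached partial results."""
    out: set[str] = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        if node not in out:
            out.add(node)
            stack.extend(deps.get(node, ()))
    return out
```

Editing a mid-tree feature therefore reflows only its dependents; unrelated branches of the model never recompute.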

Cloud-native architecture: Deployment, tenancy, and security

Edge proximity and routing

Responsiveness hinges on proximity. Regional edge Points of Presence (PoPs) handle presence and transport termination, while kernel pools co-located with users crunch evaluations. Documents are sharded at the project or assembly level, and sticky routing pins participants to the same shard to minimize cross-region chatter. Session affinity aligns with compute caches, allowing hot features and tessellations to stay resident in memory across edits, cutting p95 latencies and mitigating cold-start penalties.
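Sticky routing can be implemented with rendezvous (highest-random-weight) hashing: every participant in a document deterministically lands on the same shard, and adding or removing a shard remaps only the documents it must. A sketch:

```python
import hashlib

def route_shard(document_id: str, shards: list[str]) -> str:
    """Rendezvous hashing: pick the shard with the highest hash weight for
    this document. All clients compute the same answer independently."""
    def weight(shard: str) -> int:
        digest = hashlib.sha256(f"{shard}:{document_id}".encode()).hexdigest()
        return int(digest, 16)
    return max(shards, key=weight)
```

The choice is stable across shard-list orderings, which is what makes the routing "sticky" without any central coordinator.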

Isolation, governance, and keys

In a multi-tenant environment, isolation is non-negotiable. Per-tenant namespaces and cgroup limits enforce CPU/GPU quotas; kernel sandboxes wall off workloads with WASM+WASI and seccomp. Audit trails and policy hooks insert governance into the operation stream, enabling mandatory code review equivalents for modeling: e.g., sign-off gates before release, with simulation artifacts attached. SSO/OIDC aligns identity with enterprise providers, while optional customer-managed keys protect sensitive IP, including tessellations-at-rest and rebuild caches. This composition delivers a security-first platform without sacrificing the immediacy required for real-time collaboration.

Latency budgets and interactivity strategies: Targets and previews

Concrete latency budgets

Interactive CAD has tight budgets. Aim for sketch edit round-trips under 80–120 ms p95 to keep constraint tweaks feeling immediate. Feature toggle feedback should land under 250 ms p95, especially for suppress/unsuppress and parameter sweeps; assembly navigation must hold a stable 60+ FPS with modest CPU overhead to leave headroom for picking and snapping. Heavy rebuilds inevitably exceed these targets, so they are relegated to background jobs with optimistic UI hints. These budgets inform architecture at every layer—from protocol priorities to GPU queue configurations—ensuring the system remains responsive where users feel it most.

Optimistic previews for perceived speed

When full computation is expensive, show the user what you can, now. Lightweight optimistic previews display the intended outcome using cached tessellations, parametric approximations, or reprojected sketches while the kernel finalizes exact topology. If the server outcome diverges, reconcile seamlessly by morphing vertex positions and updating persistent anchors, not by tearing down the whole view. Designers maintain flow, and the system earns trust by exhibiting smooth convergence, not jarring corrections.
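The morph-based reconciliation can be as simple as a per-frame blend of preview vertices toward the authoritative positions (a sketch; a real system would also re-anchor persistent IDs during the same pass):

```python
def reconcile(preview: list[float], authoritative: list[float],
              t: float) -> list[float]:
    """Blend preview vertex positions toward the server's exact result.
    Called each frame with t ramping from 0 to 1, the view converges
    smoothly instead of snapping when the kernel's answer arrives."""
    return [p + (a - p) * t for p, a in zip(preview, authoritative)]
```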

Latency strategies: Client-side techniques

Local prediction and incremental solving

Local prediction reduces perceived delay by applying speculative edits on the client and reconciling after server commit. Constraint solvers keep warm caches of Jacobians and sparsity patterns, enabling incremental constraint solving where only affected relationships are re-evaluated. Hot-path kernels—distance queries, collision checks, curve/plane intersections—compile to WASM and vectorize via WebAssembly SIMD or offload to WebGPU compute. These client accelerations handle the 10–50 ms interactions that define the feel of sketching and manipulating parts, deferring to server authority for determinism and persistence.

Speculation, LOD, and GPU picking

Speculative evaluation anticipates likely next operations (e.g., dragging a dimension further, adding fillets along a highlighted chain) and performs low-cost precomputations. Adaptive LOD selects tessellation density based on screen coverage and motion; fast-move/slow-stop heuristics reduce detail when panning or rotating, then refine when the camera settles. GPU-assisted picking uses hierarchical depth buffers and ID-encoded render passes to turn selections into O(1) reads, reserving CPU cycles for intent capture. The outcome is a UI that feels directly coupled to user intent, with computation amortized across frames and predicted steps.
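ID-encoded picking works by packing each entity id into the color channels of an offscreen render pass; reading back the one pixel under the cursor then decodes directly to the picked entity. A sketch of the encoding:

```python
def id_to_rgba(entity_id: int) -> tuple[int, int, int, int]:
    """Encode a 32-bit entity id into an RGBA8 pixel for an ID render pass."""
    return (entity_id & 0xFF, (entity_id >> 8) & 0xFF,
            (entity_id >> 16) & 0xFF, (entity_id >> 24) & 0xFF)

def rgba_to_id(px: tuple[int, int, int, int]) -> int:
    """Decode the pixel under the cursor back to an entity id: an O(1) pick
    with no CPU-side ray casting."""
    r, g, b, a = px
    return r | (g << 8) | (b << 16) | (a << 24)
```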

Latency strategies: Transport tuning, data locality, and prefetch

QUIC/WebTransport and binary deltas

On the wire, use QUIC/WebTransport to avoid head-of-line blocking and to multiplex model ops, control signals, and assets. Frame coalescing reduces per-message overhead; priority lanes put control and edit ops ahead of tessellation tiles. Binary op encoding with FlatBuffers/Cap’n Proto boosts parse speed and minimizes payload size. Layer in LZ4/Zstd for delta compression, and align server “ticks” to coalesce bursts of edits while keeping jitter low. For lossy networks, nack-and-repair strategies retransmit critical operation frames without stalling progressive mesh streaming.
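A toy delta frame illustrates why this pays off (a naive XOR-plus-deflate codec, standing in for a real delta codec layered with LZ4/Zstd): bytes unchanged since the previous state become zero runs that compress to almost nothing:

```python
import zlib

def encode_delta(prev: bytes, curr: bytes) -> bytes:
    """Naive delta frame: XOR against the previous state, then deflate.
    Unchanged regions become zeros, which compress extremely well."""
    n = max(len(prev), len(curr))
    p, c = prev.ljust(n, b"\0"), curr.ljust(n, b"\0")
    xored = bytes(a ^ b for a, b in zip(p, c))
    return zlib.compress(len(curr).to_bytes(4, "big") + xored)

def decode_delta(prev: bytes, frame: bytes) -> bytes:
    """Invert encode_delta given the same previous state."""
    raw = zlib.decompress(frame)
    n = int.from_bytes(raw[:4], "big")
    xored = raw[4:]
    p = prev.ljust(len(xored), b"\0")
    return bytes(a ^ b for a, b in zip(p, xored))[:n]
```

For a model state where a small edit touches a few dozen bytes, the delta frame is a tiny fraction of the raw payload.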

Edge caches and predictive prefetch

Data locality keeps FPS steady. Edge caches store frequently accessed subassemblies, tessellations, and material textures near users. Interest-based streaming sends high-priority tiles for the current frustum while queuing nearby tiles you’re likely to need after a camera move. Predictive prefetch looks at references, mates, and BOM nav to stage parts you’re about to open. For example:

  • When a designer hovers a mate connector, prefetch constraints and adjacent body tessellations.
  • When browsing material choices, prefetch texture thumbnails and BRDF parameters.
  • When expanding a subassembly, warm the solver with its DOF graph and cached solutions.

These tactics transform perceived speed by ensuring the next click’s data is already waiting.

Latency strategies: Resilience and offline

Local-first queues and CRDT safety

Connectivity is imperfect, but work shouldn’t be. Maintain a local-first operation queue so edits can proceed offline; for non-kernel data like comments and labels, use CRDT-safe structures to guarantee convergence upon reconnect. The client persists the queue durably and compacts redundant operations (e.g., squashing a rapid sequence of dimension drags into the final value) to reduce reconciliation cost. Vector clocks track causality so the server can merge confidently, preserving intent even after long partitions.
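Queue compaction might look like the following sketch, squashing consecutive edits to the same dimension into their final value (the op shapes are invented for the example):

```python
def compact(queue: list[dict]) -> list[dict]:
    """Squash redundant offline edits: a rapid run of drags on the same
    dimension collapses to its final value, cutting reconciliation cost
    when the client comes back online."""
    out: list[dict] = []
    for op in queue:
        if (out and op["kind"] == "edit_dimension"
                and out[-1]["kind"] == "edit_dimension"
                and out[-1]["dim"] == op["dim"]):
            out[-1] = op          # replace the superseded intermediate value
        else:
            out.append(op)
    return out
```

Only *consecutive* edits are squashed, so an intervening op (a suppress, a reorder) acts as a barrier and causal order is preserved.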

Degradation modes and seamless recovery

When the modeling kernel is unavailable, degrade to a view-only mode with selection, measurement, sectioning, and markup intact via read-through caches. Asset fetches continue through CDN backstops, and reconnection restores the full editing surface without discarding speculative visuals. The guiding principle is that failure should bend the experience, not break it; design sessions remain useful, with state continuity preserved from transient network to kernel pool failovers.

Conflict resolution and consistency: Operation semantics and strategies

Intent-level operations and validators

Merging is easiest when operations express intent rather than raw geometric diffs. By modeling edits at the semantic level—“edit dimension D12 to 24.5,” “reorder feature after Fillet3,” “suppress Pattern2”—we anchor merges to stable references and expected outcomes. Type-specific validators assert pre-conditions (e.g., the named feature exists and is unsuppressed) and post-conditions (e.g., regenerated topology maintains persistent IDs). If checks fail, the system proposes remedies such as retargeting a constraint or finding a nearby insertion point that avoids cycles.
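A validator for the dimension-edit op could assert those pre-conditions along these lines (the model and op shapes are hypothetical):

```python
def validate_edit_dimension(model: dict, op: dict) -> list[str]:
    """Pre-condition checks for an intent-level 'edit dimension' op.
    Returns a list of problems; an empty list means the op may apply.
    Repair suggestions can be built on the same checks."""
    problems: list[str] = []
    feature = model.get("features", {}).get(op["feature"])
    if feature is None:
        problems.append(f"feature {op['feature']!r} does not exist")
    elif feature.get("suppressed"):
        problems.append(f"feature {op['feature']!r} is suppressed")
    if op.get("value", 0) <= 0:
        problems.append("dimension value must be positive")
    return problems
```

Post-conditions (e.g., persistent IDs surviving regeneration) would run analogously after the kernel evaluates the op.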

Optimism, OT, CRDTs, and causality

We rely on optimistic concurrency for most edits, employing short-lived, per-scope locks only for sensitive phases like exclusive sketch solves. Ordered structures—feature trees and mate sequences—apply OT to preserve relative user intent. CRDT maps/sets manage parameters, annotations, and BOM notes without coordination. Causality tracking with vector clocks or Lamport timestamps and deterministic tie-breakers settles true concurrency (e.g., two users editing the same parameter) in a predictable way. The payoff is deterministic merges that are explainable and reversible, preserving flow without risking corruption.
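Causality detection reduces to a pairwise vector-clock comparison; only the "concurrent" case needs a deterministic tie-breaker (for instance, lowest actor id wins). A sketch:

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Vector-clock comparison: 'before', 'after', 'equal', or 'concurrent'.
    Missing actors count as 0. Truly concurrent edits fall through to a
    deterministic tie-breaker so every replica picks the same winner."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```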

Conflict resolution: Domain-aware handling of geometry and assemblies

Constraints, reorders, and assemblies

Domain knowledge elevates merge quality. Constraints receive soft/hard rankings so the solver pursues minimal-perturbation solutions; when conflicts arise, a dependency graph explains which constraints forced motion and which ones were relaxed. For feature reorders, cycle detection prevents regressions, and the system proposes safe insertion points with small, visual impact previews. In assemblies, mates are merged via stable references, and competing degree-of-freedom edits are resolved by energy-based or priority policies. This contextual intelligence avoids brute-force overwrites and respects what matters in a mechanical model: continuity of design intent and stability of motion.
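Cycle detection for a reorder is a reachability test: moving a feature after one of its own (transitive) dependents would evaluate the dependent before its input. A sketch, with `deps` mapping each feature to its direct prerequisites:

```python
def reorder_creates_cycle(deps: dict[str, set[str]],
                          feature: str, after: str) -> bool:
    """True if placing `feature` after `after` is unsafe, i.e. `after`
    (transitively) depends on `feature`."""
    def depends_on(node: str, target: str, seen: set[str]) -> bool:
        if node == target:
            return True
        if node in seen:
            return False
        seen.add(node)
        return any(depends_on(p, target, seen) for p in deps.get(node, ()))
    return depends_on(after, feature, set())
```

When the check fails, the same traversal can enumerate nearby positions that do not create a cycle, which is exactly what the insertion-point suggestions need.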

Topology robustness and fallback heuristics

Even with prudent modeling, topological changes can disrupt persistent names. We guard against this with geometric signatures—face/edge invariants, curvature patterns, and adjacency graphs—that stabilize IDs across edits. Where signatures are insufficient, heuristic matching favors proximity and feature lineage, and the UI surfaces candidate retargets rather than leaving annotations dangling. This combination maintains persistent naming so measurements, comments, and PMI survive iterations and merges.

Collaboration UX and governance

Explainable collaboration features

Collaboration is most productive when users can see and understand change. Inline diff of feature operations depicts what shifted and why, while time-travel scrubs the event log to highlight intermediate states. A branch/merge model enables exploratory work without fear; review gates bundle simulations and checks with required approvals. Anchored comments and tasks bind to stable IDs on faces and edges so discussions track the geometry, not the viewport. Presence indicators and live cursors communicate intent before edits collide, reducing reconciliation overhead by aligning human behavior.

Auditability and compliance

Enterprises need confidence and traceability. An immutable audit log chronicles every operation, policy-driven approvals enforce process, and rollback is always safe because state is event-sourced. Provenance exports summarize who changed what and when, including hash-linked assets, so compliance teams can verify release quality. By fusing these capabilities, the platform blends the best of modern code review culture with mechanical design workflows.

Conclusion: Core principles

Architectural north stars

Three principles define high-performance collaborative CAD in the cloud. First, a server-authoritative, event-sourced backbone guarantees integrity and traceable provenance while local prediction ensures the UI remains snappy. Second, intent-level operations coupled with a hybrid OT/CRDT approach make merges deterministic and human-meaningful, accommodating both ordered structures and free-form metadata. Third, clear latency budgets enforced through incremental evaluation and progressive streaming keep interactions fluid despite large models and global teams. These tenets align technology choices—from kernels and transports to caches and GPUs—around user-perceived performance and trustworthy outcomes.

Putting it into practice

In practice, this means instrumenting p95 targets, treating rebuild graphs as first-class artifacts, and resisting the temptation to encode only geometry when intent is the actual currency. It means selecting protocols and binary formats that streamline data movement without boxing you into brittle schemas. It means assuming failure and designing optimistic previews, repair paths, and graceful degradation as foundational features, not afterthoughts. With these principles, the platform naturally evolves toward systems that are fast, understandable, and resilient.

Conclusion: Key trade-offs

Performance, determinism, and isolation

Several vectors dominate the trade-offs. The choice between client WASM and server kernels balances latency against IP control and resource governance. Client-side kernels unlock instant feedback under poor networks, but server execution centralizes optimization, licensing, and consistency. Strong vs. eventual consistency pits determinism against agility; strict ordering reduces surprises yet slows collaboration for loosely coupled edits. Multi-tenant efficiency collides with isolation: packing kernels achieves density, but single-tenant sandboxes reduce blast radius at higher cost. Finally, rich collaboration surface areas must respect privacy and security constraints; presence and voice can be transformative, but they demand fine-grained controls and auditability.

Guiding heuristics for decisions

Draw boundaries with pragmatism:

  • Place hot-path interactions near the user; place definitive, auditable computation where you can guarantee determinism.
  • Adopt strong consistency for linearizable structures; use eventual models for annotations and metadata.
  • Tune resource isolation to tenant profile; offer premium isolation tiers for sensitive IP and bursty workloads.
  • Expose clear user controls for visibility and data retention; let policy define defaults.
These heuristics keep the system adaptable as models grow and teams diversify.

Conclusion: What’s next

Confidential compute and ML assist

The next wave pushes confidentiality and intelligence. Trusted Execution Environments (TEEs) pave the way for end-to-end encrypted modeling sessions where kernels run in attested enclaves; telemetry stays useful through privacy-preserving aggregation. ML models assist merges by ranking retarget options, predicting safe feature insertion points, and proposing constraint repairs that minimize solve energy. These helpers don’t replace expertise; they amplify it by surfacing high-quality choices at the moment of need.

Open protocols and shared naming

Finally, collaboration must cross vendor boundaries. Open, interoperable protocols for session presence, intent ops, and asset streaming will enable “bring-your-own-tool” workflows. Standardized persistent naming schemes—built on geometric signatures and lineage—can anchor annotations and manufacturing data across toolchains. With these advancements, cloud-native CAD evolves from single-tenant apps into a federated, high-trust workspace that preserves design intent from concept through manufacturing.



