
Blog posts tagged "AI Factory"


Benjamin Ryzman
9 June 2026

What is RDMA over Converged Ethernet (RoCE)?

AI Networking

Previous articles walked through RDMA (Remote Direct Memory Access) as a programming model and InfiniBand as the fabric that was built around it. Both led to the same conclusion, even if it was never stated outright: moving data, not compute, becomes the bottleneck once systems scale. So what happens when you want RDMA, but you’re ...


Benjamin Ryzman
2 June 2026

What is InfiniBand?

AI Article

When distributed workloads stall because nodes cannot exchange small messages quickly and consistently, the network is the limiting factor. How do you solve that problem? InfiniBand offers one solution. InfiniBand is an interconnect, meaning the end-to-end communication system that links compute, storage, and accelerator nodes. It is impl ...


David Beamonte
11 March 2026

The bare metal problem in AI Factories

MAAS

As AI platforms grow into large-scale “AI Factories,” the real bottleneck shifts from model design to operational complexity. With expensive GPU accelerators, hardware failures and inconsistent configurations lead directly to lost throughput and reduced return on investment. While Kubernetes orchestrates workloads, it cannot fix broken ph ...