AI HPC Datacenters — Infra & Orchestration
The layer that turns thousands of GPUs into one reliable supercomputer.
Design and deployment of Kubernetes + Slurm clusters up to 4,000 GPUs on bare-metal and VMs — H100/H200 nodes over 400 Gbps InfiniBand, NVLink/NVSwitch fabric automation, and validation with NCCL, HPL and ClusterKit. Track-specific, fully automated provisioning of orchestrators, networking and hardening.