High-Performance Networking Engineer - Supercomputing
Company: xAI
Location: Palo Alto
Posted on: February 17, 2026
Job Description:
About xAI
xAI's mission is to create AI systems that can accurately understand
the universe and aid humanity in its pursuit of knowledge. Our team
is small, highly motivated, and focused on engineering excellence.
This organization is for individuals who appreciate challenging
themselves and thrive on curiosity. We operate with a flat
organizational structure. All employees are expected to be hands-on
and to contribute directly to the company's mission. Leadership is
given to those who show initiative and consistently deliver
excellence. Work ethic and strong prioritization skills are
important. All employees are expected to have strong communication
skills. They should be able to concisely and accurately share
knowledge with their teammates.

About the Role
As a High-Performance Networking Engineer on xAI's Supercomputing
team, you will design and optimize low-latency, high-bandwidth
networking solutions using NVIDIA's RDMA-capable technologies to
support some of the world's largest GPU supercomputing clusters.
These clusters drive AI training and inference workloads, demanding
cutting-edge performance and scalability.

Focus
Develop and tune RDMA-based communication systems leveraging NVIDIA GPUs and Mellanox NICs (InfiniBand, RoCE) for ultra-fast data transfer between nodes.
Implement and optimize GPUDirect RDMA to enable direct memory access between GPUs and network interfaces, minimizing CPU overhead.
Integrate RDMA solutions with Kubernetes-based workloads, ensuring seamless operation across distributed compute and storage systems.
Collaborate with AI researchers and infrastructure teams to accelerate data pipelines and collective communications using NCCL and MPI.
Troubleshoot and resolve performance bottlenecks in high-throughput, low-latency networking environments.

Ideal Experience
Hands-on experience with NVIDIA RDMA technologies (e.g., GPUDirect RDMA, RoCE, InfiniBand) in HPC or AI supercomputing environments.
Proficiency in programming with Rust, C, or C++ for low-level networking and system optimization.
Familiarity with NVIDIA's networking stack, including Mellanox drivers, libraries (e.g., libibverbs), and tools (e.g., NVPeerMemory).
Experience optimizing distributed systems with MPI, NCCL, or similar frameworks for GPU-accelerated workloads.
Knowledge of Kubernetes networking and integrating RDMA into containerized environments.
Bonus: Background in AI/ML training workflows and their networking demands (e.g., large-scale parameter synchronization).

Tech Stack
NVIDIA GPUs and Mellanox networking (InfiniBand, RoCE)
RDMA protocols (e.g., GPUDirect RDMA, RoCEv2)
Kubernetes
Rust and C/C++
MPI (Message Passing Interface) and NCCL (NVIDIA Collective Communications Library)

Annual Salary Range
$180,000 - $440,000 USD

Benefits
Base salary is just one part of our total rewards package at xAI,
which also includes equity; comprehensive medical, vision, and
dental coverage; access to a 401(k) retirement plan; short- and
long-term disability insurance; life insurance; and various other
discounts and perks.

xAI is an equal opportunity employer. For details on data
processing, view our Recruitment Privacy Notice.
Keywords: xAI, Modesto, High-Performance Networking Engineer - Supercomputing, IT / Software / Systems, Palo Alto, California