Network Engineer

Posted 2026-06-26
Remote, USA | Full-time | Immediate start

About the role
Own the network powering large-scale AI training and inference. Your expertise spans hardware and software, including rack design, fabric topology, thermal management, collective communications, telemetry, and performance across thousands of compute nodes.

What you'll do
Design rack layouts and cluster topologies

Inform thermal and power planning with facilities teams

Architect the network fabric

Tune collective communications for training and inference workloads

Build telemetry and observability

Drive performance, benchmarking, and incident response

What you'll need
BS in CS, EE, or related field, or equivalent production experience

Rack-to-cluster deployment experience: cabling, power, and thermal management

Expertise in network fabric architecture across RoCEv2 and InfiniBand: routing, load balancing, topology, queueing, congestion control, and QoS

Experience tuning collective communications: NCCL, RCCL, or vendor-specific libraries

Experience with network telemetry: NIC counters, switch streaming (gNMI, sFlow, INT), and collective-level metrics

Experience operating large-scale GPU or ASIC clusters in production

What we offer
Top-tier compensation structured to recognize and retain the best talent

Meaningful equity

Comprehensive medical, dental, vision, life, and disability insurance

Parental leave for all new parents, including adoptive and surrogate journeys

Flexible PTO

Paid holidays

Relocation support

Equal Employment Opportunity
We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.
