Network Reliability Engineer

Posted 2026-06-26

Remote, USA | Full-time | Immediate Start

#HPC #AI #GPU #CLUSTERS

YOUR DAILY ROUTINE

Build a large AI infrastructure with monitoring, diagnosis, and remediation of production incidents

Troubleshoot high-impact production issues in collaboration with other engineering teams

Participate in an on-call rotation to handle incidents and ensure service continuity

Implement and maintain observability solutions to monitor AI infrastructure and application health

Contribute to AI infrastructure lifecycle management across different environments and countries

Promote and apply best practices in terms of stability, resiliency, scalability, and security

Maintain clear technical documentation for tools and procedures

Contribute to system and tool evolution based on production feedback

Collaborate closely with development teams to ensure infrastructure readiness

Participate in team rituals and knowledge-sharing initiatives

ABOUT YOU

🎯 SOFT SKILLS:

Proactive and solution-oriented mindset

Passion for automation and continuous improvement

Strong collaboration and communication skills

Ability to work independently and in a team

Willingness to mentor and share knowledge

💻 HARD SKILLS:

Experience with Go or Python

Strong scripting skills (Bash, Python)

Hands-on experience with Linux systems (Ubuntu/Debian)

Hands-on experience with GPU and HPC infrastructure (preferred)

Knowledge of networking (TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)

Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)

Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)

Experience managing relational databases (MariaDB)

Understanding of CI/CD pipelines (GitLab)

Comfortable with English (written and spoken)

200 zł - 250 zł an hour
