Intetics Inc., a global technology company providing custom software application development, distributed professional teams, software product quality assessment, and “all-things-digital” solutions, is seeking a highly skilled and experienced Senior DevOps Engineer to join our dynamic team on a full-time basis.
About the Project
A fast-growing tech company is building an infrastructure layer for modern AI workloads — a globally distributed platform that provides scalable, cost-efficient, and reliable access to GPU computing resources.
The platform enables customers to run production-level inference workloads across a diverse network of providers, offering the flexibility, performance, and resilience required for real-world AI applications.
Since its launch, the company has demonstrated strong traction, securing a significant Series A investment and achieving multi-million ARR within its first year of operation. As both customer demand and platform scale continue to expand, the team is actively growing its infrastructure capabilities to support the next stage of development.
About the Role
We are looking for a strong SRE / DevOps / Infrastructure Engineer to help scale and operate a distributed AI-focused infrastructure platform.
The system combines a cloud-based control layer (running on AWS, including EKS and managed MySQL) with a large fleet of GPU-powered nodes distributed across multiple external providers. These components are connected via a custom networking layer to ensure high availability and performance for production workloads.
Workloads are orchestrated with Kubernetes, while observability is built around Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry, covering metrics, logging, and tracing across the platform.
While the control layer is relatively lightweight and cloud-native, the GPU infrastructure introduces additional complexity. It spans different providers and environments and often resembles distributed on-premises setups rather than standard cloud infrastructure, so it requires a deeper understanding of networking, reliability, and systems behavior at scale.
This is a hands-on role focused on solving real infrastructure challenges across Kubernetes, networking, observability, and production operations.
You will join a small, high-impact infrastructure team (currently a couple of engineers) that is actively growing as the platform and customer base continue to expand. The goal is to strengthen the core infrastructure early and support further scaling.
What you’ll do
Requirements
Nice to have