Alphax

Senior DevOps / Infrastructure Engineer

Salary: $150,000 - $220,000 (taken from the job description or estimated from typical salaries for similar roles in this industry)

Job Description

About the Company

We are building distributed observability and multi-cloud intelligence software for modern AI and data-intensive systems. Our platform enables engineering teams to monitor, debug, and optimize workloads operating across regions, cloud providers, and heterogeneous compute environments.

Reliability, performance, and operational clarity are foundational to our product. Infrastructure is not a support layer; it is core to what we deliver.

We are a small, senior engineering team operating in a high-ownership environment where architectural decisions have direct product impact.

The Role

We are hiring a Senior DevOps / Infrastructure Engineer to design, scale, and operate the systems that power our observability platform.

You will own multi-region cloud deployments, high-ingest telemetry pipelines, system reliability, and production observability. This role suits engineers who are comfortable working deep in distributed systems and who want to influence long-term infrastructure strategy.

What You’ll Own

  • Multi-region, multi-cloud infrastructure supporting real-time observability
  • High-throughput ingestion pipelines and event-driven architectures
  • Autoscaling, failover, and fault-tolerant system design
  • CI/CD pipelines and deployment automation
  • Production SLOs, incident response, and reliability engineering
  • Metrics, logs, tracing, and alerting systems
  • Secure, scalable API and service infrastructure
  • Infrastructure-as-code and environment lifecycle management

Key Responsibilities

  • Architect and operate multi-region deployments on AWS, GCP, or Azure
  • Build and maintain high-throughput telemetry ingestion pipelines
  • Design autoscaling and failover strategies for mission-critical services
  • Own observability systems including Prometheus, Grafana, and distributed tracing
  • Improve MTTR and operational readiness processes
  • Manage CI/CD pipelines, GitOps workflows, and automated deployments
  • Collaborate with backend teams on API performance and infrastructure reliability
  • Harden infrastructure for security, compliance, and tenant isolation
  • Drive the long-term infrastructure roadmap and architectural direction

Requirements

Required Qualifications

  • Deep experience with Kubernetes, Docker, and container orchestration
  • Strong background in distributed systems and multi-region architectures
  • Experience with high-ingest, streaming, or event-driven systems
  • Hands-on experience with Prometheus, Grafana, and tracing/alerting frameworks
  • Proficiency with Terraform or similar infrastructure-as-code tools
  • Experience building and maintaining CI/CD pipelines
  • Strong understanding of AWS, GCP, or Azure
  • Python or Go scripting for automation and tooling
  • Experience operating high-availability, production-critical systems

Preferred Experience

  • Cloudflare (DNS, CDN, WAF, SSL)
  • Helm, Kustomize, or similar Kubernetes tooling
  • Experience with time-series databases, vector databases, or high-throughput storage systems
  • Background in SRE, platform engineering, or observability tooling
  • Experience supporting AI/ML workloads or GPU-based systems
  • Familiarity with OpenTelemetry, Jaeger, or similar distributed tracing frameworks

Benefits

What We Offer

  • Significant ownership over core infrastructure decisions
  • A senior engineering team with low overhead and direct collaboration
  • Fast-paced environment with measurable impact
  • Competitive compensation and meaningful equity
  • Opportunity to architect infrastructure for a category-defining observability platform