We are looking for a Senior Kubernetes Administrator to architect, operate, and continuously improve mission-critical Kubernetes platforms with strict availability and reliability requirements (99.95%+ uptime SLAs).
This role owns the end-to-end Kubernetes platform, from control-plane architecture and multi-region resilience to security, automation, observability, and operational excellence.
Architect, operate, and scale highly available Kubernetes clusters meeting 99.95%+ uptime SLAs.
Design and manage multi-region, multi-zone Kubernetes clusters, including:
zero-downtime upgrades
blue/green control-plane rollouts
surge-based node rotations
Deeply understand and optimize Kubernetes internals, including:
API Server performance design and tuning based on NFRs
etcd clustering, backup, compaction, and disaster recovery
scheduler optimization and custom scheduling strategies
Lead advanced troubleshooting of:
control-plane failures
node instability
networking and storage issues
complex, multi-layer platform incidents
Act as a senior escalation point for critical production issues.
Operate AKS or native Kubernetes platforms at enterprise scale.
(Nice to have) Experience with bare-metal Kubernetes, including:
Cilium Load Balancer
BGP-based networking
node pool design
Own stateful Kubernetes workloads, including:
StatefulSets
highly available CSI backends
storage performance tuning and I/O optimization
Lead Infrastructure as Code (IaC) and GitOps strategies using:
Terraform
Helm
Kustomize
Implement declarative cluster configuration with Argo CD.
Automate cluster creation, policy enforcement, compliance scanning, and drift remediation.
Build and maintain enterprise-grade CI/CD pipelines with advanced security controls using:
Azure DevOps
GitHub Actions
Integrate CI/CD pipelines with Kubernetes platforms for secure, reliable delivery.
Implement and enforce enterprise-grade Kubernetes security frameworks, including:
Zero Trust architectures
workload identity
mutual TLS
signed container images
Enforce Pod Security Standards (PSS) and NetworkPolicies, including eBPF-based policy tracing.
Integrate Kubernetes clusters with corporate identity systems (OIDC, Azure AD, IAM).
Manage secrets lifecycle using:
HashiCorp Vault
Azure Key Vault
SOPS
Implement admission control using:
OPA Gatekeeper
Kyverno
Own the full monitoring and observability stack, including:
Prometheus
Grafana
Design and operate high-throughput logging pipelines using Loki.
Implement distributed tracing and root-cause analysis with OpenTelemetry.
Define, measure, and report:
Service Level Indicators (SLIs)
Service Level Objectives (SLOs)
error budgets
reliability dashboards
Senior-level experience operating production Kubernetes platforms at enterprise scale
Deep expertise in Kubernetes internals and control-plane architecture
Strong hands-on experience with AKS or native Kubernetes
Advanced knowledge of Kubernetes networking, storage, and security
Proven experience with IaC, GitOps, and CI/CD automation
Strong troubleshooting skills in complex, distributed systems