Roadie

Lead Site Reliability Engineer

Job Description

Roadie, a UPS Company, is a logistics management and crowdsourced delivery platform. Founded in 2014, Roadie offers businesses fast, flexible and asset-light logistics solutions for last-mile delivery. Roadie enables local delivery to more than 95% of U.S. households by providing access to more than 200,000 independent drivers nationwide – allowing businesses to offer their customers delivery optionality for almost any industry, from airlines to artisans.

Roadie is seeking a Lead Site Reliability Engineer to join our growing Technical Operations Team. We are looking for a leader with a proven track record of managing high-performing SRE teams in high-availability, mission-critical environments. The ideal candidate is a strategic problem solver with deep expertise in site reliability best practices, DevOps principles, AWS and GCP, Kubernetes, and automation. You will play a key role in driving reliability, scalability, and operational excellence across our platform.

What You'll Do

  • Lead and mentor teams focused on enhancing platform reliability, optimizing uptime, and improving software delivery, observability, and infrastructure operations
  • Architect, maintain, and optimize production and non-production Kubernetes clusters (EKS), as well as Elasticsearch (ES), MSK, RDS, and ElastiCache (Redis) clusters
  • Design, deploy, and manage monitoring and logging solutions using Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and New Relic
  • Strategize and collaborate with cross-functional teams to proactively identify bottlenecks, optimize resource utilization, and prevent system failures
  • Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to drive reliability improvements
  • Automate and streamline operational tasks, reducing toil and increasing efficiency across engineering teams
  • Plan and forecast service capacity and demand, optimize costs, and fine-tune system performance
  • Lead troubleshooting initiatives and post-mortems, and resolve production and non-production incidents to ensure high availability and performance
  • Participate in and manage a 24/7 on-call rotation, responding to incidents and driving post-mortem improvements
  • Occasionally work non-standard hours to facilitate production upgrades or deployments

Technology We're Using Now

  • Python, Ruby on Rails, Golang
  • React/Redux, Objective-C and Swift, Android
  • Postgres, Redshift, Redis, Kafka
  • AWS/GCP
  • Docker/Kubernetes
  • OpenTelemetry/Prometheus/Thanos/Loki/Grafana/New Relic/Sentry
  • Git/CircleCI
  • ArgoCD

What You Bring

  • 6+ years in various SRE roles
  • 6+ years in various DevOps/Systems Engineering roles
  • 3+ years leading and managing SRE teams
  • 6+ years of experience building and managing production Kubernetes infrastructure
  • 7+ years of experience with popular scripting languages (Python, Ruby, Bash, etc.)
  • Experience with Infrastructure as Code tools such as Terraform or Crossplane
  • Experience with CI/CD tools (CircleCI, etc.)
  • Experience with GitOps tools (ArgoCD)
  • Experience using a broad range of AWS technologies (RDS, Elasticsearch, VPC, EKS, S3, CloudFront, MSK, ElastiCache, CloudWatch, etc.)
  • Experience developing and maintaining YAML templating systems (Helm charts, Kustomize, etc.)
  • Must be able to work independently, be self-motivated and handle multiple priorities
  • Comfortable working in a fast-paced agile environment

Finally, a willingness to admit what you don’t know and to quickly learn what you need to.

Why Roadie?

  • Competitive compensation packages
  • 100% covered health insurance premiums for yourself
  • 401k with company match
  • Tuition and student loan repayment assistance (that’s right - Roadie will contribute directly to your existing student loans!)
  • Flexible work schedule with unlimited PTO
  • Monthly 3-day weekends
  • Monthly WFH stipend
  • Paid sabbatical leave: tenured team members are given time to rest, relax, and explore
  • The technology you need to get the job done

This role is not eligible for visa sponsorship. Applicants must be authorized to work for any employer in the U.S.