Job Description

About the Role:

As a Site Reliability Engineer on the Embedded team, you will play a crucial role in helping us design, scale, and manage our growing AWS-backed services for millions of connected IoT devices. Your expertise in cloud-native, highly elastic service design and scaling practices will ensure that our growing services, as well as new products and features, operate smoothly and without manual intervention to meet Motive’s 99.99% availability SLOs. By leveraging and advancing our robust, fully codified infrastructure and Kubernetes environment, implementing AWS components thoughtfully, and troubleshooting advanced issues alongside other teams, you can be a large part of Motive’s growth to the next million devices and beyond.

What You’ll Do:

Collaborate with other engineering and product teams to design and build the infrastructure and services required to deliver new features to customers in a cloud-native and event-driven fashion.
Leverage and advance our IaC (Terraform) and CM (Helm) code and strategies for advanced scaling and self-service usage by engineering teams.
Identify and remove bottlenecks in production systems across AWS services and our Kubernetes platform.
Ensure 99.99% customer-facing uptime.
Continuously improve the monitoring and alerting capabilities of our platform, enabling us to be proactive instead of reactive.
Be a beacon of information for engineering on scaling, architecture, and observability through guides, codification, brown bags, and Tech Talks.

What We’re Looking For:

4+ years of professional SRE/DevOps experience and a demonstrated ability to work on high-volume production systems.
Experience with HPA and other Kubernetes scaling mechanisms.
Advanced knowledge of AWS services and technologies (ALB/ELB, IAM permissions, DynamoDB, SNS, EKS/Fargate, etc.).
Experience with infrastructure as code and configuration management (Terraform and Helm charts especially) to design and provision new services.
Knowledge of Python, Bash, or other scripting languages. Knowledge of Ruby or Golang is a plus.
High level of ownership and drive to work with others and see improvements through to production.

Site Reliability Engineer, Embedded

USA Only

SRE Engineer

23 days ago
