You'll be part of the SRE team, reporting to the SRE Manager and working alongside our other Site Reliability Engineers. You'll work day to day with the Software Engineering team as well as our IT team. You'll be expected to identify, improve, automate, and innovate on the challenges that come with running a large-scale production system.
Key Responsibilities
·Managing and maintaining Kubernetes clusters in support of development teams deploying and promoting microservices across environments.
·Maintaining a stable, efficient build pipeline that minimizes the time required to bring code enhancements and new functionality to our customers.
·Building and provisioning resilient processes and infrastructure in Azure.
·Working to improve operations and processes to enable developers to fully own and successfully operate the services they are introducing.
·Automating all the things to ensure stability, efficiency, and consistent reproducibility across our operating environments.
Qualifications and Skills
·Proficient with Kubernetes and Docker.
·Familiar with Puppet, Kafka, Redis, and MySQL.
·Proficient in scripting with Python and Bash.
·Proficient working in a Linux server environment.