Senior Site Reliability Engineer

  • ZayZoon

Job Description

Who We Are

Our goal is to save ten million hard-working employees ten billion dollars. We are a values-driven, well-funded, and fast-growing financial technology and HR company. We want to empower small and midsize businesses with financial tools that make them the place where people want to work.

We’ve created a financial empowerment platform that helps small but mighty HR teams make a big impact on employee financial wellness. ZayZoon is quickly becoming the employee financial wellness super-app that employees can’t live without and that employers are clamoring to offer to attract and retain talent.

We are growing fast and have been recognized for rapid growth in Deloitte’s 2023 Technology Fast 500 and Canadian Technology Fast 50 programs!

About the Role

We are looking for a Senior Site Reliability Engineer to take ZayZoon’s cloud infrastructure to the next level with complex AWS builds, infrastructure-as-code, and observability, logging, and APM solutions. You'll work in an embedded reliability team, alongside app and data engineers, to monitor, benchmark, and scale ZayZoon’s products. You will work with first-class technologies and staff to leverage everything AWS has to offer, and you will build a bridge between our bare-metal infrastructure and our Ruby on Rails production app. Predictability, reliability, and scalability are your three favourite words.

  • Develop and maintain infrastructure-as-code (IaC) CloudFormation templates, emphasizing serverless and managed container resources (ECS, Fargate, Lambda)
  • Instrument and analyze daily metrics for both infrastructure performance and our Ruby on Rails applications, using AWS tooling (Athena, CloudTrail, etc.) and third-party observability platforms (Datadog, OTel)
  • Manage deployment pipelines, including blue/green deployments and intelligent auto-scaling
  • Maintain and stay ahead of resource dependencies, particularly databases (RDS, Redshift), including updates, playbooks, and downtime planning
  • Project costs and implement AWS cost-saving programs, including Reserved Instances
  • Work alongside our risk and security teams to ensure ongoing SOC 2 and cybersecurity compliance
  • Collaborate extensively with app developers on shared metrics, database performance, and load testing
  • Collaborate extensively with data engineers to support data warehouse development, ELT, and ETL
  • Participate in our agile development process: sprint planning, story grooming, and stand-ups
  • Adhere to our SDLC and our secure coding and environment practices

Requirements

  • 5+ years of infrastructure experience
  • 2+ years of AWS experience, including an AWS certification and deployment of production applications
  • Proficiency with IaC, specifically CloudFormation
  • Experience with containerization (Docker, ECS, ECR)
  • Experience analyzing and acting on performance issues using observability platforms (Datadog, New Relic, OTel)
  • Ability to build quickly when we need to experiment, and to build cleanly when an MVP becomes core functionality
  • Strong SQL and data analysis skills, and an eagerness to dig into data as part of problem-solving

Benefits

Candidates must be located in Canada to be considered.

We are organized as a remote team; as such, we are looking for candidates who can work effectively remotely. You must have access to a secure, high-speed internet connection and a secure workspace to protect private information. This role is available on a permanently remote basis.

Please be aware that, as part of our final hiring process, we will conduct reference calls with previous managers and possibly other individuals. Additionally, due to the nature of our business, a criminal record check and a basic security clearance will be required.

We wish to thank all qualified applicants for their interest in joining our team! 
