Our system handles over 3 million requests per second and billions of data points in the real-time advertising ecosystem. We're seeking a DevOps Engineer to join our growing team.
About The Team:
The Engineering Operations Services team is responsible for hosting and maintaining critical applications for Engineering. Members of the Services team are responsible for understanding how these applications function and are consumed by the organization. They have the knowledge to help drive adoption and set standards for application usage, along with experience in observability tools, Prometheus, logging, application performance metrics, workflow tools, and the infrastructure that enables Engineering's capabilities.
Watch our talk at Amazon Tech Talks: https://www.youtube.com/watch?v=lRqu-a4gPuU
StackAdapt is a Remote First company, and we are open to candidates located anywhere in Canada for this position.
What you will be doing:
Optimize critical engineering applications to ensure reliability, scalability and security.
Establish tooling and automation processes for infrastructure as code, service recovery and service monitoring.
Provide guidance and standards on application capabilities and usage.
Develop and maintain infrastructure and security guidelines, procedures, and documentation to streamline operations and promote knowledge sharing.
Implement and enhance observability solutions, incorporating monitoring, logging, and tracing tools to provide actionable insights and improve overall system performance and reliability.
Work with the applications that drive operations and insights for Engineering, including applications for observability, workflows, and other services needed for various critical functions.
What you’ll bring to the table:
At least 3 years of professional work experience in DevOps & Site Reliability
Strong working knowledge of AWS technologies
Familiarity with Kubernetes
Experience with AWS ECS
Experience with and knowledge of the Grafana observability stack (Tempo, Loki, Prometheus)
Familiarity with establishing KPIs on critical systems and operations
Experience with work automation and workflow automation tools such as Kestra & Temporal
Experience with and knowledge of monitoring critical systems, application performance metrics, and incident management
Experience with as many of our existing technologies as possible is a big plus: Terraform, Ansible, Vault, Consul, Kubernetes, ArgoCD, Go, Prometheus/Grafana, Packer, Nomad, Redis, GitHub Actions
StackAdapters enjoy:
Competitive salary + equity
RRSP matching
3 weeks vacation + 3 personal care days + 1 Culture & Belief day + birthdays off
Access to a comprehensive mental health care platform
Full benefits from day one of employment
Work from home reimbursements
Optional global WeWork membership for those who want a change from their home office
Robust training and onboarding program
Coverage and support of personal development initiatives (conferences, courses, etc.)
Access to StackAdapt programmatic courses and certifications to support continuous learning