Job Description

Jobgether has ALL remote jobs globally. We match you to roles where you're most likely to succeed, and provide feedback on every application to help you learn. No more guesswork, application black holes, or recruiter ghosting in your job search.

For one of our clients, we are looking for a Site Reliability Engineer, working remotely from Canada.

As a Site Reliability Engineer (SRE), you will play a key role in designing, implementing, and maintaining scalable infrastructure while ensuring system reliability and efficiency. Your focus will be on automation, performance optimization, and cloud resource management. Collaborating with cross-functional teams, you will streamline CI/CD pipelines, enhance monitoring solutions, and support a highly available infrastructure. This position requires a proactive approach to troubleshooting and continuous improvement, ensuring seamless integration of new services while leveraging the latest SRE best practices.

Accountabilities:

Design, build, and maintain highly scalable cloud infrastructure using Terraform and Terragrunt for automated resource provisioning.
Manage and optimize AWS cloud environments, ensuring security, cost efficiency, and high availability.
Oversee data streaming platforms using Confluent Cloud and Kafka, ensuring reliable data pipelines.
Deploy and manage Redis instances for caching and real-time data processing.
Implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alertmanager, and Opsgenie.
Enable feature flag management and controlled rollouts with LaunchDarkly.
Manage Kubernetes clusters, utilizing Helm, ArgoCD, Istio, and Kustomize for continuous deployment and infrastructure-as-code practices.
Collaborate with development teams to integrate new services into the infrastructure seamlessly.
Troubleshoot complex system issues to maintain high availability and performance.
Continuously improve automation tools, processes, and methodologies to enhance system scalability.

Requirements

4+ years of experience in Site Reliability Engineering or a similar role with a strong focus on cloud infrastructure.
Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.
Deep knowledge of AWS cloud services and best practices for scalable and secure architectures.
Hands-on experience with Confluent Cloud and Kafka for distributed data streaming.
Strong experience with Redis for caching and RDS for data storage.
Proficiency with OpenSearch, Elasticsearch, or ChaosSearch for search and analytics.
Advanced knowledge of monitoring tools like Prometheus, Grafana, Alertmanager, and Opsgenie.
Experience with LaunchDarkly for feature flag management.
Extensive experience managing Kubernetes clusters, including Helm for package management, ArgoCD for deployments, and Istio for service mesh configurations.
Familiarity with Kustomize for Kubernetes resource configuration.
Strong problem-solving skills and ability to troubleshoot complex systems in production environments.
Excellent communication and collaboration skills within agile teams.

Nice to Have:

Experience working in multi-cloud environments (AWS, GCP, Azure).
Familiarity with security best practices in cloud and containerized environments.
Knowledge of serverless architectures and CI/CD tools like Jenkins and GitHub Actions.
Some development experience with Node.js, Python, or Go.

Benefits

Competitive salary based on experience and qualifications.
Fully remote work flexibility, with a collaborative team environment.
Comprehensive healthcare coverage, including medical, dental, and vision plans.
Retirement savings plan with company matching.
Flexible paid time off (PTO) to support work-life balance.
Professional development opportunities, including training and certifications.
Access to cutting-edge technology and opportunities to work on innovative projects.

Site Reliability Engineer - (Remote - Canada)

Canada Only

SRE Engineer

2 months ago
