Job Description

The Role

For the SRE role, we’re looking for somebody to join our Platform team and bridge the gap between development and infrastructure operations, ensuring the reliability, performance, and availability of our software systems through automation, monitoring, and proactive problem-solving.

The person in this role will be responsible for ensuring that the underlying infrastructure is running smoothly and that our systems and tools are working as expected.

At Pano, we strongly believe in team members taking ownership of what they do, and our approach to problem-solving relies heavily upon creativity, communication, and collaboration.

The ideal candidate is humble, hungry, and people-smart. They have the wisdom and experience to build mature operational processes for the future but are also comfortable with rolling up their sleeves, writing code, and building systems in a growing startup environment.

What you’ll do

Implement and maintain monitoring systems to proactively identify and address potential issues before they impact users.

Automate repetitive tasks and processes, such as deployments, infrastructure management, and incident response, to improve efficiency and reduce manual effort.

Respond to incidents, diagnose problems, and implement solutions to restore service quickly.

Improve the performance and scalability of systems and applications, ensuring they can handle peak loads and user traffic.

Help plan future capacity needs, ensuring that systems can accommodate growth and evolving requirements, while remaining cost-efficient.

Work closely with development teams to understand their needs, guide them, and ensure that systems are designed and deployed reliably.

Build tools to codify and automate infrastructure operations.

Define and track SLIs and SLOs to measure the performance and reliability of services.

Assess and mitigate risks associated with deployments and infrastructure changes.

Assist with the release and deployment processes, ensuring that changes are rolled out smoothly and reliably.

What you’ll bring

5+ years of professional experience in a fast-paced SaaS or a similar business environment

3+ years of hands-on experience supporting production systems as a Site Reliability Engineer (SRE) or a DevOps Engineer

3+ years of hands-on experience with cloud services and technologies (GCP, AWS, Azure, etc.)

Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)

Proficiency with Infrastructure as Code (IaC) tools and methodologies (e.g., Terraform, Pulumi, Puppet)

Proven ability to troubleshoot and resolve complex technical issues in distributed systems

Ability to communicate effectively within the team and across the organization, sharing insights and updates and collaborating to achieve project goals

Preferred skills:

Advanced working knowledge of GCP services such as GKE, GCS, and IAM

Professional experience supporting containerized Java/JVM/Python services

Experience with relational databases, particularly PostgreSQL

3+ years of professional experience designing, implementing, and/or administering CI/CD solutions (e.g., GitHub Actions, Buildkite, Jenkins)

Strong SRE mindset with a focus on cloud networking and security best practices

Strong software development skills, particularly with scripting languages (e.g., Python, Bash)

Experience with system administration in general and Linux in particular

Familiarity with the SOC 2 and ISO 27001 security frameworks

Preference for someone in the Pacific / Mountain time zone
