Pano

Site Reliability Engineer

Job Description

The Role 

For the SRE role, we’re looking for somebody to join our Platform team and bridge the gap between development and infrastructure operations, ensuring the reliability, performance, and availability of our software systems through automation, monitoring, and proactive problem-solving.

The person in this role will be responsible for ensuring that the underlying infrastructure is running smoothly and that our systems and tools are working as expected.

At Pano, we strongly believe in team members taking ownership of what they do, and our approach to problem-solving relies heavily upon creativity, communication, and collaboration. 

The ideal candidate is humble, hungry, and people-smart.  They have the wisdom and experience to build mature operational processes for the future but are also comfortable with rolling up their sleeves, writing code, and building systems in a growing startup environment.


What you’ll do
  • Implement and maintain monitoring systems to proactively identify and address potential issues before they impact users.
  • Automate repetitive tasks and processes, such as deployments, infrastructure management, and incident response, to improve efficiency and reduce manual effort.
  • Respond to incidents, diagnose problems, and implement solutions to restore service quickly.
  • Improve the performance and scalability of systems and applications, ensuring they can handle peak loads and user traffic.
  • Help plan future capacity needs, ensuring that systems can accommodate growth and evolving requirements, while remaining cost-efficient.
  • Work closely with development teams to understand their needs, guide them, and ensure that systems are designed and deployed reliably.
  • Build tools to codify and automate infrastructure operations.
  • Define and track SLIs and SLOs to measure the performance and reliability of services.
  • Assess and mitigate risks associated with deployments and infrastructure changes.
  • Assist with the release and deployment processes, ensuring that changes are rolled out smoothly and reliably.

What you’ll bring
  • 5+ years of professional experience in a fast-paced SaaS or a similar business environment
  • 3+ years of hands-on experience supporting production systems as a Site Reliability Engineer (SRE) or a DevOps Engineer 
  • 3+ years of hands-on experience with cloud services and technologies (GCP, AWS, Azure, etc.)
  • Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
  • Proficiency in Infrastructure as Code (IaC) tools and methodologies (e.g., Terraform, Pulumi, Puppet)
  • Proven ability to troubleshoot and resolve complex technical issues in distributed systems
  • Ability to communicate effectively within the team and across the organization while sharing insights and updates and collaborating to achieve project goals

Preferred skills
  • Advanced working knowledge of GCP Services like GKE, GCS, IAM, etc.
  • Professional experience supporting containerized Java/JVM/Python services
  • Experience with relational databases, particularly PostgreSQL
  • 3+ years of professional experience designing, implementing, and/or administering CI/CD solutions (e.g., GitHub Actions, Buildkite, Jenkins)
  • Strong SRE mindset with a focus on cloud networking and security best practices
  • Strong software development skills, particularly with scripting languages (e.g., Python, Bash)
  • Experience with system administration in general and Linux in particular
  • Familiarity with the SOC 2 and ISO 27001 security frameworks
  • Preference for someone in the Pacific / Mountain time zone