Lead Site Reliability Engineer

  • Vitesse PSP

Job Description

We are Vitesse – the treasury and payment partner of choice for insurance. 

Formed in 2014 by a team of proven FinTech entrepreneurs, we are an FCA-regulated business providing global claim funds management and payment solutions. Operating one of the largest banking and payment settlement networks in the world, we give our customers direct access to 200 countries and currencies. Through a single integration, insurers can use this network to pay claims in as little as 45 seconds and deliver a superior claimant experience. Our market-leading treasury proposition gives insurers transparency and control over their claim funds, even when delegated to third parties, so they have their money in the right place, at the right time, to make that all-important payment when customers need it most.

With over 175 employees across our London headquarters, Europe, and the US, $93m in Series C funding secured, and more than £10bn in processed transactions, we are only just getting started.

We are collaborative, customer-centric and work with integrity, whilst partnering with some of the biggest insurance leaders including Lloyd’s of London and ManyPets. We take huge pride in our company culture, ensuring that everyone has a part to play, an opportunity to be heard and involved, and the ability to make a real difference. As we continue to scale up, we want like-minded humans to join us on this exciting journey. Are you ready?

 

The Role:

The Lead SRE is responsible for ensuring the reliability, scalability, and operational excellence of our infrastructure and services. This is a hands-on engineering role, requiring deep technical expertise, leadership, and a commitment to continuous improvement. The Lead SRE must balance technical delivery with a strong focus on team productivity, performance measurement, and collaboration across squads and stakeholders.

The role requires close collaboration with engineering leads, business stakeholders, and the Head of Platform Operations to define and uphold SLAs, SLOs, and error budgets, ensuring alignment with business priorities. Communication is central, both in technical leadership within the team and in ensuring clear, proactive dialogue across teams and stakeholders.

A strong emphasis is placed on observability, ensuring systems are well-instrumented, reliable, and continuously improving in line with agile principles of Transparency, Inspection, and Adaptation (TIA).

Core responsibilities:

  • Hands-On Engineering & Technical Leadership
    • Design, develop, and maintain cloud infrastructure (Azure/AWS) using Terraform and automation.
    • Lead troubleshooting, performance optimisation, and incident resolution to enhance reliability.
    • Ensure best practices in CI/CD pipelines, observability, and infrastructure deployment.
    • Set high engineering standards and provide mentorship to team members.
  • Observability as a Core Practice
    • Drive observability across all critical systems, ensuring real-time visibility into operations.
    • Promote Transparency, Inspection, and Adaptation by making both system and team health data accessible and actionable.
    • Continuously improve monitoring, logging, and tracing strategies to support data-driven decisions.
  • Cross-Team Collaboration, Strategy & Technical Vision
    • Think strategically and see the bigger picture, ensuring solutions align with both immediate technical needs and long-term business objectives.
    • Work with engineering leads, business stakeholders, and the Head of Platform Operations to define and enforce SLAs, SLOs, and engineering standards that support scalability, reliability, and operational efficiency.
    • Design solutions with a systems-thinking approach, ensuring infrastructure, observability, and automation strategies support sustainable growth.
    • Improve deployment pipelines, automation, and operational workflows across squads, fostering consistency and best practices.
    • Support capacity planning, scalability, and security best practices, proactively identifying risks and opportunities to enhance platform resilience.
  • Team Productivity, Performance & Agile Ways of Working
    • Ensure clear visibility of ongoing work, technical debt, and team progress.
    • Define and track key engineering health metrics to measure and improve team effectiveness.
    • Foster a culture of continuous improvement, driving agile practices, backlog refinement, and retrospectives.
    • Embed blameless learning to improve reliability and efficiency across the team.
  • Incident Response, Risk Management & Reliability
    • Participate in the incident response process, working with service management to ensure the delivery of structured post-mortems and continuous learning.
    • Improve detection, response times, and resolution processes to minimise downtime.
    • Identify recurring failure patterns and implement proactive risk mitigation strategies.
    • Define and enforce SLAs, SLOs, and error budgets, working closely with engineering leads, business stakeholders, and the Head of Platform Operations.

Requirements

  • Proven leadership experience in technical teams, with a focus on mentoring, professional development, and fostering a culture of innovation, reliability, and engineering excellence.
  • Strategic mindset, able to align technical initiatives with business goals, drive scalability and performance improvements, and proactively tackle complex challenges.
  • Proven experience in Site Reliability Engineering, DevOps, or Systems Engineering, with hands-on experience in both Azure and AWS environments.
  • Demonstrable expertise in high-performance, scalable, and highly available systems, with experience in optimising reliability, capacity planning, and system performance.
  • Strong understanding of regulatory and security requirements, such as ISO 27001, PCI DSS, Cyber Essentials Plus (CE+) and SOX, with experience implementing compliance-driven engineering practices.
  • Deep expertise in DevOps principles, including automation, infrastructure as code (Terraform, Ansible, or Chef), GitOps workflows, CI/CD best practices (GitHub Actions, GitLab CI/CD, Azure DevOps), and collaborative ways of working.
  • Strong background in containerisation (Docker) and orchestration (Kubernetes), with a focus on scalability and resilience.
  • Hands-on experience with monitoring, observability, and incident management tools (Prometheus, Grafana, ELK, Azure Monitor, Application Insights, Kusto) and a data-driven approach to improving system reliability.
  • Strong networking and security knowledge, including cloud security best practices, identity management, and access controls.
  • Experience in recruiting and scaling teams, driving engineering hiring decisions, shaping team culture, and mentoring engineers.
  • Advocate for modern DevOps and SRE best practices, championing collaboration, transparency, automation, continuous learning, and continuous improvement across teams.
  • Excellent communication skills, able to engage stakeholders, collaborate cross-functionally, and drive alignment on reliability and operational priorities.
  • Adaptability and resilience, comfortable working in fast-paced environments, handling incidents, and participating in on-call rotations.

Benefits

  • 25 days' holiday per year (increasing by 1 day per year of service, up to 30 days) + Bank Holidays
  • Hybrid working arrangements – minimum 2 days in the office, Tuesday - Thursday
  • Contributory pension scheme
  • Enhanced parental leave
  • Cycle to Work scheme
  • Private Medical Insurance with AXA
  • Unlimited access to therapy sessions through our partner, Oliva
  • Discounted gym membership through Gympass
  • Financial coaching with Octopus Wealth
  • 2 days of volunteering leave per year
  • Sabbatical after 5 years’ service
  • Life Assurance with MetLife (UK employees only)
  • Ongoing Learning and Development to help you reach your career goals

Vitesse at our best – our values 

The Vitesse values are a true reflection of what it takes to thrive in our business, so it’s important to us that any employee who joins our business is aligned with these three attributes.

Confident Humility 

We don’t do ego and we know that unless we all win, none of us win. We admit when we’re wrong, ask for help and always think about the wider business before ourselves.

Driven to Succeed 

We see the opportunity ahead of us and we won’t stop until we fulfil the potential we know we have. We hold ourselves to high standards and deliver high quality outcomes for Vitesse and our customers.  

Tenacious Responsibility 

We take ownership of our actions and decisions, and face into the challenges that come our way. We are committed to seeing things through to completion, even in the face of adversity.

We are an Equal Opportunity Employer

We are committed to creating an inclusive environment that enables everyone to perform at their best, where we recognise the rights of all individuals to mutual respect and where there is an unbiased acceptance of others. Our policies and practices aim to promote an environment that is free from all forms of unfair discrimination and values the diversity of all people. At the heart of our policy, we seek to treat people fairly and with dignity and respect.