Job Description

Role Summary:

Filevine is hiring a VP of Engineering, Reliability to lead one of the most critical functions in our engineering organization. This leader will own the strategy, people, operations, and outcomes for the teams responsible for infrastructure, site reliability, database engineering, observability, and incident management across Filevine's platform.

This is not a maintenance role. We are looking for a leader who will assess our current reliability posture and operating model with fresh eyes, define a forward-looking vision for how reliability engineering should work at Filevine, and execute that vision at the pace of an AI business. The right candidate has led Reliability organizations through similar inflection points, brings strong convictions about what "good" looks like at scale, and has the operational credibility and executive presence to drive meaningful change.

Responsibilities

Strategic Vision: Define and execute the reliability engineering roadmap, aligning infrastructure and AI-native architecture with Filevine’s enterprise growth and platform modernization.

Operating Model Evolution: Balance centralized platform capabilities with distributed ownership, ensuring the reliability model scales across a diversifying technology portfolio.

Performance Frameworks: Establish and manage SLO/SLI/error budget frameworks to create a shared language for balancing feature velocity with system stability.

Efficiency & Planning: Lead infrastructure cost management (optimization and forecasting), capacity planning, and disaster recovery to meet rigorous enterprise contractual commitments.

Organizational Development: Lead and scale a multi-disciplinary organization (DevOps, SRE, DBRE, Tooling), fostering a culture of ownership, high craftsmanship, and clear career growth.

Operational Excellence: Drive continuous improvement through DORA metrics, incident trend analysis, and systematic toil reduction to enhance service availability and deployment health.

Developer Empowerment: Deliver self-service tooling, guardrails, and documentation that allow feature teams to operate their own services effectively without bottlenecks.

Security & Compliance: Act as the primary engineering interface for the CISO to advance compliance posture (FedRAMP, SOC 2, CJIS, ISO) and translate security needs into pragmatic action.

Executive Partnership: Collaborate with the CTO, CPO, and Architect to communicate risks and investment needs, positioning reliability as a key enabler for enterprise go-to-market success.

Qualifications

Extensive Leadership: 15+ years of engineering experience, with 7+ years specifically leading infrastructure, reliability, or platform teams at scale in product-driven companies.

Organizational Scale: Proven track record managing organizations of 40+ engineers across SRE, DevOps, and Tooling, including developing multiple layers of management.

Strategic Evolution: Demonstrated experience evolving reliability operating models to meet the shifting needs of a scaling business.

High-Trust Environments: Deep expertise operating in regulated sectors (Legal Tech, Fintech, Government, or Healthcare) where compliance and data sensitivity are primary constraints.

SRE Mastery: Practical, production-hardened understanding of SRE principles, including SLOs, error budgets, toil reduction, and incident management.

Cloud-Native Fluency: Strong technical command of AWS, container orchestration, Terraform (IaC), CI/CD, and modern observability stacks.

Financial & Resource Stewardship: Direct experience owning cloud infrastructure budgets and successfully driving meaningful cost optimization and forecasting.

AI/ML Infrastructure: Familiarity with the reliability requirements for modern AI workloads, such as model serving, vector search, and data pipeline integrity.

Executive Presence: Ability to engage the C-suite on risk trade-offs and transformation progress, paired with a "builder mentality" and a drive to solve complex, high-stakes problems.

What You'll Be Working On

Transforming the Reliability Operating Model: You will assess Filevine's current reliability posture with fresh eyes and define what "good" looks like for our next stage of growth. That means redesigning how the reliability organization operates — clarifying ownership boundaries, reducing toil, and building the self-service platforms that allow feature engineering teams to own their services with confidence. You will move us from a model where reliability is a bottleneck to one where it is a true force multiplier.

Building and Leading a High-Performing Team: You will develop the people and leadership bench across DevOps, SRE, DBRE, and Tooling. That means investing in your managers and tech leads, establishing clear career paths, and building a culture where reliability engineers take genuine pride in their craft. You will make smart decisions about where headcount creates leverage and where structural or tooling improvements are the better investment.

Establishing SLOs and a Reliability-Velocity Framework: You will introduce SLOs, SLIs, and error budgets as a shared language across engineering and product — giving teams a principled way to make trade-off decisions between shipping fast and staying stable. This isn't about slowing things down; it's about making risk visible and giving teams the tools to make informed decisions.

Owning Cloud Infrastructure Cost and Capacity: You will turn cloud cost management into an active discipline — driving optimization, building forecasting rigor, and creating real accountability across engineering. You will also lead capacity planning and disaster recovery strategy, ensuring Filevine can meet the contractual and operational expectations of enterprise customers.

Partnering on Security and Compliance: You will serve as the primary engineering interface with the CISO, translating compliance requirements across FedRAMP, SOC 2, CJIS, ISO, and other frameworks into pragmatic engineering decisions. You will bring credibility and clear judgment to risk trade-off conversations at the executive level — helping the business invest in the right places and manage risk proportionally.

Enabling Filevine's AI-Native Platform: As Filevine transitions to an AI-native architecture, you will ensure our infrastructure evolves to meet it — including the reliability patterns, failure modes, and scaling demands introduced by AI/ML workloads, vector search, and agentic systems. You will work closely with the Reliability Architect and Platform leadership to make reliability a foundation for LOIS, not an afterthought.

VP of Engineering, Reliability

USA Only

Software development

10 hours ago

C-Level
