Job Description

This role reports directly to the Director of Reliability.

Why We Are Hiring

Being a professional is hard. Not only do you need to be an expert in your field, but you also need to worry about how you communicate, manage your time, store and share information, keep your data secure, make your tooling reliable, and a million other details. Filevine dreams of a day when all of these details are taken care of so that professionals can get back to focusing on what they love.

We are building our team because we recognize that to make good on that dream we need the help of amazingly talented people like you.

About the Team and Job

To achieve the dream of allowing professionals to focus on what they love, Filevine products need key features. They need to be reliable, scalable, performant, cost-effective, and secure, and they need a way to recover in the event of a disaster.

The Reliability team is responsible for thinking through these problems and engineering solutions to them. We hire excellent engineers who apply software engineering to these problems to create autonomous systems that take care of these details for us. We use the principle of continuous improvement to make each iteration of these autonomous systems better than the last.

Our autonomous systems are still nascent: the foundational pieces have either been completed recently or are currently under development.

As a Site Reliability Engineer, you will be embedded with a cross-functional team that has key responsibilities for certain portions of our systems. Over the course of your first year, you will gain the context needed to be truly effective and move at speed in the Filevine environment. In subsequent years, you will be given specific mission-critical objectives that help build out and improve our autonomous systems. Along the way, you will build your personal brand as an exceptional engineer who has built and maintained systems that can grow to internet scale.

Qualifications

Curiosity, a willingness to learn, a passion for continual improvement, and unbridled enthusiasm to make things better every day without needing to be directed to do so

Proficiency in all of the skills expected of our SRE IIs

A bachelor's degree in computer science, information systems, or a related field; comparable certifications; or equivalent direct work experience

A minimum of 8 years of experience in hands-on technical roles

A minimum of 2 years of Site Reliability Engineering experience

Experience building autonomous systems that manage software operational details without human intervention

Preferred Qualifications

M.S. in computer science, information systems, or a related field; comparable certifications; or equivalent direct work experience

2-6 years of Site Reliability Engineering experience

Experience developing, deploying, and maintaining internet-scale applications

Experience incorporating artificial intelligence or machine learning into internet-scale applications

Responsibilities

Provide leadership, mentoring, and excellent judgement by being responsible for:

Developing autonomous systems that manage the details necessary to build, deploy, test, and operate all Filevine Inc. products

Being the voice of Reliability on your team throughout the SDLC

Collecting, monitoring, aggregating, dashboarding, and alerting on software and server events

Improving the CI/CD pipeline

Developing playbooks, tools, and scripts to streamline processes and shorten problem resolution time

Identifying and fixing gaps in the availability of systems

Improving and defending the security of software and systems

Documenting and diagramming processes, procedures, and best practices

Finding, learning, improving, or creating new tools that are reliable, usable, and helpful to enable other engineers to perform their work more efficiently

Work within your assigned team to complete duties as assigned, while mentoring, training, and reviewing the work of more junior engineers

Work either individually or in conjunction with other engineers to complete assignments

Be part of an on-call rotation with other team members to provide 24/7/365 production reliability support

Be part of an on-call rotation with other team members to provide escalated emergency support for the services your team owns

Communicate frequently, clearly, and effectively with various technical and management audiences
