We are looking for a talented Sr. Site Reliability Engineer to join our team!
In this role, you will build and maintain the company's internal platform, driving operational excellence and empowering the entire engineering team. You should have experience in analyzing, proposing, and implementing safer systems and processes. Collaborating closely with engineering squads across platform engineering, you will ensure our applications are both safe and reliable.
DEPARTMENT: Engineering
LOCATION: Anywhere in Brazil - Remote
WHAT YOU GET TO DO:
Collaborate with and support our creative, tight-knit development team
Design, deploy, and operate Loadsmart's critical systems while balancing reliability, cost, and agility
Play a key role in driving reliability projects with engineering teams
Utilize your intuitive problem-solving skills and contagious positive attitude to tackle challenging and exciting issues, inspiring those around you
Collect metrics and understand their business impact, encouraging the team to do the same
Perform troubleshooting and root-cause analysis of system operation issues
Be accountable for the platform's Service Level Agreements and Objectives
Provide infrastructure support during off-hours as needed
Take ownership of software infrastructure projects
Seek, give, and receive constructive feedback through code and specification reviews
REQUIRED QUALIFICATIONS:
Over 5 years of experience in cloud computing and SRE/DevOps
Proficient in English communication (both written and spoken) to collaborate in an international team with native and non-native English speakers
Detail-oriented with high initiative and self-motivation
Strong understanding of software engineering principles and how systems work under the hood
In-depth knowledge of modern networking and operating systems
Proficiency in AWS, cloud environments, containers, Kubernetes, Docker, and DevOps engineering, including managing tests and CI/CD pipelines
Familiarity with automation tools and provisioners like Terraform, Ansible, or Chef
Solid troubleshooting and system engineering experience in UNIX/Linux production environments
Experience with monitoring, alerting, and incident management
Proficiency in automating tasks with scripting languages such as Python or Bash
Experience with or exposure to PostgreSQL and DBA responsibilities is a plus