Job Description

TextNow is looking for a new member of our SRE team who will be responsible for our infrastructure, automation, observability, and a few other critical functions.

What You'll Do:

Ensure System Reliability: Design, build, and maintain scalable, resilient, and highly available systems to support TextNow’s infrastructure and services.

Automation & Infrastructure as Code: Develop and maintain automation using Terraform, Ansible, and other tools to enable efficient deployment, scaling, and operations of cloud-based systems (AWS preferred).

Incident Response & On-Call Support: Participate in an on-call rotation, troubleshoot issues, and drive incident resolution to minimize downtime and improve system performance. Conduct post-mortems and implement corrective actions to enhance reliability.

Performance Monitoring & Optimization: Implement and improve observability tools, logging, and monitoring solutions to identify and mitigate potential system issues proactively.

Collaboration & Cross-Team Engagement: Work closely with software engineers, DevOps, and product teams to align technical efforts with business objectives and improve system reliability from development to production.

Continuous Improvement: Identify areas for improvement in architecture, automation, and operational practices. Contribute to the design and implementation of new SRE best practices.

Who You Are:

Experienced in SRE/DevOps: You have 2+ years of experience in an operationally focused role, such as SRE, DevOps, or Infrastructure Engineering, with a deep understanding of reliability, scalability, and performance optimization.

Proficient with Key Technologies: You have hands-on experience with AWS, GitHub, Terraform, Ansible, or similar tools for building and managing cloud infrastructure efficiently.

Incident Management Expert: You are comfortable handling production incidents, analyzing root causes, and implementing long-term fixes to prevent recurrence.

Automation & Observability Focused: You are passionate about reducing toil through scripting and automation while ensuring robust observability through logging, metrics, and monitoring tools.

Collaborative & Impact-Driven: You enjoy working cross-functionally with engineers, product teams, and leadership to drive meaningful improvements to system reliability.

Site Reliability Engineer

Job Description

USA Only

SRE Engineer

9 days ago
