TextNow is looking for a new member of our SRE team who will be responsible for our infrastructure, automation, observability, and a few other critical functions.
What You'll Do:
Ensure System Reliability: Design, build, and maintain scalable, resilient, and highly available systems to support TextNow’s infrastructure and services.
Automation & Infrastructure as Code: Develop and maintain automation using Terraform, Ansible, and other tools to enable efficient deployment, scaling, and operations of cloud-based systems (AWS preferred).
Incident Response & On-Call Support: Participate in an on-call rotation, troubleshoot issues, and drive incident resolution to minimize downtime and improve system performance. Conduct post-mortems and implement corrective actions to enhance reliability.
Performance Monitoring & Optimization: Implement and improve observability tools, logging, and monitoring solutions to identify and mitigate potential system issues proactively.
Collaboration & Cross-Team Engagement: Work closely with software engineers, DevOps, and product teams to align technical efforts with business objectives and improve system reliability from development to production.
Continuous Improvement: Identify areas for improvement in architecture, automation, and operational practices. Contribute to the design and implementation of new SRE best practices.
Who You Are:
Experienced in SRE/DevOps: You have 2+ years of experience in an operationally focused role, such as SRE, DevOps, or Infrastructure Engineering, with a deep understanding of reliability, scalability, and performance optimization.
Proficient with Key Technologies: Hands-on experience with AWS, GitHub, Terraform, Ansible, or similar tools to build and manage cloud infrastructure efficiently.
Incident Management Expert: You are comfortable handling production incidents, analyzing root causes, and implementing long-term fixes to prevent recurrence.
Automation & Observability Focused: Passionate about reducing toil through scripting and automation while ensuring robust observability using logging, metrics, and monitoring tools.
Collaborative & Impact-Driven: You enjoy working cross-functionally with engineers, product teams, and leadership to drive meaningful improvements to system reliability.