Job Description
Zipdev is looking to add a remote Site Reliability Engineer to its team of LatAm developers! As a Site Reliability Engineer, you will work as an integrated member of product teams to help build, deploy, and reliably monitor cloud services. You will work on complex software development projects to keep important, revenue-critical services up. You will actively develop code and build frameworks to monitor the services deployed in production, driving reliability and performance at massive scale.
We're looking for a talented Site Reliability Engineer who can work under minimal supervision, define test procedures, and collaborate closely with Developers, Designers, Customer Support, and Engineering Leadership.
What you will do:
- Build systems and infrastructure to monitor complex, large-scale distributed systems
- Identify stability/performance issues and collaborate with developers to triage critical issues in production systems.
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Devise ways to actively monitor system throughput, capacity and reliability.
- Debug complex systems and evolve a running environment without downtime.
- Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
- Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
Requirements
- Bachelor’s degree in Computer Science or equivalent work experience as a System Administrator with programming skills.
- 3-4 years of proven professional experience as a Site Reliability Engineer.
- Experience with one or more general-purpose programming/scripting languages including but not limited to: Python, Bash, Perl or Go.
- Fundamental knowledge of technologies across a broad range of disciplines: virtualization, storage, networking, servers, and security.
- Understanding of systems and application design, including the operational trade-offs of various designs.
- Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
- Experience in analyzing logs and troubleshooting large-scale distributed systems.
- Excellent organization, time management, and communication skills.
- Currently living in Latin America.
Nice to have:
- Experience with instrumenting and monitoring production systems (ELK stack, Zabbix, Nagios, Statsd/Graphite, APM, etc.)
- Experience with AWS infrastructure (EC2, S3, VPC, Security Groups, RDS) and related services.
- A working understanding of Docker, Vagrant, Ansible/Chef/Puppet.
Our Recruitment Process
- Video Interview
- 20-minute take-home skills test
- 30-minute Call with the Recruiter (project, benefits, etc.)
- Interviews directly with the client (the number of interviews varies by project and may include an assessment)
- Final Offer
Benefits
- Work remotely Monday - Friday, 40 hours a week (no weekends)
- Vacation: 10 business days a year
- Holidays: 5 National Holidays a year
- Company Holidays: 5 Company Holidays a year (Christmas Eve, Christmas Day, New Year's Eve, New Year's Day, Zipdev Day)
- Parental Leave
- Health Care Reimbursement
- Active Lifestyle Reimbursement
- Quarterly Home Office Reimbursement
- Payroll Deduction Purchase Plans
- Longevity Bonus
- Continuous Learning Bonus
- Access to Training and Professional Development Platforms
- Did we mention it's REMOTE?!!
One of our core values at Zipdev is "Be authentic." That's why we encourage you to answer the application form in your own words; we are interested in getting to know you, not a digital assistant.
Wondering how our remote environment or our payment method works? We've put together some helpful answers in our FAQs at the bottom of our career site. Take a look and let us know if you have any other questions!