Senior Infrastructure Engineer, Platform

Job Description

About the Attentive Team
Have you ever received a text message from your favorite brand with an incredible offer?  Did you know that text message marketing delivers the highest ROI of any marketing channel?  And that more customers than ever prefer to connect with brands via text?  That is what we do at Attentive.  We empower the world’s leading brands to engage with their customers at the right moment, with the right message. Our platform powers more than 400 million messages every day, approaching 100 billion annually.

We’re building big things!  Check out our tech blog here: https://tech.attentive.com/

About the Role
Our Platform Infrastructure team is the backbone of everything we do at Attentive, providing a resilient and cost-effective platform that seamlessly handles billions of events from over 100 million customers daily. We own everything from compute, persistence, and networking to observability and deployments. Joining our team offers a high-growth career opportunity to collaborate with some of the world’s most talented engineers in a high-performance, high-impact culture.

We’re looking for a self-motivated, highly driven Senior Software Engineer to join our Compute and Network team. The team provides a compute and networking platform built from reusable, opinionated cloud components and tools that follow well-defined patterns and narrow interfaces. Engineering teams use these building blocks in their own solutions, which gives them leverage in a safe, reliable, and secure manner, with guardrails and stable underlying infrastructure.

This team is responsible for building and operating the core compute and networking infrastructure of our microservices architecture here at Attentive, which consists of AWS VPCs, Transit Gateways, multi-cluster Kubernetes (EKS), Istio as both a service mesh and ingress solution, and Cloudflare at the edge. We use Terraform and ArgoCD as our tools for Infrastructure as Code.


What You'll Accomplish
  • Demonstrate the ability to analyze, troubleshoot, coordinate, and resolve complex infrastructure issues
  • Orchestrate Kubernetes infrastructure across multiple networks and AWS accounts
  • Manage our Infrastructure as Code orchestration, which allows application teams to consume AWS cloud services through automation and self-serve capabilities
  • Develop automation workflows to improve team efficiency 
  • Analyze and improve efficiency, scalability, and stability of various system resources
  • Partner with other teams and business stakeholders to deliver business initiatives
  • Help onboard new team members, provide mentorship, and enable a successful ramp-up on your team's code bases

Your Expertise
  • You have been working in the areas of Site Reliability Engineering / DevOps / Infrastructure for 5+ years, and have an understanding of best practices
  • You have experience building infrastructure in a microservices architecture
  • You have a background in software development and are passionate about bringing that experience to bear on the world of infrastructure
  • You have experience with Infrastructure as Code using Terraform and can’t imagine a world without it
  • You understand the importance of CI/CD in building high-performing teams and have worked with tools like Jenkins, CircleCI, and ArgoCD
  • You are passionate about observability and have worked with tools such as Splunk, Nagios, Sensu, Datadog, and New Relic
  • You are comfortable with the OSI networking model and have experience debugging or creating solutions that span layer 7 down to layer 3
  • You are very familiar with containers and container orchestration, and have direct experience with vanilla Docker as well as with Kubernetes as both a user and an administrator

What We Use
  • Our infrastructure runs primarily on Kubernetes, hosted in AWS EKS
  • Infrastructure tooling includes Istio, Datadog, Terraform, Cloudflare, and Helm
  • Our backend is Java / Spring Boot microservices, built with Gradle, coupled with things like DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted via AWS
  • Our frontend is built with React and TypeScript, and uses tools like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
  • Our automation is driven by custom and open source machine learning models and lots of data, and is built with Python, Metaflow, Hugging Face 🤗, PyTorch, TensorFlow, and Pandas