Senior Data Engineer, Machine Learning Operations

Job Description

About the Attentive Team
Have you ever received a text message from your favorite brand with an incredible offer?  Did you know that text message marketing delivers the highest ROI of any marketing channel?  And that more customers than ever prefer to connect with brands via text?  That is what we do at Attentive.  We empower the world’s leading brands to engage with their customers at the right moment, with the right message. Our platform powers more than 400 million messages every day, approaching 100 billion annually.

We’re building big things!  Check out our tech blog here: https://tech.attentive.com/

About the Role
We’re looking for a self-motivated, highly driven Senior Software Engineer to join our Machine Learning Operations (MLOps) team. Our team enables Attentive’s Machine Learning (ML) practice to directly impact Attentive’s AI product suite by providing the tools to train, deploy, and run inference on ML models with higher velocity and performance while maintaining reliability. We build and maintain a foundational ML platform spanning the full ML lifecycle for consumption by ML engineers and data scientists. This is an exciting opportunity to join a rapidly growing MLOps team at the ground floor, with the ability to drive and influence the architectural roadmap that enables the entire ML organization at Attentive.

This team and role are responsible for building and operating the data, tooling, serving, and inference layers of the ML platform. We are excited to bring on more engineers to continue expanding this stack.

What You'll Accomplish
  • Unlock offline & real-time access to trillions of data points for our ML and Data Science teams
  • Manage, expand, and optimize our feature store, which enables feature engineering, multi-TB-scale training jobs, and offline and real-time inference
  • Support PB scale data operations on the feature store using Apache Spark, Spark Structured Streaming, Kafka, and Ray
  • Partner with other teams and business stakeholders to deliver ML and AI initiatives

Your Expertise
  • You have worked in Data Engineering / MLOps for 5+ years and have built and matured the pipelines of a PB-scale feature store
  • You have deep experience with Apache Spark, Spark Streaming, and Ray Data, and have built data pipelines for ML use cases using these tools
  • You understand how data cardinality, query plans, configuration settings, and hardware each affect data pipeline performance
  • You have led the rollout and operationalization of feature stores such as Tecton or Feast
  • You understand the key differences between online and offline ML inference and can articulate what it takes to be successful with each to meet business needs

What We Use
  • Our infrastructure runs primarily on Kubernetes, hosted on AWS EKS
  • Infrastructure tooling includes Istio, Datadog, Terraform, Cloudflare, and Helm
  • Our backend is Java / Spring Boot microservices, built with Gradle and coupled with DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted on AWS
  • Our frontend is built with React and TypeScript, and uses best practices like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
  • Our automation is driven by custom and open-source machine learning models and lots of data, built with Python, Metaflow, HuggingFace 🤗, PyTorch, TensorFlow, and Pandas