Have you ever received a text message from your favorite brand with an incredible offer? Did you know that text message marketing delivers the highest ROI of any marketing channel? And that more customers than ever prefer to connect with brands via text? That is what we do at Attentive. We empower the world’s leading brands to engage with their customers at the right moment, with the right message. Our platform powers more than 400 million messages every day, approaching 100 billion annually.
We’re building big things! Check out our tech blog here: https://tech.attentive.com/
About the Role
We’re looking for a self-motivated, highly driven Senior Software Engineer to join our Machine Learning Operations (MLOps) team. Our team enables Attentive’s Machine Learning (ML) practice to directly impact Attentive’s AI product suite by providing the tools to train, deploy, and run inference on ML models with higher velocity and performance while maintaining reliability. We build and maintain a foundational ML platform spanning the full ML lifecycle, consumed by ML engineers and data scientists. This is an exciting opportunity to join a rapidly growing MLOps team at the ground floor, with the ability to drive and influence the architectural roadmap that enables the entire ML organization at Attentive.
This team and role are responsible for building and operating the ML data, tooling, serving, and inference layers of the ML platform. We are excited to bring on more engineers to continue expanding this stack.
What You'll Accomplish
Unlock offline & real-time access to trillions of data points for our ML and Data Science teams
Manage, expand, and optimize our feature store that enables feature engineering, multi-TB scale training jobs, and offline / real-time inference
Support PB-scale data operations on the feature store using Apache Spark, Spark Structured Streaming, Kafka, and Ray
Partner with other teams and business stakeholders to deliver ML and AI initiatives
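To make the feature-store work above concrete: the core of generating training data from a feature store is the point-in-time join, which pairs each labeled event with the latest feature value observed at or before that event so that training never leaks future information. This is a minimal illustrative sketch in plain Python (in production this would be a distributed Spark job; all names here are hypothetical):

```python
from datetime import datetime

def point_in_time_join(events, features):
    """events: list of (entity_id, event_ts, label).
    features: dict mapping entity_id -> list of (feature_ts, value),
    sorted by feature_ts ascending."""
    rows = []
    for entity_id, event_ts, label in events:
        value = None
        for feature_ts, v in features.get(entity_id, []):
            if feature_ts <= event_ts:
                value = v  # keep the latest value not after the event
            else:
                break  # everything past here is in the event's future
        rows.append((entity_id, event_ts, value, label))
    return rows

events = [("user_1", datetime(2024, 1, 5), 1)]
features = {
    "user_1": [
        (datetime(2024, 1, 1), 0.2),
        (datetime(2024, 1, 9), 0.9),  # future relative to the event
    ]
}
print(point_in_time_join(events, features))
# the 2024-01-09 value is correctly excluded from the training row
```

The same correctness rule is what a managed feature store enforces at PB scale.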
Your Expertise
You have been working in the areas of Data Engineering / MLOps for 5+ years, and have built and matured the pipelines of a PB-scale feature store
You have deep experience with Apache Spark, Spark Streaming, and Ray Data, and have built data pipelines for ML use cases using these tools
You understand how data cardinality, query plans, configuration settings, and hardware interact, and how each affects data pipeline performance
You have led the rollout and operationalization of feature stores such as Tecton or Feast
You understand the key differences between online and offline ML inference and can articulate the critical elements needed to succeed with each to meet business needs
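The online/offline distinction above can be sketched in a few lines: offline inference scores large materialized batches on historical features, while online inference serves one request at a time from a low-latency key-value view of each entity's latest features. This is an illustrative toy (the model and store names are hypothetical, not our actual system):

```python
def model(features):
    # stand-in for a trained model; real scoring would call a served model
    return 1 if features["purchases_30d"] > 2 else 0

# Offline: batch inference over a materialized scoring table
batch = [
    {"user": "a", "purchases_30d": 5},
    {"user": "b", "purchases_30d": 1},
]
offline_scores = {row["user"]: model(row) for row in batch}

# Online: single-entity inference from a key-value store holding
# only the latest features, inside a millisecond latency budget
online_store = {"a": {"purchases_30d": 5}}

def predict_online(user_id):
    features = online_store[user_id]  # O(1) key lookup, not a scan
    return model(features)

print(offline_scores)
print(predict_online("a"))
```

The engineering trade-offs differ accordingly: offline pipelines optimize throughput and point-in-time correctness; online paths optimize tail latency and feature freshness.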
What We Use
Our infrastructure runs primarily in Kubernetes, hosted on AWS EKS
Infrastructure tooling includes Istio, Datadog, Terraform, CloudFlare, and Helm
Our backend is Java / Spring Boot microservices, built with Gradle, coupled with things like DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted on AWS
Our frontend is built with React and TypeScript, and uses best practices like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
Our automation is driven by custom and open source machine learning models and lots of data, built with Python, Metaflow, HuggingFace 🤗, PyTorch, TensorFlow, and Pandas