Attentive

Software Engineer II, Machine Learning Platform

Job Description

About the Role
We’re looking for a self-motivated, highly driven Software Engineer II to join our Machine Learning Platform (MLOps) team. We enable Attentive’s Machine Learning (ML) practice to directly impact Attentive’s AI product suite by providing the tools to train, serve, and deploy ML models with greater velocity and performance while maintaining reliability. We build and maintain a foundational ML platform spanning the full ML lifecycle for consumption by ML engineers and data scientists. This is an exciting opportunity to join a rapidly growing ML Platform team on the ground floor, with the ability to drive and influence the architectural roadmap that enables the entire ML organization at Attentive.

This team and role are responsible for building and operating the ML data, tooling, serving, and inference layers of the ML platform. We are excited to bring on more engineers to continue expanding this stack.


What You'll Accomplish
  • Expand, mature, and optimize our ML platform built around cutting-edge tooling like Ray, MLflow, Argo, and Kubernetes to support both traditional and deep learning ML models
  • Build and mature capabilities to support CPU/GPU clusters, model performance monitoring, drift detection, automated rollouts, and improved developer experience
  • Build, operate, and maintain a low-latency, high-volume ML serving layer covering both online and batch inference use cases
  • Orchestrate Kubernetes and ML training / inference infrastructure exposed as an ML platform
  • Expose and manage environments, interfaces, and workflows to enable ML engineers to develop, build, and test ML models and services
  • Close the latency gap on model inference to enable online, real-time model serving
  • Develop automation workflows to improve team efficiency and ML stability
  • Analyze and improve efficiency, scalability, and stability of various system resources
  • Partner with other teams and business stakeholders to deliver business initiatives
  • Help onboard new team members, provide mentorship, and enable a successful ramp-up on your team’s codebases

About You
  • You have been working in the areas of MLOps / Platform Engineering / DevOps / Infrastructure for 3+ years, and have an understanding of gold-standard practices and best-in-class tooling for ML
  • Your passion is exposing platform capabilities through interfaces that enable high-performance ML practices, rather than designing ML experiments (this team does not directly develop ML models)
  • You understand the key differences between online and offline ML inference and can articulate the critical elements needed to be successful with each to meet business needs
  • You have experience building infrastructure for an ML platform and managing CPU and GPU compute
  • You have a background in software development and are passionate about bringing that experience to bear on the world of ML infrastructure
  • You have experience with Infrastructure as Code using Terraform and can’t imagine a world without it
  • You understand the importance of CI/CD in building high-performing teams and have worked with tools like Jenkins, CircleCI, Argo Workflows, and ArgoCD
  • You are passionate about observability and have worked with tools such as Splunk, Nagios, Sensu, Datadog, and New Relic
  • You are very familiar with containers and container orchestration and have direct experience with vanilla Docker as well as Kubernetes, as both a user and an administrator

What We Use
  • Our infrastructure runs primarily in Kubernetes hosted in AWS’s EKS
  • Infrastructure tooling includes Istio, Datadog, Terraform, CloudFlare, and Helm
  • Our backend is Java / Spring Boot microservices, built with Gradle, coupled with things like DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted on AWS
  • Our frontend is built with React and TypeScript, and uses best practices like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
  • Our automation is driven by custom and open-source machine learning models and lots of data, and is built with Python, Metaflow, Hugging Face 🤗, PyTorch, TensorFlow, and Pandas