Caseware

Senior AI Software Developer in Test

Job Description

We are at the forefront of AI adoption in our cloud-native SaaS platform, building intelligent, agentic features that transform how users interact with our product. As an AI SDET, you'll pioneer and scale AI-driven testing practices from the ground up, fast-tracking reliable, safe, and high-performing AI capabilities across the organization. You will help reduce deployment risk, minimize hallucinations and drift, ensure ethical AI, and drive faster releases (targeting 20-40% velocity gains through automated validation). This is a high-impact, foundational role in Platform Engineering's Quality function, where your work will directly influence product trust, compliance, and innovation for our end users.

📍 Location: This is a fully remote position based in Colombia.

You will be reporting to:
Jai Joshi

Contact:
Maira Russo - Senior Talent Acquisition Partner


What You’ll Be Doing
Quality & AI-First Mindset
  • Evolve a modern, AI-first quality strategy for our fast-scaling SaaS architecture, including foundational infrastructure and emerging agentic/intelligent systems. 
  • Integrate AI enhancements into CI/CD pipelines (e.g., predictive flakiness detection, automated test generation, self-healing scripts) to improve test isolation, data setup, and execution reliability, using existing tools or recommending new ones. 
  • Establish scalable testing practices that support hyper-growth and petabyte-scale AI data pipelines. 

AI-Focused Test Strategy, Automation & Evaluation
  • Design deterministic and statistical testing approaches for non-deterministic LLM-based and agentic systems, addressing hallucinations, prompt injection, bias, drift, and safety risks. 
  • Build automated evaluation pipelines and harnesses for correctness, faithfulness, retrieval quality, generation accuracy, tool-calling, planning sequences, and multi-agent flows. 
  • Develop and execute test frameworks for the full AI lifecycle: prompts, datasets, embeddings, model versions, RAG pipelines (end-to-end validation), and guardrails. 
  • Implement red-teaming, bias/fairness checks, and compliance mechanisms; leverage current frameworks for metrics and observability. 
  • Integrate AI-specific quality signals into CI/CD for automated gating and continuous monitoring. 

Cross-Functional & End-to-End Testing
  • Partner closely with product, data science, AI engineering, and dev teams to test AI features, conduct multi-agent simulations, and ensure high-quality roadmap delivery. 
  • Facilitate knowledge sharing and upskilling on AI testing best practices across the Quality function. 

Metrics, Observability & Continuous Improvement
  • Drive core metrics (DORA, test coverage/effectiveness) as well as AI-specific indicators (e.g., hallucination rate, context precision, drift detection). 
  • Build real-time dashboards and support A/B testing of models with post-deployment monitoring. 

Culture, Mentorship & Innovation
  • Champion a quality-first, ethical AI mindset organization-wide. 
  • Mentor SDETs, lead workshops on AI risks and validation, and influence design, deployment, and incident processes. 
  • As a foundational hire, define roadmaps and best practices for sustainable AI quality assurance. 

Challenges You'll Tackle
  • Ensuring reliability in agentic systems amid data drift and non-deterministic behavior. 
  • Scaling tests for global SaaS while maintaining low hallucination rates and strong safety guardrails. 
  • Building evaluation from scratch in a rapidly evolving landscape (e.g., multi-modal, agentic flows). 

Success in the First 6 Months
  • Launch foundational AI test frameworks and pipelines, achieving 80-90% coverage for key AI components. 
  • Reduce AI-related defect escapes by 30-40% and integrate automated safety/compliance checks into all releases. 
  • Establish metrics dashboards and evaluation loops that enable data-driven iteration on intelligent features. 

What You Will Bring
  • 7+ years in Quality Engineering/SDET roles within cloud-native SaaS environments, including 2+ years hands-on with AI/ML/LLM systems. 
  • Expertise in automated testing infrastructure, CI/CD (Jenkins/GitHub Actions), and test pyramid strategies (unit → E2E). 
  • Strong full-stack testing experience (frontend/backend/API) and collaboration with dev teams. 
  • Proven experience testing LLMs, AI agents, RAG pipelines, and related risks (hallucinations, prompt injection, bias, drift). 
  • Proficiency in JS/TS and working knowledge of Python or Java; experience with AI evaluation frameworks (e.g., Ragas, DeepEval, LangChain/LangSmith/LangFuse) or similar tools. 
  • Knowledge of performance, stress, and load testing tools (e.g., k6, JMeter, BlazeMeter) is nice to have. 
  • Knowledge of observability tooling (e.g., New Relic), statistical testing methods, red-teaming, and ethical AI practices. 
  • Excellent communication and coaching skills; ability to thrive in ambiguity and drive innovation. 
  • Bachelor's or Master's degree in Computer Science, AI, or a related field; certifications (e.g., ISTQB AI Testing) a plus. 
  • Strong English-language communication and collaboration skills. 

We value adaptability in this fast-moving field; equivalent experience and a strong portfolio (e.g., open-source contributions, case studies) are highly regarded.