Job Description

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role Overview

We are looking for highly analytical engineers and technical domain experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based and infrastructure-heavy workflows. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with production systems, debugging, automation, or large-scale engineering workflows, and can design realistic technical challenges that simulate real-world engineering environments.

This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.

CONTRACT: Contractor assignment (5 weeks)

COMMITMENT: Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap

LOCATION: Remote — Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam

PROCESS: One technical assessment/interview (~45 min)

Responsibilities

Design realistic terminal-based benchmark tasks for AI evaluation systems
Create technically deep debugging and investigation scenarios
Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
Write clear solution approaches and deterministic evaluation criteria
Identify realistic edge cases, failure modes, and system constraints
Design multi-step reasoning challenges across complex technical environments
Contribute expertise across one or more engineering or operational domains
Review and refine benchmark quality, difficulty, and validation logic
Collaborate with reviewers and researchers on AI evaluation workflows

Requirements

3–10 years of experience in software engineering or related technical domains
Strong debugging, analytical, and systems reasoning skills
Good understanding of system architecture, dependencies, and operational processes
Experience with terminal, CLI, automation, or developer tooling workflows
Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
Ability to design technically rigorous and realistic engineering scenarios

AI Evaluation Engineer - Software Engineering Domain

Job Description

Remote Colombia

Software development

2 days ago
