We are sharing a specialised part-time consulting opportunity for experienced Grafana power users with strong backgrounds in observability, dashboard design, alerting systems, data source configuration, and technical workflow evaluation.
This role supports a collaboration with a leading frontier AI research laboratory focused on improving how advanced AI agents perform real-world Grafana workflows through high-quality task design, reference execution, grading logic, and evaluation analysis.
Selected professionals will design expert-level evaluation tasks that test whether AI agents can use Grafana the way real professionals do. They will also perform those workflows on hosted Grafana instances to produce reference trajectories, implement programmatic graders, and review model attempts to identify failures and root causes. This opportunity is especially well suited to professionals with deep Grafana expertise, strong systems thinking, and the ability to translate real-world observability workflows into rigorous AI evaluation tasks.
Key Responsibilities
Professionals in this role may contribute to:
Grafana Workflow Design
Design realistic, multi-step Grafana workflows involving dashboards, alerting rules, data source configuration, panel setup, and cross-module operations
Translate real-world observability workflows into clear, challenging evaluation tasks
Help ensure that tasks reflect authentic expert use of Grafana in professional environments
Reference Execution & Grader Development
Perform each workflow on a hosted Grafana instance to produce a reference trajectory
Write clear, specific task prompts with measurable outcomes that can be verified programmatically
Implement programmatic graders that check whether each instruction was completed correctly
Support structured evaluation workflows through precise and technically grounded task design
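For a sense of what a programmatic grader might look like, here is a minimal Python sketch. It assumes the grader receives the final dashboard as a dictionary in Grafana's dashboard JSON format and that each task instruction maps to one pass/fail check. The helper names (has_variable, has_panel, panel_query_contains, grade) and the sample dashboard are hypothetical illustrations, not the project's actual grading framework.

```python
from typing import Any, Callable, Dict

# Hypothetical type aliases for this sketch only.
Dashboard = Dict[str, Any]
Check = Callable[[Dashboard], bool]


def has_variable(name: str) -> Check:
    """Pass if the dashboard defines a template variable with this name."""
    def check(dashboard: Dashboard) -> bool:
        variables = dashboard.get("templating", {}).get("list", [])
        return any(v.get("name") == name for v in variables)
    return check


def has_panel(title: str, panel_type: str) -> Check:
    """Pass if a top-level panel with this title and type exists.

    Simplification: panels nested inside collapsed rows are not searched.
    """
    def check(dashboard: Dashboard) -> bool:
        return any(
            p.get("title") == title and p.get("type") == panel_type
            for p in dashboard.get("panels", [])
        )
    return check


def panel_query_contains(title: str, fragment: str) -> Check:
    """Pass if any query expression on a panel with this title contains the fragment."""
    def check(dashboard: Dashboard) -> bool:
        return any(
            fragment in target.get("expr", "")
            for p in dashboard.get("panels", [])
            if p.get("title") == title
            for target in p.get("targets", [])
        )
    return check


def grade(dashboard: Dashboard, checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every named check and report pass/fail per instruction."""
    return {label: check(dashboard) for label, check in checks.items()}


# Made-up dashboard standing in for an agent's final state.
example_dashboard: Dashboard = {
    "title": "API Latency",
    "templating": {"list": [{"name": "namespace", "type": "query"}]},
    "panels": [
        {
            "title": "p95 latency",
            "type": "timeseries",
            "targets": [
                {
                    "expr": 'histogram_quantile(0.95, sum by (le) '
                    '(rate(http_request_duration_seconds_bucket'
                    '{namespace="$namespace"}[5m])))'
                }
            ],
        }
    ],
}

checks = {
    "variable 'namespace' exists": has_variable("namespace"),
    "timeseries panel 'p95 latency' exists": has_panel("p95 latency", "timeseries"),
    "panel query filters on $namespace": panel_query_contains("p95 latency", "$namespace"),
    "stat panel 'Error rate' exists": has_panel("Error rate", "stat"),
}

results = grade(example_dashboard, checks)
for label, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {label}")
```

Reporting one result per instruction, rather than a single overall score, is one way to make it easier to see which step an agent missed when reviewing its attempt.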
AI Evaluation & Task Calibration
Review AI agent attempts at assigned tasks and identify where and why they fail
Tag root causes and support high-quality model evaluation workflows
Calibrate task difficulty so tasks remain challenging but solvable, iterating on prompts and constraints based on model performance
Ideal Profile
Strong candidates may have:
2+ years of daily, professional Grafana experience in SRE, Platform Engineering, Observability, or similar environments
Deep familiarity with PromQL, dashboard templating, alerting pipelines, and data source configuration for systems such as Prometheus and InfluxDB
The ability to articulate workflows clearly enough for programmatic verification
Comfort writing basic grading scripts in Python
Strong systems thinking and the ability to work independently in structured technical workflows
Preferred qualifications
Experience with Grafana API automation
Background in Kubernetes or infrastructure monitoring
Familiarity with AI evaluation or benchmarking
Ability to translate complex operational workflows into measurable, reviewable task structures
Why This Opportunity
Contribute specialised Grafana and observability expertise to a cutting-edge AI collaboration
Help improve how advanced AI agents perform real-world monitoring and dashboard workflows
Work on high-impact task design, grading, and benchmarking systems with strong practical engineering relevance
Flexible remote work with structured expectations and competitive hourly compensation
Contract Details
Independent contractor role
Fully remote with flexible scheduling
Hourly compensation of $90–$150 per hour
Expected commitment of at least 10–15 hours per week for the duration of the project
Fast turnaround is expected and responsiveness matters
Projects may be extended, shortened, or concluded early depending on project needs and performance
Weekly payments via Stripe or Wise
Work will not involve access to confidential or proprietary information from any employer, client, or institution
Please note: We are unable to support H-1B or STEM OPT candidates at this time
Start date: Immediate
About the Platform
This opportunity is available through a leading AI-driven work platform that connects domain experts with frontier AI research projects.
Experts contribute to improving advanced AI systems by providing specialised expertise across real-world workflows, structured evaluation, model training support, and domain-specific content validation.
By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy