Alex Staff Agency

Senior Data Specialist

Job Description

We need someone who understands data deeply and uses Python to wrangle it — not a platform engineer, not a pure pipeline builder, but a data specialist who's comfortable with research, investigation, and the unglamorous work of making messy energy market data actually usable.

You'll spend significant time on tasks like: mapping BM units to power plants and fuel types, reconciling legacy data formats with current ones, ensuring consistency between different Elexon message types, and cleaning time-series data (outliers, gaps, overlaps). Some of this requires genuine investigation — cross-referencing sources, making judgment calls, documenting edge cases. There's no API that solves these problems for you.

Python is your primary tool (pandas, NumPy, the standard library) for minimising manual effort, but you should accept that some detective work is unavoidable. If you find satisfaction in truly understanding a dataset's structure and quirks — rather than just piping data through and hoping for the best — this role is for you.

Data Mapping and Research

• Map BM units from Elexon to their corresponding power plants, substations, and fuel types — combining API data, public registers, and manual research (see the sketch after this list)

• Map substations to ETYS zones and grid supply points

• Build and maintain reference/master datasets that link identifiers across disparate sources (Elexon, National Grid ESO, TEC register, etc.)

• Document mappings, assumptions, and known limitations clearly for downstream users
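
To give a concrete flavour of the mapping work, here is a minimal pandas sketch. Every name in it (the CSV files, the columns, the worklist) is hypothetical and purely illustrative; it is not a description of our actual pipeline.

```python
import pandas as pd

# Hypothetical inputs: an Elexon BM unit listing and a manually researched
# plant register. File and column names are illustrative only.
bm_units = pd.read_csv("bm_units.csv")              # bmu_id, lead_party, ...
plant_register = pd.read_csv("plant_register.csv")  # bmu_id, plant_name, fuel_type

# Left join so unmapped BM units survive as NaN rows instead of vanishing.
mapping = bm_units.merge(plant_register, on="bmu_id", how="left")

# Anything still unmapped becomes an explicit worklist for manual research.
unmapped = mapping[mapping["plant_name"].isna()]
mapping.to_csv("bmu_master_mapping.csv", index=False)
unmapped[["bmu_id"]].to_csv("bmu_unmapped_worklist.csv", index=False)
print(f"{len(unmapped)} BM units still need manual mapping")
```

The posture matters more than the code: joins do the bulk of the work, and whatever they cannot resolve is surfaced for investigation rather than papered over.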

Data Reconciliation and Consistency

• Reconcile legacy data formats with current formats (e.g., historical operational data stored in different schemas or granularities), as sketched after this list

• Ensure consistency between different Elexon message types — understand the market data structure well enough to know why BOALF, BOD, and DISBSAD might not perfectly align and how to handle it

• Investigate discrepancies between data sources and determine authoritative values
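
As an illustration only, a minimal sketch of source reconciliation, assuming hypothetical file names, join keys, and a made-up volume tolerance:

```python
import pandas as pd

# Hypothetical example: the same settlement data from a legacy export and a
# current feed, both with a numeric "volume" column. Names are illustrative.
legacy = pd.read_csv("legacy_settlement.csv", parse_dates=["settlement_date"])
current = pd.read_csv("current_settlement.csv", parse_dates=["settlement_date"])

keys = ["settlement_date", "settlement_period", "bmu_id"]
merged = legacy.merge(current, on=keys, how="outer",
                      suffixes=("_legacy", "_current"), indicator=True)

# Rows present in only one source are structural discrepancies to investigate.
one_sided = merged[merged["_merge"] != "both"]

# Where both sources have a value, flag material disagreements for review
# rather than silently preferring one side. The tolerance is a judgment call.
both = merged[merged["_merge"] == "both"]
mismatched = both[(both["volume_legacy"] - both["volume_current"]).abs() > 1e-6]

print(f"{len(one_sided)} one-sided rows, {len(mismatched)} value mismatches")
```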

Data Cleaning and Quality

• Clean time-series data: detect outliers (price spikes, meter errors), fill gaps appropriately, resolve overlapping or duplicate timestamps (see the sketch after this list)

• Develop reusable Python-based cleaning routines that can be applied across datasets

• Understand why data quality issues occur (settlement reruns, late submissions, format changes) rather than just patching them
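
A minimal sketch of the cleaning posture we mean, with an illustrative outlier threshold and gap-fill limit (the real judgment calls depend on the dataset):

```python
import pandas as pd

# Hypothetical half-hourly price series indexed by timestamp. The outlier
# threshold and gap-fill limit below are illustrative judgment calls.
df = pd.read_csv("prices.csv", parse_dates=["ts"]).set_index("ts").sort_index()

# Resolve duplicate timestamps: keep the last submission (e.g. a restatement).
df = df[~df.index.duplicated(keep="last")]

# Flag outliers against a rolling median instead of deleting them blindly.
rolling_median = df["price"].rolling("1D").median()
df["is_outlier"] = (df["price"] - rolling_median).abs() > 500  # illustrative £/MWh

# Reindex to a regular half-hourly grid so gaps become explicit NaNs, then
# fill at most four consecutive missing periods; longer gaps stay NaN for
# manual investigation.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="30min")
df = df.reindex(full_index)
df["price"] = df["price"].interpolate(limit=4)
```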

Pipeline Development (Supporting the Above)

• Write and maintain Python data grabbers for energy market APIs (see the sketch after this list)

• Build dbt models to transform raw data into clean, analysis-ready datasets

• Orchestrate workflows via GitHub Actions

• Design PostgreSQL schemas that reflect your understanding of the domain
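
For flavour, a skeleton of the kind of grabber meant. The URL, parameters, and response shape are placeholders, not a real Elexon endpoint; the structure (sessions, timeouts, loud failures) is the point:

```python
import requests

# Skeleton of a polite API grabber. Everything specific here is a placeholder.
BASE_URL = "https://api.example.com/v1/market-data"  # placeholder URL

def fetch_window(session: requests.Session, start: str, end: str) -> list[dict]:
    """Fetch one date window; fail loudly rather than return partial data."""
    resp = session.get(BASE_URL, params={"from": start, "to": end}, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]  # assumed response envelope, illustrative only

if __name__ == "__main__":
    with requests.Session() as session:
        rows = fetch_window(session, "2024-01-01", "2024-01-02")
        print(f"fetched {len(rows)} rows")
```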

Requirements

Must Have

• Strong Python skills for data work — you're fluent with pandas, comfortable writing clean, testable code, and can build reusable data processing logic. This is not an Excel role.

• Solid SQL skills — complex queries, window functions, and CTEs in PostgreSQL (see the sketch after this list)

• Experience with messy, real-world data — you've done reconciliation, cleaning, or mapping work before and understand it's not always automatable

• Methodical and detail-oriented — you notice inconsistencies and want to understand root causes

• Good documentation habits — you know that undocumented mappings and assumptions are technical debt

• Self-directed — you can own ambiguous problems, do your own research, and communicate findings clearly
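
To calibrate "solid SQL", a sketch of the level meant: a CTE feeding a window function in PostgreSQL. The table, columns, and connection string are hypothetical.

```python
import psycopg2

# Hypothetical table: settlement_prices(settlement_date, settlement_period, price).
QUERY = """
WITH ranked AS (
    SELECT
        settlement_date,
        settlement_period,
        price,
        LAG(price) OVER (
            ORDER BY settlement_date, settlement_period
        ) AS prev_price
    FROM settlement_prices
)
SELECT settlement_date, settlement_period, price, price - prev_price AS delta
FROM ranked
WHERE prev_price IS NOT NULL
ORDER BY ABS(price - prev_price) DESC  -- largest period-on-period jumps first
LIMIT 20;
"""

with psycopg2.connect("dbname=energy") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchall():
            print(row)
```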

Nice to Have

• Experience with energy, utilities, or market data (any geography)

• Familiarity with UK energy markets, Elexon data, or grid operations

• dbt experience for transformation pipelines

• Exposure to time-series data challenges (irregular timestamps, gaps, restatements)

Highly Desirable — Agentic AI Coding Experience

We value candidates who can build software using agentic AI coding systems. This is fundamentally different from using code completion tools or chat-based assistants.

What we're NOT looking for:

- GitHub Copilot (code completion/autocomplete)

- ChatGPT or similar chat interfaces for generating isolated code snippets

- Any tool that only provides single-turn question/answer interactions

What we ARE looking for: Hands-on experience with agentic coding systems such as Claude Code, Codex (OpenAI's agentic coding tool), Open Code, or Cursor.

Ideal candidates will demonstrate:

- Breadth of experience — proficiency with at least 2 agentic systems (experience with only one is insufficient)

- End-to-end development — ability to design and build software from the ground up using these tools, not just generating isolated snippets

- Multi-agent orchestration — demonstrated experience orchestrating multiple agents using skills, tools, and agent coordination, not just one-shot problem solving

- Deep system knowledge — familiarity with hooks, permission systems, MCP (Model Context Protocol) servers, custom skills and tool definitions, and context management

Not What We're Looking For

• Platform/infrastructure engineers who prefer to stay above the data layer

• People who expect clean, well-documented data as input

• Those uncomfortable with research, ambiguity, or "manual" investigation work

Practicalities

• Remote-first with async collaboration (Slack, GitHub, documented decisions)

• Core overlap with UK business hours expected (at least 4 hours daily)

• Competitive compensation based on location and experience

Benefits

• Plenty of opportunities for learning and professional growth

• B2B contract with paid vacation