Job Description

We need someone who can build high-quality forecasting models for UK energy balancing markets — not a generalist who's touched a bit of everything, but a specialist who genuinely understands time series, knows how to extract signal from massive feature sets, and can produce reliable probabilistic forecasts.

You'll spend significant time on tasks like: engineering features from raw market data, selecting the most predictive subset from hundreds of thousands of candidates, building gradient boosting models that output well-calibrated prediction intervals, and rigorously validating everything to avoid the subtle leakage problems that plague time series work.

You won't be responsible for deployment — we have experienced DevOps for that. But you'll need to hand off models that are well-documented, reproducible, and actually work in production. If you find satisfaction in the craft of building models that hold up under scrutiny — rather than just hitting a metric on a test set — this role is for you.

Feature Engineering and Selection

• Engineer predictive features from energy market data (prices, volumes, grid conditions, weather, calendar effects)

• Work with feature sets in the hundreds of thousands — you'll need systematic approaches, not manual inspection

• Apply and evaluate feature selection methods (mRMR, importance-based selection, recursive elimination) to build parsimonious models

• Analyse feature importance and stability across time periods and market conditions

• Understand the domain well enough to create features that reflect how the balancing market actually works

Model Development

• Build gradient boosting models (XGBoost, LightGBM, CatBoost) for multi-horizon forecasting

• Produce probabilistic forecasts — prediction intervals, quantile regression, or distribution outputs — not just point estimates

• Handle class imbalance appropriately when the problem requires classification

• Design proper time series cross-validation schemes that respect temporal ordering

• Diagnose and fix target leakage — you should be able to explain why a 'too good' result is suspicious
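As a rough illustration of what we mean by a cross-validation scheme that respects temporal ordering, here is a minimal sketch of expanding-window splits with a gap between training and test data. The function name, its parameters, and the sample sizes are hypothetical, chosen for illustration only; this is not a description of our production setup.

```python
def expanding_window_splits(n_samples, n_splits, test_size, gap=0):
    """Yield (train_idx, test_idx) pairs for time-ordered data.

    Every training index precedes every test index, and `gap` samples
    between the two are dropped so that lagged features or multi-step
    targets cannot straddle the boundary. Hypothetical helper.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested splits and gap")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(0, test_start - gap))  # expanding window
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Toy example: 20 time-ordered samples, 3 folds of 4 samples, gap of 2.
splits = list(expanding_window_splits(n_samples=20, n_splits=3, test_size=4, gap=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

A random split would mix future samples into the training set, which is exactly the lookahead problem this layout avoids.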

Validation and Testing

• Test pipeline components using synthetic/artificial data where ground truth is known

• Validate that preprocessing steps (missing value imputation, outlier handling) don't introduce leakage

• Build confidence that models will generalise, not just interpolate
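To make the preprocessing point concrete, the sketch below fits a mean imputer on the training window only and compares it with the leaky alternative of fitting on the full series. The helper names and the toy values are assumptions made up for this example.

```python
from statistics import mean


def fit_mean_imputer(values):
    """Return the mean of the observed (non-None) values."""
    return mean(v for v in values if v is not None)


def apply_imputer(values, fill):
    """Replace missing entries with a previously fitted fill value."""
    return [fill if v is None else v for v in values]


# Toy series with a level shift after the train/test boundary.
series = [10.0, None, 14.0, 12.0, None, 100.0, None, 110.0]
split = 5
train, test = series[:split], series[split:]

# Leakage-safe: the fill value is learned from the training window only.
safe_fill = fit_mean_imputer(train)
train_imputed = apply_imputer(train, safe_fill)
test_imputed = apply_imputer(test, safe_fill)

# Leaky: the fill value has seen the test period, so training rows
# carry information about the future level shift.
leaky_fill = fit_mean_imputer(series)

print(f"safe fill: {safe_fill}, leaky fill: {leaky_fill}")
```

The same pattern applies to outlier clipping thresholds or scaling statistics: anything estimated from data should be estimated inside each training fold.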

Experiment Tracking and Reproducibility

• Track experiments systematically (MLflow or similar)

• Maintain reproducible training pipelines with proper configuration management

• Document model decisions, hyperparameter choices, and validation results clearly

Domain Understanding

• Invest time learning UK energy balancing markets — BM units, settlement periods, system prices, imbalance dynamics

• Translate domain knowledge into model improvements (better features, appropriate loss functions, sensible constraints)

• Collaborate with colleagues who understand the data infrastructure and market context

Requirements

Must Have

• Deep time series experience — you understand why random CV splits fail for forecasting, how to handle multiple horizons, and the pitfalls of lookahead bias

• Strong feature engineering and selection skills — you've worked with high-dimensional feature sets and know multiple approaches to reduce them systematically

• Gradient boosting expertise — XGBoost, LightGBM, or CatBoost are your core tools; you understand their hyperparameters and when each matters

• Probabilistic forecasting ability — you can produce calibrated prediction intervals or quantile forecasts, not just point predictions

• Rigorous validation mindset — you're paranoid about leakage, you test your assumptions, and you don't trust results that seem too good

• Python fluency — clean, testable code; comfortable with pandas/Polars, scikit-learn, and the GBM libraries

• SQL competence — you can pull and reshape data from PostgreSQL without friction

• Clear communication — you document your work and can explain model behaviour to non-ML colleagues

Nice to Have

• Experience with MLflow, Hydra, Metaflow, or similar tooling for experiment tracking and pipeline management

• Polars experience (we're migrating some workloads from pandas)

• Background in energy, utilities, trading, or other domains with similar forecasting challenges

• Familiarity with UK energy markets, Elexon data, or grid balancing

• Experience with conformal prediction or other modern uncertainty quantification methods

Highly Desirable — Agentic AI Coding Experience

We value candidates who can build software using agentic AI coding systems. This is fundamentally different from using code completion tools or chat-based assistants.

What we're NOT looking for:

- GitHub Copilot (code completion/autocomplete)

- ChatGPT or similar chat interfaces for generating isolated code snippets

- Any tool that only provides single-turn question/answer interactions

What we ARE looking for: Hands-on experience with agentic coding systems such as Claude Code, Codex (OpenAI's agentic coding tool), OpenCode, or Cursor.

Ideal candidates will demonstrate:

- Breadth of experience — proficiency with at least 2 agentic systems (experience with only one is insufficient)

- End-to-end development — ability to design and build software from the ground up using these tools, not just generating isolated snippets

- Multi-agent orchestration — demonstrated experience orchestrating multiple agents using skills, tools, and agent coordination, not just one-shot problem solving

- Deep system knowledge — familiarity with hooks, permission systems, MCP (Model Context Protocol) servers, custom skills and tool definitions, and context management

Benefits

Plenty of opportunities for learning and professional growth
B2B contract with paid vacation
