2025-09-20 - AI Evaluation & Benchmarking TechSprint

Turn yesterday’s ideas into working code for the Evaluation and Benchmarking Suite community.

Why a TechSprint?

The scoping workshop (Day 1) will finish with a Must‑Have feature backlog and roadmap for the Evaluation & Benchmarking Suite. Day 2 is where we build. In just eight focused hours we aim to:

Ship the first proof‑of‑concept for the “add‑your‑own‑benchmark data” feature.
Prototype integrations with leading open‑source eval tools such as DeepEval, Ragas, TruLens, and promptfoo.
Automate FINOS‑style governance checks (licensing, dataset metadata, risk tags) on every pull request.

By the closing demo we expect multiple merged PRs and a clear follow‑on workstream for the wider community.

What stack will we provide?

Who should attend?

Backend & data engineers keen to work with Hugging Face, FastAPI, and GitHub Actions
Data and ML scientists who maintain or create finance‑specific tasks and datasets
Model‑risk, compliance & DevSecOps specialists translating EU AI Act controls into code
UI/UX designers & documentation writers to polish the demo deliverables

What you’ll leave with

Publicly visible contributions merged (or ready to merge) into FINOS‑Labs repos.
Reusable tooling for your own AI governance pipelines.
Network & bragging rights – best demo wins sponsor swag and a FINOS blog feature.

Logistics

Date: 20 Sept 2025 – the day after the Scoping Workshop
Time: 09:00 – 18:00 BST
Format: Hybrid – in person at Red Hat, Peninsular House, 30-36 Monument Street, 4th floor, London EC3R 8NB, United Kingdom, and virtual
Cost: Free – registration required (capacity limited to ensure mentor access).

Supported by

FINOS Events

AI EVALUATION & BENCHMARKING SUITE - TECHSPRINT
