2025-09-19 - AI Evaluation & Benchmarking Workshop

Written by FINOS Team | Aug 8, 2025 2:46:39 PM

Help set the industry standard for trustworthy, production‑ready GenAI in financial services at this members‑only workshop.

Why this workshop, why now?

Modern LLM‑powered solutions can’t move from proof‑of‑concept to production unless they are measured in ways that build trust and confidence. FINOS has already bootstrapped an Open Financial LLM Leaderboard that assesses models on 35 datasets across 23 finance‑specific tasks, but contributors now need a clear, community‑owned process to expand that work into a full Evaluation & Benchmarking Suite.

This scoping workshop is the official kick‑off for that effort. By the end of a highly interactive half day, you will have:

A consensus‑driven Project Scope Statement that defines what the suite is and is not.
A negotiated “Must‑Have” feature backlog and a theme‑based public roadmap to Q4 2025 and beyond.
A clear path for contributing new datasets, tasks and risk metrics under FINOS governance.

Who should attend?

Financial‑services technologists building or evaluating GenAI solutions
Model‑risk, compliance & audit professionals aligning with the EU AI Act and other regulations
Data‑science researchers & academics focusing on domain‑specific benchmarks
Open‑source contributors & vendors offering evaluation tooling or datasets

Key take‑aways

Practical artefacts ready to merge into the FINOS repo: README, CONTRIBUTING, GitHub issues.
Peer network across banks, vendors, academia and regulators shaping open standards for GenAI assurance.
Early influence on the next features of the Open Financial LLM Leaderboard and future evaluation tooling.

Logistics

Date: 19 Sept 2025
Time: 13:00 – 17:00 BST
Location: London (venue TBA)
Cost: Free – registration required. FINOS Members Only.
