Service
Beneficiary insight
For research and service-improvement teams theming feedback at scale, accurately enough that the research team signs off and the ethics review can audit it.
The operational job: taking free-text patient feedback, complaint records, case notes, and survey responses across multiple form structures, and theming them accurately enough that the research team can sign off on the result and the ethics review can audit it.
This is hard because feedback arrives in different shapes from different services - surveys evolve, forms get updated, free-text questions change. The thing that doesn't change is the question the research team is trying to answer: what are people actually telling us, and what are we going to do about it?
The shape Loop takes here:
- Redaction pre-processor runs before any LLM call - PII is stripped at the worker tier before the model sees it
- Theming with quoted exemplars and provenance back to source records - every theme can be opened to show the specific responses that produced it
- Inter-annotator agreement calibrated against a hand-labelled gold-standard subset - the eval surface tells you when the model is drifting away from how your team would have labelled it
- Designed so a four-person research team can handle the volume that would have taken ten people working by hand
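As a rough illustration of the first two points, here is a minimal Python sketch of redaction running before the model call, with each theme keeping a link back to its source records. It is not Loop's actual implementation: the regex patterns, the `redact` and `build_themes` functions, the `Exemplar` and `Theme` classes, and the `assign_theme` callable standing in for the LLM call are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only (assumption): a real deployment would need a much
# broader, reviewed set covering names, addresses, dates of birth and so on.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "NHS_NUMBER": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
    "UK_PHONE": re.compile(r"\b0\d{4}[ -]?\d{6}\b"),
}


def redact(text: str) -> str:
    """Replace each PII match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


@dataclass(frozen=True)
class Exemplar:
    record_id: str  # key of the source record, kept for audit lineage
    quote: str      # redacted text shown to the reviewer


@dataclass
class Theme:
    name: str
    exemplars: list[Exemplar] = field(default_factory=list)


def build_themes(records, assign_theme):
    """Group records into themes, keeping provenance for every quote.

    records: iterable of (record_id, raw_text) pairs.
    assign_theme: callable taking redacted text and returning a theme name;
    a hypothetical stand-in for the model call.
    """
    themes: dict[str, Theme] = {}
    for record_id, raw_text in records:
        clean = redact(raw_text)  # redaction happens before the model sees anything
        name = assign_theme(clean)
        themes.setdefault(name, Theme(name)).exemplars.append(
            Exemplar(record_id, clean)
        )
    return themes
```

The point of the shape is that the model-facing function only ever receives redacted text, and every quote under a theme carries the identifier of the record it came from.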
How we'd shape it
Insight deployments start with the gold-standard subset. We sit with the research team and have them label 50-100 responses by hand on the actual themes that matter for their work. That set becomes the eval surface against which everything else is measured. The discovery output is the calibration: what "good" looks like for this charity and this programme.
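One common way to score agreement between the model's labels and the hand-labelled set is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes one theme label per response and that kappa is the chosen statistic; the text above does not specify the metric, and the `cohens_kappa` function name is ours.

```python
from collections import Counter


def cohens_kappa(gold, predicted):
    """Cohen's kappa between two equal-length lists of labels.

    gold: labels assigned by the research team (the gold-standard subset).
    predicted: labels assigned by the model to the same responses.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two equal-length, non-empty label lists")

    n = len(gold)
    # Observed agreement: share of responses where both labels match.
    observed = sum(g == p for g, p in zip(gold, predicted)) / n

    # Chance agreement: for each label, the product of how often each side used it.
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    expected = sum(
        gold_counts[label] * pred_counts[label] for label in gold_counts
    ) / (n * n)

    if expected == 1.0:
        return 1.0  # both sides used one identical label throughout
    return (observed - expected) / (1 - expected)


gold = ["wait", "wait", "staff", "staff"]
model = ["wait", "staff", "staff", "staff"]
print(cohens_kappa(gold, model))  # 0.5
```

With 50-100 labelled responses the estimate is noisy, so it is better read as a trend across batches than as a single pass/fail number.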
What gets built and stays running
We'd build:
- The redaction + theming + exemplar surfacing pipeline
- A per-batch eval against the gold-standard subset, with drift detection over time
- An export to whichever shape the research team needs (Excel for the report, structured CSV for the analyst, charts for the board pack)
- Audit lineage from every theme down to the specific responses that produced it
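The per-batch eval with drift detection could be sketched along these lines: score each batch against the gold-standard subset, treat the first few batches as the baseline, and flag later batches that fall noticeably below it. The `detect_drift` function, the three-batch baseline and the 0.05 tolerance are illustrative assumptions, not the thresholds a real deployment would necessarily use.

```python
from statistics import mean


def detect_drift(batch_scores, baseline_batches=3, tolerance=0.05):
    """Return indices of batches that have drifted below the baseline.

    batch_scores: agreement score per batch, oldest first.
    baseline_batches: how many initial batches define the baseline (assumed).
    tolerance: how far below the baseline a score may fall before it is flagged.
    """
    if len(batch_scores) <= baseline_batches:
        return []  # not enough history to compare against yet
    baseline = mean(batch_scores[:baseline_batches])
    return [
        i
        for i, score in enumerate(batch_scores)
        if i >= baseline_batches and baseline - score > tolerance
    ]


scores = [0.82, 0.80, 0.81, 0.79, 0.70, 0.74]
print(detect_drift(scores))  # [4, 5]
```

A flagged batch is a prompt for the research team to look at the exemplars, since the cause may be a changed form or a new programme and not a fault in the model.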
This is the area where ongoing care matters most. Feedback shape evolves; surveys change; new programmes start. We hold the calibration alongside the team and update the gold-standard subset as the work itself shifts.
Where this stands today
In production today at Breast Cancer Now as AIDA - the dual-schema AI system supporting Service Pledge research. AIDA themes patient feedback across 20+ different NHS form structures, accurately enough that the research team signs off on the results and uses them in published reports. Loop is the generalisation of what we learnt building it.
Sound like the kind of work you'd like back?
A one-week shape-finding engagement is how we start. If you decide to go ahead, the fee for that week comes off the build.