Researching AI products when every user experience is unique
Developing hybrid research methodologies that validate genuine utility over novelty when traditional UX methods fail with probabilistic AI systems.

When we began developing Goose, an AI assistant for heritage professionals, we encountered a fundamental problem that traditional user research methods couldn't address: how do you validate an AI product when every user interaction is probabilistic and unique?
Unlike conventional software, where you can observe users completing predefined tasks - clicking through checkout flows, navigating menus, or finding specific information - AI products can generate a different response for every user, even when asked identical questions. Traditional research methods and metrics, such as tree testing or task completion rates, lose much of their meaning when the interface itself is conversational and the outputs are dynamically generated.
This presented a critical challenge for responsible AI development. How could we distinguish between users being impressed by AI capabilities and users actually finding genuine utility? How could we validate that our heritage-focused AI was solving real professional problems rather than simply providing entertaining conversations?
When standard research methods fall short
Early user feedback was enthusiastic: "Goose feels like I've asked a person and someone who’s very knowledgeable has come back to me," said one tester. But enthusiasm alone wasn't sufficient validation. We needed to understand whether users were experiencing genuine value or simply the novelty effect common with AI interactions.
Traditional metrics offered limited insight. User satisfaction scores couldn't tell us whether the AI's responses were actually helpful for heritage marketing challenges. Task completion rates were irrelevant when each conversation followed unpredictable paths based on the AI's probabilistic responses. Standard UX observation methods couldn't capture whether the AI was providing appropriate domain expertise or generic advice.
The research challenge was compounded by the subjective nature of AI utility. What constitutes a "good" AI response varies dramatically based on user expertise, context, and specific needs. A response that delights one user might frustrate another, even when both are working on similar heritage marketing challenges.
Hybrid research methodology for AI validation
We developed a dual-track research approach that combined qualitative user interviews with detailed behavioural analytics. This allowed us to match user perceptions against actual usage patterns and validate both satisfaction and genuine utility.
The qualitative research focused on understanding user experiences through in-depth conversations about their interactions with Goose. We explored not just what users thought about the responses they received, but how they integrated those responses into their actual work. Did the AI's suggestions influence real marketing decisions? Were users returning to continue projects, or were conversations one-off experiments?
Simultaneously, we tracked detailed usage analytics: message lengths, conversation topics, return usage patterns, and feature adoption. This data revealed genuine engagement patterns that users themselves might not recognise or report accurately. We could see that 45% of users were highly engaged, with conversations averaging 11 messages per project, and that users working with Goose's "thinking partners" feature were having significantly longer, more detailed interactions.
Validating genuine utility over novelty
The hybrid approach revealed crucial insights that neither method could capture alone. Users praised Goose's domain expertise - "I think that is a huge point around Goose - it is built specifically for the heritage sector" - and this was validated by our analytics showing that 22% of conversations focused on strategy, 20% on heritage-specific topics, and 16% on marketing challenges.
More importantly, we could validate that users were applying AI insights to real work scenarios. One user reported that Goose's fundraising advice gave her "a couple of ideas that aren't in the first draft" of an actual fundraising plan. Another described using it for "real-world tasks" like visitor engagement strategies and event planning.
The research revealed that thinking partners - AI personas representing different professional perspectives - drove significantly deeper engagement. Projects using thinking partners averaged 20 messages compared to 7 messages for standard conversations, suggesting users found genuine collaborative value rather than just novelty.
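A comparison like this can be sketched in a few lines. The example below is hypothetical: the function name, the input shape, and the choice to report medians alongside means are assumptions for illustration, not a description of our actual tooling.

```python
from statistics import mean, median


def compare_thinking_partner_usage(projects):
    """Compare conversation length with and without thinking partners.

    projects is an iterable of (message_count, used_thinking_partners)
    pairs; this shape is an assumption made for the sketch.
    """
    with_partners = [n for n, used in projects if used]
    without_partners = [n for n, used in projects if not used]
    if not with_partners or not without_partners:
        raise ValueError("both groups need at least one project")

    return {
        "mean_with": mean(with_partners),
        "mean_without": mean(without_partners),
        "median_with": median(with_partners),
        "median_without": median(without_partners),
        "mean_ratio": mean(with_partners) / mean(without_partners),
    }
```

Reporting the median next to the mean guards against a handful of very long conversations inflating the average. A gap between the groups also does not show causation on its own, because users who choose thinking partners may already be the more engaged ones. That is one reason to read the numbers alongside the interview evidence.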
Research methodology that matches values
This research approach reflected our commitment to responsible AI development for mission-driven organisations. Rather than optimising for impressive demos or user amazement, we prioritised validating genuine professional utility. We measured conversation quality, topic relevance, and sustained engagement rather than vanity metrics like user excitement or initial adoption.
The methodology also revealed areas where the AI needed improvement. Users noted that thinking partner responses sometimes lacked sufficient differentiation, and the trusted sources feature saw limited engagement - insights that wouldn't have emerged from satisfaction surveys alone.
By combining user perception data with behavioural evidence, we could confidently validate that Goose was serving heritage professionals' actual needs rather than simply showcasing technological capability. This research foundation enabled responsible scaling, ensuring that subsequent development focused on genuine utility rather than impressive but superficial features.
The approach demonstrated that researching AI products for mission-driven contexts requires methodologies that prioritise impact over impression - measuring whether the technology genuinely serves organisational values and professional needs, not just whether users find it remarkable.