How we calculate accuracy
Overview
We calculate accuracy using a long-running background task. At the heart of this is a prompt chain where we have the LLM classify each document as highly relevant / somewhat relevant / not relevant to the query, then derive scores from that assessment.
Somewhat relevant sources are counted as misses, as this category captures the kind of tangential information that would lead the LLM to hallucinate or not directly answer the user.
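To make that scoring rule concrete, here is a minimal sketch of how the three classification labels could map to hits and misses. The label strings and function name are illustrative assumptions, not Pongo's actual schema:

```python
# Hypothetical relevance labels produced by the classification prompt chain.
HIGHLY_RELEVANT = "highly_relevant"
SOMEWHAT_RELEVANT = "somewhat_relevant"
NOT_RELEVANT = "not_relevant"


def is_hit(label: str) -> bool:
    """Only highly relevant documents count as hits; somewhat relevant
    and not relevant documents are both treated as misses."""
    return label == HIGHLY_RELEVANT
```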
Search relevance
Pongo calculates accuracy on your top 5 search results using Mean Reciprocal Rank (MRR).
This metric takes into account the rank of the first relevant result for each query: a relevant result at rank 1 is a perfect score of 1, and a query with no relevant results in the top 5 scores 0.
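As a worked illustration, a minimal MRR computation over per-query relevance flags could look like the sketch below. The function names and input shape are assumptions for the example, not Pongo's API:

```python
from typing import List


def reciprocal_rank(relevance: List[bool], k: int = 5) -> float:
    """Return 1 / rank of the first relevant result in the top-k list,
    or 0.0 if no relevant result appears."""
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[List[bool]], k: int = 5) -> float:
    """Average the reciprocal rank across all queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q, k) for q in queries) / len(queries)


# Example: first query hits at rank 1 (1.0), second at rank 3 (1/3),
# third has no relevant result in the top 5 (0.0).
print(mean_reciprocal_rank([
    [True, False, False, True, False],
    [False, False, True, False, False],
    [False, False, False, False, False],
]))  # -> (1.0 + 0.333... + 0.0) / 3 ≈ 0.444
```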
This graphic from Evidently AI explains it best:
“Query with missing context”
When we say a query is missing context, we mean that our analysis pipeline found no search results containing information directly relevant to it.
It could have somewhat relevant information, which may lead to the user getting a hallucinated answer, or no relevant information, which would lead to an “I don’t know” from the LLM.
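As a rough illustration of that definition, the check could be derived from the per-document labels above: a query is flagged as missing context when none of its retrieved documents were classified as highly relevant. The label string below is the same illustrative assumption used earlier, not Pongo's actual schema:

```python
from typing import List


def query_is_missing_context(labels: List[str]) -> bool:
    """Flag a query as missing context when no retrieved document was
    classified as highly relevant (hypothetical label strings)."""
    return not any(label == "highly_relevant" for label in labels)
```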