Gold Standard Evaluation

Run the pipeline across the gold standard cases, compare its output talking points against the ground-truth (GT) talking points, and review the LLM-assisted match judgments.
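As a rough illustration of the judgment step, here is a minimal sketch. It assumes each case is a list of (section, GT point) pairs plus a list of output talking points, and it produces one record per GT point with the same fields as the judgment view (GT point, section, status, comment). `Judgment`, `evaluate_case`, and `naive_judge` are hypothetical names. `naive_judge` swaps the LLM for crude word overlap only so that the example runs. It is not how the tool decides matches.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Judgment:
    gt_point: str  # ground-truth talking point
    section: str   # section the GT point belongs to
    status: str    # "matched", "partial", or "missing"
    comment: str   # reviewer-facing explanation of the judgment


# A judge maps (GT point, output points) -> (status, comment).
Judge = Callable[[str, List[str]], Tuple[str, str]]


def naive_judge(gt_point: str, outputs: List[str]) -> Tuple[str, str]:
    """HYPOTHETICAL stand-in for the LLM judge, using word overlap only."""
    gt_norm = gt_point.strip().lower()
    gt_words = set(gt_norm.split())
    # An exact (case-insensitive) match counts as "matched".
    for out in outputs:
        if out.strip().lower() == gt_norm:
            return "matched", f"exact match: {out!r}"
    # Any shared word counts as "partial" (crude; no stop-word handling).
    for out in outputs:
        shared = gt_words & set(out.lower().split())
        if shared:
            return "partial", f"shares {sorted(shared)} with {out!r}"
    return "missing", "no output point overlaps"


def evaluate_case(gt_points: List[Tuple[str, str]], outputs: List[str],
                  judge: Judge) -> List[Judgment]:
    """Judge every (section, GT point) pair against the case's output points."""
    return [Judgment(text, section, *judge(text, outputs))
            for section, text in gt_points]


gt = [
    ("Risks", "Rates may rise"),
    ("Risks", "Inflation stays high"),
    ("Outlook", "Growth slows"),
]
outputs = ["rates may rise", "Inflation persists"]
for j in evaluate_case(gt, outputs, naive_judge):
    print(j.section, j.status, j.comment, sep=" | ")
```

Keeping the judge as a separate callable means a version that calls an LLM could replace `naive_judge` without changing the loop.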


GT Talking Points

Output Talking Points

Judgment View
GT Point | Section | Status | Comments

Overall Run Feedback

Case Scores

Overall Score
Matched
Partial
Missing

Overall % by Status
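The page does not say how the overall score is derived from the matched, partial, and missing counts. One plausible scheme, shown here only as an assumption, gives full credit for a match, half credit for a partial match, and no credit for a missing point. It then reports each status as a percentage of the case's GT points. `WEIGHTS` and `case_scores` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# ASSUMPTION: the dashboard does not state its weighting; here a partial
# match earns half credit and a missing point earns none.
WEIGHTS: Dict[str, float] = {"matched": 1.0, "partial": 0.5, "missing": 0.0}


def case_scores(statuses: List[str]) -> dict:
    """Score one case from the statuses of its GT points."""
    counts = Counter(statuses)
    total = len(statuses)
    # Guard against a case with no GT points.
    score = sum(WEIGHTS[s] for s in statuses) / total if total else 0.0
    pct = {s: (100.0 * counts[s] / total if total else 0.0) for s in WEIGHTS}
    return {
        "score": score,
        "counts": {s: counts[s] for s in WEIGHTS},
        "pct_by_status": pct,
    }


result = case_scores(["matched", "matched", "partial", "missing"])
print(result["score"])          # 0.625
print(result["pct_by_status"])  # {'matched': 50.0, 'partial': 25.0, 'missing': 25.0}
```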

Section Scores

Point Distribution (Selected Case)

Section Breakdown Table

Section | Score | Matched | Partial | Missing
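One way to read the section breakdown is as the per-case calculation grouped by section. This sketch assumes judgments arrive as (section, status) pairs and uses a hypothetical half-credit weighting for partial matches. `section_breakdown` is an illustrative name. Each returned tuple follows the column order Section, Score, Matched, Partial, Missing.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: hypothetical weighting (partial = half credit).
WEIGHTS: Dict[str, float] = {"matched": 1.0, "partial": 0.5, "missing": 0.0}


def section_breakdown(
    rows: Iterable[Tuple[str, str]],
) -> List[Tuple[str, float, int, int, int]]:
    """Build (Section, Score, Matched, Partial, Missing) rows from (section, status) pairs."""
    by_section: Dict[str, List[str]] = defaultdict(list)
    for section, status in rows:
        by_section[section].append(status)
    table = []
    for section in sorted(by_section):  # stable, alphabetical row order
        statuses = by_section[section]
        counts = Counter(statuses)
        score = sum(WEIGHTS[s] for s in statuses) / len(statuses)
        table.append((section, score, counts["matched"],
                      counts["partial"], counts["missing"]))
    return table


rows = [("Outlook", "matched"), ("Outlook", "missing"),
        ("Risks", "partial"), ("Risks", "matched")]
for section, score, matched, partial, missing in section_breakdown(rows):
    print(f"{section} | {score:.2f} | {matched} | {partial} | {missing}")
```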

Quality Metrics

    Aggregate Scores (All Cases)

    Overall Score
    Matched
    Partial
    Missing

    Overall % by Status

    Point Distribution (All Cases)

    Macro Averages
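Aggregate scores and macro averages can differ when cases have different numbers of GT points. This sketch assumes "aggregate" pools every GT point across all cases, while "macro" averages the per-case scores with equal weight per case. The page states neither definition. The function names and the half-credit weighting are hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: hypothetical half-credit weighting for partial matches.
WEIGHTS: Dict[str, float] = {"matched": 1.0, "partial": 0.5, "missing": 0.0}


def pooled_score(cases: List[List[str]]) -> float:
    """Aggregate (micro) score: every GT point across all cases weighs the same."""
    points = [s for case in cases for s in case]
    return sum(WEIGHTS[s] for s in points) / len(points) if points else 0.0


def macro_score(cases: List[List[str]]) -> float:
    """Macro average: score each case first, then give every case equal weight."""
    per_case = [sum(WEIGHTS[s] for s in case) / len(case) for case in cases if case]
    return sum(per_case) / len(per_case) if per_case else 0.0


cases = [
    ["matched", "matched", "matched", "missing"],  # case A: 4 GT points
    ["missing"],                                    # case B: 1 GT point
]
print(pooled_score(cases))  # 0.6
print(macro_score(cases))   # 0.375
```

Here the single-point case B pulls the macro average well below the pooled score. It counts as much as case A despite having a quarter of the points.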
