Metrics for query exact match, token overlap, answer-set quality, BLEU/ROUGE, CodeBLEU, and more. Execution backends for local RDF (RDFLib) and remote SPARQL endpoints. Pluggable LLM-based judging via ...
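To show what the first three metric families typically compute, here is a minimal, standard-library-only sketch. It is not this project's API: the function names (`exact_match`, `token_f1`, `answer_set_prf`), the whitespace-collapsing normalization, the regex tokenizer, and the representation of answers as sets of result-row tuples are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical tokenizer: IRIs in <...> stay whole; SPARQL punctuation becomes separate tokens.
_TOKEN_RE = re.compile(r"<[^>\s]*>|[^\s{}().;,]+|[{}().;,]")


def normalize_query(query: str) -> str:
    """Assumed normalization: collapse all runs of whitespace to single spaces."""
    return " ".join(query.split())


def exact_match(pred: str, gold: str) -> bool:
    """Query exact match after whitespace normalization."""
    return normalize_query(pred) == normalize_query(gold)


def tokenize(query: str) -> list[str]:
    return _TOKEN_RE.findall(query)


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 using multiset (bag-of-tokens) intersection."""
    p, g = Counter(tokenize(pred)), Counter(tokenize(gold))
    overlap = sum((p & g).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)


def answer_set_prf(pred_rows, gold_rows) -> tuple[float, float, float]:
    """Precision/recall/F1 over result rows treated as sets (order and duplicates ignored)."""
    p, g = set(pred_rows), set(gold_rows)
    if not p and not g:
        return 1.0, 1.0, 1.0  # both empty: treated here as a perfect match
    tp = len(p & g)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = "SELECT ?x WHERE { ?x a :C }"
    print(exact_match("SELECT  ?x\nWHERE { ?x a :C }", gold))
    print(token_f1("SELECT ?y WHERE { ?y a :C }", gold))
    print(answer_set_prf([("a",), ("b",)], [("b",), ("c",)]))
```

Renaming a variable (`?x` to `?y`) breaks exact match but still leaves most tokens shared, which is why token overlap is usually reported alongside it; answer-set metrics instead compare what the queries return once executed, so two syntactically different but equivalent queries can still score perfectly.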