Open Access
Adapting Standard Retrieval Benchmarks to Evaluate Generated Answers
Author(s): Negar Arabzadeh, Amin Bigdeli, Charles L. A. Clarke
Publication year: 2024
Large language models can now directly generate answers to many factual questions without referencing external sources. Unfortunately, relatively little attention has been paid to methods for evaluating the quality and correctness of these answers, for comparing the performance of one model to another, or for comparing one prompt to another. In addition, the quality of generated answers is rarely directly compared to the quality of retrieved answers. As models evolve and prompts are modified, we have no systematic way to measure improvements without resorting to expensive human judgments. To address this problem, we adapt standard retrieval benchmarks to evaluate answers generated by large language models. Inspired by the BERTScore metric for summarization, we explore two approaches. In the first, we base our evaluation on the benchmark relevance judgments, running experiments on how information retrieval relevance judgments can be used as an anchor for evaluating generated answers. In the second, we compare generated answers to the top results retrieved by a diverse set of retrieval models, ranging from traditional approaches to advanced methods, allowing us to measure improvements without human judgments. In both cases, we measure the similarity between an embedded representation of the generated answer and an embedded representation of a known, or assumed, relevant passage from the retrieval benchmark.
Language(s): English
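The core evaluation step described in the abstract, comparing an embedded representation of a generated answer with an embedded representation of a relevant passage, can be illustrated with a minimal sketch. The encoder, the cosine-similarity scoring, and the example texts below are assumptions for illustration, not the authors' exact models or benchmark data.

```python
# Minimal sketch: score a generated answer against a passage judged (or assumed)
# relevant in a retrieval benchmark, using cosine similarity of their embeddings.
# The encoder choice and example texts are illustrative assumptions only.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder could be substituted

def answer_similarity(generated_answer: str, relevant_passage: str) -> float:
    """Cosine similarity between the answer embedding and the passage embedding."""
    emb = model.encode([generated_answer, relevant_passage], normalize_embeddings=True)
    return float(np.dot(emb[0], emb[1]))

# Example: compare an LLM-generated answer to a benchmark-relevant passage.
score = answer_similarity(
    "The capital of Australia is Canberra.",
    "Canberra, founded in 1913, is the capital city of Australia.",
)
print(f"similarity: {score:.3f}")
```

Averaging such scores over a benchmark's queries would give a single number for comparing models or prompts without new human judgments, in the spirit of the two approaches described above.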
