Human assessments of document similarity
Author(s) - Westerman, S.J.; Cribbin, T.; Collins, J.
Publication year - 2010
Publication title - Journal of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1532-2890
pISSN - 1532-2882
DOI - 10.1002/asi.21361
Subject(s) - similarity (geometry), reliability (semiconductor), computer science, n-gram, string (physics), gram, correlation, information retrieval, data mining, statistics, artificial intelligence, mathematics, image (mathematics), biology, physics, power (physics), genetics, geometry, quantum mechanics, bacteria, language model, mathematical physics
Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n‐gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n‐gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N‐gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n‐gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.
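The abstract describes correlating n-gram automatic text analysis (ATA) with human similarity ratings. As a hedged illustration only (the paper's exact ATA procedure and n-gram settings are not given here), a common approach is to build overlapping character n-gram count profiles for each document and compare them with cosine similarity; the function names and the n=3 default below are assumptions for this sketch, not the authors' method.

```python
# Sketch: character n-gram profiles compared with cosine similarity,
# one common way to score document similarity via n-gram analysis.
# Not the authors' exact procedure; names and n=3 are illustrative.
from collections import Counter
from math import sqrt

def ngram_profile(text, n=3):
    """Count overlapping character n-grams of length n."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(p, q):
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(p[g] * q[g] for g in set(p) & set(q))
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

a = ngram_profile("document similarity assessment")
b = ngram_profile("assessing the similarity of documents")
score = cosine_similarity(a, b)  # a value in [0, 1]; higher = more similar
```

Varying `n` here mirrors the paper's finding that n-gram length matters: shorter n-grams match more loosely, longer ones demand closer surface overlap, and the best choice can depend on whether the text is technical or nontechnical.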