Spam filter evaluation with imprecise ground truth

Gordon V. Cormack; Aleksander Kołcz

Open Access

Author(s) - Gordon V. Cormack, Aleksander Kołcz

Publication year - 2009

Publication title - CiteSeerX (The Pennsylvania State University)

Language(s) - English

Resource type - Conference proceedings

DOI - 10.1145/1571941.1572045

Subject(s) - computer science, filter (signal processing), ground truth, word error rate, artificial intelligence, measure (data warehouse), data mining, computer vision

When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifiers in similar applications. But labels acquired from user feedback or third-party adjudication exhibit higher error rates than the best filters -- even filters trained using the same source of labels. It is appropriate to use naturally occurring labels -- including errors -- as training data in evaluating spam filters. Erroneous labels are problematic, however, when used as ground truth to measure filter effectiveness. Any measurement of the filter's error rate will be augmented and perhaps masked by the label error rate. Using two natural sources of labels, we demonstrate automatic and semi-automatic methods that reduce the influence of labeling errors on evaluation, yielding substantially more precise measurements of true filter error rates.
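
The claim that label errors augment and can mask the measured filter error can be illustrated with a small sketch. It is not taken from the paper: the function names, the error rates (0.5% for the filter, 3% for the labels), and the assumption that filter and label errors occur independently are all hypothetical, chosen only to show the arithmetic.

```python
import random


def apparent_error_rate(filter_error, label_error):
    """Rate at which a filter disagrees with noisy labels.

    Assumes binary classes and that filter errors and label errors
    are independent (an illustrative assumption, not a claim from the paper).
    A disagreement occurs when exactly one of the two is wrong.
    """
    return filter_error * (1 - label_error) + (1 - filter_error) * label_error


def simulate(n, filter_error, label_error, seed=0):
    """Monte Carlo check of apparent_error_rate under the same assumptions."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(n):
        truth = rng.random() < 0.5  # True = spam, False = non-spam
        verdict = truth if rng.random() >= filter_error else not truth
        label = truth if rng.random() >= label_error else not truth
        disagreements += verdict != label
    return disagreements / n


if __name__ == "__main__":
    fe, le = 0.005, 0.03  # hypothetical true filter error and label error
    print("analytic apparent error:", apparent_error_rate(fe, le))
    print("simulated apparent error:", simulate(200_000, fe, le))
```

Under these assumed rates, the error measured against the noisy labels is dominated by the label error rather than by the filter's true error, which is the masking effect the abstract describes.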
