Learning to Separate Text Content and Style for Classification

Dell Zhang; Wee Sun Lee

Open Access

Publication year - 2006

Publication title - Lecture Notes in Computer Science

Language(s) - English

Resource type - Book series

SCImago Journal Rank - 0.249

H-Index - 400

eISSN - 1611-3349

pISSN - 0302-9743

ISBN - 3-540-45780-1

DOI - 10.1007/11880592_7

Subject(s) - computer science , style (visual arts) , content (measure theory) , set (abstract data type) , information retrieval , multinomial distribution , maximization , artificial intelligence , natural language processing , mathematics , statistics , mathematical analysis , archaeology , history , programming language , mathematical optimization

Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as “student” or “faculty”, or according to their source universities, such as “Cornell” or “Texas”. We call one kind of label the content and the other kind the style. Given a set of documents, each with both content and style labels, we seek to learn to classify documents written in a new style, which carry no content labels, into their content classes. Assuming that every document is generated using words drawn from a mixture of two multinomial component models, one content model and one style model, we propose a method named Cartesian EM that constructs content models and style models through Expectation Maximization and performs classification of the unknown content classes transductively. Our experiments on real-world datasets show the proposed method to be effective for style-independent text content classification.
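The generative assumption in the abstract can be made concrete with a small transductive EM loop. What follows is a minimal illustrative sketch, not the authors' Cartesian EM: it assumes each word of a document with content class c and style s is drawn from `p(w | c, s) = lam * theta_c(w) + (1 - lam) * phi_s(w)`, with a fixed mixing weight `lam`, a uniform prior over content classes, additive smoothing `alpha`, and a single known new style shared by all unlabeled documents. The function name `cartesian_em_sketch`, its parameters, and the toy labels ("wisconsin", "homework", and so on) are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def cartesian_em_sketch(train, test, test_style, n_iter=20, lam=0.5, alpha=0.1):
    """Transductive EM for a content/style multinomial mixture (illustrative sketch).

    train: list of (counts, content, style), where counts maps word -> count
    test: list of counts dicts with unknown content; all share test_style
    lam: fixed weight of the content component (an assumption of this sketch)
    alpha: additive smoothing so that no word gets zero probability
    Returns one predicted content label per test document.
    """
    vocab = sorted({w for d, _, _ in train for w in d} | {w for d in test for w in d})
    contents = sorted({c for _, c, _ in train})
    styles = sorted({s for _, _, s in train} | {test_style})

    def normalize(acc):
        total = sum(acc[w] for w in vocab) + alpha * len(vocab)
        return {w: (acc[w] + alpha) / total for w in vocab}

    # Initialise each multinomial from the raw counts of documents carrying that label.
    c_acc = {c: defaultdict(float) for c in contents}
    s_acc = {s: defaultdict(float) for s in styles}
    for d, c, s in train:
        for w, n in d.items():
            c_acc[c][w] += n
            s_acc[s][w] += n
    for d in test:
        for w, n in d.items():
            s_acc[test_style][w] += n
    theta = {c: normalize(c_acc[c]) for c in contents}
    phi = {s: normalize(s_acc[s]) for s in styles}

    def word_prob(w, c, s):
        # A word comes from the content model with prob lam, else from the style model.
        return lam * theta[c][w] + (1 - lam) * phi[s][w]

    gamma = []
    for _ in range(n_iter):
        # E-step (document level): posterior over content classes for each test
        # document, under a uniform class prior; log-space softmax for stability.
        gamma = []
        for d in test:
            logs = {c: sum(n * math.log(word_prob(w, c, test_style)) for w, n in d.items())
                    for c in contents}
            top = max(logs.values())
            unnorm = {c: math.exp(v - top) for c, v in logs.items()}
            z = sum(unnorm.values())
            gamma.append({c: u / z for c, u in unnorm.items()})

        # E-step (word level) + M-step: split each count between the content and
        # style components by their share of word_prob, then re-normalise.
        c_acc = {c: defaultdict(float) for c in contents}
        s_acc = {s: defaultdict(float) for s in styles}
        weighted = [(d, {c: 1.0}, s) for d, c, s in train]
        weighted += [(d, g, test_style) for d, g in zip(test, gamma)]
        for d, weights, s in weighted:
            for c, g in weights.items():
                for w, n in d.items():
                    r = lam * theta[c][w] / word_prob(w, c, s)
                    c_acc[c][w] += g * n * r
                    s_acc[s][w] += g * n * (1 - r)
        theta = {c: normalize(c_acc[c]) for c in contents}
        phi = {s: normalize(s_acc[s]) for s in styles}

    return [max(g, key=g.get) for g in gamma]


if __name__ == "__main__":
    # Hypothetical toy data: two known styles, one unseen style ("wisconsin").
    train = [
        ({"homework": 3, "exam": 2, "ithaca": 2}, "student", "cornell"),
        ({"research": 3, "grant": 2, "ithaca": 2}, "faculty", "cornell"),
        ({"homework": 2, "exam": 3, "austin": 2}, "student", "texas"),
        ({"research": 2, "grant": 3, "austin": 2}, "faculty", "texas"),
    ]
    test = [
        {"homework": 2, "exam": 2, "madison": 2},
        {"research": 2, "grant": 2, "madison": 2},
    ]
    print(cartesian_em_sketch(train, test, "wisconsin"))  # expected: ['student', 'faculty']
```

The style-specific words ("ithaca", "austin", "madison") are mostly absorbed by the style models, so the content models are left to carry the category words; this is the intuition behind separating content from style. A faithful reproduction of the paper would need its exact update rules and any learned mixing weights, which the abstract does not give.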
