State-of-the-art anonymisation of medical records using an iterative machine learning framework

Open Access

Author(s) - György Szarvas, Richárd Farkas, Róbert Busa-Fekete

Publication year - 2007

Publication title - Journal of the American Medical Informatics Association

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.614

H-Index - 150

eISSN - 1527-974X

pISSN - 1067-5027

DOI - 10.1197/jamia.m2441

Subject(s) - computer science, health insurance portability and accountability act, software portability, identification (biology), security token, protected health information, precision and recall, named entity recognition, information retrieval, measure (data warehouse), recall, unified medical language system, artificial intelligence, medical record, data mining, natural language processing, machine learning, confidentiality, health care, philosophy, health promotion, computer security, hrhis, task (project management), economic growth, linguistics, biology, management, radiology, programming language, medicine, botany, economics

OBJECTIVE: The anonymization of medical records is of great importance in the human life sciences, because de-identified text can be made publicly available to non-hospital researchers to facilitate research on human diseases. The authors developed a de-identification model that removes personal health information (PHI) from discharge records so that they conform to the guidelines of the Health Insurance Portability and Accountability Act.

DESIGN: We introduce a novel, machine learning-based iterative Named Entity Recognition approach intended for semi-structured documents such as discharge records. The method identifies PHI in several steps. First, it labels all entities whose tags can be inferred from the structure of the text. It then uses this information to find further PHI phrases in the free-text parts of the document.

MEASUREMENTS: Following the standard evaluation method of the first Workshop on Challenges in Natural Language Processing for Clinical Data, we used token-level Precision, Recall and F(beta=1) measures for evaluation.

RESULTS: On the standard evaluation dataset of the de-identification challenge, our best submitted model achieved an F measure of 99.7534%.

CONCLUSION: Our system is competitive with current state-of-the-art solutions, and several of the techniques described here may benefit other tasks that handle structured documents such as clinical records.
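The token-level measures named under MEASUREMENTS can be illustrated with a short sketch. It assumes one label per token, with the hypothetical label "O" marking non-PHI tokens, and uses the standard definition F(beta) = (1 + beta^2) * P * R / (beta^2 * P + R). The function name, the label names and the example data are illustrative assumptions, not the challenge's official scoring script, which may differ in details such as per-category averaging.

```python
from typing import Sequence, Tuple


def token_prf(
    gold: Sequence[str],
    pred: Sequence[str],
    beta: float = 1.0,
    outside: str = "O",  # assumed label for non-PHI tokens
) -> Tuple[float, float, float]:
    """Return token-level (precision, recall, F_beta) over PHI labels.

    A token is a true positive only if it is PHI in the gold standard
    and the predicted label matches the gold label exactly.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token by token")

    true_pos = sum(1 for g, p in zip(gold, pred) if g != outside and g == p)
    n_pred = sum(1 for p in pred if p != outside)  # tokens tagged as PHI
    n_gold = sum(1 for g in gold if g != outside)  # tokens that are PHI

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Hypothetical example: seven tokens, one missed PHI token, one false alarm.
gold_labels = ["O", "PATIENT", "PATIENT", "O", "DATE", "O", "HOSPITAL"]
pred_labels = ["O", "PATIENT", "O", "O", "DATE", "DATE", "HOSPITAL"]
p, r, f1 = token_prf(gold_labels, pred_labels)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

With beta = 1 the measure is the harmonic mean of precision and recall, so a system cannot score well by favouring one at the expense of the other.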
