Privacy protection of sexually transmitted infections information from Chinese electronic medical records

Sci Rep. 2025 Jan 8;15(1):1296. doi: 10.1038/s41598-024-84658-9.

Abstract

The widespread adoption of electronic medical records (EMRs) offers numerous benefits but also introduces risks of privacy leakage, particularly for patients with sexually transmitted infections (STIs), who need protection from secondary social harm. Despite advances in privacy protection research, the effectiveness of existing strategies on real-world data remains debatable. This study aimed to develop effective information extraction and privacy protection strategies to safeguard STI patients in the Chinese healthcare environment and to prevent unnecessary privacy leakage when EMRs are shared. The research was conducted at a national healthcare data center, where a committee of experts designed rule-based protocols that use natural language processing techniques to extract STI information. The Extraction Protocol of Sexually Transmitted Infections Information (EPSTII), designed specifically for the Chinese EMR system, enables accurate and complete identification and extraction of STI-related information, ensuring high protection performance. The protocol was refined multiple times based on the calculated precision and recall. The final protocol was applied to 5,000 randomly selected EMRs to calculate the success rate of privacy protection. A total of 3,233,174 patients were selected based on the inclusion criteria and a 50% entry ratio. Of these, 148,856 patients with sensitive STI information were identified from disease history. The identification frequency varied across sub-datasets, with the diagnosis sub-dataset the highest at 4.8%. Both precision and recall exceeded 95%, demonstrating the effectiveness of the method. The success rate of privacy protection was 98.25%, providing strong privacy protection for patients with STIs. Finding an effective method to protect private information in EMRs is meaningful. We demonstrated the feasibility of applying the EPSTII method to EMRs. Our protocol offers more comprehensive results compared to traditional methods of including STI information.

Keywords: Chinese electronic medical records; Infectious disease; Natural language processing; Privacy protection; Sexually transmitted infections.

MeSH terms

  • Adult
  • China / epidemiology
  • Computer Security
  • Confidentiality*
  • Electronic Health Records*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Natural Language Processing
  • Privacy
  • Sexually Transmitted Diseases* / prevention & control