The Correlation Between Data Monetization Term Usage and Readability of Privacy Policies
Presenter(s)
Garrett Buryska
Abstract
Websites and online services have become essential to daily life, and privacy policies have become correspondingly important as guardrails for user consent. However, privacy policies are one-sided: corporations craft them meticulously, yet the resulting documents are often too dense and legally complex for the average user. This study used natural language processing (NLP) to investigate the relationship between readability and the presence of terms related to sensitive data and monetization. The dataset consisted of over 100 privacy policies from many sectors. A data disclosure score was developed with the spaCy library to detect disclosures of data sharing, tracking, and sensitive information by matching common keywords while accounting for negation. Readability was scored with the Textstat library, primarily using the Flesch Reading Ease (FRE) metric. Statistical analysis revealed a significant negative correlation between readability and the data disclosure score, suggesting that policies with potentially more invasive data practices are more difficult to read. A paired t-test also showed that sentences containing data disclosure keywords are significantly harder to read than the rest of the text. These results indicate that less readable text occurs around sensitive data and monetization disclosures, which may conceal important privacy disclosures from the average user.
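The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: it replaces spaCy and Textstat with plain-Python stand-ins so that it runs without extra dependencies. The keyword list, the three-token negation window, the syllable heuristic, and all function names are assumptions made for illustration only. The FRE formula itself is the standard one: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).

```python
import re

# Hypothetical keyword and negation lists; the study's actual lists are not given.
KEYWORDS = {"share", "sell", "track", "advertising", "biometric", "location"}
NEGATIONS = {"not", "never", "no"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_syllables(word):
    """Rough estimate: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Standard Flesch Reading Ease formula with the rough syllable counter."""
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def is_disclosure(sentence):
    """True if a keyword occurs with no negation word in the 3 tokens before it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, tok in enumerate(tokens):
        if tok in KEYWORDS and not NEGATIONS & set(tokens[max(0, i - 3):i]):
            return True
    return False


def disclosure_score(text):
    """Fraction of sentences flagged as non-negated disclosures."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(is_disclosure(s) for s in sentences) / len(sentences)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Under these assumptions, each policy would yield one `flesch_reading_ease` value and one `disclosure_score` value, and `pearson` over the two lists would estimate the correlation reported above. Exact token matching misses inflected forms such as "shares", which is one reason a lemmatizing NLP library is a better fit for real policies.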
College
College of Science & Engineering
Department
Computer Science
Campus
Winona
First Advisor/Mentor
Mingrui Zhang
Second Advisor/Mentor
Trung Nguyen
Location
Kryzsko Great River Ballroom, Winona, Minnesota; United States
Start Date
4-23-2026 1:00 PM
End Date
4-23-2026 2:00 PM
Presentation Type
Poster Session
Format of Presentation or Performance
In-Person
Session
2a=1pm-2pm
Poster Number
7

Comments
Buryska, Garrett A