The Correlation Between Data Monetization Term Usage and Readability of Privacy Policies

Presenter(s)

Garrett Buryska

Abstract

Websites and online services have become essential to daily life, and privacy policies have become correspondingly important as guardrails for maintaining user consent. However, privacy policies are one-sided: although corporations craft them meticulously, the resulting documents are often too dense and legally complex for the average user. This study used natural language processing (NLP) to investigate the relationship between readability and the presence of terms related to sensitive data and monetization. The dataset consisted of over 100 privacy policies from many sectors. A data disclosure score was developed using the spaCy library to detect disclosures of data sharing, tracking, and sensitive information through common keywords, while accounting for negation. Readability was scored using the Textstat library, primarily with the Flesch Reading Ease (FRE) metric. Statistical analysis revealed a significant negative correlation between readability and the data disclosure score, suggesting that policies with potentially more invasive data practices are more difficult to read. A paired t-test also showed that sentences containing data disclosure keywords are significantly harder to read than the rest of the text. These results indicate that less readable text occurs around sensitive data and monetization disclosures, which may conceal important privacy disclosures from the average user.
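The scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: it uses only the Python standard library rather than spaCy and Textstat, the keyword list and negation cues are hypothetical, the syllable counter is a rough vowel-group heuristic, and all function names are invented for illustration. Only the Flesch Reading Ease formula itself is standard.

```python
import math
import re

# Hypothetical keyword list and negation cues, chosen only for illustration;
# the study's actual keyword set is not given in the abstract.
DISCLOSURE_KEYWORDS = {"share", "sell", "track", "advertisers", "biometric", "location"}
NEGATION_CUES = {"not", "never", "no"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Standard FRE formula; higher scores mean easier text."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(t) for t in tokens)
    return (206.835
            - 1.015 * (len(tokens) / len(sentences))
            - 84.6 * (syllables / len(tokens)))


def is_disclosure(sentence):
    """True if the sentence has a keyword and no negation cue (assumed rule)."""
    tokens = tokenize(sentence)
    has_keyword = any(t in DISCLOSURE_KEYWORDS for t in tokens)
    negated = any(t in NEGATION_CUES for t in tokens)
    return has_keyword and not negated


def disclosure_score(text):
    """Fraction of sentences flagged as non-negated disclosures (assumed definition)."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    return sum(is_disclosure(s) for s in sentences) / len(sentences)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given one readability score and one disclosure score per policy, `pearson_r` would yield the kind of correlation the abstract reports. Because the syllable heuristic differs from Textstat's, the scores from this sketch should not be expected to match the study's values.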

College

College of Science & Engineering

Department

Computer Science

Campus

Winona

First Advisor/Mentor

Mingrui Zhang

Second Advisor/Mentor

Trung Nguyen

Location

Kryzsko Great River Ballroom, Winona, Minnesota; United States

Start Date

4-23-2026 1:00 PM

End Date

4-23-2026 2:00 PM

Presentation Type

Poster Session

Format of Presentation or Performance

In-Person

Session

2a=1pm-2pm

Poster Number

7

Comments

Buryska, Garrett A
