How can we overcome the limitations of current natural language privacy policies without imposing new requirements on website operators?

Natural language privacy policies have become a de facto standard to address expectations of “notice and choice” on the Web. Yet, there is ample evidence that users generally do not read these policies and that those who occasionally do struggle to understand what they read. Initiatives aimed at addressing this problem, whether through machine-implementable standards or other solutions that hold website operators to more stringent requirements, have run into obstacles: many website operators are reluctant to commit to anything more than what they currently do.

This NSF Frontier project builds on recent advances in natural language processing, privacy preference modeling, crowdsourcing, formal methods, and privacy interfaces to overcome this situation. It combines fundamental research with the development of scalable technologies to:

  1. Semi-automatically extract key privacy policy features from natural language website privacy policies, and
  2. Present these features to users in an easy-to-digest format that enables them to make more informed privacy decisions as they interact with different websites.

As such, this project offers the prospect of overcoming the limitations of current natural language privacy policies without imposing new requirements on website operators. Work in this project will also involve the systematic collection and analysis of website privacy policies, looking for trends and deficiencies in both the wording and the content of these policies across different sectors, and using this analysis to inform ongoing public policy debates. A transition phase will enable the transfer of these technologies to industry for large-scale deployment and to regulators and policy makers interested in tracking practices.

We are proud to be working with the following faculty and researchers from across Carnegie Mellon and the world:
Alessandro Acquisti (CMU Heinz College)
Ed Hovy (CMU LTI)
Joel Reidenberg (Fordham)
Florian Schaub (U. Michigan)
Barbara van Schewick (Stanford)
Noah Smith (Washington)
Shomir Wilson (U. Cincinnati)

Learn More About This Project

Project Publications

Abhijith Athreya Mysore Gopinath, Shomir Wilson, and Norman Sadeh, "Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents", Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, Nov 2018 [pdf] [website]

Shomir Wilson, Florian Schaub, Frederick Liu, Kanthashree Mysore Sathyendra, Daniel Smullen, Sebastian Zimmeck, Rohan Ramanath, Peter Story, Fei Liu, Norman Sadeh, Noah A. Smith, "Analyzing Privacy Policies at Scale: From Crowdsourcing to Automated Annotations", To appear in the ACM Transactions on the Web, Oct 2018 [pdf]

Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G. Shin, Karl Aberer, "Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning", USENIX Security Symposium 2018, Aug 2018

Peter Story, Sebastian Zimmeck, Norman Sadeh, "Which Apps have Privacy Policies?", Annual Privacy Forum, Jun 2018 [pdf]

Frederick Liu, Shomir Wilson, Peter Story, Sebastian Zimmeck, and Norman Sadeh, "Towards Automatic Classification of Privacy Policy Text", Carnegie Mellon University Technical Report CMU-ISR-17-118R and CMU-LTI-17-010, Institute for Software Research and Language Technologies Institute, School of Computer Science, Jun 2018 [pdf]

H. Habib, Y. Zou, C. Swoopes, A. Jannu, L.F. Cranor, F. Schaub, "An Empirical Analysis of Online Consent and Opt-Out Experience", PLSC ’18: Privacy Law Scholars Conference, May 2018

H. Habib, Y. Zou, A. Jannu, C. Swoopes, A. Acquisti, L.F. Cranor, N. Sadeh, F. Schaub, "An Empirical Analysis of Website Data Deletion and Opt-Out Choices", CHI 2018 Workshop on General Data Protection Regulation: An Opportunity for the HCI Community?, Apr 2018 [pdf]

Peter Story, Sebastian Zimmeck, Norman Sadeh, "Which Apps have Privacy Policies?", Carnegie Mellon University Technical Report CMU-ISR-18-100R, Institute for Software Research, School of Computer Science, Feb 2018 [pdf]

Peter Story, Sebastian Zimmeck, Norman Sadeh, "Which Apps have Privacy Policies?", FTC PrivacyCon, Feb 2018 Poster [pdf]

More project publications available here.