Data Mining and Breach Notification in Cyber Incident Responses

By Ashish Prasad, Michael Sarlo, Anya Korolyov, and Giel Stein [1]

Data breaches continue to be a major problem for corporations and organizations in 2025, and the U.S. Department of Justice, Criminal Division, has recently brought enforcement actions for hacking, ransomware, and other cybercrimes.  See U.S. Dept. of Justice, Former U.S. Soldier Pleads Guilty to Hacking and Extortion Scheme Involving Telecommunications Companies, July 13, 2025, https://www.justice.gov/opa/pr/former-us-soldier-pleads-guilty-hacking-and-extortion-scheme-involving-telecommunications

When a breach occurs, federal and state breach notification laws may require notification of details of the breach and related information not just to regulatory authorities, but also to individuals whose personally identifiable information (“PII”) and/or personal health information (“PHI”) has been accessed or acquired in a way that compromised the security, confidentiality, or integrity of the PII or PHI.  See The Sedona Conference Incident Response Guide, 21 Sedona Conf. J. 125, 170-74 (2020).  See also Foley and Lardner LLP, State Data Breach Notification Laws, 2025, https://www.foley.com/wp-content/uploads/2024/04/23.45248-Data-Breach-Chart-4.10.24.pdf.  Notification might not be required if there is no reasonable likelihood of harm, the PII/PHI was encrypted, the breach resulted from good-faith access or acquisition by an employee or agent of the organization, or in other special situations.  Id. at 174-84.  Failure to comply with the letter of breach notification statutes as to the recipients, timing, method, and content of notifications can result in fines and consumer lawsuits.  Id. at 184-233.

Data mining, in which the data that has been breached is analyzed for the purpose of providing notice to affected individuals and organizations, has become one of the most complex and costly activities in responding to a cybersecurity incident.  Yet the cybersecurity industry has not promulgated formal standards for the repeatable and defensible methods and workflows that breach counsel and data mining providers should follow.  While there are general best practices and industry guidance (e.g., from the National Institute of Standards and Technology), these do not directly prescribe standardized workflows for data mining in breach notification contexts.  Consequently, different vendors and breach counsel may take different approaches, creating inconsistency.  This has resulted in costs, burdens, and risks for organizations that must engage in data mining for breach notification, as well as for their counsel, insurers, and other parties affected by breaches.

A data mining workflow in cybersecurity incident response should include the following five stages and be repeatable, reproducible, and subject to audit trails.

  • Data identification.  This includes forensic techniques for identifying potentially exfiltrated sources, as well as preservation and processing using generally accepted, industry-standard tools.  In this stage, breach counsel and the data mining provider build a data map from the PII/PHI that has been identified, showing the organization where its sensitive data may be located.  The data map is used to enrich data that is extracted from the compromised dataset and helps the organization organize and secure its data going forward to reduce the possibility of future breaches.
  • Data filtering and deduplication.  This includes analyzing document families (parent items and their attachments), threading email chains to identify the most inclusive emails in each chain, textual near-duplicate analysis, domain and email analysis to identify mass and junk emails, and file name and file type assessments.  Definition and targeting of PII/PHI occurs at this stage, including search term calibration and sampling, as well as the development of a review and extraction protocol.  Machine learning tools are used to identify and report on the volume and characteristics of PII/PHI; this goes beyond simply searching for names and allows the computer to construct a model or profile of individual entities.  This helps breach counsel and the data mining provider to combine names (confirming that a person’s title and a person’s name both refer to that person) and differentiate people (confirming that two references to a name refer to two different individuals).
  • Data review and extraction.  This involves the training and calibration of the review team to ensure consistency and accuracy in identifying sensitive information. The team conducts a detailed review of the compromised data to extract Data Subjects and any associated PII or PHI. Emerging artificial intelligence and machine learning technologies are increasingly being integrated into this phase to assist in identifying patterns, accelerating document review, and improving extraction accuracy.
  • Quality control.  This includes vetting the accuracy of AI or search term workflows before, during, and after human review, using well-established statistical sampling methods to estimate measures such as the recall and precision of search terms.
  • Data normalization and deduplication.  This includes standardizing names and other extracted data elements, followed by a deduplication process to ensure that each affected individual is represented only once. The result is a concise and accurate list, enabling a single notification letter to be mailed to each person.
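
The normalization and deduplication stage can be illustrated with a minimal Python sketch.  It is not the authors' or any vendor's implementation.  The record fields (doc_id, name, dob, elements), the title and suffix lists, and the choice of matching on normalized name plus date of birth are all assumptions made for illustration; an actual protocol would define its own matching rules.

```python
import re
import unicodedata

# Hypothetical honorifics and suffixes to ignore when comparing names.
_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize_name(raw: str) -> str:
    """Return a lowercase 'first last' key from a raw extracted name."""
    # Fold accents (e.g. "José" -> "Jose") so spelling variants compare equal.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Reorder "Last, First" into "First Last".
    if "," in text:
        last, _, first = text.partition(",")
        text = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in _TITLES and t not in _SUFFIXES]
    return " ".join(tokens)


def deduplicate(records):
    """Merge extracted records that share a normalized name and date of birth.

    Each merged entry keeps the union of exposed data elements and the list
    of source documents, so the result can be traced back for audit.
    """
    merged = {}
    for rec in records:
        key = (normalize_name(rec["name"]), rec.get("dob"))
        entry = merged.setdefault(
            key,
            {"name": rec["name"], "dob": rec.get("dob"),
             "elements": set(), "sources": []},
        )
        entry["elements"].update(rec.get("elements", []))
        entry["sources"].append(rec["doc_id"])
    return list(merged.values())


if __name__ == "__main__":
    # Illustrative records only; not real data.
    extracted = [
        {"doc_id": "DOC-001", "name": "Smith, John Jr.",
         "dob": "1980-02-01", "elements": ["SSN"]},
        {"doc_id": "DOC-017", "name": "Dr. John Smith",
         "dob": "1980-02-01", "elements": ["DL number"]},
        {"doc_id": "DOC-020", "name": "José Álvarez",
         "dob": "1975-07-09", "elements": ["SSN"]},
    ]
    for person in deduplicate(extracted):
        print(person["name"], sorted(person["elements"]), person["sources"])
```

Exact matching on a name and date of birth is deliberately simple.  It can miss records where the date of birth is absent and can merge two people who share both values, which is why the workflow above pairs this stage with entity modeling and quality control.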

A sound data mining workflow is crucial for preventing the dangers of over-notice and under-notice. Under-notice must be avoided because it could lead to the organization failing to meet its disclosure obligations and expectations from suppliers and customers, which can lead to time-consuming and costly questions and responses about the incident, in addition to fines and consumer litigation. Over-notice should also be avoided, as it increases reputational harm to the organization and attracts unnecessary attention from litigation attorneys seeking to pursue civil claims. Additionally, over-notice may create confusion and alarm among unaffected individuals, damaging trust and customer relationships. A defensible, well-documented data mining process helps ensure that notifications are accurate, targeted, and compliant with regulatory expectations. See HaystackID, [Webcast Transcript] Data Mining in Incident Response:  Managing Risk and Spend through an Effective Evidence-Based Approach, Sep. 8, 2022, https://haystackid.com/webcast-transcript-data-mining-in-incident-response-managing-risk-and-spend-through-an-effective-evidence-based-approach/
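
One way to put numbers on these risks is to estimate the precision and recall of the search terms from reviewed samples, as contemplated in the quality control stage.  The sketch below is illustrative only: the function names, the two-sample design (one sample from documents the terms flagged, one from documents they did not), and the use of a Wilson score interval are assumptions, not a method prescribed by the sources cited here.  Loosely, low recall means PII/PHI is being missed (a risk of under-notice), while low precision means many flagged documents contain no PII/PHI (added review cost and a risk of over-inclusion).

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a sample proportion (z = 1.96 is about 95%)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_precision_recall(flagged_total, flagged_sample_n, flagged_sample_hits,
                              unflagged_total, unflagged_sample_n, unflagged_sample_hits):
    """Point estimates of precision and recall from two reviewed random samples.

    flagged_*   : documents hit by the search terms, and a reviewed sample of them
    unflagged_* : documents not hit, and a reviewed sample of them
    *_hits      : sampled documents that reviewers confirmed contain PII/PHI
    """
    precision = flagged_sample_hits / flagged_sample_n
    miss_rate = unflagged_sample_hits / unflagged_sample_n
    est_found = precision * flagged_total      # estimated true positives
    est_missed = miss_rate * unflagged_total   # estimated false negatives
    total = est_found + est_missed
    recall = est_found / total if total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative counts only; not drawn from any actual matter.
    precision, recall = estimate_precision_recall(
        flagged_total=10_000, flagged_sample_n=400, flagged_sample_hits=360,
        unflagged_total=90_000, unflagged_sample_n=500, unflagged_sample_hits=5,
    )
    low, high = wilson_interval(360, 400)
    print(f"precision ~ {precision:.3f} (interval {low:.3f} to {high:.3f})")
    print(f"recall    ~ {recall:.3f}")
```

The recall figure here is a point estimate built from two sampled rates, so it carries more uncertainty than the interval shown for precision.  Documenting the sample sizes and results alongside the estimates supports the audit trail that a defensible workflow requires.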

Breach counsel and data mining providers that follow the best practices for data mining described above will be able to satisfy federal and state laws for breach notification as well as provide answers to customers, business partners, and others about the breach, thereby minimizing the costs, burden, and risks of the breach for the organization.

About HaystackID®

HaystackID® solves complex data challenges related to legal, compliance, regulatory, and cyber requirements. Core offerings include Global Advisory, Cybersecurity, Core Intelligence AI™, and ReviewRight® Global Managed Review, supported by its unified CoreFlex™ service interface. Recognized globally by industry leaders, including Chambers, Gartner, IDC, and Legaltech News, HaystackID helps corporations and legal practices manage data gravity, where information demands action, and workflow gravity, where critical requirements demand coordinated expertise, delivering innovative solutions with a continual focus on security, privacy, and integrity. Learn more at HaystackID.com.

Assisted by GAI and LLM technologies.

SOURCE: HaystackID

[1] Ashish Prasad is Vice-President and General Counsel at HaystackID, Lecturer at the University of Michigan Law School, and Co-Founder of the Government Investigation and Civil Litigation Institute.  Michael Sarlo is Chief Innovation Officer and President – Global Investigations & Cyber Incident Response Services at HaystackID.  Anya Korolyov is Executive Vice President, Cyber and Legal Data Intelligence Strategy at HaystackID.  Giel Stein is a Member at Clark Hill PLC and a former Special Assistant U.S. Attorney.
