Data Loss Prevention

Collection, management and use of data have become important to many organizations. In some cases, data is central to their products and services. Some of this data could be sensitive information about partners or customers. In other cases, data-driven decisions give organizations a competitive edge. Losing this data can therefore impact business operations and even threaten survival.

Data loss prevention (DLP) is about being proactive in protecting data. It's about implementing processes and best practices, about adopting the right tools and technologies to prevent data loss. A well-defined organization-wide DLP approach is likely to work better than ad hoc approaches within individual departments.

DLP generally involves protecting data in use, data in motion and data at rest. Protection can be categorized into endpoint, network or storage DLP. Given different types of data and many touchpoints, an effective strategy is to combine different DLP techniques.

Discussion

  • What types of data matter for DLP?

    Important data that need protection include intellectual property (source code, design documents, etc.), corporate data (legal, financial, employee data, etc.), and customer data (name, address, credit card numbers, etc.). Data can either be structured (well organized and easy to process) or unstructured (more qualitative and harder to process).

    Broadly, data comes in three states, all of which need to be protected:

    • Data in Use: Data while it's being used by an application or an endpoint.
    • Data in Motion: Data while it's being transferred over a network.
    • Data at Rest: Data while it's stored on a file server within an enterprise network, in a storage service in the cloud or in a portable storage device.
  • What are the common causes of data losses?
    Various touchpoints of data loss. Source: Imperva 2020.

    Accidental deletion of files, or even of parts of a file, is a data loss, particularly when backups are not available. File system updates are common everyday operations. Hence, accidental deletion is a real risk. Such a loss can happen due to user error, bugs in automation scripts or system misconfiguration.

    Data is commonly stored on hard drives. Data loss from hard drive failures is a real possibility. Failures could be due to mechanical fault, overheating, mishandling, liquid damage, etc. Physical damage to equipment can also occur due to fires, explosions, or natural disasters.

    A sudden system shutdown due to a power outage can damage hard drives. Data could be unrecoverable even after power is restored. In general, if systems are shut down improperly or the power supply is unstable, there's a chance of data loss.

    Viruses or malware can result in attackers stealing, deleting or corrupting data. Entry points for these are usually third-party application links, external hard drives, pen drives, and memory cards. Another external threat is the physical theft of a device that contains data.

  • What are the types of data loss prevention?
    Types of data loss prevention. Source: Tucakov 2019.

    Data loss prevention solutions are primarily classified based on type and source of data:

    • Endpoint DLP: Organizations have a variety of endpoints, from portable devices (pen drives, cell phones, hard drives) to fixed devices (servers, workstations, printers). Endpoint DLP solutions are installed on such devices. For example, endpoint DLP can log or prevent file copying or sharing.
    • Network DLP: Applicable to all networked devices and to all network-accessible data. For example, when a user attempts to email sensitive information on a corporate network, network DLP could encrypt or block the data. Network DLP is common in enterprises that handle high volumes of data.
    • Cloud DLP: May be considered a subset of Network DLP. Applicable to enterprises that use cloud services.
    • Storage DLP: The focus is on the access and sharing of data while it's in storage. Alerts are triggered when data is not stored securely. Storage DLP is common for data warehouses and data lakes.
  • What are some specific DLP techniques?
    Introduction to Microsoft Endpoint DLP. Source: Microsoft Mechanics 2020.

    Broadly, techniques cover prevention, detection, response and analysis. A comprehensive DLP solution should address all aspects of data security: storage and transmission; encryption and key management; authentication and authorization; real-time monitoring of both network traffic and endpoint activity; threat detection and alerting; data backup and recovery plan; up-to-date policies; regulatory compliance; on-premise or cloud security.

    Even when storage systems are secured against intrusion, logical protection controls must be in place. This includes encrypting, masking, and anonymizing data before it goes into storage. This gives additional protection against exposure or misuse should data be stolen.
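
    As a minimal sketch of masking before storage, the snippet below hides all but the last four digits of card-like numbers. The function name `mask_card_number`, the 16-digit pattern and the sample record are assumptions made for illustration. A real deployment would cover more formats and data types.

```python
import re

# Matches 16 digits in four groups, optionally separated by a space or hyphen.
# The last group is captured so that it can be kept in the masked output.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}(\d{4})\b")

def mask_card_number(text: str) -> str:
    """Replace all but the last four digits of card-like numbers."""
    return CARD_PATTERN.sub(lambda m: "**** **** **** " + m.group(1), text)

record = "Paid with 4111 1111 1111 1111 on 2022-03-28"
masked = mask_card_number(record)
print(masked)
```

    Masking of this kind keeps a record useful for support or reconciliation while limiting what is exposed if the stored data is stolen.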

    Encryption can be applied for both data at rest and in motion. Cryptographic hashing can be used to anonymize personally identifiable information data. Data fingerprinting can be used to identify and track data. Content can be analyzed via rules-based expressions, database fingerprinting, partial document matching, full document matching (using file hash), domain identification, and statistical analysis.
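
    Two of these techniques can be sketched with Python's standard library. The first function pseudonymizes a PII value with a keyed hash (HMAC-SHA256). The keyed variant is a choice made here, since plain hashes of guessable values such as email addresses can be reversed by trying candidates. The second part shows full document matching by file hash. The key, the helper names and the sample document are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would come from a key manager.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of a normalized PII value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()

def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of a document's exact bytes."""
    return hashlib.sha256(content).hexdigest()

# Digests of documents already known to be sensitive (sample data).
known_sensitive = {fingerprint(b"Q3 acquisition plan: confidential")}

def is_known_sensitive(content: bytes) -> bool:
    """Full document matching: true only for byte-identical copies."""
    return fingerprint(content) in known_sensitive

print(pseudonymize("alice@example.com"))
print(is_known_sensitive(b"Q3 acquisition plan: confidential"))
```

    A file hash changes completely when a single byte changes, which is why full document matching is usually combined with partial document matching or rules-based analysis.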

    For better user awareness, we could add visible metadata to headers or footers, or watermarks. These are visible reminders to users not to share sensitive data.

  • What do you mean by security policies and rules in DLP?

    Policies determine how sensitive a piece of data is and what operations are allowed on it. Policies may define who can access the data, whether it can be copied internally or externally, whether it should be encrypted, whether it can be attached to an email, whom to notify of unauthorized access, and so on. Rules are the mechanisms that implement policies.

    Data classification is critical to enforcing these policies. Data categories or classes may include Top Secret, Confidential and Public; or alternatively, Public, Private or Restricted. Or they could be aligned to standards such as GDPR or HIPAA.
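
    The relationship between classification, policy and rule can be sketched as follows. This is only an illustration: the `Policy` fields, the role names, the action names and the `evaluate` function are assumptions, not the schema of any particular DLP product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """What is allowed for data of one classification (hypothetical schema)."""
    classification: str
    allowed_roles: frozenset
    allow_external_email: bool
    require_encryption: bool

POLICIES = {
    "Confidential": Policy("Confidential", frozenset({"finance", "legal"}), False, True),
    "Public": Policy("Public", frozenset({"finance", "legal", "staff"}), True, False),
}

def evaluate(classification: str, role: str, action: str, encrypted: bool) -> str:
    """Rule that enforces the policy for one attempted operation."""
    policy = POLICIES[classification]
    if role not in policy.allowed_roles:
        return "block: role not permitted"
    if action == "email_external" and not policy.allow_external_email:
        return "block: external email not allowed"
    if policy.require_encryption and not encrypted:
        return "block: encryption required"
    return "allow"

print(evaluate("Confidential", "finance", "email_external", True))
print(evaluate("Public", "staff", "email_external", False))
```

    Keeping policies as data and rules as code means a policy change, such as allowing a new role, doesn't require touching the enforcement logic.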

    Creating effective policies involves a few steps. Start by asking what types of data are available, where they're stored, who can access them and who's accessing them. Then classify the different types of data by how important each one is to the business. Assess risk factors and vulnerabilities. Data that's not relevant to the business can be deleted, thus optimizing the DLP implementation. Document the policies in detail, since this helps implementation and compliance. Enforce the policies across the organization. Collect metrics to know how well DLP is working. Automate workflows.

  • What metrics can help in assessing maturity of a DLP solution?

    Data classification is typically done by a trained software model whose accuracy is limited. Sometimes the model fails to identify sensitive data. This is the problem of false negatives. It's a critical failure since a lenient policy would be applied and the sensitive data could get stolen or lost. The other problem is false positives, whereby non-sensitive data is classified as sensitive and a strict security policy is unnecessarily applied to it. This reduces usability and productivity.
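
    Both kinds of classification error can be measured against a labelled sample. The sketch below assumes that each item has a known label (`True` for sensitive) and a label predicted by the classifier. The function name and the sample data are made up for illustration.

```python
def classification_error_rates(actual, predicted):
    """Compute miss and false-alarm rates from parallel lists of booleans."""
    # Sensitive items that the classifier failed to flag.
    missed = sum(1 for a, p in zip(actual, predicted) if a and not p)
    # Non-sensitive items that the classifier flagged anyway.
    false_alarms = sum(1 for a, p in zip(actual, predicted) if not a and p)
    sensitive = sum(1 for a in actual if a)
    non_sensitive = len(actual) - sensitive
    return {
        "false_negative_rate": missed / sensitive if sensitive else 0.0,
        "false_positive_rate": false_alarms / non_sensitive if non_sensitive else 0.0,
    }

# Sample of ten documents: four are truly sensitive, six are not.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

rates = classification_error_rates(actual, predicted)
print(rates)
```

    In this sample, one of four sensitive documents is missed and one of six non-sensitive documents is flagged. Tracking both rates over time shows whether tuning the classifier trades one problem for the other.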

    Policies can have exceptions. For example, a user may be given temporary access to a document. Many exceptions can mean that the policy probably needs to be refined.

    A high number of unmanaged devices is not ideal for an enterprise. This metric should be kept low. Laptops, smartphones, and removable storage devices are endpoints. If these are outside the control of the IT department, enforcing DLP policies will not be possible.

    If an issue is detected, we wish to fix it quickly. Alert response time is a metric to motivate workflow optimizations.
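
    Alert response time can be computed from the timestamps at which an alert is raised and resolved. The following sketch assumes ISO 8601 timestamps. The alert log is sample data, not output from any real tool.

```python
from datetime import datetime
from statistics import mean, median

# (raised, resolved) timestamp pairs; sample data for illustration.
alerts = [
    ("2022-03-01T09:00:00", "2022-03-01T09:30:00"),
    ("2022-03-02T14:00:00", "2022-03-02T16:00:00"),
    ("2022-03-03T08:15:00", "2022-03-03T08:45:00"),
]

def response_minutes(raised: str, resolved: str) -> float:
    """Minutes elapsed between an alert being raised and resolved."""
    delta = datetime.fromisoformat(resolved) - datetime.fromisoformat(raised)
    return delta.total_seconds() / 60

durations = [response_minutes(raised, resolved) for raised, resolved in alerts]
mean_minutes = mean(durations)
median_minutes = median(durations)
print(mean_minutes, median_minutes)
```

    Reporting the median alongside the mean helps, since a single slow incident can dominate the mean.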

  • What are best practices for implementing a strong DLP?
    Process flow for DLP in operation. Source:

    Implementing DLP is not a one-time effort. "Setting and forgetting" is a recipe for failure. Review and adapt policies, approaches, tools and technologies as the business evolves. When new to DLP, implement in phases based on priorities. Define the primary objective for implementing DLP. Align DLP with existing security architecture.

    Apply DLP at every stage of the data lifecycle, be it collection, storage, analysis, update, migration, deletion or archival. Create easy-to-follow security rules. Rules should not be so restrictive that they affect productivity. While relying on tools, don't forget to train people.

    An organization must have a centralized DLP program, which also makes it easier to update DLP strategy and policies. Avoid ad hoc DLP approaches at different departments.

    DLP solutions should comply with industry regulations such as GDPR, HIPAA (healthcare information), SOX, and PCI DSS (credit card data).

  • What's the approach to data classification in DLP?
    Enhancing DLP with data classification. Source: Tierney 2020.

    Effective DLP can happen only when data is correctly classified, be it structured or unstructured data. Define and execute policies as relevant to each data type. Adopt or create a classification system that can be easily customized. Preconfigured classification methods can be a starting point.

    Automatic classification will be more consistent and reliable than asking users to classify their data. There are many approaches to data classification. It can be as simple as applying regular expressions or heuristics. For example, a 16-digit string can imply credit card information.
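
    A rules-based detector for the credit card example might look like the sketch below. It pairs a regular expression for 16-digit strings with the Luhn checksum, which payment card numbers satisfy, so that arbitrary 16-digit identifiers are less likely to be flagged. The names `CARD_CANDIDATE`, `luhn_valid` and `find_card_numbers` are illustrative.

```python
import re

# Sixteen digits, each optionally followed by a space or hyphen.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){15}\d\b")

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list:
    """Return 16-digit strings in the text that also pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

sample = "Card 4111-1111-1111-1111 and order ref 1234567812345678"
found = find_card_numbers(sample)
print(found)
```

    Here the well-known test number 4111 1111 1111 1111 is reported, while the order reference is ignored because it fails the checksum. The regular expression alone would flag both.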

    A more sophisticated approach is to train and deploy machine learning models. Models could be trained to recognize sensitive information, inside information, public information, personally identifiable information (PII), financial data, regulated data, intellectual property, and more.

    One way to aid data classification is to insert metadata at the point of creating or saving the data. Metadata could include file format, author, date of creation/update, confidentiality level, etc. Metadata also speeds up search and classification.

  • Could you describe some DLP products?

    Symantec, McAfee, Websense, SecureTrust, Check Point, Digital Guardian, SolarWinds, CoSoSys, CrowdStrike, Trustifi and ManageEngine offer DLP products and solutions.

    Lepide Data Security Platform and McAfee DLP are example solutions that can analyze metadata and data to discover and classify data. Netwrix has a data classification product with several notable features: a reusable index for better performance; a flexible taxonomy manager that enables companies to easily manage their data themselves; transparency that informs users why a document was classified in a particular way; and remediation workflows to redact content, quarantine files, or revoke permissions.

    Examples of DLP implemented in the cloud are Google's Cloud DLP, Nightfall and Acronis. Cloud DLP can scan/discover/classify data, mask sensitive data, and statistically measure re-identification risks. Nightfall automates classification, monitoring, alerting and redaction. In general, cloud DLP solutions integrate via APIs to various other systems (Slack, Google Drive, Jira, etc.) and hence prevent data loss via these systems.

    Cloudflare One and API Shield are tools that implement DLP for data on Cloudflare.

  • What are the limitations of data loss prevention?

    There are ways to bypass DLP techniques. Malicious actors on the inside can send encrypted files that DLP tools can't scan or classify. DLP tools may be unable to handle large files. Users can also take screenshots and embed them into documents. Data can be obfuscated by simply setting the font to Wingdings.

    DLP rules need to be actively monitored and maintained. Data classification is rarely perfect. False positives and false negatives are a problem.

    With cloud-based DLP solutions, sensitive data is processed in the cloud, which is at odds with enterprises not willing to send such data outside their networks.

Milestones

1980

In the early 1980s, the first computer viruses are created. Created as curiosities, these later evolve for malicious purposes. At a keynote speech for the Chaos Computer Club, German computer engineer Berger encourages others to explore such viruses.

1986

A virus called Brain is created by two Pakistani brothers. It infects the boot sector of storage media formatted with the DOS File Allocation Table (FAT) file system. Its creators claim that the virus is intended to be mostly harmless, simply displaying a copyright message about the pirated software and changing the name of the floppy drive. However, many people report that the virus has wiped their data or made their drives slow and unresponsive.

2014

Hackers steal credit card data of more than 50 million customers of Home Depot. Home Depot incurs a loss of $260 million. This is just one example to highlight the high cost of data loss and the need for DLP. On the whole, very high-profile data breaches are discovered all through the 2010s: Target (2013), Yahoo (2013), U.S. Voter database (2015), AdultFriendFinder (2016), etc.

2015

During this decade, as more enterprise data moves to the cloud, the need for and adoption of cloud-based DLP solutions grow with it. Performance requirements also demand that DLP be done in the cloud. DLP that initially started with a compliance focus transitions during the 2010s towards threat detection, privacy and protection of personal data. It's been said that,

The best uses for DLP today live under a triple umbrella of security, privacy, and compliance.

2016

The U.S. government releases the Data Security Policy Principles Framework for the Precision Medicine Initiative (PMI) that was launched by President Obama in early 2015. This sets guidelines and expectations to protect sensitive patient data.

2020

The National Institute of Standards and Technology (NIST) releases version 1.0 of the NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management. It offers enterprises guidelines towards protecting personal data. The document also links the Privacy Framework to NIST's Cybersecurity Framework.

References

  1. Abdullahi, Aminu. 2021. "Data Loss Prevention (DLP) Best Practices & Strategies." Enterprise Networking Planet, October 22. Accessed 2021-01-07.
  2. Alder, Steve. 2016. "HHS Announces Release of the Final Data Security Policy Principles Framework." HIPAA Journal, May 27. Accessed 2022-01-07.
  3. Bushkovskyi, Oleksandr. 2019. "What are Data Loss Prevention (DLP) Best Practices?" Blog, The App Solutions, August 8. Updated 2020-01-23. Accessed 2021-01-07.
  4. Cooper, Stephen. 2019. "14 Best Data Loss Prevention Software Tools." Comparitech, April 18. Updated 2022-03-21. Accessed 2022-03-28.
  5. CrowdStrike. 2020. "What is Data Loss Prevention (DLP)?" CrowdStrike, October 20. Updated 2022-02-14. Accessed 2021-01-07.
  6. Ellis, Scott and Anton Chuvakin. 2020. "Not just compliance: reimagining DLP for today’s cloud-centric world." Blog, Google Cloud, July 1. Accessed 2022-03-28.
  7. Foote, Keith D. 2020. "A Brief History of Data Security." Dataversity Digital LLC, December 29. Accessed 2021-01-07.
  8. Google Cloud. 2022. "Cloud Data Loss Prevention." Google Cloud. Accessed 2022-03-28.
  9. HANDD Business Solutions. 2022. "Enhance your Data Loss Prevention." Data Classification, HANDD Business Solutions. Accessed 2022-03-28.
  10. Heaslip, Emily. 2021. "6 Cloud Data Loss Prevention Best Practices & Strategies." Blog, Nightfall, November 22. Accessed 2021-01-07.
  11. Imperva. 2020. "Data Loss Prevention." Imperva, June 17. Accessed 2021-01-07.
  12. Jain, Saarthak. 2021. "What Are The Latest Data Loss Prevention Techniques?" Blog, Coding Ninjas, January 8. Accessed 2021-01-07.
  13. Kathuria, Sonal. 2021. "Data Loss Prevention – Meaning, Types, Working, and Benefits." Blog, Security Pilgrim, October 18. Accessed 2021-01-07.
  14. Lord, Nate. 2021. "An Expert Guide to Securing Sensitive Data: 34 Experts Reveal the Biggest Mistakes Companies Make with Data Security." Blog, Digital Guardian, February 17. Accessed 2022-03-28.
  15. McAfee Enterprise. 2018. "Data Loss Prevention (DLP) Best Practices." McAfee Enterprise, November 28. Accessed 2021-01-07.
  16. McKinsey. 2022. "The data-driven enterprise of 2025." McKinsey & Company, January 28. Accessed 2022-03-28.
  17. Microsoft Mechanics. 2020. "Endpoint Data Loss Prevention (DLP) | What it is and how to set it up in Microsoft 365." Microsoft Mechanics, on YouTube, November 10. Accessed 2021-01-07.
  18. Murphy, Danny. 2020. "The Role of Data Classification in Data Loss Prevention (DLP)." Blog, Lepide, March 11. Updated 2020-06-16. Accessed 2021-01-07.
  19. NIST. 2020. "NIST Releases Version 1.0 of Privacy Framework." News, NIST, January 16. Accessed 2021-01-07.
  20. Nailwal, Mukesh. 2018. "5 Most Common Causes of Data Loss & Prevention of Data Loss." Blog, DRS Softech, October 23. Updated 2019-04-15. Accessed 2021-01-07.
  21. Radeska, Tijana. 2016. "Brain -The first computer virus was created by two brothers from Pakistan. They just wanted to prevent their customers of making illegal software copies." The Vintage News, September 8. Accessed 2021-01-07.
  22. Raza, Muhammad. 2020. "Data Loss Prevention & DLP Solutions Explained." Blog, BMC, November 20. Accessed 2021-01-07.
  23. SafeBACKUP Consult Ltd. 2022. "7 Greatest Causes of Data Loss." Data Backup and Online Storage, SafeBACKUP Consult Ltd. Accessed 2021-01-07.
  24. Security Ninja. 2014. "Data loss prevention (DLP) strategy guide." Infosec Institute, Inc., July 9. Accessed 2021-01-07.
  25. Shapland, Rob. 2014. "The risks of cloud data loss prevention." TechTarget, December 10. Accessed 2021-01-07.
  26. Tierney, Mike. 2020. "The Importance of Data Classification for Data Loss Prevention." Blog, Netwrix, November 5. Updated 2022-01-13. Accessed 2021-01-07.
  27. Tucakov, Dejan. 2019. "Data Loss Prevention Best Practices: Ultimate Guide to DLP." Blog, phoenixNAP, May 21. Accessed 2022-03-28.
  28. Yalavarthy, Misha. 2021. "Using Cloudflare for Data Loss Prevention." Blog, CloudFlare, March 24. Accessed 2021-01-07.

Further Reading

  1. Bushkovskyi, Oleksandr. 2019. "What are Data Loss Prevention (DLP) Best Practices?" Blog, The App Solutions, August 8. Updated 2020-01-23. Accessed 2021-01-07.
  2. LammTech. 2021. "What is Data Loss Prevention - All you need to know." Blog, Lammtech, January 1. Accessed 2021-01-07.
  3. Raza, Muhammad. 2020. "Data Loss Prevention & DLP Solutions Explained." Blog, BMC, November 20. Accessed 2021-01-07.
  4. Kay, Sabrina. 2019. "Cloud App Security: Masking with File Policy." Blog, Sabrinaksy, April 22. Accessed 2021-01-07.
  5. Druva. 2021. "A simple guide to data loss prevention." Blog, Druva, November 3. Accessed 2021-01-07.
  6. Yalavarthy, Misha. 2021. "Using Cloudflare for Data Loss Prevention." Blog, CloudFlare, March 24. Accessed 2021-01-07.

Cite As

Devopedia. 2022. "Data Loss Prevention." Version 26, March 28. Accessed 2023-11-13. https://devopedia.org/data-loss-prevention
Contributed by
2 authors


Last updated on
2022-03-28 17:31:26