Dark Data

Dimensions of dark data. Source: Imaginea Technologies 2019, slide 4.

Organizations typically collect, process and store lots of information in the normal course of their operations. A lot of this is done simply for compliance purposes. However, the same data could be useful to plan product roadmaps, aid business decisions or optimize operations. This is possible today due to modern techniques of machine learning and data analytics.

Dark data is simply data that's available to organizations but not being used. The term "dark" does not refer to something evil or illegal. Nor is it specifically about security or privacy. Rather, it's about data that's hidden from view, easy to ignore, and hard to access or analyse.

In physics, dark matter is thought to make up most of the matter in the universe. Similarly, dark data is a lot bigger than the data that organizations typically analyse to aid decision making.

Discussion

  • Could you explain dark data with some examples?
    Basics of dark data. Source: Accenture YouTube 2019.

    Consider a banking application for credit card approval. The focus is on customer information and eligibility. However, how the user arrived at the online form could be useful information. This data is available but is not used.

    In the medical domain, paper records are digitized and used by doctors. Since they're stored as scanned images, information retrieval or analysis systems ignore them.

    Another example is when a business acquires users or gathers feedback via different channels. Call centre logs are captured as audio files whereas website visits are captured as web server logs. This data becomes dark when new insights are inaccessible just because the data sources exist independently and can't be easily combined.

    Consider an Accounts Payable department. Dark data could include overviews in the ERP system, previous transactions, account history, CRM information, and so on. This data is not just unused but also presents a risk due to its sensitive and confidential nature.

  • What are the different types or sources of dark data?
    Some types of dark data. Source: Javanainen 2016.

    Dark data is different in each industry. Some categories of dark data include customer information, ex-employee information, log files, survey data, financial statements, notes, presentations, emails, email attachments, inactive databases, old versions of documents, call-centre transcripts, customer reviews, etc.

    Dark data has come about because organizations realize that data is valuable and storage is cheap. So they end up storing lots of it without using it properly. Data lakes and their associated technologies help store large volumes of multiformat data.

    An important reason for dark data is big data. Big data's traits (volume, variety and velocity) make it difficult for organizations to process it. So they store it and defer the processing until later. From this perspective, the term "dusty data" has been used.

    Where credibility of data is essential, data that can't be traced to its source won't be used for analysis and so becomes dark.

  • How does dark data differ from unstructured data?
    Dark data is not just unstructured data. Source: Systems Innovation 2018, 3:00.

    Often dark data is unstructured because unstructured data is harder to analyse. Unstructured data "becomes dark" when organizations don't have the know-how to analyse it and obtain insights.

    However, structured data can also be part of dark data. For example, two structured datasets might exist independently in their own silos. If they were to be combined, new insights could be obtained. Combining two datasets might not be trivial, particularly when they exist in different formats, are stored in different systems or are controlled by different teams.
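
    As a minimal sketch of combining two siloed datasets, the Python example below joins a CRM export with web visit records. The records, field names and the `normalize_id` helper are hypothetical, invented purely for illustration. It assumes the two systems format the shared customer key differently, which is one common reason such a join is not trivial.

```python
from collections import defaultdict

# Hypothetical data from two silos: a CRM export and a web visit summary.
crm_records = [
    {"customer_id": "C001", "name": "Asha", "segment": "premium"},
    {"customer_id": "C002", "name": "Ben", "segment": "basic"},
]
web_visits = [
    {"cust": "c001", "page": "/pricing"},
    {"cust": "C001 ", "page": "/cancel"},
    {"cust": "C003", "page": "/home"},
]


def normalize_id(raw: str) -> str:
    """Bring differently formatted keys to one canonical form."""
    return raw.strip().upper()


def join_silos(crm, visits):
    """Left-join CRM records with the pages each customer visited."""
    pages_by_customer = defaultdict(list)
    for visit in visits:
        pages_by_customer[normalize_id(visit["cust"])].append(visit["page"])
    return [
        {**record,
         "pages": pages_by_customer.get(normalize_id(record["customer_id"]), [])}
        for record in crm
    ]


combined = join_silos(crm_records, web_visits)
for row in combined:
    print(row["name"], row["segment"], row["pages"])
```

    In this toy data, the joined view shows a premium customer visiting the cancellation page, something neither dataset reveals on its own.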

    In general, dark data can be structured or unstructured, but both kinds share some common characteristics. Data might be redundant, obsolete or trivial. This view expands the definition of dark data. It's not just valuable data that's not used but also useless data that's simply taking up storage space.
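
    One possible way to flag redundant, obsolete or trivial data is sketched below in Python. The `StoredItem` type, the `flag_rot` function and the thresholds (three years, ten bytes) are assumptions chosen for illustration and not an established standard. Exact duplicates are detected by content hash, so near-duplicates would not be caught.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredItem:
    name: str
    content: bytes
    last_used: date


def flag_rot(items, today, max_age_days=365 * 3, min_bytes=10):
    """Return {name: set of flags} marking redundant, obsolete or trivial items."""
    flags = {item.name: set() for item in items}
    seen_digests = set()
    for item in items:
        digest = hashlib.sha256(item.content).hexdigest()
        if digest in seen_digests:
            flags[item.name].add("redundant")  # same bytes as an earlier item
        else:
            seen_digests.add(digest)
        if (today - item.last_used).days > max_age_days:
            flags[item.name].add("obsolete")   # untouched for too long
        if len(item.content) < min_bytes:
            flags[item.name].add("trivial")    # too small to carry much value
    return flags


# Hypothetical sample items.
items = [
    StoredItem("report_v1.txt", b"Quarterly sales report 2015", date(2015, 4, 1)),
    StoredItem("report_copy.txt", b"Quarterly sales report 2015", date(2019, 6, 1)),
    StoredItem("note.txt", b"ok", date(2019, 11, 1)),
]
rot = flag_rot(items, today=date(2019, 12, 12))
for name, item_flags in rot.items():
    print(name, sorted(item_flags))
```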

  • How can I make use of or analyse dark data?

    Those who manage dark data should audit it regularly and attempt to structure it. You may decide not to dump dark data, but it should be encrypted and stored in a secure manner. For unstructured data, at least attach metadata labels so that it's easier to find for future analysis. Classify or organize data in a single repository so that users can find it faster.
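
    The idea of attaching metadata labels can be sketched in Python as follows. This is a minimal sketch, assuming that labels can be derived from file extensions. The `LABELS_BY_SUFFIX` mapping, the label names and the sample files are hypothetical. A real audit would likely label by content, owner or retention policy as well.

```python
import mimetypes
import tempfile
from pathlib import Path

# Hypothetical mapping from file extension to business labels.
LABELS_BY_SUFFIX = {
    ".wav": ["call-centre", "audio"],
    ".png": ["scanned-record", "image"],
    ".log": ["web-server", "log"],
}


def build_catalogue(root, labels_by_suffix):
    """Walk `root` and return one metadata record per file found."""
    catalogue = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        mime, _ = mimetypes.guess_type(path.name)
        catalogue.append({
            "path": str(path),
            "size_bytes": stat.st_size,
            "modified": stat.st_mtime,
            "mime_type": mime or "unknown",
            "labels": labels_by_suffix.get(path.suffix.lower(), ["unlabelled"]),
        })
    return catalogue


def find_by_label(catalogue, label):
    """Return paths of all catalogued files carrying the given label."""
    return [entry["path"] for entry in catalogue if label in entry["labels"]]


# Demo on a temporary directory with a few placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "call_0001.wav").write_bytes(b"\x00\x01")
    (Path(tmp) / "scan_17.png").write_bytes(b"\x89PNG")
    (Path(tmp) / "access.log").write_text("GET /pricing 200\n")
    catalogue = build_catalogue(tmp, LABELS_BY_SUFFIX)
    audio_paths = find_by_label(catalogue, "call-centre")

print(len(catalogue), "files catalogued;", len(audio_paths), "call-centre recording(s)")
```

    Keeping the catalogue separate from the files themselves means the labels can be searched without opening or moving the underlying data.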

    Before dark data can throw up insights, you have to ask the right questions, questions that are relevant to your business. The question could be about 'What' or 'Why'. Each type of question might need a different approach.

    Tools to work with dark data include DeepDive, Snorkel, and Dark Vision. Docugami is a company that's looking at ways to enable computers to understand documents written by and for humans. They call the problem "document dysfunction".

    Working through dark data manually is almost impossible. Automation is the key. Robotic Process Automation (RPA) can be enhanced with cognitive abilities to understand dark data.

Milestones

Jul 2012

A researcher at Gartner mentions the term Dark Data in a blog post. This may well be the first mention of the term. He notes that dark data is stored "just in case it might come in handy" at a later point. To get any value out of dark data, it's important that business users ask the right questions. In subsequent literature, Gartner is credited with coining the term.

Mar 2013

Gartner Research publishes a report titled Innovation Insight: File Analysis Innovation Delivers an Understanding of Unstructured Dark Data. Two months later, Gartner publishes a separate article on Dark Data in its online glossary of information technology terms.

2015

IBM reports that 90% of IoT sensor data is not used. Moreover, 60% of this data loses its value within milliseconds. Technologies that allow us to process this data in almost real time are therefore critical. If not, we end up with dark data.

2016

A study by Veritas involving 22 countries shows that 52% of all data collected by organizations is dark. They say it's data "beneath the line of sight of senior management".

Jan 2017

Chan Zuckerberg Initiative (CZI) acquires Meta, which is a search engine focused on scientific literature. This is an effort to unlock the value in dark data.

May 2017

Apple acquires Lattice Data, which specializes in dark data. Lattice Data commercialized DeepDive, a tool developed at Stanford University. DeepDive uses machine learning to convert dark data into structured data that can be combined with existing structured data sources. DeepDive extracts relationships and makes inferences. Its development can be traced back to 2014.

2018

Some folks extend the definition of dark data to include data that's hidden behind firewalls or not accessible by search engines. Thus, dark data includes data from the deep web as well.

References

  1. Accenture YouTube. 2019. "What is Dark Data?" AI 101, Accenture, on YouTube, February 19. Accessed 2019-12-12.
  2. Bisson, Simon. 2019. "Using machine learning to solve your dark data nightmare." ZDNet, May 31. Accessed 2019-12-12.
  3. DataTree. 2019. "What Is Dark Data?" DataTree, First American, May 02. Accessed 2019-12-12.
  4. Datumize. 2019. "The Evolution of Dark Data and how you can harness it to make your business Smarter." Blog, Datumize. Accessed 2019-12-12.
  5. Dayley, Alan. 2013. "Innovation Insight: File Analysis Innovation Delivers an Understanding of Unstructured Dark Data." Gartner Research, March 28. Accessed 2019-12-12.
  6. DeepDive. 2017. "DeepDive." v0.8.0, Stanford University. Accessed 2019-12-12.
  7. Gartner. 2013. "Dark Data." Glossary, Gartner, May 07. Accessed 2019-12-12.
  8. Imaginea Technologies. 2019. "Intelligent Process Automation In The Era of Dark Data." Imaginea Technologies, on SlideShare, September 13. Accessed 2019-12-12.
  9. Javanainen, Mika. 2016. "Sink or Swim: Managing the Growing Flood of Dark Data with EIM." Blog, M-Files, September 13. Accessed 2019-12-12.
  10. Johnson, Heather. 2015. "Digging up dark data: What puts IBM at the forefront of insight economy." SiliconANGLE, October 30. Accessed 2019-12-12.
  11. Lunden, Ingrid. 2017. "Apple acquires AI company Lattice Data, a specialist in unstructured ‘dark data’, for $200M." TechCrunch, May 13. Accessed 2019-12-12.
  12. Mackey, Stephen. 2016. "The Rise of Dark Data and What It Means To Accounts Payable." Blog, Kefron, July 26. Updated 2018-03-26. Accessed 2019-12-12.
  13. Marsh, Samantha. 2019. "Dark Data – The Blind Spots in Your Analytics." Blog, iDashboards, January 30. Accessed 2019-12-12.
  14. Meehan, Mary. 2019. "What Your Data Isn't Telling You: Dark Data Presents Problems And Opportunities For Big Businesses." Forbes, June 04. Accessed 2019-12-12.
  15. Pal, Kaushik. 2015. "What is the importance of Dark Data in Big Data world?" KDnuggets, November 20. Accessed 2019-12-12.
  16. Rao, Vinay. 2018. "Extracting dark data." IBM Developer, March 08. Accessed 2019-12-12.
  17. Systems Innovation. 2018. "Dark Data Analytics." Systems Innovation, on YouTube, February 07. Accessed 2019-12-12.
  18. Taulli, Tom. 2019. "What You Need To Know About Dark Data." Forbes, October 27. Accessed 2019-12-12.
  19. Tully, Tim. 2019. "Dark Data Has Huge Potential, But Not If We Keep Ignoring It." Blog, Splunk, April 30. Accessed 2019-12-12.
  20. Wassén, Olivia. 2019. "What is dark data? And how is it costing you?" NodeGraph, April 18. Accessed 2019-12-12.
  21. White, Andrew. 2012. "Dark Data is like that furniture you have in that Dark Cupboard." Blog, Gartner, July 11. Accessed 2019-12-12.

Further Reading

  1. Kambies, Tracie, Paul Roma, Nitin Mittal, and Sandeep Kumar Sharma. 2017. "Dark analytics: Illuminating opportunities hidden within unstructured data." Deloitte Insights. Accessed 2019-12-12.
  2. Rao, Vinay. 2018. "Extracting dark data." IBM Developer, March 08. Accessed 2019-12-12.
  3. Systems Innovation. 2018. "Dark Data Analytics." Systems Innovation, on YouTube, February 07. Accessed 2019-12-12.

Cite As

Devopedia. 2020. "Dark Data." Version 4, January 6. Accessed 2023-11-12. https://devopedia.org/dark-data

Contributed by 2 authors

Last updated on 2020-01-06 09:28:50