Open Data

The data spectrum. Source: ODI Standards 2019a.

The idea of open data is to share data freely with others. Openness also allows others to modify, reuse or redistribute the data. Openness has two facets: legal and technical. Legal openness is about applying a suitable open license to the data. Technical openness is about removing technical barriers and making it easy to access, read, store or process the data.

By opening up data, others can unlock value in the form of information and knowledge. For example, in the travel sector, locations, images, prices and reviews are data that can help us plan a holiday. Information is data within a given context. Knowledge personalizes information and helps us make decisions.

Many organizations worldwide promote open data. Open licenses, datasets and tools are available. Governments are releasing open data that citizens can use.

Discussion

  • Could you describe open data?

    A database is really about the structure and organization of data, also known as the database model. This structure is generally covered by copyright. In the context of open data, we're more concerned with the contents of the database, which we simply call data.

    Data can mean a single item or an entire collection. Particularly for factual databases, a collection can be protected but not individual items. For example, a protected collection may be about the melting point of various substances but no one can be prevented from stating a particular item, such as element E melts at temperature T.

    To understand the meaning of openness, we can refer to the Open Definition that states, "Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)."

    Open data should be accessible at a reasonable cost if not free. It should be available in bulk. There shouldn't be restrictions on who can use the data or for what purpose. Tools to use the data should be freely available and not proprietary. Data shouldn't be locked up behind passwords or firewalls.

  • Where could open data be useful?
    A still from an animation of wind speeds based on open data from the US National Weather Service. Source: Tauberer 2014, fig. 1.

    Open data can make governments more transparent. Citizens will have confidence that their money is being spent as budgeted or in implementing the right policies. For example, one activist noted that Canadian citizens used open data to save their government $3.2bn in fraudulent charitable donations. In Brazil, DataViva provides millions of interactive visualizations based on open government data.

    New business opportunities are possible. For example, once Transport for London opened their data, developers used it to build apps. Likewise, Thomson Reuters uses open data to provide better services to its customers. OpenStreetMap and Copernicus are examples that enable new GIS applications.

    In research, open data is a part of what is called Open Science. It leads to reproducible research and faster advancements. Open data also enables researchers to revalidate their own findings.

    Open data can be used to protect the environment. Some apps that do this include mWater, Save the Rain and Ecofacts.

  • Could you mention some sources of open data?
    Types of data that could be opened. Source: James 2013.

    There are many curated lists on the web for open data. From some of these, we mention a few useful sources of open data by category:

  • Which are some organizations working with or for open data?

    We mention a few organizations:

    • Open Data Institute: Works with companies and governments to build an open, trustworthy data ecosystem, where people can make better decisions using data and manage any harmful impacts.
    • Open Data Commons: Provides a set of legal tools to publish and use open data. They've published open licenses applicable to data.
    • Open Knowledge Foundation: A worldwide network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge. It was briefly called Open Knowledge International. The Open Definition is one of their projects.
    • Open Data Charter: A collaboration between governments and organisations working to open up data based on a shared set of principles.
  • Why and what aspects of open data should we standardize?
    Aspects of open data that can be standardized. Source: Dodds 2018.

    Data is more valuable if we can combine two different datasets to obtain new insights. Interoperability is the key. Given diverse systems, tools and data formats, interoperability can be almost impossible.

    Without standards, it becomes more difficult for us to publish, access or share data effectively. Standards also make it easier to repeat processes, compare results and reach a shared understanding. Moreover, we need open standards that are available to the public and are defined through collaboration and consensus.

    Open standards should define a common data model, streamline the data pipeline, make it easy to combine data, and promote a common understanding.

    Open Data Institute's page on standards is a useful resource to learn more. A checklist for selecting a suitable standard looks at the licensing, project needs, maintenance of the standard, and guidance on usage.
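
    As a rough sketch of why a common data model helps interoperability, the Python below combines two datasets that describe the same places under different column names. Everything in it is an assumption made for illustration: the place identifiers, the values, the column names and the `COLUMN_MAP` mapping are invented and don't come from any published standard.

```python
import csv
import io

# Two hypothetical publishers describe the same places with different headers.
# All identifiers and values below are invented for illustration.
POPULATION_CSV = """place_code,place,population
A1,Alphaville,1000
B2,Betatown,2500
"""

AIR_QUALITY_CSV = """PlaceID,pm25
A1,11.4
B2,14.7
C3,12.1
"""

# Assumed common data model: every record is keyed by "place_id".
COLUMN_MAP = {"place_code": "place_id", "PlaceID": "place_id", "place": "name"}


def read_normalized(text):
    """Parse CSV text and rename columns to the shared model."""
    rows = csv.DictReader(io.StringIO(text))
    return [{COLUMN_MAP.get(k, k): v for k, v in row.items()} for row in rows]


def combine(left, right, key="place_id"):
    """Inner join: keep records whose key appears in both datasets."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


combined = combine(read_normalized(POPULATION_CSV), read_normalized(AIR_QUALITY_CSV))
for record in combined:
    print(record)
```

    Once both publishers agree on the key and its meaning, the join itself is trivial. Without that agreement, the mapping step has to be rediscovered for every pair of datasets.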

  • What sort of licensing should I adopt when opening my data?
    Open data licenses that conform to Open Definition. Source: Open Definition 2019a.

    Releasing your data without a license creates uncertainty. In some jurisdictions, data lacking explicit permission may be protected by intellectual property rights. It's therefore better to attach a license.

    Aspects that define a license include public domain, attribution, share-alike, non-commercial, database only and no derivatives.

    There are a number of licenses that conform to the Open Definition. Creative Commons CC0 is an example. It releases the data into the public domain. Anyone can copy, modify and distribute, even for commercial purposes, without asking permission. A similar license is Open Data Commons Public Domain Dedication and Licence (PDDL). Other conformant but less reusable licenses are data licenses from governments (Germany, UK, Canada, Taiwan). UK's Open Government Licence is an example.

    Two other licenses worth looking at come from the Linux Foundation: CDLA-Sharing-1.0 and CDLA-Permissive-1.0, where CDLA refers to Community Data License Agreement.

    Open Data Commons Open Database License (ODC-ODbL) is seen as a "viral license". If you make changes to the dataset and release the result, you're required to release it under this same license.
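
    The license aspects mentioned above can be modelled as simple flags. The Python below is a simplified sketch, not a reading of the legal texts: the `License` class, the `obligations` helper and the wording of each duty are hypothetical, and only three licenses named in this article are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """A hypothetical, simplified view of a data license as a set of flags."""
    name: str
    public_domain: bool = False
    attribution: bool = False
    share_alike: bool = False


# Only licenses named in the article; the flags are a simplification.
CC0 = License("CC0-1.0", public_domain=True)
PDDL = License("PDDL-1.0", public_domain=True)
ODBL = License("ODbL-1.0", attribution=True, share_alike=True)


def obligations(sources):
    """List what a derived dataset must do, given the licenses of its sources."""
    duties = []
    for lic in sources:
        if lic.attribution:
            duties.append(f"credit the source licensed under {lic.name}")
        if lic.share_alike:
            duties.append(f"release the derived database under {lic.name}")
    return duties


print(obligations([CC0, PDDL]))
print(obligations([CC0, ODBL]))
```

    Combining only public domain sources leaves no duties, while adding a single share-alike source carries its terms over to the derived dataset. This is the "viral" effect described above.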

  • What are some challenges with open data?

    Publishers continue to use customized licenses. This makes it hard to reuse data and makes licenses incompatible across datasets. Instead, publishers should use standardized open licenses. Ambiguous or redundant clauses cause confusion. Licenses are often not clear about the data to which they apply. Data is often not linked to legal terms.

    Data is hard to find. Sometimes its location changes. A platform such as CKAN might help.

    Data could be misinterpreted, which could result in a wrong decision. This creates a fear of accountability and prevents producers from opening their datasets.

    Quality of data is another concern. Data should be machine-readable and in raw form. Publishing data in HTML or PDF is not research friendly. For context and interpretation, metadata should be shared.
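
    One way to publish raw, machine-readable data together with its metadata is sketched below in Python. The observations are invented, and the metadata layout (`title`, `license`, `columns` with `type` and `unit`) is a hypothetical structure chosen for illustration rather than a specific metadata standard.

```python
import csv
import io
import json

# Invented observations; in practice these would come from the publisher's system.
rows = [
    {"station": "S1", "date": "2020-01-01", "rainfall_mm": 4.2},
    {"station": "S2", "date": "2020-01-01", "rainfall_mm": 0.0},
]

# Hypothetical metadata layout: states the license, columns, types and units.
metadata = {
    "title": "Daily rainfall (example)",
    "license": "CC0-1.0",
    "columns": [
        {"name": "station", "type": "string"},
        {"name": "date", "type": "string", "format": "YYYY-MM-DD"},
        {"name": "rainfall_mm", "type": "number", "unit": "millimetre"},
    ],
}


def to_csv(records, meta):
    """Serialize records as CSV with the column order taken from the metadata."""
    buffer = io.StringIO()
    fieldnames = [col["name"] for col in meta["columns"]]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text, meta):
    """Parse CSV text and convert numeric columns as declared in the metadata."""
    numeric = {col["name"] for col in meta["columns"] if col["type"] == "number"}
    parsed = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed.append({k: float(v) if k in numeric else v for k, v in row.items()})
    return parsed


csv_text = to_csv(rows, metadata)
metadata_json = json.dumps(metadata, indent=2)
print(csv_text)
print(metadata_json)
```

    A consumer who receives both pieces can rebuild typed records with `from_csv(csv_text, json.loads(metadata_json))` without guessing units or column meanings, which isn't possible when the same table is published only as HTML or PDF.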

    Raw data is good but also requires advanced technical skills and domain knowledge. We therefore need to build better data literacy. AI algorithms are being used to analyse data but many are black-box models. They also promote data centralization and control, which are at odds with the open data movement.

    Data infrastructure must be of consistent quality. Data can also be biased by gender or against minority groups.

Milestones

1942

The concept of open data starts with Robert King Merton, one of the fathers of the sociology of science. He explains how freely sharing scientific research and results can stimulate growth and innovation.

1994

In the US, the Government Printing Office (GPO) goes online and opens a few government-related documents. This is an early example of open government data at a time when the Internet is becoming popular. Historically and legally, the idea can be traced back to the Freedom of Information Act of 1966.

2005

The Open Knowledge Foundation creates the Open Definition. This is based on established principles from the open source movement for software. This definition is later translated into more than 30 languages. In November 2015, Open Definition 2.1 is released.

Feb 2006

At a TED talk, Hans Rosling presents compelling visuals of global trends in health and economics. Using publicly available datasets from different sources, he debunks myths about the developing world. He makes a case for governments and organizations to open up their data. Data must be enabled using design tools. His mantra is to animate and liberate data from closed databases. Data must also be searchable.

Dec 2007

Thirty individuals meet at Sebastopol, California to discuss open public data. Many of them come from the culture of the free and open source software movement. This event can be seen as a convergence of many past US and European efforts in open data. They identify eight principles: complete, primary, timely, accessible, machine processable, non-discriminatory, non-proprietary, and license-free.

Feb 2009

At a TED talk, Tim Berners-Lee gets people to chant "Raw data, now." He makes reference to Rosling's talk of 2006 and adds that it's important to link data from different sources. He calls this Linked Data. He mentions DBpedia, which takes data from Wikipedia and connects them up.

May 2009

The US government launches Data.gov with 47 datasets. About five years later, it has about 100,000 datasets.

2010

Many governments attempt to open their data to the public but there are concerns about privacy. It's only in 2015 that privacy principles become an essential part of discussions on open data. It becomes clear that providing raw data is not always possible. Governments must balance potential benefits to the public against the privacy rights of individuals.

2012

Guillermo Moncecchi of Open Knowledge International writes that beyond transparency, open data is also about building a public data infrastructure. While the focus during 2007-2009 was on data transparency, during 2010-2016 focus shifts to public infrastructure. Consider data about street light locations. Citizens can use this data to solve problems on their own. Metrics change from number of published datasets to APIs and reuse.

Jun 2017
Top five ranking positions of open data from governments. Source: OKFN 2017.

Open Knowledge Foundation publishes the Global Open Data Index (GODI). This shows that only 38% of government data is really open. This is based on Open Definition 2.1. A later update available online shows that only 166 of 1410 datasets (11%) are open. The report gives a number of recommendations to governments and policy makers to make their data more open.

Jan 2018

Open Data Charter replaces their earlier ambitious call of "open by default" with a more practical "publish with purpose". Rather than getting governments to open up as much data as possible quickly, the intent is to get them to take small steps in that direction. Governments can open datasets with a clear view of tangible benefits to citizens.

Feb 2018
Open data powers this visualization of traffic congestion. Source: Tjukanov 2018a.

Using open data exposed by Here.com via an API, Tjukanov uses visualization to show average traffic congestion in many of the world's largest cities. Interesting patterns emerge, plotted within a radius of 50km from the city center. He uses QGIS, an open source GIS platform. This is an example of the value that open data plus open source can unlock.

Jan 2020
Organograms of UK government departments. Source: Cook 2020.

Using open data collated by the Institute for Government from data.gov.uk, Peter Cook produces interesting organograms, which are visualizations of organization structures. He does this for many UK government departments. Clicking on individual data points shows name, job title, salary range and other details. This sort of data has been available (though patchy) since March 2011.

References

  1. Berners-Lee, Tim. 2009. "The next web." TED, February. Accessed 2019-12-14.
  2. Calderon, Ania. 2018. "Publishing with Purpose." Open Data Charter, on Medium, January 29. Accessed 2019-12-14.
  3. Chau, Aries. 2017. "70 Amazing Free Data Sources You Should Know." KDnuggets, December. Accessed 2019-12-14.
  4. Chignard, Simon. 2013. "A brief history of Open Data." Paris Innovation Review, March 29. Accessed 2019-12-14.
  5. Cook, Peter. 2020. "Organograms." Accessed 2020-01-15.
  6. Creative Commons. 2019. "CC0 1.0 Universal (CC0 1.0) Public Domain Dedication." Creative Commons. Accessed 2019-12-14.
  7. Crystal, Madison. 2019. "Building the foundation for future research through Open data, code and protocols." Blog, PLOS, December 11. Accessed 2019-12-14.
  8. data.world. 2019. "Common license types for datasets." data.world Help Center. Accessed 2019-12-14.
  9. Davies, Tim, Stephen B. Walker, Mor Rubinstein, and Fernando Perini, eds. 2019. "The State of Open Data: Histories and Horizons." African Minds, IDRC, May 05. Accessed 2019-12-14.
  10. de Milliano, Sabine. 2017. "Why Open Data is not as Simple as it Seems." GiS Professional, October 11. Accessed 2019-12-14.
  11. Dodds, Leigh. 2018. "Announcing the open standards for data guidebook." Open Data Institute, April 13. Accessed 2019-12-14.
  12. European Commission. 2019. "What is open data?" Module 1, Open Data, European Data Portal, European Commission. Accessed 2019-12-14.
  13. Freeguard, Gavin. 2017. "Hacking organograms: unlocking government data." Blog, Institute for Government, July 5. Accessed 2020-01-15.
  14. Fretwell, Luke. 2014. "A brief history of open data." FCW, June 09. Accessed 2019-12-14.
  15. Hare, Jason. 2017. "Open Data Anniversary: Ten Years after the Sebastopol Meeting." Blog, Opendatasoft, December 07. Accessed 2019-12-14.
  16. James, Laura. 2013. "Defining Open Data." Blog, OKFN, October 03. Accessed 2019-12-14.
  17. Lämmerhirt, Danny. 2017. "The state of open licensing in 2017." Blog, Open Knowledge Foundation, June 08. Accessed 2019-12-14.
  18. Lämmerhirt, Danny, Mor Rubinstein, and Oscar Montiel. 2017. "The State of Open Government Data in 2017." Open Knowledge International, June. Accessed 2019-12-14.
  19. Marr, Bernard. 2018. "Big Data And AI: 30 Amazing (And Free) Public Data Sources For 2018." Forbes, February 26. Accessed 2019-12-14.
  20. ODC. 2009. "ODC Open Database License (ODbL) Summary." Open Data Commons, September 12. Updated 2016-07-06. Accessed 2019-12-14.
  21. ODC. 2019. "Homepage." Open Data Commons. Accessed 2019-12-14.
  22. ODI. 2019. "Homepage." Open Data Institute. Accessed 2019-12-14.
  23. ODI Standards. 2019a. "What are open standards for data?" Open Standards for Data, Open Data Institute. Accessed 2019-12-14.
  24. ODI Standards. 2019b. "A checklist for choosing open standards." Open Standards for Data, Open Data Institute. Accessed 2019-12-14.
  25. OKFN. 2017. "Place overview." Open Knowledge Foundation. Accessed 2019-12-14.
  26. OKFN. 2019. "About." Open Knowledge Foundation. Accessed 2019-12-14.
  27. Open Data Charter. 2019. "Homepage." Accessed 2019-12-14.
  28. Open Data Handbook. 2019. "What is Open Data?" Open Data Handbook. Accessed 2019-12-14.
  29. Open Definition. 2019a. "Conformant Licenses." Open Definition. Accessed 2019-12-14.
  30. Open Definition. 2019b. "Guide to Open Data Licensing." Open Definition. Accessed 2019-12-14.
  31. Open Definition. 2019c. "Homepage." Open Definition. Accessed 2019-12-14.
  32. Pickell, Devin. 2019. "50 Best Open Data Sources Ready to be Used Right Now." G2, March 15. Accessed 2019-12-14.
  33. Pollock, Rufus. 2015. "Announcement – Open Definition 2.1." Blog, OKFN, November 10. Accessed 2019-12-14.
  34. Resource.org. 2007. "Open Government Data Principles." Resource.org, December 7-8. Accessed 2019-12-14.
  35. Rosling, Hans. 2006. "The best stats you've ever seen." TED, February. Accessed 2019-12-14.
  36. Tauberer, Joshua. 2014. "History of the Movement." In: Open Government Data: The Book, Second Edition, August. Accessed 2019-12-14.
  37. Tjukanov, Topi. 2018a. "Urban forms." Accessed 2019-12-14.
  38. Tjukanov, Topi. 2018b. "How much area you can cover by car in one hour? A comparison between European capitals." Twitter, February 04. Accessed 2019-12-14.
  39. Wikipedia. 2019. "Open Knowledge Foundation." Wikipedia, September 23. Accessed 2019-12-14.
  40. World Bank. 2019. "Open Data Essentials." The World Bank, July 24. Accessed 2019-12-14.
  41. Zakon, Alicia. 2016. "A Brief History of Open Government." IdeaScale, December 07. Accessed 2019-12-14.

Further Reading

  1. Dodds, Leigh. 2013. "Publisher’s Guide to Open Data Licensing." Open Data Institute, December 15. Accessed 2019-12-19.
  2. Chignard, Simon. 2013. "A brief history of Open Data." Paris Innovation Review, March 29. Accessed 2019-12-14.
  3. Lämmerhirt, Danny, Mor Rubinstein, and Oscar Montiel. 2017. "The State of Open Government Data in 2017." Open Knowledge International, June. Accessed 2019-12-14.
  4. Open Definition
  5. Open Data Handbook
  6. Data Ethics Canvas


Cite As

Devopedia. 2020. "Open Data." Version 4, January 15. Accessed 2020-11-24. https://devopedia.org/open-data
Contributed by
2 authors


Last updated on
2020-01-15 17:13:43