Open Data
Summary
The idea of open data is to share data freely with others. Openness also allows others to modify, reuse or redistribute the data. Openness has two facets: legal and technical. Legal openness is about applying a suitable open license to the data. Technical openness is about removing technical barriers and making it easy to access, read, store or process the data.
By opening up data, others can unlock value in the form of information and knowledge. For example, in the travel sector, locations, images, prices and reviews are data that can help us plan a holiday. Information is data within a given context. Knowledge personalizes information and helps us make decisions.
Many organizations worldwide promote open data. Open licenses, datasets and tools are available. Governments are releasing open data that citizens can use.
Discussion
- Could you describe open data?
A database is really about the structure and organization of data, also known as the database model. This is generally covered by copyright. In the context of open data, we're more concerned with the contents of the database, which we simply call data.
Data can mean a single item or an entire collection. Particularly for factual databases, a collection can be protected but not individual items. For example, a protected collection may be about the melting point of various substances but no one can be prevented from stating a particular item, such as element E melts at temperature T.
To understand the meaning of openness, we can refer to the Open Definition that states, "Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)."
Open data should be accessible at a reasonable cost if not free. It should be available in bulk. There shouldn't be restrictions on who may use the data or for what purpose. Tools to use the data should be freely available and not proprietary. Data shouldn't be locked up behind passwords or firewalls.
- Where could open data be useful?
Open data can make governments more transparent. Citizens will have confidence that their money is being spent as budgeted or in implementing the right policies. For example, one activist noted that Canadian citizens used open data to save their government $3.2bn in fraudulent charitable donations. In Brazil, DataViva provides millions of interactive visualizations based on open government data.
New business opportunities are possible. For example, once Transport for London opened their data, developers used it to build apps. Likewise, Thomson Reuters uses open data to provide better services to its customers. OpenStreetMap and Copernicus are examples that enable new GIS applications.
In research, open data is a part of what is called Open Science. It leads to reproducible research and faster advancements. Open data also enables researchers to revalidate their own findings.
Open data can be used to protect the environment. Some apps that do this include mWater, Save the Rain and Ecofacts.
- Could you mention some sources of open data?
There are many curated lists on the web for open data. From some of these, we mention a few useful sources of open data by category:
- General: DBpedia, Datasets Subreddit, Kaggle, FiveThirtyEight, MS MARCO
- Government: Data.gov, Data.gov.uk, European Union Open Data Portal
- Economy: World Bank Open Data, Global Financial Data, International Monetary Fund
- Business: OpenCorporates, Yellowpages, EU-Startups, Glassdoor
- Health & Science: World Health Organization, HealthData.gov, NHS Digital, Open Science Data Cloud, NASA Earth Data, LondonAir
- Research: Google Scholar, Pew Research Center, OpenLibrary Data Dumps, CERN Open Data
- Environment: Climate Data Online, IEA Atlas of Energy
- Which are some organizations working with or for open data?
We mention a few organizations:
- Open Data Institute: Works with companies and governments to build an open, trustworthy data ecosystem, where people can make better decisions using data and manage any harmful impacts.
- Open Data Commons: Provides a set of legal tools to publish and use open data. They've published open licenses applicable to data.
- Open Knowledge Foundation: A worldwide network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge. It was briefly called Open Knowledge International. The Open Definition is one of their projects.
- Open Data Charter: A collaboration between governments and organisations working to open up data based on a shared set of principles.
- Why and what aspects of open data should we standardize?
Data is more valuable if we can combine two different datasets to obtain new insights. Interoperability is the key. Given diverse systems, tools and data formats, interoperability can be almost impossible.
Without standards, it becomes more difficult for us to publish, access or share data effectively. Standards also make it easier to repeat processes, compare results and reach a shared understanding. Moreover, we need open standards that are available to the public and are defined through collaboration and consensus.
Open standards should define a common data model, streamline the data pipeline, make it easy to combine data, and promote a common understanding.
The Open Data Institute's page on standards is a useful resource to learn more. A checklist for selecting a suitable standard looks at the licensing, project needs, maintenance of the standard, and guidance on usage.
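To illustrate why a common data model makes datasets easy to combine, here is a minimal Python sketch. It assumes two publishers both identify monitoring stations with the same `station_id` column. The column names, station codes and values are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical CSV extracts from two different publishers. Both use the
# same identifier column ("station_id"), which is what lets us combine them.
AIR_QUALITY_CSV = """station_id,no2_ugm3
S1,41.5
S2,28.0
S3,35.2
"""

STATIONS_CSV = """station_id,borough
S1,Camden
S2,Hackney
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Inner-join two lists of row dictionaries on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


combined = join_on(read_rows(AIR_QUALITY_CSV), read_rows(STATIONS_CSV), "station_id")
for row in combined:
    print(row["station_id"], row["borough"], row["no2_ugm3"])
```

Station S3 is dropped because the second dataset has no matching record. If the two publishers had used different identifiers or column names, the join would need a hand-built mapping first, which is the kind of effort shared standards aim to avoid.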
- What sort of licensing should I adopt when opening my data?
Releasing your data without a license creates uncertainty. In some jurisdictions, data lacking explicit permission may be protected by intellectual property rights. It's therefore better to attach a license.
Aspects that define a license include public domain, attribution, share-alike, non-commercial, database only and no derivatives.
There are a number of licenses that conform to the Open Definition. Creative Commons CC0 is an example. It releases the data into the public domain. Anyone can copy, modify and distribute, even for commercial purposes, without asking permission. A similar license is Open Data Commons Public Domain Dedication and Licence (PDDL). Other conformant but less reusable licenses are data licenses from governments (Germany, UK, Canada, Taiwan). The UK's Open Government Licence is an example.
Two other licenses worth looking at come from the Linux Foundation: CDLA-Sharing-1.0 and CDLA-Permissive-1.0, where CDLA refers to Community Data License Agreement.
Open Data Commons Open Database License (ODC-ODbL) is seen as a "viral license". If you make changes to the dataset, you're required to release them under the same license.
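The license aspects listed above can be modelled as simple flags. The following Python sketch is a deliberate simplification and not a legal test: it assumes that, in line with the Open Definition quoted earlier, attribution and share-alike conditions are acceptable while non-commercial and no-derivatives restrictions are not. The `License` class, the `is_open` helper, the flag values and the "Custom-NC" license are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    """A license described by a few of the aspects named in the text."""
    name: str
    attribution: bool = False
    share_alike: bool = False
    non_commercial: bool = False
    no_derivatives: bool = False


def is_open(lic):
    """Rough check (an assumption of this sketch): restrictions on purpose
    or on modification conflict with 'use, modify, and share for any purpose'."""
    return not (lic.non_commercial or lic.no_derivatives)


# Flag values are illustrative simplifications of each license.
CC0 = License("CC0")  # public domain dedication, no conditions
ODBL = License("ODC-ODbL", attribution=True, share_alike=True)
CUSTOM_NC = License("Custom-NC", attribution=True, non_commercial=True)  # hypothetical

for lic in (CC0, ODBL, CUSTOM_NC):
    print(lic.name, "open" if is_open(lic) else "not open")
```

A real assessment should rely on the list of conformant licenses maintained by the Open Definition project rather than on a flag check like this one.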
- What are some challenges with open data?
Publishers continue to use customized licenses. This makes data hard to reuse and makes licenses incompatible across datasets. Instead, publishers should use standardized open licenses. Ambiguous or redundant clauses cause confusion. Licenses often are not clear about the data to which they apply. Data is often not linked to legal terms.
Data is hard to find. Sometimes its location changes. A platform such as CKAN might help.
Data could be misinterpreted, which could result in a wrong decision. This creates a fear of accountability and prevents producers from opening their datasets.
Quality of data is another concern. Data should be machine-readable and in raw form. Publishing data in HTML or PDF is not research friendly. For context and interpretation, metadata should be shared.
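As a rough illustration of publishing machine-readable raw data together with metadata, the Python sketch below serializes a few records to CSV and describes them in a JSON document. The readings, column names and metadata fields are hypothetical. The metadata layout is not taken from any particular standard.

```python
import csv
import io
import json

# Hypothetical raw readings; a real publisher would export these from
# its own systems.
rows = [
    {"date": "2019-12-01", "site": "A", "pm10_ugm3": 18.2},
    {"date": "2019-12-02", "site": "A", "pm10_ugm3": 21.7},
]


def to_csv(records):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative metadata: what each column means, its units, and the license.
metadata = {
    "title": "Daily PM10 readings (example)",
    "license": "CC0-1.0",
    "columns": {
        "date": "Calendar date in ISO 8601 format",
        "site": "Monitoring site code",
        "pm10_ugm3": "PM10 concentration in micrograms per cubic metre",
    },
}

csv_text = to_csv(rows)
metadata_text = json.dumps(metadata, indent=2)

print(csv_text)
print(metadata_text)
```

Unlike a table embedded in HTML or PDF, the CSV can be parsed directly by standard tools, and the metadata tells a reuser what each column means and under which terms the data is shared.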
Raw data is good but also requires advanced technical skills and domain knowledge. We therefore need to build better data literacy. AI algorithms are being used to analyse data but many are black-box models. They also promote data centralization and control, which are at odds with the open data movement.
Data infrastructure must be of consistent quality. Data can also be biased by gender or against minority groups.
Milestones
In the US, the Government Printing Office (GPO) goes online and opens a few government-related documents. This is an early example of open government data at a time when the Internet was becoming popular. Historically and legally, the idea can be traced back to the Freedom of Information Act of 1966.
The Open Knowledge Foundation creates the Open Definition. This is based on established principles from the open source movement for software. This definition is later translated into more than 30 languages. In November 2015, Open Definition 2.1 is released.
2006
At a TED talk, Hans Rosling presents compelling visuals of global trends in health and economics. Using publicly available datasets from different sources, he debunks myths about the developing world. He makes a case for governments and organizations to open up their data. Data must be enabled using design tools. His mantra is to animate and liberate data from closed databases. Data must also be searchable.
2007
Thirty individuals meet in Sebastopol, California to discuss open public data. Many of them come from the culture of the free and open source software movement. This event can be seen as a convergence of many past US and European efforts in open data. They identify eight principles: complete, primary, timely, accessible, machine processable, non-discriminatory, non-proprietary, and license-free.
2009
Many governments attempt to open their data to the public but there are concerns about privacy. It's only in 2015 that privacy principles become an essential part of discussions on open data. It becomes clear that providing raw data is not always possible. Governments must balance potential benefits to the public against the privacy rights of individuals.
Guillermo Moncecchi of Open Knowledge International writes that beyond transparency, open data is also about building a public data infrastructure. While the focus during 2007-2009 was on data transparency, during 2010-2016 the focus shifts to public infrastructure. Consider data about street light locations. Citizens can use this data to solve problems on their own. Metrics change from the number of published datasets to APIs and reuse.
2017
Open Knowledge Foundation publishes the Global Open Data Index (GODI). This shows that only 38% of government data is really open. This is based on Open Definition 2.1. A later update available online shows that only 166 of 1410 datasets (11%) are open. The report gives a number of recommendations to governments and policy makers to make their data more open.
2018
Open Data Charter replaces their earlier ambitious call of "open by default" with a more practical "publish with purpose". Rather than getting governments to open up as much data as possible quickly, the intent is to get them to take small steps in that direction. Governments can open datasets with a clear view of tangible benefits to citizens.
2018
Using open data exposed by Here.com via an API, Tjukanov uses visualization to show average traffic congestion in many of the world's largest cities. Interesting patterns emerge, plotted within a radius of 50km from the city center. He uses QGIS, an open source GIS platform. This is an example of the value that open data plus open source can unlock.
2020
Using open data collated by the Institute for Government from data.gov.uk, Peter Cook produces interesting organograms, which are visualizations of organization structures. He does this for many UK government departments. Clicking on individual data points shows name, job title, salary range and other details. This sort of data has been available (though patchy) since March 2011.
References
- Berners-Lee, Tim. 2009. "The next web." TED, February. Accessed 2019-12-14.
- Calderon, Ania. 2018. "Publishing with Purpose." Open Data Charter, on Medium, January 29. Accessed 2019-12-14.
- Chau, Aries. 2017. "70 Amazing Free Data Sources You Should Know." KDnuggets, December. Accessed 2019-12-14.
- Chignard, Simon. 2013. "A brief history of Open Data." Paris Innovation Review, March 29. Accessed 2019-12-14.
- Cook, Peter. 2020. "Organograms." Accessed 2020-01-15.
- Creative Commons. 2019. "CC0 1.0 Universal (CC0 1.0) Public Domain Dedication." Creative Commons. Accessed 2019-12-14.
- Crystal, Madison. 2019. "Building the foundation for future research through Open data, code and protocols." Blog, PLOS, December 11. Accessed 2019-12-14.
- Davies, Tim, Stephen B. Walker, Mor Rubinstein, and Fernando Perini, eds. 2019. "The State of Open Data: Histories and Horizons." African Minds, IDRC, May 05. Accessed 2019-12-14.
- Dodds, Leigh. 2018. "Announcing the open standards for data guidebook." Open Data Institute, April 13. Accessed 2019-12-14.
- European Commission. 2019. "What is open data?" Module 1, Open Data, European Data Portal, European Commission. Accessed 2019-12-14.
- Freeguard, Gavin. 2017. "Hacking organograms: unlocking government data." Blog, Institute for Government, July 5. Accessed 2020-01-15.
- Fretwell, Luke. 2014. "A brief history of open data." FCW, June 09. Accessed 2019-12-14.
- Hare, Jason. 2017. "Open Data Anniversary: Ten Years after the Sebastopol Meeting." Blog, Opendatasoft, December 07. Accessed 2019-12-14.
- James, Laura. 2013. "Defining Open Data." Blog, OKFN, October 03. Accessed 2019-12-14.
- Lämmerhirt, Danny. 2017. "The state of open licensing in 2017." Blog, Open Knowledge Foundation, June 08. Accessed 2019-12-14.
- Lämmerhirt, Danny, Mor Rubinstein, and Oscar Montiel. 2017. "The State of Open Government Data in 2017." Open Knowledge International, June. Accessed 2019-12-14.
- Marr, Bernard. 2018. "Big Data And AI: 30 Amazing (And Free) Public Data Sources For 2018." Forbes, February 26. Accessed 2019-12-14.
- ODC. 2009. "ODC Open Database License (ODbL) Summary." Open Data Commons, September 12. Updated 2016-07-06. Accessed 2019-12-14.
- ODC. 2019. "Homepage." Open Data Commons. Accessed 2019-12-14.
- ODI. 2019. "Homepage." Open Data Institute. Accessed 2019-12-14.
- ODI Standards. 2019a. "What are open standards for data?" Open Standards for Data, Open Data Institute. Accessed 2019-12-14.
- ODI Standards. 2019b. "A checklist for choosing open standards." Open Standards for Data, Open Data Institute. Accessed 2019-12-14.
- OKFN. 2017. "Place overview." Open Knowledge Foundation. Accessed 2019-12-14.
- OKFN. 2019. "About." Open Knowledge Foundation. Accessed 2019-12-14.
- Open Data Charter. 2019. "Homepage." Accessed 2019-12-14.
- Open Data Handbook. 2019. "What is Open Data?" Open Data Handbook. Accessed 2019-12-14.
- Open Definition. 2019a. "Conformant Licenses." Open Definition. Accessed 2019-12-14.
- Open Definition. 2019b. "Guide to Open Data Licensing." Open Definition. Accessed 2019-12-14.
- Open Definition. 2019c. "Homepage." Open Definition. Accessed 2019-12-14.
- Pickell, Devin. 2019. "50 Best Open Data Sources Ready to be Used Right Now." G2, March 15. Accessed 2019-12-14.
- Pollock, Rufus. 2015. "Announcement – Open Definition 2.1." Blog, OKFN, November 10. Accessed 2019-12-14.
- Resource.org. 2007. "Open Government Data Principles." Resource.org, December 7-8. Accessed 2019-12-14.
- Rosling, Hans. 2006. "The best stats you've ever seen." TED, February. Accessed 2019-12-14.
- Tauberer, Joshua. 2014. "History of the Movement." In: Open Government Data: The Book, Second Edition, August. Accessed 2019-12-14.
- Tjukanov, Topi. 2018a. "Urban forms." Accessed 2019-12-14.
- Tjukanov, Topi. 2018b. "How much area you can cover by car in one hour? A comparison between European capitals." Twitter, February 04. Accessed 2019-12-14.
- Wikipedia. 2019. "Open Knowledge Foundation." Wikipedia, September 23. Accessed 2019-12-14.
- World Bank. 2019. "Open Data Essentials." The World Bank, July 24. Accessed 2019-12-14.
- Zakon, Alicia. 2016. "A Brief History of Open Government." IdeaScale, December 07. Accessed 2019-12-14.
- data.world. 2019. "Common license types for datasets." data.world Help Center. Accessed 2019-12-14.
- de Milliano, Sabine. 2017. "Why Open Data is not as Simple as it Seems." GiS Professional, October 11. Accessed 2019-12-14.
Further Reading
- Dodds, Leigh. 2013. "Publisher’s Guide to Open Data Licensing." Open Data Institute, December 15. Accessed 2019-12-19.
- Chignard, Simon. 2013. "A brief history of Open Data." Paris Innovation Review, March 29. Accessed 2019-12-14.
- Lämmerhirt, Danny, Mor Rubinstein, and Oscar Montiel. 2017. "The State of Open Government Data in 2017." Open Knowledge International, June. Accessed 2019-12-14.
- Open Definition
- Open Data Handbook
- Data Ethics Canvas
See Also
- Open Data Maturity Model
- Linked Open Data
- Semantic Web
- Structured vs Unstructured Data
- Big Data
- Dark Data