Open Data in More Detail
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "Open" movements such as open source, open content, and open access. The philosophy behind open data has long been established (for example, in the Mertonian tradition of science), but the term "open data" itself is recent, having gained popularity with the rise of the Internet and the World Wide Web and, especially, with the launch of open government data initiatives such as Data.gov.
Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.
A typical depiction of the need for open data:
Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery… we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.
John Wilbanks, VP Science, Creative Commons
Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use. For example, many scientists do not regard the published data arising from their work as theirs to control, and they treat publication in a journal as an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an open spirit. Because of this uncertainty, it is also possible for public or private organizations such as IEEE to aggregate such data, protect it with copyright and then resell it.
Under the heading "Toward Open Data", Connolly (2005, v.i.) gives two quotations:
- I want my data back. (Jon Bosak circa 1997)
- I've long believed that customers of any application own the data they enter into it. (This quote refers to Veen's own heart-rate data.)
Major sources of open data
Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.
Open data in science
While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of open science data, since publishing or obtaining data has become much less expensive and time-consuming.
In 2004, the science ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most of the world's developed countries, signed a declaration stating, in essence, that all publicly funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, in 2007 the OECD published the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.[7]
Examples of open data in science:
- data.uni-muenster.de - Open data about scientific artifacts from the University of Muenster, Germany. Launched in 2011.
- linkedscience.org/data - Open scientific datasets encoded as Linked Data. Launched in 2011.
Open data in government
Several national governments have created websites to distribute a portion of the data they collect. Open government data is also pursued at the municipal level, as a collaborative effort to create and organize a culture of open data. A list of over 200 local, regional and national open data catalogues is available on the open-source datacatalogs.org project, which aims to be a comprehensive list of data catalogues from around the world. Prominent examples include:
- Data.gov - U.S. government open-data website. Launched in May 2009.
- Data.gov.uk - U.K. government open-data website. Launched in September 2009.
Other levels of government have also established open data websites. Many government entities in Canada are pursuing open data. Data.gov lists the sites of a total of 31 U.S. states, 13 cities, and more than 150 agencies and subagencies providing open data, for example the state of California. The United Nations has an open data website that publishes statistical data from Member States and UN agencies.
Arguments for and against open data
The debate on open data is still evolving. The best open government applications seek to empower consumers, help small businesses, or create value in some other positive, constructive way. Open government data is only a waypoint on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically, the following discussion highlights that arguments for and against open data often depend heavily on the type of data and its potential uses.
Arguments made on behalf of Open Data include the following:
- "Data belong to the human race". Typical examples are genomes, data on organisms, medical science, environmental data.
- Public money was used to fund the work, so it should be universally available.
- It was created by or at a government institution (this is common in US National Laboratories and government agencies).
- Facts cannot legally be copyrighted.
- Sponsors of research do not get full value unless the resulting data are freely available.
- Restrictions on data re-use create an anticommons.
- Data are required for the smooth process of running communal human activities (map data, public institutions).
- In scientific research, the rate of discovery is accelerated by better access to data.
It is generally held that factual data cannot be copyrighted. However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.
While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.
Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse, and there are fewer quotes to rely on at this time.
Arguments against making all data available as Open Data include the following:
- Government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem).
- Governments have to be accountable for the efficient use of taxpayers' money: if public funds are used to aggregate the data and the data will bring commercial (private) benefits to only a small number of users, those users should reimburse governments for the cost of providing the data.
- The revenue earned by publishing data permits non-profit organisations to fund other activities (e.g. learned society publishing supports the society).
- The government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK).
- Privacy concerns may require that access to data is limited to specific users or to sub-sets of the data.
- Collecting, "cleaning", managing and disseminating data are typically labour- and/or cost-intensive processes; whoever provides these services should receive fair remuneration for them.
- Sponsors do not get full value unless their data is used appropriately; sometimes this requires quality management, dissemination and branding efforts that can best be achieved by charging fees to users.
- Often, targeted end-users cannot use the data without additional processing (analysis, apps, etc.); if anyone has access to the data, no one may have an incentive to invest in the processing required to make the data useful (typical examples include biological, medical, and environmental data).
Relation to other open activities
The goals of the Open Data movement are similar to those of other "Open" movements.
- Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
- Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
- Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
- Open research/Open science/Open science data (Linked open science) means an approach to opening and interconnecting scientific assets such as data, methods and tools with Linked Data techniques to enable transparent, reproducible and transdisciplinary research.
- Open source (software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
This page is an adaptation of the Wikipedia entry on open data, and uses the CC BY-SA license.