Open Data in More Detail

Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "Open" movements such as open source, open content, and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.

Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.

A typical depiction of the need for open data:

Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery … we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.

John Wilbanks, VP Science, Creative Commons

Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use. For example, many scientists do not regard the published data arising from their work to be theirs to control, and they treat the act of publication in a journal as an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an Open spirit. Because of this uncertainty it is also possible for public or private organizations such as IEEE to aggregate said data, protect it with copyright and then resell it.

Under "Toward Open Data" Connolly (2005, v.i.) gives two quotations:

  • I want my data back. (Jon Bosak circa 1997)
  • I've long believed that customers of any application own the data they enter into it. (This quote refers to Veen's own heart-rate data.)

Major sources of open data

Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.

Open data in science

While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of Open science data, since publishing or obtaining data has become much less expensive and time-consuming.

In 2004, the Science Ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most developed countries of the world, signed a declaration which essentially states that all publicly funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.[7]

Examples of open data in science:

Open data in government

Several national governments have created web sites to distribute a portion of the data they collect. Open government data is also pursued as a collaborative project in municipal government, with the aim of creating and organizing a culture of open data. A list of over 200 local, regional and national open data catalogues is available on the open source datacatalogs.org project, which aims to be a comprehensive list of data catalogues from around the world. Prominent examples include:

Additionally, other levels of government have established open data websites. There are many government entities pursuing open data in Canada. Data.gov lists the sites of a total of 31 U.S. states, 13 cities, and more than 150 agencies and subagencies providing open data, for example the state of California, USA. The United Nations has an open data website that publishes statistical data from Member States and UN Agencies.

Arguments for and against open data

The debate on Open Data is still evolving. The best open government applications seek to empower consumers, to help small businesses, or to create value in some other positive, constructive way. Open government data is only a way-point on the road to improving education, improving government, and building tools to solve other real world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.

Arguments made on behalf of Open Data include the following:

  • "Data belong to the human race". Typical examples are genomes, data on organisms, medical science, environmental data.
  • Public money was used to fund the work and so it should be universally available.
  • The data were created by or at a government institution (this is common in US National Laboratories and government agencies).
  • Facts cannot legally be copyrighted.
  • Sponsors of research do not get full value unless the resulting data are freely available.
  • Restrictions on data re-use create an anticommons.
  • Data are required for the smooth process of running communal human activities (map data, public institutions).
  • In scientific research, the rate of discovery is accelerated by better access to data.

It is generally held that factual data cannot be copyrighted. However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.

While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.

Unlike Open Access, where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.

Arguments against making all data available as Open Data include the following:

  • Government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem).
  • Governments have to be accountable for the efficient use of taxpayers' money: if public funds are used to aggregate the data and if the data will bring commercial (private) benefits to only a small number of users, the users should reimburse governments for the cost of providing the data.
  • The revenue earned by publishing data permits non-profit organisations to fund other activities (e.g. learned society publishing supports the society).
  • The government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK).
  • Privacy concerns may require that access to data is limited to specific users or to sub-sets of the data.
  • Collecting, 'cleaning', managing and disseminating data are typically labour- and/or cost-intensive processes - whoever provides these services should receive fair remuneration for providing those services.
  • Sponsors do not get full value unless their data is used appropriately - sometimes this requires quality management, dissemination and branding efforts that can best be achieved by charging fees to users.
  • Often, targeted end-users cannot use the data without additional processing (analysis, apps etc.) - if anyone has access to the data, none may have an incentive to invest in the processing required to make data useful. (Typical examples include biological, medical, and environmental data.)

Relation to other open activities

The goals of the Open Data movement are similar to those of other "Open" movements.

This page is an adaptation of the Wikipedia entry on open data and uses the CC BY-SA license.