Glossary

Open Data

Data that anyone can access, use, or share. (Source)

Open access

Open access (OA) refers to free, unrestricted online access to research outputs such as journal articles and books. OA content is open to all, with no access fees. (Source)

FAIR data principles

The FAIR Data Principles are a set of guiding principles for making data findable, accessible, interoperable and reusable (Wilkinson et al., 2016). These principles provide guidance for scientific data management and stewardship and are relevant to all stakeholders in the current digital ecosystem. They directly address data producers and data publishers, with the aim of promoting maximum use of research data. (Source)

5 Star Open Data

A rating system for open data proposed by Tim Berners-Lee, founder of the World Wide Web. To score the maximum five stars, data must: 1) be available on the Web under an open licence; 2) be in the form of structured data; 3) be in a non-proprietary file format; 4) use URIs as its identifiers (see also RDF); and 5) include links to other data sources (see linked data). The criteria are cumulative, so to score 3 stars, data must satisfy all of (1)-(3)... (Source)
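Because the criteria are cumulative, a dataset's star count is the length of the unbroken run of satisfied criteria starting from (1). The Python sketch below assumes a dataset is described by a dictionary of yes/no flags; the flag names are hypothetical labels for the five criteria, not part of any official scheme.

```python
# Hypothetical flag names, listed in the order of the five-star criteria.
CRITERIA = (
    "open_licence_on_web",     # 1) available on the Web under an open licence
    "structured",              # 2) structured data
    "non_proprietary_format",  # 3) non-proprietary file format
    "uses_uris",               # 4) URIs as identifiers
    "links_to_other_data",     # 5) links to other data sources
)

def star_rating(dataset: dict) -> int:
    """Count consecutive satisfied criteria, starting from the first.

    Stars are cumulative, so the count stops at the first unmet criterion.
    """
    stars = 0
    for criterion in CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars

csv_on_portal = {"open_licence_on_web": True, "structured": True, "non_proprietary_format": True}
print(star_rating(csv_on_portal))  # 3
# Links without an open licence earn nothing, because criterion 1 is not met.
print(star_rating({"uses_uris": True, "links_to_other_data": True}))  # 0
```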

Open Data Certificate

Open Data Certificate is a free online tool developed and maintained by the Open Data Institute to assess and recognise the sustainable publication of quality open data. It assesses the legal, practical, technical and social aspects of publishing open data using best practice guidance. (Source)

Responsible data

The collective duty to account for unintended consequences of working with data by: 1) prioritising people’s rights to consent, privacy, security and ownership when using data in social change and advocacy efforts; and 2) implementing values and practices of transparency and openness. (Source)

Big data

Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse. (Source)

Data ecosystem

Complex adaptive systems that include data infrastructure, tools, media, producers, consumers, curators, and sharers. They are complex organizations of dynamic social relationships through which data/information moves and transforms in flows. (Source)

Standard

A published specification covering, for example, the structure of a particular file format, the recommended nomenclature for a particular domain, a common set of metadata fields... Conforming to relevant standards greatly increases the value of published data by improving machine readability and easing data integration. (Source)

Data

An object, variable, or piece of information that has the perceived capacity to be collected, stored, and identifiable. It comes largely in two forms: structured and unstructured. Structured data essentially consists of answers to questions asked by the collector of the data; it is generally easy to organize and identify and has a strict hierarchy that is not easily manipulated (for example, responses to a survey organized in a table format, or information about people’s years of education and income in a chart). Unstructured data does not lend itself readily to automated analysis, is often used in ways that differ from the purpose for which it was collected (such as photos, videos, tweets), and does not need to follow a hierarchical method of identification. Data is also used as a policy concept and social phenomenon (e.g. “data is changing the world”), or as a shortcut for data ecosystems, Big Data, etc. (Source)

Data value chain

Several organizations are studying the data value chain. See the following for more information: IBM, GSMA, Open Data Watch.

Data spectrum

The data spectrum ranges from closed, to shared, to open. Whether data is big, medium or small, and whether it is state, commercial or personal, the important thing about it is how it is licensed. (Source)

Geodata / geospatial data

Any dataset where data points include a location, for example as latitude and longitude, or another standard encoding. Maps, transport routes, environmental data, cadastral data, and many other kinds of data can be published as geodata. (Source)
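GeoJSON (RFC 7946) is one standard encoding for geodata. As a minimal sketch, assuming a point is given as latitude and longitude in decimal degrees, the hypothetical helper below wraps a single data point as a GeoJSON Feature. The place name and coordinates are illustrative only.

```python
import json

def to_geojson_point(name: str, latitude: float, longitude: float) -> dict:
    """Wrap one data point as a GeoJSON Feature with a Point geometry."""
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        raise ValueError("latitude or longitude out of range")
    return {
        "type": "Feature",
        # GeoJSON positions are [longitude, latitude], the reverse of the usual spoken order.
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"name": name},
    }

stop = to_geojson_point("Example bus stop", 51.5226, -0.0838)
print(json.dumps(stop))
```

The coordinate order is the most common pitfall when converting latitude/longitude tables to GeoJSON, which is why the sketch swaps them explicitly.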

Shared data

See Data Spectrum.

Data provenance

Data provenance can be defined as the origins, custody, and ownership of research data. Because datasets are used and reformulated or reworked to create new data, provenance is important to trace newly designed or repurposed data back to its original datasets. The concept of provenance guarantees that data creators are held accountable for their work, and provides a chain of information where data can be tracked, as researchers use other researchers’ data and adapt it for their own purposes. (Source)
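Tracing repurposed data back to its original datasets can be pictured as walking a derivation graph. A minimal Python sketch, assuming provenance is recorded as a hypothetical mapping from each dataset to the datasets it was derived from; real provenance models (for example the W3C PROV family) also record agents, activities and times.

```python
# Hypothetical provenance records: each dataset lists the datasets it was derived from.
DERIVED_FROM = {
    "regional_summary": ["cleaned_survey", "census_boundaries"],
    "cleaned_survey": ["raw_survey"],
    "raw_survey": [],
    "census_boundaries": [],
}

def original_sources(dataset: str, derived_from: dict[str, list[str]]) -> set[str]:
    """Follow the derivation chain back to datasets with no recorded parents.

    Assumes the derivation graph contains no cycles.
    """
    parents = derived_from.get(dataset, [])
    if not parents:
        return {dataset}
    sources: set[str] = set()
    for parent in parents:
        sources |= original_sources(parent, derived_from)
    return sources

print(sorted(original_sources("regional_summary", DERIVED_FROM)))
# ['census_boundaries', 'raw_survey']
```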

Data governance 

The exercise of decision-making and authority for data-related matters. The organisational bodies, rules, decision rights, and accountabilities of people and information systems as they perform information-related processes. Data governance determines how an organization makes decisions: how we “decide how to decide.” (Source)

Metadata

Information used to administer, describe, preserve, present, use or link other information held in resources, especially knowledge resources, be they physical or virtual. Metadata may be further subcategorized into several types (including general, access and structural metadata). (Source)

Data licence

Licences tell you what you can do with the content or data that you access. A licence will tell you whether you can: republish the content or data on your own website; derive new content or data from it; make money by selling products that use it; and republish it while charging a fee for access.

Many licences will let you access content or data for free, but say that you cannot republish it or adapt it, or use it within commercial products. (Source)

A range of standard open licences are available, such as the Creative Commons CC-BY licence, which only requires attribution. (Source)

Data security 

Data security means protecting digital data, such as that in a database, from destructive forces and from the unwanted actions of unauthorised users, such as a cyberattack or a data breach. (Source)

Controlled / Standardised vocabulary

Carefully selected sets of terms that are used to describe units of information; they are used to create taxonomies, thesauri and ontologies. In traditional settings the terms in a controlled vocabulary are words or phrases; in a linked data setting, they are normally assigned unique identifiers (URIs), which in turn link to descriptive phrases. (Source)
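A minimal Python sketch of how a controlled vocabulary is applied in practice: free-text entries are normalised to a single identifier, and each identifier links to its descriptive label. The example.org URIs, the terms and the accepted spellings are all hypothetical placeholders, not a real vocabulary.

```python
# Hypothetical vocabulary: each URI identifies one concept and links to a label.
VOCABULARY = {
    "http://example.org/vocab/bicycle": "Bicycle",
    "http://example.org/vocab/bus": "Bus",
}
# Spellings that data entry might produce, each mapped to exactly one identifier.
SYNONYMS = {
    "bicycle": "http://example.org/vocab/bicycle",
    "bike": "http://example.org/vocab/bicycle",
    "cycle": "http://example.org/vocab/bicycle",
    "bus": "http://example.org/vocab/bus",
}

def to_identifier(term: str) -> str:
    """Normalise a free-text term to its vocabulary URI, rejecting unknown terms."""
    key = term.strip().lower()
    if key not in SYNONYMS:
        raise KeyError(f"{term!r} is not in the controlled vocabulary")
    return SYNONYMS[key]

uri = to_identifier(" Bike ")
print(uri, "->", VOCABULARY[uri])
# http://example.org/vocab/bicycle -> Bicycle
```

Rejecting unknown terms, rather than passing them through, is what keeps the vocabulary "controlled".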

Blockchain

A new class of data infrastructure technologies has recently emerged, known as ‘distributed ledgers’. Blockchains are one specific type of technology in that domain, and though far from the only type, they are receiving a lot of attention. Blockchain technology emerged from the digital currency Bitcoin, and has been hailed as a revolutionary step forward for data storage and the decentralisation of computer systems. (Source)

Data infrastructure

Data infrastructure consists of data assets, the organisations that operate and maintain them, and guides describing how to use and manage the data. Trustworthy data infrastructure is sustainably funded and is directed to maximise data use and value, meeting society’s needs. Data such as statistics, maps and real-time sensor readings help us to make decisions, build services and gain insight. Data infrastructure will only become more vital as our populations grow and our economies and societies become ever more reliant on getting value from data. (Source)

Data ownership

Data ownership means that someone (an individual, a group, a business or an organization) has a proprietary interest in the data. Speaking of ownership necessarily implies the existence of property rights. The most basic element of property ownership is the exclusive right to control the terms and conditions of access to a resource. (Source)

Ontology

A formal model that allows knowledge to be represented for a specific domain. An ontology describes the types of things that exist (classes), the relationships between them (properties) and the logical ways those classes and properties can be used together (axioms). (Source)
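A toy illustration of how axioms let new facts be derived from classes and properties, written in plain Python and loosely modelled on RDFS-style subclass and domain reasoning. The class and property names are hypothetical; a real ontology would be written in a language such as OWL and processed by a reasoner.

```python
# Toy ontology with hypothetical class and property names (not OWL or RDFS syntax).
SUBCLASS_OF = {  # axiom: every member of the key class is also a member of the value class
    "Cat": "Mammal",
    "Mammal": "Animal",
}
DOMAIN = {"eats": "Animal"}  # axiom: anything that is the subject of 'eats' is an Animal

def inferred_types(individual: str, asserted: set[str],
                   facts: list[tuple[str, str, str]]) -> set[str]:
    """Return the asserted classes of an individual plus those entailed by the axioms."""
    types = set(asserted)
    # Domain axioms: using a property implies membership of its domain class.
    for subject, prop, _obj in facts:
        if subject == individual and prop in DOMAIN:
            types.add(DOMAIN[prop])
    # Subclass axioms: climb the hierarchy until no new class is added.
    frontier = list(types)
    while frontier:
        parent = SUBCLASS_OF.get(frontier.pop())
        if parent is not None and parent not in types:
            types.add(parent)
            frontier.append(parent)
    return types

print(sorted(inferred_types("tom", {"Cat"}, [])))  # ['Animal', 'Cat', 'Mammal']
print(sorted(inferred_types("rex", set(), [("rex", "eats", "bone")])))  # ['Animal']
```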

Machine Readable Data

Data formats that may be readily parsed by computer programs without access to proprietary libraries. For example, CSV, TSV and RDF formats are machine readable, but PDF and Microsoft Excel are not. Creating and publishing data following Linked Data principles helps search engines and humans to find, access and re-use data. Once information is found, computer programs can re-use data without the need for custom scripts to manipulate the content. (Source)
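As a small illustration, Python's standard csv module is enough to parse CSV, with no proprietary library involved. A minimal sketch; the station names and rainfall figures are made up for the example, and in practice the text would come from a downloaded file.

```python
import csv
import io

# Hypothetical CSV content standing in for a published file.
CSV_TEXT = """station,date,rainfall_mm
Leeds,2024-01-01,3.2
York,2024-01-01,1.8
"""

# DictReader uses the header row as field names for every record.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
total = sum(float(row["rainfall_mm"]) for row in rows)
print(rows[0]["station"], round(total, 1))  # Leeds 5.0
```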

Human Readable Data

Data in a format that can be conveniently read by a human. Some human-readable formats, such as PDF, are not machine-readable as they are not structured data, i.e. the representation of the data on disk does not represent the actual relationships present in the data. (Source)

Data Trusts

A legal structure that provides independent third-party stewardship of data. (Source)

Data Ethics 

A branch of ethics that evaluates data practices (in data collection, sharing and use) with the potential to adversely impact people and society. (Source)

Data Steward

A person with data-related responsibilities as set by a Data Governance or Data Stewardship program. Data Stewards often fall into several types, such as Data Quality Stewards, Data Definition Stewards and Data Usage Stewards. (Source)

Data policy 

A well-written open data policy will clearly define the commitment of the organisation to publishing, sharing and consuming data. It will be used by internal stakeholders to help identify and prioritise releases, and by external stakeholders to understand how an organisation will be releasing its data and ways in which they can be involved. (Source)

Data integration

Almost any interesting use of data will combine data from different sources. To do this it is necessary to ensure that the different datasets are compatible: they must use the same names for the same objects, the same units or co-ordinates, etc. If the data quality is good this process of data integration may be straightforward but if not it is likely to be arduous. A key aim of linked data is to make data integration fully or nearly fully automatic. Non-open data is a barrier to data integration, as obtaining the data and establishing the necessary permission to use it is time-consuming and must be done afresh for each dataset. (Source)
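A minimal sketch of the harmonisation step, assuming two hypothetical datasets that describe the same places under different names and in different units. Names are reconciled through a lookup table and areas are converted to square kilometres before the datasets are combined; all place names and figures are illustrative.

```python
# Two hypothetical datasets describing the same places with different conventions.
population = {"Town A": 51800, "Town B": 25900}                    # short names
area_sq_miles = {"Town A District": 10.0, "Town B District": 4.0}  # other names, other units

SQ_KM_PER_SQ_MILE = 2.589988  # 1 square mile is about 2.589988 km²
NAME_MAP = {"Town A District": "Town A", "Town B District": "Town B"}  # reconcile names

def density_per_km2(pop_by_name: dict[str, int],
                    area_by_name: dict[str, float]) -> dict[str, float]:
    """Harmonise names and units, then combine the two sources."""
    area_km2 = {NAME_MAP[name]: miles * SQ_KM_PER_SQ_MILE
                for name, miles in area_by_name.items()}
    # Only places present in both sources (after renaming) can be combined.
    shared = pop_by_name.keys() & area_km2.keys()
    return {place: pop_by_name[place] / area_km2[place] for place in sorted(shared)}

for place, density in density_per_km2(population, area_sq_miles).items():
    print(place, round(density))
# prints "Town A 2000", then "Town B 2500"
```

Skipping the renaming step would leave no shared keys, so the join would silently return nothing: a small version of the "arduous" case described above.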

Linked data

A form of data representation where every identifier is an http://… URI, using standard lists (see vocabulary) of identifiers where possible, and where datasets include links to reference datasets of the same objects. A key aim is to make data integration automatic, even for large datasets. Linked data is usually represented using RDF. (Source)
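A minimal sketch of linked data serialised as N-Triples, one of the standard RDF syntaxes, produced with plain string formatting rather than an RDF library. Every identifier is an http URI, and the owl:sameAs statement is the link to a reference dataset describing the same object. The example.org and reference.example.net URIs are placeholders, and literal escaping is omitted, so the sketch assumes literals contain no quotes, backslashes or newlines.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
STATION = "http://example.org/id/station/42"  # hypothetical identifier

# Each triple is (subject URI, predicate URI, (kind, value)); kind is "uri" or "literal".
TRIPLES = [
    (STATION, RDFS_LABEL, ("literal", "Central station")),
    # Link to a (placeholder) reference dataset describing the same object.
    (STATION, OWL_SAME_AS, ("uri", "http://reference.example.net/place/central-station")),
]

def to_ntriples(triples) -> str:
    """Serialise triples as N-Triples lines: <s> <p> object ."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        # Assumes literals need no escaping (no quotes, backslashes or newlines).
        obj = f'"{value}"' if kind == "literal" else f"<{value}>"
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(TRIPLES))
```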

Linked open data

Linked Data published on the public Web and licensed under one of several open licenses permitting reuse. (Source)

Interoperability

Interoperability denotes the ability of diverse systems and organizations to work together (inter-operate). In this context, it is the ability to interoperate, or intermix, different datasets.

Interoperability is important because it allows for different components to work together. This ability to componentize and to ‘plug together’ components is essential to building large, complex systems. (Source)

Data formats

[Data] file formats are standard methods for encoding digital information. Examples of file formats are comma-separated values (.csv), Microsoft Excel (.xlsx), JPEG (.jpg) and Audio Video Interleave (.avi). (Source)

If you cannot find the definition of a word you are looking for, please look at these resources:

Open Data Handbook Glossary

World Wide Web Consortium (W3C) Glossary

Data-Pop Alliance Key Terms