Quality & Utility Assessment

This page presents the QUANTUM Dataset Quality & Utility Assessment framework for datasets intended for secondary use in the European Health Data Space (EHDS) context. It guides data holders through the four EHDS categories, the quality and utility dimensions within each category, and the metrics used to assess each dimension. Open each category, dimension, and metric to view its definition, recommended measurement approach, and available assessment levels, which support consistent interpretation and reporting.

1. Access and provision (1 dimension)
1.1 Accessibility Dimension weight: 9.95%

Definition: Accessibility refers to the dataset being accompanied by clear and transparent access and usage conditions.

Metric #1 – Availability of a data access & usage policy at the time of release of the dataset Metric weight: 50.00%

Metric description: Availability of a data access & usage policy at the time of release of the dataset

Recommended measurement approach: Preferably using a standard vocabulary, for example the DCAT property dct:accessRights.

Available levels

  • Level 0: No policy available
  • Level 1: Basic policy available
  • Level 2: Comprehensive policy available
Metric #2 – Average time from data access application to data release for a specific dataset Metric weight: 50.00%

Metric description: Average time from data access application to data release for a specific dataset

Recommended measurement approach: The HDAB or data holder provides the average, calculated from digital timestamps recorded for the process.

Available levels

  • Level 0: More than 6 months
  • Level 1: 3 to 6 months
  • Level 2: 1 to 3 months
  • Level 3: Less than 1 month
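
As an illustration of the timestamp-based measurement above, the following minimal Python sketch averages the elapsed time between application and release and maps it to a level. The function names, the 30.44-day month and the handling of the 3-month boundary are assumptions made for this example; the framework does not prescribe them.

```python
from datetime import datetime
from statistics import mean

# Assumption: one month is approximated as 30.44 days (365.25 / 12).
DAYS_PER_MONTH = 30.44


def average_months(requests):
    """Average elapsed months over (application, release) timestamp pairs."""
    durations = [
        (released - applied).total_seconds() / 86400 / DAYS_PER_MONTH
        for applied, released in requests
    ]
    return mean(durations)


def timeliness_level(avg_months):
    """Map an average duration in months to levels 0 to 3.

    Assumption: an average of exactly 3 months is placed in level 2,
    because the published ranges share that boundary.
    """
    if avg_months < 1:
        return 3
    if avg_months <= 3:
        return 2
    if avg_months <= 6:
        return 1
    return 0


# Hypothetical timestamps for two access applications (60 and 121 days).
requests = [
    (datetime(2024, 1, 10), datetime(2024, 3, 10)),
    (datetime(2024, 2, 1), datetime(2024, 6, 1)),
]
level = timeliness_level(average_months(requests))  # level 2 (about 2.97 months)
```
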
2. Coverage (2 dimensions)
2.1 Population coverage (how much of it) Dimension weight: 6.47%

Definition: Population coverage refers to the degree to which a dataset includes the potential eligible population.

Metric #1 – Coverage Rate (percentage of the eligible population represented in the dataset) Metric weight: 100.00%

Metric description: Coverage Rate (percentage of the eligible population represented in the dataset)

Recommended measurement approach: According to the data holder's information: (Number of individuals in the dataset / Total eligible population) × 100

Available levels

  • <80%: Limited coverage
  • 80-90%: Good coverage
  • 90-95%: Very good coverage
  • 95-100%: Near-universal or universal coverage
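
The coverage rate formula and its levels can be sketched in a few lines of Python. The function names are hypothetical, and assigning the shared boundaries (80, 90 and 95) to the higher band is an assumption, since the published ranges overlap at those values.

```python
def coverage_rate(n_in_dataset, n_eligible):
    """Percentage of the eligible population represented in the dataset."""
    if n_eligible <= 0:
        raise ValueError("total eligible population must be positive")
    return 100 * n_in_dataset / n_eligible


def coverage_level(rate):
    """Map a coverage rate (in percent) to its level label.

    Assumption: a rate exactly on a boundary goes to the higher band.
    """
    if rate < 80:
        return "Limited coverage"
    if rate < 90:
        return "Good coverage"
    if rate < 95:
        return "Very good coverage"
    return "Near-universal or universal coverage"


# Hypothetical figures: 46,250 individuals out of 50,000 eligible.
rate = coverage_rate(46_250, 50_000)  # 92.5
label = coverage_level(rate)          # "Very good coverage"
```
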
2.2 Population representativity (how well of it) Dimension weight: 7.96%

Definition: Population representativity refers to the degree to which the data adequately represent the population in question.

Metric #1 – How closely does the observed population represent the expected population? Metric weight: 100.00%

Metric description: How closely does the observed population represent the expected population?

Recommended measurement approach: According to data holder's information.

Available levels

  • Level 0: No information on sampling methodology
  • Level 1: Sampling information does not demonstrate the sample representativity
  • Level 2: Sampling information demonstrates the sample representativity
  • Level 3: Dataset contains all expected population
3. Data documentation (3 dimensions)
3.1 Compliance Dimension weight: 7.46%

Definition: Compliance refers to the degree to which data has attributes that adhere to ethical standards, conventions, protocols or regulations.

Metric #1 – Is there documentation of compliance with ethical standards, conventions, protocols or regulations? Metric weight: 100.00%

Metric description: Is there documentation of compliance with ethical standards, conventions, protocols or regulations?

Recommended measurement approach: According to data holder's information.

Available levels

  • Level 0: No.
  • Level 1: Documentation of applicable ethical standards, conventions, protocols or regulations, but no documentation of deviations or compliance.
  • Level 2: Documentation of applicable ethical standards, conventions, protocols or regulations, as well as documentation of deviations or compliance.
3.2 Data provenance Dimension weight: 7.96%

Definition: Data provenance means a description of the source of the data, including context, purpose, method and technology of data generation, documenting agents involved in the provenance of data, data validation routines, source data verification, traceability of changes, and quality control of data.

Metric #1 – Is the source of the dataset documented? Metric weight: 50.00%

Metric description: Is the source of the dataset documented?

Recommended measurement approach: Ideally using a standard vocabulary, such as the DCAT-AP properties dct:source, dct:creator and dct:contributor.

Available levels

  • Level 0: No source is documented
  • Level 1: Source is documented
Metric #2 – Are the processes and operations on the data documented? Metric weight: 50.00%

Metric description: Are the processes and operations on the data documented?

Recommended measurement approach: Using PROV-O (PROV Ontology)

Available levels

  • Level 0: No documentation on data processes and operations
  • Level 1: Some documentation on data processes and operations but not complying with PROV-O standards
  • Level 2: Full documentation on data processes and operations complying with PROV-O standards
3.3 Metadata scope Dimension weight: 8.46%

Definition: Metadata scope refers to the availability, comprehensiveness, level of detail of metadata and data dictionary that help users understand the data being used.

Metric #1 – Existence of comprehensive standardised metadata Metric weight: 40.00%

Metric description: Existence of comprehensive standardised metadata

Recommended measurement approach: Link/reference to the standardised metadata model

Available levels

  • Level 0: Non-standardised metadata
  • Level 1: Partially complying with standardised metadata model (e.g. HealthDCAT-AP)
  • Level 2: Fully complying with standardised metadata model (e.g. HealthDCAT-AP)
Metric #2 – Existence of an exhaustive data dictionary at variable level Metric weight: 60.00%

Metric description: Existence of an exhaustive data dictionary at variable level

Recommended measurement approach: Link/reference to the standardised vocabularies in the metadata model. Note that the content of the data dictionary may depend on the type of data; for non-structured data, the data dictionary may describe features of the data source, its components and its relationship with other data.

Available levels

  • Level 0: No data dictionary
  • Level 1: Partial data dictionary: some variables described with basic information (i.e., names and brief definitions)
  • Level 2: Complete data dictionary: all variables described with detailed information (e.g., names, definitions, units, allowed values)
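
The page gives metric weights (here 40% and 60%) but does not state how levels are combined into a dimension result. Purely as an illustration, the sketch below normalises each level by its maximum and takes the weighted mean. This aggregation rule and the function name are assumptions, not part of the framework as described here.

```python
def dimension_score(metrics):
    """Weighted mean of normalised metric levels, in the range 0 to 1.

    metrics: list of (level, max_level, weight_percent) tuples.
    Assumption: each level is normalised as level / max_level and the
    metric weights of a dimension sum to 100.
    """
    total_weight = sum(weight for _, _, weight in metrics)
    if abs(total_weight - 100) > 1e-9:
        raise ValueError("metric weights must sum to 100")
    weighted = sum(level / max_level * weight for level, max_level, weight in metrics)
    return weighted / 100


# Metadata scope with hypothetical results: level 1 of 2 on metric #1
# (weight 40) and level 2 of 2 on metric #2 (weight 60).
metadata_scope = dimension_score([(1, 2, 40.0), (2, 2, 60.0)])  # 0.8
```
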
4. Technical quality (6 dimensions)
4.1 Accuracy Dimension weight: 9.95%

Definition: Accuracy refers to the degree to which observations correctly describe what they were designed to measure.

Metric #1 – Is accuracy of the dataset documented? Metric weight: 100.00%

Metric description: Is accuracy of the dataset documented?

Recommended measurement approach: The report should provide information on the steps taken to validate the measurement accuracy of measured variables; in addition, it should provide, at variable level, information on conformance to known (externally measured) distributions or to assumed value ranges of distributions.

Available levels

  • Level 0: Accuracy not documented
  • Level 1: Information on the efforts to ensure accuracy is provided (non-statistical information provided)
  • Level 2: Statistical information on accuracy is provided at variable and/or individual level
4.2 Coherence (within the dataset) Dimension weight: 8.96%

Definition: Coherence is defined as the dimension that expresses how different parts of the dataset are uniform in their representation and meaning over time, such as formats, semantics (stability of the data models), and methods.

Metric #1 – Is coherence of the dataset documented? Metric weight: 100.00%

Metric description: Is coherence of the dataset documented?

Recommended measurement approach: At dataset level, the report should contain information on the changes observed in formats, semantics (stability of the data models), and methods.

Available levels

  • Level 0: Coherence not documented
  • Level 1: Coherence documented for some entities, attributes and relations in the dataset
  • Level 2: Coherence documented for all entities, attributes and relations in the dataset
4.3 Completeness Dimension weight: 8.96%

Definition: Completeness refers to the degree to which all information that could be available is present in a particular dataset organized as tabular data.

Metric #1 – Is completeness of the dataset documented? Metric weight: 100.00%

Metric description: Is completeness of the dataset documented?

Recommended measurement approach: For each variable, the dataset report should give the proportion of null values (number of null values divided by number of records). Explicit code values signifying that the information is not applicable are not considered null.

Available levels

  • Level 0: Completeness not documented
  • Level 1: Some variables are analysed for completeness
  • Level 2: All variables are analysed for completeness
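
The per-variable calculation described above could look like the following Python sketch. Treating None, the empty string and an absent key as null is an assumption, and "N/A" is a hypothetical explicit not-applicable code that counts as present, in line with the measurement approach.

```python
def is_null(value):
    # Assumption: None and the empty string are treated as null.
    # Explicit codes such as the hypothetical "N/A" are NOT null.
    return value is None or value == ""


def null_proportions(records):
    """Return {variable: number of null values / number of records}."""
    total = len(records)
    if total == 0:
        return {}
    variables = sorted({name for row in records for name in row})
    return {
        name: sum(is_null(row.get(name)) for row in records) / total
        for name in variables
    }


# Hypothetical tabular records; the last row has no "bmi" key at all.
records = [
    {"age": 34, "pregnant": "N/A", "bmi": 22.1},
    {"age": None, "pregnant": "no", "bmi": ""},
    {"age": 51, "pregnant": "N/A", "bmi": 27.4},
    {"age": 47, "pregnant": "yes"},
]
completeness = null_proportions(records)
# {"age": 0.25, "bmi": 0.5, "pregnant": 0.0}
```
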
4.4 Consistency (among datasets) Dimension weight: 9.95%

Definition: Consistency refers to the degree to which data has attributes that are plausible and are uniform with other data and over time.

Metric #1 – Is consistency of the dataset documented? Metric weight: 100.00%

Metric description: Is consistency of the dataset documented?

Recommended measurement approach: For each variable the dataset report should contain consistency checks that refer to, amongst others: plausible range of numeric values; codings are plausible in relation to one another - "business logic" tests; use of valid codes (semantic).

Available levels

  • Level 0: Consistency not documented
  • Level 1: Consistency of some variables is documented
  • Level 2: Consistency of all variables is documented
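
The three kinds of consistency check named above (plausible numeric range, "business logic" between variables, valid codes) can be sketched as follows. The variable names, the 0 to 120 age range and the code list are hypothetical choices for illustration only.

```python
from datetime import date

# Hypothetical code list for the "sex" variable.
VALID_SEX_CODES = {"F", "M", "U"}


def consistency_issues(record):
    """Return a list of consistency problems found in one record."""
    issues = []
    # Plausible range of a numeric value (assumed range: 0 to 120 years).
    if not 0 <= record["age"] <= 120:
        issues.append("age outside plausible range")
    # Business logic: two variables must be plausible relative to each other.
    if record["discharge_date"] < record["admission_date"]:
        issues.append("discharge before admission")
    # Semantic check: only codes from the code list are valid.
    if record["sex"] not in VALID_SEX_CODES:
        issues.append("invalid sex code")
    return issues


clean = {"age": 42, "sex": "F",
         "admission_date": date(2024, 5, 2), "discharge_date": date(2024, 5, 6)}
flawed = {"age": 134, "sex": "X",
          "admission_date": date(2024, 5, 2), "discharge_date": date(2024, 4, 30)}

clean_issues = consistency_issues(clean)    # []
flawed_issues = consistency_issues(flawed)  # all three checks fail
```
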
4.5 Precision Dimension weight: 5.97%

Definition: Precision refers to the degree of approximation by which data can represent reality.

Metric #1 – Is precision of the dataset documented? Metric weight: 100.00%

Metric description: Is precision of the dataset documented?

Recommended measurement approach: For each variable, the dataset report should contain computational checks including amongst others: granularity in numeric variables; number of categories or ranges for grouped numeric variables; number of categories/codes used for ordinal or nominal variables.

Available levels

  • Level 0: Precision not documented
  • Level 1: Precision of some variables is documented
  • Level 2: Precision of all variables is documented
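
Two of the precision checks listed above, granularity of numeric variables and number of categories, might be computed as in this sketch. It assumes numeric values are available as recorded strings so that trailing digits are preserved; the helper names are hypothetical.

```python
from decimal import Decimal


def max_decimal_places(values):
    """Granularity of a numeric variable: the largest number of decimal
    places among values given as recorded strings (assumption)."""
    return max(max(0, -Decimal(v).as_tuple().exponent) for v in values)


def category_count(values):
    """Number of distinct categories, ranges or codes actually used."""
    return len(set(values))


# Hypothetical variables: body weight in kg and grouped age bands.
weights = ["70.5", "82.25", "64"]
age_bands = ["0-17", "18-64", "65+", "18-64"]

weight_granularity = max_decimal_places(weights)  # 2
age_band_count = category_count(age_bands)        # 3
```
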
4.6 Validity Dimension weight: 7.96%

Definition: Validity refers to the degree to which representations of data in a dataset conform to the specification of a data model or data models.

Metric #1 – Availability of a conformance report for the data model Metric weight: 100.00%

Metric description: Availability of a conformance report for the data model

Recommended measurement approach: For each variable, the dataset report should contain computational checks for variable-level syntax conformance (i.e., conformance with the expected syntax as per the data model).

Available levels

  • Level 0: No report available
  • Level 1: Report available for validity of some variables
  • Level 2: Report available for validity of all variables
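
A variable-level syntax conformance check of the kind described above can be sketched with regular expressions. The variable names and patterns are assumptions for this example; in particular, the ICD-10 pattern is deliberately simplified and a real data model would define its own syntax rules.

```python
import re

# Hypothetical syntax rules taken from an assumed data model.
SYNTAX_RULES = {
    "birth_date": re.compile(r"\d{4}-\d{2}-\d{2}"),       # ISO 8601 date shape
    "icd10_code": re.compile(r"[A-Z]\d{2}(\.\d{1,2})?"),  # simplified pattern
}


def conformance_report(records):
    """Count checked and syntax-conforming values for each variable."""
    report = {}
    for name, pattern in SYNTAX_RULES.items():
        values = [str(row[name]) for row in records if row.get(name) is not None]
        conforming = sum(1 for value in values if pattern.fullmatch(value))
        report[name] = {"checked": len(values), "conforming": conforming}
    return report


records = [
    {"birth_date": "1984-02-29", "icd10_code": "E11.9"},
    {"birth_date": "29/02/1984", "icd10_code": "e11"},
    {"birth_date": "2001-12-01", "icd10_code": "I10"},
]
report = conformance_report(records)
# Two of three values conform for each variable.
```

Note that this only checks the shape of each value, not whether a date exists in the calendar or a code exists in the code system.
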