Documentation
The JAIL |OD| BREAK project was carried out for the Open Access and Digital Ethics exam of the Digital Humanities and Digital Knowledge course at the University of Bologna. The aim of the project is to analyse and re-use open access datasets in order to uncover new knowledge through the mashup of the original data.
You can directly consult our GitHub repository here.
Introduction
On this page you can read the detailed analysis of the JAIL |OD| BREAK project, which investigates prison conditions in Europe to understand whether, and to what extent, the human rights of prisoners are respected.
Scenario
Prison conditions have always been a sensitive issue to reflect on. We believe it is necessary that a person's guilt, which deprives them of their freedom, does not also imply the loss of their fundamental human rights. We therefore searched for information and analyzed varied datasets, some of which were difficult to find or no longer useful because they were out of date, with the aim of creating a concise and clear mashup of elements capable of providing interested users a complete and rich perspective of European prison 
General Analysis
You can take a look and download the project's General Analysis here.
Quality analysis of the datasets
As far as quality analysis is concerned, the standards we took into consideration come mainly from one source, the Open Data Goldbook for Data Managers and Data Holders, provided by the European Data Portal. We relied on this resource because our datasets refer to European data, so this document seemed particularly fitting.
The quality standards we decided to take into consideration are:
- Completeness: are the datasets complete? Is there information about what the data is about, where it comes from and for what purpose it has been published? Are there sufficient indicators?
- Accuracy: are the datasets accurate enough for their purpose? To what extent are they error-free, so as to be a reliable source?
- Timeliness: data changes over time. Are the data up to date? This topic is closely related to the maintenance of datasets.
- Reusability: are the datasets made available under an open license so that they can be reused?
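Two of the criteria above, completeness and timeliness, lend themselves to a rough numerical check. The following is only a minimal sketch under stated assumptions, not the procedure used in the project: the column names (`country`, `year`, `prisoners`, `capacity`), the missing-value markers and the sample rows are all invented for illustration.

```python
import csv
import io


def completeness(rows, fields, missing=("", ":", "NA")):
    """Share of non-missing cells across the given fields (0.0 to 1.0)."""
    total = len(rows) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for field in fields
        if (row.get(field) or "").strip() not in missing
    )
    return filled / total


def latest_year(rows, year_field="year"):
    """Most recent year present in the data, or None if no valid year is found."""
    years = [
        int(row[year_field])
        for row in rows
        if (row.get(year_field) or "").strip().isdigit()
    ]
    return max(years) if years else None


# Made-up sample data; "AAA" and "BBB" are placeholder country codes.
SAMPLE = """country,year,prisoners,capacity
AAA,2016,100,120
AAA,2017,,120
BBB,2016,250,:
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
score = completeness(rows, ["prisoners", "capacity"])
newest = latest_year(rows)
print(f"completeness={score:.2f}, latest year={newest}")
```

A low completeness score or an old latest year does not by itself disqualify a dataset, but it gives a comparable figure to set beside the qualitative assessment of each source.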
UNODC (United Nations Office on Drugs and Crime)
The data are clear and quite complete, although some datasets are rather sparse. The source of each dataset is specified. The data are open and accessible and, as regards reusability, they can be viewed and downloaded in different formats. The data are clean and there are no visible errors. However, the description of these same data is sometimes lacking. The status and version number of the datasets are not explicitly reported.
Eurostat
The Eurostat datasets are well described, thanks also to the explanatory texts (metadata). Completeness depends on the availability of national data from statistical institutes. With regard to consistency, the metadata report that "the data are checked for completeness, internal consistency, and consistency over time and coherence with other relevant data sources". On the other hand, there is a lack of explicit information regarding accuracy. The data are explorable, editable and filterable. However, they can be downloaded in only a few formats. The indicators are sufficient but sometimes unclear. Updates are explicitly reported.
UNECE (United Nations Economic Commission for Europe)
The structure of the UNECE Statistical Database is briefly described in an About section. For each dataset, a brief description of the variables is given, and users can select and modify variables with a tool that allows the construction of specific queries. The source's data can be considered reliable since they come directly from official sources. Even though gaps in the data are sometimes considerable, footnotes are provided to help users understand how the data are structured. The availability of places and the range of years are explicitly reported for every country. There is no information about timeliness. The dataset is accessible and reusable. Completeness is quite satisfactory, and missing data tend to depend mainly on the State involved.
World Health Organization (WHO):
The WHO Regional Office for Europe developed the HIPED (Health in Prisons European Database) because of the lack of systematically collected and comparable data on the health of incarcerated people. The database provides an overview of health in prisons according to important public health indicators and includes data collected through the National questionnaire for the minimum public health dataset for prisons in the WHO European Region in 2016/2017. The main purposes are explicit: to provide comprehensive, consistent and reliable public health data on prison populations and their health needs across WHO European Region Member States. The database is divided into several domains, which allows users to explore it easily. Some data are not up to date, but this mainly depends on the national statistical institutes. Unfortunately, those out-of-date data (e.g., the prison population analysis dates to 2016) compromise timeliness. The updates are documented, but they also date back to 2019, so they are out of date. The data can be filtered and downloaded in various formats. There is additional information on individual domains, but it is not systematic.
Legal Analysis
First, we wanted to establish all the different licenses that our sources use, as they often differ. We then structured a complete legal checklist that you can consult here. We took into account the standards defined by the Open Data Goldbook for Data Managers and Data Holders, together with the four main legal frameworks relevant to Open Data release:
- Privacy: GDPR Regulation (EU) 2016/679, Regulation (EU) 2018/1807, Directive 2002/58/EC;
- PSI: Directive (EU) 2019/1024;
- CDSM: Directive (EU) 2019/790;
- INSPIRE: Directive 2007/2/EC, which defines particular limitations on public access to spatial and geographic data.
Given those standards and directives the main topics that were considered for the legal analysis of the datasets are: IPR policies, privacy issues, limitations on public access, licences, economic conditions and temporal aspects of the dataset.
Generally speaking, the licenses used by our sources are:
UNODC (United Nations Office on Drugs and Crime)
There is no explicit information about the license used, but it is explicitly stated that “all data and metadata provided on dataUNODC website are available free of charge and may be copied freely, duplicated, and further distributed provided that UNODC is cited as the reference”. A Terms of Use and Policy Notice page is available, although it does not contain particularly detailed information.
Eurostat
The data are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Reuse is authorised as long as the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39). The Privacy policy section contains information about the Personal Data Protection policy, which is based on Regulation (EU) 2018/1725 of the European Parliament and of the Council of 23 October 2018 on the protection of natural persons with regard to the processing of personal data by the Union institutions, bodies, offices and agencies and on the free movement of such data (repealing Regulation (EC) No 45/2001 of 18 December 2000 and Decision No 1247/2002/EC).
UNECE (United Nations Economic Commission for Europe)
Data are available free of charge and there is a page regarding their terms of use. No explicit information is given about the license used, although it is explicitly reported that users are free to copy, reproduce and redistribute the data for both commercial and non-commercial purposes, provided that UNECE is cited as the source of the data. It is impossible to access the Terms of Use and Privacy Policy pages.
World Health Organization (WHO):
The World Health Organization (“WHO”) encourages the public to access and use the data that it collects and publishes on its web site, provided that appropriate credit is given to WHO. The license used is not explicitly defined, but WHO grants the royalty-free, worldwide, non-exclusive right to use, reproduce, extract, download, copy, distribute, display or include the Datasets and data contained therein in other products for public health purposes. The Privacy Policy page fully describes the way in which it collects and uses personal 
Ethical Analysis
For the ethical analysis, we considered different aspects: the centrality of the human being, equality and transparency, reliability and trustworthiness, sustainability and, finally, rationality in judgment, which means freedom from cognitive bias. The principles we relied on were the ones proposed by the document “Data Ethics: Principles and Guidelines for Companies, Authorities & Organisations”.
UNODC (United Nations Office on Drugs and Crime)
Even though the portal's specifications regarding the ethical processing of data are not exhaustive, UNODC is committed to supporting Member States in the implementation of the 2030 Agenda for Sustainable Development, which “draws together various elements into a comprehensive and forward-looking framework and explicitly recognizes the importance of sustainable development”. The 2030 Agenda includes the rule of law and fair, effective and humane justice systems, as well as health-oriented responses to drug use, confirming that their absence impedes development in countries of all income levels. However, the fact that ethical principles are not spelled out in a specific section of the digital portal, and have to be deduced and extrapolated, limits the perception of how important they are in the context of the UNODC. More information about ethical principles for data processing would be necessary to clarify the UNODC guidelines.
World Health Organization (WHO):
The World Health Organization promotes transparency and management of corporate-level risk, within the framework of WHO’s ethical principles. It promotes the practice of the ethical principles derived from the international civil service standards of conduct for all WHO staff and associated personnel. WHO has a Code of Ethics and Professional Conduct based on integrity, accountability, independence and impartiality, respect and professional commitment. The document is highly detailed and gives all the necessary information regarding their policy of action and their ethical principles.
UNECE (United Nations Economic Commission for Europe)
UNECE datasets are bias free. UNECE bases its work on the principles of confidentiality, data security and transparency. There is great transparency in the access, use and sharing of data. Users can benefit from clear and accessible communication. There is even greater transparency about how data are shared, linked, and used.
Eurostat
The Statistical Office of the European Union works to ensure that users can take advantage of up-to-date, precise and detailed data. The absence of prejudice is specified, as well as freedom from cognitive bias in the processing of data. A detailed document is also provided that gives information on methodological aspects and the related transparency of data processing.
Technical Analysis
Eurostat (Statistical Office of the European Union)
Format: xlsx, SDMX 2.1, tsv, csv, json.
Metadata: A large amount of metadata is provided for the datasets, expressed in SDMX format, an ISO standard since 2013 for the exchange of statistical data and metadata in XML format. These metadata contain information on metadata updating, contacts, data presentation, units of measurement, reference period, institutional mandate, confidentiality, distribution policy, frequency of dissemination, accessibility, quality, relevance, accuracy, consistency, cost, review and any comments.
URI/Provenance:
- Prisoners by Age and Sex - number and rate for the relevant sex and age groups
- Prison Capacity and Number of Persons Held
UNECE (United Nations Economic Commission for Europe)
Format: csv, tsv, json.
Metadata: Information and metadata are provided about last updates, contacts, unit used and the type of data, creation date, copyright, source, definitions and countries considered depending on the period.
URI/Provenance:
UNODC (United Nations Office on Drugs and Crime)
Format: png, csv, xlsx, pdf, pptx, twbx.
Metadata: UNODC provides some metadata about recent updates, type of data, creation date and copyright, source.
URI/Provenance:
WHO (World Health Organization)
Format: csv, html, json, xlsx, xml.
Metadata: There is a fair amount of metadata, concerning updates, sources, countries considered and time (period).
URI/Provenance:
Mash-up and final datasets
Followed principles and goals
The JAIL |OD| BREAK project encountered many difficulties in finding data that were both accurate and up to date. This is one of the main points on which even the organizations for the protection of prisoners' conditions focus: the data are often not exhaustive and, above all, not homogeneous. Many countries have valid information, while many others have little or no data at all.
Given this complex starting point, we had to find the most relevant data and extract the information we considered useful. The information was often contained in a single, long and complex dataset, a sort of cauldron holding a great deal of unclear and disorganized information. We therefore focused on the finest details, hidden between the lines. To cite a few examples, we investigated the presence of medical staff, and specifically psychological and psychiatric staff, and the number of deaths, and specifically of prisoners who died a violent death, above all by suicide. Intense work on enrichment with data from multiple platforms, harmonization and mashup was therefore necessary, which led to the creation of our final dataset.
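The kind of mashup described above can be pictured as merging records from several sources on a shared key. The sketch below is a simplified illustration rather than the project's actual pipeline: it assumes that every source has already been reduced to records carrying `country` and `year` fields, and the indicator names and values are invented.

```python
from collections import defaultdict


def mash_up(*sources):
    """Merge records from several sources into one row per (country, year)."""
    merged = defaultdict(dict)
    for records in sources:
        for rec in records:
            key = (rec["country"], rec["year"])
            for field, value in rec.items():
                if field in ("country", "year") or value is None:
                    continue
                # Keep the first non-missing value seen for each indicator.
                merged[key].setdefault(field, value)
    return [
        {"country": country, "year": year, **fields}
        for (country, year), fields in sorted(merged.items())
    ]


# Invented records; "AAA" and "BBB" are placeholder country codes.
population = [
    {"country": "AAA", "year": 2016, "prisoners": 100},
    {"country": "BBB", "year": 2016, "prisoners": 250},
]
health = [
    {"country": "AAA", "year": 2016, "suicides": 2, "staff": None},
]

mashup = mash_up(population, health)
for row in mashup:
    print(row)
```

Missing values are simply left out of the merged row, which mirrors the situation described in the text: some countries end up with a rich record and others with only a few indicators.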
Specifically speaking, in order to create the JAIL |OD| BREAK final mashed-up dataset we decided to embrace and follow the FAIR principles stated by the Guidelines on FAIR Data Management in Horizon 2020. We therefore pursued the idea of making our research data findable, accessible, interoperable and re-usable (FAIR).
Findable: the first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.
F1. (Meta)data are assigned a unique identifier: the data we retrieved from the original datasets, the mashed-up data and the metadata we created according to DCAT-AP all comply with this point, since each is identified by a URI.
F2. Data are described with rich metadata: we associated a rich amount of metadata compliant with the DCAT-AP specification, including not only all the mandatory classes with their respective mandatory properties but also some recommended and optional properties that were useful for our data.
F3. Metadata clearly and explicitly include the identifier of the data they describe: for each dataset that is part of a catalogue and for our own dataset we associated to the metadata a unique identifier of the data described by means of the DCAT-AP optional property for datasets dct:identifier.
Accessible: Once the data are found, users need to know how they can be accessed.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol: all the data we collected and mashed up, and the related metadata, are retrievable through HTTP or its extension HTTPS. Moreover, we also provided explicit and clear contact information in the metadata by means of the names and emails of the data and metadata providers.
A1.1. The protocol is open, free, and universally implementable: HTTP and HTTPS are compliant with these characteristics.
A2. Metadata are accessible, even when the data are no longer available: metadata will remain accessible from the metadata web page of this web resource.
Interoperable: Data usually need to be integrated with other data. In addition, data need to interoperate with applications or workflows for analysis, storage, and processing.
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation: we used JSON, CSV and XML for the representation of the mashed up data and RDF with the Turtle syntax to describe and structure the metadata.
I2. (Meta)data use vocabularies that follow FAIR principles. We used the ISO 3166-1 alpha-3 standard vocabulary to represent nations, the International Classification of Diseases for the health domain and the Linked Open Data vocabulary specification called DCAT-AP. These vocabularies are documented and resolvable using globally unique and persistent identifiers.
Reusable: the ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes: our data and metadata are described through a rich and varied series of labels, including the date of collection and modification of the data, the licence, the publisher, the creator and their content.
R1.1. (Meta)data are released with a clear and accessible data usage license: ODOHTEU datasets are released under the Creative Commons licence CC BY-SA 4.0, which is specified for the dataset and respective metadata we created.
R1.2. (Meta)data are associated with detailed provenance: our project includes information about the provenance of data in a machine-readable format in the metadata codification.
R1.3. (Meta)data meet domain-relevant community standards: we used the ISO 3166-1 alpha-3 standard for geographic information and the International Classification of Diseases for the health domain.
The above-mentioned principles include 3 types of entities: data, metadata and infrastructure. Given the analysis, we can state that our research data are compliant with the FAIR principles.
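To make point I2 more concrete, the following sketch shows one way country labels from heterogeneous sources could be normalised to ISO 3166-1 alpha-3 codes. The lookup table is deliberately tiny and hypothetical, and the function name `to_alpha3` is not taken from the project; a real harmonisation step would need a complete table and explicit handling of spelling variants.

```python
# Hypothetical lookup table: only four countries, for illustration.
ALPHA3 = {
    "italy": "ITA",
    "france": "FRA",
    "germany": "DEU",
    "spain": "ESP",
}


def to_alpha3(label):
    """Return the ISO 3166-1 alpha-3 code for a country label, or None if unknown."""
    cleaned = label.strip()
    # Accept labels that are already a known alpha-3 code, in any letter case.
    if len(cleaned) == 3 and cleaned.upper() in ALPHA3.values():
        return cleaned.upper()
    return ALPHA3.get(cleaned.lower())


for raw in [" Italy ", "deu", "Atlantis"]:
    print(repr(raw), "->", to_alpha3(raw))
```

Returning None for unknown labels, instead of guessing, keeps unmatched rows visible so that they can be reviewed by hand before the sources are merged.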
Sustainability over time
The JAIL |OD| BREAK dataset combines datasets from different sources, covering the factors necessary to understand prison conditions: prison population (M/F, sentenced/unsentenced, juveniles/adults, foreign/citizens), data on overcrowding, mortality in prison and staff available. The dataset was created as a project within the Open Access and Digital Ethics course of the Master's Degree in Digital Humanities and Digital Knowledge at the University of Bologna. It is therefore not actively maintained, while the datasets used for this project are currently maintained by the respective institutions or organisations. Jail|OD|Break is distributed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
Visualizations
To allow users to make full use of the data, seven views have been structured with the aim of making the most relevant information immediately accessible. The "philosophy" adopted in developing the visualizations is to keep them simple and essential, avoiding overly convoluted designs. The ultimate goal is always to provide a picture that is clear and never ambiguous, preferring substance to form.
- Choropleth map that serves as a general overview on prison population, an interactive thematic map, which shows the percentage of male, female, sentenced, unsentenced, adults, juveniles, citizen and foreign prisoners all over Europe. Data are observable from 2003 to 2018.
- Pie chart in which users can have a clear and intuitive overview on the percentages of prisoners selecting a specific country and a certain year according to gender and age information. Data are observable from 2003 to 2017.
- Multi-line chart which allows users to select countries and see how their prison population has evolved over time. Data are observable from 2010 to 2018.
- Bar chart that shows the comparison between unsentenced and sentenced prisoners per country. Data are observable from 2003 to 2017.
- Horizontal bar chart that shows the actual number of prisoners held per country compared to prison capacity. It is one of the most effective and striking visualizations, since it shows how many countries do not respect principles of security and humanity. Data are observable from 2008 to 2018.
- Line chart in which, for every year, the total number of deaths in countries that carry out an inspection of medical documents is compared to that of countries that do not. Data are observable from 2010 to 2018.
- Line chart (2) in which, for every year, the total number of deaths in the countries that have an inspection of prison hygiene, nutrition and living conditions is compared with data of those countries that do not do this type of control. Data are observable from 2010 to 2018.
- A specific case that takes into account data for the year 2016. The visualization shows the number of suicides compared to the number of staff available for each country. The lack of data did not allow us to extend the study to all years, but it still seemed interesting to experiment with this kind of analysis.
The choropleth map has been created using the open source d3.js library, while the other visualizations have been created using amCharts.
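The comparison behind the horizontal bar chart (prisoners held against prison capacity) boils down to a simple ratio. As a hedged illustration only, the sketch below computes an occupancy rate per 100 places and flags countries above a threshold; the field names `held` and `capacity`, the threshold of 100 and the sample figures are assumptions made for this example and are not taken from the project's data.

```python
def occupancy_rate(held, capacity):
    """Prisoners held per 100 available places, or None if not computable."""
    if held is None or capacity is None or capacity <= 0:
        return None
    return 100.0 * held / capacity


def overcrowded(records, threshold=100.0):
    """Return (country, rate) pairs above the threshold, highest rate first."""
    flagged = []
    for rec in records:
        rate = occupancy_rate(rec.get("held"), rec.get("capacity"))
        if rate is not None and rate > threshold:
            flagged.append((rec["country"], round(rate, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented figures; "AAA", "BBB" and "CCC" are placeholder country codes.
SAMPLE = [
    {"country": "AAA", "held": 130, "capacity": 100},
    {"country": "BBB", "held": 80, "capacity": 100},
    {"country": "CCC", "held": 50, "capacity": None},
]

result = overcrowded(SAMPLE)
print(result)
```

Countries with no reported capacity are skipped rather than treated as zero, so a gap in the source data is never mistaken for overcrowding.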
RDF assertion of the metadata
To give users data that are more reusable and interoperable, the corresponding metadata are provided, following the DCAT-AP version 2.0.0 documentation. Metadata are provided for the final dataset as a whole, but also for each of its component datasets individually. The RDF assertion of the metadata, which follows the Turtle serialization, has been released and can be found in each metadata table. Users can also consult it here.
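For readers unfamiliar with Turtle, the sketch below assembles a very small dataset description using a handful of DCAT and Dublin Core terms (dcat:Dataset, dct:title, dct:description, dct:identifier, dcat:distribution, dcat:accessURL, dct:license). It is an assumption-laden illustration and not the project's published metadata: the example.org URIs, the identifier and the texts are placeholders, and the output has not been validated against DCAT-AP 2.0.0.

```python
def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dataset_to_turtle(uri, title, description, identifier, access_url, license_uri):
    """Build a minimal Turtle description of one dataset with one distribution."""
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{uri}> a dcat:Dataset ;",
        f"    dct:title {turtle_literal(title)}@en ;",
        f"    dct:description {turtle_literal(description)}@en ;",
        f"    dct:identifier {turtle_literal(identifier)} ;",
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:accessURL <{access_url}> ;",
        f"        dct:license <{license_uri}>",
        "    ] .",
    ]
    return "\n".join(lines)


# Placeholder values: the example.org URIs and the identifier are hypothetical.
ttl = dataset_to_turtle(
    uri="https://example.org/dataset/jail-od-break-example",
    title="Example mashed-up prison dataset",
    description='Illustrative record, not the "official" metadata.',
    identifier="jail-od-break-example",
    access_url="https://example.org/data/example.csv",
    license_uri="https://creativecommons.org/licenses/by-sa/4.0/",
)
print(ttl)
```

Building the string by hand keeps the example free of dependencies; in practice an RDF library would normally handle the serialization and catch syntax errors.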
Conclusions
The project is the result of an in-depth and at times stormy analysis. The chosen theme immediately confronted us with important challenges. First of all, as anticipated in the introduction, we had to search for and analyze data that were often outdated, and try to fill in the information missing from a given dataset by extracting it from others. In this journey through prison conditions in Europe, however, we have achieved important goals, partly thanks to these very shortcomings. We have created a patchwork that includes the various elements of the prison environment. We realized that the prison population, mainly made up of men, is not increasing dramatically, but we have reluctantly come to see that in many European countries it often lives in overcrowded conditions. We have observed that the mortality rate is higher in countries where there are no in-depth checks on the state of health and living conditions inside the facilities. Finally, we have drawn the bitter conclusion that the prison population still lives in a pseudo-darkness: it exists but does not exist, it is judged but not protected. The system, across Europe, must continue to improve so that prisons truly become places of rehabilitation and not merely of punishment.