HiRID, a high time-resolution ICU dataset. Anonymization procedure

Published Version: 1.0


HiRID is a freely accessible critical care dataset containing data relating to almost 34 thousand patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. The ICU offers the full range of modern interdisciplinary intensive care medicine for adult patients. The dataset was developed in cooperation between the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland and the ICU.

The dataset contains de-identified demographic information and a total of 681 routinely collected physiological variables, diagnostic test results and treatment parameters from almost 34 thousand admissions over the data collection period. Data is stored with a uniquely high time resolution of one entry every two minutes (120 seconds).


Critical illness is characterized by the presence or risk of developing life-threatening organ dysfunction. Critically ill patients are typically cared for in intensive care units (ICUs), which specialize in providing continuous monitoring and advanced therapeutic and diagnostic technologies. This dataset was collected during routine care at the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. It was initially extracted to support a study on the early prediction of circulatory failure in the intensive care unit using machine learning [1]. The latest documentation for the dataset is available online [2].


The HiRID database contains a large selection of all routinely collected data relating to patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU). The data was extracted from the ICU Patient Data Management System, which is used to prospectively register patient health information, measurements of organ function parameters, results of laboratory tests and treatment parameters from ICU admission to discharge.

Measurements from bedside monitoring

Measurements and settings of medical devices such as mechanical ventilation

Observations by health care providers, e.g. GCS, RASS, urine and other fluid output

Administered drugs, fluids and nutrition

HiRID has a higher time resolution than other published datasets, most notably for bedside monitoring, with most parameters recorded every two minutes.

To ensure the anonymization of individuals in the data set, we followed the procedures successfully applied for the MIMIC-III and Amsterdam UMC db datasets, which followed the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor requirements and, in the case of Amsterdam UMC db, also the European Union's General Data Protection Regulation (GDPR) standards [3,4].

Removal of all eighteen identifying data elements listed in HIPAA

Dates were shifted by a random offset so that the admission date lies [...]. We made sure to preserve seasonality, time of day and the day of the week.

Patient age, weight and height are binned into bins of size 5. For patient age, the maximum bin is 90 years and also contains all older patients.

Measurements and medications whose units changed over time were standardized to the latest unit used. This standardization was necessary to make it impossible to infer approximate admission dates from the units used for a specific patient.

Free text was removed from the database

k-anonymization was applied to patient age, weight, height and sex.
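The binning, date shifting and k-anonymity steps listed above can be illustrated with a minimal sketch. It is not the procedure actually used for HiRID. The rounding direction of the bins, the week-aligned offset (one possible way to keep time of day and day of week exactly and the season approximately), the offset range and the example records are all assumptions made for illustration.

```python
import random
from collections import Counter
from datetime import datetime, timedelta

BIN_SIZE = 5      # bin width stated in the text
MAX_AGE_BIN = 90  # top age bin stated in the text


def bin_value(value, size=BIN_SIZE, cap=None):
    """Round down to the lower bin edge (assumed direction), optionally capped."""
    binned = int(value // size) * size
    return min(binned, cap) if cap is not None else binned


def week_aligned_offset(years):
    """Offset of roughly `years` years, rounded to a whole number of weeks.

    Whole weeks keep the weekday and time of day unchanged; staying close
    to a whole number of years keeps the season within a few days.
    """
    return timedelta(weeks=round(years * 365.2425 / 7))


def smallest_group(records, keys):
    """Size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Hypothetical example records (not taken from the dataset).
patients = [
    {"age": 93, "weight": 71.5, "height": 168, "sex": "F"},
    {"age": 91, "weight": 74.0, "height": 169, "sex": "F"},
    {"age": 47, "weight": 88.2, "height": 181, "sex": "M"},
]
binned = [
    {
        "age": bin_value(p["age"], cap=MAX_AGE_BIN),
        "weight": bin_value(p["weight"]),
        "height": bin_value(p["height"]),
        "sex": p["sex"],
    }
    for p in patients
]

# k is the smallest group size over the quasi-identifiers named in the text.
k = smallest_group(binned, ["age", "weight", "height", "sex"])

# One random offset per patient; the range of years is hypothetical.
rng = random.Random(0)
offset = week_aligned_offset(rng.randint(85, 185))
admission = datetime(2015, 3, 9, 14, 30)
shifted = admission + offset
```

In this sketch a data set would count as k-anonymous for a chosen k only if `smallest_group` returns at least k. Records in smaller groups would need further generalization or suppression.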

Ethical approval and patient consent

The institutional review board (IRB) of the Canton of Bern approved the study. The need for obtaining informed patient consent was waived due to the retrospective and observational nature of the study.

Data Description

The overall data is available in two states: as raw data and/or as pre-processed data. Additionally, there are three reference tables for variable lookup.

Reference tables

variable reference – reference table for variables (for raw stage)

ordinal variable reference – reference table for categorical/ordinal variables for string value lookup

pre-processed variable reference – reference table for variables (for merged and imputed stage)

Raw data

The raw data was only processed where this was required for patient de-identification and was otherwise left unchanged compared to the original source. The source data contains the complete set of available variables (685 variables). It consists of the following tables:

Preprocessed data

The pre-processed data consists of intermediary pipeline stages from the accompanying publication by Hyland et al. [1]. Source variables representing the same clinical concepts were merged into one meta-variable per concept. The data contains only the 18 most predictive meta-variables, as defined in our publication. Two different stages of the pipeline are available:

Merged stage: source variables are merged into meta-variables by clinical concept, e.g. non-opioid analgesics. The time grid is left unchanged and is sparse.

Imputed stage: the data from the merged stage is down-sampled to a five-minute time grid. The time grid is filled with imputed values. The imputation strategy is complex and is discussed in the original publication.

The code used to generate these stages is available in this GitHub repository under the preprocessing folder [5].
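To give a rough idea of what moving from the sparse merged stage to a dense five-minute grid involves, here is a minimal sketch. It uses plain forward filling as a stand-in, which is an assumption made for illustration. The imputation strategy in the publication is more complex, and the function name and sample values below are hypothetical.

```python
from datetime import datetime, timedelta

GRID_STEP = timedelta(minutes=5)  # grid spacing stated in the text


def to_five_minute_grid(samples, start, end):
    """Down-sample sparse (timestamp, value) pairs to a five-minute grid.

    Each grid point takes the latest observation at or before it
    (forward fill); grid points before the first observation stay None.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    grid = []
    i = 0
    last = None
    t = start
    while t <= end:
        # Consume every observation up to and including this grid point.
        while i < len(ordered) and ordered[i][0] <= t:
            last = ordered[i][1]
            i += 1
        grid.append((t, last))
        t += GRID_STEP
    return grid


# Hypothetical heart-rate samples on an irregular, sparse time grid.
t0 = datetime(2100, 1, 1, 0, 0)
sparse = [
    (t0 + timedelta(minutes=2), 80),
    (t0 + timedelta(minutes=4), 82),
    (t0 + timedelta(minutes=11), 90),
]
dense = to_five_minute_grid(sparse, t0, t0 + timedelta(minutes=15))
```

Forward filling is only one of several simple choices. A real pipeline would typically also handle values before the first observation and limit how long a stale value is carried forward.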

Which data to use?

The pre-processed data is intended mainly as a quick way to jump-start a project or for use in a proof of concept. We recommend using the source data whenever possible for regular projects. It is the most flexible form and contains the complete set of variables in the original time resolution.

Data formats

Data is available in two formats: CSV for wide compatibility and Apache Parquet for convenience and performance.

Because the data sets are fairly large, they are split into partitions so that they can be processed in parallel in a straightforward way. The lookup table mapping patient id to partition id is provided in a file alongside the data. The partitions are aligned between the different data sets and tables, so that the data of a patient can always be found in the partition with the same id. Note, however, that a patient may not occur in all data sets, e.g. a patient may be missing from the preprocessed data because the patient did not meet the demographic criteria to be included in the study.
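The partition lookup described above can be sketched as follows. The column names `patientid` and `part` and the inline table contents are hypothetical placeholders. The actual file name and schema should be taken from the dataset documentation.

```python
import csv
import io

# Hypothetical lookup table contents; the real index file ships with the data.
INDEX_CSV = """patientid,part
1,0
2,0
3,1
"""


def load_partition_index(text):
    """Parse the lookup table into a {patient id: partition id} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["patientid"]): int(row["part"]) for row in reader}


def partition_for(index, patient_id):
    """Return the partition id for a patient, or None if the patient is absent."""
    return index.get(patient_id)


def patients_by_partition(index):
    """Group patient ids by partition, e.g. to hand one partition to each worker."""
    groups = {}
    for patient_id, part in index.items():
        groups.setdefault(part, []).append(patient_id)
    return groups


index = load_partition_index(INDEX_CSV)
groups = patients_by_partition(index)
```

Because the partitions are aligned across data sets and tables, the partition id found this way can be reused to locate the same patient in every table. A patient missing from a data set then shows up as an empty result and not as a different partition.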

Patient ID / ICU admission

The dataset treats each ICU admission uniquely, and it is not possible to identify multiple ICU admissions as coming from the same patient. A unique “Patient ID” is generated for each ICU (re-)admission.

Data schemata

The schemata of each table are found in the *schemata.pdf* file.

Usage Notes

As the database contains detailed information regarding the clinical care of patients, it must be treated with appropriate care and respect.

Researchers are required to formally request access via PhysioNet. To be granted access, the user has to be a credentialed PhysioNet user, digitally sign the Data Use Agreement and provide a specific research question.

Conflicts of Interest

The authors declare no conflicts of interest.


Access Policy: Only PhysioNet credentialed users who sign the specified DUA can access the files.