FindZebra online search delving into rare disease case reports using natural language processing

Valentin Liévin; Jonas Meinertz Hansen; Allan Lund; Deborah Elstein; Mads Emil Matthiesen; Kaisa Elomaa; Kaja Zarakowska; Iris Himmelhan; Jaco Botha; Hanne Borgeskov; Ole Winther

doi:10.1371/journal.pdig.0000269

Abstract

Early diagnosis is crucial for well-being and life quality of the rare disease patient. Access to the most complete knowledge about diseases through intelligent user interfaces can play an important role in supporting the physician reaching the correct diagnosis. Case reports may offer information about heterogeneous phenotypes which often further complicate rare disease diagnosis. The rare disease search engine FindZebra.com is extended to also access case report abstracts extracted from PubMed for several diseases. A search index for each disease is built in Apache Solr adding age, sex and clinical features extracted using text segmentation to enhance the specificity of search. Clinical experts performed retrospective validation of the search engine, utilising real-world Outcomes Survey data on Gaucher and Fabry patients. Medical experts evaluated the search results as being clinically relevant for the Fabry patients and less clinically relevant for the Gaucher patients. The shortcomings for Gaucher patients mainly reflect a mismatch between the current understanding and treatment of the disease and how it is reported in PubMed, notably in the older case reports. In response to this observation, a filter for the publication date was added in the final version of the tool available from deep.findzebra.com/<disease> with <disease> = gaucher, fabry, hae (Hereditary angioedema).

Author summary

Rare diseases affect a substantial part of the population. However, they are especially challenging to diagnose. Because of their rarity, physicians often overlook rare diseases in the differential diagnosis. When confronted with hard-to-diagnose patients, physicians often turn to online resources like Google or PubMed, which index both general disease information and case reports. Case reports are a unique asset in the diagnosis of rare diseases because such diseases often present with a varied and complex phenotype, which might not appear in the general literature. Nonetheless, searching for patient-relevant case reports is challenging. A tool dedicated to searching case reports to assist diagnosis is still missing because general-purpose search engines, like PubMed search, are primarily set up for literature search and because advanced search tools like FindZebra do not handle case reports.

In this study, we present a novel online search tool https://deep.findzebra.com/ dedicated to searching PubMed case reports based on a patient description (age, sex, symptoms, negative findings, etc.). Two medical experts evaluated the tool on forty challenging cases (twenty Fabry and twenty Gaucher). To our knowledge, this is the first specialized search tool for case reports that is built to assist diagnosis. This study provides a clear recipe for building and validating modern medical information retrieval systems to index and search complex and heterogeneous data.

Citation: Liévin V, Hansen JM, Lund A, Elstein D, Matthiesen ME, Elomaa K, et al. (2023) FindZebra online search delving into rare disease case reports using natural language processing. PLOS Digit Health 2(6): e0000269. https://doi.org/10.1371/journal.pdig.0000269

Editor: Ewen M. Harrison, University of Edinburgh, UNITED KINGDOM

Received: September 27, 2022; Accepted: May 3, 2023; Published: June 29, 2023

Copyright: © 2023 Liévin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: This project relies on two data sources: a collection of PubMed abstracts (https://huggingface.co/datasets/findzebra/case-reports) and clinical Takeda owned Outcome Survey data for Fabry (FOS) and Gaucher (GOS). De-identified records from 4484 Fabry and 1095 Gaucher patients served as the basis for selecting 20 Fabry and 20 Gaucher patients with atypical symptoms used in the expert validation. These 40 patient cases are available in S1 Data. Inquiries about the Takeda Outcome Survey can be addressed to GMA.Research@Takeda.com.

Funding: The study was funded by Takeda. OW and VL are supported by the Novo Nordisk Foundation (NNF20OC0062606) and DeepMind/Google through their employment at UCph and DTU.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: KE, IH and JB are employed by Takeda and hold Takeda stocks/stock options. HB is a former employee in Takeda Pharma A/S, Denmark, holding a current position at Department of Clinical Pharmacology, Aalborg University Hospital in Denmark. KZ was employed by Takeda at the time of the study and holds Takeda stocks/stock options. OW, MM, VL and JH are employed by FindZebra, which received funding from Takeda for conducting the study. AL received reimbursement from FindZebra for clinical expertise in this study. AL reports also personal consultancy fees and travel grants from Takeda during the study, as well as grant support paid to his institution. DE received consultancy fees from Takeda for clinical expertise in this study.

Introduction

A disease is considered rare when it affects no more than 1 in 2000 people in Europe and about 1 in 1600 people in the US. Currently, there are more than 6000 distinct rare diseases in the EU [1]. Around 80% of rare diseases are of genetic origin and, of those, 70% manifest already in childhood. Many rare diseases are chronic, progressive, and life-threatening. Early diagnosis may save lives, slow disease progression and/or prevent further irreversible organ damage, and ultimately improve the quality of life for these patients. A recent population-based telephone survey in Germany revealed a median duration of the diagnostic delay of 20 or more years for some rare lysosomal storage disorders (LSD) [2].

The diagnostic odyssey is complicated since the signs and symptoms may intuitively implicate more common pathologies, and most primary care professionals may have no experience with any of these disorders. Moreover, recognizing the disease trajectory is also complicated by variable ages of onset, the progressive natural history with change in manifestations with age, variable presentation of multiple and diverse organs and tissues, and the lack of awareness of specialised diagnostic markers. In today’s era of quick access to the internet and social media it is becoming equally true that patients find their diagnosis by acting upon irrelevant information garnered from trawling these sources [3,4]. Two challenging rare diseases with complex phenotypes, a sufficient number of available patients and good expert understanding are Fabry and Gaucher disease.

Fabry disease is a rare X-linked inherited LSD [5]. Major morbidity of several organ systems often begins in childhood with gastrointestinal, dermatological, and often ocular signs; as patients age, cardiac and renal disease may rather rapidly culminate in end-stage organ failure; stroke and other cardiovascular events that are life-threatening are also common. Early intervention with disease-specific therapies, enzyme replacement therapies (ERT), or pharmacological chaperones (PC) is critical and universally recommended before the development of more devastating and irreversible renal, cardiac, and/or cerebrovascular signs [6].

Gaucher disease has a clinical spectrum from a perinatal lethal neuronopathic form (type 2), to a form that has variable neurological and visceral signs (types 3a, 3b, and 3c), to a chronic, non-neuronopathic form (type 1), where some patients are truly asymptomatic throughout their normative lifespans but others suffer from mild to severe visceral and haematological signs that appear variously anywhere from childhood to old age [7]. All these patients can benefit to some extent from early administration of several disease-specific treatment options: ERT and substrate reduction therapy (SRT) will improve visceral and haematological signs as well as well-being; the SRTs and PCs can partly impact the neurological trajectory in the neuronopathic forms as well as the encroaching signs of Parkinson disease and other Lewy Body Dementia symptoms in type 1 patients [8,9].

Given the diagnostic delay, there are only a few published algorithms to assist in earlier diagnosis for either disease and none of them have been deployed into clinical use [10–13].

PubMed is the most complete information source on medical scientific knowledge. It comes with good information retrieval capabilities but is not designed for aiding diagnosis. This is demonstrated when benchmarking on medical cases against dedicated rare disease tools like FindZebra [14,15]. FindZebra, a search engine that has been popular within the medical community for the past ten years, indexes a collection of curated disease articles from sources such as OrphaNet, OMIM and Wikipedia. The articles mostly describe generic disease phenotypes and therefore FindZebra only offers limited coverage of the more exceptional disease phenotypes, which are described in the specialised medical literature. A tool that applies advanced information retrieval techniques to searching the specialised literature has, to our knowledge, so far been missing.

The case reports registered on PubMed are great candidates for extending the coverage of FindZebra. Over two million case reports are registered, and each one of them covers a unique medical case. Case reports could be used to improve the clinical management of today’s patients, for instance by tailoring the treatment to the patient profile, or by supporting the healthcare providers in their disease management choices [16,17]. However, case reports show higher variability in style and quality than the curated articles gathered from OrphaNet, OMIM, Wikipedia, etc. Searching case reports is thus a more complex task and care must be taken to prioritize the information that is essential to recognize disease phenotypes in both the case reports and the search queries (age, sex, symptoms, genetics, negative findings, etc.). The proposed tool relies on two main components (Fig 1) built on publicly available tools (PubMedBERT [18] and Solr [19]):

a deep learning model that allows transforming unstructured text documents into structured patient profiles
a full-text search engine (Solr) that allows searching for similar patients across the profile dimensions.

Fig 1. Converting unstructured PubMed abstract into a structured search index using 1) text segmentation and 2) a Solr instance with composite fields corresponding to the text segmentation categories.

https://doi.org/10.1371/journal.pdig.0000269.g001

In this study, we evaluate the tool based on real Fabry and Gaucher patients to identify issues particular to the task that will be essential for rolling the tool out to all rare diseases. In summary, the main contributions of the study are: 1) a recipe for building and validating an information retrieval system for heterogeneous medical information and 2) the search tool made available at deep.findzebra.com.

Material and methods

We discuss how the novel search tool integrates with the existing FindZebra.com. We present the collection of case report abstracts indexed by our system before detailing the clinical data used to evaluate the tool. We then describe the development of the search engine (segmentation of the abstracts and setup of the search ranking algorithm) and, finally, the setup of the expert validation.

FindZebra workflow

FindZebra.com allows searching across a collection of curated medical articles. For canonical disease profiles, this step is sufficient to find information that is relevant to the patient. For rare phenotypes, the new tool allows “diving into” the large pool of case reports within a particular disease to retrieve case reports that match the patient profile (Fig 2).

Fig 2. Workflow for retrieving documentation relevant to atypical patients.

The patient presents with two typical findings where one leads to identification of Disease A. The case report search (the contribution of this paper) leads to a case report with the same atypical symptom combination.

https://doi.org/10.1371/journal.pdig.0000269.g002

Data

PubMed case reports.

We collected case reports from 803 PubMed articles for Fabry disease and 883 for Gaucher disease. For each article, we retained only the abstract, which in most cases summarises information about the case at hand. We detail the data collection process in S1 Text.

Clinical data—Fabry and Gaucher Outcomes Surveys.

We based the study on the real-world data from long-term observational Fabry and Gaucher Outcomes Surveys, FOS and GOS respectively. FOS and GOS aim at improving the clinical management of patients (see the S1 Text for further details). This was a non-interventional study, limited to the use of readily available data. It did not involve recontacting patients, and the informed consents, captured for the original FOS and GOS, allowed the use of their data for the validation purposes of this study. Data was anonymized by removing all information that could potentially identify a patient; a new randomization number was assigned to the Patient ID, all other potential identifiers, such as country, site name and date of birth were removed, as well as all other dates, e.g. visit dates and dates of laboratory assessments.

Records from 4484 Fabry patients and 1095 Gaucher patients were collected. Each record features demographic information (age, sex and mutation when available), a list of signs and symptoms and a quantitative evaluation of the relevant organs (Fabry: eGFR, LVMI; Gaucher: liver size, spleen size, haemoglobin value and platelet count).

We selected two anonymized patients, called patient F for Fabry and patient G for Gaucher. Their records are presented in Table 1. Throughout the text, we use patients F and G to showcase the data processing and evaluation steps.

Table 1. Examples of survey data for Fabry and Gaucher patients.

https://doi.org/10.1371/journal.pdig.0000269.t001

Converting survey entries to search queries.

We converted the survey data (tabular format) into full-text queries, mapping numerical features to the corresponding signs using reference tables [20,21]. For instance, a low platelet count value was translated as “thrombocytopenia”, whereas a normal value was translated as “no thrombocytopenia”. The resulting textual features were combined into a comma-separated list of terms (see examples in Table 2).

Table 2. Example of generated queries.

https://doi.org/10.1371/journal.pdig.0000269.t002
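
The conversion from a survey record to a comma-separated query can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the record fields and the platelet cut-off (150 × 10^9/L, a commonly used lower reference limit) are assumptions made for the example, whereas the study relied on the published reference tables [20,21].

```python
from typing import List, Optional

# Assumed cut-off for illustration only (10^9 platelets per litre);
# the study used published reference tables instead.
PLATELET_LOWER_LIMIT = 150.0


def platelet_sign(platelet_count: Optional[float]) -> Optional[str]:
    """Translate a numerical platelet count into a textual sign."""
    if platelet_count is None:
        return None  # value not recorded: emit nothing
    if platelet_count < PLATELET_LOWER_LIMIT:
        return "thrombocytopenia"
    return "no thrombocytopenia"  # normal value becomes a negative finding


def build_query(age: int, sex: str, symptoms: List[str],
                platelet_count: Optional[float] = None) -> str:
    """Combine demographic and clinical features into a comma-separated query."""
    terms = [f"{age} years old", sex]
    terms.extend(symptoms)
    sign = platelet_sign(platelet_count)
    if sign is not None:
        terms.append(sign)
    return ", ".join(terms)


query = build_query(34, "female", ["splenomegaly", "bone pain"], platelet_count=90)
```

Keeping the negative finding ("no thrombocytopenia") in the query preserves information that a simple list of positive symptoms would lose.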

We designed a segmentation model that transforms raw text into structured representations by extracting non-overlapping spans of text. Each span—or segment—is labelled using eight categories, which we summarise in Table 3. Each category is selected to represent a particular clinical feature that might be useful for diagnosis.

Table 3. Segmentation categories.

Each category represents a dimension of the patient profile. The categories are used to index the case reports and parse the queries.

https://doi.org/10.1371/journal.pdig.0000269.t003

Text segmentation

The model builds upon a domain-specific masked language model [22], PubMedBERT [18], which follows the same architecture as the popular BERT model [23]. BERT allows computing contextual language representations, which we augmented with a conditional random field likelihood to improve the local coherence of the segments [24].
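
To illustrate why a conditional random field can improve the local coherence of segments, the sketch below decodes a label sequence with the Viterbi algorithm, which is the standard decoding companion to a linear-chain CRF. It is an assumption-laden toy: the per-token scores stand in for the outputs of the language model, the transition scores are invented, and nothing here is taken from the authors' code.

```python
from typing import List


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at token t (hypothetical model output)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    best = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for scores in emissions[1:]:
        step_best, step_back = [], []
        for j in range(n_labels):
            # Best previous label for reaching label j at this token.
            prev = max(range(n_labels),
                       key=lambda i: best[i] + transitions[i][j])
            step_best.append(best[prev] + transitions[prev][j] + scores[j])
            step_back.append(prev)
        best = step_best
        backpointers.append(step_back)
    # Follow the backpointers from the best final label.
    path = [max(range(n_labels), key=lambda j: best[j])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path


# Token 2 slightly prefers label 1, but switching labels is penalised.
emissions = [[2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]
coherent = viterbi(emissions, [[0.5, -0.5], [-0.5, 0.5]])
independent = viterbi(emissions, [[0.0, 0.0], [0.0, 0.0]])
```

With zero transition scores each token is labelled independently, whereas the transition penalties smooth out the isolated label in the middle, yielding one contiguous segment.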

We randomly selected 100 abstracts for each disease. Each of the 200 abstracts was manually labelled into text segments. We chose to label spans of text such that each span of text encapsulates a single clinical feature completely. This results in segments of text that might overlap multiple text entities (see examples in Table 3). Twenty documents were set aside for testing and the remaining documents were used for training (160) and validation (20). We detail the fine-tuning process in S1 Text. Our implementation relies on popular machine learning libraries [25–27].

We used the same model for both parsing the user queries and indexing the abstracts. To make the model robust to both the abstracts and the comma-separated queries, we augmented the training data by swapping abstracts with pseudo-queries for half the training iterations. Pseudo-queries were obtained by concatenating N ~ Poisson(λ = 5) segments extracted from the replaced abstract.
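
The pseudo-query augmentation can be sketched as below. Only the Poisson(λ = 5) segment count and the half-and-half swapping come from the text; sampling segments without replacement, requiring at least one segment, joining with commas and implementing "half the training iterations" as a probability of 0.5 are assumptions made for this illustration, and all function names are hypothetical.

```python
import math
import random
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from a Poisson distribution using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def make_pseudo_query(segments: List[str], rng: random.Random,
                      lam: float = 5.0) -> str:
    """Concatenate N ~ Poisson(lam) segments of one abstract into a query."""
    # Assumption: clamp N to [1, number of available segments].
    n = min(max(1, sample_poisson(lam, rng)), len(segments))
    return ", ".join(rng.sample(segments, n))


def maybe_swap(abstract: str, segments: List[str], rng: random.Random,
               swap_probability: float = 0.5) -> str:
    """Replace the abstract by a pseudo-query for roughly half of the draws."""
    if segments and rng.random() < swap_probability:
        return make_pseudo_query(segments, rng)
    return abstract


rng = random.Random(0)
segments = ["42-year-old", "male", "angiokeratoma", "proteinuria",
            "left ventricular hypertrophy", "no stroke"]
pseudo_query = make_pseudo_query(segments, rng)
```

Because the pseudo-queries reuse labelled segments, their token-level labels can be carried over from the original abstract without extra annotation.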

Search engine

The search engine is built on a composite Solr index for each disease separately. The ranking function is a weighted combination of the BM25 scores computed across each segmentation category. We detail the configuration and design of the index in S1 Text.
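
A weighted combination of per-category BM25 scores can be sketched in plain Python as follows. This is not the Solr configuration used in the study (which is described in S1 Text): the field names, the weights and the BM25 parameters k1 = 1.2 and b = 0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query_terms: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenised document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query: Dict[str, List[str]], reports: List[Dict[str, List[str]]],
         weights: Dict[str, float]) -> List[int]:
    """Return report indices sorted by the weighted sum of per-field scores."""
    totals = [0.0] * len(reports)
    for field, weight in weights.items():
        field_docs = [r.get(field, []) for r in reports]
        field_scores = bm25_scores(query.get(field, []), field_docs)
        for i, s in enumerate(field_scores):
            totals[i] += weight * s
    return sorted(range(len(reports)), key=lambda i: totals[i], reverse=True)


reports = [
    {"sex": ["male"], "symptom": ["angiokeratoma", "proteinuria"]},
    {"sex": ["female"], "symptom": ["stroke"]},
    {"sex": ["male"], "symptom": ["stroke", "fatigue"]},
]
query = {"sex": ["female"], "symptom": ["stroke"]}
ranking = rank(query, reports, weights={"sex": 1.0, "symptom": 2.0})
```

Scoring each category separately lets the weights express how informative a dimension is for a given disease, which is one way to adapt the ranking function per disease.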

We analysed the corpora based on the extracted profiles. We used the SciSpaCy library to link symptoms to the Unified Medical Language System (UMLS) entities [28,29]. We report the frequency of symptoms as well as a summary of the demographic data in S1 Text.

Validation protocol

The case report search engine is specifically designed for the use cases where the diagnosis is established or suspected but a deeper understanding of the phenotype is needed. Therefore, we focused on the subset of patients with atypical symptoms, which we defined in this study as the symptoms occurring in less than 10% of the population.
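
The atypicality criterion can be sketched as below. The 10% threshold comes from the definition above; treating a patient as atypical when at least one recorded symptom is atypical is an assumption of this example, as are the function names.

```python
from collections import Counter
from typing import List, Set


def atypical_symptoms(patients: List[List[str]],
                      threshold: float = 0.10) -> Set[str]:
    """Return symptoms recorded in less than `threshold` of the patients."""
    counts = Counter(s for symptoms in patients for s in set(symptoms))
    n = len(patients)
    return {s for s, c in counts.items() if c / n < threshold}


def is_atypical(symptoms: List[str], rare: Set[str]) -> bool:
    """Assumption: one atypical symptom is enough to flag the patient."""
    return any(s in rare for s in symptoms)


# Toy population of 20 patients.
population = [["proteinuria"] for _ in range(20)]
population[0].append("priapism")          # 1/20 = 5%: atypical
population[1].append("hearing loss")      # 2/20 = 10%: not below threshold
population[2].append("hearing loss")

rare = atypical_symptoms(population)
flags = [is_atypical(p, rare) for p in population]
```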

Step 1—Preliminary non-expert validation.

During development, we inspected the quality of retrieval based on simple tests. For a selection of patients, we tested if the retrieved articles corresponded to the age, sex, mutations (if any) and the domain of symptoms (e.g., skeletal, psychiatric involvement, …).

Step 2—Expert validation.

Two rare disease experts (DE and AL) evaluated the relevance of the search results given for the 20 patients for each disease. The anonymized data for the 40 patients is available in S1 Data. We selected patients with atypical phenotypes and diverse disease profiles. The experts were asked to evaluate the clinical relevance of each of the top three returned articles using a scale from one to five and using a text field. For each retrieved document, we report the maximum grade given by the two experts. For each patient, we report the precision for the top three results (P@3) based on the maximum grade and for multiple relevance thresholds. In S1 Text, we provide further details about the patient selection, the rating scale, the evaluation interface and experts’ agreement.
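
The reported metric can be sketched as follows: each retrieved article receives the maximum of the two expert grades, and an article counts as relevant when that grade reaches the chosen threshold. The function name and the example grades are illustrative and do not come from the study data.

```python
from typing import List


def precision_at_k(grades_expert_1: List[int], grades_expert_2: List[int],
                   threshold: int, k: int = 3) -> float:
    """Precision over the top-k articles, using the maximum grade per article."""
    top = list(zip(grades_expert_1, grades_expert_2))[:k]
    if not top:
        return 0.0
    relevant = sum(1 for a, b in top if max(a, b) >= threshold)
    return relevant / len(top)


# Grades (1 to 5) given by two experts to the top three articles of one patient.
p_at_3_lenient = precision_at_k([4, 2, 1], [3, 3, 1], threshold=3)
p_at_3_strict = precision_at_k([4, 2, 1], [3, 3, 1], threshold=4)
```

Sweeping the threshold from lenient to strict shows how sensitive the precision is to the definition of relevance.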

Step 3—Population and corpus level analysis.

To gain a better understanding of the diseases, their differences and how these affect retrieval, we analysed how the search engine maps the population of patients to the corpus of PubMed articles. We built a bipartite graph, using patients and articles as nodes, and created an edge whenever an article was retrieved among the top three results for a patient. We used the resulting network to study the relationship between patients and PubMed articles.
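
A minimal sketch of such a patient-article network, using plain dictionaries rather than a graph library, is given below. The identifiers are made up and the function names are hypothetical; the sketch only shows how the edges and a simple summary statistic (mean number of patients per retrieved article) can be derived.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_patient_article_graph(
        top_results: Dict[str, List[str]], k: int = 3) -> Dict[str, Set[str]]:
    """Map each retrieved article to the set of patients it was retrieved for."""
    patients_by_article = defaultdict(set)
    for patient, articles in top_results.items():
        for article in articles[:k]:  # one edge per top-k article
            patients_by_article[article].add(patient)
    return dict(patients_by_article)


def mean_patients_per_article(graph: Dict[str, Set[str]]) -> float:
    """Average degree of the article nodes."""
    if not graph:
        return 0.0
    return sum(len(p) for p in graph.values()) / len(graph)


top_results = {
    "patient_1": ["pmid_a", "pmid_b", "pmid_c"],
    "patient_2": ["pmid_a", "pmid_b", "pmid_d"],
}
graph = build_patient_article_graph(top_results)
mean_degree = mean_patients_per_article(graph)
```

When every patient contributes exactly three edges, the mean article degree equals three times the number of patients divided by the number of distinct retrieved articles, so a low value indicates that retrieval spreads over many different reports.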

Ethics statement

This was a retrospective, non-interventional study, limited to the use of readily available patient data in Takeda-owned Fabry and Gaucher Outcomes Surveys (FOS and GOS, respectively). The written informed consents obtained from the patients participating in the original FOS and GOS allowed the use of their data for the validation purposes of this study, which did not involve recontacting patients. Therefore, this study was not subject to Ethics Committee approval. Data was furthermore anonymized by removing all information that could potentially identify a patient; a new randomization number was assigned to the Patient ID, all other potential identifiers, such as country, site name and date of birth were removed, as well as all other dates, e.g. visit dates and dates of laboratory assessments.

Results

This section begins with a quantitative and qualitative evaluation of the final segmentation model. It continues with a description of the population of the selected patients. We then review the case report search tool in three acts: i) we display the case of two patients, ii) we present the expert review and iii) we illustrate how the search tool maps the cohort of patients to the PubMed corpus. As presented in S1 Text, the two diseases exhibit different profiles of symptoms, which supports the need to adapt the ranking function to each disease.

Text segmentation

The final model scored 0·75 F1 score on the test set (0·76 F1 score on the validation set). The model appeared to be robust to a wide diversity of case reports and user queries. In Fig 3, we present a Fabry case report segmented using the final model. In Table 4, we present an example of a segmented query. In S1 Text, we present three additional labelled abstracts: one for Fabry, one for Gaucher and one out-of-domain example (COVID-19, see Supplement III).

Fig 3. Segmentation example of the article (test set): “Two cases of Fabry’s disease: A hemizygote with a point mutation in the alpha-galactosidase A gene and his relative [30]”.

Each colour corresponds to one of eight segmentation categories listed in Table 3.

https://doi.org/10.1371/journal.pdig.0000269.g003

Table 4. Example of segmented query.

https://doi.org/10.1371/journal.pdig.0000269.t004

Clinical data and queries

We summarise the demographic features as well as the distribution of symptoms in S1 Text. We found that the populations of patients from the survey and from the PubMed articles follow similar demographics. Furthermore, we found that symptoms stated in the records were often discussed in the PubMed corpora.

The retrieval workflow (Fig 2) has two steps. For completeness, we also evaluate step 1 (FindZebra search) for all patients in the two Surveys. The correct diagnosis appeared in the top ten search results for 68·4% of the Fabry patients, whereas this was the case for only 21·7% of the Gaucher patients (see S1 Text for further details).

Subsets of typical and atypical patients

The medical expert validation of rare disease search (step 2 in Fig 2) focuses on the group of patients with atypical symptoms. Using the criteria defined in the previous section, we labelled 56% of the Fabry patients and 64% of the Gaucher patients as atypical. In Table 5, we report the number of patients for each group, the proportion of patients treated for Fabry or Gaucher and the mean number of symptoms recorded for each patient. We found that atypical patients have on average twice the number of symptoms and were more likely to be treated than the typical ones, indicating a more serious form of the disease.

Table 5. Statistics for the groups of typical and atypical patients.

https://doi.org/10.1371/journal.pdig.0000269.t005

Expert validation

We found considerable disparities in the evaluation of the two diseases. Whereas retrieval was judged to be effective when applied to the Fabry patients, articles were more often judged as irrelevant for the Gaucher patients. We report the precision in Table 6 and display the distribution of maximum grades given to each article in Fig 4.

Table 6. Precision given 3 retrieved articles per patient (20 patients).

Each document is labelled as relevant if the maximum rating given by the two experts (AL and DE) is greater than or equal to the threshold. The rating scale is described in S1 Text.

https://doi.org/10.1371/journal.pdig.0000269.t006

Fig 4. Distribution of scores assigned to each document for each disease (Fabry disease left and Gaucher disease right).

For 20 patients per disease, we retrieve the top-3 abstracts. For each article, we use the maximum among the two expert ratings as evaluation score.

https://doi.org/10.1371/journal.pdig.0000269.g004

In the case of the Fabry patients, a minority of articles were judged irrelevant (11·7% of the articles were assigned a maximum rating lower than three) and 51·7% of the search results were graded with a maximum rating of at least four. The Gaucher patients were more difficult to match with relevant case reports, as 65·0% of the articles were rated one. Only 8·3% of the articles received at least one grade above three.

The experts’ comments revealed six failure patterns listed in Table 7. The most common cause of failure (2 for Fabry, 18 for Gaucher) was attributed to retrieving articles that presented a radically different clinical picture, despite sharing a few symptoms and/or demographic features with the referenced patient. The second most prevalent cause of failure was associated with returning abstracts that were no longer considered valid by the medical experts. Other identified causes were diagnosis mismatch (failure pattern #3), missing data about the reference patient (#4), symptom mismatch (failure pattern #5), and age mismatch (failure pattern #6).

Table 7. Failure patterns.

Number of identified failure patterns for each disease.

https://doi.org/10.1371/journal.pdig.0000269.t007

Case study

In S1 Text, we illustrate the whole retrieval and validation process based on patients F and G. For each patient, we present the raw data, the segmented queries, and the retrieved abstracts associated with their corresponding expert ratings and comments.

Population and corpora

To illustrate how the search scoring function maps a population of patients to the PubMed case reports, we sampled a subset of 500 patients for each disease to exclude the effect of the population size on the analysis. We created a patient-article network for each disease using the top three retrieved articles and the subset of patients. Both networks are visualised in Fig 5. We provide an analysis of the networks in S1 Text.

Fig 5. Visualization of the patient-article networks for Gaucher (left) and Fabry (right).

Nodes represent patients and articles; edges are drawn if an article is retrieved among the top three results for a given patient. Each colour corresponds to a cluster of patients and articles. This visualization shows how the population of patients maps to the corpus of articles. The Fabry network shows a higher degree of clustering than the Gaucher network.

https://doi.org/10.1371/journal.pdig.0000269.g005

We recorded the number of retrieved articles for each disease as well as the mean number of patients connected to each article. We found marked differences between the two diseases: Fabry articles were connected to 5·1 patients on average (with a total of 295 retrieved articles) whereas Gaucher articles were connected to 3·9 patients per article (with a total of 386 articles).

Discussion

We have built a search engine that allows searching case reports based on patient features that are automatically extracted from the query and the indexed reports using deep learning. The tool allows searching case reports based on multiple features (sex, age, gender, mutation, symptoms, etc.) and performs robustly thanks to the simple BM25-based design. Nonetheless, we tested more than semantic overlap: we evaluated whether the top three search results were clinically relevant according to medical experts. The articles were more often judged as clinically relevant for Fabry patients than for Gaucher patients, and we felt, looking at our results, that this could be partly explained by the dichotomy in explanations of the clinical manifestations and by the divergence in disease-specific management options of these two diseases.

Fabry disease is an X-linked disorder, which implies that male patients are generally more severely affected, and at an earlier age, than females. In addition, the various mutations may be predictive of a specific phenotypic expression in both genders. On the other hand, Gaucher disease is usually divided into genotypes, and each genotype might lead to radically different disease trajectories (e.g., lethal, severe neuronopathic genotype versus mild, non-neuronopathic genotype). Therefore, matching Gaucher patients was more challenging, because the greater diversity of profiles made relevant reports easier to miss, and because age and sex were not as informative as in Fabry (Gaucher is not X-linked).

This highlights the limitations of our ranking function. In some cases, we found the ranking function to be misaligned with the expert judgement, as it placed too little weight on age (failure pattern #6), or on symptoms (failure pattern #5). In other cases, failure was attributed to BM25, as it only allows handling symptoms independently, thus failing to grasp the whole clinical picture. This might lead to placing too much weight on a few rare terms (“buzz words”), which is linked to failure pattern #1.

Furthermore, we found that many retrieved articles were outdated, especially in the case of Gaucher disease (failure pattern #2). This is explained by the recent changes in the clinical management of the two diseases. In Fabry disease, the triad of end-stage organ failure of renal, cardiac, and cerebrovascular events in patients may be partly prevented, but they are still irreversible once established. The ramifications on the other organ systems remain poorly controlled, which may also impact quality of life and longevity, so that patients today face many of the same challenges as those of decades past. However, disease management in Gaucher disease has been transformed. Several new modalities of therapeutics can now assure normative function by reversing visceral signs and symptoms in the non-neuronopathic patients. Furthermore, the disease phenotypes have evolved due to a tendency to diagnose earlier and due to longer survival. Ultimately, this was seen in our study, which underscored the explosion of recent developments for the several types of Gaucher disease, so that older case reports were of limited value for patients being seen today.

This uncovers a broader problem that not all case reports contain valid information. Whereas validating the content of case reports is challenging, recency can be easily controlled. For the released version of the search tool, we added the possibility of filtering on publication date.

The validation protocol was designed to mimic the real-world use of our tool, but a discrepancy remains between the evaluation setup and the real-world usage. First, we used all the recorded data for each patient, whereas in a clinical context, the healthcare professionals rely only on the subset of the features that might be relevant in that particular clinical setting. Second, the generated queries only contained information about age, sex, mutation, symptoms and negative findings. Our tool handles additional profile dimensions such as medications, family history and ethnicity (which is a critical factor in the management of Fabry and Gaucher diseases). Third, whereas information systems are traditionally evaluated using the top ten results, we restricted the evaluation to the top three results. Scrolling past the top three results might be required in the more difficult cases such as the ones observed in the Gaucher evaluation. Lastly, we acknowledge the challenge and limitations of data anonymization especially in the rare disease space. However, as the FOS and GOS patient data collected for the purpose of this study is global and consists of relatively high numbers of patients, we consider anonymization sufficient.

Conclusion

We have built a search engine specialised for case reports and submitted it to a thorough expert validation process using real-world clinical Outcomes Surveys data. To the best of our knowledge, our tool pioneers the task of indexing and retrieving case reports with the aim of aiding diagnosis of rare diseases. It allows browsing large quantities of case reports using natural language and clinical descriptions, ranked with a structured ranking function (age, sex, mutation, symptom, etc.). Our study details a general approach that can be used to make the clinical literature more readily useful for healthcare practitioners.

Based on real-world rare disease data, we found that retrieved articles were often clinically relevant for the Fabry patients, whereas articles retrieved for the Gaucher patients had less clinical value. Further analysis of the expert comments and the patient-abstract network allowed us to identify shortcomings associated with our method. Our study highlights the gap that remains between modern search technologies and clinical practice.

Although this study was restricted to Fabry and Gaucher diseases, we will now focus on scaling the process to all rare diseases recorded at FindZebra.com. We learned from the expert evaluation and used the experts' feedback to improve our tool by adding a temporal filter to the query field. We hope that our tool will be used in the field to help healthcare professionals improve the clinical management of the many patients who suffer from rare diseases. Our tool will remain publicly available.

Supporting information

S1 Text. The docx file contains the supplementary information referenced in the main text.

https://doi.org/10.1371/journal.pdig.0000269.s001

(DOCX)

S1 Data. This zip file contains the 40 patient cases in JSON format (20 Fabry and 20 Gaucher) used for the evaluation.

https://doi.org/10.1371/journal.pdig.0000269.s002

(ZIP)

References

1. Rare diseases. (n.d.). Retrieved April 4, 2022, from https://ec.europa.eu/health/non-communicable-diseases/steering-group/rare-diseases_en.
2. Mengel E, Gaedeke J, Gothe H, Krupka S, Lachmann A, Reinke J, et al. The patient journey of patients with Fabry disease, Gaucher disease and Mucopolysaccharidosis type II: A German-wide telephone survey. PLoS One. 2020;15: e0244279. eCollection 2020. pmid:33382737.
3. Nicholl H, Tracey C, Begley T, King C, Lynch AM. Internet Use by Parents of Children With Rare Conditions: Findings From a Study on Parents’ Web Information Needs. J Med Internet Res. 2017; 19: e51. pmid:28246072
4. Wake Forest Baptist Medical Center. "Internet can be valuable tool for people with undiagnosed rare disorders." ScienceDaily 2019 Aug 7. <www.sciencedaily.com/releases/2019/08/190807144400.htm>.
5. Kok K, Zwiers KC, Boot RG, Overkleeft HS, Aerts JMFG, Artola M. Fabry Disease: Molecular Basis, Pathophysiology, Diagnostics and Potential Therapeutic Directions. Biomolecules. 2021;11: 271. pmid:33673160.
6. Hughes DA, Aguiar P, Lidove O, Nicholls K, Nowak A, Thomas M, et al. Do clinical guidelines facilitate or impede drivers of treatment in Fabry disease? Orphanet Journal of Rare Diseases. 2022;17: 42. pmid:35135579
7. Zimran A, Elstein D. Lipid storage diseases. In: Kaushansky K, Lichtman M, Prchal J, Levi MM, Press O, Burns L, Caligiuri M, editors. Williams Hematology. 9th ed. New York: McGraw-Hill; 2016. Chapter 72.
8. Revel-Vilk S, Szer J, Mehta A, Zimran A. How we manage Gaucher Disease in the era of choices. Br J Haematol. 2018;182: 467–480. pmid:29808905
9. Mehta A, Kuter DJ, Salek SS, Belmatoug N, Bembi B, Bright J, et al. Presenting signs and patient co-variables in Gaucher disease: outcome of the Gaucher Earlier Diagnosis Consensus (GED-C) Delphi initiative [published correction appears in Intern Med J. 2019 Aug;49(8):1059]. Intern Med J. 2019;49: 578–591.
10. Mehta A, Rivero-Arias O, Abdelwahab M, Campbell S, McMillan A, Rolfe MJ, et al. Scoring system to facilitate diagnosis of Gaucher disease. Intern Med J. 2020; 50: 1538–1546. pmid:33174353.
11. Savolainen MJ, Karlsson A, Rohkimainen S, Toppila I, Lassenius MI, Vaca Falconi C, et al. The Gaucher earlier diagnosis consensus point-scoring system (GED-C PSS): Evaluation of a prototype in Finnish Gaucher disease patients and feasibility of screening retrospective electronic health record data for the recognition of potential undiagnosed patients in Finland. Molecular Genetics and Metabolism Reports. 2021;21: 100725. pmid:33604241
12. Jefferies JL, Spencer AK, Lau HA, Nelson MW, Giuliano JD, Zabinski JW, et al. A new approach to identifying patients with elevated risk for Fabry disease using a machine learning algorithm. Orphanet J Rare Dis. 2021;16: 518. pmid:34930374.
13. Andrade-Campos MM, de Frutos LL, Cebolla JJ, Serrano-Gonzalo I, Medrano-Engay B, Roca-Espiau M, et al. Identification of risk features for complication in Gaucher’s disease patients: a machine learning analysis of the Spanish registry of Gaucher disease. Orphanet J Rare Dis. 2020;15: 256.
14. Dragusin R, Petcu P, Lioma C, Larsen B, Jørgensen HL, Cox IJ, et al. FindZebra: a search engine for rare diseases. Int J Med Inform. 2013;82: 528–538. Epub 2013 Feb 23. pmid:23462700.
15. Svenstrup D, Jørgensen HL, Winther O. Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches. Rare Diseases. 2015;3:1. pmid:26442199
16. Kawamoto K, Houlihan CA, Balas EA, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. BMJ. 2005;330: 765. Epub 2005 Mar 14. pmid:15767266
17. Garg AX, Adhikari NKJ, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005;293: 1223–1238. pmid:15755945.
18. Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, et al. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans. Comput. Healthcare. 2021;3:1, Article 2 (January 2022), 23 pages.
19. Sparck Jones K, Walker S, Robertson SE. A probabilistic model of information retrieval: development and comparative experiments: Part 2. Information Processing & Management. 2000;36: 809–840.
20. Zimran A, Elstein D, Gonzalez DE, Lukina EA, Qin Y, Dinh Q, et al. Treatment-naïve Gaucher disease patients achieve therapeutic goals and normalization with velaglucerase alfa by 4 years in phase 3 trials. Blood Cells Mol Dis. 2018;68: 153–159. Epub 2016 Oct 21. pmid:27839979.
21. Kampmann C, Linhart A, Baehner F, Palecek T, Wiethoff CM, Miebach E, et al. Onset and progression of the Anderson-Fabry disease related cardiomyopathy. Int J Cardiol. 2008;130: 367–373. pmid:18572264
22. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics. 2020;36: 1234–1240, pmid:31501885
23. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2019;1 (Long and Short Papers): pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
24. Lafferty JD, McCallum A, Pereira FCN. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML ’01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 282–289.
25. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.
26. Falcon W. PyTorch Lightning. GitHub; 2019. https://github.com/PyTorchLightning/pytorch-lightning
27. Liaw R, Liang E, Nishihara R, Moritz P, Gonzalez JE, Stoica I. Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118. 2018.
28. Neumann M, King D, Beltagy I, Ammar W. 2019. ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 319–327, Florence, Italy. Association for Computational Linguistics.
29. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004 Jan 1; 32(Database issue): D267–70. pmid:14681409
30. Inaoki M, Otsuki N, Ishise S, Ueda Y, Sakuraba H. Two cases of Fabry’s disease: a hemizygote with a point mutation in the alpha-galactosidase A gene and his relative. J Dermatol. 1992;19: 481–486. pmid:1328341
