Informing epidemic (research) responses in a timely fashion by knowledge management – a Zika virus use case

ABSTRACT The response of pathophysiological research to emerging epidemics often occurs after the epidemic and, as a consequence, has little to no impact on improving patient outcomes or on developing high-quality evidence to inform clinical management strategies during the epidemic. Rapid and informed guidance of epidemic (research) responses to severe infectious disease outbreaks requires quick compilation and integration of existing pathophysiological knowledge. As a case study we chose the Zika virus (ZIKV) outbreak that started in 2015 to develop a proof-of-concept knowledge repository. To extract data from available sources and build a computationally tractable and comprehensive molecular interaction map we applied generic knowledge management software for literature mining, expert knowledge curation, data integration, reporting and visualization. A multi-disciplinary team of experts, including clinicians, virologists, bioinformaticians and knowledge management specialists, followed a pre-defined workflow for rapid integration and evaluation of available evidence. While conventional approaches usually require months to comb through the existing literature, the initial ZIKV KnowledgeBase (ZIKA KB) was completed within a few weeks. Recently we updated the ZIKA KB with additional curated data from the large amount of literature published since 2016 and made it publicly available through a web interface together with a step-by-step guide to ensure reproducibility of the described use case. In addition, a detailed online user manual is provided to enable the ZIKV research community to generate hypotheses, share knowledge, identify knowledge gaps, and interactively explore and interpret data. A workflow for rapid response during outbreaks was generated, validated and refined and is also made available. The process described here can be used for timely structuring of pathophysiological knowledge for future threats. 
The resulting structured biological knowledge is a helpful tool for computational data analysis and generation of predictive models and opens new avenues for infectious disease research. The ZIKV KnowledgeBase is available at www.zikaknowledgebase.eu.


The response to a (re-)emerging infectious disease (ID) epidemic requires a rapid compilation of existing pathophysiological knowledge to inform research priorities guiding basic and clinical research. Gaps in understanding of the underlying mechanisms make it difficult to design effective disease-modifying therapies. Hence, during an emerging ID outbreak, the information available at the time of its emergence and the subsequent rapid accumulation of scientific knowledge from various sources need to be captured and analysed in a timely and comprehensive fashion. Responding to an ID outbreak would therefore benefit from the use of a knowledge repository that organizes the disease-related knowledge into pathway, molecular interaction and disease maps. Such maps are a relatively new concept that has been used in neurodegenerative and heart diseases (1,2), but which have

compounds and diseases (for instance "activates", "restricts", "targets"). To optimise recall and specificity of the mining, we extended the dictionaries for viral names, acronyms and interaction predicates, and defined a blacklist of acronyms that mostly caused false positives.

The term "microcephaly" is a reusable scientific concept that participates not just in one detected "Subject-Predicate-Object" construct, but in all such detected constructs that mention "microcephaly". Supplementary information is associated with the "microcephaly" object, including, for example, information from the Disease Ontology and other integrated resources, such as Gene-Disease-Association data (DisGeNET). This expandable set of relationships forms a large network of knowledge that enables new knowledge to be inferred by "reasoning" based on the logic encoded in those relationships.

Finally, the extracted relationships can be curated to manually optimise quality and information content.
A curation user interface was implemented to enable the expert team to support or refute the automatically generated relationships. At least two independent researchers (the "4-eye review mode") manually evaluated the evidence for every extracted relationship. When the evaluations from the two researchers conflicted, the conflict was either resolved during the weekly online conferences or the relationship was excluded, as our goal was to maximise specificity (correctness) rather than sensitivity (completeness) of the integrated information. In addition, experts could expand the network with any relevant supporting evidence from other integrated sources, such as public or proprietary databases and experimental data.

Semantic mapping describes the process of identifying and linking concepts that are shared between two information sources. We integrated the databases listed in Table 1 using existing concepts such as genes, pathogens or diseases, which were identified by ontological descriptors. Semantically identical objects are mapped to descriptive data from literature and databases to allow informed and efficient querying of the overall collected information (e.g. "Dengue disease" is mapped to the following synonyms: "Breakbone fever", "Dengue disorder", "Dengue fever" and "Dengue") (25). To this end, mapping scripts are created to resolve a given input data format and match the provided entity identifiers or ontology terms. Experimental data from key publications are mapped by the same approach. While these data are henceforth available for search and reporting, they are not yet displayed as part of any specific molecular interaction and disease map.
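The curation and mapping steps described above can be illustrated with a minimal Python sketch. It is not the software used for the ZIKA KB: the names `Triple`, `canonical`, `four_eye_accept` and `curate` are hypothetical, the synonym table contains only the Dengue example given in the text, and conflicting reviews are simply excluded (the text also allows resolving them in a weekly conference, which is not modelled here).

```python
from dataclasses import dataclass

# Synonym table: only the Dengue entries are taken from the text;
# a real mapping would be derived from ontologies and databases.
SYNONYMS = {
    "breakbone fever": "Dengue disease",
    "dengue disorder": "Dengue disease",
    "dengue fever": "Dengue disease",
    "dengue": "Dengue disease",
}


def canonical(term: str) -> str:
    """Return the canonical concept label for a term (case-insensitive)."""
    cleaned = " ".join(term.split())
    return SYNONYMS.get(cleaned.lower(), cleaned)


@dataclass(frozen=True)
class Triple:
    """A Subject-Predicate-Object relationship extracted by text mining."""
    subject: str
    predicate: str
    obj: str


def normalise(triple: Triple) -> Triple:
    """Map subject and object onto their canonical concepts."""
    return Triple(canonical(triple.subject), triple.predicate, canonical(triple.obj))


def four_eye_accept(votes) -> bool:
    """Accept only if at least two reviewers voted and all of them support."""
    return len(votes) >= 2 and all(votes)


def curate(candidates):
    """Keep supported triples; synonymous triples collapse into one entry."""
    return {normalise(t) for t, votes in candidates if four_eye_accept(votes)}
```

Excluding every relationship without unanimous support mirrors the stated preference for specificity over sensitivity: a disputed relationship is dropped rather than kept on one reviewer's judgement.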

Querying and visualization of integrated information in tables, networks and disease maps
To help the expert team establish a specific molecular interaction and disease map, we defined a number of queries to explore the collective knowledge. These queries were used, for example, to find diseases and genes associated with a virus of interest, or to find diseases associated with genes prioritized according to experimental evidence.
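The two example queries mentioned above can be sketched over a small in-memory association list. This is an illustrative assumption about the data layout, not the ZIKA KB query interface: the record format, the gene placeholders `GENE_A` and `GENE_B`, and the evidence scores are all hypothetical.

```python
# Hypothetical curated records: (source, target, target_type).
ASSOCIATIONS = [
    ("Zika virus", "microcephaly", "disease"),
    ("Zika virus", "GENE_A", "gene"),
    ("GENE_A", "microcephaly", "disease"),
    ("GENE_B", "encephalitis", "disease"),
]

# Hypothetical experimental evidence score per gene (higher = stronger).
EVIDENCE = {"GENE_A": 0.9, "GENE_B": 0.4}


def associated_with(entity, target_type, associations=ASSOCIATIONS):
    """Return sorted targets of the given type linked to an entity."""
    return sorted({t for s, t, tt in associations if s == entity and tt == target_type})


def diseases_for_prioritised_genes(min_score, associations=ASSOCIATIONS, evidence=EVIDENCE):
    """Return diseases per gene, restricted to genes at or above min_score."""
    genes = [g for g, score in evidence.items() if score >= min_score]
    genes.sort(key=lambda g: evidence[g], reverse=True)
    return {g: associated_with(g, "disease", associations) for g in genes}
```

For example, `associated_with("Zika virus", "disease")` lists the diseases linked to the virus, while `diseases_for_prioritised_genes(0.5)` restricts the view to genes with stronger experimental support.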
Based on these queries we developed a streamlined, wizard-based user interface to

Deployment of an open access, web-based user interface
To make the results of our internal test case generally available and to support ZIKV research, we provide and maintain a regularly updated ZIKA KB at the following URL: www.zikaknowledgebase.eu. As we continue to extend this resource, user registration for access will be implemented to ensure the knowledge base is used for research only.

Semantic representation of ZIKV infection
The data model implemented to provide a semantic representation of ZIKV infection is described in detail in supplemental Figure S2. Briefly, the model focuses on genes, diseases, pathogens and drugs, and distinguishes between associations derived from literature mining and those provided by experimental data such as protein-protein interactions (PPIs).
Text mining results
We searched PubMed with the terms "Zika virus", "Dengue", "West Nile virus", "Japanese encephalitis virus", "Tick-borne encephalitis virus", "Microcephaly" and

processing algorithm was applied to these sets of documents to efficiently extract the fast-growing information in the biomedical literature. The text mining extracted a total of 11916 relationships, of which 2982 were verified by manual evaluation (Table 2). The distribution of the curated relationships is depicted in Fig 3,

(Table 1). Recently, a variety of ZIKV- and other flavivirus-related large-scale data sets, including microarray gene expression (26,27), RNAseq (28) as well as CRISPR/Cas data (29), have become publicly available and were integrated to identify host factors that are affected during viral infection.

Molecular interaction and disease maps
Curated text mining results were used to populate the initial ZIKV molecular interaction and disease map. In a second step, the map was extended with interaction data (PPI and protein-drug interaction) by applying a network search implementing the breadth-first search algorithm (30), which connected genes extracted from text mining relationships based on the overall network. This set of interaction data can be filtered and explored interactively. In a systems medicine approach, a multidisciplinary expert team systematically analysed literature, public databases and experimental resources to create a formal, structured model of molecular and cellular ZIKV-host interactions ("molecular interaction and disease map").
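The network extension step can be illustrated with a short Python sketch. It assumes the interaction data are available as an undirected edge list and that two seed genes are connected through their shortest path when that path is at most `max_len` edges long. The function names and the `max_len` cut-off are assumptions for illustration; the text only states that a breadth-first search connected the text-mined genes within the overall network.

```python
from collections import deque
from itertools import combinations


def undirected(edges):
    """Build an adjacency dict from (u, v) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk back through the parents to reconstruct the path.
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def connect_seeds(graph, seeds, max_len=3):
    """Return seed genes plus intermediates on short paths between them."""
    nodes = set(seeds)
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        if path is not None and len(path) - 1 <= max_len:
            nodes.update(path)
    return nodes
```

Breadth-first search visits nodes in order of increasing distance from the start, so the first path found to a target is a shortest one in an unweighted network, which keeps the added intermediates to a minimum.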

Publicly available ZIKA KB
After an assessment period of internal use, a web browser-based user interface was implemented to make the PREPARE ZIKA KB available to all ZIKV researchers. By openly sharing the collected data and information, the ZIKA KB allows researchers to generate hypotheses, identify knowledge gaps and interactively explore and interpret data. All data are currently in the public domain. Upon request, data submission can be modified to allow registered users to specify that submitted data should not be publicly available.

Use of the ZIKA KB
In the following we provide several example use cases. For instance, publicly available interaction data, such as the PPI and protein-drug interaction data, can be used to visualize drug targets and host factors involved in ZIKV pathogenesis. Alternatively, users can filter for PPIs whose source or target is a drug, or refine search results to include only proteins localized to a specific cellular compartment, such as the endoplasmic reticulum. The returned networks can subsequently be interrogated to identify host factors that are targeted by the virus and to search for drugs that interact with these host factors and thus might contribute to drug repositioning for future treatment options for ZIKV infection. The maps can also be explored further by using integrated expression and knockout data.

To explore the integrated literature knowledge for relevance, obtain an overview of drug targets, or identify critical genes within the network consisting of gene-disease-pathogen relationships, predefined perspectives were overlaid onto the default map. The association of ZIKV with microcephaly was reported most frequently across all ZIKV literature, and this association is visualized by the thickness of the edges (Fig 4A). Known drug targets interacting directly with ZIKV or microcephaly were highlighted in green for potential intervention evaluation (Fig 4B). Genes playing a role in ZIKV-infected human neural progenitor cells (hNPCs) were also highlighted for comparison with complementary experimental analyses (Fig 4C).

represents the corresponding primary manifestation of these viral infections. Encephalitis is equally frequently associated with ZIKV and DENV, confirming that ZIKV is also an aetiological agent in encephalitis.

these compounds provide a resource to study ZIKV pathogenesis and can contribute to insights into the biology of ZIKV.
To this end, the ZIKV molecular interaction and disease map described in Figure 4 was extended and filtered to include these potential "ZIKV effective drugs", which were connected to genes associated with ZIKV through PPIs (Fig 6). After this extension, ten of the identified ZIKV effective drugs were part of the new map, which we then used to gain insight into potential drug mechanisms and ZIKV biology. One of the drugs, Bortezomib, is a known antiviral compound that inhibits replication of flaviviruses (32). Bortezomib is a proteasome inhibitor, suggesting that proteasome action is essential for ZIKV replication. This

genes whose expression is affected during ZIKV infection indicated that Sorafenib likely acts via its target genes FLT3 and/or BRAF, but not via its alternative target genes VEGFR or PDGFR. In addition, the number of drugs to be screened was reduced from 774 to 64 by filtering potential drug candidates based on their network distance to ZIKV infection-associated genes and on additional phenotype-relevant knowledge, such as that contained in the "FDA pregnancy" label.
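The drug filtering described above can be sketched as a two-stage filter: network distance first, then a label-based exclusion. The sketch below is an assumption about how such a filter could be written, not the procedure that produced the reduction from 774 to 64 candidates. The drug names, the distance cut-off `max_distance` and the set of acceptable pregnancy categories are hypothetical.

```python
from collections import deque


def distances_from(graph, sources):
    """Multi-source breadth-first search: hop distance to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def filter_drugs(drugs, graph, infection_genes, max_distance=1,
                 allowed_categories=frozenset({"A", "B"})):
    """Keep drugs whose closest target is near an infection-associated gene
    and whose pregnancy category is in the allowed set (assumed cut-offs)."""
    dist = distances_from(graph, infection_genes)
    kept = []
    for name, info in drugs.items():
        closest = min((dist[t] for t in info["targets"] if t in dist), default=None)
        if closest is None or closest > max_distance:
            continue  # no target within reach of the infection-associated genes
        if info["pregnancy"] not in allowed_categories:
            continue  # excluded by the phenotype-relevant label
        kept.append(name)
    return sorted(kept)
```

Computing all distances once with a multi-source search avoids repeating a search per drug, which matters when hundreds of candidates are screened against the same gene set.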

The conclusions that can be drawn are limited by the initially low number of available publications and limited experimental data, a situation which is inherent to most emerging epidemics. Nevertheless, the work presented shows that the use of a knowledge integrating system can provide guidance for clinical and research responses, such as follow-up studies regarding the association between ZIKV, microcephaly and epilepsy, the validation of candidate drugs for ZIKV treatment, and the validation of candidate genes in specific functional assays to better understand molecular ZIKV infection mechanisms, or to complement existing functional genomic approaches with proteomics studies, such as the integrated proteomics approach identifying cellular targets of ZIKV proteins (38,39). These studies allow additional comparative analyses between ZIKV and other flavivirus family members in terms of virulence and pathogenic traits.

Another limitation of the system is the restricted types of information which can be retrieved by text mining. While qualitative associations between genes/proteins, drugs, diseases and organisms are readily amenable to automatic approaches, it is currently almost impossible to extract, for example, clinical study designs, detailed quantitative information or complex treatment plans.

Finally, the ZIKA KB in its current stage enables exploration of the integrated information, as well as generation and curation of text-mining analysis, but is not a public tool for molecular interaction and disease map generation. The functions required for these tasks will need further refinement before they can be made available in a general way.

In summary, this approach, in our opinion, provides a feasible way to collect and integrate existing knowledge to better understand the molecular mechanisms of an emerging pathogen. In addition, our approach helps to identify gaps in knowledge and, together with the other features, guides rapid and effective responses to future epidemics. We have made the specific outcome of our approach, the Zika KnowledgeBase, publicly available as a hopefully valuable resource to the ZIKV research community.

In the light of the current COVID-19 pandemic, we now apply the described workflow to SARS-CoV-2 and other coronaviruses and will make the developed resource available as described.