= Disease informatics =

Disease informatics (also called infectious disease informatics, IDI) addresses major challenges to global public health that demand not only sound medical intervention but also credible, data-centric strategies ^{[1]}. With the rapid advancement of genetic technologies that analyze the DNA and RNA of pathogens to identify, track, and characterize them, alongside advances in artificial intelligence, infectious disease informatics has emerged as a distinct area of expertise.

Considering infectious diseases contribute to millions of deaths every year, the ability to identify and understand disease diffusion is crucial for society to apply control and prevention measures. The knowledge gained by researchers in the field of disease informatics can be used to aid policymakers' decisions on issues such as spreading public awareness, updating the training of health professionals, and buying vaccines.

Aside from aiding policymakers' decisions, the goals of disease informatics include increased identification of biomarkers for transmissibility, improved vaccine design, a deeper understanding of host-pathogen interactions, and the optimization of antimicrobial development.

In parallel, recent insights from the COVID-19 pandemic emphasize the essential role of data science in epidemic forecasting, risk modeling, and policy support. Because infectious diseases account for a large number of deaths each year, understanding how they are transmitted is pivotal for prevention and for safeguarding society. Together, these approaches mark a paradigm shift in which managing infectious diseases no longer relies solely on biological knowledge but equally on computational insights and collaborative information systems.

== Background ==
Throughout most of history, human attitudes towards epidemics and outbreaks combined false theories with a surprising degree of practical sense. Later, the detection and management of infectious disease outbreaks relied mostly on manual reporting, clinical observation, and delayed laboratory confirmation. A reportable condition is an infectious disease for which the timely reporting of individual cases is considered necessary for the management and prevention of an outbreak. These traditional methods often suffered from slow response times and limited scalability, factors that proved critical during fast-moving outbreaks such as SARS in 2003 and H1N1 in 2009. This need gave rise to infectious disease informatics, a field that blends epidemiology, computer science, bioinformatics, and biosurveillance to enhance the management of infectious diseases.

=== Case study of HIV and SARS: a network analysis of comorbidity risk at the time of an outbreak ===
The term "comorbidity" refers to the coexistence of two or more diseases or disorders in the same individual or patient. It is associated with an increased likelihood of further health conditions following infection, and therefore with higher morbidity and mortality. Viruses that attack the respiratory system have emerged as a threat to global health security. Severe acute respiratory syndrome (SARS) is an epidemic disease whose causative coronavirus (CoV) has been termed the SARS-associated coronavirus (SARS-CoV). This case study presents an approach to the quantitative discovery of disease comorbidities that draws on available mRNA expression data, disease-gene relations, protein mapping, relations between two coexisting diseases, and drug-disease data.

=== Connection to broader Health Informatics domain ===
Disease informatics is a branch of the broader field of health informatics, which focuses on the use of information and communication technology in medical facilities. Health informatics comprises diverse domains that serve as a foundation for disease-focused applications, such as telemedicine and electronic health records (EHRs), the clinical systems designed for storing, retrieving, and displaying the electronic data collected while a patient is under care.

The success of IDI systems depends on how well they can access and process clinical data from hospital information systems. Modern health informatics infrastructures facilitate real-time data sharing, the integration of diverse information systems, and the application of international standards such as HL7 FHIR (Fast Healthcare Interoperability Resources). These capabilities are essential for enabling timely and accurate infectious disease surveillance.
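As an illustration of how a standardized clinical record might feed a surveillance system, the following minimal sketch flattens a resource shaped like a FHIR Condition into a single row. The resource is hand-written for this example, the code system URL and code value are placeholders rather than real terminology entries, and `to_surveillance_row` is a hypothetical helper, not part of any FHIR library.

```python
import json

# Hypothetical, hand-written resource shaped like a FHIR Condition.
# The coding system and code below are placeholders, not real terminology.
CONDITION_JSON = """
{
  "resourceType": "Condition",
  "id": "example-1",
  "subject": {"reference": "Patient/example-patient"},
  "code": {
    "coding": [
      {"system": "http://example.org/codes",
       "code": "EXAMPLE-FLU",
       "display": "Influenza"}
    ]
  },
  "onsetDateTime": "2024-01-15"
}
"""


def to_surveillance_row(resource):
    """Flatten a Condition-shaped dict into the fields a surveillance
    system might aggregate. Missing fields are returned as None."""
    if resource.get("resourceType") != "Condition":
        raise ValueError("expected a Condition resource")
    codings = resource.get("code", {}).get("coding", [])
    first = codings[0] if codings else {}
    return {
        "patient_ref": resource.get("subject", {}).get("reference"),
        "code_system": first.get("system"),
        "code": first.get("code"),
        "display": first.get("display"),
        "onset": resource.get("onsetDateTime"),
    }


row = to_surveillance_row(json.loads(CONDITION_JSON))
```

Because every sending system uses the same field names, the same small function can be applied to records from different hospitals, which is the practical benefit of a shared standard.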

== The emerging role of informatics in public health practice ==
Control of infectious disease is the cornerstone of public health. The emergence of informatics in public health marks a radical change in how health data are collected, monitored, analyzed, and used to improve population health outcomes.

=== Essential services in public health practice ===
Public health practice is grounded in three core functions: assessment, assurance, and policy development. These functions encompass several essential services aimed at improving public health:

- Monitor health status, diagnose health problems, and educate and empower communities to address them.
- Enforce government regulations and laws that protect the health of the public.
- Assure a capable health service workforce.
- Conduct continuous research into innovative techniques.

=== Syndromic surveillance as a tool for early detection ===
Infectious disease informatics (IDI) plays a vital role in enabling early detection of and rapid response to outbreaks, particularly through tools such as syndromic surveillance. Syndromic surveillance, a form of public health surveillance, focuses on identifying and studying contagious disease by monitoring current public health data. These tools are increasingly used for the effective detection of and response to infectious disease, whether it arises from a natural outbreak or from a laboratory experiment.

Such a surveillance system applies natural language processing to incoming data to identify the potential primary factors of an epidemic. The process proceeds as follows:

- Patients are seen while seeking medical care at a healthcare facility.
- De-identified data, such as symptoms and patient characteristics, are collected and sent onward.
- The data are then shared with state or local health departments or health information exchanges (HIEs).
- To enable early detection of public health risks, the National Syndromic Surveillance Program (NSSP), hosted by the CDC, aggregates this information via the BioSense Platform.
- The CDC supports surveillance by providing funding, training for health departments, technical and project assistance, and analytical tools for data analysis.
- The NSSP network of public health professionals collaborates to build capacity through training, live webinars, and joint efforts to improve surveillance methods and emergency responses.
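
The aggregation and early-detection steps above can be sketched in a few lines. This is a simplified illustration, not the method used by the NSSP: plain keyword matching stands in for natural language processing, the keyword lists are hypothetical, and the alert rule (baseline mean plus two sample standard deviations) is an assumption chosen for brevity.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical keyword lexicon mapping each syndrome to trigger terms.
SYNDROME_KEYWORDS = {
    "respiratory": ("cough", "shortness of breath", "sore throat"),
    "gastrointestinal": ("vomiting", "diarrhea", "nausea"),
}


def classify(chief_complaint):
    """Return the set of syndromes whose keywords appear in the text."""
    text = chief_complaint.lower()
    return {
        syndrome
        for syndrome, keywords in SYNDROME_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def daily_counts(visits, syndrome):
    """Count visits per date matching a syndrome.

    `visits` is an iterable of (date, chief_complaint) pairs that have
    already been de-identified.
    """
    counts = Counter()
    for date, complaint in visits:
        if syndrome in classify(complaint):
            counts[date] += 1
    return counts


def is_alert(baseline, today, z=2.0):
    """Flag today's count if it exceeds baseline mean + z * sample std dev.

    `baseline` needs at least two values for the standard deviation.
    """
    threshold = mean(baseline) + z * stdev(baseline)
    return today > threshold
```

A health department could, for instance, call `daily_counts` on each day's incoming records and pass the previous weeks' counts as `baseline` to `is_alert`. Production systems use considerably more robust classifiers and statistical detectors.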

== Computational Methods ==

=== Artificial intelligence ===
The use of artificial intelligence (AI) tools, such as machine learning and natural language processing (NLP), in disease informatics increases efficiency by automating and speeding up several data analysis processes. Advances in AI and the increased accessibility of data aid predictive modeling and public health surveillance. AI uses predictive modeling to examine vast data sets and forecast future outcomes, improving the ability to predict disease outbreaks and helping to guide public health treatments. AI also provides a valuable avenue by combining spatial modeling with geographic information system (GIS) data to uncover geographical patterns (for example, disease clusters) and support data-driven decision-making for local-level predictions of disease diffusion. As AI continues to grow, further advances in its use in disease informatics are expected.

=== Machine learning ===
Machine learning (ML) techniques aid the study of disease informatics through their capability to predict, spatially and temporally, the progression and transmission of infectious diseases. ML algorithms can play a pivotal role in limiting the impact of an infectious disease over time: they analyze extensive, complex data sets to identify patterns across varying types of data, such as demographics, electronic health records, and environmental conditions, and use those patterns to predict the cause and further spread of the disease. Researchers apply algorithms to data sets (for example, genomic data, social media posts, and health records) to predict the potential sources of an outbreak, the likelihood of an individual contracting a certain disease, and the number of cases of a disease in a given region. Techniques used to analyze large, complex data sets and identify trends include support vector machines, ensemble learning, conditional random fields (CRFs), and decision trees.
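
As a minimal sketch of forecasting case numbers in a region, the example below fits a straight line to the logarithm of daily case counts by ordinary least squares and extrapolates it. This is a deliberately simple stand-in and not one of the algorithms named above. It assumes roughly exponential growth, which only holds early in an outbreak, and the function names are illustrative.

```python
import math


def fit_log_linear(counts):
    """Fit log(count) = a + b * day by ordinary least squares.

    Days are numbered 0, 1, 2, ... in the order given. Returns (a, b).
    """
    n = len(counts)
    if n < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two strictly positive counts")
    xs = range(n)
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx              # estimated daily growth rate (log scale)
    a = y_mean - b * x_mean    # intercept
    return a, b


def forecast(counts, days_ahead):
    """Extrapolate the fitted curve for the next `days_ahead` days."""
    a, b = fit_log_linear(counts)
    last_day = len(counts) - 1
    return [math.exp(a + b * (last_day + k)) for k in range(1, days_ahead + 1)]
```

For counts that double every day, such as `[10, 20, 40, 80]`, `forecast(counts, 2)` returns approximately `[160, 320]`. Real case data are noisy and growth slows as an outbreak progresses, so forecasts of this kind are only reasonable over short horizons.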

=== Text mining ===
Text mining has become a useful means of querying large amounts of data to aid gene mapping and genome analysis. It allows medical databases to be queried for processes such as genomic mapping, integrating genomic and proteomic data to map genes and highlight their interrelationships with various diseases. Data on targeted sequences can be retrieved in two ways: by similarity search or by keyword search. A similarity search (using software such as BLAST) is performed by entering a known sequence as a query to find sequences that are similar to it. A keyword search (public tools include SRS, Entrez, and ACNUC) uses annotations that define the features of genes, such as sequence positions, to retrieve the desired gene sequences.
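
The difference between the two retrieval modes can be illustrated with a toy example. The similarity search below ranks records by the Jaccard overlap of their k-mers (short overlapping substrings), which is a crude substitute for the alignment performed by BLAST and is not how BLAST works. The records, identifiers, and annotations are invented for illustration.

```python
# Invented toy records; sequences and annotations are illustrative only.
RECORDS = [
    {"id": "seq1", "sequence": "ATGCGTACGT", "annotation": "spike glycoprotein, surface"},
    {"id": "seq2", "sequence": "TTTTTTTTTT", "annotation": "hypothetical protein"},
    {"id": "seq3", "sequence": "ATGCGTACGA", "annotation": "spike glycoprotein fragment"},
]


def kmers(seq, k=3):
    """Return the set of overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_search(query, records, k=3):
    """Rank records by k-mer similarity to the query, best match first."""
    query_kmers = kmers(query, k)
    scored = [
        (jaccard(query_kmers, kmers(r["sequence"], k)), r["id"])
        for r in records
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def keyword_search(term, records):
    """Return ids of records whose annotation contains the term."""
    term = term.lower()
    return [r["id"] for r in records if term in r["annotation"].lower()]
```

Here `similarity_search("ATGCGTACGT", RECORDS)` ranks `seq1` first, followed by the near-identical `seq3`, using only the sequences themselves, whereas `keyword_search("spike", RECORDS)` finds the same two records through their annotations without inspecting any sequence.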

=== Natural language processing ===
Natural language processing (NLP) is widely used to analyze patient data describing symptoms, since much of this information is provided as free text in online health communities. It converts unstructured information into usable data for early detection and diagnosis. One NLP tool, PubTator 3.0, identifies relations among entities such as genes, chemicals, variants, diseases, species, and cell lines to support search.
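
Converting unstructured text into usable data can be illustrated with a small lexicon-based extractor. This sketch assumes a hypothetical symptom lexicon and a very simple negation rule (a cue word within the three words before the symptom, in the same sentence). It is unrelated to PubTator and far simpler than production clinical NLP.

```python
import re

# Hypothetical lexicon: canonical symptom -> surface forms seen in posts.
SYMPTOM_LEXICON = {
    "fever": ("fever", "febrile", "high temperature"),
    "cough": ("cough", "coughing"),
    "fatigue": ("fatigue", "tired", "exhausted"),
}
NEGATION_CUES = ("no", "not", "without", "denies")


def extract_symptoms(text, window=3):
    """Map each mentioned symptom to True (present) or False (negated).

    Only the first occurrence of a surface form in each sentence is
    checked; a non-negated mention anywhere in the text wins.
    """
    found = {}
    for sentence in re.split(r"[.!?;]", text.lower()):
        joined = " ".join(re.findall(r"[a-z']+", sentence))
        for symptom, variants in SYMPTOM_LEXICON.items():
            for variant in variants:
                match = re.search(r"\b" + re.escape(variant) + r"\b", joined)
                if not match:
                    continue
                # Look for a negation cue in the words just before the match.
                preceding = joined[:match.start()].split()[-window:]
                negated = any(cue in preceding for cue in NEGATION_CUES)
                found[symptom] = found.get(symptom, False) or not negated
                break
    return found
```

Applied to a post such as "I have had a high temperature for two days. No cough, but I feel exhausted.", the function marks fever and fatigue as present and cough as negated, producing a structured record that can be counted or fed to a model.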

== Limitations and future prospects ==

=== Accessibility concerns ===
The accuracy of these AI tools and techniques depends on high-quality, comprehensive data. Accessing and collecting such data remains an ongoing challenge, because much of the data gathered is incomplete, noisy, and contains human errors (e.g. in grammar, abbreviations, or spelling). The data must therefore undergo thorough cleaning (data cleansing) before it can be used.

The formation of a standardized taxonomy for data analysis and predictive modeling would facilitate research collaboration, accelerate decisions, and help select the right predictive models to be used.

One method being used is federated learning, which allows an AI model to be trained across multiple centers without sharing raw data, so that the data remain safe at their source.
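
The idea can be sketched with federated averaging on a one-variable linear model. In this simplified illustration each site runs a few steps of gradient descent on its own data and sends back only the updated parameters, which a coordinating server averages in proportion to each site's sample size. The model, learning rate, and round counts are assumptions chosen to keep the example small.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run gradient descent on one site's private (x, y) pairs.

    Fits y = w * x + b by minimizing mean squared error and returns
    the updated (w, b). The raw data never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates, sizes):
    """Average site parameters, weighted by each site's sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(sites, rounds=50, lr=0.1, epochs=20):
    """Alternate local training and server-side averaging."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data, lr, epochs) for data in sites]
        weights = federated_average(updates, [len(d) for d in sites])
    return weights
```

If two sites hold different samples drawn from the same relationship, for example `y = 2x + 1`, `train` converges towards `w = 2` and `b = 1` even though neither site ever sees the other's records. Only parameters cross the boundary between centers, although in practice those parameters can still leak information and are often protected further.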

Another concern is the potential for bias and overfitting in the predictive models, which could lead to inaccurate predictions. Human error can persist even when these tools are used to automate tasks, because AI tools that are trained incorrectly will produce inaccurate results. One study suggests that combining AI with wearable devices and other emerging technology could help address some of these challenges by providing real-time data for the models to use. This could increase the accuracy of the data in its raw form, reduce the time spent cleaning it, and allow the models to make more accurate predictions.

=== Ethical concerns ===
A critical concern for using AI and predictive modeling in disease informatics is data security and privacy. The data sources being used (electronic health records, demographics, etc.) contain highly sensitive information that must be protected for all parties involved. Any models or techniques being used need to be in compliance with local governmental regulations and laws such as HIPAA in the United States. The data used must also undergo rigorous data anonymization and de-identification protocols to protect patient privacy.

Similarly, in Europe, the Health Technology Assessment Regulation (HTAR) provides for the evaluation of the clinical benefits and consequences of a new medical technology. It applies when a computational medical tool, such as robotic surgery or diagnostic software, needs to demonstrate its safety and its potential effectiveness in medical practice.
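
The de-identification step mentioned above can be illustrated with a minimal sketch that drops direct identifiers, replaces the patient identifier with a keyed hash, and generalizes the date of birth to a year. The field names are hypothetical, and the example does not by itself satisfy HIPAA or any other regulation. It only shows the general shape of such a transformation.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers to be dropped.
DIRECT_IDENTIFIERS = ("name", "address", "phone")


def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same id and key always give the same token, so records can
    still be linked, but the token cannot be reversed without the key.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def deidentify(record, secret_key):
    """Return a copy of the record with identifying fields removed."""
    dropped = DIRECT_IDENTIFIERS + ("patient_id", "birth_date")
    cleaned = {k: v for k, v in record.items() if k not in dropped}
    cleaned["patient_token"] = pseudonymize(record["patient_id"], secret_key)
    # Generalize an ISO date of birth (YYYY-MM-DD) to the year only.
    cleaned["birth_year"] = record["birth_date"][:4]
    return cleaned
```

The secret key would be held by the data custodian and never shared with analysts. Even after such processing, combinations of remaining fields can sometimes re-identify individuals, which is why formal protocols go further than this sketch.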
