Dr. Musen is Professor of Biomedical Informatics at Stanford University, where he is Director of the Stanford Center for Biomedical Informatics Research. Dr. Musen conducts research related to intelligent systems, reusable ontologies, metadata for publication of scientific data sets, and biomedical decision support. His group developed Protégé, the world's most widely used technology for building and managing terminologies and ontologies. He is principal investigator of the National Center for Biomedical Ontology, one of the original National Centers for Biomedical Computing created by the U.S. National Institutes of Health (NIH). He is principal investigator of the Center for Expanded Data Annotation and Retrieval (CEDAR). CEDAR is a center of excellence supported by the NIH Big Data to Knowledge Initiative, with the goal of developing new technology to ease the authoring and management of biomedical experimental metadata. Dr. Musen chairs the Health Informatics and Modeling Topic Advisory Group for the World Health Organization's revision of the International Classification of Diseases (ICD-11) and he directs the WHO Collaborating Center for Classification, Terminology, and Standards at Stanford University.
Early in his career, Dr. Musen received the Young Investigator Award for Research in Medical Knowledge Systems from the American Association for Medical Systems and Informatics and a Young Investigator Award from the National Science Foundation. In 2006, he received the Donald A. B. Lindberg Award for Innovation in Informatics from the American Medical Informatics Association. He has been elected to the American College of Medical Informatics and the Association of American Physicians. He is founding co-editor-in-chief of the journal Applied Ontology.

Administrative Appointments

  • Deputy Director for Bioinformatics, Immune Tolerance Network (2005 - 2007)
  • Head, Stanford Center for Biomedical Informatics Research (1992 - Present)
  • Co-Editor-in-Chief, Applied Ontology: An International Journal of Ontological Analysis and Conceptual Modeling (2005 - Present)
  • Principal Investigator, National Center for Biomedical Ontology (2005 - Present)
  • Principal Investigator, Center for Expanded Data Annotation and Retrieval (2014 - Present)

Honors & Awards

  • Young Investigator Award for Research in Medical Knowledge Systems, American Association for Medical Systems and Informatics (1989)
  • Elected Fellow, American College of Medical Informatics (1989)
  • NSF Young Investigator Award, National Science Foundation (1992)
  • Elected Member, American Society for Clinical Investigation (1997)
  • Chair, Scientific Program Committee, American Medical Informatics Association Annual Symposium (2003)
  • General Chair, International Semantic Web Conference (2005)
  • Donald A. B. Lindberg Award for Innovation in Informatics, American Medical Informatics Association (2006)
  • Elected Member, Association of American Physicians (2010)
  • General Chair, Association for Computing Machinery Conference on Knowledge Capture (K-Cap '11) (2011)
  • "Ten Year Award" for the most influential paper presented at ISWC, ten years previously, Semantic Web Science Association (2014)

Boards, Advisory Committees, Professional Organizations

  • Member, National Advisory Council on Biomedical Imaging and Bioengineering (2011 - 2015)
  • Chair, Health Informatics and Modeling Topic Advisory Group, ICD Revision Steering Group, World Health Organization (2008 - Present)

Professional Education

  • Ph.D., Stanford University, Medical Information Sciences (1988)
  • M.D., Brown University, Medicine (1980)
  • Sc.B., Brown University, Biology (1977)

Research & Scholarship

Current Research and Scholarly Interests

The construction of automated systems to assist biomedical decision making is impeded by difficulties in formalizing knowledge and in encoding that knowledge for use by the computer. Current work in our laboratory addresses mechanisms by which computers can assist in the development of large, electronic biomedical knowledge bases. Emphasis is placed on new methods for the automated generation of computer-based tools that end-users can use to enter knowledge of specific biomedical content. In particular, we are studying:

- Representation of biomedical concepts and terminologies for development of intelligent systems

- Development of reusable domain descriptions (ontologies) and problem-solving methods

- Visual metaphors to facilitate knowledge entry by application specialists

- Decision-support systems for use in biomedicine

- Guideline-based and protocol-based clinical care

The Protégé system provides a uniform infrastructure for our work on knowledge modeling and representation.

The National Center for Biomedical Ontology, supported by the NIH Common Fund, develops a new generation of technology for storing, accessing, evaluating, and using biomedical knowledge resources.



All Publications

  • Using the wisdom of the crowds to find critical errors in biomedical ontologies: a study of SNOMED CT JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Mortensen, J. M., Minty, E. P., Januszyk, M., Sweeney, T. E., Rector, A. L., Noy, N. F., Musen, M. A. 2015; 22 (3): 640-648


    The verification of biomedical ontologies is an arduous process that typically involves peer review by subject-matter experts. This work evaluated the ability of crowdsourcing methods to detect errors in SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) and to address the challenges of scalable ontology verification. We developed a methodology to crowdsource ontology verification that uses micro-tasking combined with a Bayesian classifier. We then conducted a prospective study in which both the crowd and domain experts verified a subset of SNOMED CT comprising 200 taxonomic relationships. The crowd identified errors as well as any single expert at about one-quarter of the cost. The inter-rater agreement (κ) between the crowd and the experts was 0.58; the inter-rater agreement between experts themselves was 0.59, suggesting that the crowd is nearly indistinguishable from any one expert. Furthermore, the crowd identified 39 previously undiscovered, critical errors in SNOMED CT (eg, 'septic shock is a soft-tissue infection'). The results show that the crowd can indeed identify errors in SNOMED CT that experts also find, and the results suggest that our method will likely perform well on similar ontologies. The crowd may be particularly useful in situations where an expert is unavailable, budget is limited, or an ontology is too large for manual error checking. Finally, our results suggest that the online anonymous crowd could successfully complete other domain-specific tasks. We have demonstrated that the crowd can address the challenges of scalable ontology verification, completing not only intuitive, common-sense tasks, but also expert-level, knowledge-intensive tasks.

    View details for DOI 10.1136/amiajnl-2014-002901

    View details for Web of Science ID 000356717100018

  • Using Semantic Web in ICD-11: Three Years Down the Road SEMANTIC WEB - ISWC 2013, PART II Tudorache, T., Nyulas, C. I., Noy, N. F., Musen, M. A. 2013; 8219: 195-211
  • The Protege OWL Plugin: An open development environment for Semantic Web applications SEMANTIC WEB - ISWC 2004, PROCEEDINGS Knublauch, H., Fergerson, R. W., Noy, N. F., Musen, M. A. 2004; 3298: 229-243
  • A unified software framework for deriving, visualizing, and exploring abstraction networks for ontologies. Journal of biomedical informatics Ochs, C., Geller, J., Perl, Y., Musen, M. A. 2016; 62: 90-105


    Software tools play a critical role in the development and maintenance of biomedical ontologies. One important task that is difficult without software tools is ontology quality assurance. In previous work, we have introduced different kinds of abstraction networks to provide a theoretical foundation for ontology quality assurance tools. Abstraction networks summarize the structure and content of ontologies. One kind of abstraction network that we have used repeatedly to support ontology quality assurance is the partial-area taxonomy. It summarizes structurally and semantically similar concepts within an ontology. However, the use of partial-area taxonomies was ad hoc and not generalizable. In this paper, we describe the Ontology Abstraction Framework (OAF), a unified framework and software system for deriving, visualizing, and exploring partial-area taxonomy abstraction networks. The OAF includes support for various ontology representations (e.g., OWL and SNOMED CT's relational format). A Protégé plugin for deriving "live partial-area taxonomies" is demonstrated.

    View details for DOI 10.1016/j.jbi.2016.06.008

    View details for PubMedID 27345947

  • Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies. Journal of biomedical informatics Ochs, C., He, Z., Zheng, L., Geller, J., Perl, Y., Hripcsak, G., Musen, M. A. 2016; 61: 63-76


    An Abstraction Network is a compact summary of an ontology's structure and content. In previous research, we showed that Abstraction Networks support quality assurance (QA) of biomedical ontologies. The development of an Abstraction Network and its associated QA methodologies, however, is a labor-intensive process that previously was applicable only to one ontology at a time. To improve the efficiency of the Abstraction-Network-based QA methodology, we introduced a QA framework that uses uniform Abstraction Network derivation techniques and QA methodologies that are applicable to whole families of structurally similar ontologies. For the family-based framework to be successful, it is necessary to develop a method for classifying ontologies into structurally similar families. We now describe a structural meta-ontology that classifies ontologies according to certain structural features that are commonly used in the modeling of ontologies (e.g., object properties) and that are important for Abstraction Network derivation. Each class of the structural meta-ontology represents a family of ontologies with identical structural features, indicating which types of Abstraction Networks and QA methodologies are potentially applicable to all of the ontologies in the family. We derive a collection of 81 families, corresponding to classes of the structural meta-ontology, that enable a flexible, streamlined family-based QA methodology, offering multiple choices for classifying an ontology. The structure of 373 ontologies from the NCBO BioPortal is analyzed and each ontology is classified into multiple families modeled by the structural meta-ontology.

    View details for DOI 10.1016/j.jbi.2016.03.007

    View details for PubMedID 26988001

  • Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology. Journal of biomedical informatics Mortensen, J. M., Telis, N., Hughey, J. J., Fan-Minogue, H., Van Auken, K., Dumontier, M., Musen, M. A. 2016; 60: 199-209


    Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability found in current ontology quality assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large, clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the methods configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than SNOMED CT concepts. This disparity may account for the difference in performance - fewer search results indicate a more difficult task for the worker. The number of Internet search results could serve as a method to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, helping experts with their verification by completing the easy tasks and allowing experts to focus on the difficult tasks, rather than an expert replacement.

    View details for DOI 10.1016/j.jbi.2016.02.005

    View details for PubMedID 26873781

  • How to apply Markov chains for modeling sequential edit patterns in collaborative ontology-engineering projects INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Walk, S., Singer, P., Strohmaier, M., Helic, D., Noy, N. F., Musen, M. A. 2015; 84: 51-66
  • The center for expanded data annotation and retrieval. Journal of the American Medical Informatics Association Musen, M. A., Bean, C. A., Cheung, K., Dumontier, M., Durante, K. A., Gevaert, O., Gonzalez-Beltran, A., Khatri, P., Kleinstein, S. H., O'Connor, M. J., Pouliot, Y., Rocca-Serra, P., Sansone, S., Wiser, J. A. 2015; 22 (6): 1148-1152


    The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.

    View details for DOI 10.1093/jamia/ocv048

    View details for PubMedID 26112029

  • Analysis and Prediction of User Editing Patterns in Ontology Development Projects. Journal on data semantics Wang, H., Tudorache, T., Dou, D., Noy, N. F., Musen, M. A. 2015; 4 (2): 117-132


    The development of real-world ontologies is a complex undertaking, commonly involving a group of domain experts with different expertise that work together in a collaborative setting. These ontologies are usually large scale and have complex structures. To assist in the authoring process, ontology tools are key at making the editing process as streamlined as possible. Being able to predict confidently what the users are likely to do next as they edit an ontology will enable us to focus and structure the user interface accordingly and to facilitate more efficient interaction and information discovery. In this paper, we use data mining, specifically the association rule mining, to investigate whether we are able to predict the next editing operation that a user will make based on the change history. We simulated and evaluated continuous prediction across time using sliding window model. We used the association rule mining to generate patterns from the ontology change logs in the training window and tested these patterns on logs in the adjacent testing window. We also evaluated the impact of different training and testing window sizes on the prediction accuracies. At last, we evaluated our prediction accuracies across different user groups and different ontologies. Our results indicate that we can indeed predict the next editing operation a user is likely to make. We will use the discovered editing patterns to develop a recommendation module for our editing tools, and to design user interface components that better fit with the user editing behaviors.

    View details for PubMedID 26052350

  • Using ontologies to model human navigation behavior in information networks: A study based on Wikipedia SEMANTIC WEB Lamprecht, D., Strohmaier, M., Helic, D., Nyulas, C., Tudorache, T., Noy, N. F., Musen, M. A. 2015; 6 (4): 403-422

    View details for DOI 10.3233/SW-140143

    View details for Web of Science ID 000357876600011

  • Toward a science of learning systems: a research agenda for the high-functioning Learning Health System JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Friedman, C., Rubin, J., Brown, J., Buntin, M., Corn, M., Etheredge, L., Gunter, C., Musen, M., Platt, R., Stead, W., Sullivan, K., Van Houweling, D. 2015; 22 (1): 43-50


    The capability to share data, and harness its potential to generate knowledge rapidly and inform decisions, can have transformative effects that improve health. The infrastructure to achieve this goal at scale--marrying technology, process, and policy--is commonly referred to as the Learning Health System (LHS). Achieving an LHS raises numerous scientific challenges. The National Science Foundation convened an invitational workshop to identify the fundamental scientific and engineering research challenges to achieving a national-scale LHS. The workshop was planned by a 12-member committee and ultimately engaged 45 prominent researchers spanning multiple disciplines over 2 days in Washington, DC on 11-12 April 2013. The workshop participants collectively identified 106 research questions organized around four system-level requirements that a high-functioning LHS must satisfy. The workshop participants also identified a new cross-disciplinary integrative science of cyber-social ecosystems that will be required to address these challenges. The intellectual merit and potential broad impacts of the innovations that will be driven by investments in an LHS are of great potential significance. The specific research questions that emerged from the workshop, alongside the potential for diverse communities to assemble to address them through a 'new science of learning systems', create an important agenda for informatics and related disciplines.

    View details for DOI 10.1136/amiajnl-2014-002977

    View details for Web of Science ID 000352771100007

    View details for PubMedID 25342177

  • A Method to Compare ICF and SNOMED CT for Coverage of U.S. Social Security Administration's Disability Listing Criteria. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tu, S. W., Nyulas, C. I., Tudorache, T., Musen, M. A. 2015; 2015: 1224-1233


    We developed a method to evaluate the extent to which the International Classification of Function, Disability, and Health (ICF) and SNOMED CT cover concepts used in the disability listing criteria of the U.S. Social Security Administration's "Blue Book." First we decomposed the criteria into their constituent concepts and relationships. We defined different types of mappings and manually mapped the recognized concepts and relationships to either ICF or SNOMED CT. We defined various metrics for measuring the coverage of each terminology, taking into account the effects of inexact matches and frequency of occurrence. We validated our method by mapping the terms in the disability criteria of Adult Listings, Chapter 12 (Mental Disorders). SNOMED CT dominates ICF in almost all the metrics that we have computed. The method is applicable for determining any terminology's coverage of eligibility criteria.

    View details for PubMedID 26958262

  • Automating Identification of Multiple Chronic Conditions in Clinical Practice Guidelines. AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science Leung, T. I., Jalal, H., Zulman, D. M., Dumontier, M., Owens, D. K., Musen, M. A., Goldstein, M. K. 2015; 2015: 456-460


    Many clinical practice guidelines (CPGs) are intended to provide evidence-based guidance to clinicians on a single disease, and are frequently considered inadequate when caring for patients with multiple chronic conditions (MCC), or two or more chronic conditions. It is unclear to what degree disease-specific CPGs provide guidance about MCC. In this study, we develop a method for extracting knowledge from single-disease chronic condition CPGs to determine how frequently they mention commonly co-occurring chronic diseases. We focus on 15 highly prevalent chronic conditions. We use publicly available resources, including a repository of guideline summaries from the National Guideline Clearinghouse to build a text corpus, a data dictionary of ICD-9 codes from the Medicare Chronic Conditions Data Warehouse (CCW) to construct an initial list of disease terms, and disease synonyms from the National Center for Biomedical Ontology to enhance the list of disease terms. First, for each disease guideline, we determined the frequency of comorbid condition mentions (a disease-comorbidity pair) by exactly matching disease synonyms in the text corpus. Then, we developed an annotated reference standard using a sample subset of guidelines. We used this reference standard to evaluate our approach. Then, we compared the co-prevalence of common pairs of chronic conditions from Medicare CCW data to the frequency of disease-comorbidity pairs in CPGs. Our results show that some disease-comorbidity pairs occur more frequently in CPGs than others. Sixty-one (29.0%) of 210 possible disease-comorbidity pairs occurred zero times; for example, no guideline on chronic kidney disease mentioned depression, while heart failure guidelines mentioned ischemic heart disease the most frequently. Our method adequately identifies comorbid chronic conditions in CPG recommendations with precision 0.82, recall 0.75, and F-measure 0.78. 
    Our work identifies knowledge currently embedded in the free text of clinical practice guideline recommendations and provides an initial view of the extent to which CPGs mention common comorbid conditions. Knowledge extracted from CPG text in this way may be useful to inform gaps in guideline recommendations regarding MCC and therefore identify potential opportunities for guideline improvement.

    View details for PubMedID 26306285
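
The figures reported in the abstract above can be checked with a little arithmetic. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall) and assumes that the 210 possible pairs are the ordered pairs of the 15 conditions; the abstract states neither explicitly, and the function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract: precision 0.82, recall 0.75.
score = f_measure(0.82, 0.75)
print(round(score, 2))  # 0.78

# Assumption: each of 15 conditions is paired with each of the other 14,
# counting (disease, comorbidity) order, which gives 210 pairs.
possible_pairs = 15 * 14
zero_mention_share = 61 / possible_pairs
print(possible_pairs, f"{zero_mention_share:.1%}")  # 210 29.0%
```

Both results agree with the reported F-measure of 0.78 and the reported 29.0% of pairs with zero mentions.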

  • Discovering beaten paths in collaborative ontology-engineering projects using Markov chains. Journal of biomedical informatics Walk, S., Singer, P., Strohmaier, M., Tudorache, T., Musen, M. A., Noy, N. F. 2014; 51: 254-271


    Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. 
    From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.

    View details for DOI 10.1016/j.jbi.2014.06.004

    View details for PubMedID 24953242
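
The kind of analysis described in the abstract above, asking which properties users tend to change after a given one, can be illustrated with a first-order Markov chain estimated from edit sequences. This is a minimal sketch and not the authors' implementation: the session format and property names are hypothetical, and the paper's actual modeling choices are not reproduced here.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from action sequences."""
    counts = defaultdict(Counter)
    for actions in sessions:
        # Count each consecutive (current, next) pair within a session.
        for current, nxt in zip(actions, actions[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }

# Hypothetical change-log sessions; the property names are made up.
sessions = [
    ["title", "definition", "synonym"],
    ["title", "definition", "title"],
    ["definition", "synonym"],
]
probs = transition_probabilities(sessions)
print(probs["title"])       # {'definition': 1.0}
print(probs["definition"])  # 'synonym' follows 'definition' in 2 of 3 cases
```

In this toy log, an edit to "title" is always followed by an edit to "definition", which is the sort of regularity an editing tool could use to anticipate a user's next step.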

  • WebProtege: a collaborative Web-based platform for editing biomedical ontologies BIOINFORMATICS Horridge, M., Tudorache, T., Nuylas, C., Vendetti, J., Noy, N. F., Musen, M. A. 2014; 30 (16): 2384-2385


    WebProtégé is an open-source Web application for editing OWL 2 ontologies. It contains several features to aid collaboration, including support for the discussion of issues, change notification and revision-based change tracking. WebProtégé also features a simple user interface, which is geared towards editing the kinds of class descriptions and annotations that are prevalent throughout biomedical ontologies. Moreover, it is possible to configure the user interface using views that are optimized for editing Open Biomedical Ontology (OBO) class descriptions and metadata. Some of these views are shown in the Supplementary Material and can be seen in WebProtégé itself by configuring the project as an OBO project. WebProtégé is freely available for use on the Web. It is implemented in Java and JavaScript using the OWL API and the Google Web Toolkit. All major browsers are supported. For users who do not wish to host their ontologies on the Stanford servers, WebProtégé is available as a Web app that can be run locally using a Servlet container such as Tomcat. Binaries, source code and documentation are available under an open-source license. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btu256

    View details for Web of Science ID 000342746000022

    View details for PubMedID 24771560

  • Cross-domain targeted ontology subsets for annotation: The case of SNOMED CORE and RxNorm JOURNAL OF BIOMEDICAL INFORMATICS Lopez-Garcia, P., LePendu, P., Musen, M., Illarramendi, A. 2014; 47: 105-111


    The benefits of using ontology subsets versus full ontologies are well-documented for many applications. In this study, we propose an efficient subset extraction approach for a domain using a biomedical ontology repository with mappings, a cross-ontology, and a source subset from a related domain. As a case study, we extracted a subset of drugs from RxNorm using the UMLS Metathesaurus, the NDF-RT cross-ontology, and the CORE problem list subset of SNOMED CT. The extracted subset, which we termed RxNorm/CORE, was 4% the size of the full RxNorm (0.4% when considering ingredients only). For evaluation, we used CORE and RxNorm/CORE as thesauri for the annotation of clinical documents and compared their performance to that of their respective full ontologies (i.e., SNOMED CT and RxNorm). The wide range in recall of both CORE (29-69%) and RxNorm/CORE (21-35%) suggests that more quantitative research is needed to assess the benefits of using ontology subsets as thesauri in annotation applications. Our approach to subset extraction, however, opens a door to help create other types of clinically useful domain specific subsets and acts as an alternative in scenarios where well-established subset extraction techniques might suffer from difficulties or cannot be applied.

    View details for DOI 10.1016/j.jbi.2013.09.011

    View details for Web of Science ID 000333004500011

    View details for PubMedID 24095962

  • Investigating Collaboration Dynamics in Different Ontology Development Environments KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2014 Rospocher, M., Tudorache, T., Musen, M. A. 2014; 8793: 302-313
  • An empirically derived taxonomy of errors in SNOMED CT. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Mortensen, J. M., Musen, M. A., Noy, N. F. 2014; 2014: 899-906


    Ontologies underpin methods throughout biomedicine and biomedical informatics. However, as ontologies increase in size and complexity, so does the likelihood that they contain errors. Effective methods that identify errors are typically manual and expert-driven; however, automated methods are essential for the size of modern biomedical ontologies. The effect of ontology errors on their application is unclear, creating a challenge in differentiating salient, relevant errors with those that have no discernable effect. As a first step in understanding the challenge of identifying salient, common errors at a large scale, we asked 5 experts to verify a random subset of complex relations in the SNOMED CT CORE Problem List Subset. The experts found 39 errors that followed several common patterns. Initially, the experts disagreed about errors almost entirely, indicating that ontology verification is very difficult and requires many eyes on the task. It is clear that additional empirically-based, application-focused ontology verification method development is necessary. Toward that end, we developed a taxonomy that can serve as a checklist to consult during ontology quality assurance.

    View details for PubMedID 25954397

  • A Study on the Atomic Decomposition of Ontologies SEMANTIC WEB - ISWC 2014, PT II Horridge, M., Mortensen, J. M., Parsia, B., Sattler, U., Musen, M. A. 2014; 8797: 65-80
  • How ontologies are made: Studying the hidden social dynamics behind collaborative ontology engineering projects JOURNAL OF WEB SEMANTICS Strohmaier, M., Walk, S., Poeschko, J., Lamprecht, D., Tudorache, T., Nyulas, C., Musen, M. A., Noy, N. F. 2013; 20: 18-34
  • The knowledge acquisition workshops: A remarkable convergence of ideas INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Musen, M. A. 2013; 71 (2): 195-199
  • WebProtege: A collaborative ontology editor and knowledge acquisition tool for the Web SEMANTIC WEB Tudorache, T., Nyulas, C., Noy, N. F., Musen, M. A. 2013; 4 (1): 89-99


    In this paper, we present WebProtégé, a lightweight ontology editor and knowledge acquisition tool for the Web. With the wide adoption of Web 2.0 platforms and the gradual adoption of ontologies and Semantic Web technologies in the real world, we need ontology-development tools that are better suited for the novel ways of interacting, constructing and consuming knowledge. Users today take Web-based content creation and online collaboration for granted. WebProtégé integrates these features as part of the ontology development process itself. We tried to lower the entry barrier to ontology development by providing a tool that is accessible from any Web browser, has extensive support for collaboration, and a highly customizable and pluggable user interface that can be adapted to any level of user expertise. The declarative user interface enabled us to create custom knowledge-acquisition forms tailored for domain experts. We built WebProtégé using the existing Protégé infrastructure, which supports collaboration on the back end side, and the Google Web Toolkit for the front end. The generic and extensible infrastructure allowed us to easily deploy WebProtégé in production settings for several projects. We present the main features of WebProtégé and its architecture and describe briefly some of its uses for real-world projects. WebProtégé is free and open source. An online demo is available at

    View details for DOI 10.3233/SW-2012-0057

    View details for Web of Science ID 000209437000007

  • BioPortal as a dataset of linked biomedical ontologies and terminologies in RDF SEMANTIC WEB Salvadores, M., Alexander, P. R., Musen, M. A., Noy, N. F. 2013; 4 (3): 277-284
  • Analysis of User Editing Patterns in Ontology Development Projects ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 CONFERENCES Wang, H., Tudorache, T., Dou, D., Noy, N. F., Musen, M. A. 2013; 8185: 470-487
  • Getting Lucky in Ontology Search: A Data-Driven Evaluation Framework for Ontology Ranking SEMANTIC WEB - ISWC 2013, PART I Noy, N. F., Alexander, P. R., Harpaz, R., Whetzel, P. L., Fergerson, R. W., Musen, M. A. 2013; 8218: 444-459
  • Simplified OWL Ontology Editing for the Web: Is WebProtege Enough? SEMANTIC WEB - ISWC 2013, PART I Horridge, M., Tudorache, T., Vendetti, J., Nyulas, C. I., Musen, M. A., Noy, N. F. 2013; 8218: 200-215
  • Crowdsourcing the verification of relationships in biomedical ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Mortensen, J. M., Musen, M. A., Noy, N. F. 2013; 2013: 1020-1029


    Biomedical ontologies are often large and complex, making ontology development and maintenance a challenge. To address this challenge, scientists use automated techniques to alleviate the difficulty of ontology development. However, for many ontology-engineering tasks, human judgment is still necessary. Microtask crowdsourcing, wherein human workers receive remuneration to complete simple, short tasks, is one method to obtain contributions by humans at a large scale. Previously, we developed and refined an effective method to verify ontology hierarchy using microtask crowdsourcing. In this work, we report on applying this method to find errors in the SNOMED CT CORE subset. By using crowdsourcing via Amazon Mechanical Turk with a Bayesian inference model, we correctly verified 86% of the relations from the CORE subset of SNOMED CT in which Rector and colleagues previously identified errors via manual inspection. Our results demonstrate that an ontology developer could deploy this method in order to audit large-scale ontologies quickly and relatively cheaply.

    View details for PubMedID 24551391

  • PragmatiX: An Interactive Tool for Visualizing the Creation Process Behind Collaboratively Engineered Ontologies INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS Walk, S., Poeschko, J., Strohmaier, M., Andrews, K., Tudorache, T., Noy, N. F., Nyulas, C., Musen, M. A. 2013; 9 (1): 45-78
  • Chapter 9: Analyses Using Disease Ontologies PLOS COMPUTATIONAL BIOLOGY Shah, N. H., Cole, T., Musen, M. A. 2012; 8 (12)


    Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask "Which biological process is over-represented in my set of interesting genes or proteins?" we can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?". For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases--blood coagulation disorders--that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
    In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

    View details for DOI 10.1371/journal.pcbi.1002827

    View details for Web of Science ID 000312901500032

    View details for PubMedID 23300417

  • AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Kulikowski, C. A., Shortliffe, E. H., Currie, L. M., Elkin, P. L., Hunter, L. E., Johnson, T. R., Kalet, I. J., Lenert, L. A., Musen, M. A., Ozbolt, J. G., Smith, J. W., Tarczy-Hornoch, P. Z., Williamson, J. J. 2012; 19 (6): 931-938


    The AMIA biomedical informatics (BMI) core competencies have been designed to support and guide graduate education in BMI, the core scientific discipline underlying the breadth of the field's research, practice, and education. The core definition of BMI adopted by AMIA specifies that BMI is 'the interdisciplinary field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific inquiry, problem solving and decision making, motivated by efforts to improve human health.' Application areas range from bioinformatics to clinical and public health informatics and span the spectrum from the molecular to population levels of health and biomedicine. The shared core informatics competencies of BMI draw on the practical experience of many specific informatics sub-disciplines. The AMIA BMI analysis highlights the central shared set of competencies that should guide curriculum design and that graduate students should be expected to master.

    View details for DOI 10.1136/amiajnl-2012-001053

    View details for Web of Science ID 000310408500002

    View details for PubMedID 22683918

  • Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Wu, S. T., Liu, H., Li, D., Tao, C., Musen, M. A., Chute, C. G., Shah, N. H. 2012; 19 (E1): E149-E156


    To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources. Based on the occurrences of UMLS terms in a 51 million document corpus of Mayo Clinic clinical notes, this study computes statistics about the terms' string attributes, source terminologies, semantic types and syntactic categories. Term occurrences in 2010 i2b2/VA text were also mapped; eight example filters were designed from the Mayo-based statistics and applied to i2b2/VA data. For the corpus analysis, negligible numbers of mapped terms in the Mayo corpus had over six words or 55 characters. Of source terminologies in the UMLS, the Consumer Health Vocabulary and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) had the best coverage in Mayo clinical notes at 106426 and 94788 unique terms, respectively. Of 15 semantic groups in the UMLS, seven groups accounted for 92.08% of term occurrences in Mayo data. Syntactically, over 90% of matched terms were in noun phrases. For the cross-institutional analysis, using five example filters on i2b2/VA data reduces the actual lexicon to 19.13% of the size of the UMLS and only sees a 2% reduction in matched terms. The corpus statistics presented here are instructive for building lexicons from the UMLS. Features intrinsic to Metathesaurus terms (well formedness, length and language) generalise easily across clinical institutions, but term frequencies should be adapted with caution. The semantic groups of mapped terms may differ slightly from institution to institution, but they differ greatly when moving to the biomedical literature domain.

    View details for DOI 10.1136/amiajnl-2011-000744

    View details for Web of Science ID 000314151400025

    View details for PubMedID 22493050

  • The National Center for Biomedical Ontology JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Musen, M. A., Noy, N. F., Shah, N. H., Whetzel, P. L., Chute, C. G., Story, M., Smith, B. 2012; 19 (2): 190-195


    The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.

    View details for DOI 10.1136/amiajnl-2011-000523

    View details for Web of Science ID 000300768100010

    View details for PubMedID 22081220

  • Applications of ontology design patterns in biomedical ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Mortensen, J. M., Horridge, M., Musen, M. A., Noy, N. F. 2012; 2012: 643-652


    Ontology design patterns (ODPs) are a proposed solution to facilitate ontology development, and to help users avoid some of the most frequent modeling mistakes. ODPs originate from similar approaches in software engineering, where software design patterns have become a critical aspect of software development. There is little empirical evidence for ODP prevalence or effectiveness thus far. In this work, we determine the use and applicability of ODPs in a case study of biomedical ontologies. We encoded ontology design patterns from two ODP catalogs. We then searched for these patterns in a set of eight ontologies. We found five patterns of the 69 patterns. Two of the eight ontologies contained these patterns. While ontology design patterns provide a vehicle for capturing formally reoccurring models and best practices in ontology design, we show that today their use in a case study of widely used biomedical ontologies is limited.

    View details for PubMedID 23304337

  • Deriving an abstraction network to support quality assurance in OCRe. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Ochs, C., Agrawal, A., Perl, Y., Halper, M., Tu, S. W., Carini, S., Sim, I., Noy, N., Musen, M., Geller, J. 2012; 2012: 681-689


    An abstraction network is an auxiliary network of nodes and links that provides a compact, high-level view of an ontology. Such a view lends support to ontology orientation, comprehension, and quality-assurance efforts. A methodology is presented for deriving a kind of abstraction network, called a partial-area taxonomy, for the Ontology of Clinical Research (OCRe). OCRe was selected as a representative of ontologies implemented using the Web Ontology Language (OWL) based on shared domains. The derivation of the partial-area taxonomy for the Entity hierarchy of OCRe is described. Utilizing the visualization of the content and structure of the hierarchy provided by the taxonomy, the Entity hierarchy is audited, and several errors and inconsistencies in OCRe's modeling of its domain are exposed. After appropriate corrections are made to OCRe, a new partial-area taxonomy is derived. The generalizability of the paradigm of the derivation methodology to various families of biomedical ontologies is discussed.

    View details for PubMedID 23304341

  • Enabling enrichment analysis with the Human Disease Ontology. Journal of biomedical informatics LePendu, P., Musen, M. A., Shah, N. H. 2011; 44: S31-8


    Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set, and is widely used to make sense of the results of high-throughput experiments. Our goal is to develop and apply general enrichment analysis methods to profile other sets of interest, such as patient cohorts from the electronic medical record, using a variety of ontologies including SNOMED CT, MedDRA, RxNorm, and others. Although it is possible to perform enrichment analysis using ontologies other than the GO, a key pre-requisite is the availability of a background set of annotations to enable the enrichment calculation. In the case of the GO, this background set is provided by the Gene Ontology Annotations. In the current work, we describe: (i) a general method that uses hand-curated GO annotations as a starting point for creating background datasets for enrichment analysis using other ontologies; and (ii) a gene-disease background annotation set - that enables disease-based enrichment - to demonstrate feasibility of our method.

    View details for DOI 10.1016/j.jbi.2011.04.007

    View details for PubMedID 21550421

  • Empowering industrial research with shared biomedical vocabularies DRUG DISCOVERY TODAY Harland, L., Larminie, C., Sansone, S., Popa, S., Marshall, M. S., Braxenthaler, M., Cantor, M., Filsell, W., Forster, M. J., Huang, E., Matern, A., Musen, M., Saric, J., Slater, T., Wilson, J., Lynch, N., Wise, J., Dix, I. 2011; 16 (21-22): 940-947


    The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.

    View details for DOI 10.1016/j.drudis.2011.09.013

    View details for Web of Science ID 000297400300005

    View details for PubMedID 21963522

  • NCBO Resource Index: Ontology-based search and mining of biomedical resources JOURNAL OF WEB SEMANTICS Jonquet, C., LePendu, P., Falconer, S., Coulet, A., Noy, N. F., Musen, M. A., Shah, N. H. 2011; 9 (3): 316-324


    The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index, a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics "under the hood."

    View details for DOI 10.1016/j.websem.2011.06.005

    View details for Web of Science ID 000300169800007

    View details for PubMedID 21918645

  • BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications NUCLEIC ACIDS RESEARCH Whetzel, P. L., Noy, N. F., Shah, N. H., Alexander, P. R., Nyulas, C., Tudorache, T., Musen, M. A. 2011; 39: W541-W545


    The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.

    View details for DOI 10.1093/nar/gkr469

    View details for Web of Science ID 000292325300088

    View details for PubMedID 21672956

  • The Biomedical Resource Ontology (BRO) to enable resource discovery in clinical and translational research JOURNAL OF BIOMEDICAL INFORMATICS Tenenbaum, J. D., Whetzel, P. L., Anderson, K., Borromeo, C. D., Dinov, I. D., Gabriel, D., Kirschner, B., Mirel, B., Morris, T., Noy, N., Nyulas, C., Rubenson, D., Saxman, P. R., Singh, H., Whelan, N., Wright, Z., Athey, B. D., Becich, M. J., Ginsburg, G. S., Musen, M. A., Smith, K. A., Tarantal, A. F., Rubin, D. L., Lyster, P. 2011; 44 (1): 137-145


    The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health.

    View details for DOI 10.1016/j.jbi.2010.10.003

    View details for Web of Science ID 000288289900015

    View details for PubMedID 20955817

  • How orthogonal are the OBO Foundry ontologies? Journal of biomedical semantics Ghazvinian, A., Noy, N. F., Musen, M. A. 2011; 2: S2-?


    Ontologies in biomedicine facilitate information integration, data exchange, search and query of biomedical data, and other critical knowledge-intensive tasks. The OBO Foundry is a collaborative effort to establish a set of principles for ontology development with the eventual goal of creating a set of interoperable reference ontologies in the domain of biomedicine. One of the key requirements to achieve this goal is to ensure that ontology developers reuse term definitions that others have already created rather than create their own definitions, thereby making the ontologies orthogonal. We used a simple lexical algorithm to analyze the extent to which the set of OBO Foundry candidate ontologies identified from September 2009 to September 2010 conforms to this vision. Specifically, we analyzed (1) the level of explicit term reuse in this set of ontologies, (2) the level of overlap, where two ontologies define similar terms independently, and (3) how the levels of reuse and overlap changed during the course of this year. We found that 30% of the ontologies reuse terms from other Foundry candidates and 96% of the candidate ontologies contain terms that overlap with terms from the other ontologies. We found that while term reuse increased among the ontologies between September 2009 and September 2010, the level of overlap among the ontologies remained relatively constant.
    Additionally, we analyzed the six ontologies announced as OBO Foundry members on March 5, 2010, and identified that the level of overlap was extremely low, but, notably, so was the level of term reuse. We have created a prototype web application that allows OBO Foundry ontology developers to see which classes from their ontologies overlap with classes from other ontologies in the OBO Foundry. From our analysis, we conclude that while the OBO Foundry has made significant progress toward orthogonality during the period of this study through increased adoption of explicit term reuse, a large amount of overlap remains among these ontologies. Furthermore, the characteristics of the identified overlap, such as the terms it comprises and its distribution among the ontologies, indicate that achieving orthogonality will be exceptionally difficult, if not impossible.

    View details for DOI 10.1186/2041-1480-2-S2-S2

    View details for PubMedID 21624157

  • Integration and publication of heterogeneous text-mined relationships on the Semantic Web. Journal of biomedical semantics Coulet, A., Garten, Y., Dumontier, M., Altman, R. B., Musen, M. A., Shah, N. H. 2011; 2: S10-?


    Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships causes the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at

    View details for DOI 10.1186/2041-1480-2-S2-S10

    View details for PubMedID 21624156

  • Using text to build semantic networks for pharmacogenomics JOURNAL OF BIOMEDICAL INFORMATICS Coulet, A., Shah, N. H., Garten, Y., Musen, M., Altman, R. B. 2010; 43 (6): 1009-1019


    Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a 70-87.7% precision and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.

    View details for DOI 10.1016/j.jbi.2010.08.005

    View details for Web of Science ID 000285036700017

    View details for PubMedID 20723615

  • Building a biomedical ontology recommender web service. Journal of biomedical semantics Jonquet, C., Musen, M. A., Shah, N. H. 2010; 1: S1-?


    Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify ontologies that are best suited to annotating specific datasets. The number and variety of biomedical ontologies is large, and it is cumbersome for a researcher to figure out which ontology to use. We present the Biomedical Ontology Recommender web service. The system uses textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service makes a decision based on three criteria. The first one is coverage, or the ontologies that provide most terms covering the input text. The second is connectivity, or the ontologies that are most often mapped to by other ontologies. The final criterion is size, or the number of concepts in the ontologies. The service scores the ontologies as a function of scores of the annotations created using the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal. We compare and contrast our Recommender by an exhaustive functional comparison to previously published efforts. We evaluate and discuss the results of several recommendation heuristics in the context of three real world use cases. The best recommendation heuristics, rated 'very relevant' by expert evaluators, are the ones based on coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.

    View details for DOI 10.1186/2041-1480-1-S1-S1

    View details for PubMedID 20626921

  • Mapping Master: A Flexible Approach for Mapping Spreadsheets to OWL SEMANTIC WEB-ISWC 2010, PT II O'Connor, M. J., Halaschek-Wiener, C., Musen, M. A. 2010; 6497: 194-208
  • Ontology Development for the Masses: Creating ICD-11 in WebProtege KNOWLEDGE ENGINEERING AND MANAGEMENT BY THE MASSES, EKAW 2010 Tudorache, T., Falconer, S., Noy, N. F., Nyulas, C., Uestuen, T. B., Storey, M., Musen, M. A. 2010; 6317: 74-89
  • Optimize First, Buy Later: Analyzing Metrics to Ramp-Up Very Large Knowledge Bases SEMANTIC WEB-ISWC 2010, PT I LePendu, P., Noy, N. F., Jonquet, C., Alexander, P. R., Shah, N. H., Musen, M. A. 2010; 6496: 486-501
  • A Typology for Modeling Processes in Clinical Guidelines and Protocols SECURITY-ENRICHED URBAN COMPUTING AND SMART GRID Tu, S. W., Musen, M. A. 2010; 78: 545-553
  • Will Semantic Web Technologies Work for the Development of ICD-11? SEMANTIC WEB-ISWC 2010, PT II Tudorache, T., Falconer, S., Nyulas, C., Noy, N. F., Musen, M. A. 2010; 6497: 257-272
  • The Lexicon Builder Web service: Building Custom Lexicons from two hundred Biomedical Ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Parai, G. K., Jonquet, C., Xu, R., Musen, M. A., Shah, N. H. 2010; 2010: 587-591


    Domain-specific biomedical lexicons are extensively used by researchers for natural language processing tasks. Currently these lexicons are created manually by expert curators and there is a pressing need for automated methods to compile such lexicons. The Lexicon Builder Web service addresses this need and reduces the investment of time and effort involved in lexicon maintenance. The service has three components: Inclusion - selects one or several ontologies (or their branches) and includes preferred names and synonym terms; Exclusion - filters terms based on the term's Medline frequency, syntactic type, UMLS semantic type and match with stopwords; Output - aggregates information, handles compression and output formats. Evaluation demonstrates that the service has high accuracy and runtime performance. It is currently being evaluated for several use cases to establish its utility in biomedical information processing tasks. The Lexicon Builder promotes collaboration, sharing and standardization of lexicons amongst researchers by automating the creation, maintenance and cross referencing of custom lexicons.

    View details for PubMedID 21347046

  • A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Xu, R., Musen, M. A., Shah, N. H. 2010; 2010: 907-911


    The Unified Medical Language System (UMLS) Metathesaurus is widely used for biomedical natural language processing (NLP) tasks. In this study, we systematically analyzed UMLS Metathesaurus terms by analyzing their occurrences in over 18 million MEDLINE abstracts. Our goals were: 1. analyze the frequency and syntactic distribution of Metathesaurus terms in MEDLINE; 2. create a filtered UMLS Metathesaurus based on the MEDLINE analysis; 3. augment the UMLS Metathesaurus where each term is associated with metadata on its MEDLINE frequency and syntactic distribution statistics. After MEDLINE frequency-based filtering, the augmented UMLS Metathesaurus contains 518,835 terms and is roughly 13% of its original size. We have shown that the syntactic and frequency information is useful to identify errors in the Metathesaurus. This filtered and augmented UMLS Metathesaurus can potentially be used to improve efficiency and precision of UMLS-based information retrieval and NLP tasks.

    View details for PubMedID 21347110

  • The ontology life cycle: Integrated tools for editing, publishing, peer review, and evolution of ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Noy, N., Tudorache, T., Nyulas, C., Musen, M. 2010; 2010: 552-556


    Ontologies have become a critical component of many applications in biomedical informatics. However, the landscape of the ontology tools today is largely fragmented, with independent tools for ontology editing, publishing, and peer review: users develop an ontology in an ontology editor, such as Protégé; and publish it on a Web server or in an ontology library, such as BioPortal, in order to share it with the community; they use the tools provided by the library or mailing lists and bug trackers to collect feedback from users. In this paper, we present a set of tools that bring the ontology editing and publishing closer together, in an integrated platform for the entire ontology lifecycle. This integration streamlines the workflow for collaborative development and increases integration between the ontologies themselves through the reuse of terms.

    View details for PubMedID 21347039

  • Supporting the Collaborative Authoring of ICD-11 with WebProtégé. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tudorache, T., Falconer, S., Nyulas, C., Storey, M., Ustün, T. B., Musen, M. A. 2010; 2010: 802-806


    The World Health Organization (WHO) is well under way with the new revision of the International Classification of Diseases (ICD-11). The current revision process is significantly different from past ones: the ICD-11 authoring is now open to a large international community of medical experts, who perform the authoring in a web-based collaborative platform. The classification is also embracing a more formal representation that is suitable for electronic health records. We present the ICD Collaborative Authoring Tool (iCAT), a customization of the WebProtégé editor that supports the community based authoring of ICD-11 on the Web and provides features such as discussion threads integrated in the authoring process, change tracking, content reviewing, and so on. The WHO editors evaluated the initial version of iCAT and found the tool intuitive and easy to learn. They also identified improvement potentials and new requirements for large-scale collaboration support. A demo version of the tool is available at:

    View details for PubMedID 21347089

  • An ontology-neutral framework for enrichment analysis. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tirrell, R., Evani, U., Berman, A. E., Mooney, S. D., Musen, M. A., Shah, N. H. 2010; 2010: 797-801


    Advanced statistical methods used to analyze high-throughput data (e.g. gene-expression assays) result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is relevant for and extensible to data analysis with other high-throughput measurement modalities such as proteomics, metabolomics, and tissue-microarray assays. With the availability of tools for automatic ontology-based annotation of datasets with terms from biomedical ontologies besides the GO, we need not restrict enrichment analysis to the GO. We describe RANSUM (Rich Annotation Summarizer), which performs enrichment analysis using any ontology in the National Center for Biomedical Ontology's (NCBO) BioPortal. We outline the methodology of enrichment analysis and the associated challenges, and discuss novel analyses enabled by RANSUM.

    View details for PubMedID 21347088

  • Software-engineering challenges of building and deploying reusable problem solvers AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING O'Connor, M. J., Nyulas, C., Tu, S., Buckeridge, D. L., Okhmatovskaia, A., Musen, M. A. 2009; 23 (4): 339-356


    Problem solving methods (PSMs) are software components that represent and encode reusable algorithms. They can be combined with representations of domain knowledge to produce intelligent application systems. A goal of research on PSMs is to provide principled methods and tools for composing and reusing algorithms in knowledge-based systems. The ultimate objective is to produce libraries of methods that can be easily adapted for use in these systems. Despite the intuitive appeal of PSMs as conceptual building blocks, in practice, these goals are largely unmet. There are no widely available tools for building applications using PSMs and no public libraries of PSMs available for reuse. This paper analyzes some of the reasons for the lack of widespread adoption of PSM techniques and illustrates our analysis by describing our experiences developing a complex, high-throughput software system based on PSM principles. We conclude that many fundamental principles in PSM research are useful for building knowledge-based systems. In particular, the task-method decomposition process, which provides a means for structuring knowledge-based tasks, is a powerful abstraction for building systems of analytic methods. However, despite the power of PSMs in the conceptual modeling of knowledge-based systems, software engineering challenges have been seriously underestimated. The complexity of integrating control knowledge modeled by developers using PSMs with the domain knowledge that they model using ontologies creates a barrier to widespread use of PSM-based systems. Nevertheless, the surge of recent interest in ontologies has led to the production of comprehensive domain ontologies and of robust ontology-authoring tools. These developments present new opportunities to leverage the PSM approach.

    View details for DOI 10.1017/S0890060409990047

    View details for Web of Science ID 000271131600003

  • Development of Large-Scale Functional Brain Networks in Children PLOS BIOLOGY Supekar, K., Musen, M., Menon, V. 2009; 7 (7)


    The ontogeny of large-scale functional organization of the human brain is not well understood. Here we use network analysis of intrinsic functional connectivity to characterize the organization of brain networks in 23 children (ages 7-9 y) and 22 young-adults (ages 19-22 y). Comparison of network properties, including path-length, clustering-coefficient, hierarchy, and regional connectivity, revealed that although children and young-adults' brains have similar "small-world" organization at the global level, they differ significantly in hierarchical organization and interregional connectivity. We found that subcortical areas were more strongly connected with primary sensory, association, and paralimbic areas in children, whereas young-adults showed stronger cortico-cortical connectivity between paralimbic, limbic, and association areas. Further, combined analysis of functional connectivity with wiring distance measures derived from white-matter fiber tracking revealed that the development of large-scale brain networks is characterized by weakening of short-range functional connectivity and strengthening of long-range functional connectivity. Importantly, our findings show that the dynamic process of over-connectivity followed by pruning, which rewires connectivity at the neuronal level, also operates at the systems level, helping to reconfigure and rebalance subcortical and paralimbic connectivity in the developing brain. Our study demonstrates the usefulness of network analysis of brain connectivity to elucidate key principles underlying functional brain maturation, paving the way for novel studies of disrupted brain connectivity in neurodevelopmental disorders such as autism.

    View details for DOI 10.1371/journal.pbio.1000157

    View details for Web of Science ID 000268405700010

    View details for PubMedID 19621066

  • BioPortal: ontologies and integrated data resources at the click of a mouse NUCLEIC ACIDS RESEARCH Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D. L., Storey, M., Chute, C. G., Musen, M. A. 2009; 37: W170-W173


    Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO),, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.

    View details for DOI 10.1093/nar/gkp440

    View details for Web of Science ID 000267889100031

    View details for PubMedID 19483092

  • Computational neuroanatomy: ontology-based representation of neural components and connectivity BMC BIOINFORMATICS Rubin, D. L., Talos, I., Halle, M., Musen, M. A., Kikinis, R. 2009; 10


    A critical challenge in neuroscience is organizing, managing, and accessing the explosion in neuroscientific knowledge, particularly anatomic knowledge. We believe that explicit knowledge-based approaches to make neuroscientific knowledge computationally accessible will be helpful in tackling this challenge and will enable a variety of applications exploiting this knowledge, such as surgical planning. We developed ontology-based models of neuroanatomy to enable symbolic lookup, logical inference and mathematical modeling of neural systems. We built a prototype model of the motor system that integrates descriptive anatomic and qualitative functional neuroanatomical knowledge. In addition to modeling normal neuroanatomy, our approach provides an explicit representation of abnormal neural connectivity in disease states, such as common movement disorders. The ontology-based representation encodes both structural and functional aspects of neuroanatomy. The ontology-based models can be evaluated computationally, enabling development of automated computer reasoning applications. Neuroanatomical knowledge can be represented in machine-accessible format using ontologies. Computational neuroanatomical approaches such as described in this work could become a key tool in translational informatics, leading to decision support applications that inform and guide surgical planning and personalized care for neurological disease in the future.

    View details for DOI 10.1186/1471-2105-10-S2-S3

    View details for Web of Science ID 000265602500004

    View details for PubMedID 19208191

  • Ontology-driven indexing of public datasets for translational bioinformatics BMC BIOINFORMATICS Shah, N. H., Jonquet, C., Chiang, A. P., Butte, A. J., Chen, R., Musen, M. A. 2009; 10


    The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text-annotations of tissue microarrays to concepts in the NCI thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus. We demonstrate that we enable ontology-based querying and integration of tissue and gene expression microarray data. We enable identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.

    View details for DOI 10.1186/1471-2105-10-S2-S1

    View details for Web of Science ID 000265602500002

    View details for PubMedID 19208184

  • Creating mappings for ontologies in biomedicine: simple methods work. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Ghazvinian, A., Noy, N. F., Musen, M. A. 2009; 2009: 198-202


    Creating mappings between concepts in different ontologies is a critical step in facilitating data integration. In recent years, researchers have developed many elaborate algorithms that use graph structure, background knowledge, machine learning and other techniques to generate mappings between ontologies. We compared the performance of these advanced algorithms on creating mappings for biomedical ontologies with the performance of a simple mapping algorithm that relies on lexical matching. Our evaluation has shown that (1) most of the advanced algorithms are either not publicly available or do not scale to the size of biomedical ontologies today, and (2) for many biomedical ontologies, simple lexical matching methods outperform most of the advanced algorithms in both precision and recall. Our results have practical implications for biomedical researchers who need to create alignments for their ontologies.

    View details for PubMedID 20351849

  • Semantic Wiki Search SEMANTIC WEB: RESEARCH AND APPLICATIONS Haase, P., Herzig, D., Musen, M., Tran, T. 2009; 5554: 445-460
  • What Four Million Mappings Can Tell You about Two Hundred Ontologies SEMANTIC WEB - ISWC 2009, PROCEEDINGS Ghazvinian, A., Noy, N. F., Jonquet, C., Shah, N., Musen, M. A. 2009; 5823: 229-242
  • A Bayesian network model for analysis of detection performance in surveillance systems. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Izadi, M., Buckeridge, D., Okhmatovskaia, A., Tu, S. W., O'Connor, M. J., Nyulas, C., Musen, M. A. 2009; 2009: 276-280


    Worldwide developments concerning infectious diseases and bioterrorism are driving forces for improving aberrancy detection in public health surveillance. The performance of an aberrancy detection algorithm can be measured in terms of sensitivity, specificity and timeliness. However, these metrics are probabilistically dependent variables and there is always a trade-off between them. This situation raises the question of how to quantify this tradeoff. The answer to this question depends on the characteristics of the specific disease under surveillance, the characteristics of data used for surveillance, and the algorithmic properties of detection methods. In practice, the evidence describing the relative performance of different algorithms remains fragmented and mainly qualitative. In this paper, we consider the development and evaluation of a Bayesian network framework for analysis of performance measures of aberrancy detection algorithms. This framework enables principled comparison of algorithms and identification of suitable algorithms for use in specific public health surveillance settings.

    View details for PubMedID 20351864

  • Comparison of concept recognizers for building the Open Biomedical Annotator BMC BIOINFORMATICS Shah, N. H., Bhatia, N., Jonquet, C., Rubin, D., Chiang, A. P., Musen, M. A. 2009; 10


    The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers - NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service oriented applications. Based on our analysis we also suggest areas of potential improvements for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.

    View details for DOI 10.1186/1471-2105-10-S9-S14

    View details for Web of Science ID 000270371700015

    View details for PubMedID 19761568

  • The open biomedical annotator. Summit on translational bioinformatics Jonquet, C., Shah, N. H., Musen, M. A. 2009; 2009: 56-60


    The range of publicly available biomedical data is enormous and is expanding fast. This expansion means that researchers now face a hurdle to extracting the data they need from the large numbers of data that are available. Biomedical researchers have turned to ontologies and terminologies to structure and annotate their data with ontology concepts for better search and retrieval. However, this annotation process cannot be easily automated and often requires expert curators. Plus, there is a lack of easy-to-use systems that facilitate the use of ontologies for annotation. This paper presents the Open Biomedical Annotator (OBA), an ontology-based Web service that annotates public datasets with biomedical ontology concepts based on their textual metadata. The biomedical community can use the annotator service to tag datasets automatically with ontology terms (from UMLS and NCBO BioPortal ontologies). Such annotations facilitate translational discoveries by integrating annotated data.

    View details for PubMedID 21347171

  • Understanding Detection Performance in Public Health Surveillance: Modeling Aberrancy-detection Algorithms JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Buckeridge, D. L., Okhmatovskaia, A., Tu, S., O'Connor, M., Nyulas, C., Musen, M. A. 2008; 15 (6): 760-769


    Statistical aberrancy-detection algorithms play a central role in automated public health systems, analyzing large volumes of clinical and administrative data in real-time with the goal of detecting disease outbreaks rapidly and accurately. Not all algorithms perform equally well in terms of sensitivity, specificity, and timeliness in detecting disease outbreaks and the evidence describing the relative performance of different methods is fragmented and mainly qualitative. We developed and evaluated a unified model of aberrancy-detection algorithms and a software infrastructure that uses this model to conduct studies to evaluate detection performance. We used a task-analytic methodology to identify the common features and meaningful distinctions among different algorithms and to provide an extensible framework for gathering evidence about the relative performance of these algorithms using a number of evaluation metrics. We implemented our model as part of a modular software infrastructure (Biological Space-Time Outbreak Reasoning Module, or BioSTORM) that allows configuration, deployment, and evaluation of aberrancy-detection algorithms in a systematic manner. We assessed the ability of our model to encode the commonly used EARS algorithms and the ability of the BioSTORM software to reproduce an existing evaluation study of these algorithms. Using our unified model of aberrancy-detection algorithms, we successfully encoded the EARS algorithms, deployed these algorithms using BioSTORM, and were able to reproduce and extend previously published evaluation results. The validated model of aberrancy-detection algorithms and its software implementation will enable principled comparison of algorithms, synthesis of results from evaluation studies, and identification of surveillance algorithms for use in specific public health settings.

    View details for DOI 10.1197/jamia.M2799

    View details for Web of Science ID 000260905500008

    View details for PubMedID 18755992

  • Reports of the AAAI 2008 Spring Symposia AI MAGAZINE Balduccini, M., Baral, C., Brodaric, B., Colton, S., Fox, P., Gutelius, D., Hinkelmann, K., Horswill, I., Huberman, B., Hudlicka, E., Lerman, K., Lisetti, C., McGuinness, D., Maher, M. L., Musen, M. A., Sahami, M., Sleeman, D., Thoenssen, B., Velasquez, J., Ventura, D. 2008; 29 (3): 107-115
  • Network analysis of intrinsic functional brain connectivity in Alzheimer's disease PLOS COMPUTATIONAL BIOLOGY Supekar, K., Menon, V., Rubin, D., Musen, M., Greicius, M. D. 2008; 4 (6)


    Functional brain networks detected in task-free ("resting-state") functional magnetic resonance imaging (fMRI) have a small-world architecture that reflects a robust functional organization of the brain. Here, we examined whether this functional organization is disrupted in Alzheimer's disease (AD). Task-free fMRI data from 21 AD subjects and 18 age-matched controls were obtained. Wavelet analysis was applied to the fMRI data to compute frequency-dependent correlation matrices. Correlation matrices were thresholded to create 90-node undirected-graphs of functional brain networks. Small-world metrics (characteristic path length and clustering coefficient) were computed using graph analytical methods. In the low frequency interval 0.01 to 0.05 Hz, functional brain networks in controls showed small-world organization of brain activity, characterized by a high clustering coefficient and a low characteristic path length. In contrast, functional brain networks in AD showed loss of small-world properties, characterized by a significantly lower clustering coefficient (p<0.01), indicative of disrupted local connectivity. Clustering coefficients for the left and right hippocampus were significantly lower (p<0.01) in the AD group compared to the control group. Furthermore, the clustering coefficient distinguished AD participants from the controls with a sensitivity of 72% and specificity of 78%. Our study provides new evidence that there is disrupted organization of functional brain networks in AD. Small-world metrics can characterize the functional organization of the brain in AD, and our findings further suggest that these network measures may be useful as an imaging-based biomarker to distinguish AD from healthy aging.

    View details for DOI 10.1371/journal.pcbi.1000100

    View details for Web of Science ID 000259786700013

    View details for PubMedID 18584043

  • iTools: A Framework for Classification, Categorization and Integration of Computational Biology Resources PLOS ONE Dinov, I. D., Rubin, D., Lorensen, W., Dugan, J., Ma, J., Murphy, S., Kirschner, B., Bug, W., Sherman, M., Floratos, A., Kennedy, D., Jagadish, H. V., Schmidt, J., Athey, B., Califano, A., Musen, M., Altman, R., Kikinis, R., Kohane, I., Delp, S., Parker, D. S., Toga, A. W. 2008; 3 (5)


    The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page.

    View details for DOI 10.1371/journal.pone.0002265

    View details for Web of Science ID 000262268500012

    View details for PubMedID 18509477

  • A prototype symbolic model of canonical functional neuroanatomy of the motor system JOURNAL OF BIOMEDICAL INFORMATICS Talos, I., Rubin, D. L., Halle, M., Musen, M., Kikinis, R. 2008; 41 (2): 251-263


    Recent advances in bioinformatics have opened entirely new avenues for organizing, integrating and retrieving neuroscientific data in a digital, machine-processable format that can at the same time be understood by humans, using ontological, symbolic data representations. Declarative information stored in ontological format can be perused and maintained by domain experts, interpreted by machines, and serve as basis for a multitude of decision support, computerized simulation, data mining, and teaching applications. We have developed a prototype symbolic model of canonical neuroanatomy of the motor system. Our symbolic model is intended to support symbolic look up, logical inference and mathematical modeling by integrating descriptive, qualitative and quantitative functional neuroanatomical knowledge. Furthermore, we show how our approach can be extended to modeling impaired brain connectivity in disease states, such as common movement disorders. In developing our ontology, we adopted a disciplined modeling approach, relying on a set of declared principles, a high-level schema, Aristotelian definitions, and a frame-based authoring system. These features, along with the use of the Unified Medical Language System (UMLS) vocabulary, enable the alignment of our functional ontology with an existing comprehensive ontology of human anatomy, and thus allow for combining the structural and functional views of neuroanatomy for clinical decision support and neuroanatomy teaching applications. Although the scope of our current prototype ontology is limited to a particular functional system in the brain, it may be possible to adapt this approach for modeling other brain functional systems as well.

    View details for DOI 10.1016/j.jbi.2007.11.003

    View details for Web of Science ID 000255360000005

    View details for PubMedID 18164666

  • Developing biomedical ontologies collaboratively. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Noy, N. F., Tudorache, T., de Coronado, S., Musen, M. A. 2008: 520-524


    The development of ontologies that define entities and relationships among them has become essential for modern work in biomedicine. Ontologies are becoming so large in their coverage that no single centralized group of people can develop them effectively, and ontology development becomes a community-based enterprise. In this paper we present Collaborative Protégé, a prototype tool that supports many aspects of community-based development, such as discussions integrated with the ontology-editing process, chats, and annotation of changes. We have evaluated Collaborative Protégé in the context of the NCI Thesaurus development. Users have found the tool effective for carrying out discussions and recording design rationale.

    View details for PubMedID 18998901

  • Collecting Community-Based Mappings in an Ontology Repository SEMANTIC WEB - ISWC 2008 Noy, N. F., Griffith, N., Musen, M. A. 2008; 5318: 371-386
  • Supporting Collaborative Ontology Development in Protege SEMANTIC WEB - ISWC 2008 Tudorache, T., Noy, N. F., Tu, S., Musen, M. A. 2008; 5318: 17-32
  • A system for ontology-based annotation of biomedical data DATA INTEGRATION IN THE LIFE SCIENCES, PROCEEDINGS Jonquet, C., Musen, M. A., Shah, N. 2008; 5109: 144-152
  • A Generic Ontology for Collaborative Ontology-Development Workflows KNOWLEDGE ENGINEERING: PRACTICE AND PATTERNS, PROCEEDINGS Sebastian, A., Noy, N. F., Tudorache, T., Musen, M. A. 2008; 5268: 318-328
  • Calling on a million minds for community annotation in WikiProteins GENOME BIOLOGY Mons, B., Ashburner, M., Chichester, C., van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Meijssen, G., Moeller, E., Roes, P. J., Borner, K., Bairoch, A. 2008; 9 (5)


    WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at

    View details for DOI 10.1186/gb-2008-9-5-r89

    View details for Web of Science ID 000257564800019

    View details for PubMedID 18507872

  • Predicting outbreak detection in public health surveillance: quantitative analysis to enable evidence-based method selection. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Buckeridge, D. L., Okhmatovskaia, A., Tu, S., O'Connor, M., Nyulas, C., Musen, M. A. 2008: 76-80


    Public health surveillance is critical for accurate and timely outbreak detection and effective epidemic control. A wide range of statistical algorithms is used for surveillance, and important differences have been noted in the ability of these algorithms to detect outbreaks. The evidence about the relative performance of these algorithms, however, remains limited and mainly qualitative. Using simulated outbreak data, we developed and validated quantitative models for predicting the ability of commonly used surveillance algorithms to detect different types of outbreaks. The developed models accurately predict the ability of different algorithms to detect different types of outbreaks. These models enable evidence-based algorithm selection and can guide research into algorithm development.

    View details for PubMedID 18999264

  • BioPortal: ontologies and data resources with the click of a mouse. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Musen, M. A., Shah, N. H., Noy, N. F., Dai, B. Y., Dorf, M., Griffith, N., Buntrok, J., Jonquet, C., Montegut, M. J., Rubin, D. L. 2008: 1223-1224

    View details for PubMedID 18999306

  • Representing the NCI Thesaurus in OWL DL: Modeling tools help modeling languages. Applied ontology Noy, N. F., de Coronado, S., Solbrig, H., Fragoso, G., Hartel, F. W., Musen, M. A. 2008; 3 (3): 173-190


    The National Cancer Institute's (NCI) Thesaurus is a biomedical reference ontology. The NCI Thesaurus is represented using Description Logic, more specifically Ontylog, a description logic implemented by Apelon, Inc. We are exploring the use of the DL species of the Web Ontology Language (OWL DL), a W3C recommended standard for ontology representation, instead of Ontylog for representing the NCI Thesaurus. We have studied the requirements for knowledge representation of the NCI Thesaurus, and considered how OWL DL (and its implementation in Protégé-OWL) satisfies these requirements. In this paper, we discuss the areas where OWL DL was sufficient for representing required components, where tool support that would hide some of the complexity and extra levels of indirection would be required, and where language expressiveness is not sufficient given the representation requirements. Because many of the knowledge-representation issues that we encountered are very similar to the issues in representing other biomedical terminologies and ontologies in general, we believe that the lessons that we learned and the approaches that we developed will prove useful and informative for other researchers.

    View details for PubMedID 19789731

  • Comparison of ontology-based semantic-similarity measures. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Lee, W., Shah, N., Sundlass, K., Musen, M. 2008: 384-388


    Semantic-similarity measures quantify concept similarities in a given ontology. Potential applications for these measures include search, data mining, and knowledge discovery in database or decision-support systems that utilize ontologies. To date, there have not been comparisons of the different semantic-similarity approaches on a single ontology. Such a comparison can offer insight on the validity of different approaches. We compared 3 approaches to semantic similarity-metrics (which rely on expert opinion, ontologies only, and information content) with 4 metrics applied to SNOMED-CT. We found that there was poor agreement between the metrics based on information content and the ontology-only metric. The metric based only on the ontology structure correlated most with expert opinion. Our results suggest that metrics based on the ontology only may be preferable to information-content-based metrics, and point to the need for more research on validating the different approaches.

    View details for PubMedID 18999312

  • UMLS-Query: a perl module for querying the UMLS. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shah, N. H., Musen, M. A. 2008: 652-656


    The Metathesaurus from the Unified Medical Language System (UMLS) is a widely used ontology resource, which is mostly used in a relational database form for terminology research, mapping and information indexing. A significant section of UMLS users use a MySQL installation of the metathesaurus and Perl programming language as their access mechanism. We describe UMLS-Query, a Perl module that provides functions for retrieving concept identifiers, mapping text-phrases to Metathesaurus concepts and graph traversal in the Metathesaurus stored in a MySQL database. UMLS-Query can be used to build applications for semi-automated sample annotation, terminology based browsers for tissue sample databases and for terminology research. We describe the results of such uses of UMLS-Query and present the module for others to use.

    View details for PubMedID 18998805

  • Protege: A tool for managing and using terminology in radiology applications JOURNAL OF DIGITAL IMAGING Rubin, D. L., Noy, N. F., Musen, M. A. 2007; 20: 34-46


    The development of standard terminologies such as RadLex is becoming important in radiology applications, such as structured reporting, teaching file authoring, report indexing, and text mining. The development and maintenance of these terminologies are challenging, however, because there are few specialized tools to help developers to browse, visualize, and edit large taxonomies. Protégé is an open-source tool that allows developers to create and to manage terminologies and ontologies. It is more than a terminology-editing tool, as it also provides a platform for developers to use the terminologies in end-user applications. There are more than 70,000 registered users of Protégé who are using the system to manage terminologies and ontologies in many different domains. The RadLex project has recently adopted Protégé for managing its radiology terminology. Protégé provides several features particularly useful to managing radiology terminologies: an intuitive graphical user interface for navigating large taxonomies, visualization components for viewing complex term relationships, and a programming interface so developers can create terminology-driven radiology applications. In addition, Protégé has an extensible plug-in architecture, and its large user community has contributed a rich library of components and extensions that provide much additional useful functionality. In this report, we describe Protégé's features and its particular advantages in the radiology domain in the creation, maintenance, and use of radiology terminology.

    View details for DOI 10.1007/s10278-007-9065-0

    View details for Web of Science ID 000250825300004

    View details for PubMedID 17687607

  • The SAGE guideline model: Achievements and overview JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Tu, S. W., Campbell, J. R., Glasgow, J., Nyman, M. A., McClure, R., McClay, J., Parker, C., Hrabak, K. M., Berg, D., Weida, T., Mansfield, J. G., Musen, M. A., Abarbanel, R. M. 2007; 14 (5): 589-598


    The SAGE (Standards-Based Active Guideline Environment) project was formed to create a methodology and infrastructure required to demonstrate integration of decision-support technology for guideline-based care in commercial clinical information systems. This paper describes the development and innovative features of the SAGE Guideline Model and reports our experience encoding four guidelines. Innovations include methods for integrating guideline-based decision support with clinical workflow and employment of enterprise order sets. Using SAGE, a clinician informatician can encode computable guideline content as recommendation sets using only standard terminologies and standards-based patient information models. The SAGE Model supports encoding large portions of guideline knowledge as re-usable declarative evidence statements and supports querying external knowledge sources.

    View details for DOI 10.1197/jamia.M2399

    View details for Web of Science ID 000249769700007

    View details for PubMedID 17600098

  • Annotation and query of tissue microarray data using the NCI Thesaurus BMC BIOINFORMATICS Shah, N. H., Rubin, D. L., Espinosa, I., Montgomery, K., Musen, M. A. 2007; 8


    The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields, specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult. We developed methods to map these annotations to the NCI thesaurus. Using the NCI-T we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology driven integration and querying of tissue microarray data. We have deployed the mapping and ontology driven querying tools at the TMAD site for general use. We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. The NCI thesaurus terms have a wide coverage and provide terms for about 86% of the samples. In our opinion the NCI thesaurus can facilitate integration of this resource with other biological data.

    View details for DOI 10.1186/1471-2105-8-296

    View details for Web of Science ID 000249734300001

    View details for PubMedID 17686183

  • OBO to OWL: a protege OWL tab to read/save OBO ontologies BIOINFORMATICS Moreira, D. A., Musen, M. A. 2007; 23 (14): 1868-1870


    The Open Biomedical Ontologies (OBO) format from the GO consortium is a very successful format for biomedical ontologies, including the Gene Ontology. But it lacks formal computational definitions for its constructs and tools, like DL reasoners, to facilitate ontology development/maintenance. We describe the OBO Converter, a Java tool to convert files from OBO format to Web Ontology Language (OWL) (and vice versa) that can also be used as a Protégé Tab plug-in. It uses the OBO to OWL mapping provided by the National Center for Biomedical Ontologies (NCBO) (a joint effort of OBO developers and OWL experts) and offers options to ease the task of saving/reading files in both formats. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btm258

    View details for Web of Science ID 000249248300030

    View details for PubMedID 17496317

  • Using semantic dependencies for consistency management of an ontology of brain-cortex anatomy ARTIFICIAL INTELLIGENCE IN MEDICINE Dameron, O., Musen, M. A., Gibaud, B. 2007; 39 (3): 217-225


    In the context of the Semantic Web, ontologies have to be usable by software agents as well as by humans. Therefore, they must meet explicit representation and consistency requirements. This article describes a method for managing the semantic consistency of an ontology of brain-cortex anatomy. The methodology relies on the explicit identification of the relationship properties and of the dependencies that might exist among concepts or relationships. These dependencies have to be respected for ensuring the semantic consistency of the model. We propose a method for automatically generating all the dependent items. As a consequence, knowledge base updates are easier and safer. Our approach is composed of three main steps: (1) providing a realistic representation, (2) ensuring the intrinsic consistency of the model and (3) checking its incremental consistency. The cornerstone of ontological modeling lies in the expressiveness of the model and in the sound principles that structure it. This part defines the ideal possibilities of the ontology and is called realism of representation. Regardless of how well a model represents reality, the intrinsic consistency of a model corresponds to its lack of contradiction. This step is particularly important as soon as dependencies between relationships or concepts have to be fulfilled. Eventually, the incremental consistency encompasses the respect of the two previous criteria during the successive updates of the ontology. The explicit representation of dependencies among concepts and relationships in an ontology can be helpfully used to assist in the management of the knowledge base and to ensure the model's semantic consistency.

    View details for DOI 10.1016/j.artmed.2006.09.004

    View details for Web of Science ID 000246657200004

    View details for PubMedID 17254759

  • Technology for Building Intelligent Systems: From Psychology to Engineering MODELING COMPLEX SYSTEMS Musen, M. A. 2007; 52: 145-184

    View details for Web of Science ID 000248483200006

    View details for PubMedID 17682334

  • Searching Ontologies Based on Content: Experiments in the Biomedical Domain K-CAP'07: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE Alani, H., Noy, N. F., Shah, N., Shadbolt, N., Musen, M. A. 2007: 55-62
  • Document-oriented views of guideline knowledge bases ARTIFICIAL INTELLIGENCE IN MEDICINE, PROCEEDINGS Tu, S. W., Condamoor, S., Mather, T., Hall, R., Jones, N., Musen, M. A. 2007; 4594: 431-440
  • Using semantic web technologies for knowledge-driven querying of biomedical data ARTIFICIAL INTELLIGENCE IN MEDICINE, PROCEEDINGS O'Connor, M., Shankar, R., Tu, S., Nyulas, C., Parrish, D., Musen, M., Das, A. 2007; 4594: 267-276
  • Efficiently querying relational databases using OWL and SWRL WEB REASONING AND RULE SYSTEMS, PROCEEDINGS O'Connor, M., Shankar, R., Tu, S., Nyulas, C., Das, A., Musen, M. 2007; 4524: 361-363
  • Querying the semantic web with SWRL ADVANCES IN RULE INTERCHANGE AND APPLICATIONS, PROCEEDINGS O'Connor, M., Tu, S., Nyulas, C., Das, A., Musen, M. 2007; 4824: 155-159
  • Knowledge Zone: A Public Repository of Peer-Reviewed Biomedical Ontologies MEDINFO 2007: PROCEEDINGS OF THE 12TH WORLD CONGRESS ON HEALTH (MEDICAL) INFORMATICS, PTS 1 AND 2 Supekar, K., Rubin, D., Noy, N., Musen, M. 2007; 129: 812-816


    Reuse of ontologies is important for achieving better interoperability among health systems and relieving knowledge engineers from the burden of developing ontologies from scratch. Most of the work that aims to facilitate ontology reuse has focused on building ontology libraries that are simple repositories of ontologies or has led to keyword-based search tools that search among ontologies. To our knowledge, there are no operational methodologies that allow users to evaluate ontologies and to compare them in order to choose the most appropriate ontology for their task. In this paper, we present Knowledge Zone, a Web-based portal that allows users to submit their ontologies, to associate metadata with their ontologies, to search for existing ontologies, to find ontology rankings based on user reviews, to post their own reviews, and to rate reviews.

    View details for Web of Science ID 000272064000163

    View details for PubMedID 17911829

  • Interpretation errors related to the GO annotation file format. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Moreira, D. A., Shah, N. H., Musen, M. A. 2007: 538-542


    The Gene Ontology (GO) is the most widely used ontology for creating biomedical annotations. GO annotations are statements associating a biological entity with a GO term. These statements comprise a large dataset of biological knowledge that is used widely in biomedical research. GO Annotations are available as "gene association files" from the GO website in a tab-delimited file format (GO Annotation File Format) composed of rows of 15 tab-delimited fields. This simple format lacks the knowledge representation (KR) capabilities to represent unambiguously semantic relationships between each field. This paper demonstrates that this KR shortcoming leads users to interpret the files in ways that can be erroneous. We propose a complementary format to represent GO annotation files as knowledge bases using the W3C recommended Web Ontology Language (OWL).

    View details for PubMedID 18693894

  • Evaluating detection of an inhalational anthrax outbreak EMERGING INFECTIOUS DISEASES Buckeridge, D. L., Owens, D. K., Switzer, P., Frank, J., Musen, M. A. 2006; 12 (12): 1942-1949


    Timely detection of an inhalational anthrax outbreak is critical for clinical and public health management. Syndromic surveillance has received considerable investment, but little is known about how it will perform relative to routine clinical case finding for detection of an inhalational anthrax outbreak. We conducted a simulation study to compare clinical case finding with syndromic surveillance for detection of an outbreak of inhalational anthrax. After simulated release of 1 kg of anthrax spores, the proportion of outbreaks detected first by syndromic surveillance was 0.59 at a specificity of 0.9 and 0.28 at a specificity of 0.975. The mean detection benefit of syndromic surveillance was 1.0 day at a specificity of 0.9 and 0.32 days at a specificity of 0.975. When syndromic surveillance was sufficiently sensitive to detect a substantial proportion of outbreaks before clinical case finding, it generated frequent false alarms.

    View details for Web of Science ID 000242301900022

    View details for PubMedID 17326949

  • Using ontologies linked with geometric models to reason about penetrating injuries ARTIFICIAL INTELLIGENCE IN MEDICINE Rubin, D. L., Dameron, O., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. 2006; 37 (3): 167-176


    Medical assessment of penetrating injuries is a difficult and knowledge-intensive task, and rapid determination of the extent of internal injuries is vital for triage and for determining the appropriate treatment. Physical examination and computed tomographic (CT) imaging data must be combined with detailed anatomic, physiologic, and biomechanical knowledge to assess the injured subject. We are developing a methodology to automate reasoning about penetrating injuries using canonical knowledge combined with specific subject image data. In our approach, we build a three-dimensional geometric model of a subject from segmented images. We link regions in this model to entities in two knowledge sources: (1) a comprehensive ontology of anatomy containing organ identities, adjacencies, and other information useful for anatomic reasoning and (2) an ontology of regional perfusion containing formal definitions of arterial anatomy and corresponding regions of perfusion. We created computer reasoning services ("problem solvers") that use the ontologies to evaluate the geometric model of the subject and deduce the consequences of penetrating injuries. We developed and tested our methods using data from the Visible Human. Our problem solvers can determine the organs that are injured given particular trajectories of projectiles, whether vital structures, such as a coronary artery, are injured, and they can predict the propagation of injury ensuing after vital structures are injured. We have demonstrated the capability of using ontologies with medical images to support computer reasoning about injury based on those images. Our methodology demonstrates an approach to creating intelligent computer applications that reason with image data, and it may have value in helping practitioners in the assessment of penetrating injury.

    View details for DOI 10.1016/j.artmed.2006.03.006

    View details for Web of Science ID 000238992500002

    View details for PubMedID 16730959

  • National Center for Biomedical Ontology: Advancing biomedicine through structured organization of scientific knowledge OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Rubin, D. L., Lewis, S. E., Mungall, C. J., Misra, S., Westerfield, M., Ashburner, M., Sim, I., Chute, C. G., Solbrig, H., Storey, M., Smith, B., Day-Richter, J., Noy, N. F., Musen, M. A. 2006; 10 (2): 185-198


    The National Center for Biomedical Ontology is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists, funded by the National Institutes of Health (NIH) Roadmap, to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are (1) to help unify the divergent and isolated efforts in ontology development by promoting high-quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists to work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and the understanding of human disease.

    View details for Web of Science ID 000240210900015

    View details for PubMedID 16901225

  • Use of declarative statements in creating and maintaining computer-interpretable knowledge bases for guideline-based care. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tu, S. W., Hrabak, K. M., Campbell, J. R., Glasgow, J., Nyman, M. A., McClure, R., McClay, J., Abarbanel, R., Mansfield, J. G., Martins, S. M., Goldstein, M. K., Musen, M. A. 2006: 784-788


    Developing computer-interpretable clinical practice guidelines (CPGs) to provide decision support for guideline-based care is an extremely labor-intensive task. In the EON/ATHENA and SAGE projects, we formulated substantial portions of CPGs as computable statements that express declarative relationships between patient conditions and possible interventions. We developed query and expression languages that allow a decision-support system (DSS) to evaluate these statements in specific patient situations. A DSS can use these guideline statements in multiple ways, including: (1) as inputs for determining preferred alternatives in decision-making, and (2) as a way to provide targeted commentaries in the clinical information system. The use of these declarative statements significantly reduces the modeling expertise and effort required to create and maintain computer-interpretable knowledge bases for decision-support purposes. We discuss possible implications for sharing of such knowledge bases.

    View details for PubMedID 17238448

  • A framework for ontology evolution in collaborative environments SEMANTIC WEB - ISWC 2006, PROCEEDINGS Noy, N. F., Chugh, A., Liu, W., Musen, M. A. 2006; 4273: 544-558
  • Ontology-based annotation and query of tissue microarray data. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shah, N. H., Rubin, D. L., Supekar, K. S., Musen, M. A. 2006: 709-713


    The Stanford Tissue Microarray Database (TMAD) is a repository of data amassed by a consortium of pathologists and biomedical researchers. The TMAD data are annotated with multiple free-text fields, specifying the pathological diagnoses for each tissue sample. These annotations are spread out over multiple text fields and are not structured according to any ontology, making it difficult to integrate this resource with other biological and clinical data. We developed methods to map these annotations to the NCI thesaurus and the SNOMED-CT ontologies. Using these two ontologies we can effectively represent about 80% of the annotations in a structured manner. This mapping offers the ability to perform ontology driven querying of the TMAD data. We also found that 40% of annotations can be mapped to terms from both ontologies, providing the potential to align the two ontologies based on experimental data. Our approach provides the basis for a data-driven ontology alignment by mapping annotations of experimental data.

    View details for PubMedID 17238433

  • Ontology-based representation of simulation models of physiology. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Rubin, D. L., Grossman, D., Neal, M., Cook, D. L., Bassingthwaighte, J. B., Musen, M. A. 2006: 664-668


    Dynamic simulation models of physiology are often represented as a set of mathematical equations. Such models are very useful for studying and understanding the dynamic behavior of physiological variables. However, the sheer number of equations and variables can make these models unwieldy, difficult to understand, and challenging to maintain. We describe a symbolic, ontologically-guided methodology for representing a physiological model of the circulation. We created an ontology describing the types of equations in the model as well as the anatomic components and how they are connected to form a circulatory loop. The ontology provided an explicit representation of the model, both its mathematical and anatomic content, abstracting and hiding much of the mathematical complexity. The ontology also provided a framework to construct a graphical representation of the model, providing a simpler visualization than the large set of mathematical equations. Our approach may help model builders to maintain, debug, and extend simulation models.

    View details for PubMedID 17238424

  • Wrestling with SUMO and bio-ontologies NATURE BIOTECHNOLOGY Musen, M. A., Lewis, S., Smith, B. 2006; 24 (1): 21-21

    View details for Web of Science ID 000234555800010

    View details for PubMedID 16404381

  • Wrestling with SUMO and bio-ontologies. Nature biotechnology Stoeckert, C., Ball, C., Brazma, A., Brinkman, R., Causton, H., Fan, L., Fostel, J., Fragoso, G., Heiskanen, M., Holstege, F., Morrison, N., Parkinson, H., Quackenbush, J., Rocca-Serra, P., Sansone, S. A., Sarkans, U., Sherlock, G., Stevens, R., Taylor, C., Taylor, R., Whetzel, P., White, J. 2006; 24 (1): 21-2; author reply 23

    View details for PubMedID 16404382

  • Identifying barriers to hypertension guideline adherence using clinician feedback at the point of care. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Lin, N. D., Martins, S. B., Chan, A. S., Coleman, R. W., Bosworth, H. B., Oddone, E. Z., Shankar, R. D., Musen, M. A., Hoffman, B. B., Goldstein, M. K. 2006: 494-498


    Factors contributing to low adherence to clinical guidelines by clinicians are not well understood. The user interface of ATHENA-HTN, a guideline-based decision support system (DSS) for hypertension, presents a novel opportunity to collect clinician feedback on recommendations displayed at the point of care. We analyzed feedback from 46 clinicians who received ATHENA advisories as part of a 15-month randomized trial to identify potential reasons clinicians may not intensify hypertension therapy when it is recommended. Among the 368 visits for which feedback was provided, clinicians commonly reported they did not follow recommendations because: recorded blood pressure was not representative of the patient's typical blood pressure; hypertension was not a clinical priority for the visit; or patients were nonadherent to medications. For many visits, current quality-assurance algorithms may incorrectly identify clinically appropriate decisions as guideline nonadherent due to incomplete capture of relevant information. We present recommendations for how automated DSSs may help identify "apparent" barriers and better target decision support.

    View details for PubMedID 17238390

  • Ontology-centered syndromic surveillance for bioterrorism IEEE INTELLIGENT SYSTEMS Crubezy, M., O'Connor, M., Pincus, Z., Musen, M. A., Buckeridge, D. L. 2005; 20 (5): 26-35
  • An evaluation model for syndromic surveillance: assessing the performance of a temporal algorithm. MMWR. Morbidity and mortality weekly report Buckeridge, D. L., Switzer, P., Owens, D., Siegrist, D., Pavlin, J., Musen, M. 2005; 54: 109-115


    Syndromic surveillance offers the potential to rapidly detect outbreaks resulting from terrorism. Despite considerable experience with implementing syndromic surveillance, limited evidence exists to describe the performance of syndromic surveillance systems in detecting outbreaks. To describe a model for simulating cases that might result from exposure to inhalational anthrax and then use the model to evaluate the ability of syndromic surveillance to detect an outbreak of inhalational anthrax after an aerosol release. Disease progression and health-care use were simulated for persons infected with anthrax. Simulated cases were then superimposed on authentic surveillance data to create test data sets. A temporal outbreak detection algorithm was applied to each test data set, and sensitivity and timeliness of outbreak detection were calculated by using syndromic surveillance. The earliest detection using a temporal algorithm was 2 days after a release. Earlier detection tended to occur when more persons were infected, and performance worsened as the proportion of persons seeking care in the prodromal disease state declined. A shorter median incubation state led to earlier detection, as soon as 1 day after release when the incubation state was ≤5 days. Syndromic surveillance of a respiratory syndrome using a temporal detection algorithm tended to detect an anthrax attack within 3-4 days after exposure if >10,000 persons were infected. The performance of surveillance (i.e., timeliness and sensitivity) worsened as the number of persons infected decreased.

    View details for PubMedID 16177701

  • EZPAL: Environment for composing constraint axioms by instantiating templates INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Hou, C. S., Musen, M. A., Noy, N. F. 2005; 62 (5): 578-596
  • Challenges in converting frame-based ontology into OWL: the Foundational Model of Anatomy case-study. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Dameron, O., Rubin, D. L., Musen, M. A. 2005: 181-185


    A description logics representation of the Foundational Model of Anatomy (FMA) in the Web Ontology Language (OWL-DL) would allow developers to combine it with other OWL ontologies, and would provide the benefit of being able to access generic reasoning tools. However, the FMA is currently represented in a frame language. The differences between description logics and frames are not only syntactic, but also semantic. We analyze some theoretical and computational limitations of converting the FMA into OWL-DL. Namely, some of the constructs used in the FMA do not have a direct equivalent in description logics, and a complete conversion of the FMA in description logics is too large to support reasoning. Therefore, an OWL-DL representation of the FMA would have to be optimized for each application. We propose a solution based on OWL-Full, a superlanguage of OWL-DL, that meets the expressiveness requirements and remains application-independent. Specific simplified OWL-DL representations can then be generated from the OWL-Full model by applications. We argue that this solution is easier to implement and closer to the application needs than an integral translation, and that the latter approach would only make the FMA maintenance more difficult.

    View details for PubMedID 16779026

  • Supporting rule system interoperability on the semantic web with SWRL SEMANTIC WEB - ISWC 2005, PROCEEDINGS O'Connor, M. T., Knublauch, H., Tu, S., Grosof, B., Dean, M., Grosso, W., Musen, M. 2005; 3729: 974-986
  • Semantic clinical guideline documents. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Eriksson, H., Tu, S. W., Musen, M. 2005: 236-240


    Decision-support systems based on clinical practice guidelines can support physicians and other healthcare personnel in the process of following best practice consistently. A knowledge-based approach to represent guidelines makes it possible to encode computer-interpretable guidelines in a formal manner, perform consistency checks, and use the guidelines directly in decision-support systems. Decision-support authors and guideline users require guidelines in human-readable formats in addition to computer-interpretable ones (e.g., for guideline review and quality assurance). We propose a new document-oriented information architecture that combines knowledge-representation models with electronic and paper documents. The approach integrates decision-support modes with standard document formats to create a combined clinical-guideline model that supports on-line viewing, printing, and decision support.

    View details for PubMedID 16779037

  • Use of description logic classification to reason about consequences of penetrating injuries. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Rubin, D. L., Dameron, O., Musen, M. A. 2005: 649-653


    The consequences of penetrating injuries can be complex, including abnormal blood flow through the injury channel and functional impairment of organs if arteries supplying them have been severed. Determining the consequences of such injuries can be posed as a classification problem, requiring a priori symbolic knowledge of anatomy. We hypothesize that such symbolic knowledge can be modeled using ontologies, and that the reasoning task can be accomplished using knowledge representation in description logics (DL) and automatic classification. We demonstrate the capabilities of automated classification using the Web Ontology Language (OWL) to reason about the consequences of penetrating injuries. We created in OWL a knowledge model of chest and heart anatomy describing the heart structure and the surrounding anatomic compartments, as well as the perfusion of regions of the heart by branches of the coronary arteries. We then used a domain-independent classifier to infer ischemic regions of the heart as well as anatomic spaces containing ectopic blood secondary to the injuries. Our results highlight the advantages of posing reasoning problems as a classification task, and leveraging the automatic classification capabilities of DL to create intelligent applications.

    View details for PubMedID 16779120

  • Using an Ontology of Human Anatomy to Inform Reasoning with Geometric Models MEDICINE MEETS VIRTUAL REALITY 13: THE MAGICAL NEXT BECOMES THE MEDICAL NOW Rubin, D. L., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. 2005; 111: 429-435


    The Virtual Soldier project is a large effort on the part of the U.S. Defense Advanced Research Projects Agency to explore using both general anatomical knowledge and specific computed tomographic (CT) images of individual soldiers to aid the rapid diagnosis and treatment of penetrating injuries. Our goal is to develop intelligent computer applications that use this knowledge to reason about the anatomic structures that are directly injured and to predict propagation of injuries secondary to primary organ damage. To accomplish this, we needed to develop an architecture to combine geometric data with anatomic knowledge and reasoning services that use this information to predict the consequences of injuries.

    View details for Web of Science ID 000273828700086

    View details for PubMedID 15718773

  • Ontology metadata to support the building of a library of biomedical ontologies. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Supekar, K., Musen, M. 2005: 1126-?

    View details for PubMedID 16779413

  • Translating research into practice: Organizational issues in implementing automated decision support for hypertension in three medical centers JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Goldstein, M. K., Coleman, R. W., Tu, S. W., Shankar, R. D., O'Connor, M. J., Musen, M. A., Martins, S. B., Lavori, P. W., Shlipak, M. G., Oddone, E., Advani, A. A., Gholami, P., Hoffman, B. B. 2004; 11 (5): 368-376


    Information technology can support the implementation of clinical research findings in practice settings. Technology can address the quality gap in health care by providing automated decision support to clinicians that integrates guideline knowledge with electronic patient data to present real-time, patient-specific recommendations. However, technical success in implementing decision support systems may not translate directly into system use by clinicians. Successful technology integration into clinical work settings requires explicit attention to the organizational context. We describe the application of a "sociotechnical" approach to integration of ATHENA DSS, a decision support system for the treatment of hypertension, into geographically dispersed primary care clinics. We applied an iterative technical design in response to organizational input and obtained ongoing endorsements of the project by the organization's administrative and clinical leadership. Conscious attention to organizational context at the time of development, deployment, and maintenance of the system was associated with extensive clinician use of the system.

    View details for Web of Science ID 000223898000005

    View details for PubMedID 15187064

  • Ontology versioning in an ontology management framework IEEE INTELLIGENT SYSTEMS Noy, N. F., Musen, M. A. 2004; 19 (4): 6-13
  • Pushing the envelope: challenges in a frame-based representation of human anatomy DATA & KNOWLEDGE ENGINEERING Noy, N. F., Musen, M. A., Mejino, J. L., Rosse, C. 2004; 48 (3): 335-359
  • Modeling guidelines for integration into clinical workflow MEDINFO 2004: PROCEEDINGS OF THE 11TH WORLD CONGRESS ON MEDICAL INFORMATICS, PT 1 AND 2 Tu, S. W., Musen, M. A., Shankar, R., Campbell, J., Hrabak, K., McClay, J., Huff, S. M., Mcclure, R., Parker, C., Rocha, R., Abarbanel, R., Beard, N., Glasgow, J., Mansfield, G., Ram, P., Ye, Q., Mays, E., Weida, T., Chute, C. G., McDonald, K., Mohr, D., Nyman, M. A., Scheitel, S., Solbrig, H., Zill, D. A., Goldstein, M. K. 2004; 107: 174-178


    The success of clinical decision-support systems requires that they are seamlessly integrated into clinical workflow. In the SAGE project, which aims to create the technological infrastructure for implementing computable clinical practice guidelines in enterprise settings, we created a deployment-driven methodology for developing guideline knowledge bases. It involves (1) identification of usage scenarios of guideline-based care in clinical workflow, (2) distillation and disambiguation of guideline knowledge relevant to these usage scenarios, (3) formalization of data elements and vocabulary used in the guideline, and (4) encoding of usage scenarios and guideline knowledge using an executable guideline model. This methodology makes explicit the points in the care process where guideline-based decision aids are appropriate and the roles of clinicians for whom the guideline-based assistance is intended. We have evaluated the methodology by simulating the deployment of an immunization guideline in a real clinical information system and by reconstructing the workflow context of a deployed decision-support system for guideline-based care. We discuss the implication of deployment-driven guideline encoding for sharability of executable guidelines.

    View details for Web of Science ID 000226723300036

    View details for PubMedID 15360798

  • Linking ontologies with three-dimensional models of anatomy to predict the effects of penetrating injuries PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7 Rubin, D. L., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. 2004; 26: 3128-3131
  • A knowledge-based framework for deploying surveillance problem solvers IKE '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING Buckeridge, D. L., O'Connor, M. J., Xu, H. B., Musen, M. A. 2004: 28-32
  • Specifying ontology views by traversal SEMANTIC WEB - ISWC 2004, PROCEEDINGS Noy, N. F., Musen, M. A. 2004; 3298: 713-725
  • Tracking changes during ontology evolution SEMANTIC WEB - ISWC 2004, PROCEEDINGS Noy, N. F., Kunnatur, S., Klein, M., Musen, M. A. 2004; 3298: 259-273
  • The SAGE guideline modeling: Motivation and methodology COMPUTER-BASED SUPPORT FOR CLINICAL GUIDELINES AND PROTOCOLS Tu, S. W., Campbell, J., Musen, M. A. 2004; 101: 167-171


    The SAGE (Standards-Based Sharable Active Guideline Environment) project is a collaboration among research groups at six institutions in the US. The ultimate goal of the project is to create an infrastructure that will allow execution of standards-based clinical practice guidelines across heterogeneous clinical information systems. This paper describes the design goals of the SAGE guideline model in the context of the technological infrastructure and guideline modeling methodology that the project is developing.

    View details for Web of Science ID 000222294800021

    View details for PubMedID 15537222

  • Linking ontologies with three-dimensional models of anatomy to predict the effects of penetrating injuries. Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference Rubin, D. L., Bashir, Y., Grossman, D., Dev, P., Musen, M. A. 2004; 5: 3128-3131


    Rapid diagnosis of penetrating injuries is essential to increased chance of survival. Geometric models representing anatomic structures could be useful, but such models generally contain only information about the relationships of points in space as well as display properties. We describe an approach to predicting the anatomic consequences of penetrating injury by creating a geometric model of anatomy that integrates biomechanical and anatomic knowledge. We created a geometric model of the heart from the Visible Human image data set. We linked this geometric model of anatomy with an ontology of descriptive anatomic knowledge. A hierarchy of abstract geometric objects was created that represents organs and organ parts. These geometric objects contain information about organ identity, composition, adjacency, and tissue biomechanical properties. This integrated model can support anatomic reasoning. Given a bullet trajectory and a parametric representation of a cone of tissue damage, we can use our model to predict the organs and organ parts that are injured. Our model is extensible, being able to incorporate future information, such as physiological implications of organ injuries.

    View details for PubMedID 17270942

  • Evaluating provider adherence in a trial of a guideline-based decision support system for hypertension MEDINFO 2004: PROCEEDINGS OF THE 11TH WORLD CONGRESS ON MEDICAL INFORMATICS, PT 1 AND 2 Chan, A. S., Coleman, R. W., Martins, S. B., Advani, A., Musen, M. A., Bosworth, H. B., Oddone, E. Z., Shlipak, M. G., Hoffman, B. B., Goldstein, M. K. 2004; 107: 125-129


    Measurement of provider adherence to a guideline-based decision support system (DSS) presents a number of important challenges. Establishing a causal relationship between the DSS and change in concordance requires consideration of both the primary intention of the guideline and different ways providers attempt to satisfy the guideline. During our work with a guideline-based decision support system for hypertension, ATHENA DSS, we document a number of subtle deviations from the strict hypertension guideline recommendations that ultimately demonstrate provider adherence. We believe that understanding these complexities is crucial to any valid evaluation of provider adherence. We also describe the development of an advisory evaluation engine that automates the interpretation of clinician adherence with the DSS on multiple levels, facilitating the high volume of complex data analysis that is created in a clinical trial of a guideline-based DSS.

    View details for Web of Science ID 000226723300026

    View details for PubMedID 15360788

  • An intelligent case-adjustment algorithm for the automated design of population-based quality auditing protocols MEDINFO 2004: PROCEEDINGS OF THE 11TH WORLD CONGRESS ON MEDICAL INFORMATICS, PT 1 AND 2 Advani, A., Jones, N., Shahar, Y., Goldstein, M., Musen, M. A. 2004; 107: 1003-1007


    We develop a method and algorithm for deciding the optimal approach to creating quality-auditing protocols for guideline-based clinical performance measures. An important element of the audit protocol design problem is deciding which guideline elements to audit. Specifically, the problem is how and when to aggregate individual patient case-specific guideline elements into population-based quality measures. The key statistical issue involved is the trade-off between increased reliability with more general population-based quality measures versus increased validity from individually case-adjusted but more restricted measures done at a greater audit cost. Our intelligent algorithm for auditing protocol design is based on hierarchically modeling incrementally case-adjusted quality constraints. We select quality constraints to measure using an optimization criterion based on statistical generalizability coefficients. We present results of the approach from a deployed decision support system for a hypertension guideline.

    View details for Web of Science ID 000226723300202

    View details for PubMedID 15360963

  • The PROMPT suite: interactive tools for ontology merging and mapping INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Noy, N. F., Musen, M. A. 2003; 59 (6): 983-1024
  • Configuring online problem-solving resources with the internet reasoning service IEEE INTELLIGENT SYSTEMS Crubezy, M., Musen, M. A., Motta, E., Lu, W. J. 2003; 18 (2): 34-42
  • Developing quality indicators and auditing protocols from formal guideline models: knowledge representation and transformations. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Advani, A., Goldstein, M., Shahar, Y., Musen, M. A. 2003: 11-15


    Automated quality assessment of clinician actions and patient outcomes is a central problem in guideline- or standards-based medical care. In this paper we describe a model representation and algorithm for deriving structured quality indicators and auditing protocols from formalized specifications of guidelines used in decision support systems. We apply the model and algorithm to the assessment of physician concordance with a guideline knowledge model for hypertension used in a decision-support system. The properties of our solution include the ability to derive automatically context-specific and case-mix-adjusted quality indicators that can model global or local levels of detail about the guideline parameterized by defining the reliability of each indicator or element of the guideline.

    View details for PubMedID 14728124

  • Challenges in Medical Informatics. A Discipline Coming of Age. Yearbook of medical informatics Musen, M. A., van Bemmel, J. H. 2003: 209-210

    View details for PubMedID 27706339

  • UPML: The language and tool support for making the Semantic Web alive SPINNING THE SEMANTIC WEB Omelayenko, B., Crubezy, M., Fensel, D., Benjamins, R., Wielinga, B., Motta, E., Musen, M., Ding, Y. 2003: 141-170
  • BioSTORM: a system for automated surveillance of diverse data sources. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium O'Connor, M. J., Buckeridge, D. L., Choy, M., Crubezy, M., Pincus, Z., Musen, M. A. 2003: 1071-?


    Heightened concerns about bioterrorism are forcing changes to the traditional biosurveillance model. Public health departments are under pressure to follow multiple, non-specific, pre-diagnostic indicators, often drawn from many data sources. As a result, there is a need for biosurveillance systems that can use a variety of analysis techniques to rapidly integrate and process multiple diverse data feeds using a variety of problem-solving techniques to give timely analysis. To meet these requirements, we are developing a new system called BioSTORM (Biological Spatio-Temporal Outbreak Reasoning Module).

    View details for PubMedID 14728574

  • A knowledge-acquisition wizard to encode guidelines. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Shankar, R. D., Tu, S. W., Musen, M. A. 2003: 1007-?


    An important step in building guideline-based clinical care systems is encoding guidelines. Protégé-2000, developed in our laboratory, is a general-purpose knowledge-acquisition tool that helps domain experts and developers record, browse, and maintain domain knowledge in knowledge bases. In this poster we illustrate a knowledge-acquisition wizard that we built around Protégé-2000. The wizard provides an environment that makes it more intuitive for domain specialists to enter knowledge, and for domain specialists and practitioners to review the knowledge entered.

    View details for PubMedID 14728510

  • Protégé-2000: an open-source ontology-development and knowledge-acquisition environment. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Noy, N. F., Crubezy, M., Fergerson, R. W., Knublauch, H., Tu, S. W., Vendetti, J., Musen, M. A. 2003: 953-?


    Protégé-2000 is an open-source tool that assists users in the construction of large electronic knowledge bases. It has an intuitive user interface that enables developers to create and edit domain ontologies. Numerous plugins provide alternative visualization mechanisms, enable management of multiple ontologies, allow the use of inference engines and problem solvers with Protégé ontologies, and provide other functionality. The Protégé user community has more than 7000 members.

    View details for PubMedID 14728458

  • Contextualizing heterogeneous data for integration and inference. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Pincus, Z., Musen, M. A. 2003: 514-518


    Systems that attempt to integrate and analyze data from multiple data sources are greatly aided by the addition of specific semantic and metadata "context" that explicitly describes what a data value means. In this paper, we describe a systematic approach to constructing models of data and their context. Our approach provides a generic "template" for constructing such models. For each data source, a developer creates a customized model by filling in the template with predefined attributes and values. This approach facilitates model construction and provides consistent syntax and semantics among models created with the template. Systems that can process the template structure and attribute values can reason about any model so described. We used the template to create a detailed knowledge base for syndromic surveillance data integration and analysis. The knowledge base provided support for data integration, translation, and analysis methods.

    View details for PubMedID 14728226

  • An analytic framework for space-time aberrancy detection in public health surveillance data. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Buckeridge, D. L., Musen, M. A., Switzer, P., Crubézy, M. 2003: 120-124


    Public health surveillance is changing in response to concerns about bioterrorism, which have increased the pressure for early detection of epidemics. Rapid detection necessitates following multiple non-specific indicators and accounting for spatial structure. No single analytic method can meet all of these requirements for all data sources and all surveillance goals. Analytic methods must be selected and configured to meet a surveillance goal, but there are no uniform criteria to guide the selection and configuration process. In this paper, we describe work towards the development of an analytic framework for space-time aberrancy detection in public health surveillance data. The framework decomposes surveillance analysis into sub-tasks and identifies knowledge that can facilitate selection of methods to accomplish sub-tasks.

    View details for PubMedID 14728146

  • The evolution of Protege: an environment for knowledge-based systems development INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Gennari, J. H., Musen, M. A., Fergerson, R. W., Grosso, W. E., Crubezy, M., Eriksson, H., Noy, N. F., Tu, S. W. 2003; 58 (1): 89-123
  • The structure of guideline recommendations: a synthesis. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Tu, S. W., Campbell, J., Musen, M. A. 2003: 679-683


    We propose that recommendations in a clinical guideline can be structured either as collections of decisions that are to be applied in specific situations or as processes that specify activities that take place over time. We formalize them as "recommendation sets" consisting of either Activity Graphs that represent guideline-directed processes or Decision Maps that represent atemporal recommendations or recommendations involving decisions made at one time point. We model guideline processes as specializations of workflow processes and provide possible computational models for decision maps. We evaluate the proposed formalism by showing how various guideline-modeling methodologies, including GLIF, EON, PRODIGY3, and Medical Logic Modules can be mapped into the proposed structures. The generality of the formalism makes it a candidate for standardizing the structure of recommendations for computer-interpretable guidelines.

    View details for PubMedID 14728259

  • Patient safety in guideline-based decision support for hypertension management: ATHENA DSS JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Goldstein, M. K., Hoffman, B. B., Coleman, R. W., Tu, S. W., Shankar, R. D., O'Connor, M., Martins, S., Advani, A., Musen, M. A. 2002; 9 (6): S11-S16
  • Medical quality assessment by scoring adherence to guideline intentions JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Advani, A., Shahar, Y., Musen, M. A. 2002; 9 (6): S92-S97
  • Bioterrorism preparedness and response: use of information technologies and decision support systems. Evidence report/technology assessment (Summary) Bravata, D. M., McDonald, K., Owens, D. K., Buckeridge, D., Haberland, C., Rydzak, C., Schleinitz, M., Smith, W. M., Szeto, H., Wilkening, D., Musen, M., Duncan, B. W., Nouri, B., Dangiolo, M. B., Liu, H., Shofer, S., Graham, J., Davies, S. 2002: 1-8

    View details for PubMedID 12154489

  • A framework for evidence-adaptive quality assessment that unifies guideline-based and performance-indicator approaches AMIA 2002 SYMPOSIUM, PROCEEDINGS Advani, A., Goldstein, M., Musen, M. A. 2002: 2-6


    Automated quality assessment of clinician actions and patient outcomes is a central problem in guideline- or standards-based medical care. In this paper we describe a unified model representation and algorithm for evidence-adaptive quality assessment scoring that can: (1) use both complex case-specific guidelines and single-step population-wide performance-indicators as quality measures; (2) score adherence consistently with quantitative population-based medical utilities of the quality measures where available; and (3) give worst-case and best-case scores for variations based on (a) uncertain knowledge of the best practice, (b) guideline customization to an individual patient or particular population, (c) physician practice style variation, or (d) imperfect reliability of the quality measure. Our solution uses fuzzy measure-theoretic scoring to handle the uncertain knowledge about best-practices and the ambiguity from practice variation. We show results of applying our method to retrospective data from a guideline project to improve the quality of hypertension care.

    View details for Web of Science ID 000189418100001

    View details for PubMedID 12463775

  • Challenges for Medical Informatics as an Academic Discipline: Workshop Report. Yearbook of medical informatics Musen, M. A., van Bemmel, J. H. 2002: 194-197

    View details for PubMedID 27706365

  • A typology for modeling processes in clinical guidelines and protocols AMIA 2002 SYMPOSIUM, PROCEEDINGS Tu, S. W., Johnson, P. D., Musen, M. A. 2002: 1181-1181
  • Standards-based sharable active guideline environment (SAGE): A project to develop a universal framework for encoding and disseminating electronic clinical practice guidelines AMIA 2002 SYMPOSIUM, PROCEEDINGS Beard, N., Campbell, J. R., Huff, S. M., Leon, M., Mansfield, J. G., Mays, E., McClay, J., Mohr, D. N., Musen, M. A., O'Brien, D., Rocha, R. A., Saulovich, A., Scheitel, S. M., Tu, S. W. 2002: 973-973
  • Use of Protege-2000 to encode clinical guidelines AMIA 2002 SYMPOSIUM, PROCEEDINGS Shankar, R. D., Tu, S. W., Musen, M. A. 2002: 1164-1164
  • Protocol design patterns: Domain-oriented abstractions to support the authoring of computer-executable clinical trials AMIA 2002 SYMPOSIUM, PROCEEDINGS Nguyen, J. H., Kahn, M. G., Broverman, C. A., Musen, M. A. 2002: 1114-1114
  • Conceptual heterogeneity complicates automated syndromic surveillance for bioterrorism AMIA 2002 SYMPOSIUM, PROCEEDINGS Graham, J., Buckeridge, D., Choy, M., Musen, M. 2002: 1030-1030
  • Knowledge-based bioterrorism surveillance AMIA 2002 SYMPOSIUM, PROCEEDINGS Buckeridge, D. L., Graham, J., O'Connor, M. J., Choy, M. K., Tu, S. W., Musen, M. A. 2002: 76-80


    An epidemic resulting from an act of bioterrorism could be catastrophic. However, if an epidemic can be detected and characterized early on, prompt public health intervention may mitigate its impact. Current surveillance approaches do not perform well in terms of rapid epidemic detection or epidemic monitoring. One reason for this shortcoming is their failure to bring existing knowledge and data to bear on the problem in a coherent manner. Knowledge-based methods can integrate surveillance data and knowledge, and allow for careful evaluation of problem-solving methods. This paper presents an argument for knowledge-based surveillance, describes a prototype of BioSTORM, a system for real-time epidemic surveillance, and shows an initial evaluation of this system applied to a simulated epidemic from a bioterrorism attack.

    View details for Web of Science ID 000189418100016

    View details for PubMedID 12463790

  • Medical informatics: Searching for underlying components METHODS OF INFORMATION IN MEDICINE Musen, M. A. 2002; 41 (1): 12-19


    To discuss unifying principles that can provide a theory for the diverse aspects of work in medical informatics. If medical informatics is to have academic credibility, it must articulate a clear theory that is distinct from that of computer science or of other related areas of study. The notions of reusable domain ontologies and problem-solving methods provide the foundation for current work on second-generation knowledge-based systems. These abstractions are also attractive for defining the core contributions of basic research in informatics. We can understand many central activities within informatics in terms of defining, refining, applying, and evaluating domain ontologies and problem-solving methods. Construing work in medical informatics in terms of actions involving ontologies and problem-solving methods may move us closer to a theoretical basis for our field.

    View details for Web of Science ID 000174503800004

    View details for PubMedID 11933757

  • The chronus II temporal database mediator AMIA 2002 SYMPOSIUM, PROCEEDINGS O'Connor, M. J., Tu, S. W., Musen, M. A. 2002: 567-571


    Clinical databases typically contain a significant amount of temporal information. This information is often crucial in medical decision-support systems. Although temporal queries are common in clinical systems, the medical informatics field has no standard means for representing or querying temporal data. Over the past decade, the temporal database community has made a significant amount of progress in temporal systems. Much of this research can be applied to clinical database systems. This paper outlines a temporal database mediator called Chronus II. Chronus II extends the standard relational model and the SQL query language to support temporal queries. It provides an expressive general-purpose temporal query language that is tuned to the querying requirements of clinical decision support systems. This paper describes how we have used Chronus II to tackle a variety of clinical problems in decision support systems developed by our group.

    View details for Web of Science ID 000189418100115

    View details for PubMedID 12474882

  • SYNCHRONUS: A reusable software module for temporal integration AMIA 2002 SYMPOSIUM, PROCEEDINGS Das, A. K., Musen, M. A. 2002: 195-199


    Querying time-stamped data in clinical databases is an essential step in the actuation of many decision-support rules. Since previous methods of temporal data management are not readily transferable among legacy databases, developers must create de novo querying methods that allow temporal integration of a decision-support program and existing database. In this paper, we outline four software-engineering principles that support a general, reusable approach to temporal integration. We then describe the design and implementation of SYNCHRONUS, a software module that advances our prior work on temporal querying. We show how this module satisfies the four principles for the task of temporal integration. SYNCHRONUS can help developers to overcome the software-engineering burden of temporal model heterogeneity within decision-support architectures.

    View details for Web of Science ID 000189418100040

    View details for PubMedID 12463814

  • Creating Semantic Web contents with Protege-2000 IEEE INTELLIGENT SYSTEMS & THEIR APPLICATIONS Noy, N. F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R. W., Musen, M. A. 2001; 16 (2): 60-71
  • Building an explanation function for a hypertension decision-support system Shankar, R. D., Martins, S. B., Tu, S. W., Goldstein, M. K., Musen, M. A. I O S PRESS. 2001: 538-542


    ATHENA DSS is a decision-support system that provides recommendations for managing hypertension in primary care. ATHENA DSS is built on a component-based architecture called EON. User acceptance of a system like this one depends partly on how well the system explains its reasoning and justifies its conclusions. We addressed this issue by adapting WOZ, a declarative explanation framework, to build an explanation function for ATHENA DSS. The explanation function obtains its information by tapping into EON's components, as well as into other relevant sources such as the guideline document and medical literature. It uses an argument model to identify the pieces of information that constitute an explanation, and employs a set of visual clients to display that explanation. By incorporating varied information sources, by mirroring naturally occurring medical arguments and by utilizing graphic visualizations, ATHENA DSS's explanation function generates rich, evidence-based explanations.

    View details for Web of Science ID 000172901700127

    View details for PubMedID 11604798

  • A virtual medical record for guideline-based decision support Johnson, P. D., Tu, S. W., Musen, M. A., Purves, I. BMJ PUBLISHING GROUP. 2001: 294-298


    A major obstacle in deploying computer-based clinical guidelines at the point of care is the variability of electronic medical records and the consequent need to adapt guideline modeling languages, guideline knowledge bases, and execution engines to idiosyncratic data models in the deployment environment. This paper reports an approach, developed jointly by researchers at Newcastle and Stanford, where guideline models are encoded assuming a uniform virtual electronic medical record and guideline-specific concept ontologies. For implementing a guideline-based decision-support system in multiple deployment environments, we created mapping knowledge bases to link terms in the concept ontology with the terminology used in the deployment systems. Mediation components use these mapping knowledge bases to map data in locally deployed medical record architectures to the virtual medical record. We discuss the possibility of using the HL7 Reference Information Model (RIM) as the basis for a standardized virtual medical record, showing how this approach also complies with the European pre-standard ENV13606 for electronic healthcare record communication.

    View details for Web of Science ID 000172263400061

    View details for PubMedID 11825198

  • Representation of structural relationships in the Foundational Model of anatomy Mejino, J. L., Noy, N. F., Musen, M. A., Brinkley, J. F., Rosse, C. BMJ PUBLISHING GROUP. 2001: 973-973
  • Patient safety in guideline-based decision support for hypertension management: ATHENA DSS Goldstein, M. K., Hoffman, B. B., Coleman, R. W., Tu, S. W., Shankar, R. D., O'Connor, M., Martins, S., Advani, A., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 214-218


    The Institute of Medicine recently issued a landmark report on medical error. In the penumbra of this report, every aspect of health care is subject to new scrutiny regarding patient safety. Informatics technology can support patient safety by correcting problems inherent in older technology; however, new information technology can also contribute to new sources of error. We report here a categorization of possible errors that may arise in deploying a system designed to give guideline-based advice on prescribing drugs, an approach to anticipating these errors in an automated guideline system, and design features to minimize errors and thereby maximize patient safety. Our guideline implementation system, based on the EON architecture, provides a framework for a knowledge base that is sufficiently comprehensive to incorporate safety information, and that is easily reviewed and updated by clinician-experts.

    View details for Web of Science ID 000172263400045

    View details for PubMedID 11825183

  • A client-server framework for deploying a decision-support system in a resource-constrained environment O'Connor, M. J., Shankar, R. D., Tu, S. W., Advani, A., Goldstein, M. K., Coleman, R. W., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 986-986
  • Modeling data and knowledge in the EON guideline architecture Tu, S. W., Musen, M. A. I O S PRESS. 2001: 280-284


    Compared to guideline representation formalisms, data and knowledge modeling for clinical guidelines is a relatively neglected area. Yet it has enormous impact on the format and expressiveness of decision criteria that can be written, on the inferences that can be made from patient data, on the ease with which guidelines can be formalized, and on the method of integrating guideline-based decision-support services into implementation sites' information systems. We clarify the respective roles that data and knowledge modeling play in providing patient-specific decision support based on clinical guidelines. We show, in the context of the EON guideline architecture, how we use the Protégé-2000 knowledge-engineering environment to build (1) a patient-data information model, (2) a medical-specialty model, and (3) a guideline model that formalizes the knowledge needed to generate recommendations regarding clinical decisions and actions. We show how the use of such models allows development of alternative decision-criteria languages and allows systematic mapping of the data required for guideline execution from patient data contained in electronic medical record systems.

    View details for Web of Science ID 000172901700062

    View details for PubMedID 11604749

  • A formal method to resolve temporal mismatches in clinical databases Das, A. K., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 130-134


    Overcoming data heterogeneity is essential to the transfer of decision-support programs to legacy databases and to the integration of data in clinical repositories. Prior methods have focused primarily on problems of differences in terminology and patient identifiers, and have not addressed formally the problem of temporal data heterogeneity, even though time is a necessary element in storing, manipulating, and reasoning about clinical data. In this paper, we present a method to resolve temporal mismatches present in clinical databases. This method is based on a foundational model of time that can formalize various temporal representations. We use this temporal model to define a novel set of twelve operators that can map heterogeneous time-stamped data into a uniform temporal scheme. We present an algorithm that uses these mapping operators, and we discuss our implementation and evaluation of the method as a software program called Synchronus.

    View details for Web of Science ID 000172263400028

    View details for PubMedID 11825168

  • Medical quality assessment by scoring adherence to guideline intentions Advani, A., Shahar, Y., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 2-6


    Quality assessment of clinician actions and patient outcomes is a central problem in guideline- or standards-based medical care. In this paper we describe an approach for evaluating and consistently scoring clinician adherence to medical guidelines using the intentions of guideline authors. We present the Quality Indicator Language (QUIL) that may be used to formally specify quality constraints on physician behavior and patient outcomes derived from medical guidelines. We present a modeling and scoring methodology for consistently evaluating multi-step and multi-choice guideline plans based on guideline intentions and their revisions.

    View details for Web of Science ID 000172263400002

    View details for PubMedID 11825146

  • RASTA: A distributed temporal abstraction system to facilitate knowledge-driven monitoring of clinical databases O'Connor, M. J., Grosso, W. E., Tu, S. W., Musen, M. A. I O S PRESS. 2001: 508-512


    The time dimension is very important for applications that reason with clinical data. Unfortunately, this task is inherently computationally expensive. As clinical decision support systems tackle increasingly varied problems, they will increase the demands on the temporal reasoning component, which may lead to slow response times. This paper addresses this problem. It describes a temporal reasoning system called RASTA that uses a distributed algorithm that enables it to deal with large data sets. The algorithm also supports a variety of configuration options, enabling RASTA to deal with a range of application requirements.

    View details for Web of Science ID 000172901700121

    View details for PubMedID 11604792

  • Integration of textual guideline documents with formal guideline knowledge bases Shankar, R. D., Tu, S. W., Martins, S. B., Fagan, L. M., Goldstein, M. K., Musen, M. A. BMJ PUBLISHING GROUP. 2001: 617-621


    Numerous approaches have been proposed to integrate the text of guideline documents with guideline-based care systems. Current approaches range from serving marked up guideline text documents to generating advisories using complex guideline knowledge bases. These approaches have integration problems mainly because they tend to rigidly link the knowledge base with text. We are developing a bridge approach that uses an information retrieval technology. The new approach facilitates a versatile decision-support system by using flexible links between the formal structures of the knowledge base and the natural language style of the guideline text.

    View details for Web of Science ID 000172263400126

    View details for PubMedID 11825260

  • Integration and beyond: Linking information from disparate sources and into workflow JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Stead, W. W., MILLER, R. A., Musen, M. A., Hersh, W. R. 2000; 7 (2): 135-145


    The vision of integrating information from a variety of sources into the way people work, to improve decisions and process, is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of "integration" projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction.

    View details for Web of Science ID 000085723800004

    View details for PubMedID 10730596

  • Implementing clinical practice guidelines while taking account of changing evidence: ATHENA DSS, an easily modifiable decision-support system for managing hypertension in primary care Goldstein, M. K., Hoffman, B. B., Coleman, R. W., Musen, M. A., Tu, S. W., Advani, A., Shankar, R., O'Connor, M. HANLEY & BELFUS INC. 2000: 300-304


    This paper describes the ATHENA Decision Support System (DSS), which operationalizes guidelines for hypertension using the EON architecture. ATHENA DSS encourages blood pressure control and recommends guideline-concordant choice of drug therapy in relation to comorbid diseases. ATHENA DSS has an easily modifiable knowledge base that specifies eligibility criteria, risk stratification, blood pressure targets, relevant comorbid diseases, guideline-recommended drug classes for patients with comorbid disease, preferred drugs within each drug class, and clinical messages. Because evidence for best management of hypertension evolves continually, ATHENA DSS is designed to allow clinical experts to customize the knowledge base to incorporate new evidence or to reflect local interpretations of guideline ambiguities. Together with its database mediator Athenaeum, ATHENA DSS has physical and logical data independence from the legacy Computerized Patient Record System (CPRS) supplying the patient data, so it can be integrated into a variety of electronic medical record systems.

    View details for Web of Science ID 000170207500062

    View details for PubMedID 11079893

  • The knowledge model of Protege-2000: Combining interoperability and flexibility KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT, PROCEEDINGS Noy, N. F., Fergerson, R. W., Musen, M. A. 2000; 1937: 17-32
  • Knowledge representation and tool support for critiquing clinical trial protocols JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Rubin, D. L., Gennari, J., Musen, M. A. 2000: 724-728


    The increasing complexities of clinical trials have led to increasing costs for investigators and organizations that author and administer those trials. The process of authoring a clinical trial protocol, the document that specifies the details of the study, is usually a manual task, and thus authors may introduce subtle errors in medical and procedural content. We have created a protocol inspection and critiquing tool (PICASSO) that evaluates the procedural aspects of a clinical trial protocol. To implement this tool, we developed a knowledge base for clinical trials that contains knowledge of the medical domain (diseases, drugs, lab tests, etc.) and of specific requirements for clinical trial protocols (eligibility criteria, patient treatments, and monitoring activities). We also developed a set of constraints, expressed in a formal language, that describe appropriate practices for authoring clinical trials. If a clinical trial designed with PICASSO violates any of these constraints, PICASSO generates a message to the user and a list of inconsistencies for each violated constraint. To test our methodology, we encoded portions of a hypothetical protocol and implemented designs consistent and inconsistent with known clinical trial practice. Our hope is that this methodology will be useful for standardizing new protocols and improving their quality.

    View details for Web of Science ID 000170207500148

    View details for PubMedID 11079979

  • A case study in using Protege-2000 as a tool for CommonKADS KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT, PROCEEDINGS Schreiber, G., Crubezy, M., Musen, M. 2000; 1937: 33-48
  • Explanations for a hypertension decision-support system Shankar, R. D., Tu, S. W., Goldstein, M. K., Musen, M. A. HANLEY & BELFUS INC. 2000: 1136-1136
  • Representation of temporal indeterminacy in clinical databases O'Connor, M. J., Tu, S. W., Musen, M. A. HANLEY & BELFUS INC. 2000: 615-619


    Temporal indeterminacy is common in clinical medicine because the time of many clinical events is frequently not precisely known. Decision support systems that reason with clinical data may need to deal with this indeterminacy. This indeterminacy support must have a sound foundational model so that other system components may take advantage of it. In particular, it should operate in concert with temporal abstraction, a feature that is crucial in several clinical decision support systems that our group has developed. We have implemented a temporal query system called Tzolkin that provides extensive support for the temporal indeterminacies found in clinical medicine, and have integrated this support with our temporal abstraction mechanism. The resulting system provides a simple, yet powerful approach for dealing with temporal indeterminacy and temporal abstraction.

    View details for Web of Science ID 000170207500126

    View details for PubMedID 11079957

  • From guideline modeling to guideline execution: Defining guideline-based decision-support services Tu, S. W., Musen, M. A. HANLEY & BELFUS INC. 2000: 863-867


    We describe our task-based approach to defining the guideline-based decision-support services that the EON system provides. We categorize uses of guidelines in patient-specific decision support into a set of generic tasks--making of decisions, specification of work to be performed, interpretation of data, setting of goals, and issuance of alerts and reminders--that can be solved using various techniques. Our model includes constructs required for representing the knowledge used by these techniques. These constructs form a toolkit from which developers can select modeling solutions for a guideline task. Based on the tasks and the guideline model, we define a guideline-execution architecture and a model of interactions between a decision-support server and clients that invoke services provided by the server. These services use generic interfaces derived from guideline tasks and their associated modeling constructs. We describe two implementations of these decision-support services and discuss how this work can be generalized. We argue that a well-defined specification of guideline-based decision-support services will facilitate sharing of tools that implement computable clinical guidelines.

    View details for Web of Science ID 000170207500176

    View details for PubMedID 11080007

  • Ontology acquisition from on-line knowledge sources Li, Q., Shilane, P., Noy, N. F., Musen, M. A. HANLEY & BELFUS INC. 2000: 497-501


    Electronic knowledge representation is becoming more and more pervasive both in the form of formal ontologies and less formal reference vocabularies, such as UMLS. The developers of clinical knowledge bases need to reuse these resources. Such reuse requires a new generation of tools for ontology development and management. Medical experts with little or no computer science experience need tools that will enable them to develop knowledge bases and provide capabilities for directly importing knowledge not only from formal knowledge bases but also from reference terminologies. The portions of knowledge bases that are imported from disparate resources then need to be merged or aligned to one another in order to link corresponding terms, to remove redundancies, and to resolve logical conflicts. We discuss the requirements for ontology-management tools that will enable interoperability of disparate knowledge sources. Our group is developing a suite of tools for knowledge-base management based on the Protégé-2000 environment for ontology development and knowledge acquisition. We describe one such tool in detail here: an application for incorporating information from remote knowledge sources such as UMLS into a Protégé knowledge base.

    View details for Web of Science ID 000170207500102

    View details for PubMedID 11079933

  • Design and use of clinical ontologies: Curricular goals for the education of health-telematics professionals Musen, M. A. I O S PRESS. 2000: 40-47


    In computer science, the notion of a domain ontology--a formal specification of the concepts and of the relationships among concepts that characterize an application area--has received considerable attention. In human-computer interaction, ontologies play a key role in defining the terms with which users and computer systems communicate. Such ontologies either implicitly or explicitly drive all dialogs between the computer and the user. In the construction of health-telematics applications, professionals need to understand how to design and apply domain ontologies to ensure effective communication with end-users. We currently are revising our training program in Medical Information Sciences at Stanford University to teach professional students in health telematics how to develop effective domain ontologies. Instruction concerning the construction and application of clinical domain ontologies should become an integral component of all health telematics curricula.

    View details for Web of Science ID 000086539700007

    View details for PubMedID 11010333

  • The impact of displayed awards on the credibility and retention of Web site information Shon, J., Marshall, J., Musen, M. A. HANLEY & BELFUS INC. 2000: 794-798


    Ratings systems and awards for medical Web sites have proliferated, but the validity and utility of the systems have not been well established. This study examined the effect of awards on the perceived credibility and retention of health information on a Web page. We recruited study participants from Internet newsgroups and presented them with information on the claimed health benefits of shark cartilage. Participants were randomized to receive health information with and without a medical award present on the page. We subsequently asked them to evaluate the credibility of the Web page and posed multiple-choice questions regarding the content of the pages. 137 completed responses were included for analysis. Our results show that the presentation of awards has no significant effect on the credibility or retention of health information on a Web page. Significantly, the highly educated participants in our study found inaccurate and misleading information on shark cartilage to be slightly believable.

    View details for Web of Science ID 000170207500162

    View details for PubMedID 11079993

  • Scalable software architectures for decision support Musen, M. A. SCHATTAUER GMBH-VERLAG MEDIZIN NATURWISSENSCHAFTEN. 1999: 229-238


    Interest in decision-support programs for clinical medicine soared in the 1970s. Since that time, workers in medical informatics have been particularly attracted to rule-based systems as a means of providing clinical decision support. Although developers have built many successful applications using production rules, they also have discovered that creation and maintenance of large rule bases is quite problematic. In the 1980s, several groups of investigators began to explore alternative programming abstractions that can be used to build decision-support systems. As a result, the notions of "generic tasks" and of reusable problem-solving methods became extremely influential. By the 1990s, academic centers were experimenting with architectures for intelligent systems based on two classes of reusable components: (1) problem-solving methods--domain-independent algorithms for automating stereotypical tasks--and (2) domain ontologies that captured the essential concepts (and relationships among those concepts) in particular application areas. This paper highlights how developers can construct large, maintainable decision-support systems using these kinds of building blocks. The creation of domain ontologies and problem-solving methods is the fundamental end product of basic research in medical informatics. Consequently, these concepts need more attention by our scientific community.

    View details for Web of Science ID 000084637800002

    View details for PubMedID 10805007

  • Semi-automated entry of clinical temporal-abstraction knowledge JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Shahar, Y., Chen, H., Stites, D. P., Basso, L. V., Kaizer, H., Wilson, D. M., Musen, M. A. 1999; 6 (6): 494-511


    The authors discuss the usability of an automated tool that supports entry, by clinical experts, of the knowledge necessary for forming high-level concepts and patterns from raw time-oriented clinical data. Based on their previous work on the RESUME system for forming high-level concepts from raw time-oriented clinical data, the authors designed a graphical knowledge acquisition (KA) tool that acquires the knowledge required by RESUME. This tool was designed using Protégé, a general framework and set of tools for the construction of knowledge-based systems. The usability of the KA tool was evaluated by three expert physicians and three knowledge engineers in three domains: the monitoring of children's growth, the care of patients with diabetes, and protocol-based care in oncology and in experimental therapy for AIDS. The study evaluated the usability of the KA tool for the entry of previously elicited knowledge. The authors recorded the time required to understand the methodology and the KA tool and to enter the knowledge; they examined the subjects' qualitative comments; and they compared the output abstractions with benchmark abstractions computed from the same data and a version of the same knowledge entered manually by RESUME experts. Understanding RESUME required 6 to 20 hours (median, 15 to 20 hours); learning to use the KA tool required 2 to 6 hours (median, 3 to 4 hours). Entry times for physicians varied by domain: 2 to 20 hours for growth monitoring (median, 3 hours), 6 and 12 hours for diabetes care, and 5 to 60 hours for protocol-based care (median, 10 hours). An increase in speed of up to 25 times (median, 3 times) was demonstrated for all participants when the KA process was repeated. On their first attempt at using the tool to enter the knowledge, the knowledge engineers recorded entry times similar to those of the expert physicians' second attempt at entering the same knowledge. In all cases RESUME, using knowledge entered by means of the KA tool, generated abstractions that were almost identical to those generated using the same knowledge entered manually. The authors demonstrate that the KA tool is usable and effective for expert physicians and knowledge engineers to enter clinical temporal-abstraction knowledge and that the resulting knowledge bases are as valid as those produced by manual entry.

    View details for Web of Science ID 000083688300007

    View details for PubMedID 10579607

  • Use of a domain model to drive an interactive knowledge-editing tool INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Musen, M. A., Fagan, L. M., Combs, D. M., Shortliffe, E. H. 1999; 51 (2): 479-495
  • Integration of temporal reasoning and temporal-data maintenance into a reusable database mediator to answer abstract, time-oriented queries: The Tzolkin system JOURNAL OF INTELLIGENT INFORMATION SYSTEMS Nguyen, J. H., Shahar, Y., Tu, S. W., Das, A. K., Musen, M. A. 1999; 13 (1-2): 121-145
  • Integrating a modern knowledge-based system architecture with a legacy VA database: The ATHENA and EON projects at Stanford Advani, A., Tu, S., O'Connor, M., Coleman, R., Goldstein, M. K., Musen, M. BMJ PUBLISHING GROUP. 1999: 653-657


    We present a methodology and database mediator tool for integrating modern knowledge-based systems, such as the Stanford EON architecture for automated guideline-based decision-support, with legacy databases, such as the Veterans Health Information Systems & Technology Architecture (VISTA) systems, which are used nation-wide. Specifically, we discuss designs for database integration in ATHENA, a system for hypertension care based on EON, at the VA Palo Alto Health Care System. We describe a new database mediator that affords the EON system both physical and logical data independence from the legacy VA database. We found that to achieve our design goals, the mediator requires two separate mapping levels and must itself involve a knowledge-based component.

    View details for Web of Science ID 000170207300134

    View details for PubMedID 10566440

  • Tool support for authoring eligibility criteria for cancer trials Rubin, D. L., Gennari, J. H., Srinivas, S., Yuen, A., Kaizer, H., Musen, M. A., Silva, J. S. BMJ PUBLISHING GROUP. 1999: 369-373


    A critical component of authoring new clinical trial protocols is assembling a set of eligibility criteria for patient enrollment. We found that clinical protocols in three different cancer domains can be categorized according to a set of clinical states that describe various clinical scenarios for that domain. Classifying protocols in this manner revealed similarities among the eligibility criteria and permitted some standardization of criteria based on clinical state. We have developed an eligibility criteria authoring tool which uses a standard set of eligibility criteria and a diagram of the clinical states to present the relevant eligibility criteria to the protocol author. We demonstrate our ideas with phase-3 protocols from breast cancer, prostate cancer, and non-small cell lung cancer. Based on measurements of redundancy and percentage coverage of criteria included in our tool, we conclude that our model reduces redundancy in the number of criteria needed to author multiple protocols, and it allows some eligibility criteria to be authored automatically based on the clinical state of interest for a protocol.

    View details for Web of Science ID 000170207300077

    View details for PubMedID 10566383

  • EON 2.0: Enhanced middleware for automation of protocol-directed therapy Musen, M. A., Tu, S. W., Shankar, R. D., O'Connor, M. J., Advani, A. BMJ PUBLISHING GROUP. 1999: 1215-1215
  • Representing the digital anatomist foundational model as a protege ontology Hahn, J. S., Burnside, E., Brinkley, J. F., Rosse, C., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 1070-1070
  • Applying temporal joins to clinical databases O'Connor, M. J., Tu, S. W., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 335-339


    Clinical databases typically contain a significant amount of temporal information, information that is often crucial in medical decision-support systems. Most recent clinical information systems use the relational model when working with this information. Although these systems have reasonably well-defined semantics for temporal queries on a single relational table, many do not fully address the complex semantics of operations involving multiple temporal tables. Such operations can arise frequently in queries on clinical databases. This paper describes the issues encountered when joining a set of temporal tables, and outlines how such joins are far more complex than non-temporal ones. We describe the semantics of temporal joins in a query management system called Chronus II, a system we have developed to assist in evaluating patients for clinical trials.

    View details for Web of Science ID 000170207300070

    View details for PubMedID 10566376

  • Representation of change in controlled medical terminologies ARTIFICIAL INTELLIGENCE IN MEDICINE Oliver, D. E., Shahar, Y., Shortliffe, E. H., Musen, M. A. 1999; 15 (1): 53-76


    Computer-based systems that support health care require large controlled terminologies to manage names and meanings of data elements. These terminologies are not static, because change in health care is inevitable. To share data and applications in health care, we need standards not only for terminologies and concept representation, but also for representing change. To develop a principled approach to managing change, we analyze the requirements of controlled medical terminologies and consider features that frame knowledge-representation systems have to offer. Based on our analysis, we present a concept model, a set of change operations, and a change-documentation model that may be appropriate for controlled terminologies in health care. We are currently implementing our modeling approach within a computational architecture.

    View details for Web of Science ID 000078040100004

    View details for PubMedID 9930616

  • The low availability of metadata elements for evaluating the quality of medical information on the World Wide Web Shon, J., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 945-949


    A great barrier to the use of Internet resources for patient education is the concern over the quality of information available. We conducted a study to determine what information was available in Web pages, both within text and metadata source code, that could be used in the assessment of information quality. Analysis of pages retrieved from 97 unique sites using a simple keyword search for "breast cancer treatment" on a generic and a health-specific search engine revealed that basic publishing elements were present in low frequency: authorship (20%), attribution/references (32%), disclosure (41%), and currency (35%). Only one page retrieved contained all four elements. Automated extraction of metadata elements from the source code of 822 pages retrieved from five popular generic search engines revealed even less information. We discuss the design of a metadata-based system for the evaluation of quality of medical content on the World Wide Web that addresses current limitations in ensuring quality.

    View details for Web of Science ID 000170207300194

    View details for PubMedID 10566500

  • A flexible approach to guideline modeling Tu, S. W., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 420-424


    We describe a task-oriented approach to guideline modeling that we have been developing in the EON project. We argue that guidelines seek to change behaviors by making statements involving some or all of the following tasks: (1) setting of goals or constraints, (2) making decisions among alternatives, (3) sequencing and synchronization of actions, and (4) interpreting data. Statements about these tasks make assumptions about models of time and of data abstractions, and about degree of uncertainty, points of view, and exception handling. Because of this variability in guideline tasks and assumptions, monolithic models cannot be custom tailored to the requirements of different classes of guidelines. Instead, we have created a core model that defines a set of basic concepts and relations and that uses different submodels to account for differing knowledge requirements. We describe the conceptualization of the guideline domain that underlies our approach, discuss components of the core model and possible submodels, and give three examples of specialized guideline models to illustrate how task-specific guideline models can be specialized and assembled to better match modeling requirements of different guidelines.

    View details for Web of Science ID 000170207300087

    View details for PubMedID 10566393

  • Justification of automated decision-making: Medical explanations as medical arguments Shankar, R. D., Musen, M. A. BMJ PUBLISHING GROUP. 1999: 395-399


    People use arguments to justify their claims. Computer systems use explanations to justify their conclusions. We are developing WOZ, an explanation framework that justifies the conclusions of a clinical decision-support system. WOZ's central component is the explanation strategy that decides what information justifies a claim. The strategy uses Toulmin's argument structure to define pieces of information and to orchestrate their presentation. WOZ uses explicit models that abstract the core aspects of the framework such as the explanation strategy. In this paper, we present the use of arguments, the modeling of explanations, and the explanation process used in WOZ. WOZ exploits the wealth of naturally occurring arguments, and thus can generate convincing medical explanations.

    View details for Web of Science ID 000170207300082

    View details for PubMedID 10566388

  • Domain ontologies in software engineering: Use of protege with the EON architecture METHODS OF INFORMATION IN MEDICINE Musen, M. A. 1998; 37 (4-5): 540-550


    Domain ontologies are formal descriptions of the classes of concepts and the relationships among those concepts that describe an application area. The Protégé software-engineering methodology provides a clear division between domain ontologies and domain-independent problem-solvers that, when mapped to domain ontologies, can solve application tasks. The Protégé approach allows domain ontologies to inform the total software-engineering process, and for ontologies to be shared among a variety of problem-solving components. We illustrate the approach by describing the development of EON, a set of middleware components that automate various aspects of protocol-directed therapy. Our work illustrates the organizing effect that domain ontologies can have on the software-development process. Ontologies, like all formal representations, have limitations in their ability to capture the semantics of application areas. Nevertheless, the capability of ontologies to encode clinical distinctions not usually captured by controlled medical terminologies provides significant advantages for developers and maintainers of clinical software applications.

    View details for Web of Science ID 000077676800026

    View details for PubMedID 9865052

  • Reuse, CORBA, and knowledge-based systems INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Gennari, J. H., Cheng, H. N., Altman, R. B., Musen, M. A. 1998; 49 (4): 523-546
  • Episodic refinement of episodic skeletal-plan refinement INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Tu, S. W., Musen, M. A. 1998; 48 (4): 475-497
  • A declarative explanation framework that uses a collection of visualization agents JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Shankar, R. D., Tu, S. W., Musen, M. A. 1998: 602-606


    User acceptance of a knowledge-based system depends partly on how effective the system is in explaining its reasoning and justifying its conclusions. The WOZ framework provides effective explanations for component-based decision-support systems. It represents explanation using explicit models, and employs a collection of visualization agents. It blends the strong features of existing explanation strategies, component-based systems, graphical visualizations, and explicit models. We illustrate the features of WOZ with the help of a component-based medical therapy system. We describe the explanation strategy, the roles of the visualization agents and components, and the communication structure. The integration of existing and new visualization applications, the domain-independent framework, and the incorporation of varied knowledge sources for explanation can result in a flexible explanation facility.

    View details for Web of Science ID 000171768600117

    View details for PubMedID 9929290

  • VM-in-Protege: A study of software reuse Park, J. Y., Musen, M. A. I O S PRESS. 1998: 644-648


    Protégé is a system that encompasses a suite of graphical tools and a methodology for applying them to the task of creating and maintaining knowledge-based systems. One of our key goals for Protégé is to facilitate reuse on new problems of components of previously developed solutions. We investigated this reusability by applying preexisting library components in the Protégé system to a reconstruction of VM, a well-known rule-based system for ventilator management. The formal steps of the Protégé methodology (ontology creation, problem-solving method selection, knowledge engineering, and mapping-relation instantiation) were followed, and a working system with much of the reasoning capability of the original VM was created. The work illuminated important lessons regarding aspects of component reusability.

    View details for Web of Science ID 000077613500128

    View details for PubMedID 10384534

  • Therapy planning as constraint satisfaction: A computer-based antiretroviral therapy advisor for the management of HIV JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Smith, D. S., Park, J. Y., Musen, M. A. 1998: 627-631


    We applied the Protégé methodology for building knowledge-based systems to the domain of antiretroviral therapy. We modeled the task of prescribing drug therapy for HIV, abstracting the essential characteristics of the problem solving. We mapped our model of the antiretroviral-therapy domain to the class of constraint-satisfaction problems, and reused the propose-and-revise problem-solving method, from the Protégé library of methods, to build an antiretroviral therapy advisor, ART Critic. Careful modeling and using Protégé allowed us to build a useful and extensible knowledge-based application rapidly.

    View details for Web of Science ID 000171768600122

    View details for PubMedID 9929295

  • Modern architectures for intelligent systems: Reusable ontologies and problem-solving methods JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Musen, M. A. 1998: 46-52


    When interest in intelligent systems for clinical medicine soared in the 1970s, workers in medical informatics became particularly attracted to rule-based systems. Although many successful rule-based applications were constructed, development and maintenance of large rule bases remained quite problematic. In the 1980s, an entire industry dedicated to the marketing of tools for creating rule-based systems rose and fell, as workers in medical informatics began to appreciate deeply why knowledge acquisition and maintenance for such systems are difficult problems. During this time period, investigators began to explore alternative programming abstractions that could be used to develop intelligent systems. The notions of "generic tasks" and of reusable problem-solving methods became extremely influential. By the 1990s, academic centers were experimenting with architectures for intelligent systems based on two classes of reusable components: (1) domain-independent problem-solving methods (standard algorithms for automating stereotypical tasks) and (2) domain ontologies that captured the essential concepts (and relationships among those concepts) in particular application areas. This paper will highlight how intelligent systems for diverse tasks can be efficiently automated using these kinds of building blocks. The creation of domain ontologies and problem-solving methods is the fundamental end product of basic research in medical informatics. Consequently, these concepts need more attention by our scientific community.

    View details for Web of Science ID 000171768600008

    View details for PubMedID 9929181

  • Sequential versus standard neural networks for pattern recognition: An example using the domain of coronary heart disease COMPUTERS IN BIOLOGY AND MEDICINE Ohno-Machado, L., Musen, M. A. 1997; 27 (4): 267-281


    The goal of this study was to compare standard and sequential neural network models for recognition of patterns of disease progression. Medical researchers who perform prognostic modeling usually oversimplify the problem by choosing a single point in time to predict outcomes (e.g. death in 5 years). This approach not only fails to differentiate patterns of disease progression, but also wastes important information that is usually available in time-oriented research data bases. The adequate use of sequential neural networks can improve the performance of prognostic systems if the interdependencies among prognoses at different intervals of time are explicitly modeled. In such models, predictions for a certain interval of time (e.g. death within 1 year) are influenced by predictions made for other intervals, and prognostic survival curves that provide consistent estimates for several points in time can be produced. We developed a system of neural network models that makes use of time-oriented data to predict development of coronary heart disease (CHD), using a set of 2594 patients. The output of the neural network system was a prognostic curve representing survival without CHD, and the inputs were the values of demographic, clinical, and laboratory variables. The system of neural networks was trained by backpropagation and its results were evaluated in test sets of previously unseen cases. We showed that, by explicitly modeling time in the neural network architecture, the performance of the prognostic index, measured by the area under the receiver operating characteristic (ROC) curve, was significantly improved (p < 0.05).

    View details for Web of Science ID A1997XV85800002

    View details for PubMedID 9303265

  • A foundational model of time for heterogeneous clinical databases JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Das, A. K., Musen, M. A. 1997: 106-110


    Differences among the database representations of clinical data are a major barrier to the integration of databases and to the sharing of decision-support applications across databases. Prior research on resolving data heterogeneity has not addressed specifically the types of mismatches found in various timestamping approaches for clinical data. Such temporal mismatches, which include time-unit differences among timestamps, must be overcome before many applications can use these data to reason about diagnosis, therapy, or prognosis. In this paper, we present an analysis of the types of temporal mismatches that exist in databases. To formalize these various approaches to timestamping, we provide a foundational model of time. This model gives us the semantics necessary to encode the temporal dimensions of clinical data in legacy databases and to transform such heterogeneous data into a uniform temporal representation suitable for decision support. We have implemented this foundational model as an extension to our Chronus system, which provides clinical decision-support applications the ability to match temporal patterns in clinical databases. We discuss the uniqueness of our approach in comparison with other research on representing and querying clinical data with varying timestamp representations.

    View details for Web of Science ID 000171774300023

    View details for PubMedID 9357598

  • EON: CORBA-based middleware for automation of protocol-directed therapy Musen, M. A., Tu, S. W., Advani, A., Das, A. K., Hasan, Z., Nguyen, J., Shahar, Y. BMJ PUBLISHING GROUP. 1997: 1025-1025
  • A temporal database mediator for protocol-based decision support JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Nguyen, J. H., Shahar, Y., Tu, S. S., Das, A. K., Musen, M. A. 1997: 298-302


    To meet the data-processing requirements for protocol-based decision support, a clinical data-management system must be capable of creating high-level summaries of time-oriented patient data, and of retrieving those summaries in a temporally meaningful fashion. We previously described a temporal-abstraction module (RESUME) and a temporal-querying module (Chronus) that can be used together to perform these tasks. These modules had to be coordinated by individual applications, however, to resolve the temporal queries of protocol planners. In this paper, we present a new module that integrates the previous two modules and that provides for their coordination automatically. The new module can be used as a standalone system for retrieving both primitive and abstracted time-oriented data, or can be embedded in a larger computational framework for protocol-based reasoning.

    View details for Web of Science ID 000171774300061

    View details for PubMedID 9357636

  • Knowledge-based temporal abstraction in clinical domains ARTIFICIAL INTELLIGENCE IN MEDICINE Shahar, Y., Musen, M. A. 1996; 8 (3): 267-298


    We have defined a knowledge-based framework for the creation of abstract, interval-based concepts from time-stamped clinical data, the knowledge-based temporal-abstraction (KBTA) method. The KBTA method decomposes its task into five subtasks; for each subtask we propose a formal solving mechanism. Our framework emphasizes explicit representation of knowledge required for abstraction of time-oriented clinical data, and facilitates its acquisition, maintenance, reuse and sharing. The RESUME system implements the KBTA method. We tested RESUME in several clinical-monitoring domains, including the domain of monitoring patients who have insulin-dependent diabetes. We acquired diabetes-therapy temporal-abstraction knowledge from a diabetes-therapy expert. Two diabetes-therapy experts (including the first one) created temporal abstractions from about 800 points of diabetic patients' data. RESUME generated about 80% of the abstractions agreed on by both experts; about 97% of the generated abstractions were valid. We discuss the advantages and limitations of the current architecture.

    View details for Web of Science ID A1996UW89100005

    View details for PubMedID 8830925

  • Reusable ontologies, knowledge-acquisition tools, and performance systems: PROTEGE-II solutions to Sisyphus-2 INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES Rothenfluh, T. E., Gennari, J. H., Eriksson, H., Puerta, A. R., Tu, S. W., Musen, M. A. 1996; 44 (3-4): 303-332
  • The EON model of intervention protocols and guidelines. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Tu, S. W., Musen, M. A. 1996: 587-591


    We present a computational model of treatment protocols abstracted from implemented systems that we have developed previously. In our framework, a protocol is modeled as a hierarchical plan where high-level protocol steps are decomposed into descriptions of more specific actions. The clinical algorithms embodied in a protocol are represented by procedures that encode the sequencing, looping, and synchronization of protocol steps. The representation allows concurrent and optional protocol steps. We define the semantics of a procedure in terms of an execution model that specifies how the procedure should be interpreted. We show that the model can be applied to an asthma guideline different from the protocols for which the model was originally constructed.

    View details for PubMedID 8947734

  • Toward reusable software components at the point of care. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Tuttle, M. S., Sherertz, D. D., Olson, N. E., Nelson, S. J., Erlbaum, M. S., Keck, K. D., Davis, A. N., Suarez-Munist, O. N., Lipow, S. S., Cole, W. G., Fagan, L. M., Acuff, R. D., Crangle, C. E., Musen, M. A., Tu, S. W., Wiederhold, G. C., Carlson, R. W. 1996: 150-154


    An architecture built from five software components (a Router, Parser, Matcher, Mapper, and Server) fulfills key requirements common to several point-of-care information and knowledge processing tasks. The requirements include problem-list creation, exploiting the contents of the Electronic Medical Record for the patient at hand, knowledge access, and support for semantic visualization and software agents. The components use the National Library of Medicine Unified Medical Language System to create and exploit lexical closure, a state in which terms, text and reference models are represented explicitly and consistently. Preliminary versions of the components are in use in an oncology knowledge server.

    View details for PubMedID 8947646

  • Conceptual and formal specifications of problem-solving methods INTERNATIONAL JOURNAL OF EXPERT SYSTEMS Fensel, D., Eriksson, H., Musen, M. A., Studer, R. 1996; 9 (4): 507-532
  • Making generic guidelines site-specific. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Fridsma, D. B., Gennari, J. H., Musen, M. A. 1996: 597-601


    Health care providers are more likely to follow a clinical guideline if the guideline's recommendations are consistent with the way in which their organization does its work. Unfortunately, developing guidelines that are specific to an organization is expensive, and limits the ability to share guidelines among different institutions. We describe a methodology that separates the site-independent information of guidelines from site-specific information, and that facilitates the development of site-specific guidelines from generic guidelines. We have used this methodology in a prototype system that assists developers in creating generic guidelines that are sharable across different sites. When combined with site information, generic guidelines can be used to generate site-specific guidelines that are responsive to organizational change and that can be implemented at a level of detail that makes site-specific computer-based workflow management and simulation possible.

    View details for PubMedID 8947736

  • Knowledge acquisition for temporal abstraction. Proceedings : a conference of the American Medical Informatics Association / ... AMIA Annual Fall Symposium. AMIA Fall Symposium Stein, A., Musen, M. A., Shahar, Y. 1996: 204-208


    Temporal abstraction is the task of detecting relevant patterns in data over time. The knowledge-based temporal-abstraction method uses knowledge about a clinical domain's contexts, external events, and parameters to create meaningful interval-based abstractions from raw time-stamped clinical data. In this paper, we describe the acquisition and maintenance of domain-specific temporal-abstraction knowledge. Using the PROTEGE-II framework, we have designed a graphical tool for acquiring temporal knowledge directly from expert physicians, maintaining the knowledge in a sharable form, and converting the knowledge into a suitable format for use by an appropriate problem-solving method. In initial tests, the tool offered significant gains in our ability to rapidly acquire temporal knowledge and to use that knowledge to perform automated temporal reasoning.

    View details for PubMedID 8947657

  • Task modeling with reusable problem-solving methods ARTIFICIAL INTELLIGENCE Eriksson, H., Shahar, Y., Tu, S. W., Puerta, A. R., Musen, M. A. 1995; 79 (2): 293-326
  • Computer-based screening of patients with HIV/AIDS for clinical-trial eligibility. The Online journal of current clinical trials Carlson, R. W., Tu, S. W., Lane, N. M., Lai, T. L., Kemper, C. A., Musen, M. A., Shortliffe, E. H. 1995; Doc No 179: [3347 words, 32 paragraphs]


    To assess the potential effect of a computer-based system on accrual to clinical trials, we have developed methodology to identify retrospectively and prospectively patients who are eligible or potentially eligible for protocols. Retrospective chart abstraction with computer screening of data for potential protocol eligibility. A county-operated clinic serving human immunodeficiency virus (HIV) positive patients with or without acquired immune deficiency syndrome (AIDS). A randomly selected group of 60 patients who were HIV-infected, 30 of whom had an AIDS-defining diagnosis. Using a computer-based eligibility screening system, for each clinic visit and hospitalization, patients were categorized as eligible, potentially eligible, or ineligible for each of the 17 protocols active during the 7-month study period. Reasons for ineligibility were categorized. None of the patients was enrolled on a clinical trial during the 7-month period. Thirteen patients were identified as eligible for protocol; three patients were eligible for two different protocols; and one patient was eligible for the same protocol during two different time intervals. Fifty-four patients were identified as potentially eligible for a total of 165 accrual opportunities, but important information, such as the result of a required laboratory test, was missing, so that eligibility could not be determined unequivocally. Ineligibility for protocol was determined in 414 (35%) potential opportunities based only on conditions that were amenable to modification, such as the use of concurrent medications; 194 (17%) failed only laboratory tests or subjective determinations not routinely performed; and 346 (29%) failed only routine laboratory tests. There are substantial numbers of eligible and potentially eligible patients who are not enrolled or evaluated for enrollment in prospective clinical trials. Computer-based eligibility screening when coupled with a computer-based medical record offers the potential to identify patients eligible or potentially eligible for clinical trial, to assist in the selection of protocol eligibility criteria, and to make accrual estimates.

    View details for PubMedID 7719564



    The developers of reviewing systems that rely on computer-based patient-record systems as a source of data need to model reviewing knowledge and medical knowledge. We simulate how the same medical knowledge could be entered in four different systems: CARE, the Arden syntax, Essential-attending and HyperCritic. We subsequently analyze how the original knowledge is represented in the symbols or syntax used by these systems. We conclude that these systems provide different alternatives in dealing with the vocabulary provided by the computer-based patient records. In addition, the use of computer-based patient records for review poses new challenges for the content of that record: to facilitate review, the reasoning of the physician needs to be captured in addition to the actions of the physician.

    View details for Web of Science ID A1995QT06800016

    View details for PubMedID 9082122

  • A comparison of the temporal expressiveness of three database query methods. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Das, A. K., Musen, M. A. 1995: 331-337


    Time is a multifaceted phenomenon that developers of clinical decision-support systems can model at various levels of complexity. An unresolved issue for the design of clinical databases is whether the underlying data model should support interval semantics. In this paper, we examine whether interval-based operations are required for querying protocol-based conditions. We report on an analysis of a set of 256 eligibility criteria that the T-HELPER system uses to screen patients for enrollment in eight clinical-trial protocols for HIV disease. We consider three data-manipulation methods for temporal querying: the consensus query representation Arden Syntax, the commercial standard query language SQL, and the temporal query language TimeLineSQL (TLSQL). We compare the ability of these three query methods to express the eligibility criteria. Seventy-nine percent of the 256 criteria require operations on time stamps. These temporal conditions comprise four distinct patterns, two of which use interval-based data. Our analysis indicates that the Arden Syntax can query the two non-interval patterns, which represent 54% of the temporal conditions. Timepoint comparisons formulated in SQL can instantiate the two non-interval patterns and one interval pattern, which encompass 96% of the temporal conditions. TLSQL, which supports an interval-based model of time, can express all four types of temporal patterns. Our results demonstrate that the T-HELPER system requires simple temporal operations for most protocol-based queries. Of the three approaches tested, TLSQL is the only query method that is sufficiently expressive for the temporal conditions in this system.

    View details for PubMedID 8563296

  • A component-based architecture for automation of protocol-directed therapy Musen, M. A., Tu, S. W., Das, A. K., Shahar, Y. SPRINGER-VERLAG BERLIN. 1995: 3-13
  • CALIPER: individualized-growth curves for the monitoring of children's growth. Medinfo. MEDINFO Kuilboer, M. M., Wilson, D. M., Musen, M. A., Wit, J. M. 1995; 8: 1686-?


    Monitoring children's growth is a fundamental part of pediatric care. Deviation from the expected growth pattern can be an important sign of disease and often results in parental anxiety. Most preprinted growth curves are based on cross-sectional data derived from population-based studies of normal children. Since the age of the pubertal growth spurt varies substantially among the normal curves, these curves don't adequately reflect the expected growth pattern of an individual child. In addition, any preprinted growth curve based on the general population becomes less useful when the maturation of a child and the heights of its parents differ substantially from the average. Established methods exist to adjust the general reference-growth curves for parental height. However, these methods generally are too time consuming to be used in clinic. Only heuristic methods are known to us to adjust the general-reference curves for maturation. We have developed the decision-support system CALIPER, which enables and standardizes the generation of individualized reference-growth curves. CALIPER consists of a graphical interface for data entry, a progress-report generator, and a module for the interactive, dynamic display of general-reference curves and individualized-reference curves. Preference settings such as ethnic background and gender determine the required population curves and individualization method. Individualization can be based on parental height and/or maturation. Maturation is based on an assessment of a child's bone age and/or pubertal stage. The bone age can be assessed by different methods. We have performed an evaluation of CALIPER's methodology by assessing the effect of individualization on the reference growth curves for 466 normally growing children. The individualized-reference curves reflect the growth pattern of children significantly better than the general-reference curves. CALIPER can be used on a case-by-case basis as an aid in clinic (assessment of children's growth and communication with patient and parents) or as a tool to investigate current clinical questions concerning the relation of bone age, pubertal stage, and growth pattern for any part of the population. Besides providing for decision support by the interactive graphical representation of individualized-reference curves and growth data, CALIPER will be linked to a module that can provide automatic interpretation of the data (Kuilboer et al, SCAMC-93). CALIPER runs on a Macintosh, and requires 600K of memory. A color monitor is preferable, but not required. We will demonstrate several cases that will illustrate the clinical problem and CALIPER's potential.

    View details for PubMedID 8591546

  • PROTEGE-II: computer support for development of intelligent systems from libraries of components. Medinfo. MEDINFO Musen, M. A., Gennari, J. H., Eriksson, H., Tu, S. W., Puerta, A. R. 1995; 8: 766-770


    PROTEGE-II is a suite of tools that facilitates the development of intelligent systems. A tool called MAiTRE allows system builders to create and refine abstract models (ontologies) of application domains. A tool called DASH takes as input a modified domain ontology and generates automatically a knowledge-acquisition tool that application specialists can use to enter the detailed content knowledge required to define particular applications. The domain-dependent knowledge entered into the knowledge-acquisition tool is used by assemblies of domain-independent problem-solving methods that provide the computational strategies required to solve particular application tasks. The result is an architecture that offers a divide-and-conquer approach that separates system-building tasks that require skill in domain analysis and modeling from those that require simple entry of content knowledge. At the same time, applications can be constructed from libraries of components (of both domain ontologies and domain-independent problem-solving methods), allowing the reuse of knowledge and facilitating ongoing system maintenance. We have used PROTEGE-II to construct a number of knowledge-based systems, including the reasoning components of T-Helper, which assists physicians in the protocol-based care of patients who have HIV infection.

    View details for PubMedID 8591322

  • Knowledge-based temporal abstraction in diabetes therapy. Medinfo. MEDINFO Shahar, Y., Das, A. K., Tu, S. W., Kraemer, F. B., Basso, L. V., Musen, M. A. 1995; 8: 852-856


    We suggest a general framework for solving the task of creating abstract, interval-based concepts from time-stamped clinical data. We refer to this problem-solving framework as the knowledge-based temporal-abstraction (KBTA) method. The KBTA method emphasizes explicit representation, acquisition, maintenance, reuse, and the sharing of knowledge required for abstraction of time-oriented clinical data. We describe the subtasks into which the KBTA method decomposes its task, the problem-solving mechanisms that solve these subtasks, and the knowledge necessary for instantiating these mechanisms in a particular clinical domain. We have implemented the KBTA method in the RESUME system and have applied it to the task of monitoring the care of insulin-dependent diabetics.

    View details for PubMedID 8591345

  • Hierarchical neural networks for survival analysis. Medinfo. MEDINFO Ohno-Machado, L., Walker, M. G., Musen, M. A. 1995; 8: 828-832


    Neural networks offer the potential of providing more accurate predictions of survival time than do traditional methods. Their use in medical applications has, however, been limited, especially when some data is censored or the frequency of events is low. To reduce the effect of these problems, we have developed a hierarchical architecture of neural networks that predicts survival in a stepwise manner. Predictions are made for the first time interval, then for the second, and so on. The system produces a survival estimate for patients at each interval, given relevant covariates, and is able to handle continuous and discrete variables, as well as censored data. We compared the hierarchical system of neural networks with a nonhierarchical system for a data set of 428 AIDS patients. The hierarchical model predicted survival more accurately than did the nonhierarchical (although both had low sensitivity). The hierarchical model could also learn the same patterns in less than half the time required by the nonhierarchical model. These results suggest that the use of hierarchical systems is advantageous when censored data is present, the number of events is small, and time-dependent variables are necessary.

    View details for PubMedID 8591339

  • A web-based architecture for a medical vocabulary server. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Gennari, J. H., Oliver, D. E., Pratt, W., Rice, J., Musen, M. A. 1995: 275-279


    For health care providers to share computing resources and medical application programs across different sites, those applications must share a common medical vocabulary. To construct a common vocabulary, researchers must have an architecture that supports collaborative, networked development. In this paper, we present a web-based server architecture for the collaborative development of a medical vocabulary: a system that provides network services in support of medical applications that need a common, controlled medical terminology. The server supports vocabulary browsing and editing and can respond to direct programmatic queries about vocabulary terms. We have tested the programmatic query-response capability of the vocabulary server with a medical application that determines when patients who have HIV infection may be eligible for certain clinical trials. Our emphasis in this paper is not on the content of the vocabulary, but rather on the communication protocol and the tools that enable collaborative improvement of the vocabulary by any network-connected user.

    View details for PubMedID 8563284

  • A comparison of two computer-based prognostic systems for AIDS. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Ohno-Machado, L., Musen, M. A. 1995: 737-741


    We compare the performances of a Cox model and a neural network model that are used as prognostic tools for a cohort of people living with AIDS. We modeled disease progression for patients who had AIDS (according to the 1993 CDC definition) in a cohort of 588 patients in California, using data from the ATHOS project. We divided the study population into 10 training and 10 test sets and evaluated the prognostic accuracy of a Cox proportional hazards model and of a neural network model by determining the number of predicted deaths, the sensitivities, specificities, positive predictive values, and negative predictive values for intervals of one year following the diagnosis of AIDS. For the Cox model, we further tested the agreement between a series of binary observations, representing death in one, two, and three years, and a set of estimates which define the probability of survival for those intervals. Both models were able to provide accurate numbers on how many patients were likely to die at each interval, and reasonable individualized estimates for the two- and three-year survival of a given patient, but failed to provide reliable predictions for the first year after diagnosis. There was no evidence that the Cox model performed better than did the neural network model or vice versa, but the former method had the advantage of providing some insight on which variables were most influential for prognosis. Nevertheless, it is likely that the assumptions required by the Cox model may not be satisfied in all data sets, justifying the use of neural networks in certain cases.

    View details for PubMedID 8563387

  • A rational reconstruction of INTERNIST-I using PROTEGE-II. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Musen, M. A., Gennari, J. H., Wong, W. W. 1995: 289-293


    PROTEGE-II is a methodology and a suite of tools that allow developers to build and maintain knowledge-based systems in a principled manner. We used PROTEGE-II to reconstruct the well-known INTERNIST-I system, demonstrating the role of a domain ontology (a framework for specification of a model of an application area), a reusable problem-solving method, and declarative mapping relations in creating a new, working program. PROTEGE-II generates automatically a domain-specific knowledge-acquisition tool, which, in the case of the INTERNIST-I reconstruction, has much of the functionality of the QMR-KAT knowledge-acquisition tool. This study provides a means to understand better both the PROTEGE-II methodology and the models that underlie INTERNIST-I.

    View details for PubMedID 8563287

  • The development of a controlled medical terminology: identification, collaboration, and customization. Medinfo. MEDINFO Miller, E. T., Wieckert, K. E., Fagan, L. M., Musen, M. A. 1995; 8: 148-152


    An increasing focus in health care is the development and use of electronic medical record systems to capture and store patient information. T-HELPER is an electronic medical record system that health care providers use to record ambulatory-care patient progress notes. These data are stored in an on-line database and analyzed by T-HELPER to provide users with decision support regarding patient eligibility for clinical trial protocols and assistance with continued protocol-based care. Our goal is to provide a system that enhances the process of identifying patients who are potentially eligible for clinical trials of experimental therapies in a clinic that is limited by the existence of a singular clinical trial coordinator. Effective implementation of such a system requires the development of a meaningful controlled medical terminology that satisfies the needs of a diverse community of providers, all of whom contribute to the health care process. The development of a controlled medical terminology is a process of identification, collaboration, and customization. We enlisted the help of collaborators familiar with the proposed work environment to identify user needs, to collaborate with our development team to construct the preliminary terminology, and to customize the controlled medical terminology to make it meaningful and acceptable to the clinic users.

    View details for PubMedID 8591141



    Chronus is a query system that supports temporal extensions to the Structured Query Language (SQL) for relational databases. Although the relational data model can store time-stamped data and can permit simple temporal-comparison operations, it does not provide either a closed or a sufficient algebra for manipulating temporal data. In this paper, we outline an algebra that maintains a consistent relational representation of temporal data and that allows the type of temporal queries needed for protocol-directed decision support. We also discuss how Chronus can translate between our temporal algebra and the relational algebra used for SQL queries. We have applied our system to the task of screening patients for clinical trials. Our results demonstrate that Chronus can express sufficiently all required temporal queries, and that the search time of such queries is similar to that of standard SQL.

    View details for Web of Science ID A1994PN89900007

    View details for PubMedID 7799812



    A general framework for representation of clinical data that provides a declarative semantics of terms and that allows developers to define explicitly the relationships among both terms and combinations of terms. Use of conceptual graphs as a standard representation of logic and of an existing standardized vocabulary, the Systematized Nomenclature of Medicine (SNOMED International), for lexical elements. Concepts such as time, anatomy, and uncertainty must be modeled explicitly in a way that allows relation of these foundational concepts to surface-level clinical descriptions in a uniform manner. The proposed framework was used to model a simple radiology report, which included temporal references. Formal logic provides a framework for formalizing the representation of medical concepts. Actual implementations will be required to evaluate the practicality of this approach.

    View details for Web of Science ID A1994QE63900002

    View details for PubMedID 7719805


    View details for Web of Science ID A1994QF21600265

    View details for PubMedID 7949900

  • A computer-based approach to quality improvement for telephone triage in a community AIDS clinic. Nursing administration quarterly Henry, S. B., Borchelt, D., Schreiner, J. G., Musen, M. A. 1994; 18 (2): 65-73


    Observation of the current procedure for telephone triage in a community-based acquired immunodeficiency syndrome (AIDS) clinic and a retrospective chart audit identified opportunities for improvement in the process for the management of telephone triage encounters. Specifically, it pointed out that the nurses faced difficulties in accessing relevant clinical data, and that a large number of data were missing in the documentation for the encounter. Five design goals for a computer-based system to improve the management of the telephone triage encounter were generated by an interdisciplinary project team. A computer-based approach to management of the telephone triage encounter complemented by the development of performance standards and guidelines has the potential to improve both the process of telephone triage and the documentation of the triage encounter.

    View details for PubMedID 8159333

  • PATIENT-CARE APPLICATIONS ON INTERNET Barnett, O., Shortliffe, E., Chueh, H., Piggins, J., Greenes, R., Cimino, J., Musen, M., Clayton, P., Humphreys, B., Kingsland, L. BMJ PUBLISHING GROUP. 1994: 1060-1060

    View details for Web of Science ID A1994QF21600260

    View details for PubMedID 7949895



    We have developed a general method that solves the task of creating abstract, interval-based concepts from time-stamped clinical data. We refer to this method as knowledge-based temporal-abstraction (KBTA). In this paper, we focus on the knowledge representation, acquisition, maintenance, reuse and sharing aspects of the KBTA method. We describe five problem-solving mechanisms that solve the five subtasks into which the KBTA method decomposes its task, and four types of knowledge necessary for instantiating these mechanisms in a particular domain. We present an example of instantiating the KBTA method in the clinical area of monitoring insulin-dependent-diabetes patients.

    View details for Web of Science ID A1994QF21600123

    View details for PubMedID 7950015



    The inability of many clinical decision-support applications to integrate with existing databases limits the wide-scale deployment of such systems. To overcome this obstacle, we have designed a data-interpretation module that can be embedded in a general architecture for protocol-based reasoning and that can support the fundamental task of detecting temporal abstractions. We have developed this software module by coupling two existing systems--RESUME and Chronus--that provide complementary temporal-abstraction techniques at the application and the database levels, respectively. Their encapsulation into a single module thus can resolve the temporal queries of protocol planners with the domain-specific knowledge needed for the temporal-abstraction task and with primary time-stamped data stored in autonomous clinical databases. We show that other computer methods for the detection of temporal abstractions do not scale up to the data- and knowledge-intensive environments of protocol-based decision-support systems.

    View details for Web of Science ID A1994QF21600058

    View details for PubMedID 7949943



    The task of determining patients' eligibility for clinical trials is knowledge and data intensive. In this paper, we present a model for the task of eligibility determination, and describe how a computer system can assist clinical researchers in performing that task. Qualitative and probabilistic approaches to computing and summarizing the eligibility status of potentially eligible patients are described. The two approaches are compared, and a synthesis that draws on the strengths of each approach is proposed. The result of applying these techniques to a database of HIV-positive patient cases suggests that computer programs such as the one described can increase the accrual rate of eligible patients into clinical trials. These methods may also be applied to the task of determining from electronic patient records whether practice guidelines apply in particular clinical situations.

    View details for Web of Science ID A1993LQ71400012

    View details for PubMedID 8412828



    RESUME is a system that performs temporal abstraction of time-stamped data. The temporal-abstraction task is crucial for planning treatment, for executing treatment plans, for identifying clinical problems, and for revising treatment plans. The RESUME system is based on a model of three basic temporal-abstraction mechanisms: point temporal abstraction, a mechanism for abstracting the values of several parameters into a value of another parameter; temporal inference, a mechanism for inferring sound logical conclusions over a single interval or two meeting intervals; and temporal interpolation, a mechanism for bridging nonmeeting temporal intervals. Making explicit the knowledge required for temporal abstraction supports the acquisition and the sharing of that knowledge. We have implemented the RESUME system using the CLIPS knowledge-representation shell. The RESUME system emphasizes the need for explicit representation of temporal-abstraction knowledge, and the advantages of modular, task-specific but domain-independent architectures for building medical knowledge-based systems.

    View details for Web of Science ID A1993LG73100006

    View details for PubMedID 8325005

  • METATOOLS FOR KNOWLEDGE ACQUISITION IEEE SOFTWARE Eriksson, H., Musen, M. 1993; 10 (3): 23-29


    We recently have shown that a computer system, known as HyperCritic, can successfully audit general practitioners' treatment of hypertension by analyzing computer-based patient records. HyperCritic reviews the electronic medical records and offers unsolicited advice. To determine which unsolicited advice might be perceived as inappropriate, builders of programs such as HyperCritic need insight into providers' responses to computer-generated critique of their patient care. Twenty medical charts, describing in total 243 visits of patients with hypertension, were audited by 8 human reviewers and by the critiquing system HyperCritic. A panel of 14 general practitioners subsequently judged the relevance of those critiques on a five-point scale ranging from relevant critique to erroneous or harmful critique. The panel judged reviewers' comments to be either relevant or somewhat relevant in 61 to 68% of cases, and either erroneous or possibly erroneous in 15 to 18%; the panel judged HyperCritic's comments to be either relevant or somewhat relevant in 65% of cases, and either erroneous or possibly erroneous in 16%. Comparison of individual members of the panel showed large differences; for example, the portion of HyperCritic's comments judged relevant ranged from 0 to 82%. We conclude that, from the perspective of general practitioners, critiques generated by the critiquing system HyperCritic are perceived to be as beneficial as critiques generated by human reviewers. Different general practitioners, however, judge the critiques differently. Before auditing systems based on computer-based patient records that are acceptable to practitioners can be introduced, additional studies are needed to evaluate the reasons a physician may have for judging critiques to be irrelevant, and to evaluate the effect of critiques on physician behavior.

    View details for Web of Science ID A1993LA06800009

    View details for PubMedID 8321133

  • A computer-based tool for generation of progress notes. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Campbell, K. E., Wieckert, K., Fagan, L. M., Musen, M. A. 1993: 284-288


    IVORY, a computer-based tool that uses clinical findings as the basic unit for composing progress notes, generates progress notes more efficiently than does a character-based word processor. IVORY's clinical findings are contained within a structured vocabulary that we developed to support generation of both prose progress notes and SNOMED III codes. Observational studies of physician participation in the development of IVORY's structured vocabulary have helped us to identify areas where changes are required before IVORY will be acceptable for routine clinical use.

    View details for PubMedID 8130479

  • T-HELPER - AUTOMATED SUPPORT FOR COMMUNITY-BASED CLINICAL RESEARCH Musen, M. A., Carlson, R. W., Fagan, L. M., Deresinski, S. C., Shortliffe, E. H. MCGRAW-HILL BOOK CO. 1993: 719-723
  • Knowledge reuse: temporal-abstraction mechanisms for the assessment of children's growth. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Kuilboer, M. M., Shahar, Y., Wilson, D. M., Musen, M. A. 1993: 449-453


    Currently, many workers in the field of medical informatics realize the importance of knowledge reuse. The PROTEGE-II project seeks to develop and implement a domain-independent framework that allows system builders to create custom-tailored role-limiting methods from generic reusable components. These new role-limiting methods are used to create domain- and task-specific knowledge-acquisition tools with which an application expert can generate domain- and task-specific decision-support systems. One required set of reusable components embodies the problem-solving knowledge to generate temporal abstractions. Previously, members of the PROTEGE-II project have used these temporal-abstraction mechanisms to infer the presence of myelotoxicity in patients with AIDS. In this paper, we show that these mechanisms are reusable in the domain of assessment of children's growth.

    View details for PubMedID 8130514

  • Automated modeling of medical decisions. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Egar, J. W., Musen, M. A. 1993: 424-428


    We have developed a graph grammar and a graph-grammar derivation system that, together, generate decision-theoretic models from unordered lists of medical terms. The medical terms represent considerations in a dilemma that confronts the patient and the health-care provider. Our current grammar ensures that several desirable structural properties are maintained in all derived decision models.

    View details for PubMedID 8130509

  • AIDS2: a decision-support tool for decreasing physicians' uncertainty regarding patient eligibility for HIV treatment protocols. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Ohno-Machado, L., Parra, E., Henry, S. B., Tu, S. W., Musen, M. A. 1993: 429-433


    We have developed a decision-support tool, the AIDS Intervention Decision-Support System (AIDS2), to assist in the task of matching patients to therapy-related research protocols. The purposes of AIDS2 are to determine the initial eligibility status of HIV-infected patients for therapy-related research protocols, and to suggest additional data-gathering activities that will decrease uncertainty related to the eligibility status. AIDS2 operates in either a patient-driven or protocol-driven mode. We represent the system knowledge in three combined levels: a classification level, where deterministic knowledge is represented; a belief-network level, where probabilistic knowledge is represented; and a control level, where knowledge about the system's operation is stored. To determine whether the design specifications were met, we presented a series of 10 clinical cases based on actual patients to the system. AIDS2 provided meaningful advice in all cases.

    View details for PubMedID 8130510

  • An extended SQL for temporal data management in clinical decision-support systems. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Das, A. K., Tu, S. W., Purcell, G. P., Musen, M. A. 1992: 128-132


    We are developing a database implementation to support temporal data management for the T-HELPER physician workstation, an advice system for protocol-based care of patients who have HIV disease. To understand the requirements for the temporal database, we have analyzed the types of temporal predicates found in clinical-trial protocols. We extend the standard relational data model in three ways to support these querying requirements. First, we incorporate timestamps into the two-dimensional relational table to store the temporal dimension of both instant- and interval-based data. Second, we develop a set of operations on timepoints and intervals to manipulate timestamped data. Third, we modify the relational query language SQL so that its underlying algebra supports the specified operations on timestamps in relational tables. We show that our temporal extension to SQL meets the temporal data-management needs of protocol-directed decision support.

    View details for PubMedID 1482853

  • Graph-grammar productions for the modeling of medical dilemmas. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Egar, J. W., Musen, M. A. 1992: 349-353


    We introduce graph-grammar production rules, which can guide physicians to construct models for normative decision making. A physician describes a medical decision problem using standard terminology, and the graph-grammar system matches a graph-manipulation rule to each of the standard terms. With minimal help from the physician, these graph-manipulation rules can construct an appropriate Bayesian probabilistic network. The physician can then assess the necessary probabilities and utilities to arrive at a rational decision. The grammar relies on prototypical forms that we have observed in models of medical dilemmas. We have found graph grammars to be a concise and expressive formalism for describing prototypical forms, and we believe such grammars can greatly facilitate the modeling of medical dilemmas and medical plans.

    View details for PubMedID 1482895

  • Representation of clinical data using SNOMED III and conceptual graphs. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Campbell, K. E., Musen, M. A. 1992: 354-358


    None of the coding schemes currently contained within the Unified Medical Language System (UMLS) is sufficiently expressive to represent medical progress notes adequately. Some coding schemes suffer from domain incompleteness, others suffer from the inability to represent modifiers and time references, and some suffer from both problems. The recently released version of the Systematized Nomenclature of Medicine (SNOMED III) is a potential solution to the data-representation problem because it is relatively domain complete, and because it uses a generative coding scheme that will allow the construction of codes that contain modifiers and time references. SNOMED III does have an important weakness, however. SNOMED III lacks a formalized system for using its codes; thus, it fails to ensure consistency in its use across different institutions. Application of conceptual-graph formalisms to SNOMED III can ensure such consistency of use. Conceptual-graph formalisms will also allow mapping of the resulting SNOMED III codes onto relational data models and onto other formal systems, such as first-order predicate calculus.

    View details for PubMedID 1482897

  • A temporal-abstraction system for patient monitoring. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Shahar, Y., Musen, M. A. 1992: 121-127


    RESUME is a system that performs temporal abstraction of time-stamped data. RESUME is based on a model of three temporal-abstraction mechanisms: point temporal abstraction (a mechanism for abstracting values of several parameters into a value of another parameter); temporal inference (a mechanism for inferring sound logical conclusions over a single interval or two meeting intervals); and temporal interpolation (a mechanism for bridging nonmeeting temporal intervals). Making explicit the knowledge required for temporal abstraction supports the acquisition of that knowledge.

    View details for PubMedID 1482852

  • A needs analysis for computer-based telephone triage in a community AIDS clinic. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Henry, S. B., Schreiner, J. G., Borchelt, D., Musen, M. A. 1992: 59-63


    This study describes the complexity of the telephone-triage task in a community-based AIDS clinic. We identify deficiencies related to the data management for and documentation of the telephone-triage encounter, including inaccessibility of the medical record and failure to document required data elements. Our needs analysis suggests five design criteria for a computer-based system that assists nurses with the telephone-triage task: (1) online accessibility of the medical record, (2) ability to move among modules of the medical record and the triage-encounter module, (3) ease of data entry, (4) compliance with standards for documentation, and (5) notification of the primary-care physician in an appropriate and timely manner.

    View details for PubMedID 1482941

  • T-HELPER: automated support for community-based clinical research. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Musen, M. A., Carlson, R. W., Fagan, L. M., Deresinski, S. C., Shortliffe, E. H. 1992: 719-723


    There are increasing expectations that community-based physicians who care for people with HIV infection will offer their patients opportunities to enroll in clinical trials. The information-management requirements of clinical investigation, however, make it unrealistic for most providers who do not practice in academic centers to participate in clinical research. Our T-HELPER computer system offers community-based physicians the possibility of enrolling patients in clinical trials as a component of primary care. T-HELPER facilitates data management for patients with HIV disease, and can offer patient-specific and situation-specific advice concerning new protocols for which patients may be eligible and the treatment required by those protocols in which patients currently are enrolled. We are installing T-HELPER at three county-operated AIDS clinics in the San Francisco Bay Area, and plan a comprehensive evaluation of the system and its influence on clinical research.

    View details for PubMedID 1482965



    Computer programs that automatically review decisions can help physicians provide better patient care. In the Netherlands, the ELIAS computer information system has replaced paper medical records in some general practices. We have written a computer program called 'HyperCritic' that audits general practitioners' management of patients with essential hypertension by taking patient-specific data from the ELIAS system. We investigated whether the computer-based medical records contain sufficient information to generate critiques, and compared the limitations of audit by HyperCritic with those of review by a panel of eight physicians. HyperCritic and the physicians independently reviewed the medical records of 20 randomly selected patients with hypertension and commented on the decisions made at each of 243 patient visits. Of 468 comments on patient management, 260 were judged correct by six or more of the physicians; HyperCritic also made 118 of these 260 comments. The main reasons why the program did not produce the other 142 comments were: insufficient data in the computer-based medical record; absence of sufficient medical consensus; and omissions in the database of HyperCritic. Calculation of an "index of merit" ([sensitivity + specificity] - 1) for individual reviewers showed that HyperCritic performed better (index of merit 0.62) in its limited domain than did physician reviewers (0.3-0.56). At least in hypertension management, automated review of computer-based medical records compares favourably with review by physicians. Further development of computer-aided clinical audit requires the introduction of computer-based medical records that capture the reasoning of physicians, and of widely accepted practice guidelines.

    View details for Web of Science ID A1991GV07700015

    View details for PubMedID 1683929



    We describe the design of a critiquing system, HyperCritic, that relies on automated medical records for its data input. The purpose of the system is to advise general practitioners who are treating patients who have hypertension. HyperCritic has access to the data stored in a primary-care information system that supports a fully automated medical record. HyperCritic relies on data in the automated medical record to critique the management of hypertensive patients, avoiding a consultation-style interaction with the user. The first step in the critiquing process involves the interpretation of the medical record in an attempt to discover the physician's actions and decisions. After detecting the relevant events in the medical record, HyperCritic views the task of critiquing as the assignment of critiquing statements to these patient-specific events. Critiquing statements are defined as recommendations involving one or more suggestions for possible modifications in the actions of the physician. The core of the model underlying HyperCritic is that the process of generating the critiquing statements is viewed as the application of a limited set of abstract critiquing tasks. We distinguish four categories of critiquing tasks: preparation tasks, selection tasks, monitoring tasks, and responding tasks. The execution of these critiquing tasks requires specific medical factual knowledge. This factual knowledge is separated from the critiquing tasks and is stored in a medical fact base. The principal advantage demonstrated by HyperCritic is the adaption of a domain-independent critiquing structure. We show how this domain-independent critiquing structure can be used to facilitate knowledge acquisition and maintenance of the system.

    View details for Web of Science ID A1991FX17100004

    View details for PubMedID 1889202

  • Temporal-abstraction mechanisms in management of clinical protocols. Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care Shahar, Y., Tu, S. W., Musen, M. A. 1991: 629-633


    We have identified several general temporal-abstraction mechanisms needed for reasoning about time-stamped data, such as are needed in management of patients being treated on clinical protocols: simple temporal abstraction (a mechanism for abstracting several parameter values into one class), temporal inference (a mechanism for inferring sound logical conclusions over a single interval or two meeting intervals), and temporal interpolation (a mechanism for bridging non-meeting temporal intervals). Making explicit the knowledge required for temporal abstractions supports the acquisition of planning knowledge, the identification of clinical problems, and the formulation of clinical-management-plan revisions.

    View details for PubMedID 1807678

  • EPISODIC SKELETAL-PLAN REFINEMENT BASED ON TEMPORAL DATA COMMUNICATIONS OF THE ACM Tu, S. W., Kahn, M. G., Musen, M. A., Ferguson, J. C., Shortliffe, E. H., Fagan, L. M. 1989; 32 (12): 1439-1455


    Developers of computer-based decision-support tools frequently adopt either pattern recognition or artificial intelligence techniques as the basis for their programs. Because these developers often choose to accentuate the differences between these alternative approaches, the more fundamental similarities are frequently overlooked. The principal challenge in the creation of any clinical consultation program - regardless of the methodology that is used - lies in creating a computational model of the application domain. The difficulty in generating such a model manifests itself in symptoms that workers in the expert systems community have labeled "the knowledge-acquisition bottleneck" and "the problem of brittleness". This paper explores these two symptoms and shows how the development of consultation programs based on pattern-recognition techniques is subject to analogous difficulties. The expert systems and pattern recognition communities must recognize that they face similar challenges, and must unite to develop methods that assist with the process of building models of complex application tasks.

    View details for Web of Science ID A1989T523500006

    View details for PubMedID 2649771


    View details for Web of Science ID A1987J707100003

    View details for PubMedID 3670103



    ONCOCIN is an expert system that provides advice to physicians who are treating cancer patients enrolled in clinical trials. The process of encoding oncology protocol knowledge for the system has revealed serious omissions and unintentional ambiguities in the protocol documents. We have also discovered that many protocols allow for significant latitude in treating patients and that even when protocol guidelines are explicit, physicians often choose to apply their own judgment on the assumption that the specifications are incomplete. Computer-based tools offer the possibility of ensuring completeness and reproducibility in the definition of new protocols. One goal of our automated protocol authoring environment, called OPAL, is to help physicians develop protocols that are free of ambiguity and thus to assure better compliance and standardization of care.

    View details for Web of Science ID A1987J346100007

    View details for PubMedID 3620734

