Structure of String Principal Element
1. String Theory? (Lexical Records)
| String | Individual or deliberately clustered words or phrases, including numbers, letters, etc. |
<String type="textual">
<Entry class="word" language="French">
<Name>écorché</Name>
</Entry>
<Description>
<Notation type="annotation">
<Type substitute="Singular" set="Note Type">Definition</Type>
<Value>A painting or sculpture of a human or animal depicted
without skin in order to expose the muscles for anatomical study.</Value>
</Notation>
</Description>
</String>
String, despite something of a chicken-and-egg dilemma vis-à-vis Concept, offers similar benefits with an adjustable degree of control. String values are mostly individual words, instantiations of the various values of the Principal Element Language, but also of Concept, e.g. "Words" or "Palindromes". While linguistic variants may be "lumped" on a Lexical Record, the intent is to link records for words in different languages. Inter-language relationships have the potential to aid translation and to broaden searches selectively to include variants in other languages.
String has an optional 'type' attribute with the values textual, numeric, or mixed. Its Entry has four optional attributes: 'class', with the values word or phrase; 'language', with values from Language authorities, and the dependent 'transliteration'; and 'grammar', with a free-text value. These attributes also apply to Variant. The 'grammar' attribute permits designating parts of speech other than the default of noun, or an irregular plural; singular forms and regular plurals are assumed by default and are not explicitly recorded. These are working definitions. Further categorization relies on Varia and Relationships, e.g. a Variant with <Type set="Lexical Type">Misspelling</Type>. See Type under Generic Elements for details. As with most details of XOBIS, these decisions are tentative.
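As an illustration only, the following Python sketch uses the standard library's `xml.etree.ElementTree` to read the écorché record shown at the start of this section and apply the stated default for an unrecorded 'grammar' value (noun). The helper name `read_lexical_entry` and the shape of the returned dictionary are hypothetical and are not part of XOBIS.

```python
import xml.etree.ElementTree as ET

# The sample String record from the start of this section.
SAMPLE = """<String type="textual">
  <Entry class="word" language="French">
    <Name>écorché</Name>
  </Entry>
  <Description>
    <Notation type="annotation">
      <Type substitute="Singular" set="Note Type">Definition</Type>
      <Value>A painting or sculpture of a human or animal depicted
      without skin in order to expose the muscles for anatomical study.</Value>
    </Notation>
  </Description>
</String>"""

def read_lexical_entry(xml_text):
    """Hypothetical helper: pull the Entry, its attributes, and any definitions."""
    root = ET.fromstring(xml_text)
    entry = root.find("Entry")
    definitions = [
        " ".join(note.findtext("Value", "").split())  # collapse line breaks
        for note in root.iter("Notation")
        if note.findtext("Type") == "Definition"
    ]
    return {
        "name": entry.findtext("Name"),
        "type": root.get("type"),           # textual, numeric, or mixed; optional
        "class": entry.get("class"),        # word or phrase; optional
        "language": entry.get("language"),
        # An unrecorded 'grammar' value is read as the default: a noun.
        "grammar": entry.get("grammar", "noun"),
        "definitions": definitions,
    }

record = read_lexical_entry(SAMPLE)
print(record["language"], record["grammar"])  # French noun
```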
Lexical Records provide authority control for individual words and phrases, including letters, numbers, their symbols, etc., and may be thought of as terms, keywords, free text, textwords, lexical entries, etc. Clusters of closely related values result from adding equivalent Varia (cf. Generic Elements above), with thesaural support coming from recorded Relationships (cf. the Relationships section below). Two sample word strings hint at the possibilities:
|  | czar |  | czarina |
| Variant: | tsar | Variant: | tsarina |
| Variant: | tzar | Variant: | tzarina |
| Related: | czarevich | Related: | czarevna |
| Related: | czarina | Related: | czar |
| Related: | king | Related: | queen |
| Broader: | sovereign | Broader: | sovereign |
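To make the distinction between equivalence (Varia) and thesaural Relationships concrete, here is a minimal Python sketch that holds the two sample records in memory and widens a term's cluster only on request. The `LexicalRecord` class and the `cluster` function are hypothetical illustrations, not XOBIS markup.

```python
from dataclasses import dataclass, field

@dataclass
class LexicalRecord:
    """Hypothetical in-memory view of a Lexical Record."""
    entry: str
    variants: list = field(default_factory=list)  # equivalent Varia
    related: list = field(default_factory=list)   # associative Relationships
    broader: list = field(default_factory=list)   # hierarchical Relationships

# The two sample word strings from the table above.
RECORDS = {
    r.entry: r
    for r in [
        LexicalRecord("czar", ["tsar", "tzar"], ["czarevich", "czarina", "king"], ["sovereign"]),
        LexicalRecord("czarina", ["tsarina", "tzarina"], ["czarevna", "czar", "queen"], ["sovereign"]),
    ]
}

def cluster(term, include_related=False, include_broader=False):
    """Return the term's equivalence cluster, optionally widened by Relationships."""
    record = RECORDS.get(term)
    if record is None:
        return [term]
    terms = [record.entry, *record.variants]
    if include_related:
        terms += record.related
    if include_broader:
        terms += record.broader
    return terms

print(cluster("czar"))                           # ['czar', 'tsar', 'tzar']
print(cluster("czarina", include_broader=True))  # ['czarina', 'tsarina', 'tzarina', 'sovereign']
```

Keeping Varia and Relationships in separate fields lets an application treat variants as always-safe substitutes while offering related or broader terms as optional choices.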
The examples below indicate the broad range and potential of the String element. The values given for Relationship and Type are for illustration; consult the Varia, Relationships, and Type sections for how these cases would be marked up. The Language section raises the issue of languages serving in relationships. There are many pragmatic challenges in constituting a Lexical Record: choosing the Entry value, determining relationships, etc. Historical spelling reforms (e.g. Portuguese/Brazilian) present additional challenges, and consulting a dictionary reveals even more possibilities. Batch loads and shared database development would make building such a resource more feasible.
| String | Variant | Type |
| airborne | airbourne | Spelling variant |
| airborne | air-borne | Variant (hyphenation) |
| Altertum | Alterthum | Archaic/obsolete (German) |
| ancient | antient | Archaic/obsolete |
| antenna | antennae | Variant (irregular plural) |
| borborygmus | borborygmi | Variant (irregular plural) |
| deoxyribonucleic acid | DNA | Acronym/initialism |
| dog | dawg | Slang |
| fever | febrile | Variant (adjective) |
| health care | healthcare | Variant (word elision) |
| hiccup | hiccough | Spelling variant |
| misspelling | mispelling | Misspelling |
| preoperative | preop | Informal usage |
| radar | radio detection and ranging | Expansion |
| supersede | supercede | Misspelling |
| two | 2 | Symbol |
| university | univeristy | Typographical error |

| String | String | Relationship |
| apple tree | jabloň | Czech ? |
| car | automobile | Synonym |
| car | vehicle | Broader |
| mountebank | quack | Related |
| Electra complex | Oedipus complex | Related |
| tea | thé | French ? |
| truck | lorry | English (British usage) ? |
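One practical use of these rows is to map whatever form a user types back to its established entry. The Python sketch below builds a reverse index from a few of the (String, Variant, Type) rows above; the case folding is an assumption of this sketch, and the lists allow for a variant that more than one entry might claim. The names `build_variant_index` and `normalize` are hypothetical.

```python
# (entry, variant, type) rows copied from the table above.
VARIANTS = [
    ("airborne", "airbourne", "Spelling variant"),
    ("airborne", "air-borne", "Variant (hyphenation)"),
    ("antenna", "antennae", "Variant (irregular plural)"),
    ("deoxyribonucleic acid", "DNA", "Acronym/initialism"),
    ("supersede", "supercede", "Misspelling"),
    ("two", "2", "Symbol"),
    ("university", "univeristy", "Typographical error"),
]

def build_variant_index(rows):
    """Map each case-folded variant to the (entry, type) pairs that claim it."""
    index = {}
    for entry, variant, vtype in rows:
        index.setdefault(variant.casefold(), []).append((entry, vtype))
    return index

INDEX = build_variant_index(VARIANTS)

def normalize(term):
    """Return (entry, type) pairs for a known variant, else the term with no type."""
    return INDEX.get(term.casefold(), [(term, None)])

print(normalize("supercede"))  # [('supersede', 'Misspelling')]
print(normalize("dna"))        # [('deoxyribonucleic acid', 'Acronym/initialism')]
```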
String is potentially concurrent with Concept, and their relative merits suggest the need for comparative study. They may work well either reciprocally (one or the other in particular situations) or redundantly (both). Lexical Records could link a word not established as a topical subject to related concepts, and/or vice versa. Integration of String with a keyword index is an implementational issue; the Indexing section below treats combining entries from different Principal Elements in the same index. By including relationships to Conceptual Records, a keyword search could broaden or narrow access to formally controlled vocabulary terms. For example,
| String | Relationship | Concept |
| stevia | Related: | Sweetening Agents |
|  |  | Herbs |
|  |  | Plant Extracts |
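A minimal Python sketch of the idea, assuming a simple lookup from a String to the Concepts it is linked to (taken from the stevia table above). The `keyword:` and `subject:` field prefixes are a hypothetical query syntax chosen for readability, not that of any particular system.

```python
# Links from a Lexical Record to Conceptual Records, from the stevia table above.
STRING_TO_CONCEPTS = {
    "stevia": ["Sweetening Agents", "Herbs", "Plant Extracts"],
}

def keyword_to_subjects(keyword):
    """Suggest controlled-vocabulary headings for a free-text keyword."""
    return STRING_TO_CONCEPTS.get(keyword.casefold(), [])

def expanded_query(keyword):
    """OR the keyword with its linked headings (hypothetical query syntax)."""
    clauses = [f'keyword:"{keyword}"']
    clauses += [f'subject:"{s}"' for s in keyword_to_subjects(keyword)]
    return " OR ".join(clauses)

print(expanded_query("stevia"))
# keyword:"stevia" OR subject:"Sweetening Agents" OR subject:"Herbs" OR subject:"Plant Extracts"
```

A narrowing search would do the reverse, offering one heading as a replacement for the keyword instead of ORing them all.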
The String element accommodates emerging concepts well, and thus may be viewed as a transitional home or spawning ground for new concepts, perhaps based on frequency of occurrence or use in queries.
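If frequency of use in queries were the criterion, one rough way to surface candidates is sketched below in Python: count how often each query string occurs and report those that pass a threshold but are not yet linked to a Conceptual Record. The threshold, the query log, and the function name `promotion_candidates` are all hypothetical.

```python
from collections import Counter

def promotion_candidates(query_log, concept_linked, threshold):
    """Return (string, count) pairs queried at least `threshold` times that have
    no link to a Conceptual Record yet, most frequent first."""
    counts = Counter(q.strip().casefold() for q in query_log)
    hits = [(term, n) for term, n in counts.items()
            if n >= threshold and term not in concept_linked]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Made-up query log and link set, for illustration only.
log = ["stevia", "quorum sensing", "Stevia ", "quorum sensing", "Quorum Sensing", "tea"]
print(promotion_candidates(log, concept_linked={"stevia"}, threshold=2))
# [('quorum sensing', 3)]
```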
Otherwise, String is limited to entries not in the scope of other Principal Elements. Inclusion of strings with their associated Principal Element eliminates redundancy and promotes homogeneity. These often represent a Variant of a proper noun, but may just be abbreviations, codes, or variants representing concepts covered by another Principal Element, e.g.:
| Principal Element | Excluded String | Element/Variant's Type Value |
| Place | D.C. | Abbrev |
| Language | fre | Code |
| Being | Henrietta | Forename |
| Work (authority) | MeSH | Code |
| Organization | NATO | Acronym/initialism |
| Work (authority) | XML | Acronym/initialism |
| Time | Y2K | Slang ? |
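The scope rule can be expressed as a check made before a new Lexical Record is created. In this Python sketch, the lookup table mirrors the rows above (omitting the tentative Y2K row), matching is exact, and the function name `home_for` is hypothetical.

```python
# Variants already held by other Principal Elements, from the table above.
OTHER_ELEMENT_VARIANTS = {
    "D.C.": ("Place", "Abbrev"),
    "fre": ("Language", "Code"),
    "Henrietta": ("Being", "Forename"),
    "MeSH": ("Work (authority)", "Code"),
    "NATO": ("Organization", "Acronym/initialism"),
    "XML": ("Work (authority)", "Acronym/initialism"),
}

def home_for(candidate):
    """Say where a new string belongs: another Principal Element, or String."""
    if candidate in OTHER_ELEMENT_VARIANTS:
        element, vtype = OTHER_ELEMENT_VARIANTS[candidate]
        return f"{element} (Variant, Type={vtype})"
    return "String (new Lexical Record)"

print(home_for("NATO"))   # Organization (Variant, Type=Acronym/initialism)
print(home_for("mimsy"))  # String (new Lexical Record)
```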
The Qualifiers element discussed under Generic Elements is available to String for clarification or disambiguation. How such distinctions are handled in indexing would be an implementational issue. Automated or semi-automated functions of an indexer and/or editor program could simplify this process, if posting to the specific entries is undertaken. Alternatively, Lexical Records could act as a filter before searches are passed to a traditional keyword index. It might also be useful to suppress display of minor variants, to avoid index clutter, unless the variant matches the search query.
| String (Qualifier) | Qualifier is: |
| base (Chemistry) | Concept |
| base (Military Art and Science) | Concept |
| invalid (infirm person) | String |
| invalid (not valid) | String |
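Both suggestions in the preceding paragraph, offering qualified choices for an ambiguous query and hiding minor variants unless typed, can be sketched in a few lines of Python. The data mirrors the table above; the function names and the notion of which variants count as "minor" are hypothetical.

```python
# Qualified entries from the table above: (string, qualifier, qualifier kind).
QUALIFIED = [
    ("base", "Chemistry", "Concept"),
    ("base", "Military Art and Science", "Concept"),
    ("invalid", "infirm person", "String"),
    ("invalid", "not valid", "String"),
]

def disambiguation_choices(query):
    """List the qualified headings a bare query could mean, for the user to pick."""
    q = query.strip().casefold()
    return [f"{s} ({qual})" for s, qual, _kind in QUALIFIED if s == q]

def display_forms(entry, minor_variants, query):
    """Show the entry, plus a minor variant only when it matches the query."""
    q = query.casefold()
    return [entry] + [v for v in minor_variants if v.casefold() == q]

print(disambiguation_choices("Base"))
# ['base (Chemistry)', 'base (Military Art and Science)']
print(display_forms("health care", ["healthcare"], "healthcare"))
# ['health care', 'healthcare']
```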
Sometimes a typographical error in one word results in the correct spelling of another (casual/causal). The Relationships section provides more information on Relationships that are dissociative, and Varia, under Generic Elements, covers equivalence relationships.
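The casual/causal case can be caught mechanically for one common error type, the swap of two neighbouring letters. The Python sketch below, with a made-up three-word lexicon and hypothetical function names, treats a transposition typo as safe to record as an equivalence Variant only when the typo is not itself an established entry; otherwise it calls for a dissociative Relationship instead.

```python
def adjacent_transpositions(word):
    """All strings formed by swapping one pair of neighbouring, differing letters."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
        if word[i] != word[i + 1]
    }

def confusable_entries(word, lexicon):
    """Entries in the lexicon that are one transposition away from `word`."""
    return sorted(adjacent_transpositions(word) & lexicon)

def is_safe_typo_variant(entry, typo, lexicon):
    """True if `typo` is a transposition of `entry` and not a word in its own right."""
    return typo in adjacent_transpositions(entry) and typo not in lexicon

LEXICON = {"casual", "causal", "university"}  # made-up lexicon
print(confusable_entries("casual", LEXICON))                        # ['causal']
print(is_safe_typo_variant("university", "univeristy", LEXICON))  # True
print(is_safe_typo_variant("casual", "causal", LEXICON))          # False
```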
Some Lexical Records may exist only to provide definitions for selected, "uncontrolled" terminology (see Description under Generic Elements); the écorché record at the beginning of this section is an example. Adding a relationship to the String from a Work whose text does not itself include the word enriches access. Search access to the definition would help both users and catalogers identify a search term that might not be known or easily recalled. Lane Medical Library's current public authority file demonstrates this functionality (48).
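Searching definitions lets a user who can describe a thing, but not name it, arrive at the term. This Python sketch does a simple all-words match against definition text; the écorché definition comes from the sample record above, while the borborygmus definition and the function names are made up for illustration. A real index would add stemming and ranking.

```python
import re

# Definitions keyed by entry; the second is a made-up paraphrase for illustration.
DEFINITIONS = {
    "écorché": "A painting or sculpture of a human or animal depicted without skin "
               "in order to expose the muscles for anatomical study.",
    "borborygmus": "A rumbling sound made by gas moving through the intestines.",
}

def words(text):
    """Case-folded ASCII word set (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z]+", text.casefold()))

def find_by_definition(query):
    """Entries whose definitions contain every word of the query."""
    wanted = words(query)
    return [entry for entry, text in DEFINITIONS.items() if wanted <= words(text)]

matches = find_by_definition("sculpture without skin muscles")  # ['écorché']
```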
Fictional words can be thought of as instantiations of the Fictional Words collective Concept; they have a categorical relationship to that concept. While belonging to this category, they also have relationships to a Language. This distinction prevents their being confused with "real" words. Whether qualifiers for fictional, fictitious, or imaginary elements are routinely justified may depend on how adequately relationships serve in this regard. We have chosen the value Fictional to represent such relationships.
| String | Relationship | Language |
| mimsy | Fictional: | English |
| mümsige | Fictional: | German |
| enmîmés | Fictional: | French |
| xivilization | Fictional: | English |
The possibility of using languages to express lexical relationships offers another opportunity to investigate XOBIS' broad potential; cf. the discussion under Entry Substitutes in the Generic Elements section above.
| String | Relationship (Language) | String |
| sweet sorrow | Czech: | krasosmutnění |
| heart | Spanish: | corazón |
| heart | German: | Herz |
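Both kinds of language-based relationship in the two tables above fit one triple form, (string, relationship, target). In this Python sketch, which uses a subset of those rows and hypothetical function names, a language name as relationship yields a translation, while Fictional yields the language an invented word imitates; this is enough to keep fictional words out of a list of "real" ones.

```python
# (string, relationship, target) triples, a subset of the two tables above.
TRIPLES = [
    ("mimsy", "Fictional", "English"),
    ("mümsige", "Fictional", "German"),
    ("xivilization", "Fictional", "English"),
    ("heart", "German", "Herz"),
]

def translations(term, language):
    """Equivalent strings for `term` recorded under a language relationship."""
    return [t for s, rel, t in TRIPLES if s == term and rel == language]

def fictional_words(language):
    """Strings carrying a Fictional relationship to `language`."""
    return [s for s, rel, t in TRIPLES if rel == "Fictional" and t == language]

def is_real_word(term):
    """False for any string with a Fictional relationship to some language."""
    return not any(s == term and rel == "Fictional" for s, rel, _t in TRIPLES)

print(translations("heart", "German"))  # ['Herz']
print(fictional_words("English"))       # ['mimsy', 'xivilization']
print(is_real_word("mimsy"))            # False
```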
String was envisioned to support enhanced keyword retrieval, given the prevalence and popularity of this type of searching on the Web. This formalized infrastructure could support automatic and/or interactive inclusion of synonyms and variants to expand or hone keyword searches, helping to prevent errors of omission. The resulting improvement in recall complements XML's inherent support for limiting searches to prescribed elements, which improves precision. Lexical Records would underpin automatic bursting of designated search strings, offering users choices of known variants and providing cross-references in a browsable keyword index. Including definitions, etc. could incorporate dictionary features into indexes, parallel to including scope notes for Conceptual Records. The structure could also support search formulation and enhancement before a request is transmitted to other systems, including non-XOBIS databases. As part of an integrated interface, this could help address the problems of searching across heterogeneous information environments.
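"Bursting" a search string can be sketched as rewriting each known term into a quoted OR group of its equivalents and ANDing the groups, which is roughly the form a request might take before being passed to another system. In this Python sketch the clusters are drawn from the variant table above; the query syntax is a generic Boolean form assumed for illustration, and splitting free text into terms is left out of scope.

```python
# Equivalence clusters, drawn from the variant table above.
CLUSTERS = {
    "hiccup": ["hiccup", "hiccough"],
    "health care": ["health care", "healthcare"],
}
# Reverse map so that any member of a cluster bursts to the whole cluster.
MEMBER_OF = {form: key for key, forms in CLUSTERS.items() for form in forms}

def burst(terms):
    """Rewrite each pre-split term as a quoted OR group; AND the groups together."""
    groups = []
    for term in terms:
        forms = CLUSTERS.get(MEMBER_OF.get(term.casefold(), ""), [term])
        groups.append("(" + " OR ".join(f'"{f}"' for f in forms) + ")")
    return " AND ".join(groups)

print(burst(["hiccough", "infants"]))
# ("hiccup" OR "hiccough") AND ("infants")
```

An interactive interface could show each group to the user first, letting them drop variants before the query is sent.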
String provides a mechanism for integrating important lexical aspects of searching with mainstream bibliographic control. The specificity and relationships are tantalizing. However, the concept of lexical authority records needs more investigation, and we have not had time to fully explore the many possibilities. How issues such as directionality would be resolved, and how this approach might work in conjunction with algorithmic transformations or other techniques, is not yet known. Standalone products, such as the influential WordNet, HyperDic, LexicalFreenet, Wordsmyth, and the NLM's pioneering Specialist Lexicon, illustrate the real potential (49-53). The idea is also manifest in Lane's textword authorities, which have accumulated almost 1,000 records, mostly in the health sciences, over several years with this purpose in mind (48). The next step is to integrate these with keyword indexing.