Professor Emeritus, Biology
In 1962, I proposed a model for integration of lambda prophage into the bacterial chromosome. The model postulated two steps: (i) circularization of the linear DNA molecule that had been injected into the cell from the phage particle; (ii) reciprocal recombination between phage and bacterial DNA at specific sites on both partners. This resulted in a cyclic permutation of gene order going from phage to prophage. This contrasted with integration models current at the time, which postulated that the prophage was not inserted into the continuity of the chromosome but rather laterally attached or synapsed with it. This chapter summarizes some of the steps leading up to the model, including especially the genetic characterization of specialized transducing phages (lambdagal) by recombinational rescue of conditionally lethal mutations. The serendipitous discovery of the conditional lethals is also described.
View details for DOI 10.1146/annurev.genet.41.110306.130240
View details for Web of Science ID 000252359500001
View details for PubMedID 17474874
Bacterial, archaeal, yeast, and fly genomes are compared with respect to predicted highly expressed (PHX) genes and several genomic properties. There is a striking difference in the status of PHX ribosomal protein (RP) genes where the archaeal genome generally encodes more RP genes and fewer PHX RPs compared with bacterial genomes. The increase in RPs in archaea and eukaryotes compared with that in bacteria may reflect a more complex set of interactions in archaea and eukaryotes in regulating translation, e.g., differences in structure requiring scaffolding of longer rRNA molecules, expanded interactions with the chaperone machinery, and, in eukaryotes, interactions with endoplasmic reticulum components. The yeast genome is similar to fast-growing bacteria in PHX genes but also features several cytoskeletal genes, including actin and tropomyosin, and several signal transduction regulatory proteins from the 14-3-3 family. The most PHX genes of Drosophila encode cytoskeletal and exoskeletal proteins. We found that the preference of a microorganism for an anaerobic metabolism correlates with the number of PHX enzymes of the glycolysis pathway that well exceeds the number of PHX enzymes acting in the tricarboxylic acid cycle. Conversely, if the number of PHX enzymes of the tricarboxylic acid cycle well exceeds the PHX enzymes of glycolysis, an aerobic metabolism is preferred. Where the numbers are approximately commensurate, a facultative growth behavior prevails.
View details for DOI 10.1073/pnas.0502314102
View details for Web of Science ID 000229292200046
View details for PubMedID 15883367
Predicted highly expressed (PHX) genes in five currently available high G+C complete alpha-proteobacterial genomes are analyzed. These include: the nitrogen-fixing plant symbionts Sinorhizobium meliloti (SINME) and Mesorhizobium loti (MESLO), the nonpathogenic aquatic bacterium Caulobacter crescentus (CAUCR), the plant pathogen Agrobacterium tumefaciens (AGRTU), and the mammalian pathogen Brucella melitensis (BRUME). Three of these genomes, SINME, AGRTU, and BRUME, contain multiple chromosomes or megaplasmids (>1 Mb length). PHX genes in these genomes are concentrated mainly in the major (largest) chromosome with few PHX genes found in the secondary chromosomes and megaplasmids. Tricarboxylic acid cycle and aerobic respiration genes are strongly PHX in all five genomes, whereas anaerobic pathways of glycolysis and fermentation are mostly not PHX. Only in MESLO (but not SINME) and BRUME are most glycolysis genes PHX. Many flagellar genes are PHX in MESLO and CAUCR, but mostly are not PHX in SINME and AGRTU. The nonmotile BRUME also carries many flagellar genes but these are generally not PHX and all but one are located in the second chromosome. CAUCR stands out among available prokaryotic genomes with 25 PHX TonB-dependent receptors. These are putatively involved in uptake of iron ions and other nonsoluble compounds.
View details for DOI 10.1073/pnas.1232298100
View details for Web of Science ID 000183493500075
View details for PubMedID 12775761
After an illustrious history as one of the primary tools that established the foundations of molecular biology, bacteriophage research is now undergoing a renaissance in which the primary focus is on the phages themselves rather than the molecular mechanisms that they explain. Studies of the evolution of phages and their role in natural ecosystems are flourishing. Practical questions, such as how to use phages to combat human diseases that are caused by bacteria, how to eradicate phage pests in the food industry and what role they have in the causation of human diseases, are receiving increased attention. Phages are also useful in the deeper exploration of basic molecular and biophysical questions.
View details for Web of Science ID 000183202600016
View details for PubMedID 12776216
Insertion of viral DNA into host chromosomes is an ancient process essential for propagation in the proviral form. Many present-day bacteriophages insert at specific sites on the host chromosome. Insertion by two coliphage families (lambdoid and P4-like) is compared. For both families, insertion sites frequently lie within tRNA genes. The lambdoid phages insert at anticodon loops, whereas the P4-like phages insert in the TψC loops downstream from them. The association of both groups with tRNA genes suggests that the primordial insertion site of both groups may have been within a tRNA gene. The integrase proteins used in phage insertion may have originated at that stage, with subsequent diversification of specificity.
View details for DOI 10.1016/S0923-2508(03)00071-8
View details for Web of Science ID 000184207600009
View details for PubMedID 12798232
The lambda-related (lambdoid) coliphages are related to one another by frequent natural recombination and maintain a high level of functional polymorphism for several activities of the phages. Arguments are presented that the polymorphism of the integration module results from selection (presumably frequency-dependent) for new (not improved) specificities of site recognition. Analysis of phages lambda and HK022 by Weisberg and collaborators previously showed that changes in five noncontiguous amino acids could switch site recognition specificity. Phage 21 and defective element e14, which integrate at the same site, differ in recognition specificity for both core and arm sites. In vitro assays of e14 and 21 insertion and excision confirm this conclusion. Inhibition by ds arm site oligonucleotides defines the sequence specificity more precisely.
View details for Web of Science ID 000179842200003
View details for PubMedID 12468081
This work assesses relationships for 30 complete prokaryotic genomes between the presence of the Shine-Dalgarno (SD) sequence and other gene features, including expression levels, type of start codon, and distance between successive genes. A significant positive correlation of the presence of an SD sequence and the predicted expression level of a gene based on codon usage biases was ascertained, such that predicted highly expressed genes are more likely to possess a strong SD sequence than average genes. Genes with AUG start codons are more likely than genes with other start codons, GUG or UUG, to possess an SD sequence. Genes in close proximity to upstream genes on the same coding strand in most genomes are significantly higher in SD presence. In light of these results, we discuss the role of the SD sequence in translation initiation and its relationship with predicted gene expression levels and with operon structure in both bacterial and archaeal genomes.
View details for DOI 10.1128/JB.184.20.5733-5745.2002
View details for Web of Science ID 000178279900024
View details for PubMedID 12270832
All known lambdoid prophages of Escherichia coli have the same orientation with respect to direction of chromosomal replication. This includes 12 prophages that are replicated in one direction and five in the other. Among candidate explanations, the most amenable to experimental study is an effect on dif site function in assuring chromosomal segregation. This is but one of numerous examples of strand bias in the E. coli genome, all of which may interact with one another.
View details for DOI 10.1006/tpbi.2002.1604
View details for Web of Science ID 000177739500013
View details for PubMedID 12167370
Physical and genetic studies verify that the DNA binding domain of protein gpNu1 (which initiates packaging of phage lambda DNA) is a winged helix-turn-helix (wHTH) and that gpNu1 dimers bind sites that are brought close through DNA bending.
View details for Web of Science ID 000175967100005
View details for PubMedID 12049730
The attachment site (attlambda) of bacteriophage lambda was examined in wild strains of Escherichia coli. Although the att region is non-coding, the DNA sequence was invariant in the 13 strains examined. Two other non-coding regions showed nine changes, all associated with a single strain. In four of 33 strains, sequences were inserted in or near the attlambda site and in two of these the insert was related to lambda. Among strains that can be lysogenized by lambda, integration was via the attlambda site in all cases. Some resistant strains can be lysogenized, and these have been termed "lenient." Most of these fail to give normal phage yield after induction. In some cases rare lysogens have been formed in cells that belong to a mutant subpopulation.
View details for Web of Science ID 000171788700002
View details for PubMedID 11677620
Predicted highly expressed (PHX) genes are characterized for the completely sequenced genomes of the four fast-growing bacteria Escherichia coli, Haemophilus influenzae, Vibrio cholerae, and Bacillus subtilis. Our approach to ascertaining gene expression levels relates to codon usage differences among certain gene classes: the collection of all genes (average gene), the ensemble of ribosomal protein genes, major translation/transcription processing factors, and genes for polypeptides of chaperone/degradation complexes. A gene is predicted highly expressed (PHX) if its codon frequencies are close to those of the ribosomal proteins, major translation/transcription processing factor, and chaperone/degradation standards but strongly deviant from the average gene codon frequencies. PHX genes identified by their codon usage frequencies among prokaryotic genomes commonly include those for ribosomal proteins, major transcription/translation processing factors (several occurring in multiple copies), and major chaperone/degradation proteins. Also PHX genes generally include those encoding enzymes of essential energy metabolism pathways of glycolysis, pyruvate oxidation, and respiration (aerobic and anaerobic), genes of fatty acid biosynthesis, and the principal genes of amino acid and nucleotide biosyntheses. Gene classes generally not PHX include most repair protein genes, virtually all vitamin biosynthesis genes, genes of two-component sensor systems, most regulatory genes, and most genes expressed in stationary phase or during starvation. Members of the set of PHX aminoacyl-tRNA synthetase genes contrast sharply between genomes. There are also subtle differences among the PHX energy metabolism genes between E. coli and B. subtilis, particularly with respect to genes of the tricarboxylic acid cycle. The good agreement of PHX genes of E. coli and B. subtilis with high protein abundances, as assessed by two-dimensional gel determination, is verified. 
Relationships of PHX genes with stoichiometry, multifunctionality, and operon structures are also examined. The spatial distribution of PHX genes within each genome reveals clusters and significantly long regions without PHX genes.
View details for Web of Science ID 000170349200012
View details for PubMedID 11489855
Our basic observation is that each genome has a characteristic "signature" defined as the ratios between the observed dinucleotide frequencies and the frequencies expected if neighbors were chosen at random (dinucleotide relative abundances). The remarkable fact is that the signature is relatively constant throughout the genome; i.e., the patterns and levels of dinucleotide relative abundances of every 50-kb segment of the genome are about the same. Comparison of the signatures of different genomes provides a measure of similarity which has the advantage that it looks at all the DNA of an organism and does not depend on the ability to align homologous sequences of specific genes. Genome signature comparisons show that plasmids, both specialized and broad-range, and their hosts have substantially compatible (similar) genome signatures. Mammalian mitochondrial (Mt) genomes are very similar, and animal and fungal Mt are generally moderately similar, but they diverge significantly from plant and protist Mt sets. Moreover, Mt genome signature differences between species parallel the corresponding nuclear genome signature differences, despite large differences between Mt and host nuclear signatures. In signature terms, we find that the archaea are not a coherent clade. For example, Sulfolobus and Halobacterium are extremely divergent. There is no consistent pattern of signature differences among thermophiles. More generally, grouping prokaryotes by environmental criteria (e.g., habitat propensities, osmolarity tolerance, chemical conditions) reveals no correlations in genome signature.
View details for Web of Science ID 000081835500075
View details for PubMedID 10430917
We provide data and analysis to support the hypothesis that the ancestor of animal mitochondria (Mt) and many primitive amitochondrial (a-Mt) eukaryotes was a fusion microbe composed of a Clostridium-like eubacterium and a Sulfolobus-like archaebacterium. The analysis is based on several observations: (i) The genome signatures (dinucleotide relative abundance values) of Clostridium and Sulfolobus are compatible (sufficiently similar) and each has significantly more similarity in genome signatures with animal Mt sequences than do all other available prokaryotes. That stable fusions may require compatibility in genome signatures is suggested by the compatibility of plasmids and hosts. (ii) The expanded energy metabolism of the fusion organism was strongly selective for cementing such a fusion. (iii) The molecular apparatus of endospore formation in Clostridium serves as raw material for the development of the nucleus and cytoplasm of the eukaryotic cell.
View details for Web of Science ID 000081835500076
View details for PubMedID 10430918
A new measure for assessing codon bias of one group of genes with respect to a second group of genes is introduced. In this formulation, codon bias correlations for Escherichia coli genes are evaluated for level of expression, for contrasts along genes, for genes in different 200 kb (or longer) contigs around the genome, for effects of gene size, for variation over different function classes, for codon bias in relation to possible lateral transfer and for dicodon bias for some gene classes. Among the function classes, codon biases of ribosomal proteins are the most deviant from the codon frequencies of the average E. coli gene. Other classes of 'highly expressed genes' (e.g. amino acyl tRNA synthetases, chaperonins, modification genes essential to translation activities) show less extreme codon biases. Consistently for genes with experimentally determined expression rates in the exponential growth phase, those of highest molar abundances are more deviant from the average gene codon frequencies and are more similar in codon frequencies to the average ribosomal protein gene. Independent of gene size, the codon biases in the 5' third of genes deviate by more than a factor of two from those in the middle and 3' thirds. In this context, there appear to be conflicting selection pressures imposed by the constraints of ribosomal binding, or more generally the early phase of protein synthesis (about the first 50 codons) may be more biased than the complete nascent polypeptide. In partitioning the E. coli genome into 10 equal lengths, pronounced differences in codon site 3 G+C frequencies accumulate. Genes near to oriC have 5% greater codon site 3 G+C frequencies than do genes from the ter region. This difference also is observed between small (100-300 codons) and large (>800 codons) genes. This result contrasts with that for eukaryotic genomes (including human, Caenorhabditis elegans and yeast) where long genes tend to have site 3 more AT rich than short genes. 
Many of the above results are special for E. coli genes and do not apply to genes of most bacterial genomes. A gene is defined as alien (possibly horizontally transferred) if its codon bias relative to the average gene exceeds a high threshold and the codon bias relative to ribosomal proteins is also appropriately high. These are identified, including four clusters (operons). The bulk of these genes have no known function.
View details for Web of Science ID 000076116100003
View details for PubMedID 9781873
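The abstract above introduces a measure of the codon bias of one gene group relative to another but does not state its formula. The following is a minimal sketch under an assumed form: codon frequencies are normalized within each amino acid, and the absolute differences between the two groups are weighted by the amino acid frequencies of the first group. The function names and this exact weighting are illustrative assumptions, not taken from the abstract.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(genes):
    """Count sense codons over a list of in-frame coding sequences."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            codon = gene[i:i + 3]
            # Skip stop codons and codons with ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def normalized_frequencies(counts):
    """Return (codon frequencies summing to 1 per amino acid, amino acid frequencies)."""
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TABLE[codon]] += n
    total = sum(aa_totals.values())
    codon_freq = {c: n / aa_totals[CODON_TABLE[c]] for c, n in counts.items()}
    aa_freq = {aa: n / total for aa, n in aa_totals.items()}
    return codon_freq, aa_freq


def codon_bias(group_f, group_c):
    """Assumed form: sum over amino acids a of p_a(F) * sum over codons of a of |f - c|."""
    f, p = normalized_frequencies(codon_counts(group_f))
    c, _ = normalized_frequencies(codon_counts(group_c))
    bias = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*" or aa not in p:
            continue
        bias += p[aa] * abs(f.get(codon, 0.0) - c.get(codon, 0.0))
    return bias
```

Under this assumed form, a group compared with itself scores 0, and larger values indicate codon usage that departs further from the reference group.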
We review concepts and methods for comparative analysis of complete genomes including assessments of genomic compositional contrasts based on dinucleotide and tetranucleotide relative abundance values, identifications of rare and frequent oligonucleotides, evaluations and interpretations of codon biases in several large prokaryotic genomes, and characterizations of compositional asymmetry between the two DNA strands in certain bacterial genomes. The discussion also covers means for identifying alien (e.g. laterally transferred) genes and detecting potential specialization islands in bacterial genomes.
View details for Web of Science ID 000078222700008
View details for PubMedID 9928479
It was shown previously that phage 21 and the defective element e14 integrate at the same site within the icd gene of Escherichia coli K-12 but that 21 integrase and excisionase excise e14 in vivo very infrequently compared to excision of 21. We show here that the reverse is also true: e14 excises itself much better than it excises an adjacent 21 prophage. In vitro integrase assays with various attP substrates delimit the minimal attP site as somewhere between 366 and 418 bp, where the outer limits would include the outermost repeated dodecamers suggested as arm recognition sites by S. J. Schneider (Ph.D. dissertation, Stanford University, Stanford, Calif., 1992). We speculate that the reason 21 attP is larger than lambda attP (240 bp) is because it must include a 209-bp sequence homologous to the 3' end of the icd transcript in order to allow icd expression in lysogens. Alteration of portions of 21 attP to their e14 counterparts shows that 21 requires both the arm site and core site sequences of 21 but that replacements by e14 sequences function in some positions. Consistent with Schneider's in vivo results, and like all other known integrases from lambdoid phages, 21 requires integration host factor for activity.
View details for Web of Science ID A1997XV69900008
View details for PubMedID 9294425
We compare and contrast genome-wide compositional biases and distributions of short oligonucleotides across 15 diverse prokaryotes that have substantial genomic sequence collections. These include seven complete genomes (Escherichia coli, Haemophilus influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Synechocystis sp. strain PCC6803, Methanococcus jannaschii, and Pyrobaculum aerophilum). A key observation concerns the constancy of the dinucleotide relative abundance profiles over multiple 50-kb disjoint contigs within the same genome. (The profile is rhoXY* = fXY*/(fX* fY*) for all XY, where fX* denotes the frequency of the nucleotide X and fXY* denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complementary sequence.) On the basis of this constancy, we refer to the collection [rhoXY*] as the genome signature. We establish that the differences between [rhoXY*] vectors of 50-kb sample contigs of different genomes virtually always exceed the differences between those of the same genomes. Various di- and tetranucleotide biases are identified. In particular, we find that the dinucleotide CpG=CG is underrepresented in many thermophiles (e.g., M. jannaschii, Sulfolobus sp., and M. thermoautotrophicum) but overrepresented in halobacteria. TA is broadly underrepresented in prokaryotes and eukaryotes, but normal counts appear in Sulfolobus and P. aerophilum sequences. More than for any other bacterial genome, palindromic tetranucleotides are underrepresented in H. influenzae. The M. jannaschii sequence is unprecedented in its extreme underrepresentation of CTAG tetranucleotides and in the anomalous distribution of CTAG sites around the genome. Comparative analysis of numbers of long tetranucleotide microsatellites distinguishes H. influenzae. Dinucleotide relative abundance differences between bacterial sequences are compared.
For example, in these assessments of differences, the cyanobacteria Synechocystis, Synechococcus, and Anabaena do not form a coherent group and are as far from each other as general gram-negative sequences are from general gram-positive sequences. The difference of M. jannaschii from low-G+C gram-positive bacteria is one-half of the difference from gram-negative proteobacteria. Interpretations and hypotheses center on the role of the genome signature in highlighting similarities and dissimilarities across different classes of prokaryotic species, possible mechanisms underlying the genome signature, the form and level of genome compositional flux, the use of the genome signature as a chronometer of molecular phylogeny, and implications with respect to the three putative eubacterial, archaeal, and eukaryote domains of life and to the origin and early evolution of eukaryotes.
View details for Web of Science ID A1997XE30000011
View details for PubMedID 9190805
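The genome signature defined in the abstract above (the ratio of each dinucleotide frequency to the product of its component nucleotide frequencies, computed strand-symmetrically) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it counts both strands separately rather than literally concatenating them, which avoids an artificial dinucleotide at the junction, and the distance function is assumed to be the mean absolute difference over the 16 dinucleotides. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def reverse_complement(seq):
    """Inverted complementary sequence of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def genome_signature(seq):
    """Return {XY: rho*_XY} with counts pooled over both strands."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(strand)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))
    # Only unambiguous bases and dinucleotides enter the totals.
    n_mono = sum(mono[b] for b in "ACGT")
    n_di = sum(di[xy] for xy in DINUCLEOTIDES)
    signature = {}
    for xy in DINUCLEOTIDES:
        fx = mono[xy[0]] / n_mono
        fy = mono[xy[1]] / n_mono
        fxy = di[xy] / n_di
        # Undefined when a component nucleotide never occurs.
        signature[xy] = fxy / (fx * fy) if fx and fy else float("nan")
    return signature


def signature_distance(sig_a, sig_b):
    """Assumed distance: mean absolute difference over the 16 dinucleotides."""
    return sum(abs(sig_a[xy] - sig_b[xy]) for xy in DINUCLEOTIDES) / len(DINUCLEOTIDES)
```

To compare segments in the way the abstract describes, one would call `genome_signature` on each 50-kb window and compare the within-genome and between-genome values of `signature_distance`.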
The complete Haemophilus influenzae genome (1.83 Mb, Rd strain) provides opportunities for characterizing global genomic inhomogeneities and for detecting important sequence signals. Along these lines, new methods for identifying frequent words (oligonucleotides and/or peptides) and their distributions are applied to the H. influenzae genome with some comparisons and contrasts made with frequent words of other bacterial genomes. Three major classes of frequent oligonucleotides stand out: (i) oligos related to the familiar uptake signal sequences (USSs), AAGTGCGGT (USS+) and its inverted complement (USS-), (ii) multiple tetranucleotide iterations and (iii) intergenic dyad sequences (IDSs) found as AAGCCCACCCTAC and its dyad form. The USS+ and USS- occur in almost equal counts, are remarkably evenly spaced around the genome, and appear predominantly in the same reading frame of protein coding domains (USS+ translated to Ser-Ala-Val, USS- translated to Thr-Ala-Leu). These observations suggest that USSs contribute to global genomic functions, for example, in replication and/or repair processes, or as membrane attachment sites, or as sequences helping to pack DNA. The long tetranucleotide iterations, virtually unique to H. influenzae (i.e., unknown in other prokaryotes), through polymerase slippage during replication and/or homologous recombination may produce subpopulations expressing alternative proteins. The 13 bp frequent IDS words, invariably intergenic, occur mostly in clusters and provide potential for complex secondary structures suggesting that these sequences may be important signals for regulating the activity of their flanking genes.
The frequent oligopeptides of H. influenzae are principally of two kinds: those induced by oligonucleotide frequent words (USSs, tetranucleotide iterations), and those associated with ATP or GTP binding sites that are generally composed of three motifs: the A-box which contributes to delineating the binding pocket; the B-box which functions in hydrolysis; and the C-box whose function is unknown. The A-box occurs fairly universally in prokaryotes and eukaryotes. The B- and C-motifs appear to be specialized to various functional groups (e.g., transport, recombination, chaperone activity). Other putative motifs correspond to homologs of Escherichia coli motifs that are associated, for example, with proteins of transcriptional processing, aminoacyl-tRNA synthetases, and proteins functioning in electron transfer.
View details for Web of Science ID A1996VT24900025
View details for PubMedID 8932382
Genomic similarities and contrasts are investigated in a collection of 23 bacteriophages, including phages with temperate, lytic, and parasitic life histories, with varied sequence organizations and with different hosts and with different morphologies. Comparisons use relative abundances of di-, tri-, and tetranucleotides from entire genomes. We highlight several specific findings. (i) As previously shown for cellular genomes, each viral genome has a distinctive signature of short oligonucleotide abundances that pervade the entire genome and distinguish it from other genomes. (ii) The enteric temperate double-stranded (ds) phages, like enterobacteria, exhibit significantly high relative abundances of GpC = GC and significantly low values of TA, but no such extremes exist in ds lytic phages. (iii) The tetranucleotide CTAG is of statistically low relative abundance in most phages. (iv) The DAM methylase site GATC is of statistically low relative abundance in most phages, but not in P1. This difference may relate to controls on replication (e.g., actions of the host SeqA gene product) and to MutH cleavage potential of the Escherichia coli DAM mismatch repair system. (v) The enteric temperate dsDNA phages form a coherent group: they are relatively close to each other and to their bacterial hosts in average differences of dinucleotide relative abundance values. By contrast, the lytic dsDNA phages do not form a coherent group. This difference may come about because the temperate phages acquire more sequence characteristics of the host because they use the host replication and repair machinery, whereas the analyzed lytic phages are replicated by their own machinery. (vi) The nonenteric temperate phages with mycoplasmal and mycobacterial hosts are relatively close to their respective hosts and relatively distant from any of the enteric hosts and from the other phages.
(vii) The single-stranded RNA phages have dinucleotide relative abundance values closest to those for random sequences, presumably attributable to the mutation rates of RNA phages being much greater than those of DNA phages.
View details for Web of Science ID A1996UQ45500034
View details for PubMedID 8650182
A gene at 42 min on the Escherichia coli chromosome, identified as the locus of pseudoreversion of knockout mutations in the biotin sulfoxide reductase gene, bisC, has 64% base sequence identity with bisC. This makes it a member of a multigene family of molybdopterin enzymes that includes genes for anaerobic reduction of trimethylamine oxide (torA) and dimethylsulfoxide (dmsA). Disruption of this gene eliminates the background activity of biotin sulfoxide reduction observed in bisC mutants. Sequence comparison of the new gene (bisZ) with bisC indicates that certain ts mutants of bisC arise by gene conversion between the two loci.
View details for Web of Science ID A1996UC69800003
View details for PubMedID 8919859
We present considerable data supporting the hypothesis that a Sulfolobus- or Mycoplasma-like endosymbiont, rather than an alpha-proteobacterium, is the ancestor of animal mitochondrial genomes. This hypothesis is based on pronounced similarities in oligonucleotide relative abundance extremes common to animal mtDNA, Sulfolobus, and Mycoplasma capricolum and pronounced discrepancies of these relative abundance values with respect to alpha-proteobacteria. In addition, genomic dinucleotide relative abundance measures place Sulfolobus and M. capricolum among the closest to animal mitochondrial genomes, whereas the classical eubacteria, especially the alpha-proteobacteria, are at excessive distances. There are also considerable molecular and cellular phenotypic analogies among mtDNA, Sulfolobus, and M. capricolum.
View details for Web of Science ID A1994PY29400101
View details for PubMedID 7809132
Lambdoid phages are natural relatives of phage lambda. As a group, they are highly polymorphic in DNA sequence and biological specificity. Specificity differences have played a key role in identifying the specific sequences recognized by the N and Q antitermination proteins, the initiator O for DNA synthesis, the terminase system Nu1-A for cutting DNA during packaging, and the cI repressor protein. Variations that go beyond specificity differences are seen in packaging mechanism (headful in P22, specific cutting in lambdoid coliphages), in early control (terminator protein and phage-independent antitermination in HK022, phage-specific antitermination in lambda), in repression control (antirepressor operon in P22, absent in other lambdoid phages) and murein-degrading enzymes (transglycosylase in lambda, lysozyme in other lambdoid phages). Sequence comparisons indicate that recombination among lambdoid phages is frequent in nature.
View details for Web of Science ID A1994PM44600008
View details for PubMedID 7826005
The present status of some general questions about DNA recombination is assessed. Topics include the mechanisms of synapsis and strand exchange, and the functions of recombination in nature.
View details for Web of Science ID A1993MQ34800021
Most of the well-characterized prokaryotic genomes consist of double-stranded DNA organized as a single circular chromosome 0.6-10 Mb in length and one or more circular plasmid species of 2 kb-1.7 Mb. The past few years, however, have revealed some major variations in genome organization. In addition, a recent accumulation of data has shown that the location and orientation of the genes and repeated sequences (including prophages and transposons) on and among these elements is not always random. Some of the non-randomness is probably the result of unique historical events; in other cases it reflects selection for the optimization of function.
View details for PubMedID 8118207
View details for PubMedID 8122898
Counts and spacings of all 4- and 6-bp palindromes in DNA sequences from a broad range of organisms were investigated. Both 4- and 6-bp average palindrome counts were significantly low in all bacteriophages except one, probably as a means of avoiding restriction enzyme cleavage. The exception, T4, which has normal 4- and 6-bp palindrome counts, putatively derives protection from modification of cytosine to hydroxymethylcytosine plus glycosylation. The counts and distributions of 4-bp and of 6-bp restriction sites in bacterial species are variable. Bacterial cells with multiple restriction systems for 4-bp or 6-bp target specificities are low in aggregate 4- or 6-bp palindrome counts/kb, respectively, but bacterial cells lacking exact 4-cutter enzymes generally show normal or high counts of 4-bp palindromes when compared with random control sequences of comparable nucleotide frequencies. For example, E. coli, apparently without an exact 4-bp target restriction endonuclease (see text), contains normal aggregate 4-palindrome counts/kb, while B. subtilis, which abounds with 4-bp restriction systems, shows a significant under-representation of 4-palindrome counts. Both E. coli and B. subtilis have many 6-bp restriction enzymes and concomitantly diminished aggregate 6-palindrome counts/kb. Eukaryote, viral, and organelle sequences generally have aggregate 4- and 6-palindromic counts/kb in the normal range. Interpretations of these results are given in terms of restriction/methylation regimes, recombination and transcription processes, and possible structural and regulatory roles of 4- and 6-bp palindromes.
View details for Web of Science ID A1992HM42000028
View details for PubMedID 1313968
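The palindrome counts per kb discussed in the abstract above can be illustrated with a minimal Python sketch. It treats a palindrome as a word equal to its own reverse complement. The function names are hypothetical, and the baseline is an independent-base model built from the sequence's own nucleotide frequencies; this is a simplifying assumption and not necessarily the random control used in the paper.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_palindromes(seq: str, k: int) -> int:
    """Count positions where the k-mer equals its own reverse complement.

    Overlapping occurrences are counted; words containing symbols
    other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    count = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT") and word == reverse_complement(word):
            count += 1
    return count


def palindromes_per_kb(seq: str, k: int) -> float:
    """Palindrome count normalized to 1000 bases of sequence."""
    return 1000.0 * count_palindromes(seq, k) / len(seq)


def expected_palindrome_fraction(seq: str, k: int) -> float:
    """Chance that a random k-mer is a palindrome (even k only).

    Assumes independent bases drawn with the sequence's own
    frequencies: each of the k/2 complementary position pairs
    matches with probability 2*(pA*pT + pC*pG).
    """
    seq = seq.upper()
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    p = {b: counts[b] / total for b in "ACGT"}
    pair = 2 * (p["A"] * p["T"] + p["C"] * p["G"])
    return pair ** (k // 2)
```

Dividing the observed fraction of palindromic positions by `expected_palindrome_fraction` gives a rough ratio; values well below 1 would suggest under-representation under this simplified model.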
Strand-symmetric relative abundance functionals for di-, tri-, and tetranucleotides are introduced and applied to sequences encompassing a broad phylogenetic range to discern tendencies and anomalies in the occurrences of these short oligonucleotides within and between genomic sequences. For dinucleotides, TA is almost universally under-represented, with the exception of vertebrate mitochondrial genomes, and CG is strongly under-represented in vertebrates and in mitochondrial genomes. The traditional methylation/deamination/mutation hypothesis for the rarity of CG does not adequately account for the observed deficiencies in certain sequences, notably the mitochondrial genomes, yeast, and Neurospora crassa, which lack the standard CpG methylase. Homodinucleotides (AA.TT, CC.GG) and larger homooligonucleotides are over-represented in many organisms, perhaps due to polymerase slippage events. For trinucleotides, GCA.TGC tends to be under-represented in phage, human viral, and eukaryotic sequences, and CTA.TAG is strongly under-represented in many prokaryotic, eukaryotic, and viral sequences. The CCA.TGG triplet is ubiquitously over-represented in human viral and eukaryotic sequences. Among the tetranucleotides, several four-base-pair palindromes tend to be under-represented in phage sequences, probably as a means of restriction avoidance. The tetranucleotide CTAG is observed to be rare in virtually all bacterial genomes and some phage genomes. Explanations for these over- and under-representations in terms of DNA/RNA structures and regulatory mechanisms are considered.
View details for Web of Science ID A1992HE60600044
View details for PubMedID 1741388
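As an illustration of the dinucleotide measure in the abstract above, the sketch below computes one common strand-symmetric form: the odds ratio of a dinucleotide frequency to the product of its component base frequencies, with all counts pooled over the sequence and its reverse complement. The function names are hypothetical, the input is assumed to contain only A, C, G, and T, and the paper's exact functionals (especially for tri- and tetranucleotides) may differ.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dinucleotide_odds_ratios(seq: str) -> dict:
    """Strand-symmetric odds ratio f(XY) / (f(X) * f(Y)) per dinucleotide.

    Counts are taken on each strand separately and then pooled, so no
    artificial dinucleotide is created at a junction between strands.
    """
    forward = seq.upper()
    strands = [forward, reverse_complement(forward)]
    mono = Counter()
    di = Counter()
    for s in strands:
        mono.update(s)
        di.update(s[i:i + 2] for i in range(len(s) - 1))
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx = mono[x] / n_mono
            fy = mono[y] / n_mono
            if fx > 0 and fy > 0:
                ratios[x + y] = (di[x + y] / n_di) / (fx * fy)
    return ratios
```

Under this definition a ratio near 1 means the dinucleotide occurs about as often as its base composition predicts, and a dinucleotide always receives the same value as its reverse complement (for example, AA and TT).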
The lambdoid phages are a group of related temperate bacteriophages that lysogenize by site-specific recombination with the bacterial chromosome. Various members of the group have different specific chromosomal insertion sites, despite the fact that the enzymes catalyzing the insertion (integrases) appear to be all descended from a common ancestor. Insertion sites are not located randomly on the E. coli chromosome but are restricted to one segment of the map; also, most prophages are oriented in the same direction along the chromosome. Lambdoid phage 21 inserts within the isocitrate dehydrogenase gene and introduces an alternative 165 bp 3' end for that gene. A defective element (e14) inserts at the same position. We suggest that this mode of insertion arose from insertion of an ancestral phage to the right of icd which then picked up part of the icd gene by abnormal excision and speculate that, at an earlier time, phages may have arrived at their present locations by a process of chromosomal walking.
View details for Web of Science ID A1992JX41200020
View details for PubMedID 1468648
Comparison of the nucleotide sequence of the integrase genes of lambdoid phages 21 and 434 with the published sequences of phages HK022 and lambda shows that lambda and 434 are very closely related (98% base sequence identity), whereas HK022 and 21 respectively show 73% and 48% identity to lambda. It is likely that several homologous recombination events occurred in the int gene and flanking DNA among the progenitors of these phages. Sequence divergence to different alternative sequences at a common site (tL4) suggests that tL4 has been repeatedly used as a recombination site, despite the very limited homology it provides. A minor constitutive transcript that terminates at tL4 of lambda has been identified. We propose that the principal selective force acting to conserve tL4 is for terminator function, but that the use of tL4 as a recombination site has allowed the formation of selectively favored recombinants. By extension, we suggest that conservation of microhomologies at functional sites serves to keep lambdoid phages within a common gene pool despite extensive drift and divergence.
View details for Web of Science ID A1991GK71800012
View details for PubMedID 1715186
Clones of the Escherichia coli bisC locus have been isolated by complementing a bisC mutant for growth with d-biotin d-sulfoxide as a biotin source. The complementation properties of deletions and Tn5 insertions located the bisC gene to a 3.7-kilobase-pair (kbp) segment, 3.3 kbp of which has been sequenced. A single open reading frame of 2,178 bp, capable of encoding a polypeptide of molecular weight 80,905, was found. In vitro transcription of plasmids carrying the wild-type sequence and deletion and insertion mutants showed that BisC complementation correlated perfectly with production of a polypeptide whose measured molecular weight (79,000) does not differ significantly from 80,905.
View details for Web of Science ID A1990CW01800070
View details for PubMedID 2180922
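The figures quoted in the bisC abstract above can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and it assumes that the 2,178-bp open reading frame includes the stop codon, which the abstract does not state.

```python
def orf_summary(orf_bp, predicted_mw, measured_mw, stop_included=True):
    """Relate ORF length to polypeptide size and compare two mass values.

    stop_included is an assumption about how the ORF length was
    reported; set it to False if the stop codon is not counted.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    residues = codons - 1 if stop_included else codons
    return {
        "codons": codons,
        "residues": residues,
        # Average mass per residue implied by the predicted weight.
        "mean_residue_mass": predicted_mw / residues,
        # Fractional gap between predicted and measured weights.
        "relative_difference": (predicted_mw - measured_mw) / predicted_mw,
    }


summary = orf_summary(2178, 80905, 79000)
```

With these inputs the ORF spans 726 codons, the implied mean residue mass is close to the commonly cited average of roughly 110 Da, and the measured weight falls within a few percent of the predicted one.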
View details for PubMedID 2854524
The bio operons of Citrobacter freundii and Escherichia coli K-12 (strain C600) were isolated by screening lambda banks for complementation of E. coli bio mutants. These were compared with the previously isolated bio operon of Salmonella typhimurium and previous data on E. coli K-12. The restriction maps of the operon are very different in the three species, but no difference in gene order was found. Operator-promoter DNA, identified by repressor titration and by biotin-repressible transcription in E. coli, was sequenced and compared to the published E. coli K-12 sequence. In the segment previously identified as operator/bioB promoter, C. freundii and S. typhimurium DNA are identical and differ from E. coli only by 2 bp. The DNA to the right of this segment (indicated by previous data to be the bioA promoter of E. coli) has diverged in all three species, and only E. coli has a sequence resembling a consensus promoter.
View details for Web of Science ID A1988P387100006
View details for PubMedID 2971595
When Escherichia coli cells lysogenic for bacteriophage lambda are induced with ultraviolet light, cells carrying cryptic lambda prophages are occasionally found among the apparently cured survivors. The lambda variant crypticogen (lambda crg) carries an insertion of the transposable element IS2, which increases the frequency of cryptic lysogens to about 50% of cured cells: 43 of these cryptic prophages have been characterized. They all contain substitutions that replace the early segment of the prophage genome (from the IS2 to near the cos site) with a duplicate copy of a large segment of the host chromosome. The right end of the substitution always results from recombination between the nin-QSR-cos region of the prophage and the homologous incomplete lambdoid prophage Qsr' at 12.5 minutes in the E. coli chromosome. The left end of the substitution is usually a crossover that recombines the IS2 element in the prophage with an E. coli IS2 at 8.5 minutes, near the lac gene, or with a second IS2 located counterclockwise from leu at 2 minutes, generating duplications of at least 200,000 bases. Five cryptic lysogens derived from cells lysogenic for a reference strain of lambda (which lacks the IS2 present in lambda crg) have been characterized. They contain substitutions whose right termini are generated by a crossover with the Qsr' prophage. The left termini of these substitutions are formed either by a crossover between the lambda exo gene and a short exo-homologous segment of Qsr' (2/5), or by a crossover between sequences to the left of attL and an unmapped distant region of the host chromosome (3/5). The large duplications carried by these cryptic lysogens are stable, unlike tandem duplications, and so may significantly influence the cell's evolutionary potential.
View details for Web of Science ID A1987L184500004
View details for PubMedID 2828640
A 1488-bp restriction fragment of bacteriophage 434 DNA contains the integrase promoter and an adjacent nucleotide sequence (t'I) resembling a Rho-independent terminator. To identify and quantitate transcription termination, DNA segments were cloned into a plasmid between the galactose promoter and an assayable galactokinase gene and tested for termination. Whereas the entire fragment effectively terminated transcription, a 331-bp restriction fragment containing the t'I terminator had only weak terminator activity. Random sequential deletions of the 434 DNA segment defined a strong terminator 650 bp upstream from t'I. This proposed Rho-independent terminator, called tL4, consists of a 7-bp stem and 6-nt loop followed by a uridine-rich region in the RNA. Phage lambda contains an even stronger tL4 terminator that differs at 4 nt from 434 tL4. Thus, despite some sequence divergence, terminator activity has been conserved in these phages. The 434 DNA segment was also tested for promoter activity. Rightward promoter activity (opposite to pL in the phage) was located about 200 bp to the right of tL4 and was followed by an open reading frame (ORF) capable of encoding a 91-amino-acid protein. Promoter activity in the same approximate location was also found in phage lambda. Thus the rightward promoter, the tL4 and t'I terminators, and ORF-55 all are elements in this segment of the genome that are conserved for function despite sequence divergence.
View details for Web of Science ID A1987M084400002
View details for PubMedID 2965063