
Aix-Marseille Université

Faculté de Médecine de Marseille

Ecole Doctorale des Sciences de la Vie et de la Santé

DOCTORAL THESIS

presented and defended on 18 December 2013 by

Mano Joseph MATHEW

for the degree of Doctor of Aix-Marseille University

Specialty: Human Pathology and Infectious Diseases

______________________________________________________________________________

Insight into intracellular bacterial genome repertoire

using comparative genomics

______________________________________________________________________________

Jury:

Professor Jérôme ETIENNE, Reviewer

Professor Max MAURIN, Reviewer

Professor Jean-Louis MEGE, President of the Jury

Professor Didier RAOULT, Thesis Supervisor

Unité de Recherche sur les Maladies Infectieuses Tropicales et

Emergentes (URMITE), UM 63 CNRS 7278 IRD 198 INSERM 1095


To my Lord, precious family and friends…


Preamble

The presentation format of this thesis follows a recommendation of the Infectious Diseases and Microbiology specialty, within the Master of Life and Health Sciences, which is part of the Doctoral School of Life Sciences of Marseille. The candidate is required to follow imposed rules, including a thesis format used in Northern Europe that allows better organization than traditional theses. In addition, the introduction and bibliography section is replaced by a review submitted to a journal, so that the quality of the review can be evaluated externally and so that the student can begin an exhaustive bibliography on the subject of the thesis as early as possible. The thesis is presented as published, accepted or submitted articles, each accompanied by a brief commentary giving the general meaning of the work. This form of presentation was considered better suited to the demands of international competition and makes it possible to concentrate on work that will benefit from international dissemination.

Professeur Didier RAOULT


Abstract

Prokaryotic microorganisms are prevalent in all

environments on Earth. Given their ecological ubiquity, it is not

surprising to find many prokaryotic species in close relationships

with members of many eukaryotic taxa, often establishing a

persistent association, which is known as symbiosis. Depending on

the fitness effects on the members of the symbiotic relationship,

associations can be referred to as parasitism, mutualism or

commensalism and, depending on the location of the symbiont with

respect to host cells, as ectosymbiosis or endosymbiosis. Genome

sequencing, especially Next-Generation Sequencing (NGS), has

radically changed microbiology and has helped to discern

how the diverse group of intracellular bacteria evolved to survive

and replicate in host cells. The initial purpose of my thesis is therefore

to use comparative genomics to understand genomic

variations related to coexistence, by examining data on the ancient

existence of intracellular bacteria, their host adaptation and the

differences between sympatry and allopatry. The first part of my

thesis is a review giving insight into intracellular bacterial genome

repertoire and symbionts. The goal of this review is to explore how

intracellular microbes acquire their specific lifestyle. Due to their

different evolutionary trajectories, these bacteria have different

genomic compositions. We reviewed data on the ancient existence

of intracellular bacteria, their host adaptation and the differences

between sympatry and allopatry. Furthermore, we elaborate on the

genomic repertoire to understand the phenomenon of gene loss in

intracellular bacteria. To understand the genomic repertoire and its

composition in intracellular bacteria, it is essential to understand

specialization in bacteria with respect to their niches. A comparison

of the genomic contents of bacteria with certain lifestyles revealed

the bacterial capacity to exchange genes to different extents,


depending on the ecosystem. Moreover, genomics has provided

important clues to the mechanisms driving the genome-reduction

process, the functions that are retained when a species becomes

intracellular, and the role of the host in molding the genomic

composition of intracellular bacteria. The second part of my thesis

presents the genome sequence of Diplorickettsia massiliensis

strain 20B, an obligate intracellular, Gram-negative

bacterium isolated from Ixodes ricinus ticks collected in Slovakia. In

the third part, we investigated the genome repertoire of

Diplorickettsia massiliensis in comparison with closely related bacteria

from different niches, which revealed its allopatric lifestyle. In this study,

we compared the genomic features of Diplorickettsia massiliensis

with twenty-nine sequenced Gammaproteobacteria species

(Legionella strains, Coxiella burnetii strains, Francisella tularensis

strains and Rickettsiella grylli) using a multi-genus pangenomic

approach. This thesis work provides original data and sheds light on

intracellular bacterial diversity.

Keywords : Intracellular bacteria, Diplorickettsia massiliensis,

genome repertoire, allopatry, sympatry, pangenome,

Gammaproteobacteria


Résumé

Microorganisms are present in almost every habitat on the planet. Given their ecological ubiquity, it is not surprising to find many prokaryotic species in close relationships with members of many eukaryotic taxa, often establishing a persistent association called symbiosis. Depending on the interactions between the partners within this symbiotic relationship, it can be considered parasitism, mutualism or commensalism, and, depending on the location of the symbiont relative to the host cells, ectosymbiosis or endosymbiosis. Genome sequencing, in particular high-throughput sequencing (NGS), has greatly improved our understanding of the evolution of the different groups of intracellular bacteria and of their survival within host cells. The objective of this thesis is therefore to understand, with the help of comparative genomics, the genomic variations related to coexistence, by examining data on the ancient existence of intracellular bacteria, their adaptation to their host and the differences between sympatry and allopatry. The first part of my thesis is a review giving an overview of the genomic repertoire of intracellular bacteria and their symbionts. The objective of this review is to explore the process that allows intracellular bacteria to acquire their specific lifestyle. Because of their different evolutionary pathways, these bacteria have different genomic compositions. We began by examining data on the ancient existence of intracellular bacteria, their adaptation to their host and the differences between sympatry and allopatry. In addition, we explored the genomic repertoire of these bacteria to understand the phenomenon of gene loss in intracellular bacteria. To understand the genomic repertoire and its composition in intracellular bacteria, it is necessary to understand the specialization of these bacteria with respect to their niches. A comparison of the genomic content of several bacteria with different lifestyles revealed the capacity of bacteria to exchange genes to different degrees, depending on the ecosystem. Moreover, genomics has provided important clues to the mechanisms causing the genome-reduction process, the functions that are retained when a species becomes intracellular, and the influence the host can have on the genomic composition of intracellular bacteria. The second part of my thesis concerns the genome sequence of Diplorickettsia massiliensis strain 20B, an obligate intracellular, Gram-negative bacterium isolated from Ixodes ricinus ticks from Slovakia. In the third and final part, we explored the genome repertoire of Diplorickettsia massiliensis by comparing it with the genomes of bacteria phylogenetically very close to Diplorickettsia massiliensis that come from different niches. This revealed its allopatric lifestyle. In this study, we compared the genomic features of Diplorickettsia massiliensis with twenty-nine sequenced Gammaproteobacteria species (Legionella, Coxiella burnetii, Francisella tularensis and Rickettsiella grylli) using a multi-genus pangenomic approach. This thesis work provides original data and sheds more light on the diversity of intracellular bacteria.

Keywords: Intracellular bacteria, Diplorickettsia massiliensis, genome repertoire, sympatry, allopatry, pangenome, Gammaproteobacteria


Contents

Preamble

Abstract

Résumé

Contents

1 Chapter One: Introduction

2 Chapter Two: Review

2.1 Review:

Genome repertoire of intracellular bacteria and symbionts

3 Chapter Three: Genome sequencing of intracellular bacteria

3.1 Article 1:

Genome Sequence of Diplorickettsia massiliensis,

an Emerging Ixodes ricinus-Associated Human Pathogen

4 Chapter Four: Comparative genomics

4.1 Article 2:

The genomic repertoire of Diplorickettsia massiliensis

reveals its allopatric lifestyle

5 Chapter Five: Conclusions

5.1 Conclusions and perspectives

5.2 Future perspective

Bibliography

Acknowledgements


Chapter 1

Introduction

The following section introduces studies on intracellular bacteria

and their interactions with different niches. In the past,

microbiologists were mainly restricted to

the study of microorganisms that could be isolated and grown on

relatively simple media. This often made it almost impossible to study

species that cannot survive outside their hosts, and severely limited our

knowledge of the genetics of these organisms. Advances in culture

techniques and genome sequencing now allow these organisms to be

studied, and the results of these endeavours have revealed their

complete genomes and provided powerful insights into their intimate

relationships with their hosts. Three microbial categories have been

defined based on their niches: free-living, facultative intracellular and

obligate intracellular bacteria.

The genomes of intracellular bacteria are extremely varied.

Examples of facultative intracellular bacteria, which can multiply inside

vacuoles, include Legionella pneumophila, Francisella tularensis

and Mycobacterium tuberculosis, and the obligate intracellular

bacteria include Chlamydia spp., whereas Listeria monocytogenes,

Shigella flexneri, enteroinvasive Escherichia coli and some Rickettsia spp.

are able to enter and replicate in the cytosol of mammalian cells (Zientz,

et al., 2004). Intracellular bacteria need factors to recognize, invade and

replicate within host cells when their intracellular phase is transient.

The intracellular location may facilitate access to host

metabolites, which support bacterial multiplication in a relatively safe

host compartment devoid of potent host defense mechanisms. Moreover,

the intracellular compartment may allow the diffusion of bacteria within

the host and, after exiting the host cells, the bacteria may be released

into the environment or directly transmitted to another host organism

(Finlay & Falkow, 1997, Gross, et al., 2003, Zientz, et al., 2004). Genome

sequencing, especially Next-Generation Sequencing (NGS), has

radically changed microbiology and has helped to discern how

the diverse group of intracellular bacteria evolved to survive and replicate

in host cells. In the first part of my thesis, we reviewed literature to

summarize the knowledge on the ancient existence of intracellular

bacteria, their host adaptation and the differences between sympatry and

allopatry. Moreover, genomics has provided important clues to the

mechanisms driving the genome-reduction process, the functions that are

retained when a species becomes intracellular, and the role of the host in

molding the genomic composition of intracellular bacteria (Chapter 2).

My thesis work proceeds from the observation that,

despite recent advances in sequencing techniques, little is still known

about the interactions between intracellular bacteria and various niches.

In the second part of my thesis, we report the sequencing and completion

of the genome of Diplorickettsia massiliensis strain 20B,

an obligate intracellular, Gram-negative bacterium isolated from

Ixodes ricinus ticks collected in Slovakia. D. massiliensis belongs to the

Gammaproteobacteria class; it is non-endospore-forming and forms

small rods that are usually grouped in pairs. An initial phylogenetic

analysis based on 16S rRNA showed that Diplorickettsia massiliensis

clustered with Rickettsiella grylli. Because of its low 16S rDNA similarity

(94%) with R. grylli, it was classified in a new genus, Diplorickettsia, within

the family Coxiellaceae and the order Legionellales. D. massiliensis strain

20B was identified in three patients with suspected tick-borne infections

who exhibited a specific seroconversion. The evidence of infection was

further confirmed by PCR assay, thus establishing its role as a

human pathogen. We therefore sought to understand the

genome repertoire of Diplorickettsia massiliensis.
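The 94% value is a percent identity between aligned 16S rRNA gene sequences. As an illustration only, the short Python sketch below shows how such a value can be computed from a pairwise alignment. It assumes the two sequences have already been aligned to equal length, counts a gap in only one sequence as a mismatch and skips columns that are gaps in both (one of several conventions in use), and takes the classification cutoff as a hypothetical user-supplied parameter; it is not the procedure used to classify D. massiliensis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps ('-') in both sequences are skipped; a gap in only
    one sequence is counted as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * identical / compared


def below_cutoff(aligned_a: str, aligned_b: str, cutoff: float) -> bool:
    """True if identity falls below a user-supplied (hypothetical) cutoff."""
    return percent_identity(aligned_a, aligned_b) < cutoff


# Toy aligned fragments, not real 16S rRNA data
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # prints 80.0
```

Keeping the cutoff as an explicit argument avoids hard-coding any particular taxonomic threshold, since published genus-level values differ between studies.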

Furthermore, we investigated the genome repertoire of

Diplorickettsia massiliensis in comparison with closely related bacteria from

different niches, which revealed its allopatric lifestyle. In this study, we compared the

genomic features of Diplorickettsia massiliensis with twenty-nine

sequenced Gammaproteobacteria species (Legionella strains, Coxiella

burnetii strains, Francisella tularensis strains and Rickettsiella grylli) using

a multi-genus pangenomic approach, which sheds light on intracellular

bacterial diversity.
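Conceptually, a pangenomic comparison sorts gene families according to how many of the compared genomes carry them. The Python sketch below illustrates that partition under stated assumptions: gene families (orthologous clusters) are taken as already inferred by an upstream clustering step, which is not shown and is not the specific pipeline used in this work, and the genome and family names in the usage example are hypothetical placeholders rather than data from this study.

```python
from typing import Dict, Set


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Partition gene families into pan, core, accessory and unique sets.

    `genomes` maps each genome name to the set of gene-family IDs it carries.
    """
    if not genomes:
        raise ValueError("need at least one genome")
    families = list(genomes.values())
    pan = set().union(*families)          # present in at least one genome
    core = set.intersection(*families)    # present in every genome
    # number of genomes carrying each family
    counts = {fam: sum(fam in g for g in families) for fam in pan}
    unique = {fam for fam, n in counts.items() if n == 1} - core
    accessory = pan - core - unique       # shared by some, but not all
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Hypothetical genomes and family IDs, for illustration only
example = {
    "genome_A": {"f1", "f2", "f3"},
    "genome_B": {"f1", "f2"},
    "genome_C": {"f1", "f4"},
}
result = partition_pangenome(example)
print(sorted(result["core"]), sorted(result["accessory"]), sorted(result["unique"]))
# prints ['f1'] ['f2'] ['f3', 'f4']
```

Representing each genome as a set of family identifiers makes the core genome an intersection and the pangenome a union; accessory and unique families then follow from occurrence counts.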


Chapter 2

Review: Genome repertoire of intracellular

bacteria and symbionts


2.1 Review:

Genome repertoire of intracellular bacteria and symbionts

Mano J. Mathew1 and Didier Raoult1*

1 Unité de Recherche sur les Maladies Infectieuses et Tropicales

Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,

IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,

13005, Marseille, France.

Submitted to FEMS Microbiology Reviews

*Corresponding author. E-mail: [email protected]

Keywords: Genome repertoire, intracellular, host-microbe, facultative,

obligate, genome reduction, virulence, secretion system


Abstract

The recent explosion in knowledge of the diverse group of

intracellular bacteria has helped to discern how these microbes

evolved to survive and replicate in host cells. This review highlights

the genomic repertoire of intracellular bacteria and symbionts by

examining data on the ancient existence of intracellular bacteria,

their host adaptation and the differences between sympatry and

allopatry. We also highlight the important clues that genomics has provided to the

mechanisms driving the genome-reduction process, the functions

that are retained when a species becomes intracellular, and the role

of the host in molding the genomic composition of intracellular

bacteria. This wealth of information will contribute to

a better understanding of the interactions between intracellular

bacteria and various niches.


Contents

Introduction

Intracellular bacteria: an ancient outlook

Sympatric and allopatric lifestyles

Genomic repertoire

– Bias in base compositions

– Metabolic variations

– Ribosomal split operons

– Other observations

Loss of non-virulent genes in intracellular bacteria

Gene duplication facilitating adaptation in intracellular bacteria

Mobilome of intracellular bacteria

– General distribution of the mobilome in intracellular bacteria

– Types of mobile genetic elements

– Transposable elements

– Repeated palindromic elements (RPEs)

– Ankyrin and tetratricopeptide repeat proteins

Secretion system machinery in intracellular bacteria

Concluding remarks

Acknowledgements

References


Introduction

Understanding the genome repertoire of intracellular bacteria and

symbionts cannot be considered without first grappling with the

uncertainties and ambiguities in the meanings of the terms repertoire,

intracellular bacteria and symbionts. Additionally, none of these terms

lends itself to a straightforward explanation. This confusion must be

addressed before delving into this complex subject. The term 'genome

repertoire' connotes the entire genomic composition of an organism. The

goal of this review is to explore the microbes that reside inside cells and

how they came to acquire this specific lifestyle. Three microbe categories

have been defined based on their niches: free-living, facultative

intracellular and obligate intracellular bacteria. The genomes of

intracellular bacteria are extremely varied. Examples of facultative

intracellular bacteria, which can multiply inside vacuoles, include

Legionella pneumophila, Francisella tularensis and

Mycobacterium tuberculosis, and the obligate intracellular bacteria

include Chlamydia spp., whereas Listeria monocytogenes, Shigella

flexneri, enteroinvasive Escherichia coli and some Rickettsia spp. are able

to enter and replicate in the cytosol of mammalian cells [1]. Intracellular

bacteria need factors to recognize, invade and replicate within host

cells when their intracellular phase is transient. The intracellular location

may facilitate access to host metabolites, which support

bacterial multiplication in a relatively safe host compartment devoid of

potent host defense mechanisms. Moreover, the intracellular


compartment may allow the diffusion of bacteria within the host and,

after exiting the host cells, the bacteria may be released into the

environment or directly transmitted to another host organism [1-3].

These intracellular bacteria possess mechanisms that allow them to

invade host cells and to protect themselves inside them. Legionella pneumophila induces its own

uptake and blocks lysosomal fusion; otherwise, lysosomes would degrade

the bacteria [4]. It also uses a Type IV secretion system known as Dot/Icm

to inject into the host cell the effector proteins required for its intracellular

survival [5]. Meanwhile, Salmonella and Mycobacterium spp.

are highly resistant to intracellular killing by phagocytic cells [6, 7].

A comprehensive list of intracellular bacteria is shown in Table 1.

Obligate intracellular bacteria cannot multiply outside host cells, as

they lack many biosynthetic pathways; hence, they are dependent on host

cells. These bacteria are also known as obligate endosymbionts, which

multiply exclusively inside the cells of many eukaryotic organisms and

usually have no extracellular state. Compared with their free-living

relatives, obligate intracellular bacteria exhibit a set of features shared by

intracellular parasites and endosymbionts. They tend to have small

population sizes, and their genomes are usually small and show marked

AT nucleotide biases, increased rates of nucleotide substitution, random

accumulation of deleterious mutations, accelerated sequence evolution

and the loss of genes that are involved in recombination and repair

pathways [8].


The word symbiont, originating from the Greek simbios, or living

together, was first introduced by Anton de Bary in 1879 and was defined

as “the permanent association between two or more organisms of

different species, at least during a part of the life cycle” (Gil, et al., 2004).

Considering the ecological ubiquity of bacteria, it is not surprising to find

many species in close relationships with members of several eukaryotic

taxa. Depending on the fitness effects on the members of the symbiotic

relationship, the relationship can be referred to as parasitism, mutualism

or commensalism. Based on the location of the symbiont in relation to the

host cells, these relationships may be ectosymbiotic or endosymbiotic.

Rickettsia are frequently identified in close relationships with arthropod

vectors that may assist in the transmission of the organism to mammalian

hosts [9]. Between 15 and 20% of known insect species, which form the

most species-rich group of animals, have symbiotic relationships with bacteria. The

nutritional enrichment that bacteria offer to insects could be an

important factor in the evolutionary success of this group [10, 11].

The recent explosion in knowledge of bacterial pathogenesis has

assisted efforts to discern why certain intracellular bacteria have evolved

to survive and replicate in host cells as part of their pathogenic

mechanisms. Recent developments in genomics have introduced concepts

such as bacterial genome expansion and reduction, which have provided

insight into bacterial genome evolution [12]. The comparison of free-living

and intracellular bacteria has revealed dramatic differences in genome

size and content. In this paper, we review the genomic repertoire of


intracellular bacteria and symbionts. We begin by reviewing data on the

ancient existence of intracellular bacteria, their host adaptation and the

differences between sympatry and allopatry. Furthermore, we elaborate

on the genomic repertoire to understand the phenomenon of gene loss in

intracellular bacteria.

Intracellular bacteria: an ancient outlook

Ecologists and biologists are fascinated with the enormous diversity of

bacteria that complete their life cycles within, or closely associated with,

eukaryotic cells. Symbiosis between unicellular and multicellular

organisms has contributed considerably to the evolution of life on Earth

[13]. These interactions include a broad range of effects on hosts, from

invasive pathogenesis to obligate relationships in which the hosts depend

on infection for survival or reproduction. Bacterial associations can be

difficult to categorize, and few bacteria can be unambiguously labeled

as mutualists (bacteria that increase the fitness of the host) or as parasites

(bacteria that decrease the fitness of the host).

Intracellular bacteria are found in a wide range of niches. Due to

their different evolutionary trajectories, these bacteria have different

genomic compositions. These intracellular bacteria, unfortunately, have

no fossil record to assist scientists in determining when they acquired the

ability to survive inside other organisms. Based on an endosymbiotic

origin for mitochondria and other eukaryotic organelles [14, 15], we

infer that the intracellular lifestyle is ancient and predated the


emergence of eukaryotic organisms. Intracellular bacteria exhibit three

important properties: a) size differences compared to non-intracellular

bacteria, b) a mechanism for insertion into the host and c) survival within

the host. These initial interactions could have resulted in the survival of

symbiotic microbes. The situation in which the survival of the host occurs

at the expense of the microbe is termed predation, the situation in which

the host is harmed is called intracellular pathogenesis and the situation in

which the microbes are damaged is called incompatibility or antagonism.

Each situation is subject to selection that allows the emergence of varied

types of bacteria-host relationships. In the case of insects, the arthropod

lineage arose 385 million years (MY) ago and swiftly diversified [8]. The

early establishment of symbiotic relationships among insects and bacteria

approximately 300 MY ago and the nutritional advantage that these

bacteria offered to insects could have been key factors in the evolutionary

success of this group [8]. The betaproteobacterial endosymbiosis of mealybugs

was the first stable intracellular symbiotic association identified between

two species of bacteria [16]. Facultative or obligate intracellular bacteria

can be identified throughout the tree of life, in hosts ranging from

protists to multicellular plants and animals [17]. The

Rickettsiales order, which belongs to Alphaproteobacteria, comprises

obligate intracellular bacteria that are closely related to the ancestor of

mitochondria, having diverged approximately 850–1500 MY ago [9]. Rickettsiales

species have well-known close relationships with varied eukaryotic hosts,

as shown by their manipulation of cellular processes such as host

reproduction [18, 19]. These relationships have led to a massive

integration of bacterial genome fragments into host genomes [20, 21]. Studies

on Rickettsiales have thus improved the knowledge of intracellular

bacteria contemporaneous with mitochondrial origin, as parts of a

Rickettsiales genome were found integrated into the nucleus of one

eukaryotic host, and another genome fragment was found to be

integrated into the mitochondrial genome of another host [22]. In another

striking example of lateral genetic transfer, nearly the entire Wolbachia

genome was found to be integrated into the genome of its host [20, 23]. A

recent study on mitochondrial protein-based phylogeny suggested that

Rickettsiales and Rhizobiales may have diverged 1.5 billion years ago (BYA)

[24, 25]. Their fusion likely created the first mitochondrion approximately

1 BYA [24]. Additionally, the origin of mitochondrial genes is not limited to

the Rickettsiales, and the transfer of these genes did not happen in a

single event but rather through numerous successive events [24, 25].

These studies indicate that the intracellular bacterial lifestyle is

ancient and has continually co-evolved with the host [26]. Before

examining the genomic repertoire and its composition in intracellular

bacteria, it is essential to understand specialization in bacteria with

respect to their niches.

Sympatric and allopatric lifestyles

A comparison of the genomic contents of bacteria with certain lifestyles

revealed the bacterial capacity to exchange genes to different extents,

depending on the ecosystem [27]. Allopatric speciation in bacteria is

associated with restricted opportunities to exchange genes with other


organisms, whereas gene duplication, mutation and deletion are more

frequently observed. A prominent example is the association of Rickettsia

prowazekii with the human louse Pediculus humanus [9]. Allopatry is

generally associated with genome reduction, especially in pathogens that

have a small genomic repertoire compared to less specialized bacteria. In

sympatry, multiple bacteria infect the same host and thus undergo

massive genetic exchange [28]. Some authors [29, 30] have identified

which bacteria participate in each intracellular lifestyle. For example, the

strictly intracellular bacteria that live in narrow niches are allopatric, and

intracellular bacteria that live in amoebas are sympatric, as in the case of

Legionella sp., for which amoebae constitute a site for DNA

exchange [29, 30]. Intracellular bacteria that have sympatric relationships

within amoebas exhibit larger genomes than their relatives [30]. The

bacteria that live in a sympatric manner interact with many other bacteria

belonging to divergent phyla, allowing them to share genes at an

increased rate. The sympatric lifestyle is associated with larger genomes,

larger pan-genomes, a larger mobilome and genetic exchanges with other

bacteria. These bacteria often have more genes, ribosomal operons,

better metabolic capacities and significant resistance to antibiotics [31].

Gene recombination is found in sympatric organisms, resulting in genetic

diversity [32]. In Rickettsia felis, using a single gene phylogenetic

approach, researchers found that some genes could be linked to those of

other bacteria, namely Rickettsia bellii, Rickettsia typhi, Legionella sp. and

Francisella sp. [32]. The different sizes and functions of the genes

suggested random horizontal gene transfer in R. felis [32]. Bacteria in


sympatric environments have conserved genomes with phenotypic

plasticity and exhibit species complexity. Species complexity may have

promoted varied genomic repertoires that produced environmentally

adaptable alternative phenotypes [33]. However, like several obligate

pathogens, many of these obligate intracellular endosymbionts have an

extraordinary genome repertoire, an extremely reduced genome size and

correspondingly less coding capacity [34]. Hence, it is likely that the

mutual relationships of these bacteria with their host cells may have

promoted genome reduction. Thus, it is important to understand the

dynamics of the processes whereby new genes are acquired and old genes

are removed.

Genomic repertoire: an insight

Complete genome sequences are available for many bacteriome-associated symbionts, and these genomes share several features. The genome size, gene number and G + C content of intracellular bacteria have all decreased during specialization to an intracellular niche, reflecting a continual selective pressure toward a minimal genome [35]. One reason for this reduction could be that an intracellular niche limits the possibility of gene acquisition by lateral gene transfer (LGT) [31, 36-38]. Genes may also be lost upon adaptation to the niche [39, 40]. In free-living bacteria, the G + C content is often close to 50%, whereas in obligate intracellular bacteria it ranges from 16.5% to 33%. The genome sizes of these bacteria vary with their stage of host adaptation [41].

Bias in base composition

The most extreme compositional bias is a persistent shift toward higher A + T content. A + T enrichment is highest at sites that are neutral or nearly neutral with respect to selection, such as silent codon positions and intergenic spacers. It is driven by mutational bias and is common in obligate pathogenic bacteria such as members of the Rickettsiales and Chlamydiales. The bias has an important effect on the amino acid composition of proteins. In the Buchnera genome, silent sites and intergenic spacers contain less than 10% G + C, compared with 25–30% G + C for the genome as a whole [42-45]. In general, this mutational bias reflects the loss of DNA repair pathways. Supporting this trend, many repair genes are retained in Baumannia cicadellinicola, which has a G + C content of 33%, whereas no repair genes are retained in Carsonella ruddii and Sulcia muelleri, which have G + C contents of 16.5% and 22%, respectively [46].
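
The contrast between genome-wide composition and silent-site composition can be made concrete with a short calculation. The Python sketch below computes overall G + C content and G + C content at third codon positions (GC3). GC3 is often used as a rough proxy for silent sites, although not every third-position change is silent. The function names and the toy sequence are illustrative only. The sketch assumes an in-frame coding sequence written with A, C, G and T.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C among the A/C/G/T bases in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_fraction(cds: str) -> float:
    """Return G + C content at third codon positions of an in-frame coding sequence.

    Third positions are only a proxy for silent sites: some third-position
    changes do alter the encoded amino acid.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return gc_fraction(cds[2::3])  # every third base, starting at position 3


if __name__ == "__main__":
    toy_cds = "ATGGCAAAATTTGATTAA"  # hypothetical AT-rich coding sequence
    print(f"overall GC: {gc_fraction(toy_cds):.3f}")
    print(f"GC3:        {gc3_fraction(toy_cds):.3f}")
```

In this toy sequence, 4 of the 18 bases are G or C (about 22%), but only 1 of the 6 third positions is (about 17%). This mirrors, in miniature, the lower G + C content at silent sites described above.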

Metabolic variations

Compared with free-living bacteria, host-dependent bacteria have fewer transcriptional regulators, as shown by a statistical comparative analysis of 317 genomes from bacteria with different lifestyles [47]. Genes involved in translational modification and transcription are often among the genes lost [47]. In bacteriocyte-associated symbionts such as Carsonella ruddii and Sulcia muelleri [44, 48], genes involved in essential processes such as translation, replication and transcription are depleted, along with genes required for the production of cell envelope components [36, 49, 50]. The idea that host functions can replace those of the ancestral bacterial cell envelope is supported by symbionts that are enclosed in a host-derived membrane within bacteriocytes (Buchnera aphidicola, Sulcia muelleri and Carsonella ruddii). These symbionts have lost a greater proportion of cell envelope genes than symbionts that lie free in the cytosol (Wigglesworthia glossinidia, Candidatus Blochmannia). Bacterial symbionts that live within host mitochondria or nuclei [51, 52], and mutualistic bacterial symbionts that dwell inside other bacterial symbionts in the host cytoplasm [16, 53], are examples of rare, especially close associations.

More generally, the transition from a free-living to an intracellular lifestyle is accompanied by the loss of large segments of DNA [8, 54]. Rickettsia spp. have lost many genes needed for metabolic pathways, including those for sugar, purine and amino acid metabolism [55]. Similarly, DNA loss associated with host adaptation has been observed in Candidatus Blochmannia, an obligate endosymbiont of ants [56]. Conversely, gene acquisition can be observed in L. pneumophila, which is closely associated with amoebae [57]. Genome reduction is also associated with increased pathogenicity, as seen for Rickettsia conorii and Rickettsia prowazekii [18, 47, 58, 59].
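
Statements such as "these symbionts lose a greater proportion of cell envelope genes" rest on a simple ratio. For each functional category, it is the fraction of genes present in a reference set (for example, a free-living relative) that is missing from the symbiont. The sketch below illustrates that ratio under this assumption. The category labels and gene identifiers are hypothetical placeholders, and a real analysis would first need orthology assignments.

```python
from collections import defaultdict


def loss_by_category(reference: dict[str, str], retained: set[str]) -> dict[str, float]:
    """Fraction of reference genes missing from a symbiont, per functional category.

    reference maps gene (or orthologous group) identifiers to a category label;
    retained holds the identifiers found in the symbiont genome.
    """
    totals: dict[str, int] = defaultdict(int)
    lost: dict[str, int] = defaultdict(int)
    for gene, category in reference.items():
        totals[category] += 1
        if gene not in retained:
            lost[category] += 1
    return {cat: lost[cat] / totals[cat] for cat in totals}


if __name__ == "__main__":
    # Hypothetical reference gene set from a free-living relative.
    reference = {
        "g1": "cell envelope", "g2": "cell envelope", "g3": "cell envelope",
        "g4": "translation", "g5": "translation",
    }
    symbiont = {"g3", "g4", "g5"}  # hypothetical genes retained by the symbiont
    print(loss_by_category(reference, symbiont))
```

Comparing these per-category fractions between membrane-enclosed and cytosolic symbionts gives the kind of contrast described in the text.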

Ribosomal split operons

In a recent study of intracellular bacteria, several abnormal or split ribosomal operons were identified, and this abnormal feature arose independently in several groups of specialized bacteria [60]. Split ribosomal operons have been found in the Rickettsiales, Helicobacter pylori, Leptospira species and the group that includes Mycoplasma and Buchnera, and more recently in Bartonella birtlesii. In the study of B. birtlesii, the authors found that the disrupted genes belonged to the translation COG (Clusters of Orthologous Groups) category and to the ribosomal operon. Because far fewer genes are activated in a restricted environment than in a changing one, the translation machinery is not used extensively. If bacteria do not use many of their ribosomal operons in their current environment, they often lose them. Restricting translation appears critical for specialization, as speciation is often correlated with ribosomal operon inactivation [47, 60]. A comparative genomic analysis of free-living and host-dependent bacteria likewise showed that host-dependent bacteria have fewer ribosomal RNA (rRNA) genes, more split rRNA operons and fewer transcriptional regulators, characteristics that are linked to slow growth rates [47]. The function-dependent, non-random loss of 100 orthologous genes in the analyzed intracellular bacteria revealed that these bacteria, although they belong to different phyla, underwent convergent evolution through specialization to their niche [47].

The rRNA genes are classically organized in operons with the general structure 16S-23S-5S, and transfer RNA (tRNA) genes are typically found in the spacer between the 16S and 23S rRNA genes [47]. Intracellular bacteria have fewer copies of each rRNA gene than free-living bacteria and significantly fewer typical rRNA operons. In obligate intracellular bacteria such as Rickettsia spp., split rRNA operons are important evolutionary factors [61]. The co-adaptation of host genes and the modification of ancestral bacterial genes form the basis of symbiosis [62]. Sequenced symbiont genomes are typically minimal in size.
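
Whether an operon is "typical" or "split" can be screened from annotation coordinates. The question is whether a 23S gene lies just downstream of each 16S gene, on the same strand, as in the canonical 16S-spacer-23S-5S arrangement. The sketch below implements that check under several assumptions. The `RRNAGene` record and the `max_gap` threshold of 2,000 bp are hypothetical choices, not a published cutoff. Only 16S-23S linkage is examined, and coordinates are treated as linear, so an operon spanning the origin of a circular chromosome is not handled.

```python
from dataclasses import dataclass


@dataclass
class RRNAGene:
    kind: str    # "16S", "23S" or "5S"
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


def classify_16s(genes: list[RRNAGene], max_gap: int = 2000) -> list[str]:
    """Label each 16S gene 'linked' or 'split', in input order.

    A 16S gene is 'linked' when a 23S gene on the same strand begins within
    max_gap bp downstream of it; otherwise it is 'split'. max_gap is an
    illustrative threshold only.
    """
    labels = []
    for g16 in (g for g in genes if g.kind == "16S"):
        gaps = []
        for g23 in genes:
            if g23.kind != "23S" or g23.strand != g16.strand:
                continue
            # Downstream means higher coordinates on "+", lower on "-".
            if g16.strand == "+":
                gap = g23.start - g16.end - 1
            else:
                gap = g16.start - g23.end - 1
            if gap >= 0:
                gaps.append(gap)
        labels.append("linked" if gaps and min(gaps) <= max_gap else "split")
    return labels


if __name__ == "__main__":
    genes = [
        RRNAGene("16S", 1000, 2500, "+"), RRNAGene("23S", 2900, 5800, "+"),
        RRNAGene("5S", 5900, 6015, "+"),
        RRNAGene("16S", 100000, 101500, "+"), RRNAGene("23S", 700000, 702900, "+"),
        RRNAGene("16S", 10000, 11500, "-"), RRNAGene("23S", 6000, 8900, "-"),
    ]
    print(classify_16s(genes))
```

In this hypothetical layout, the second 16S gene has no nearby downstream 23S partner, so it is reported as split.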

Other observations

Adenine-specific DNA methylase is an enzyme that methylates specific DNA targets, namely GANTC sites in alphaproteobacteria, thereby reducing the thermodynamic stability of the DNA. This alteration affects transcriptional regulation, which is important in host-pathogen interactions, and the enzyme is missing in specialized bacteria [60]. Another distinctive attribute of obligate symbionts is the elevated expression of heat shock proteins, which is linked to the lower thermal stability of their proteins [60]. In Buchnera and other obligate intracellular symbionts, the chaperonin GroEL is highly expressed [63, 64]. Microarray and quantitative RT-PCR studies based on the available genome sequences show that other heat shock proteins are also unusually highly expressed in these bacteria, even in the absence of stress [65, 66]. This is likely a compensatory adaptation that offsets the reduced protein stability caused by mutations accumulating genome-wide [43, 45, 67].
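
A first step in examining a methylation motif such as GANTC in a genome is simply to locate its occurrences. The sketch below does this with Python's standard `re` module. The helper names are hypothetical. The scan says nothing about whether the sites are actually methylated or whether a methylase gene is present. Because GANTC is its own reverse complement and cannot overlap another GANTC, a non-overlapping scan of one strand finds every site.

```python
import re

# GANTC, where N is any base.
GANTC = re.compile(r"GA[ACGT]TC")


def gantc_sites(seq: str) -> list[int]:
    """Return 0-based start positions of GANTC motifs in seq.

    GANTC is palindromic (its reverse complement is GANTC), so scanning one
    strand also covers the other.
    """
    return [m.start() for m in GANTC.finditer(seq.upper())]


def sites_per_kb(seq: str) -> float:
    """GANTC site density per kilobase of sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return len(gantc_sites(seq)) / (len(seq) / 1000)


if __name__ == "__main__":
    toy = "AAGATTCGGAATCC"  # hypothetical fragment
    print(gantc_sites(toy), round(sites_per_kb(toy), 1))
```

Site densities obtained this way could then be compared between genomes, keeping in mind that motif counts are also shaped by overall base composition.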

Loss of non-virulent genes in intracellular bacteria

Having outlined the elements involved in gene loss, we now consider how gene loss may have occurred in intracellular bacteria. Two major models of evolution, Lamarckian and Darwinian, have been commonly studied [68]. The central Lamarckian concept is that phenotypic changes result from adaptation to a niche and can be transmitted vertically [69]. In contrast, the current view of evolutionary biology, in agreement with post-Darwinian experiments, holds that genetic modifications produce phenotypic changes and precede the selection of the fittest individuals in a given niche. In other words, genotypic changes precede phenotypic changes. Lamarckian evolution may have been involved in bacterial speciation events associated with a reduction in genome size [47]. This hypothesis contradicts the dominant model, in which speciation and fitness gains are linked to an increase in the gene repertoire. In this view, speciation through adaptation to a given environment usually occurs in allopatry [70] and is related to genome size reduction through the loss of unneeded genes, according to the Lamarckian "use it or lose it" model described by Moran [40].

In several intracellular pathogens, namely Shigella, Salmonella and Francisella tularensis, the inactivation or deletion of certain genes made the bacteria pathogenic or more virulent; these genes are called antivirulence genes [71]. Gene loss is central to specialization. For example, a comparative analysis of 317 bacterial genomes from different niches found 100 orthologous genes that were lost in all specialized bacteria, most notably genes associated with ribosomal operons, translation regulation and metabolism [47]. In the study of B. birtlesii, a deletion in one of the two rRNA operons and disruptions in translation-associated genes highlighted the importance of translation for specialization to a specific niche [60]. Other notable features of intracellular bacteria include:

- gene duplication, which facilitates adaptation to different environments;
- the mobilome, which carries virulence genes and includes repeat elements that cause genomic instability and drive evolution;
- secretion systems, which assist in bacterial colonization, invasion and survival within the niche.
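
Finding genes "lost in all specialized bacteria" amounts to a presence/absence screen across genomes grouped by lifestyle. The criteria used in the 317-genome study are not given here, so the sketch below uses a simplified rule of its own. An orthologous group is reported if it is absent from every host-dependent genome and present in at least `min_free_fraction` of free-living genomes. The 0.9 default and all genome and group names are hypothetical.

```python
def lost_in_all_specialized(
    presence: dict[str, set[str]],
    specialized: set[str],
    min_free_fraction: float = 0.9,
) -> set[str]:
    """Orthologous groups absent from every specialized genome but common in free-living ones.

    presence maps each genome name to the set of orthologous groups it contains;
    specialized names the host-dependent genomes, and every other genome is
    treated as free-living. min_free_fraction is an illustrative threshold.
    """
    missing = specialized - set(presence)
    if missing:
        raise KeyError(f"no presence data for: {sorted(missing)}")
    free = [g for g in presence if g not in specialized]
    if not free or not specialized:
        raise ValueError("need at least one free-living and one specialized genome")
    all_groups = set().union(*presence.values())
    result = set()
    for og in all_groups:
        if any(og in presence[g] for g in specialized):
            continue  # retained by at least one specialized genome
        free_hits = sum(og in presence[g] for g in free)
        if free_hits / len(free) >= min_free_fraction:
            result.add(og)
    return result


if __name__ == "__main__":
    presence = {  # hypothetical orthologous-group content per genome
        "free1": {"OG_a", "OG_b", "OG_c"},
        "free2": {"OG_a", "OG_b"},
        "sym1": {"OG_c"},
        "sym2": {"OG_c"},
    }
    print(sorted(lost_in_all_specialized(presence, {"sym1", "sym2"})))
```

Requiring absence from every specialized genome mirrors the wording of the text. A threshold on the free-living side avoids reporting groups that are simply rare everywhere.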

Gene duplication facilitating adaptation in intracellular bacteria

Gene duplication facilitates the adaptation of bacteria to changing environments and new niches [72]. The high number of duplicated genes in small intracellular bacterial genomes, including those of Rickettsia species, is an intriguing phenomenon. After gene duplication, the copies follow one of three possible fates. They may retain the same function and increase the amount of gene product. They may accumulate deleterious mutations and become non-functional. Or, under positive selection, they may acquire divergent mutations, eventually evolve new functions and confer a selective advantage in a new niche [73-75]. For example, the Rickettsia prowazekii and Rickettsia conorii genomes both contain two distantly related copies of the virB4 gene that have evolved under different functional constraints [18]. These copies differ in their non-synonymous substitution frequencies, indicating different functions and contrasting selective constraints within the same genome [76].

Sequenced Rickettsia genomes contain 4–14 paralogs of spoT. In Escherichia coli, spoT, like relA, controls the concentration of the alarmone (p)ppGpp (guanosine tetra- and pentaphosphates) in response to starvation. (p)ppGpp acts as a transcriptional effector that alters cellular metabolism, and (p)ppGpp-mediated regulation may be involved in pathogenesis and bacterial symbiosis [77]. All 14 spoT genes are transcribed in Rickettsia felis [78], whereas, interestingly, the five spoT genes of R. conorii are differentially regulated depending on the niche. Other multigene families identified in Rickettsia spp. encode the Tlc, ProP, AmpG and Sca proteins. Multiple copies of Tlc, an ATP/ADP translocase that exchanges ADP for host cytoplasmic ATP, may be important for efficient host cell adaptation [78]. The multiple copies of the proline/betaine transporter ProP seem to play an important role in the adaptation of Rickettsia spp. to osmotic stress and to host temperature conditions. AmpG may confer natural resistance to β-lactam antibiotics, and Sca proteins function in host-parasite interactions and in adaptive responses to host defense systems [59]. A genome analysis of Rickettsia spp. identified 17 members of the Sca family. These showed diverse expression patterns across species and had highly variable N-terminal domains, which may have facilitated immune evasion and persistent growth [78, 79].
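
Differences in non-synonymous substitution frequencies between paralogs, such as the two virB4 copies, are usually expressed as non-synonymous differences per non-synonymous site (pN) and synonymous differences per synonymous site (pS). The sketch below is a simplified Nei-Gojobori-style count using the standard genetic code. It is not the method used in [76] and rests on several assumptions. The two sequences must already be aligned, in frame and gap-free. Codons that differ at more than one position are skipped. Changes to stop codons count as non-synonymous. No multiple-hit correction is applied. Only the letters A, C, G and T are accepted.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one codon: each synonymous single-base change adds 1/3."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def pn_ps(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (pN, pS) for two aligned, in-frame, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue  # simplification: multi-difference codons are skipped
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2  # average over both copies
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else float("nan")
    p_s = syn_diffs / syn_sites if syn_sites else float("nan")
    return p_n, p_s


if __name__ == "__main__":
    # Hypothetical paralog fragments: AAA->AAG is synonymous (Lys), TTT->TTA is not (Phe->Leu).
    print(pn_ps("ATGAAATTT", "ATGAAGTTA"))
```

A pN/pS ratio well below 1 is usually read as purifying selection. A ratio that differs markedly between two paralogs is consistent with their having evolved under different constraints.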

Mobilome of intracellular bacteria

In recent years, a wealth of data on the distribution of mobile genetic elements in bacterial genomes has become available [38, 80]. Genomic studies so far indicate that most bacterial genomes contain elements of viral origin, which in some cases make up as much as 20% of the host genome [81]. Mobile DNA elements such as prophages contribute more than 50% of the strain-specific DNA in many important pathogens [82-84] and are common vehicles of virulence genes in bacteria [85-87]. Together, these elements constitute the mobilome, which includes transposable elements, plasmids, bacteriophages and associated genes whose horizontal movement is critical [88, 89]. Understanding the mobilome is therefore necessary for understanding intracellular bacterial genomes.

General distribution of the mobilome in intracellular bacteria

Few mobile genetic elements are observed in free-living organisms with larger genomes of 4–10 Mb. Facultative intracellular bacteria, which include some pathogenic bacteria, are not restricted by host replication and can live and reproduce either inside or outside host cells. Their genome sizes of 2–7 Mb are similar to those of some free-living organisms, and they have intermediate population sizes [90]. The numbers of mobile genetic elements found in obligate and facultative intracellular bacteria fall within similar ranges, but facultative intracellular species contain, on average, four-fold more mobile DNA elements than obligate intracellular bacteria. This observation is consistent with predictions that their mobile element content is similar to that of obligately free-living species [90]. Wolbachia pipientis is an exception among obligate intracellular bacteria. Its mobile genetic elements comprise less than 2% of its genome, an estimate similar to the lower end of the range for facultative intracellular bacterial species [91]. Reductive evolution is reflected in small genome sizes and deletion biases [92]. The order Rickettsiales shows reductive evolution but also contains various families of mobile elements, such as plasmids, transposases and phage-related genes [32, 61].
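
Figures such as "less than 2% of its genome" or "up to 20% of the host genome" are coverage fractions: the number of bases annotated as mobile DNA, divided by genome length. Because annotations often overlap (for example, an IS element inside a prophage), overlapping intervals must be merged so that no base is counted twice. The sketch below assumes 1-based, inclusive coordinates on a linear sequence, so an element spanning the origin of a circular chromosome would need to be split first. The function name and data are hypothetical.

```python
def mobile_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a genome covered by mobile-element annotations.

    intervals are (start, end) pairs, 1-based and inclusive; overlapping or
    adjacent annotations are merged so that no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if not 1 <= start <= end <= genome_length:
            raise ValueError(f"invalid interval ({start}, {end})")
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return covered / genome_length


if __name__ == "__main__":
    # Hypothetical annotations: two overlapping elements and one separate element.
    print(mobile_fraction([(1, 100), (50, 150), (1001, 1200)], 10_000))
```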

Types of mobile genetic elements

There are three main classes of mobile genetic elements in prokaryotes:

a) Plasmids are small, linear or circular, extrachromosomal DNA molecules that mostly replicate independently in the host and are themselves subject to evolution. Conjugative plasmids mediate lateral transfer from a donor to a recipient bacterial cell through direct contact between the cells.

b) Phage elements, as the name suggests, are derived from phages, which are viruses of bacteria that use the host machinery to replicate. The phage DNA enters the host cell and integrates into the bacterial genome as a prophage. Integrated prophage DNA is passively inherited until excision and phage-induced lysis of the bacterial cell take place [93].

c) Transposable elements are DNA segments, typically bounded by short inverted repeats, that encode proteins mediating their own movement. In a few cases, they are embedded in prophage regions [94].

Genome analysis of Rickettsiae revealed a large fraction of mobile DNA that mediates the movement of DNA within and between genomes [18]. Plasmids are termed conjugative when they can spread autonomously from cell to cell by conjugation. Recent genomic data and phylogenetic analyses have established the presence of conjugative plasmids and suggested the occurrence of LGT events in the genus Rickettsia [95].

Transposable elements

In Orientia tsutsugamushi, transposable elements constitute the largest portion of mobile DNA. A similar amplification of transposable elements has been noted in other intracellular bacteria:

- Wolbachia pipientis wMel: seven types of insertion sequence (IS) elements (51 copies in total) and four types of group II introns (17 copies) [91];
- Parachlamydia sp. UWE25: 82 IS transposases (TPases) [96];
- R. felis: 82 TPases [79];
- R. bellii: 39 TPases [97].

In O. tsutsugamushi, the transposable element copy number is 10 times higher than that of other obligate intracellular bacteria. Shigella dysenteriae contains the highest number of IS elements among prokaryotes, with 701 copies in a 4469 kb chromosome and a 183 kb plasmid [98]. The number of prophage genes per genome is intermediate between those of plasmids and transposable elements, and plasmid genes make up a notably small proportion [90]. The mobilomes of these intracellular bacterial genomes are dominated by transposable elements. These elements can integrate into a genome that already carries a copy of the same element and generally do not require a specific insertion site [90].
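
Because genome sizes differ widely, transposable element copy numbers are easier to compare when normalized by genome length. As a worked example, the calculation below uses only the Shigella dysenteriae figures given above: 701 IS copies in a 4469 kb chromosome plus a 183 kb plasmid.

```latex
\[
\text{IS density}
  = \frac{701}{(4469 + 183)\ \text{kb}}
  = \frac{701}{4652\ \text{kb}}
  \approx 0.151\ \text{copies per kb}
  \approx 151\ \text{copies per Mb}
\]
```

This is a simple genome-wide average. It ignores how the copies are distributed between the chromosome and the plasmid.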

In contrast, phages integrate at specific sites and confer immunity to superinfection. They also serve as vectors that carry other mobile elements, such as transposable elements, into a host genome [90]. The counts of transposable elements and of prophage-related genes found in intracellular prokaryotes differ strikingly. This is because a prophage genome comprises tens of genes, whereas a transposable element carries a single gene, encoding a transposase or a reverse transcriptase/maturase [99].

Repeated palindromic elements (RPEs)

Repeated elements are usually confined to the intergenic regions of bacterial genomes [100]. For some of these elements, a variable number of tandem repeats produces length variation between strains that has been used for genotyping [100, 101]. RPEs are well studied in Rickettsia spp., where they are approximately 100–150 bp long and invade both coding and noncoding regions of the genome [102-104]. Because they can insert themselves within an existing protein-coding frame, these RPEs often create new reading frame segments within preexisting genes. Each insertion adds a peptide segment of 30–50 amino acids to the final gene product, which is consistent with an in-frame insertion of roughly 100–150 bp. Repetitive DNA might be inserted with the help of plasmids. Repeats are important because of their roles in genomic instability and evolution. Bacterial chromosomes with a high repeat density also show high rates of rearrangement, leading to an accelerated loss of gene order [105]. Transposons and other extragenic interspersed repeats may function in gene rearrangement and duplication [106, 107].
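
Two quantities in this paragraph are simple to compute. The first is the number of residues an in-frame insertion adds, roughly one amino acid per three inserted bases. The second is the number of back-to-back copies of a repeat unit, which is the allele used in tandem-repeat genotyping. The helpers below are hypothetical sketches of both. Real tandem-repeat typing also has to tolerate imperfect copies, which this exact-match version does not.

```python
from typing import Optional


def inserted_residues(insert_length: int) -> Optional[int]:
    """Amino acids added by an insertion into a coding sequence.

    Returns None when the length is not a multiple of 3, because such an
    insertion shifts the downstream reading frame instead.
    """
    if insert_length < 0:
        raise ValueError("length must be non-negative")
    return insert_length // 3 if insert_length % 3 == 0 else None


def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of exact, back-to-back copies of unit anywhere in seq."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        while seq.startswith(unit, pos):  # walk forward one unit at a time
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


if __name__ == "__main__":
    print(inserted_residues(99), inserted_residues(150), inserted_residues(100))
    print(max_tandem_copies("GGACTACTACTTT", "ACT"))  # hypothetical locus
```

With this convention, in-frame insertions of 99 and 150 bp add 33 and 50 residues, in line with the 30–50 amino acid segments described above.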

Ankyrin and tetratricopeptide repeat proteins

Ankyrin (Ank) and tetratricopeptide repeat (TPR) proteins have been found in several intracellular bacteria and have roles in host-pathogen interactions. Genes encoding Ank and TPR proteins account for nearly 4% of the Rickettsia bellii and R. felis genomes [108]. These proteins participate in various functions, including chaperone activity, cell cycle regulation, transcription, gene regulation, signal transduction and protein transport [109-113]. In L. pneumophila, TPR proteins help establish infection and manipulate host cell trafficking events [57, 114, 115]. Ank proteins of Anaplasma spp., Wolbachia spp. and Ehrlichia spp., by contrast, are translocated into the host cell cytoplasm and nucleus. There they play a dual role, interfering with host cell signaling by interacting with the host cytoskeleton and altering gene transcription by binding to host chromatin [115]. The deletion or mutation of genes encoding Ank proteins has been associated with the reduced virulence of Rickettsia peacockii and Rickettsia rickettsii strain Iowa compared with R. rickettsii strain Sheila Smith [114].

Secretion system machinery in intracellular bacteria

Interactions between intracellular bacteria and their host cells are often mediated by type IV secretion systems (T4SSs). These systems are required for bacterial colonization, invasion and persistence within the niche, and they consist of supramolecular transporters ancestrally related to bacterial conjugation systems. They are multiprotein complexes embedded in the bacterial cell envelope. This type of secretion system has been well studied in Rickettsia [79, 97], Bartonella [116], Wolbachia [117], L. pneumophila [5], Neorickettsia sennetsu [118], N. risticii [118] and O. tsutsugamushi [119]. T4SSs can transport diverse macromolecular substrates, including proteins and virulence factors, and can also transfer DNA through bacterial conjugation [30, 120-123]. Genes encoding T4SS components (VirB/VirD4 and Trw) have been found in several Bartonella species [116]. In Bartonella rattaustraliani, the plasmid pNH4 encodes a T4SS containing a complete set of proteins responsible for conjugal transfer, i.e., TraA, TraC, TraD and TraG/VirD4 [116]. These systems are described as essential pathogenicity factors in several mammalian pathogens, including Bartonella henselae and Bartonella tribocorum [116]. The main roles of T4SSs are to translocate virulence factors into hosts and to promote DNA transfer [121]. The protein encoded by traA initiates DNA transfer by nicking DNA at a site- and strand-specific position [124], while TraC is necessary for the assembly of F pilin into the mature F pilus [125]. The coupling protein TraD is essential for DNA transfer because it connects the DNA processing machinery to the mating pair formation (Mpf) apparatus [126]. TraG is critical for the translocation of substrates across the inner membrane [127]. T4SSs in the genus Bartonella are typically located on chromosomes, although B. grahamii carries a T4SS on its plasmid pBGR3 [128].

In L. pneumophila, the Dot/Icm T4SS inhibits phagosome-lysosome fusion and promotes recruitment of the rough endoplasmic reticulum, supporting replication in the host cell. The components of the dot/icm loci are classified as a T4SS because of their homology with genes of bacterial conjugation systems. In Legionella, this T4SS is encoded by 26 dot/icm genes arranged in two distinct chromosomal regions, each approximately 20 kb long. Region I contains dotDCB and dotA-icmVWX [129], and region II contains 18 genes, most of which are dot and icm genes [130]. The dot/icm loci of five sequenced L. pneumophila strains show very high nucleotide conservation, ranging from 98 to 100% identity among most orthologs. The exceptions are dotA and icmX. In addition, the icmC gene of strain Corby is shorter than that of strain Paris and more divergent from it (84% nucleotide identity). Sequence comparisons of the dot/icm genes with other known open reading frames revealed that at least 18 of them are similar to components of bacterial conjugative DNA transfer systems. The closest similarity is to those of the IncI plasmids ColIb-P9 from Shigella flexneri and R64 from Salmonella enterica [130].
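
The conservation figures above, 98–100% among most orthologs and 84% for icmC, are percent nucleotide identities. The exact value depends both on the alignment and on the denominator chosen. The sketch below assumes the two sequences have already been aligned, for example with a global aligner, and marks gaps with '-'. It uses one common convention: identical positions divided by the alignment columns in which neither sequence has a gap. Tools that divide by the full alignment length or by the shorter sequence will report different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two sequences taken from a pairwise alignment.

    Identical positions are divided by the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded under this convention
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 4 identical of 5 gap-free columns.
    print(percent_identity("ATGC-A", "ATGCTT"))
```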

Bacterial genome data suggest that these T4SSs are not limited to Legionella, its relatives and IncI plasmids [131]. Interestingly, nearly all such T4SSs found in sequence analyses are encoded on plasmids [132]. Notable exceptions are the Dot/Icm systems of Legionella, Coxiella and Rickettsiella. A common ancestor of these closely related bacteria likely acquired a chromosomally encoded T4SS that played a critical role in its survival, and this chromosomal acquisition might be related to the ancestor's adaptation to an intracellular lifestyle. The genes encoding T4SSs tend to be grouped into several conserved gene clusters rather than a single locus, suggesting that there is little pressure to keep them together. The conserved gene clusters include:

(a) dotD-dotC-dotB (traH-traI-traJ in I-type conjugation systems);
(b) dotM/icmP-dotL/icmO (trbA-trbC);
(c) dotI/icmL-dotH/icmK-dotG/icmE (traM-traN-traO).

Together with the other genes found in all such T4SSs, including dotA (traY) and dotO/icmB (traU), these conserved genes are expected to encode core components that play fundamental roles in transport [131]. Within the dot/icm system, dotH, dotI and dotO are essential for intracellular growth and evasion of the endocytic pathway, whereas icmGCDJBF and icmTSRQPO are involved in macrophage cell death [133]. T4SSs are critical for survival in the intracellular niche and may have allowed subsequent specialization as mammalian pathogens [116].
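
Whether a genome keeps the conserved clusters (a) to (c) can be screened from an ordered list of gene names along the chromosome, by checking that each cluster appears as adjacent genes read in either direction. The sketch below makes several assumptions. Gene names have already been assigned, and icm names are used where a gene has both a dot and an icm name. Strand and small intervening ORFs are ignored. The chromosome is treated as linear. The gene order in the example is hypothetical.

```python
from typing import Sequence

# Conserved clusters from the text; icm names used where both names exist.
CONSERVED_CLUSTERS = {
    "a": ("dotD", "dotC", "dotB"),
    "b": ("icmP", "icmO"),          # dotM/icmP - dotL/icmO
    "c": ("icmL", "icmK", "icmE"),  # dotI/icmL - dotH/icmK - dotG/icmE
}


def contains_block(order: Sequence[str], block: Sequence[str]) -> bool:
    """True if block occurs as adjacent genes in order, read in either direction."""
    n = len(block)
    targets = (tuple(block), tuple(reversed(block)))
    return any(tuple(order[i:i + n]) in targets for i in range(len(order) - n + 1))


def cluster_report(order: Sequence[str]) -> dict[str, bool]:
    """Map each conserved cluster label to whether it is found intact."""
    return {name: contains_block(order, genes) for name, genes in CONSERVED_CLUSTERS.items()}


if __name__ == "__main__":
    gene_order = ["orf1", "dotD", "dotC", "dotB", "orf2", "icmO", "icmP"]  # hypothetical
    print(cluster_report(gene_order))
```

Cluster (b) is reported as present here even though it is read in reverse, because reverse order corresponds to the cluster lying on the opposite strand.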

Concluding remarks

The genomic era has paved the way for major findings regarding intracellular bacteria. Symbiosis between unicellular and multicellular organisms has contributed considerably to the evolution of life. Intracellular bacteria occupy a wide range of niches and have followed various evolutionary trajectories, resulting in different genomic compositions. Given the endosymbiotic origin of mitochondria and other eukaryotic organelles, we believe that the intracellular lifestyle is ancient and continually co-evolves with the host.

Comparing bacterial genomic content and lifestyles has revealed that the capacity to exchange genes depends on the bacterial niche. Allopatric speciation in bacteria is linked to a restricted opportunity to exchange genes with other organisms, and in this setting gene duplications, mutations and deletions are observed more often. The sympatric lifestyle is linked with larger genomes, larger pan-genomes, a larger mobilome and genetic exchanges with other bacteria. In contrast, the mutualistic relationships between obligate intracellular bacteria and their host cells have likely promoted marked genome reduction. Two explanations have been proposed for this reduction: the intracellular niche reduces the opportunity for gene acquisition by lateral gene transfer, and genes are lost upon adaptation to the niche.

Comparative analyses of bacterial genomes from different lifestyles, including free-living and host-dependent bacteria, show that host-dependent bacteria have fewer rRNA genes, more split rRNA operons and fewer transcriptional regulators, characteristics that are linked to slow growth rates. Abnormal or split ribosomal operons appear to have arisen independently in several groups of specialized bacteria. Bacteria that do not use many of their ribosomal operons are likely to lose them. Restricting translation appears critical for specialization, as speciation is often correlated with ribosomal operon inactivation.

Lamarckian evolution may have played a role in bacterial speciation events associated with a reduction in genome size. This hypothesis contradicts the dominant model, which assumes that speciation and fitness gains are linked with an increase in the gene repertoire. Gene duplication facilitates bacterial adaptation to changing environments and the use of new niches. Gene copies often differ in their non-synonymous substitution frequencies, indicating different functions and contrasting selective constraints within the same genome. The numbers of mobile genetic elements found in obligate and facultative intracellular bacteria fall within similar ranges, but facultative intracellular species contain, on average, four-fold more mobile DNA elements than obligate intracellular bacteria. This observation is consistent with predictions that their mobile element content is similar to that of obligately free-living species. Repeated palindromic elements have important roles in genomic instability and evolution.

Intracellular bacteria possess mechanisms to protect themselves against host cells or to invade them. Interactions between intracellular bacteria and host cells are often mediated by type IV secretion systems (T4SSs). These supramolecular transporters, ancestrally related to bacterial conjugation systems, are required for bacterial colonization, invasion and persistence within the niche. The main roles of T4SSs are to translocate virulence factors into hosts and to promote DNA transfer. In L. pneumophila, the Dot/Icm T4SS inhibits phagosome-lysosome fusion and promotes recruitment of the rough endoplasmic reticulum, supporting replication in the host cell. T4SSs are critical for bacterial survival in the intracellular niche and may have allowed subsequent specialization as mammalian pathogens. These systems are common in intracellular bacteria and appear to have been acquired from different origins, suggesting that genomes have converged to adapt to a common lifestyle.

Sequencing additional intracellular bacterial genomes will provide a more precise picture of the genetic properties associated with the intracellular lifestyle. This effort will also contribute to a better understanding of the interactions between intracellular bacteria and their different niches, and of the complex mechanisms involved in pathogenicity.

Acknowledgements

We would like to thank Roshan Padmanabhan for his support, suggestions and corrections, and Ripsy Merrin Chacko for helpful remarks.

References:

1. Zientz, E., T. Dandekar, and R. Gross, Metabolic interdependence of obligate intracellular bacteria and their insect hosts. Microbiology and molecular biology reviews : MMBR, 2004. 68(4): p. 745-70.

2. Gross, R., J. Hacker, and W. Goebel, The Leopoldina international symposium on parasitism, commensalism and symbiosis--common themes, different outcome. Molecular microbiology, 2003. 47(6): p. 1749-58.

3. Finlay, B.B. and S. Falkow, Common themes in microbial pathogenicity revisited. Microbiology and molecular biology reviews : MMBR, 1997. 61(2): p. 136-69.

4. Fernandez-Moreira, E., J.H. Helbig, and M.S. Swanson, Membrane vesicles shed by Legionella pneumophila inhibit fusion of phagosomes with lysosomes. Infection and immunity, 2006. 74(6): p. 3285-95.

5. D'Auria, G., et al., Legionella pneumophila pangenome reveals strain-specific virulence factors. BMC genomics, 2010. 11: p. 181.

6. Pilsczek, F.H., A. Nicholson-Weller, and I. Ghiran, Phagocytosis of Salmonella montevideo by human neutrophils: immune adherence increases phagocytosis, whereas the bacterial surface determines the route of intracellular processing. The Journal of infectious diseases, 2005. 192(2): p. 200-9.

7. Friedland, J.S., R.J. Shattock, and G.E. Griffin, Phagocytosis of Mycobacterium tuberculosis or particulate stimuli by human monocytic cells induces equivalent monocyte chemotactic protein-1 gene expression. Cytokine, 1993. 5(2): p. 150-6.

8. Gil, R., A. Latorre, and A. Moya, Bacterial endosymbionts of insects: insights from comparative genomics. Environmental microbiology, 2004. 6(11): p. 1109-22.

9. Renvoise, A., et al., Intracellular Rickettsiales: Insights into manipulators of eukaryotic cells. Trends in molecular medicine, 2011. 17(10): p. 573-83.

10. Douglas, A.E., Mycetocyte symbiosis in insects. Biological reviews of the Cambridge Philosophical Society, 1989. 64(4): p. 409-34.

11. Moran, N.A. and P. Baumann, Bacterial endosymbionts in animals. Current opinion in microbiology, 2000. 3(3): p. 270-5.

12. Stepkowski, T. and A.B. Legocki, Reduction of bacterial genome size and expansion resulting from obligate intracellular lifestyle and adaptation to soil habitat. Acta biochimica Polonica, 2001. 48(2): p. 367-81.

13. Margulis, L. and R. Fester, Symbiosis as a Source of Evolutionary Innovation:

Speciation and Morphogenesis. 1991: The MIT Press.

14. Margulis, L., Symbiosis and evolution. Scientific American, 1971. 225(2): p. 48-

57.


15. Margulis, L., The origin of plant and animal cells. American scientist, 1971.

59(2): p. 230-5.

16. von Dohlen, C.D., et al., Mealybug beta-proteobacterial endosymbionts

contain gamma-proteobacterial symbionts. Nature, 2001. 412(6845): p. 433-6.

17. Corsaro, D., et al., Intracellular life. Critical reviews in microbiology, 1999.

25(1): p. 39-79.

18. Merhej, V. and D. Raoult, Rickettsial evolution in the light of comparative

genomics. Biological reviews of the Cambridge Philosophical Society, 2011.

86(2): p. 379-405.

19. Werren, J.H., L. Baldo, and M.E. Clark, Wolbachia: master manipulators of

invertebrate biology. Nature reviews. Microbiology, 2008. 6(10): p. 741-51.

20. McNulty, S.N., et al., Endosymbiont DNA in endobacteria-free filarial

nematodes indicates ancient horizontal genetic transfer. PloS one, 2010. 5(6):

p. e11029.

21. Klasson, L., et al., Horizontal gene transfer between Wolbachia and the

mosquito Aedes aegypti. BMC genomics, 2009. 10: p. 33.

22. Koonin, E.V., The origin and early evolution of eukaryotes in the light of

phylogenomics. Genome biology, 2010. 11(5): p. 209.

23. Dunning Hotopp, J.C., et al., Widespread lateral gene transfer from

intracellular bacteria to multicellular eukaryotes. Science, 2007. 317(5845): p.

1753-6.

24. Georgiades, K. and D. Raoult, The rhizome of Reclinomonas americana, Homo

sapiens, Pediculus humanus and Saccharomyces cerevisiae mitochondria.

Biology direct, 2011. 6: p. 55.

25. Georgiades, K., et al., Phylogenomic analysis of Odyssella thessalonicensis

fortifies the common origin of Rickettsiales, Pelagibacter ubique and

Reclimonas americana mitochondrion. PloS one, 2011. 6(9): p. e24857.

26. Casadevall, A., Evolution of intracellular pathogens. Annual review of

microbiology, 2008. 62: p. 19-33.

27. Whitman, W.B., The modern concept of the procaryote. J Bacteriol, 2009.

191(7): p. 2000-5; discussion 2006-7.

28. Georgiades, K., et al., Gene gain and loss events in Rickettsia and Orientia

species. Biology direct, 2011. 6: p. 6.

29. Gimenez, G., et al., Insight into cross-talk between intra-amoebal pathogens.

BMC genomics, 2011. 12: p. 542.

30. Moliner, C., P.E. Fournier, and D. Raoult, Genome analysis of microorganisms

living in amoebae reveals a melting pot of evolution. FEMS microbiology

reviews, 2010. 34(3): p. 281-94.

31. Audic, S., et al., Genome analysis of Minibacterium massiliensis highlights the

convergent evolution of water-living bacteria. PLoS Genet, 2007. 3(8): p. e138.


32. Merhej, V., et al., The rhizome of life: the sympatric Rickettsia felis paradigm

demonstrates the random transfer of DNA sequences. Molecular biology and

evolution, 2011. 28(11): p. 3213-23.

33. Marco, D., Metagenomics and the niche concept. Theory in biosciences =

Theorie in den Biowissenschaften, 2008. 127(3): p. 241-7.

34. Wernegreen, J.J., Genome evolution in bacterial endosymbionts of insects.

Nature reviews. Genetics, 2002. 3(11): p. 850-61.

35. Mira, A., H. Ochman, and N.A. Moran, Deletional bias and the evolution of

bacterial genomes. Trends in genetics : TIG, 2001. 17(10): p. 589-96.

36. Tamas, I., et al., 50 million years of genomic stasis in endosymbiotic bacteria.

Science, 2002. 296(5577): p. 2376-9.

37. Wernegreen, J.J., For better or worse: genomic consequences of intracellular

mutualism and parasitism. Current opinion in genetics & development, 2005.

15(6): p. 572-83.

38. Moran, N.A. and G.R. Plague, Genomic changes following host restriction in

bacteria. Current opinion in genetics & development, 2004. 14(6): p. 627-33.

39. Darby, A.C., et al., Intracellular pathogens go extreme: genome evolution in

the Rickettsiales. Trends in genetics : TIG, 2007. 23(10): p. 511-20.

40. Moran, N.A., Microbial minimalism: genome reduction in bacterial pathogens.

Cell, 2002. 108(5): p. 583-6.

41. Toft, C. and S.G. Andersson, Evolutionary microbial genomics: insights into

bacterial host adaptation. Nature reviews. Genetics, 2010. 11(7): p. 465-75.

42. Degnan, P.H., A.B. Lazarus, and J.J. Wernegreen, Genome sequence of

Blochmannia pennsylvanicus indicates parallel evolutionary trends among

bacterial mutualists of insects. Genome research, 2005. 15(8): p. 1023-33.

43. Moran, N.A., Accelerated evolution and Muller's rachet in endosymbiotic

bacteria. Proceedings of the National Academy of Sciences of the United

States of America, 1996. 93(7): p. 2873-8.

44. Nakabachi, A., et al., The 160-kilobase genome of the bacterial endosymbiont

Carsonella. Science, 2006. 314(5797): p. 267.

45. van Ham, R.C., et al., Reductive genome evolution in Buchnera aphidicola.

Proceedings of the National Academy of Sciences of the United States of

America, 2003. 100(2): p. 581-6.

46. Moran, N.A., J.P. McCutcheon, and A. Nakabachi, Genomics and evolution of

heritable bacterial symbionts. Annual Review of Genetics, 2008. 42: p. 165-90.

47. Merhej, V., et al., Massive comparative genomic analysis reveals convergent

evolution of specialized bacteria. Biology direct, 2009. 4: p. 13.

48. McCutcheon, J.P. and N.A. Moran, Parallel genomic evolution and metabolic

interdependence in an ancient symbiosis. Proceedings of the National

Academy of Sciences of the United States of America, 2007. 104(49): p.

19392-7.


49. Perez-Brocal, V., et al., A small microbial genome: the end of a long symbiotic

relationship? Science, 2006. 314(5797): p. 312-3.

50. Shigenobu, S., et al., Genome sequence of the endocellular bacterial symbiont

of aphids Buchnera sp. APS. Nature, 2000. 407(6800): p. 81-6.

51. Arneodo, J.D., et al., Ultrastructural detection of an unusual intranuclear

bacterium in Pentastiridius leporinus (Hemiptera: Cixiidae). Journal of

invertebrate pathology, 2008. 97(3): p. 310-3.

52. Sassera, D., et al., 'Candidatus Midichloria mitochondrii', an endosymbiont of

the tick Ixodes ricinus with a unique intramitochondrial lifestyle. International

journal of systematic and evolutionary microbiology, 2006. 56(Pt 11): p. 2535-

40.

53. Moran, N.A., et al., The players in a mutualistic symbiosis: insects, bacteria,

viruses, and virulence genes. Proceedings of the National Academy of Sciences

of the United States of America, 2005. 102(47): p. 16919-26.

54. Fraser-Liggett, C.M., Insights on biology and evolution from microbial genome

sequencing. Genome research, 2005. 15(12): p. 1603-10.

55. Renesto, P., et al., Some lessons from Rickettsia genomics. FEMS microbiology

reviews, 2005. 29(1): p. 99-117.

56. Wernegreen, J.J., A.B. Lazarus, and P.H. Degnan, Small genome of Candidatus

Blochmannia, the bacterial endosymbiont of Camponotus, implies irreversible

specialization to an intracellular lifestyle. Microbiology, 2002. 148(Pt 8): p.

2551-6.

57. Cazalet, C., et al., Evidence in the Legionella pneumophila genome for

exploitation of host cell functions and high genome plasticity. Nature genetics,

2004. 36(11): p. 1165-73.

58. Fournier, P.E., et al., Analysis of the Rickettsia africae genome reveals that

virulence acquisition in Rickettsia species may be explained by genome

reduction. BMC genomics, 2009. 10: p. 166.

59. Ogata, H., et al., Mechanisms of evolution in Rickettsia conorii and R.

prowazekii. Science, 2001. 293(5537): p. 2093-8.

60. Rolain, J.M., et al., Partial disruption of translational and posttranslational

machinery reshapes growth rates of Bartonella birtlesii. mBio, 2013. 4(2): p.

e00115-13.

61. Blanc, G., et al., Reductive genome evolution from the mother of Rickettsia.

PLoS genetics, 2007. 3(1): p. e14.

62. Moran, N.A., J.P. McCutcheon, and A. Nakabachi, Genomics and evolution of

heritable bacterial symbionts. Annual Review of Genetics, 2008. 42: p. 165-90.

63. Fares, M.A., A. Moya, and E. Barrio, GroEL and the maintenance of bacterial

endosymbiosis. Trends in genetics : TIG, 2004. 20(9): p. 413-6.

64. McCutcheon, J.P. and N.A. Moran, Extreme genome reduction in symbiotic

bacteria. Nature reviews. Microbiology, 2012. 10(1): p. 13-26.


65. Moran, N.A., H.E. Dunbar, and J.L. Wilcox, Regulation of transcription in a

reduced bacterial genome: nutrient-provisioning genes of the obligate

symbiont Buchnera aphidicola. J Bacteriol, 2005. 187(12): p. 4229-37.

66. Wilcox, J.L., et al., Consequences of reductive evolution for gene expression in

an obligate endosymbiont. Molecular microbiology, 2003. 48(6): p. 1491-500.

67. Fares, M.A., et al., Endosymbiotic bacteria: groEL buffers against deleterious

mutations. Nature, 2002. 417(6887): p. 398.

68. Koonin, E.V., Darwinian evolution in the light of genomics. Nucleic acids

research, 2009. 37(4): p. 1011-34.

69. Colson, P. and D. Raoult, Lamarckian evolution of the giant Mimivirus in

allopatric laboratory culture on amoebae. Frontiers in cellular and infection

microbiology, 2012. 2: p. 91.

70. Georgiades, K. and D. Raoult, Defining pathogenic bacterial species in the

genomic era. Frontiers in microbiology, 2010. 1: p. 151.

71. Bliven, K.A. and A.T. Maurelli, Antivirulence genes: insights into pathogen

evolution through gene loss. Infect Immun, 2012. 80(12): p. 4061-70.

72. Hooper, S.D. and O.G. Berg, On the nature of gene innovation: duplication

patterns in microbial genomes. Molecular biology and evolution, 2003. 20(6):

p. 945-54.

73. Schmitz-Esser, S., et al., ATP/ADP translocases: a common feature of obligate

intracellular amoebal symbionts related to Chlamydiae and Rickettsiae. J

Bacteriol, 2004. 186(3): p. 683-91.

74. Aravind, L., et al., Evidence for massive gene exchange between archaeal and

bacterial hyperthermophiles. Trends in genetics : TIG, 1998. 14(11): p. 442-4.

75. Walsh, J.B., How often do duplicated genes evolve new functions? Genetics,

1995. 139(1): p. 421-8.

76. Frank, A.C., H. Amiri, and S.G. Andersson, Genome deterioration: loss of

repeated sequences and accumulation of junk DNA. Genetica, 2002. 115(1): p.

1-12.

77. Braeken, L., B. Van der Bruggen, and C. Vandecasteele, Flux decline in

nanofiltration due to adsorption of dissolved organic compounds: model

prediction of time dependency. The journal of physical chemistry. B, 2006.

110(6): p. 2957-62.

78. Blanc, G., et al., Molecular evolution of rickettsia surface antigens: evidence of

positive selection. Molecular biology and evolution, 2005. 22(10): p. 2073-83.

79. Ogata, H., et al., The genome sequence of Rickettsia felis identifies the first

putative conjugative plasmid in an obligate intracellular parasite. PLoS

biology, 2005. 3(8): p. e248.

80. Dai, L., et al., Database for mobile group II introns. Nucleic acids research,

2003. 31(1): p. 424-6.

81. Casjens, S., Prophages and bacterial genomics: what have we learned so far?

Molecular microbiology, 2003. 49(2): p. 277-300.


82. Van Sluys, M.A., et al., Comparative analyses of the complete genome

sequences of Pierce's disease and citrus variegated chlorosis strains of Xylella

fastidiosa. J Bacteriol, 2003. 185(3): p. 1018-26.

83. Banks, D.J., S.B. Beres, and J.M. Musser, The fundamental contribution of

phages to GAS evolution, genome diversification and strain emergence. Trends

in microbiology, 2002. 10(11): p. 515-21.

84. Ohnishi, M., K. Kurokawa, and T. Hayashi, Diversification of Escherichia coli

genomes: are bacteriophages the major contributors? Trends in microbiology,

2001. 9(10): p. 481-5.

85. Boyd, E.F. and H. Brussow, Common themes among bacteriophage-encoded

virulence factors and diversity among the bacteriophages involved. Trends in

microbiology, 2002. 10(11): p. 521-9.

86. Boyd, E.F., B.M. Davis, and B. Hochhut, Bacteriophage-bacteriophage

interactions in the evolution of pathogenic bacteria. Trends in microbiology,

2001. 9(3): p. 137-44.

87. Miao, E.A. and S.I. Miller, Bacteriophages in the evolution of pathogen-host

interactions. Proceedings of the National Academy of Sciences of the United

States of America, 1999. 96(17): p. 9452-4.

88. Koonin, E.V. and Y.I. Wolf, Genomics of bacteria and archaea: the emerging

dynamic view of the prokaryotic world. Nucleic acids research, 2008. 36(21):

p. 6688-719.

89. Frost, L.S., et al., Mobile genetic elements: the agents of open source

evolution. Nature reviews. Microbiology, 2005. 3(9): p. 722-32.

90. Bordenstein, S.R. and W.S. Reznikoff, Mobile DNA in obligate intracellular

bacteria. Nature reviews. Microbiology, 2005. 3(9): p. 688-99.

91. Wu, M., et al., Phylogenomics of the reproductive parasite Wolbachia pipientis

wMel: a streamlined genome overrun by mobile genetic elements. PLoS

biology, 2004. 2(3): p. E69.

92. Andersson, S.G., et al., Comparative genomics of microbial pathogens and

symbionts. Bioinformatics, 2002. 18 Suppl 2: p. S17.

93. Simek, K., et al., Changes in bacterial community composition and dynamics

and viral mortality rates associated with enhanced flagellate grazing in a

mesoeutrophic reservoir. Appl Environ Microbiol, 2001. 67(6): p. 2723-33.

94. Simser, J.A., et al., A novel and naturally occurring transposon, ISRpe1 in the

Rickettsia peacockii genome disrupting the rickA gene involved in actin-based

motility. Molecular microbiology, 2005. 58(1): p. 71-9.

95. Blanc, G., et al., Lateral gene transfer between obligate intracellular bacteria:

evidence from the Rickettsia massiliae genome. Genome research, 2007.

17(11): p. 1657-64.

96. Horn, M., et al., Illuminating the evolutionary history of chlamydiae. Science,

2004. 304(5671): p. 728-30.


97. Ogata, H., et al., Genome sequence of Rickettsia bellii illuminates the role of

amoebae in gene exchanges between intracellular pathogens. PLoS Genet,

2006. 2(5): p. e76.

98. Yang, F., et al., Genome dynamics and diversity of Shigella species, the

etiologic agents of bacillary dysentery. Nucleic acids research, 2005. 33(19): p.

6445-58.

99. Labrador, M. and V.G. Corces, Transposable element-host interactions:

regulation of insertion and excision. Annu Rev Genet, 1997. 31: p. 381-404.

100. van Belkum, A., et al., Short-sequence DNA repeats in prokaryotic genomes.

Microbiology and molecular biology reviews : MMBR, 1998. 62(2): p. 275-93.

101. Fournier, P.E., et al., Use of highly variable intergenic spacer sequences for

multispacer typing of Rickettsia conorii strains. Journal of clinical

microbiology, 2004. 42(12): p. 5757-66.

102. Amiri, H., C.M. Alsmark, and S.G. Andersson, Proliferation and deterioration of

Rickettsia palindromic elements. Molecular biology and evolution, 2002.

19(8): p. 1234-43.

103. Claverie, J.M. and H. Ogata, The insertion of palindromic repeats in the

evolution of proteins. Trends in biochemical sciences, 2003. 28(2): p. 75-80.

104. Ogata, H., et al., Selfish DNA in protein-coding genes of Rickettsia. Science,

2000. 290(5490): p. 347-50.

105. Rocha, E.P., DNA repeats lead to the accelerated loss of gene order in bacteria.

Trends in genetics : TIG, 2003. 19(11): p. 600-3.

106. Baldridge, G.D., et al., Transposon insertion reveals pRM, a plasmid of

Rickettsia monacensis. Appl Environ Microbiol, 2007. 73(15): p. 4984-95.

107. Moran, J.V., R.J. DeBerardinis, and H.H. Kazazian, Jr., Exon shuffling by L1

retrotransposition. Science, 1999. 283(5407): p. 1530-4.

108. Ogata, H., et al., Rickettsia felis, from culture to genome sequencing. Annals of

the New York Academy of Sciences, 2005. 1063: p. 26-34.

109. Li, J., A. Mahajan, and M.D. Tsai, Ankyrin repeat: a unique motif mediating

protein-protein interactions. Biochemistry, 2006. 45(51): p. 15168-78.

110. Mosavi, L.K., et al., The ankyrin repeat as molecular architecture for protein

recognition. Protein science : a publication of the Protein Society, 2004. 13(6):

p. 1435-48.

111. Rubtsov, A.M. and O.D. Lopina, Ankyrins. FEBS letters, 2000. 482(1-2): p. 1-5.

112. Blatch, G.L. and M. Lassle, The tetratricopeptide repeat: a structural motif

mediating protein-protein interactions. BioEssays : news and reviews in

molecular, cellular and developmental biology, 1999. 21(11): p. 932-9.

113. Bork, P., Hundreds of ankyrin-like repeats in functionally diverse proteins:

mobile modules that cross phyla horizontally? Proteins, 1993. 17(4): p. 363-74.

114. Felsheim, R.F., T.J. Kurtti, and U.G. Munderloh, Genome sequence of the

endosymbiont Rickettsia peacockii and comparison with virulent Rickettsia

rickettsii: identification of virulence factors. PloS one, 2009. 4(12): p. e8361.


115. Caturegli, P., et al., ankA: an Ehrlichia phagocytophila group gene encoding a

cytoplasmic protein antigen with ankyrin repeats. Infection and immunity,

2000. 68(9): p. 5277-83.

116. Saisongkorh, W., et al., Evidence of transfer by conjugation of type IV secretion

system genes between Bartonella species and Rhizobium radiobacter in

amoeba. PloS one, 2010. 5(9): p. e12666.

117. Saridaki, A. and K. Bourtzis, Wolbachia: more than just a bug in insects

genitals. Current opinion in microbiology, 2010. 13(1): p. 67-72.

118. Lin, M., et al., Analysis of complete genome sequence of Neorickettsia risticii:

causative agent of Potomac horse fever. Nucleic acids research, 2009. 37(18):

p. 6076-91.

119. Cho, N.H., et al., The Orientia tsutsugamushi genome reveals massive

proliferation of conjugative type IV secretion system and host-cell interaction

genes. Proceedings of the National Academy of Sciences of the United States

of America, 2007. 104(19): p. 7981-6.

120. Burns, D.L., Type IV transporters of pathogenic bacteria. Current opinion in

microbiology, 2003. 6(1): p. 29-34.

121. Christie, P.J., Type IV secretion: intercellular transfer of macromolecules by

systems ancestrally related to conjugation machines. Molecular microbiology,

2001. 40(2): p. 294-305.

122. Christie, P.J. and J.P. Vogel, Bacterial type IV secretion: conjugation systems

adapted to deliver effector molecules to host cells. Trends in microbiology,

2000. 8(8): p. 354-60.

123. Deng, W., et al., VirE1 is a specific molecular chaperone for the exported

single-stranded-DNA-binding protein VirE2 in Agrobacterium. Molecular

microbiology, 1999. 31(6): p. 1795-807.

124. Chen, I., P.J. Christie, and D. Dubnau, The ins and outs of DNA transfer in

bacteria. Science, 2005. 310(5753): p. 1456-60.

125. Schandel, K.A., M.M. Muller, and R.E. Webster, Localization of TraC, a protein

involved in assembly of the F conjugative pilus. J Bacteriol, 1992. 174(11): p.

3800-6.

126. Beranek, A., et al., Thirty-eight C-terminal amino acids of the coupling protein

TraD of the F-like conjugative resistance plasmid R1 are required and sufficient

to confer binding to the substrate selector protein TraM. J Bacteriol, 2004.

186(20): p. 6999-7006.

127. Schroder, G. and E. Lanka, TraG-like proteins of type IV secretion systems:

functional dissection of the multiple activities of TraG (RP4) and TrwB (R388). J

Bacteriol, 2003. 185(15): p. 4371-81.

128. Berglund, E.C., et al., Run-off replication of host-adaptability genes is

associated with gene transfer agents in the genome of mouse-infecting

Bartonella grahamii. PLoS genetics, 2009. 5(7): p. e1000546.


129. Matthews, M. and C.R. Roy, Identification and subcellular localization of the

Legionella pneumophila IcmX protein: a factor essential for establishment of a

replicative organelle in eukaryotic host cells. Infection and immunity, 2000.

68(7): p. 3971-82.

130. Vogel, J.P., et al., Conjugative transfer by the virulence system of Legionella

pneumophila. Science, 1998. 279(5352): p. 873-6.

131. Nagai, H. and T. Kubori, Type IVB Secretion Systems of Legionella and Other

Gram-Negative Bacteria. Frontiers in microbiology, 2011. 2: p. 136.

132. Nora, T., et al., Molecular mimicry: an important virulence strategy employed

by Legionella pneumophila to subvert host functions. Future microbiology,

2009. 4(6): p. 691-701.

133. Andrews, H.L., J.P. Vogel, and R.R. Isberg, Identification of linked Legionella

pneumophila genes essential for intracellular growth and evasion of the

endocytic pathway. Infection and immunity, 1998. 66(3): p. 950-8.


Table S1: Some of the main sequenced intracellular bacterial genomes (as of October 2013), showing genome size (Mb), GC content (%), number of protein-coding genes, number of plasmids, year of publication and lifestyle. Lifestyle: FI, facultative intracellular; OI, obligate intracellular.

Lifestyle Bacterium Size (Mb) GC (%) Protein-coding genes Plasmids Year Citation

Gammaproteobacteria

OI Buchnera aphidicola Acyrthosiphon pisum 5p 0.64 26 555 0 2009 1

OI Buchnera aphidicola Acyrthosiphon pisum 0.64 26 564 2 2001 2

OI Buchnera aphidicola Baizongia pistaciae 0.62 25 504 1 2003 3

OI Buchnera aphidicola Cinara cedri 0.42 20 357 1 2006 4

OI Buchnera aphidicola Schizaphis graminum 0.64 25 546 0 2002 5

OI Buchnera aphidicola Acyrthosiphon pisum Tuc7 0.64 26 553 0 2009 1

OI Wigglesworthia glossinidia 0.7 22 611 1 2002 6

OI Candidatus Blochmannia floridanus 0.71 27 583 0 2003 7

OI Candidatus Blochmannia pennsylvanicus str. BPEN 0.79 29 610 0 2005 8

OI Baumannia cicadellinicola str Hc 0.69 33 595 0 2006 9

FI Sodalis glossinidius 4.17 54 2432 3 2006 10

OI Candidatus Hamiltonella defensa 5AT 2.1 40 2094 1 2009 11

FI Photorhabdus asymbiotica 5.06 42 4390 1 2009 12,13

OI Candidatus Carsonella ruddii 0.16 16 182 0 2006 14

FI Shigella flexneri 2a 2457T 4.6 50 4060 0 2003 15

FI Shigella flexneri 2a 301 4.6 50 4176 1 2002 16

FI Shigella flexneri 2a 5 str. 8401 4.57 50 4114 0 2006 17

FI Legionella pneumophila Corby 3.58 38 3204 0 2007 18

FI Legionella pneumophila Lens 3.35 38 2878 1 2004 19

FI Legionella pneumophila Paris 3.5 38 3027 1 2004 20

FI Legionella pneumophila subsp. pneumophila str. Philadelphia 1 3.4 38 2942 0 2001 19

FI Legionella pneumophila 2300/99 Alcoy 3.5 38 3190 0 2010 19

FI Legionella pneumophila subsp. pneumophila LPE509 3.5 38 3331 1 2013 21


FI Legionella pneumophila subsp. pneumophila str. Thunder Bay 3.5 38 2998 0 2013 21

FI Legionella longbeachae NSW150 4.1 37 3470 1 2010 22

OI Coxiella burnetii CbuG_Q212 2 42 1866 0 2008 23

OI Coxiella burnetii CbuK_Q154 2.1 42 1900 1 2008 23

OI Coxiella burnetii Dugway 5J108-111 2.2 42 1993 1 2007 23

OI Coxiella burnetii RSA 331 2 42 1930 1 2009 23

OI Coxiella burnetii RSA 493 2 42 1817 1 2001 23

FI Francisella tularensis subsp. holarctica FSC200 1.9 32 1438 0 2012 24

FI Francisella tularensis subsp. tularensis TI0902 1.9 32 1544 0 2012 25

FI Francisella tularensis subsp. tularensis TIGB03 2.0 32 1624 0 2012 26

FI Francisella tularensis subsp. tularensis NE061598 1.9 32 1836 0 2009 27

FI Francisella tularensis subsp. holarctica F92 1.9 32 1842 0 2012 28

FI Francisella tularensis holarctica FTNF002-00 FTA 1.9 32 1580 0 2007 29

FI Francisella tularensis holarctica LVS 1.9 32 1754 0 2006 30

FI Francisella tularensis holarctica OSU18 1.9 32 1555 0 2006 30

FI Francisella tularensis mediasiatica FSC147 1.9 32 1406 0 2008 31

FI Francisella tularensis tularensis FSC198 1.9 32 1605 0 2006 30

FI Francisella tularensis tularensis SCHU S4 1.9 32 1604 0 2004 32

FI Francisella tularensis tularensis WY96-3418 1.9 32 1634 0 2007 29

OI Candidatus Vesicomyosocius okutanii HA 1.02 31 937 0 2007 33

OI Candidatus Ruthia magnifica str. Cm 1.16 34 976 0 2006 34

Betaproteobacteria

FI Burkholderia mallei ATCC 23344 5.83 68 5024 0 2004 35

FI Burkholderia mallei NCTC 10229 5.76 68 5509 0 2007 35

FI Burkholderia mallei NCTC 10247 5.85 68 5415 0 2007 35

FI Burkholderia mallei SAVP1 5.23 68 5188 0 2007 35

OI Polynucleobacter necessarius subsp. asymbioticus QLW-P1DMWA-1 2.16 44 2077 0 2007 36

OI Polynucleobacter necessarius subsp. necessarius STIR1 1.56 45 1508 0 2008 36


Alphaproteobacteria

FI Bartonella bacilliformis KC583 1.45 38 1283 0 2007 37

FI Bartonella grahamii as4aup 2.34 38 1737 1 2009 38

FI Bartonella henselae str. Houston-1 1.93 38 1488 0 2004 39

FI Bartonella quintana str. Toulouse 1.58 38 1142 0 2004 39

FI Bartonella tribocorum CIP 105476 2.6 38 2069 1 2007 40

OI Candidatus Hodgkinia cicadicola str. Dsem 0.14 58 169 1 2009 41

FI Phenylobacterium zucineum (strain HLK1) 4 71 3529 1 2007 42

OI Anaplasma centrale str. Israel 1.2 49 923 0 2009 43

OI Anaplasma marginale str. Florida 1.2 49 940 0 2009 44

OI Anaplasma marginale str. St. Maries 1.2 49 948 0 2003 45

OI Anaplasma phagocytophilum 1.47 41 1264 0 2006 46

OI Ehrlichia canis str. Jake 1.3 28 925 0 2005 47

OI Ehrlichia chaffeensis str. Arkansas 1.18 30 1105 0 2006 48

OI Ehrlichia ruminantium str. Gardel 1.5 27 950 0 2005 49

OI Ehrlichia ruminantium str. Welgevonden 1.51 27 958 0 2005 50

OI Ehrlichia ruminantium str. Welgevonden 1.5 27 888 0 2003 50

OI Wolbachia pipientis wPip 1.5 34 1275 0 2008 51

OI Wolbachia pipientis wMel 1.27 35 1195 0 2002 52

OI Wolbachia pipientis wMel TRS 1.08 34 805 0 2005 53

OI Wolbachia pipientis wRi 1.45 35 1150 0 2009 54

OI Neorickettsia risticii str. Illinois 0.88 41 892 0 2009 55

OI Neorickettsia sennetsu str. Miyayama 0.86 41 932 0 2006 56

OI Rickettsia africae ESF-5 1.28 32 1030 1 2009 57

OI Rickettsia akari str. Hartford 1.23 32 1258 0 2007 58

OI Rickettsia bellii OSU 85-389 1.53 31 1475 0 2007 58

OI Rickettsia bellii RML369-C 1.52 31 1429 0 2006 59

OI Rickettsia canadensis str. McKiel 1.16 31 1090 0 2007 60

OI Rickettsia conorii str. Malish 7 1.27 32 1374 0 2001 61


OI Rickettsia felis URRWXCal2 1.49 32 1400 2 2005 62

OI Rickettsia massiliae MTU5 1.36 32 968 1 2007 63

OI Rickettsia peacockii str. Rustic 1.3 32 927 1 2009 64

OI Rickettsia prowazekii str. Madrid E 1.1 29 835 0 2001 65

OI Rickettsia rickettsii str. 'Sheila Smith' 1.26 32 1343 0 2007 66

OI Rickettsia rickettsii str. Iowa 1.27 32 1383 0 2008 67

OI Rickettsia typhi str. Wilmington 1.11 28 838 0 2004 68

OI Candidatus Rickettsia amblyommii str. GAT-30V 1.48 32 1390 3 2012 69

OI Rickettsia australis str. Cutlack 1.32 32 1261 1 2012 70

OI Rickettsia canadensis str. CA410 1.15 31 1016 0 2012 71

OI Rickettsia heilongjiangensis 054 1.28 32 1297 0 2011 72

OI Rickettsia japonica YH 1.28 32 971 0 2011 73

OI Rickettsia massiliae str. AZT80 1.28 33 1207 1 2012 74

OI Rickettsia montanensis str. OSU 85-930 1.28 33 1217 0 2012 75

OI Rickettsia parkeri str. Portsmouth 1.30 32 1318 0 2012 76

OI Rickettsia philipii str. 364D 1.29 33 1344 0 2012 77

OI Rickettsia prowazekii str. Breinl 1.11 29 920 0 2013 78

OI Rickettsia prowazekii str. BuV67-CWPP 1.11 29 843 0 2012 79

OI Rickettsia prowazekii str. Chernikova 1.11 29 845 0 2012 80

OI Rickettsia prowazekii str. Dachau 1.11 29 839 0 2012 81

OI Rickettsia prowazekii str. GvV257 1.11 29 829 0 2012 82

OI Rickettsia prowazekii str. Katsinyian 1.11 29 844 0 2012 83

OI Rickettsia prowazekii str. NMRC Madrid E 1.11 29 938 0 2013 84

OI Rickettsia prowazekii str. Rp22 1.11 29 950 0 2010 85

OI Rickettsia prowazekii str. RpGvF24 1.11 29 834 0 2012 86

OI Rickettsia rhipicephali str. 3-7-female6-CWPP 1.31 32 1266 1 2012 87

OI Rickettsia rickettsii str. Arizona 1.27 32 1343 0 2012 88

OI Rickettsia rickettsii str. Brazil 1.26 33 1332 0 2012 89

OI Rickettsia rickettsii str. Colombia 1.27 33 1350 0 2012 90


OI Rickettsia rickettsii str. Hauke 1.27 33 1340 0 2012 91

OI Rickettsia rickettsii str. Hino 1.27 33 1335 0 2012 92

OI Rickettsia rickettsii str. Hlp#2 1.27 33 1308 0 2012 93

OI Rickettsia slovaca 13-B 1.28 33 1112 0 2011 94

OI Rickettsia slovaca str. D-CWPP 1.28 33 1347 0 2012 95

OI Rickettsia typhi str. B9991CWPP 1.11 29 839 0 2012 96

OI Rickettsia typhi str. TH1527 1.11 29 838 0 2012 96

OI Orientia tsutsugamushi str. Boryong 2.13 30 1182 0 2007 97

OI Orientia tsutsugamushi str. Ikeda 2 30 1967 0 2008 98

Deltaproteobacteria

OI Lawsonia intracellularis PHE/MN1-00 1.46 33 1180 3 2006 99

Bacteroidetes

OI Candidatus Sulcia muelleri GWSS 0.25 22 227 0 2007 100

OI Candidatus Sulcia muelleri SMDSEM 0.28 22 242 0 2009 101

OI Blattabacterium Bge 0.64 27 586 0 2009 102

OI Blattabacterium BPLAN 0.64 28 576 1 2009 102

OI Candidatus Amoebophilus asiaticus 5a2 1.9 35 1283 0 2008 103

Actinobacteria

OI Mycobacterium leprae TN 3.27 57 1605 0 2001 104

FI Renibacterium salmoninarum ATCC 33209 3.16 56 3507 0 2007 105

FI Tropheryma whipplei TW08/27 0.93 46 783 0 2003 106

FI Tropheryma whipplei Twist 0.93 46 808 0 2003 107

Chlamydiae

OI Chlamydophila abortus S26/3 1.14 39 932 0 2003 108

OI Chlamydophila caviae GPIC 1.17 39 998 1 2002 109

OI Chlamydophila felis Fe/C-56 1.17 39 1005 1 2006 110

OI Chlamydophila pneumoniae AR39 1.23 40 1112 0 2001 111

OI Chlamydophila pneumoniae CWL029 1.23 40 1052 0 2001 112

OI Chlamydophila pneumoniae J138 1.23 40 1069 0 2001 113


OI Chlamydophila pneumoniae TW-183 1.23 40 1113 0 2003 114

OI Chlamydia muridarum Nigg 1.07 40 904 1 2001 111

OI Chlamydia trachomatis A/HAR-13 1.04 41 911 1 2005 115

OI Chlamydia trachomatis D/UW-3/CX 1.04 41 895 0 2001 116

OI Chlamydia trachomatis 434/Bu 1.04 41 874 0 2008 117

OI Chlamydia trachomatis B/Jali20/OT 1.04 41 875 0 2009 118

OI Chlamydia trachomatis B/TZ1A828/OT 1.04 41 880 0 2009 119

OI Chlamydia trachomatis L2b/UCH-1/proctitis 1.04 41 874 0 2008 120

OI Candidatus Protochlamydia amoebophila UWE25 2.41 34 2031 0 2004 121

Firmicutes

FI Listeria monocytogenes Clip80459 2.9 38 2766 0 2009 122

FI Listeria monocytogenes EGD-e 2.94 37 2846 0 2001 123

FI Listeria monocytogenes HCC23 2.98 38 2974 0 2008 124

FI Listeria monocytogenes str. 4b F2365 2.91 38 2821 0 2001 125

Tenericutes

OI Phytoplasma Onion yellows OY-M 0.85 27 750 0 2003 126

OI Phytoplasma australiense 0.88 27 684 0 2008 127

OI Phytoplasma Aster yellows witches-broom AYWB 0.71 26 671 4 2006 128

OI Phytoplasma mali str. AT 0.6 21 479 0 2008 129

OI Mycoplasma penetrans HF-2 1.36 25 1037 0 2002 130


Chapter 3

Genome sequencing of intracellular bacteria


3.1 Article 1 :

Genome Sequence of Diplorickettsia massiliensis, an

Emerging Ixodes ricinus-Associated Human Pathogen

Mano J. Mathew1, Geetha Subramanian1, Thi-Tien Nguyen1,

Catherine Robert1, Oleg Mediannikov1, Pierre-Edouard Fournier1,

Didier Raoult1*

1 Unité de Recherche sur les Maladies Infectieuses et Tropicales

Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,

IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,

13005, Marseille, France.

Published in J. Bacteriol. June 2012 vol. 194 no. 12 3287

*Corresponding author. E-mail: [email protected]


Preamble to article 1

The order Legionellales comprises several pathogenic, aerobic, motile and nutritionally fastidious pleomorphic Gram-negative bacteria from the class Gammaproteobacteria. It is composed of two families: Legionellaceae and Coxiellaceae. Many species of Legionella cause legionellosis. The family Coxiellaceae consists of Aquicella, Coxiella (an intracellular bacterium that is the causative agent of Q fever) (Beare, et al., 2009), Diplorickettsia and Rickettsiella (an intracellular parasite of Gryllus bimaculatus) (Roux, et al., 1997, Mediannikov, et al., 2010). Almost all bacteria isolated from Ixodes ricinus ticks are pathogenic for humans, notably Borrelia burgdorferi, Borrelia afzelii, Borrelia garinii, Rickettsia helvetica, Rickettsia monacensis and Francisella tularensis (Parola & Raoult, 2001). F. tularensis, which causes tularemia, a plague-like disease, belongs to the order Thiotrichales (Beckstrom-Sternberg, et al., 2007).

D. massiliensis strain 20B is an obligate intracellular, Gram-negative bacterium isolated from Ixodes ricinus ticks collected in 2006 from the southeastern part of the Rovinka forest in Slovakia (Mediannikov, et al., 2010). D. massiliensis belongs to the class Gammaproteobacteria, is non-endospore-forming, and consists of small rods that are usually grouped in pairs. An initial phylogenetic analysis based on 16S rRNA showed that D. massiliensis clustered with Rickettsiella grylli (Roux, et al., 1997, Mediannikov, et al., 2010). Because of its low 16S rDNA similarity (94%) with R. grylli, it was classified in a new genus, Diplorickettsia, within the family Coxiellaceae and the order Legionellales (Mediannikov, et al., 2010). D. massiliensis strain 20B was identified in three patients with suspected tick-borne infections who exhibited a specific seroconversion. Infection was further confirmed by PCR assay, thus establishing its role as a human pathogen. This article reports the genome of D. massiliensis 20B, which contains 1,727,973 bp with a G+C content of 38.9%. When compared to closely related gammaproteobacteria, D. massiliensis, at 1.7 Mb, had a bigger genome than Rickettsiella grylli (1.4 Mb) but a smaller one than Coxiella burnetii strain CbuK_Q154 (2.0 Mb). However, D. massiliensis had more metabolism-related genes (501 genes) than Rickettsiella grylli (360) and Coxiella burnetii (459). It also had more genes involved in energy production and conversion (109 versus 75 and 84, respectively) and more genes involved in translation, ribosomal structure, and biogenesis (170 versus 134 and 135, respectively).


Chapter 4

Comparative genomics


4.1 Article 2 :

The genomic repertoire of Diplorickettsia massiliensis

reveals its allopatric lifestyle

Mano J. Mathew1, Laetitia Rouli1 and Didier Raoult1*

1Unité de Recherche sur les Maladies Infectieuses et Tropicales

Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,

IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,

13005, Marseille, France.

Submitted to Biology Direct

*Corresponding author. E-mail: [email protected]


Preamble to article 2

In this study, we used a pangenomic approach to elucidate strain-specific genes, as well as genomic differences and similarities, between Diplorickettsia massiliensis strain 20B and twenty-nine other sequenced genomes, including Legionella strains, Coxiella burnetii strains, F. tularensis strains and R. grylli. We conducted a global pangenome analysis with these thirty genomes, as well as individual pangenome analyses for Coxiella, Legionella and Francisella. The individual pangenomes were constructed using five sequenced Coxiella burnetii reference strains, ten sequenced L. pneumophila strains and twelve sequenced F. tularensis strains. Another pangenome set was constructed from the ten sequenced L. pneumophila strains and a single L. longbeachae NSW150 strain. A single R. grylli genome and the D. massiliensis strain 20B genome were also included in the global pangenome set. We estimated the sizes of both the pangenomes and the core genomes. Based on these pangenomes, we described the distribution of functional genes and gene families across the different genomes analyzed, and specifically characterized the D. massiliensis strain 20B genome.


Title: The genomic repertoire of Diplorickettsia massiliensis reveals its

allopatric lifestyle

Running title: The genomic repertoire of Diplorickettsia massiliensis

reveals its allopatric lifestyle

Mano J. Mathew1, Laetitia Rouli1 and Didier Raoult1*

1 Unité de Recherche sur les Maladies Infectieuses et Tropicales

Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,

IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,

13005, Marseille, France.

Submitted to Biology Direct

*Corresponding author. E-mail: [email protected]


Abstract

Background

Diplorickettsia massiliensis strain 20B is an obligate intracellular, Gram-negative bacterium isolated from Ixodes ricinus ticks collected in Slovakia. In this study, we compared the genomic features of D. massiliensis strain 20B with twenty-nine sequenced Gammaproteobacteria genomes (Legionella strains, Coxiella burnetii strains, Francisella tularensis strains and Rickettsiella grylli) using a multi-genus pangenomic approach.

Results

Using phylogenomic analysis, we found that D. massiliensis shares 635 genes with Rickettsiella grylli and clusters with Coxiella burnetii. We identified 908 genes (61.56%) in common with Gammaproteobacteria, which constitute the core genome of D. massiliensis, and 518 genes (35.12%) that represent the dispensable genome. We also identified a link between total gene content and different bacterial lifestyles. We observed that fewer genes and a lower G+C content correlated with a smaller genome size, which may have helped the bacteria adapt to their host. Because of the reduced genomic repertoire, we speculate that fewer lateral gene transfers have occurred in D. massiliensis. A pangenomic approach allowed us to explore the different strategies by which facultative or obligate intracellular organisms specialize to a particular host.

Conclusion

These results significantly contribute to our understanding of genome

repertoires. This approach can be used to uncover interesting genomic

features that cannot be predicted using conventional methods.

Moreover, the variability that we identified between the L. pneumophila

strains and L. longbeachae NSW150 may warrant re-classifying them as

separate subspecies.

Keywords: Genome repertoire; pangenome; Diplorickettsia; allopatric;

comparative genomics


Background

The order Legionellales comprises several pathogenic, aerobic, motile and nutritionally fastidious pleomorphic Gram-negative bacteria from the class Gammaproteobacteria. It is composed of two families: Legionellaceae and Coxiellaceae. Many species of Legionella cause legionellosis. The family Coxiellaceae consists of Aquicella, Coxiella (an intracellular bacterium that is the causative agent of Q fever) [1], Diplorickettsia and Rickettsiella (an intracellular parasite of Gryllus bimaculatus) [2, 3]. Almost all bacteria isolated from Ixodes ricinus ticks are pathogenic for humans, notably Borrelia burgdorferi, Borrelia afzelii, Borrelia garinii, Rickettsia helvetica, Rickettsia monacensis and Francisella tularensis [4]. F. tularensis, which causes tularemia, a plague-like disease, belongs to the order Thiotrichales [5].

D. massiliensis strain 20B is an obligate intracellular, Gram-negative bacterium isolated from Ixodes ricinus ticks collected in 2006 from the southeastern part of the Rovinka forest in Slovakia [3]. D. massiliensis belongs to the class Gammaproteobacteria, is non-endospore-forming, and consists of small rods that are usually grouped in pairs. An initial phylogenetic analysis based on 16S rRNA showed that D. massiliensis clustered with Rickettsiella grylli [2, 3]. Because of its low 16S rDNA similarity (94%) with R. grylli, it was classified in a new genus, Diplorickettsia, within the family Coxiellaceae and the order Legionellales [3]. D. massiliensis strain 20B was identified in three patients with suspected tick-borne infections who exhibited a specific seroconversion. Infection was further confirmed by PCR assay, thus establishing its role as a human pathogen. Whole genome sequencing was performed at a later date [6, 7].

Recent advances in next-generation sequencing techniques have led to the initiation of large-scale microbial genome projects [8]. Comparative genomics studies have relied on conventional approaches, including non-sequence-based technologies such as microarrays targeting genes or non-coding regions, studies of specific pathways, and whole genome sequence alignment [9]. Bacterial strains from the same species may exhibit variations in their genetic repertoire, with differences in both genomic structure and sequence between strains, reflecting the extraordinary adaptability of prokaryotic species. Thus, sequencing a single genome per species is often insufficient for describing the genetic variability of the species. This led to the concept of a pangenomic approach, which takes into account the genetic makeup of a bacterial species and its genomic diversity from genus to genus. The pangenome of a bacterial species is larger than the total gene content of any individual strain within the species.

The pangenome is composed of three parts: the core genome (genes shared by all of the strains), the accessory or dispensable genome (genes shared by only some of the strains) and unique genes (strain-specific genes) [10]. The accessory genome can reveal evidence of lateral gene transfer events that occurred during the evolutionary history of a strain and likely contributed to the evolutionary potential of the organism. Furthermore, a distinction can be made between closed and open pangenomes. A pangenome is closed when its gene content remains unchanged despite the addition of new genomes, as in Bacillus anthracis [9, 10]. In contrast, a pangenome is open if the gene pool increases with the addition of new genomes, as in the case of Escherichia coli [11]. Pangenome studies can reveal changes that are not easily detectable using standard annotation analysis [12]. For example, pangenome studies have facilitated the identification of strain-specific genes in L. pneumophila. The L. pneumophila dispensable genome, acquired by horizontal gene transfer, may act as a reservoir that could confer evolutionary advantages over strains that lack this gene pool [13]. Such microbial pathogens exhibit a striking ability to adapt to new hosts, antibiotics, and host immune systems [14].
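The three-way partition of a pangenome described above can be expressed as a short Python sketch. This is a minimal illustration, not the study's pipeline. It assumes that gene clusters have already been built by an ortholog-clustering tool (the study used OrthoMCL) and that each cluster is summarized by the set of genomes in which it occurs. The function name and the toy cluster and genome identifiers are hypothetical.

```python
from typing import Dict, Set


def partition_pangenome(
    clusters: Dict[str, Set[str]], genomes: Set[str]
) -> Dict[str, Set[str]]:
    """Split gene clusters into core, dispensable and unique compartments.

    `clusters` maps a (hypothetical) cluster ID to the set of genomes in which
    that cluster has at least one member gene; `genomes` is the full set of
    genomes being compared.
    """
    n = len(genomes)
    parts: Dict[str, Set[str]] = {"core": set(), "dispensable": set(), "unique": set()}
    for cluster_id, present_in in clusters.items():
        k = len(present_in & genomes)  # number of compared genomes carrying the cluster
        if k == n:
            parts["core"].add(cluster_id)         # shared by all genomes
        elif k == 1:
            parts["unique"].add(cluster_id)       # found in a single genome
        elif k > 1:
            parts["dispensable"].add(cluster_id)  # shared by some, but not all
    return parts


# Toy example with three hypothetical genomes
genomes = {"A", "B", "C"}
clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
parts = partition_pangenome(clusters, genomes)
print({name: sorted(ids) for name, ids in parts.items()})
# {'core': ['c1', 'c4'], 'dispensable': ['c2'], 'unique': ['c3']}
```

This sketch counts clusters. The percentages reported later in this chapter appear to be computed over all CDSs, so reproducing them would require weighting each cluster by its number of member CDSs.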

In this study, we used a pangenomic approach to elucidate strain-specific genes, as well as genomic differences and similarities, between D. massiliensis strain 20B and twenty-nine other sequenced genomes, including Legionella strains, Coxiella burnetii strains, F. tularensis strains and R. grylli. We conducted a global pangenome analysis with these thirty genomes, as well as individual pangenome analyses for Coxiella, Legionella and Francisella. The individual pangenomes were constructed using five sequenced Coxiella burnetii reference strains, ten sequenced L. pneumophila strains and twelve sequenced F. tularensis strains. Another pangenome set was constructed from the ten sequenced L. pneumophila strains and a single L. longbeachae NSW150 strain. A single R. grylli genome and the D. massiliensis strain 20B genome were also included in the global pangenome set. We estimated the sizes of both the pangenomes and the core genomes. Based on these pangenomes, we described the distribution of functional genes and gene families across the different genomes analyzed, and specifically characterized the D. massiliensis strain 20B genome.

Results

Comparison of genomic features

The main features of the genomes analyzed here are summarized in

Table 1. The chromosomes from the thirty genomes compared in this

study range in size from 1.6 to 4.15 Mb and have a G+C content ranging

from 37.1 to 42.6%. The R. grylli genome is smaller than the

D. massiliensis strain 20B and F. tularensis genomes (1.6, 1.7 and 1.9 Mb,

respectively), and the L. longbeachae strain NSW150 genome (4.1 Mb) is

larger than those of the L. pneumophila strains. The number of protein

coding genes per genome within the various strains and species is

relatively similar, but the gene composition is much more variable. The

distribution of proteins by length among the organisms is shown in Figure

1. We compared the genomic and proteomic repertoires based on protein

length, genome size, and G+C content and found that D. massiliensis

strain 20B fell between two groups (C. burnetii and F. tularensis on the one hand, L. pneumophila on the other) but was closer to the former, which has more pathogenic proteins. The coding density of

these genomes ranges from 71.29% to 90.86%. In the Legionella species,

coding regions account for more than 85% of the genome. The number of

proteins associated with D. massiliensis strain 20B is much larger than in

R. grylli, F. tularensis and C. burnetii. Figure 2 summarizes the distribution of G+C content (%) and genome size (Mb). The Kyoto Encyclopedia of

Genes and Genomes (KEGG) characteristics of the organisms are

summarized in Figure 3.
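The genome-level figures compared in this section, G+C content and coding density, are simple ratios. The Python sketch below shows how they can be computed. It is a minimal illustration under several assumptions: the genome is a plain nucleotide string, and CDS coordinates are 1-based inclusive intervals on a linear sequence, so genes spanning the origin of a circular chromosome are not handled. The function names are illustrative, not the tools used in the study.

```python
from typing import Iterable, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G+C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def coding_density(genome_length: int, cds: Iterable[Tuple[int, int]]) -> float:
    """Percentage of genome positions covered by at least one CDS.

    `cds` holds 1-based, inclusive (start, end) coordinates with start <= end
    on a linear sequence; overlapping genes are merged so that shared bases
    are counted only once.
    """
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(cds):
        if cur_end is None or start > cur_end:
            # No overlap with the current block: close it and open a new one
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping gene: extend the current block if needed
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_length


print(round(gc_content("ATGCGCAT"), 2))                     # 50.0
print(coding_density(100, [(1, 30), (21, 50), (71, 90)]))   # 70.0
```

Merging overlapping CDSs before summing prevents overlapping genes from pushing coding density above the true fraction of coding positions.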

Pangenome analysis

Figure 4 summarizes the results from the individual pangenomes of

C. burnetii, Legionella, L. pneumophila, F. tularensis as well as the set of

all thirty genomes analyzed. These genomes are characterized by a fairly

high number of hypothetical proteins, for which annotation is still

incomplete. Genes belonging to the core and dispensable genomes have

been classified according to their predicted function based on COG and

KEGG categories for the respective pangenomes (Additional file 1). The

C. burnetii pangenome is closed, as we found a finite number of gene

clusters. The L. pneumophila pangenome is open (unlimited) because the

number of pangenome clusters and core genome clusters changed

depending on how many different genomes were included in the analysis.

The F. tularensis pangenome was on the borderline between being considered open or closed (Additional file 2).
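Whether a pangenome looks open or closed is usually judged from how its size changes as genomes are added. The Python sketch below computes a simple accumulation (rarefaction) curve. It is an illustration under stated assumptions, not the study's procedure: each genome is represented by a hypothetical set of gene-cluster IDs, and sizes are averaged over random genome orderings. A pangenome curve that keeps rising suggests an open pangenome, whereas one that plateaus suggests a closed one.

```python
import random
from statistics import mean
from typing import Dict, List, Set, Tuple


def accumulation_curve(
    genome_clusters: Dict[str, Set[str]], n_perm: int = 100, seed: int = 0
) -> List[Tuple[int, float, float]]:
    """Mean pangenome and core-genome sizes as genomes are added one by one.

    `genome_clusters` maps each genome to the set of gene-cluster IDs it carries.
    Genome order is randomized `n_perm` times and sizes are averaged at each step.
    Returns (number_of_genomes, mean_pangenome_size, mean_core_size) tuples.
    """
    rng = random.Random(seed)
    names = list(genome_clusters)
    pan_sizes: List[List[int]] = [[] for _ in names]
    core_sizes: List[List[int]] = [[] for _ in names]
    for _ in range(n_perm):
        order = names[:]
        rng.shuffle(order)
        pan: Set[str] = set()
        core: Set[str] = set()
        for i, genome in enumerate(order):
            clusters = genome_clusters[genome]
            pan |= clusters                                      # union can only grow
            core = set(clusters) if i == 0 else core & clusters  # intersection can only shrink
            pan_sizes[i].append(len(pan))
            core_sizes[i].append(len(core))
    return [(i + 1, mean(pan_sizes[i]), mean(core_sizes[i])) for i in range(len(names))]


toy = {"A": {"x", "y"}, "B": {"x", "z"}, "C": {"x", "y", "w"}}
for n, pan_size, core_size in accumulation_curve(toy):
    print(n, round(pan_size, 2), round(core_size, 2))
```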

The Coxiella burnetii pangenome

The C. burnetii pangenome consists of 6,871 CDSs with 1,080 core genes (92.04 %) and 491 dispensable genes (7.15 %) (Additional file 3). A total of 56 genes were specific to the C. burnetii CbuG_Q212 (6), C. burnetii CbuK_Q154 (6), C. burnetii Dugway 5J108-111 (34), C. burnetii RSA 331 (9) and C. burnetii RSA 493 (1) genomes. Notably, 70 of the 491 accessory genes (14.25%) were hypothetical proteins. Of the 1,080

genes belonging to the core genome, 956 (88.6%) were attributed to a

COG category, and 510 (47.3%) were attributed to a KEGG category. In

the case of the 491 dispensable genes, 421 (85.7%) were assigned to a

COG category, and 185 (37.6%) were assigned to a KEGG category.

Using the COG database, we identified minor differences between the

compartments in the defense mechanisms (V) and intracellular

trafficking, secretion and vesicular transport (U) categories. Using the

KEGG database, we found that C. burnetii Dugway 5J108-111 contains a

greater number of CDSs involved in environmental information

processing and metabolism than other strains. The core genome

represented 92% of the pangenome (Additional file 2), again reflecting a high level of conservation.

The Legionella pangenome

The Legionella pangenome consists of 23,736 CDSs with 1,410 core genes (82.44 %) and a dispensable genome of 3,791 CDSs (15.97 %) (Additional file 3). A total of 378 genes were strain-specific. Of these, 83 were specific to the L. pneumophila str. Lens (14), L. pneumophila str. Paris (20), L. pneumophila 2300/99 Alcoy (7), L. pneumophila subsp. pneumophila HL06041035 (21), L. pneumophila subsp. pneumophila str. Lorraine (8), L. pneumophila str. Corby (6), L. pneumophila subsp. pneumophila str. Philadelphia 1 (3), L. pneumophila subsp. pneumophila LPE509 (1) and L. pneumophila subsp. pneumophila str. Thunder Bay (3) genomes, and the remaining 295 (78.04 %) were specific to L. longbeachae strain NSW150. Of the 1,410 genes belonging to the core genome, 1,316 (93.4%) were attributed to a COG category and 688 (48.8%) were attributed to a KEGG category. Of the 3,791 dispensable genes, 3,273 (86.3%) were attributed to a COG category and 1,464 (38.6%) were attributed to a KEGG category.

We observed several differences in the CDSs from the cell wall/membrane/envelope biogenesis (M) COG category. Legionella genomes have a greater number of CDSs involved in membrane transport and signal transduction (based on KEGG categories), which are associated with environmental information processing. In particular, L. longbeachae strain NSW150 has a greater number of genes associated with energy production and conversion (C), signal transduction (T), and defense mechanisms (V), and fewer genes related to cell motility (N), based on COG categories. Marked differences were observed in the number of CDSs associated with cellular processes, particularly flagellar assembly, which is important for cell motility, and with carbohydrate and energy metabolism.

The Legionella pneumophila pangenome

The L. pneumophila pangenome consists of 21,459 CDSs with a core

genome of 1,572 genes (90.71 %) and a dispensable genome of 1,881

CDSs (8.77 %) (Additional file 3). A total of 112 genes were specific to the

L. pneumophila str. Lens (20), L. pneumophila str. Paris (27),

L. pneumophila 2300/99 Alcoy (7), L. pneumophila subsp. pneumophila

HL06041035 (26), L. pneumophila subsp. pneumophila str. Lorraine (15),

L. pneumophila str. Corby (9), L. pneumophila subsp. pneumophila str.


Philadelphia 1 (4), L. pneumophila subsp. pneumophila LPE509 (1) and

L. pneumophila subsp. pneumophila str. Thunder Bay (3) genomes. Of the

1,572 genes belonging to the core genome, 1,465 (93.2 %) were

attributed to a COG category, and 760 (48.4 %) were attributed to a KEGG

category. In the case of the 1,881 dispensable genes, 1,524 (81%) were

attributed to a COG category, and 661 (35.14 %) were attributed to a

KEGG category.

We identified differences in the cell wall/membrane/envelope biogenesis

(M) category based on the CDSs for which a function could be identified

using the COG database. We found that a greater number of CDSs were

involved in signal transduction (the bacterial secretion system and the

two-component system, which are associated with environmental

information processing) and translation (ribosomal elements that are

associated with genetic information processing). We did not observe any

differences in the cellular processes category.

The Francisella tularensis pangenome

The F. tularensis pangenome consists of 16,596 CDSs with a core of 1,010

genes (86.05 %) and a dispensable genome of 2,297 CDSs (13.84 %)

(Additional file 3). A total of 18 genes were specific to the F. tularensis

subsp. holarctica F92 (1), F. tularensis subsp. holarctica LVS (1),

F. tularensis subsp. holarctica OSU18 (4), F. tularensis subsp. mediasiatica

FSC147 (3), F. tularensis subsp. tularensis NE061598 (4) and F. tularensis

subsp. tularensis WY96-3418 (5) genomes. Of the 1,010 genes belonging

to the core genome, 775 (76.8 %) were attributed to a COG category, and 415 (41.1 %) were attributed to a KEGG category. In the case of the 2,297

dispensable genes, 1,881 (81.8 %) were attributed to a COG category, and

886 (38.5 %) were attributed to a KEGG category.

We observed a greater number of CDSs involved in information storage and processing (translation, ribosomal structure and biogenesis (J); and replication, recombination and repair (L)) and in metabolism (amino acid transport and metabolism (E), carbohydrate transport and metabolism (G) and inorganic ion transport and metabolism (P)). We found that F. tularensis

subsp. holarctica F92, F. tularensis subsp. holarctica LVS and F. tularensis

subsp. holarctica FTNF002-00 have a greater number of CDSs involved in

replication, recombination and repair (L) compared to other F. tularensis

genomes.

The Gammaproteobacteria pangenome

The Gammaproteobacteria pangenome consists of 49,833 CDSs with a

core of 627 genes (47.16 %) and a dispensable genome of 25,933 genes

(52.04 %) (Additional file 4, Figure 5). The organisms that share the

greatest number of core genes are as follows: 618 out of 627 in Legionella

strains, 617 genes in L. pneumophila strains, 578 genes in F. tularensis

strains and 580 genes in C. burnetii strains. The organisms that share the

greatest number of dispensable genes are as follows: 13,640 out of

25,933 in Legionella strains (52.6 %), 12,458 in L. pneumophila strains

(48.04 %), 8,048 in F. tularensis strains (31.03 %) and 3,268 in C. burnetii

strains (12.6 %). A total of 400 genes were specific to the C. burnetii (28),

F. tularensis (9), Legionella (272), L. pneumophila (62), R. grylli (42) and D.


massiliensis strain 20B (49) genomes. Of the 627 genes belonging to the

core genome, 594 (94.8 %) were attributed to a COG category, and 342

(54.6 %) were attributed to a KEGG category. Among the 25,933

dispensable genes, 22,402 (86.4 %) were attributed to a COG category,

and 10,484 (40.4 %) were attributed to a KEGG category.

In the core genome, we observed differences in the number of CDSs involved in metabolism (based on COG categories), namely in energy production and conversion (C), amino acid transport and metabolism (E) and coenzyme transport and metabolism (H). In the dispensable genome, the greatest number of CDSs was associated with amino acid transport and metabolism (E). A similar functional distribution was found in the set of dispensable genes based on KEGG categories: more CDSs were associated with metabolism categories, whereas fewer were associated with folding, sorting and degradation, glycan biosynthesis and metabolism, replication and repair, and translation.
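The category-level comparisons above amount to tallying genes per one-letter COG category in each compartment and comparing the resulting profiles. The Python sketch below shows one way to do this. It assumes a hypothetical mapping from gene ID to its COG letter(s), for example parsed from an annotation table. Genes without an assignment are tallied under "-", and genes carrying several letters are counted once per letter; that counting convention is an assumption, not necessarily the one used in the study.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def cog_profile(genes: Iterable[str], cog_of: Dict[str, Optional[str]]) -> Counter:
    """Count genes per one-letter COG functional category.

    `cog_of` maps a gene ID to its COG letter(s), e.g. "E" or "KT", or None.
    Genes with several letters are counted once per letter; genes without an
    assignment are tallied under "-".
    """
    counts: Counter = Counter()
    for gene in genes:
        letters = cog_of.get(gene) or "-"
        for letter in letters:
            counts[letter] += 1
    return counts


def as_percent(counts: Counter) -> Dict[str, float]:
    """Convert category counts to percentages of the total tally."""
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}


# Toy annotation for six hypothetical genes
cog_of = {"g1": "C", "g2": "E", "g3": "E", "g4": None, "g5": "H", "g6": "E"}
core = cog_profile(["g1", "g2", "g3", "g4"], cog_of)
dispensable = cog_profile(["g5", "g6"], cog_of)
print(core.most_common(1))        # [('E', 2)]
print(as_percent(dispensable))    # {'H': 50.0, 'E': 50.0}
```

Comparing percentages rather than raw counts keeps compartments of very different sizes, such as the 627-gene core and the 25,933-gene dispensable genome, on the same scale.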

By analyzing 1,475 genes from D. massiliensis strain 20B using OrthoMCL,

we identified a core genome of 908 genes (61.56 %) and a dispensable

genome of 518 genes (35.12 %). The majority of the genes in the core

genome were associated with COG categories contributing to metabolism

(energy production and conversion (C) and coenzyme transport and

metabolism (H)) and information storage and processing (translation,

ribosomal structure and biogenesis (J); and replication, recombination

and repair (L)). Based on KEGG category assignments, a greater number of

CDSs in the core genome were associated with translation, cofactor and

vitamin metabolism, nucleotide metabolism and carbohydrate


metabolism. Genes associated with amino acid metabolism and

carbohydrate metabolism were highly represented among the

dispensable genes. Of the 49 unique genes identified, 15 encoded

hypothetical proteins. Some specific genes were identified, including

PhoPQ-activated pathogenicity-related protein, dehydrogenases, SAM-

dependent methyltransferases, galactose mutarotase and others

(Additional file 5). Based on KEGG categories, these unique genes were

associated with metabolism, environmental information processing,

genetic information processing, two-component systems and sulfur relay

systems.

Phylogenomic analysis

A phylogenomic tree constructed based on gene content (i.e., the

presence or absence of protein-coding genes, as predicted by COG and

KEGG) showed different genome clustering than a whole genome tree

(Figure 6). In the phylogenomic tree constructed based on COG

classification, D. massiliensis strain 20B clustered with R. grylli and

clustered closely with C. burnetii strains. In contrast, in the tree

constructed based on KEGG classification, R. grylli formed a cluster with

the C. burnetii strains, and D. massiliensis strain 20B was not included in

this cluster. Based on all genes associated with cellular processes as

determined by KEGG classification, D. massiliensis strain 20B clustered

with four C. burnetii strains (C. burnetii CbuG_Q212, C. burnetii

CbuK_Q154, C. burnetii Dugway 5J108-111 and C. burnetii RSA 493).

Based on an analysis of COG categories, D. massiliensis strain 20B and


R. grylli clustered closely with C. burnetii strains, with the exception of

five COG categories. For cell cycle control, cell division, and chromosome

partitioning (D), nucleotide transport and metabolism (F), coenzyme transport and metabolism (H), lipid transport and metabolism (I) and secondary metabolite

biosynthesis, transport and catabolism (Q), D. massiliensis strain 20B and

R. grylli clustered with the F. tularensis strains.
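A gene-content tree of the kind described above can be approximated by clustering genomes on the presence or absence of genes. The Python sketch below is a simplified illustration, not the study's pipeline, and the study does not state which distance or tree-building method it used. Each genome is represented by a set of hypothetical gene or orthologous-group IDs, distances are Jaccard distances, and the topology is built by UPGMA (average linkage). Genome labels A to D in the usage example are toy data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A & B| / |A | B|: 0 for identical gene content, 1 for none shared."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma(profiles: Dict[str, Set[str]]) -> Tree:
    """Average-linkage (UPGMA) clustering of genomes by gene-content distance.

    Returns a nested-tuple topology; branch lengths are omitted for brevity.
    """
    names = list(profiles)
    # active cluster id -> (subtree, number of genomes it contains)
    clusters: Dict[int, Tuple[Tree, int]] = {i: (n, 1) for i, n in enumerate(names)}
    dist: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): jaccard_distance(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.__getitem__)  # closest pair of active clusters
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        for other in clusters:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            # size-weighted average of the merged clusters' distances to `other`
            dist[frozenset((next_id, other))] = (n_a * d_a + n_b * d_b) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    ((tree, _),) = clusters.values()
    return tree


profiles = {
    "A": {"a", "b", "c", "d"},
    "B": {"a", "b", "c", "e"},
    "C": {"a", "b", "f", "g"},
    "D": {"a", "h", "i", "j", "k"},
}
print(upgma(profiles))  # ('D', ('C', ('A', 'B')))
```

Because the tree depends only on which genes are present, such a tree can group genomes differently from a sequence-based whole-genome tree, as the text reports for Figure 6.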

Discussion

The pangenome concept was introduced by Tettelin et al. in 2005 [15]. Pangenomic studies analyze bacterial species in detail using different criteria and can determine whether a pangenome is open or closed. C. burnetii, an obligate intracellular bacterium [1], has a closed pangenome with a core/pangenome ratio of 92% (Additional file 2) and a relatively constant set of core genes [30]. Another example of a gammaproteobacterium with a closed pangenome is Buchnera aphidicola [16], which has a core/pangenome ratio of 98%. In this study we analyzed the facultative intracellular bacteria L. pneumophila and F. tularensis, which have core/pangenome ratios of 82% and 87%, respectively. Although their ratios were very close to the threshold of 89%, both of these bacteria can be considered to have open pangenomes, albeit far less open than the E. coli pangenome, which appears to be unbounded [11].
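The open/closed classification above relies on the core/pangenome ratio. A criterion often used elsewhere in the pangenome literature instead fits Heaps' law, n = κN^(-α), to the number of new genes n found when the N-th genome is added. Under that criterion, α > 1 suggests a closed pangenome and α ≤ 1 an open one. The Python sketch below shows that alternative; it is not the method used in this study. It assumes the per-step new-gene counts have already been computed, for example averaged over random genome orderings, and uses a plain least-squares line in log-log space.

```python
import math
from typing import Sequence, Tuple


def fit_heaps_law(new_genes: Sequence[float]) -> Tuple[float, float]:
    """Fit n(N) = kappa * N**(-alpha) by least squares in log-log space.

    `new_genes[i]` is the (mean) number of new gene clusters found when the
    (i + 2)-th genome is added, so N starts at 2. All values must be > 0.
    Returns (kappa, alpha).
    """
    if len(new_genes) < 2:
        raise ValueError("need at least two points to fit a line")
    xs = [math.log(i + 2) for i in range(len(new_genes))]
    ys = [math.log(n) for n in new_genes]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # log n = log kappa - alpha * log N
    alpha = -slope
    kappa = math.exp(mean_y - slope * mean_x)
    return kappa, alpha


# Synthetic counts that decay exactly as 100 * N**-1.5 (N = 2..11)
counts = [100 * n ** -1.5 for n in range(2, 12)]
kappa, alpha = fit_heaps_law(counts)
print(f"kappa={kappa:.1f}, alpha={alpha:.2f}, closed={alpha > 1}")
# kappa=100.0, alpha=1.50, closed=True
```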

Our results show that the combined pangenome including D. massiliensis is composed of 23,500 (47.1%) core genes, 13,399 (57%) genes shared by Legionella and C. burnetii, 12,114 (51.5%) genes shared by C. burnetii and F. tularensis, and 18,363 (78.1%) genes shared by Legionella and F. tularensis. Moreover, based on the phylogenomic trees we

constructed, we conclude that D. massiliensis is more closely related to

R. grylli than to C. burnetii. D. massiliensis and R. grylli shared 635 genes

and clustered more often with C. burnetii. These results are in agreement

with Pearson et al. [17], who showed that R. grylli is one of the closest

known neighbors of C. burnetii. We also observed differences in lifestyle

among the species analyzed in this study.

Pangenomic studies elucidate the link between gene content and

bacterial lifestyles. An allopatric lifestyle is defined by a narrow ecological

niche with restricted opportunities for acquiring DNA from other

organisms. An allopatric lifestyle can be associated with genome reduction, smaller pangenomes and smaller mobilomes, especially in pathogens, which have smaller genomic repertoires than less specialized bacteria [18]. In contrast, a sympatric lifestyle is associated with larger genomes, larger pangenomes, a larger mobilome and more frequent genomic exchanges with other bacteria. Moliner et al. [19, 20] described two different types of intracellular lifestyles: allopatric bacteria, which are strictly intracellular and therefore live in narrow niches, and sympatric bacteria such as Legionella spp., which live in amoebas where DNA exchange can take place. The authors noted that

intracellular bacteria living in amoebas generally have a larger genome,

whereas other intracellular pathogens suffer from massive gene loss due

to specialization. D. massiliensis, R. grylli, C. burnetii and F. tularensis have

smaller genomes and exhibit losses of function compared to Legionella

species.


Based on the comparison of G+C content and genome size and previous

work by Merhej et al., we identified three distinct lifestyles:

D. massiliensis and R. grylli are extremely allopatric, C. burnetii and

F. tularensis are allopatric and have very little interaction with other

organisms, and Legionella are sympatric, as they live in amoebas. In

addition, we compared the gene losses and gains (based on COG

functional analysis) in the genomes analyzed in this study to those

analyzed by Merhej et al. [21]. We found that the more specialized a bacterium is, the fewer genes it has related to transcriptional regulation (K), defense mechanisms (V), inorganic ion transport and metabolism (P) and amino acid metabolism (E), whereas genes involved in translation (J) are comparatively retained. For all of these categories, we

observed a considerable difference between Legionella spp. (more

pronounced in L. longbeachae than in L. pneumophila strains) and the

other bacteria. Moreover, for each of these categories, D. massiliensis

and Rickettsiella grylli have fewer genes with an assigned COG function.

Based on KEGG classification, we also found that D. massiliensis and

Rickettsiella grylli show immense losses of genes related to amino acid

metabolisms. These results are in agreement with those obtained by

Merhej et al. These results allowed us to divide the species analyzed in

this study into three categories based on lifestyle. D. massiliensis and

R. grylli are extremely allopatric species with fewer functional genes (as

classified by COG), including a high loss of amino acid metabolism genes,

and less severe loss in genes related to translation and transcription. The

intermediate allopatric bacteria, C. burnetii and F. tularensis, have more

functional genes compared to the extremely allopatric species. Sympatric

Page 95: Insight into intracellular bacterial genome repertoire using

95

bacteria such as Legionella, especially L. longbeachae, possess the

greatest number of functional genes (as classified by COG and KEGG)

compared to the other species analyzed in this study.
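The category comparison above rests on counting, for each genome, the genes assigned to each COG functional letter. A minimal Python sketch follows, assuming annotations are available as (gene ID, COG code) pairs like those listed in Additional file 5. Counting a multi-letter code such as "RTKL" once under every letter it contains is an assumption made here for illustration, not necessarily the rule applied in this study.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_cog_categories(annotations: Iterable[Tuple[str, str]]) -> Counter:
    """Count genes per COG functional category letter.

    Each annotation is a (gene_id, cog_code) pair, e.g. ("12042690", "RTKL").
    Assumption: a multi-letter code counts once under each letter it contains.
    """
    counts: Counter = Counter()
    for _gene_id, code in annotations:
        # set() so that a letter repeated within one code is counted only once
        counts.update(set(code.strip().upper()))
    return counts


def category_fraction(counts: Counter, category: str, n_genes: int) -> float:
    """Share of a genome's genes assigned to one category, e.g. 'K' or 'J'."""
    return counts[category] / n_genes if n_genes else 0.0


example = count_cog_categories([("12043713", "R"), ("12042690", "RTKL"), ("12043373", "TK")])
print(sorted(example.items()))  # [('K', 2), ('L', 1), ('R', 2), ('T', 2)]
```

Dividing by the number of protein-coding genes, as `category_fraction` does, is one way to compare genomes of very different sizes, such as L. longbeachae and D. massiliensis, on the same scale.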

A comparative genomics analysis of free-living and host-dependent bacteria showed that intracellular bacteria contain fewer rRNA genes [21]. These genomes also contained more split rRNA operons and fewer transcriptional regulators than those of other bacteria, features that were linked to slower growth rates adapted to their ecological niche [21]. The deletion or inactivation of certain genes has contributed to the pathogenicity of several intracellular pathogens, such as Shigella, Salmonella and F. tularensis; these genes are referred to as antivirulence genes [22]. A recent study of B. birtlesii identified a deletion in one of its two rRNA operons, as well as disrupted translation-associated genes, which are important for specialization to a specific niche [23]. The number of expressed genes is much lower in a restricted environment than in a changing one, as genes involved in translation are not expressed extensively [23]. If bacteria do not routinely express ribosomal operons in their respective environments, these operons are subject to loss [23]. Bacterial specialization thus involves a striking degree of genome reduction, including fewer genes, changes in G+C content and fewer ribosomal operons, both incomplete and intact [21, 24, 25]. Restricting translation is critical for specialization, as speciation is often correlated with the inactivation of ribosomal operons [21, 23] and of other genes.


Conclusion

This study of intracellular Gammaproteobacteria has contributed to our understanding of bacterial specialization according to ecological niche. The genome size and gene content of these bacteria are associated with their lifestyle. The genomes analyzed here contained relatively few genes and had a relatively low G+C content, as reported in other studies of intracellular bacteria [18]. Gene loss resulting in a smaller genome has been a driving force in the adaptation of these bacteria to their hosts. Given this reduction in the genomic repertoire, we speculate that fewer lateral gene transfers occur in D. massiliensis than in other intracellular bacteria [26]. We used a multi-genus pangenomic approach to characterize the genomic repertoire of representative strains and to compare the distribution of genes in D. massiliensis strain 20B with that of the other genomes. We found that the majority of the genes in D. massiliensis strain 20B were shared with other Gammaproteobacteria. A pangenomic approach facilitates the exploration of the different strategies by which facultative or obligate intracellular bacteria adapt to particular hosts, and contributes significantly to our understanding of genome repertoires. This approach can uncover unique genomic features that cannot be predicted by conventional methods. Moreover, our results suggest that the Legionella strains could be reclassified based on their genomic variability.


Methods

Determination of genomic data

For the genomic comparison, we used thirty sequenced genomes from the Gammaproteobacteria class: five C. burnetii strains, ten L. pneumophila strains, L. longbeachae strain NSW150 [27], twelve F. tularensis strains, Rickettsiella grylli and D. massiliensis strain 20B. Information on genome properties (genome size, coding regions, G+C content, total number of genes, RNA-coding genes, protein-coding genes, genes with a predicted function, genes assigned to Clusters of Orthologous Groups of proteins (COGs), genes with signal peptides and genes with transmembrane helices) was retrieved from NCBI (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) and IMG/ER [28] (https://img.jgi.doe.gov) (Table 1). Open reading frames (ORFs) were predicted for the draft genome using Prodigal [29] with default parameters, and predicted ORFs that spanned a sequencing gap were excluded. The predicted protein sequences were searched against the GenBank database [30] and the COG database using BLASTP (E-value 10⁻⁵ and coverage ≥ 70%).
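The two filtering rules in this paragraph (dropping ORFs that cross a sequencing gap, then keeping only BLASTP hits that pass both the E-value and the coverage cutoff) can be sketched in Python as below. This is an illustration under stated assumptions, not the pipeline actually used: gaps are assumed to be runs of 'N' in the draft assembly, BLAST+ is assumed to have been run with `-outfmt "6 qseqid sseqid evalue qstart qend qlen"`, the E-value cutoff is read as ≤ 10⁻⁵, and coverage is taken as the aligned fraction of the query.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def drop_gap_spanning_orfs(orfs, contigs, gap_char="N"):
    """Keep only ORFs whose nucleotide span contains no gap character.

    Assumption: scaffold gaps in the draft assembly are encoded as runs of 'N'.
    """
    kept = []
    for orf in orfs:
        span = contigs[orf.contig][orf.start - 1:orf.end]
        if gap_char not in span.upper():
            kept.append(orf)
    return kept


def filter_blast_hits(lines, max_evalue=1e-5, min_coverage=0.70):
    """Yield (query, subject) pairs for tabular BLASTP hits passing both cutoffs.

    Assumes -outfmt "6 qseqid sseqid evalue qstart qend qlen"; coverage is the
    aligned query span divided by the query length.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, evalue, qstart, qend, qlen = line.rstrip("\n").split("\t")
        coverage = (abs(int(qend) - int(qstart)) + 1) / int(qlen)
        if float(evalue) <= max_evalue and coverage >= min_coverage:
            yield qseqid, sseqid
```

Filtering ORFs before the search keeps proteins that are truncated or chimeric because of an assembly gap out of the homology step altogether.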

Pangenome analysis

All CDSs from each genome were pooled and clustered with OrthoMCL [32] using the following parameters: an overlap of at least 70% and a minimum similarity of 80%. Only protein sequences longer than 50 amino acids were considered for further analysis. Homologous sequences were identified by an all-against-all BLASTP search with an E-value of less than 0.00001 (10⁻⁵). Orthologous sequences were then clustered with the Markov Cluster (MCL) algorithm, which is based on probability and graph flow theory and allows global relationships in a similarity space to be classified simultaneously [32]. An inflation index of 1.5 was used to regulate cluster tightness (granularity), and the resulting ortholog groups were analyzed further. Several Perl/Python scripts were written in our laboratory to handle the large volume of data, namely to calculate the core set (genes shared by all strains), the dispensable set (genes shared by at least two, but not all, strains) and the unique set (organism-specific genes) from the OrthoMCL results.
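To make the role of the inflation index concrete, here is a toy NumPy re-implementation of the MCL iteration: expansion by matrix squaring, then inflation by element-wise powering and column re-normalization, using the inflation value of 1.5 quoted above. It is a sketch for illustration only, assuming a small symmetric similarity matrix as input. It is not the MCL program that OrthoMCL calls, and the OrthoMCL pipeline additionally normalizes the BLAST-derived weights before clustering.

```python
import numpy as np


def mcl(similarity, inflation=1.5, expansion=2, max_iter=100, tol=1e-8, prune=1e-9):
    """Toy Markov Cluster (MCL) iteration on a symmetric similarity matrix."""
    m = np.asarray(similarity, dtype=float)
    m = m + np.eye(m.shape[0])                  # self-loops stabilise the flow
    m = m / m.sum(axis=0, keepdims=True)        # column-stochastic matrix
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow spreads out
        m = np.power(m, inflation)                # inflation: strong flows win
        m[m < prune] = 0.0                        # drop negligible flow
        m = m / m.sum(axis=0, keepdims=True)      # re-normalise each column
        if np.allclose(m, prev, atol=tol):
            break
    # At convergence each non-empty row (an "attractor") lists the members of one cluster.
    clusters = {tuple(int(j) for j in np.nonzero(row > 0)[0]) for row in m}
    clusters.discard(())
    return sorted(clusters)
```

Higher inflation values sharpen the contrast between strong and weak flows and yield more, smaller clusters; lower values such as 1.5 give coarser groupings.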

Functional annotation was performed with WebMGA [33] against the Clusters of Orthologous Groups [34] and the Kyoto Encyclopedia of Genes and Genomes [35] databases.
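The core/dispensable/unique partition computed by the laboratory scripts mentioned above could be derived roughly as follows. The sketch assumes OrthoMCL-style group lines of the form `GROUP_ID: taxon|protein taxon|protein ...`, and it treats proteins that fall into no group, as well as groups confined to a single genome, as unique. The function names are hypothetical and the actual scripts may differ.

```python
from collections import defaultdict


def parse_groups(lines):
    """Parse OrthoMCL-style group lines, e.g. 'OG_1: taxA|p1 taxB|p2'."""
    groups = {}
    for line in lines:
        if not line.strip():
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = [tuple(m.split("|", 1)) for m in members.split()]
    return groups


def partition_groups(groups, genomes):
    """Label each group core, dispensable or unique by the genomes it spans."""
    all_genomes = set(genomes)
    labels = {}
    for group_id, members in groups.items():
        present = {taxon for taxon, _protein in members}
        if all_genomes <= present:
            labels[group_id] = "core"          # found in every genome
        elif len(present) >= 2:
            labels[group_id] = "dispensable"   # at least two, but not all
        else:
            labels[group_id] = "unique"        # paralogs within one genome
    return labels


def gene_counts(groups, labels, all_proteins):
    """Count core/dispensable/unique genes per genome; ungrouped proteins are unique."""
    counts = defaultdict(lambda: {"core": 0, "dispensable": 0, "unique": 0})
    grouped = set()
    for group_id, members in groups.items():
        for taxon, protein in members:
            counts[taxon][labels[group_id]] += 1
            grouped.add((taxon, protein))
    for taxon, protein in all_proteins:
        if (taxon, protein) not in grouped:
            counts[taxon]["unique"] += 1
    return dict(counts)
```

Counting genes rather than groups matters because a single core group can contain several in-paralogs from the same genome.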

Genome alignment and gene content-based phylogenomics

Using MAUVE [36], the backbone file generated by the global genome alignment was used to calculate the size of the core genome relative to the size of the pangenome [37]. This core/pangenome ratio is used to determine whether a pangenome is open or closed. The gene content of the genomes was classified into twenty-five functional COG categories and used to construct phylogenomic trees. The gene content was converted into a matrix of discrete binary characters ("0" and "1" for absence and presence, respectively) [38], from which a matrix of Euclidean distances between pairs of genomes was computed. The MultiExperiment Viewer (MeV) [39] was used to visualize the results. The G+C content and COG data were compared with previous work by Merhej et al. [21].
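The gene-content phylogenomics step can be illustrated with a small NumPy sketch. It builds the binary presence/absence matrix, with genomes as rows and gene families or COG categories as columns, and then computes pairwise Euclidean distances between its rows. The function names and the toy input are hypothetical; MeV, not this code, was used for the analysis and visualization.

```python
import numpy as np


def presence_absence_matrix(gene_sets):
    """Rows = genomes, columns = gene families; 1 = present, 0 = absent."""
    genomes = sorted(gene_sets)
    families = sorted(set().union(*gene_sets.values()))
    matrix = np.array([[1 if f in gene_sets[g] else 0 for f in families]
                       for g in genomes], dtype=float)
    return genomes, families, matrix


def euclidean_distances(matrix):
    """Pairwise Euclidean distances between the rows of a matrix."""
    diff = matrix[:, None, :] - matrix[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

For binary profiles, the squared Euclidean distance equals the number of families present in one genome but absent from the other, so the distance grows with the differential gene content. A tree could then be built from this matrix, for instance with SciPy's `scipy.cluster.hierarchy.linkage`, although that is not the tool used in this study.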

List of abbreviations

PCR: polymerase chain reaction; KEGG: Kyoto Encyclopedia of Genes and

Genomes; COG: clusters of orthologous groups; CDS: coding sequences

Competing interests and funding

The authors declare that they have no competing interests.

Authors' contributions

DR designed the research project. MJM performed the genomic analysis.

MJM and DR analyzed the data. MJM and LR wrote the paper. DR revised

the paper. All authors read and approved the final version.

Acknowledgements

We would like to thank Roshan Padmanabhan for technical support, suggestions and corrections, and Ripsy Merrin Chacko for helpful remarks.


References

1. Beare PA, Unsworth N, Andoh M, Voth DE, Omsland A, Gilk SD, Williams KP,

Sobral BW, Kupko JJ, 3rd, Porcella SF, et al: Comparative genomics reveal

extensive transposon-mediated genomic plasticity and diversity among

potential effector proteins within the genus Coxiella. Infection and immunity

2009, 77:642-656.

2. Roux V, Bergoin M, Lamaze N, Raoult D: Reassessment of the taxonomic

position of Rickettsiella grylli. International journal of systematic bacteriology

1997, 47:1255-1257.

3. Mediannikov O, Sekeyova Z, Birg ML, Raoult D: A novel obligate intracellular

gamma-proteobacterium associated with ixodid ticks, Diplorickettsia

massiliensis, Gen. Nov., Sp. Nov. PloS one 2010, 5:e11478.

4. Parola P, Raoult D: Ticks and tickborne bacterial diseases in humans: an

emerging infectious threat. Clinical infectious diseases : an official publication

of the Infectious Diseases Society of America 2001, 32:897-928.

5. Beckstrom-Sternberg SM, Auerbach RK, Godbole S, Pearson JV, Beckstrom-

Sternberg JS, Deng Z, Munk C, Kubota K, Zhou Y, Bruce D, et al: Complete

genomic characterization of a pathogenic A.II strain of Francisella tularensis

subspecies tularensis. PloS one 2007, 2:e947.

6. Mathew MJ, Subramanian G, Nguyen TT, Robert C, Mediannikov O, Fournier

PE, Raoult D: Genome sequence of Diplorickettsia massiliensis, an emerging

Ixodes ricinus-associated human pathogen. Journal of bacteriology 2012,

194:3287.

7. Subramanian G, Mediannikov O, Angelakis E, Socolovschi C, Kaplanski G,

Martzolff L, Raoult D: Diplorickettsia massiliensis as a human pathogen.

European journal of clinical microbiology & infectious diseases : official

publication of the European Society of Clinical Microbiology 2012, 31:365-369.

8. Peterson J, Garges S, Giovanni M, McInnes P, Wang L, Schloss JA, Bonazzi V,

McEwen JE, Wetterstrand KA, Deal C, et al: The NIH Human Microbiome

Project. Genome research 2009, 19:2317-2323.

9. Hu B, Xie G, Lo CC, Starkenburg SR, Chain PS: Pathogen comparative

genomics in the next-generation sequencing era: genome alignments,

pangenomics and metagenomics. Briefings in functional genomics 2011,

10:322-333.


10. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-

genome. Current opinion in genetics & development 2005, 15:589-594.

11. Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree

J, Sebaihia M, Thomson NR, Chaudhuri R, et al: The pangenome structure of

Escherichia coli: comparative genomic analysis of E. coli commensal and

pathogenic isolates. Journal of bacteriology 2008, 190:6881-6893.

12. Rocha EP: Evolutionary patterns in prokaryotic genomes. Current opinion in

microbiology 2008, 11:454-460.

13. D'Auria G, Jimenez-Hernandez N, Peris-Bondia F, Moya A, Latorre A:

Legionella pneumophila pangenome reveals strain-specific virulence factors.

BMC Genomics 2010, 11:181.

14. Wren BW: Microbial genome analysis: insights into virulence, host

adaptation and evolution. Nature reviews Genetics 2000, 1:30-39.

15. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-

genome. Curr Opin Genet Dev 2005, 15:589-594.

16. Snipen L, Almoy T, Ussery DW: Microbial comparative pan-genomics using

binomial mixture models. BMC Genomics 2009, 10:385.

17. Pearson T, Hornstra HM, Sahl JW, Schaack S, Schupp JM, Beckstrom-Sternberg

SM, O'Neill MW, Priestley RA, Champion MD, Beckstrom-Sternberg JS, et al:

When Outgroups Fail; Phylogenomics of Rooting the Emerging Pathogen,

Coxiella burnetii. Syst Biol 2013, 62:752-762.

18. Georgiades K, Merhej V, El Karkouri K, Raoult D, Pontarotti P: Gene gain and

loss events in Rickettsia and Orientia species. Biol Direct 2011, 6:6.

19. Gimenez G, Bertelli C, Moliner C, Robert C, Raoult D, Fournier PE, Greub G:

Insight into cross-talk between intra-amoebal pathogens. BMC Genomics

2011, 12:542.

20. Moliner C, Fournier PE, Raoult D: Genome analysis of microorganisms living

in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev 2010,

34:281-294.

21. Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D: Massive comparative

genomic analysis reveals convergent evolution of specialized bacteria. Biol

Direct 2009, 4:13.

22. Bliven KA, Maurelli AT: Antivirulence genes: insights into pathogen evolution

through gene loss. Infection and immunity 2012, 80:4061-4070.


23. Rolain JM, Vayssier-Taussat M, Saisongkorh W, Merhej V, Gimenez G, Robert

C, Le Rhun D, Dehio C, Raoult D: Partial disruption of translational and

posttranslational machinery reshapes growth rates of Bartonella birtlesii.

mBio 2013, 4:e00115-13.

24. Moran NA, Wernegreen JJ: Lifestyle evolution in symbiotic bacteria: insights

from genomics. Trends Ecol Evol 2000, 15:321-326.

25. Andersson JO, Andersson SG: Insights into the evolutionary process of

genome degradation. Current opinion in genetics & development 1999, 9:664-

671.

26. Audic S, Robert C, Campagna B, Parinello H, Claverie JM, Raoult D, Drancourt

M: Genome analysis of Minibacterium massiliensis highlights the convergent

evolution of water-living bacteria. PLoS Genet 2007, 3:e138.

27. Cazalet C, Gomez-Valero L, Rusniok C, Lomma M, Dervins-Ravault D, Newton

HJ, Sansom FM, Jarraud S, Zidane N, Ma L, et al: Analysis of the Legionella

longbeachae genome and transcriptome uncovers unique strategies to

cause Legionnaires' disease. PLoS Genet 2010, 6:e1000851.

28. Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A,

Jacob B, Huang J, Williams P, et al: IMG: the Integrated Microbial Genomes

database and comparative analysis system. Nucleic acids research 2012,

40:D115-122.

29. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ: Prodigal:

prokaryotic gene recognition and translation initiation site identification.

BMC bioinformatics 2010, 11:119.

30. Benson DA, Karsch-Mizrachi I, Clark K, Lipman DJ, Ostell J, Sayers EW:

GenBank. Nucleic acids research 2012, 40:D48-53.

31. Benson G: Tandem repeats finder: a program to analyze DNA sequences.

Nucleic acids research 1999, 27:573-580.

32. Li L, Stoeckert CJ, Jr., Roos DS: OrthoMCL: identification of ortholog groups

for eukaryotic genomes. Genome research 2003, 13:2178-2189.

33. Wu S, Zhu Z, Fu L, Niu B, Li W: WebMGA: a customizable web server for fast

metagenomic sequence analysis. BMC Genomics 2011, 12:444.

34. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov

DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al: The COG database: an

updated version includes eukaryotes. BMC bioinformatics 2003, 4:41.


35. Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M: KAAS: an automatic

genome annotation and pathway reconstruction server. Nucleic acids

research 2007, 35:W182-185.

36. Darling AE, Mau B, Perna NT: progressiveMauve: multiple genome alignment

with gene gain, loss and rearrangement. PloS one 2010, 5:e11147.

37. Sheppard SK, Didelot X, Jolley KA, Darling AE, Pascoe B, Meric G, Kelly DJ, Cody

A, Colles FM, Strachan NJ, et al: Progressive genome-wide introgression in

agricultural Campylobacter coli. Mol Ecol 2013, 22:1051-1064.

38. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning

protein functions by comparative genome analysis: protein phylogenetic

profiles. Proceedings of the National Academy of Sciences of the United States

of America 1999, 96:4285-4288.

39. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M,

Currier T, Thiagarajan M, et al: TM4: a free, open-source system for

microarray data management and analysis. BioTechniques 2003, 34:374-378.


Figure legends

Figure 1: Protein sequence length distributions. Each organism is represented by a different color and symbol.

Figure 2: Distribution of G+C content (%, red) and genome size (Mb, blue).

Figure 3: KEGG category profiles of the organisms considered in the analysis.

Figure 4: Summary of the pangenome comparisons. Results obtained by comparing the five complete genomes of pathogenic C. burnetii strains, eleven complete genomes of pathogenic Legionella strains, ten complete genomes of pathogenic L. pneumophila strains, eleven complete genomes of pathogenic Francisella tularensis strains and thirty complete genomes, in terms of ortholog/accessory gene distribution. The central circle represents the number of core functions, and each petal corresponds to the number of accessory functions.

Figure 5: Distribution of the accessory functions in the whole set (30 organisms). The central circle represents the number of core functions, and each petal corresponds to the number of accessory functions.

Figure 6: Phylogenomic analysis based on COG and KEGG information, with clustering based on Euclidean distances.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Organism | Niche | Chromosomes | Plasmids | Size (Mb) | GC% | Genes | Proteins | Coding density (%) | Accession number | PMID
Diplorickettsia massiliensis 20B | Ticks | 1 | - | 1.73 | 39.3 | 2333 | 2287 | 79.56 | AJGC00000000 | 22628513
Rickettsiella grylli | Insect | 1 | - | 1.58 | 37.8 | 1557 | 1410 | 90.86 | AAQJ02000001.1 | 5753287
Coxiella burnetii Dugway 5J108-111 | Ticks+Amoeba, Animals | 1 | 1 | 2.21 | 42.3 | 2362 | 2045 | 82.04 | NC_009727.1 | 19047403
Coxiella burnetii CbuG_Q212 | Ticks+Amoeba, Animals | 1 | - | 2.01 | 42.6 | 2091 | 1866 | 77.55 | NC_011527.1 | 19047403
Coxiella burnetii CbuK_Q154 | Ticks+Amoeba, Animals | 1 | 1 | 2.1 | 42.6 | 2183 | 1942 | 77.69 | NC_011528.1 | 19047403
Coxiella burnetii RSA 331 | Ticks+Amoeba, Animals | 1 | 1 | 2.05 | 42.7 | 2278 | 1975 | 78.43 | NC_010117.1 | 19047403
Coxiella burnetii RSA 493 | Ticks+Amoeba, Animals | 1 | 1 | 2.03 | 42.6 | 2095 | 1847 | 78.00 | NC_002971.3 | 19047403
Francisella tularensis subsp. tularensis SCHU S4 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.3 | 1852 | 1604 | 79.17 | NC_006570.2 | 15640799
Francisella tularensis subsp. tularensis TIGB03 | Ticks+Amoeba, Animals | 1 | - | 1.97 | 32.3 | 1850 | 1624 | 76.76 | NC_016933.1 | 22535949
Francisella tularensis subsp. holarctica F92 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.2 | 1890 | 1842 | 80.55 | NC_019537.1 | 23405342
Francisella tularensis subsp. holarctica FSC200 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.2 | 1810 | 1438 | 71.29 | NC_019551.1 | 23209222
Francisella tularensis subsp. holarctica FTNF002-00 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.2 | 1887 | 1581 | 76.33 | NC_009749.1 | 19756146
Francisella tularensis subsp. holarctica LVS | Ticks+Amoeba, Animals | 1 | - | 1.9 | 32.2 | 2020 | 1754 | 82.40 | NC_007880.1 | 15780452
Francisella tularensis subsp. holarctica OSU18 | Ticks+Amoeba, Animals | 1 | - | 1.9 | 32.2 | 1932 | 1555 | 74.62 | NC_008369.1 | 16980500
Francisella tularensis subsp. mediasiatica FSC147 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.3 | 1750 | 1406 | 71.94 | NC_010677.1 | 19521508
Francisella tularensis subsp. tularensis FSC198 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.3 | 1852 | 1605 | 79.21 | NC_008245.1 | 17406676
Francisella tularensis subsp. tularensis NE061598 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.3 | 1888 | 1836 | 82.23 | NC_017453.1 | 20140244
Francisella tularensis subsp. tularensis TI0902 | Ticks+Amoeba, Animals | 1 | - | 1.89 | 32.3 | 1764 | 1544 | 76.58 | NC_016937.1 | 22535949
Francisella tularensis subsp. tularensis WY96-3418 | Ticks+Amoeba, Animals | 1 | - | 1.9 | 32.3 | 1872 | 1634 | 80.04 | NC_009257.1 | 17895988
Legionella pneumophila subsp. pneumophila str. Philadelphia 1 | Amoeba | 1 | - | 3.4 | 38.3 | 3003 | 2943 | 88.49 | NC_002942.5 | 15448271
Legionella pneumophila str. Paris | Amoeba | 1 | 1 | 3.64 | 38.4 | 3278 | 3166 | 87.19 | NC_006368.1 | 15467720
Legionella pneumophila 2300/99 Alcoy | Amoeba | 1 | - | 3.52 | 38.4 | 3243 | 3190 | 87.75 | NC_014125.1 | 20236513
Legionella pneumophila str. Corby | Amoeba | 1 | - | 3.58 | 38.5 | 3257 | 3204 | 87.06 | NC_009494.2 | 17888731
Legionella pneumophila str. Lens | Amoeba | 1 | 1 | 3.41 | 38.4 | 3058 | 2934 | 87.14 | NC_006369.1 | 15467720
Legionella pneumophila subsp. pneumophila ATCC 43290 | Amoeba | 1 | - | 3.36 | 38.2 | 2981 | 2926 | 89.11 | NC_016811.1 | 22374950
Legionella pneumophila subsp. pneumophila HL06041035 | Amoeba | 1 | - | 3.49 | 38.4 | 3184 | 3059 | 87.17 | NC_018140.1 | 22044686
Legionella pneumophila subsp. pneumophila LPE509 | Amoeba | 1 | 1 | 3.51 | 38.3 | 3383 | 3331 | 88.66 | NC_020521.1 | 23792742
Legionella pneumophila subsp. pneumophila str. Lorraine | Amoeba | 1 | 1 | 3.62 | 38.4 | 3327 | 3221 | 87.48 | NC_018139.1 | 22044686
Legionella pneumophila subsp. pneumophila str. Thunder Bay | Amoeba | 1 | - | 3.46 | 38.2 | 3043 | 2998 | 88.04 | NC_021350.1 | 23826259
Legionella longbeachae NSW150 | Amoeba | 1 | 1 | 4.15 | 37.1 | 3739 | 3470 | 84.73 | NC_013861.1 | 20174605

Table 1- General characteristics of the organisms considered for the analysis


Additional file 1- Functional analysis: distribution of COG and KEGG categories within the core and dispensable compartments.


Additional file 2- Pangenome summary for selected Proteobacteria. The % column corresponds to the core/pangenome ratio.

Species | Genomes used | Niche | Average genome size (Mb) | Pangenome size (bp) | Core genome size (bp) | %
Salmonella enterica | 20 | Animals | 4.8 | 96520000 | 59960168 | 62
Campylobacter jejuni | 14 | Human, chicken | 1.7 | 23720000 | 18122022 | 76
Helicobacter pylori | 10 | Human | 1.6 | 16370000 | 12849693 | 78
Haemophilus influenzae | 9 | Human | 1.8 | 17170000 | 13728166 | 80
Legionella pneumophila | 10 | Amoeba | 3.4 | 34548036 | 28477841 | 82
Francisella tularensis | 13 | Ticks, Amoeba | 1.8 | 24690000 | 21468663 | 87
Yersinia pestis | 12 | Rodents | 4.7 | 55015109 | 48947637 | 89
Coxiella burnetii | 5 | Animals | 2 | 6690114 | 6150819 | 92
Buchnera aphidicola | 8 | Aphid | 0.6 | 5133548 | 5033068 | 98
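The % column of this table is the core genome size divided by the pangenome size, expressed as a percentage. The short Python check below recomputes it for three rows, using values copied from the table; the helper name is hypothetical.

```python
# (species, pangenome size in bp, core genome size in bp), copied from Additional file 2
ROWS = [
    ("Salmonella enterica", 96520000, 59960168),
    ("Coxiella burnetii", 6690114, 6150819),
    ("Buchnera aphidicola", 5133548, 5033068),
]


def core_pan_ratio(pan_bp, core_bp):
    """Core genome size as a percentage of the pangenome size."""
    return 100.0 * core_bp / pan_bp


for species, pan_bp, core_bp in ROWS:
    print(f"{species}: {core_pan_ratio(pan_bp, core_bp):.0f}%")
# Salmonella enterica: 62%
# Coxiella burnetii: 92%
# Buchnera aphidicola: 98%
```

A higher ratio means that a larger share of the pangenome is conserved in every genome sampled.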


Additional file 3- Individual pangenome summaries based on OrthoMCL clustering, with the corresponding numbers of core, accessory and unique genes in the organisms studied.

Organism | Proteins used by OrthoMCL | Core clusters | Accessory clusters | Core genes | Accessory genes | Unique genes | Total clusters | No group | Core (%) | Accessory (%) | Unique (%)
Pangenome Coxiella burnetii (5 genomes) | 6871 | 1080 | 6431 | 6324 | 491 | 56 | 1290 | 1993 | 92 | 7.15 | 0.82
Coxiella burnetii CbuG_Q212 | 1359 | 1079 | 87 | 1264 | 89 | 6 | 1172 | 401 | 93 | 6.55 | 0.44
Coxiella burnetii CbuK_Q154 | 1394 | 1079 | 106 | 1279 | 109 | 6 | 1191 | 394 | 91.8 | 7.82 | 0.43
Coxiella burnetii Dugway 5J108-111 | 1414 | 1080 | 122 | 1256 | 124 | 34 | 1236 | 421 | 88.8 | 8.77 | 2.4
Coxiella burnetii RSA 331 | 1351 | 1076 | 66 | 1276 | 66 | 9 | 1151 | 313 | 94.5 | 4.89 | 0.67
Coxiella burnetii RSA 493 | 1353 | 1079 | 102 | 1249 | 103 | 1 | 1182 | 464 | 92.3 | 7.61 | 0.07
Pangenome Francisella (12 genomes) | 16596 | 1010 | 280 | 14281 | 2297 | 18 | 1308 | 2329 | 86.1 | 13.84 | 0.11
Francisella tularensis subsp. holarctica F92 | 1557 | 1008 | 209 | 1315 | 241 | 1 | 1218 | 192 | 84.5 | 15.48 | 0.06
Francisella tularensis subsp. holarctica FSC200 | 1248 | 999 | 143 | 1104 | 144 | 0 | 1142 | 142 | 88.5 | 11.54 | 0
Francisella tularensis subsp. holarctica FTNF002-00 | 1362 | 998 | 164 | 1193 | 169 | 0 | 1115 | 165 | 87.6 | 12.41 | 0
Francisella tularensis subsp. holarctica LVS | 1534 | 1009 | 224 | 1271 | 262 | 1 | 1234 | 197 | 82.9 | 17.08 | 0.07
Francisella tularensis subsp. holarctica OSU18 | 1299 | 1001 | 147 | 1147 | 148 | 4 | 1152 | 180 | 88.3 | 11.39 | 0.31
Francisella tularensis subsp. mediasiatica FSC147 | 1243 | 1000 | 130 | 1109 | 131 | 3 | 1133 | 150 | 89.2 | 10.54 | 0.24
Francisella tularensis subsp. tularensis FSC198 | 1369 | 1009 | 188 | 1181 | 188 | 0 | 1197 | 236 | 86.3 | 13.73 | 0
Francisella tularensis subsp. tularensis NE061598 | 1512 | 1010 | 227 | 1269 | 239 | 4 | 1241 | 215 | 83.9 | 15.81 | 0.26
Francisella tularensis subsp. tularensis SCHU S4 | 1368 | 1009 | 188 | 1180 | 188 | 0 | 1197 | 236 | 86.3 | 13.74 | 0
Francisella tularensis subsp. tularensis TI0902 | 1322 | 1007 | 196 | 1126 | 196 | 0 | 1203 | 209 | 85.2 | 14.83 | 0
Francisella tularensis subsp. tularensis TIGB03 | 1393 | 1007 | 197 | 1195 | 198 | 0 | 1204 | 216 | 85.8 | 14.21 | 0
Francisella tularensis subsp. tularensis WY96-3418 | 1291 | 1007 | 192 | 1191 | 193 | 5 | 1204 | 191 | 92.3 | 14.95 | 0.39
Pangenome Legionella (11 genomes) | 23736 | 1410 | 570 | 19567 | 3791 | 378 | 2358 | 1078 | 82.4 | 15.97 | 1.59
L_ longbeachae NSW150 | 2277 | 1356 | 194 | 1724 | 258 | 295 | 1845 | 107 | 75.7 | 11.33 | 12.96
L_pneumo_ 2300_99Alcoy | 2177 | 1404 | 376 | 1773 | 397 | 7 | 1787 | 101 | 81.4 | 18.24 | 0.32
L_pneumo_ pneumophila_ATCC43290 | 2108 | 1398 | 261 | 1763 | 333 | 0 | 1731 | 95 | 83.6 | 15.8 | 0
L_pneumo_ Pneumophila_HL | 2178 | 1400 | 342 | 1799 | 358 | 21 | 1763 | 96 | 82.6 | 16.44 | 0.96
L_pneumo_ Pneumophila_Lorraine | 2162 | 1399 | 339 | 1812 | 342 | 8 | 1746 | 100 | 83.8 | 15.82 | 0.37
L_pneumo_ str. Corby | 2179 | 1401 | 382 | 1780 | 393 | 6 | 1789 | 106 | 81.7 | 18.04 | 0.28
L_pneumo_ subsp. pneumophila str193Philadelphia | 2145 | 1408 | 337 | 1800 | 342 | 3 | 1748 | 97 | 83.9 | 15.94 | 0.14
L_pneumo_ subsp. pneumophila str576Philadelphia | 2109 | 1404 | 320 | 1783 | 325 | 1 | 1725 | 99 | 84.5 | 15.41 | 0.05
L_pneumo_ Thunder Bay | 2167 | 1407 | 343 | 1814 | 350 | 3 | 1753 | 102 | 83.7 | 16.15 | 0.14
L_ pneumophila str. Lens | 2071 | 1393 | 319 | 1727 | 330 | 14 | 1726 | 89 | 83.4 | 15.93 | 0.68
L_ pneumophila str. Paris | 2163 | 1398 | 353 | 1780 | 363 | 20 | 1772 | 86 | 82.3 | 16.78 | 0.92
Pangenome Legionella pneumophila (10 genomes) | 21459 | 1572 | 346 | 19466 | 1881 | 112 | 2030 | 971 | 90.7 | 8.77 | 0.52
L_ pneumophila str. Lens | 2071 | 1553 | 153 | 1888 | 163 | 20 | 1726 | 89 | 91.2 | 7.87 | 0.97
L_ pneumophila str. Paris | 2163 | 1561 | 184 | 1942 | 194 | 27 | 1772 | 86 | 89.8 | 8.97 | 1.25
L_pneumo_ 2300_99Alcoy | 2177 | 1565 | 215 | 1936 | 234 | 7 | 1787 | 101 | 88.9 | 10.75 | 0.32
L_pneumo_ pneumophila_ATCC43290 | 2108 | 1564 | 167 | 1937 | 171 | 0 | 1731 | 95 | 91.9 | 8.11 | 0
L_pneumo_ Pneumophila_HL | 2178 | 1563 | 174 | 1962 | 190 | 26 | 1763 | 96 | 90.1 | 8.72 | 1.19
L_pneumo_ Pneumophila_Lorraine | 2162 | 1562 | 169 | 1975 | 172 | 15 | 1746 | 100 | 91.4 | 7.96 | 0.69
L_pneumo_ str. Corby | 2179 | 1564 | 216 | 1943 | 227 | 9 | 1789 | 106 | 89.2 | 10.42 | 0.41
L_pneumo_ subsp. pneumophila str193Philadelphia | 2145 | 1570 | 174 | 1962 | 179 | 4 | 1748 | 97 | 91.5 | 8.34 | 0.19
L_pneumo_ subsp. pneumophila str576Philadelphia LPE | 2109 | 1566 | 158 | 1945 | 163 | 1 | 1725 | 99 | 92.2 | 7.73 | 0.05
L_pneumo_ Thunder Bay | 2167 | 1569 | 181 | 1976 | 188 | 3 | 1753 | 102 | 91.2 | 8.68 | 0.14
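In this table the three percentage columns are each gene count divided by the number of proteins used by OrthoMCL. In the C. burnetii CbuG_Q212 row, for instance, 1264 + 89 + 6 = 1359 proteins. A quick Python check, using values copied from the table and a hypothetical helper name:

```python
def compartment_percentages(proteins, core_genes, accessory_genes, unique_genes):
    """Percent of a genome's proteins in the core, accessory and unique compartments."""
    return tuple(round(100.0 * n / proteins, 2)
                 for n in (core_genes, accessory_genes, unique_genes))


print(compartment_percentages(1359, 1264, 89, 6))  # (93.01, 6.55, 0.44)
```

The same arithmetic reproduces the L. longbeachae NSW150 row (2277 proteins split into 1724, 258 and 295 genes, giving 75.71%, 11.33% and 12.96%).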


Additional file 4- Whole-set pangenome summary based on OrthoMCL clustering, with the corresponding numbers of core, accessory and unique genes in the organisms studied.

Organism | Proteins used by OrthoMCL | Core clusters | Accessory clusters | Core genes | Accessory genes | Unique genes | Total clusters | No group | Core (%) | Accessory (%) | Unique (%)
Whole (30 genomes) | 49833 | 627 | 2102 | 23500 | 25933 | 400 | 3130 | 5886 | 47.2 | 52.04 | 0.8
Coxiella burnetii (5 genomes) | 6871 | 580 | 682 | 3575 | 3268 | 28 | 1290 | 1993 | 52 | 47.56 | 0.41
Coxiella burnetii CbuG_Q212 | 1359 | 572 | 598 | 721 | 636 | 2 | 1172 | 401 | 53.1 | 46.8 | 0.15
Coxiella burnetii CbuK_Q154 | 1394 | 574 | 616 | 725 | 668 | 1 | 1191 | 394 | 52 | 47.92 | 0.07
Coxiella burnetii Dugway 5J108-111 | 1414 | 577 | 640 | 693 | 702 | 19 | 1236 | 421 | 49 | 49.65 | 1.34
Coxiella burnetii RSA 331 | 1351 | 572 | 574 | 733 | 613 | 5 | 1151 | 313 | 54.3 | 45.37 | 0.37
Coxiella burnetii RSA 493 | 1353 | 575 | 606 | 703 | 649 | 1 | 1182 | 464 | 52 | 47.97 | 0.07
Francisella tularensis (12 genomes) | 16596 | 578 | 8048 | 8539 | 8048 | 9 | 1308 | 2329 | 51.5 | 48.49 | 0.05
Francisella tularensis subsp. holarctica F92 | 1557 | 573 | 644 | 823 | 733 | 1 | 1218 | 192 | 52.9 | 47.08 | 0.06
Francisella tularensis subsp. holarctica FSC200 | 1248 | 570 | 572 | 638 | 610 | 0 | 1142 | 142 | 51.1 | 48.88 | 0
Francisella tularensis subsp. holarctica FTNF002-00 | 1362 | 565 | 550 | 640 | 584 | 0 | 1115 | 165 | 47 | 42.88 | 0
Francisella tularensis subsp. holarctica LVS | 1534 | 576 | 657 | 775 | 758 | 1 | 1234 | 197 | 50.5 | 49.41 | 0.07
Francisella tularensis subsp. holarctica OSU18 | 1299 | 568 | 582 | 675 | 622 | 2 | 1152 | 180 | 52 | 47.88 | 0.15
Francisella tularensis subsp. mediasiatica FSC147 | 1243 | 565 | 568 | 629 | 614 | 0 | 1133 | 150 | 50.6 | 49.4 | 0
Francisella tularensis subsp. tularensis FSC198 | 1369 | 573 | 624 | 708 | 661 | 0 | 1197 | 236 | 51.7 | 48.28 | 0
Francisella tularensis subsp. tularensis NE061598 | 1512 | 576 | 662 | 787 | 722 | 3 | 1241 | 215 | 52.1 | 47.75 | 0.2
Francisella tularensis subsp. tularensis SCHU S4 | 1368 | 573 | 624 | 707 | 661 | 0 | 1197 | 236 | 51.7 | 48.32 | 0
Francisella tularensis subsp. tularensis TI0902 | 1322 | 572 | 631 | 651 | 671 | 0 | 1203 | 209 | 49.2 | 50.76 | 0
Francisella tularensis subsp. tularensis TIGB03 | 1393 | 572 | 632 | 704 | 689 | 0 | 1204 | 216 | 50.5 | 49.46 | 0
Francisella tularensis subsp. tularensis WY96-3418 | 1291 | 573 | 629 | 711 | 676 | 2 | 1204 | 191 | 55.1 | 52.36 | 0.15
Legionella (11 genomes) | 23736 | 618 | 1468 | 9824 | 13640 | 272 | 2358 | 1078 | 41.4 | 57.47 | 1.15
L_ longbeachae NSW150 | 2277 | 608 | 1027 | 886 | 1181 | 210 | 1845 | 107 | 38.9 | 51.87 | 9.22
L_ pneumophila str. Lens | 2071 | 610 | 1108 | 863 | 1200 | 8 | 1726 | 89 | 41.7 | 57.94 | 0.39
L_ pneumophila str. Paris | 2163 | 610 | 1148 | 899 | 1251 | 13 | 1772 | 86 | 41.6 | 57.84 | 0.6
L_pneumo_ 2300_99Alcoy | 2177 | 613 | 1168 | 890 | 1281 | 6 | 1787 | 101 | 40.9 | 58.84 | 0.28
L_pneumo_ pneumophila_ATCC43290 | 2108 | 612 | 1119 | 877 | 1231 | 0 | 1731 | 95 | 41.6 | 58.4 | 0
L_pneumo_ Pneumophila_HL | 2178 | 611 | 1135 | 888 | 1273 | 17 | 1763 | 96 | 40.8 | 58.45 | 0.78
L_pneumo_ Pneumophila_Lorraine | 2162 | 613 | 1127 | 944 | 1212 | 6 | 1746 | 100 | 43.7 | 56.06 | 0.28
L_pneumo_ str. Corby | 2179 | 612 | 1171 | 888 | 1285 | 6 | 1789 | 106 | 40.8 | 58.97 | 0.28
L_pneumo_ subsp. pneumophila str193Philadelphia | 2145 | 612 | 1133 | 898 | 1244 | 3 | 1748 | 97 | 41.9 | 58 | 0.14
L_pneumo_ subsp. pneumophila str576Philadelphia | 2109 | 614 | 1110 | 896 | 1212 | 1 | 1725 | 99 | 42.5 | 57.47 | 0.05
L_pneumo_ Thunder Bay | 2167 | 611 | 1140 | 895 | 1270 | 2 | 1753 | 102 | 41.3 | 58.61 | 0.09
Legionella pneumophila (10 genomes) | 21458 | 617 | 1351 | 8938 | 12458 | 62 | 2030 | 971 | 41.7 | 58.06 | 0.29
L_ pneumophila str. Lens | 2071 | 610 | 1108 | 863 | 1200 | 8 | 1726 | 89 | 41.7 | 57.94 | 0.39
L_ pneumophila str. Paris | 2163 | 610 | 1148 | 899 | 1251 | 13 | 1772 | 86 | 41.6 | 57.84 | 0.6
L_pneumo_ 2300_99Alcoy | 2177 | 613 | 1168 | 890 | 1281 | 6 | 1787 | 101 | 40.9 | 58.84 | 0.28
L_pneumo_ pneumophila_ATCC43290 | 2108 | 612 | 1119 | 877 | 1231 | 0 | 1731 | 95 | 41.6 | 58.4 | 0
L_pneumo_ Pneumophila_HL | 2178 | 611 | 1135 | 888 | 1273 | 17 | 1763 | 96 | 40.8 | 58.45 | 0.78
L_pneumo_ Pneumophila_Lorraine | 2162 | 613 | 1127 | 944 | 1212 | 6 | 1746 | 100 | 43.7 | 56.06 | 0.28
L_pneumo_ str. Corby | 2179 | 612 | 1171 | 888 | 1285 | 6 | 1789 | 106 | 40.8 | 58.97 | 0.28
L_pneumo_ subsp. pneumophila str193Philadelphia | 2145 | 612 | 1133 | 898 | 1244 | 3 | 1748 | 97 | 41.9 | 58 | 0.14
L_pneumo_ subsp. pneumophila str576Philadelphia | 2109 | 614 | 1110 | 896 | 1212 | 1 | 1725 | 99 | 42.5 | 57.47 | 0.05
L_pneumo_ Thunder Bay | 2167 | 611 | 1140 | 895 | 1270 | 2 | 1753 | 102 | 41.3 | 58.61 | 0.09
Rickettsiella grylli | 1155 | 554 | 459 | 654 | 459 | 42 | 987 | 38 | 56.6 | 39.74 | 3.64
Diplorickettsia massiliensis | 1475 | 546 | 518 | 908 | 518 | 49 | 970 | 48 | 61.6 | 35.12 | 3.32


Gene ID Cluster ID COG Functional description

12043713 OG5_126962 R hypothetical protein

12042690 OG5_127396 RTKL Serine/threonine protein kinase

12043131 OG5_127837 G Galactose mutarotase and related enzymes

12042957 OG5_129515 Q Probable taurine catabolism dioxygenase

12043183 OG5_131640 O Predicted redox protein, regulator of disulfide bond formation

12043061 OG5_131654 C Ferredoxin

12042314 OG5_132174 R FOG:Ankyrin repeat

12043224 OG5_133030 P 3'-Phosphoadenosine 5'-phosphosulfate (PAPS) 3'-phosphatase

12042814 OG5_134761 R FOG:Ankyrin repeat

12043977 OG5_136591 H SAM-dependent methyltransferases

12042324 OG5_136663 S Uncharacterized conserved protein

12043610 OG5_137437 T hypothetical protein

12042285 OG5_137732 R hypothetical protein

12042283 OG5_137790 IQR Dehydrogenases with different specificities (related to short-chain alcohol dehydrogenases)

12043203 OG5_138525 T Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding

12041993 OG5_141810 R PhoPQ-activated pathogenicity-related protein

12043260 OG5_142772 T FOG:CheY-like receiver

12043907 OG5_146647 R Soluble lytic murein transglycosylase and related regulatory proteins

12044061 OG5_146777 RTKL hypothetical protein

12043846 OG5_150445 S Uncharacterized protein conserved in bacteria

12044192 OG5_152528 T FOG:CheY-like receiver

12043032 OG5_153661 R hypothetical protein

12043044 OG5_156771 M Predicted choline kinase involved in LPS biosynthesis

12043182 OG5_158947 J contains the PP-loop ATPase domain

12043983 OG5_158957 T FOG:CheY-like receiver

12042792 OG5_164552 M UDP-glucose pyrophosphorylase

12043504 OG5_164798 S Uncharacterized protein conserved in bacteria

12042562 OG5_165570 D hypothetical protein

12042797 OG5_166276 M hypothetical protein

12043119 OG5_166572 R hypothetical protein

12043357 OG5_167967 R DNA primase (bacterial type)

12043647 OG5_170999 R Predicted periplasmic protein

12042005 OG5_171413 R hypothetical protein

12043834 OG5_172467 R hypothetical protein

12043019 OG5_175478 R Amino acid transporters

12043613 OG5_176228 H SAM-dependent methyltransferases

12043373 OG5_178450 TK DNA-binding HTH domain-containing proteins

12042871 OG5_178715 K Predicted nucleotide-binding protein containing TIR-like domain

12042514 OG5_181916 R hypothetical protein

12042318 OG5_185753 R hypothetical protein

12042907 OG5_191435 R hypothetical protein

12042962 OG5_204787 R hypothetical protein

12042193 OG5_211174 R hypothetical protein

12042682 OG5_211971 R hypothetical protein

12044129 OG5_215038 R hypothetical protein

12043970 OG5_228892 Q hypothetical protein

12043121 OG5_229846 R hypothetical protein

12043292 OG5_244660 R Uncharacterized protein conserved in bacteria

12044183 OG5_245288 R hypothetical protein

Additional file 5. Description of the unique genes of Diplorickettsia massiliensis strain 20B
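The table above assigns each unique gene a COG functional code, and some genes carry several letters (for example RTKL or IQR). The following is a minimal sketch of how such a table could be summarized by functional category. The helper `tally_cog_letters` is hypothetical. It counts each letter of a multi-letter code once per gene, which is an assumption and not necessarily the convention used in this thesis.

```python
from collections import Counter


def tally_cog_letters(rows):
    """Count COG functional-category letters across table rows.

    Each row is expected to read: gene ID, ortholog cluster ID, COG code,
    then an optional free-text description, separated by whitespace.
    Assumption: a multi-letter code such as "RTKL" adds one count per letter.
    """
    counts = Counter()
    for row in rows:
        fields = row.split(maxsplit=3)
        # Skip wrapped description lines and anything without a numeric gene ID.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        counts.update(fields[2])  # a string is iterated letter by letter
    return counts


rows = [
    "12043713 OG5_126962 R hypothetical protein",
    "12042690 OG5_127396 RTKL Serine/threonine protein kinase",
    "12043373 OG5_178450 TK DNA-binding HTH domain-containing proteins",
]
print(sorted(tally_cog_letters(rows).items()))
# [('K', 2), ('L', 1), ('R', 2), ('T', 2)]
```

An alternative convention assigns a multi-letter gene only to its first letter. That choice changes the category totals, so a summary should state which convention it uses.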

Chapter 5

Conclusions

5.1 Conclusions and perspectives

Based on an endosymbiotic origin for mitochondria and other eukaryotic organelles, we believe that the intracellular lifestyle is ancient and has constantly co-evolved with its hosts. Comparative analyses of bacterial genomes from different lifestyles, including free-living and host-dependent bacteria, show that host-dependent bacteria encode fewer transcriptional regulators. Lamarckian evolution may have played a role in bacterial speciation events associated with a reduction in genome size. This observation contradicts the dominant model, which assumes that speciation and fitness gain are linked to an increase in the gene repertoire. Intracellular bacteria possess mechanisms to invade host cells and to protect themselves within them. Interactions between intracellular bacteria and host cells are mediated in part by type IV secretion systems (T4SSs). These supramolecular transporters are ancestrally related to bacterial conjugation systems. They are required for bacterial colonization of the niche, for invasion and for persistence within it.

The study of intracellular Gammaproteobacteria has contributed to our understanding of how bacteria specialize according to their ecological niche. The genome size and gene content of bacteria are associated with their lifestyle. The genomes analyzed here had a reduced number of genes and a relatively low G+C content, consistent with other studies of intracellular bacteria (Georgiades, et al., 2011). Gene loss resulting in a smaller genome has been a driving force in the adaptation of these bacteria to their hosts. Given this reduced genomic repertoire, we speculate that fewer lateral gene transfers occur in D. massiliensis than in other intracellular bacteria (Audic, et al., 2007). We used a multi-genus pangenomic approach to characterize the genomic repertoire of representative strains. We also compared the distribution of genes in D. massiliensis strain 20B with that in other genomes. We found that the majority of the genes in D. massiliensis strain 20B were shared with other Gammaproteobacteria. A pangenomic approach facilitates the exploration of the different strategies by which facultative or obligate intracellular bacteria adapt to particular hosts. It contributes significantly to our understanding of genome repertoires. This approach can uncover unique genomic features that cannot be predicted by conventional methods. Moreover, our results suggest that the Legionella strains could be reclassified based on their genomic variability. Sequencing additional intracellular bacterial genomes will provide a more precise picture of the genetic properties associated with the intracellular lifestyle. This effort will also contribute to a better understanding of the interactions between intracellular bacteria and their different niches. It will also clarify the complex mechanisms implicated in pathogenicity.
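The core/accessory/unique partition that underlies a pangenomic comparison can be sketched in a few lines. The sketch makes two assumptions:

- Genes have already been grouped into ortholog clusters. The OG5_ identifiers in the unique-gene table appear to be such clusters.
- Each genome is represented by the set of cluster IDs it contains.

The function name, genome names and cluster IDs are hypothetical placeholders and not data from this study. This is an illustration of the general technique, not the pipeline used in this work.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split ortholog clusters into core, accessory and genome-specific sets.

    genomes: dict mapping a genome name to the set of cluster IDs it contains.
    Returns (core, accessory, unique), where `unique` maps each genome name
    to the clusters found in that genome only.
    """
    n = len(genomes)
    # Count in how many genomes each cluster occurs.
    occurrence = Counter(c for clusters in genomes.values() for c in set(clusters))
    core = {c for c, k in occurrence.items() if k == n}
    accessory = {c for c, k in occurrence.items() if 1 < k < n}
    unique = {
        name: {c for c in clusters if occurrence[c] == 1}
        for name, clusters in genomes.items()
    }
    return core, accessory, unique


# Hypothetical toy input: three genomes, four clusters.
genomes = {
    "genome_A": {"OG_1", "OG_2", "OG_3"},
    "genome_B": {"OG_1", "OG_2"},
    "genome_C": {"OG_1", "OG_4"},
}
core, accessory, unique = partition_pangenome(genomes)
shared = genomes["genome_A"] - unique["genome_A"]
print(core, accessory, unique["genome_A"], len(shared) / len(genomes["genome_A"]))
# {'OG_1'} {'OG_2'} {'OG_3'} 0.6666666666666666
```

The last value is the fraction of genome_A's clusters that also occur in at least one other genome. It corresponds to the kind of "shared with other genomes" proportion discussed above. Real analyses must also handle paralogs and clustering thresholds, which this sketch ignores.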

5.2 Future perspectives

Current knowledge barely scratches the surface of the diversity of these intracellular bacteria and of their complex associations with their hosts. Genomic studies have shifted from looking only at genes and protein-coding sequences to exploring entire genomes. It will be interesting to learn more about the genomic repertoire of emerging intracellular bacterial pathogens because of their adverse effects on their hosts. Genomic analyses will provide a springboard for phylogenomic profiling, pangenomics, transcriptomics and proteomics. These approaches will ultimately enable a better understanding of how intracellular bacteria exploit their environment. They will also help to elucidate the mechanisms of pathogenicity in intracellular bacterial pathogens.
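Phylogenomic profiling is one of the future directions listed above. A widely used form is the phylogenetic-profile method (Pellegrini et al., 1999, in the bibliography). Each gene family is encoded as a presence/absence vector across genomes, and families with identical or near-identical vectors are candidates for a functional link.

The minimal sketch below assumes each genome is given as a set of hypothetical cluster IDs. It measures profile similarity with Hamming distance, which is one simple choice among several possible measures. All function names here are illustrative.

```python
from itertools import combinations


def phylogenetic_profile(cluster, genomes, order):
    """0/1 tuple recording whether `cluster` occurs in each genome of `order`."""
    return tuple(int(cluster in genomes[name]) for name in order)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(clusters, genomes, max_distance=0):
    """Cluster pairs whose profiles differ in at most `max_distance` genomes."""
    order = sorted(genomes)  # fix a genome order so profiles are comparable
    profiles = {c: phylogenetic_profile(c, genomes, order) for c in clusters}
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


# Hypothetical toy input: X and Y occur in exactly the same genomes.
genomes = {"G1": {"X", "Y", "Z"}, "G2": {"X", "Y"}, "G3": {"Z"}}
print(linked_pairs(["X", "Y", "Z"], genomes))  # [('X', 'Y')]
```

Raising `max_distance` tolerates occasional gene loss, which is frequent in reduced intracellular genomes. It also admits more spurious pairs.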

Bibliography

Amiri, H., C. M. Alsmark, et al. (2002). "Proliferation and deterioration

of Rickettsia palindromic elements." Molecular biology and

evolution 19(8): 1234-1243.

Andersson, J. O. and S. G. Andersson (1999). "Insights into the

evolutionary process of genome degradation." Curr Opin Genet

Dev 9(6): 664-671.

Andersson, S. G., C. Alsmark, et al. (2002). "Comparative genomics of

microbial pathogens and symbionts." Bioinformatics 18 Suppl 2:

S17.

Andrews, H. L., J. P. Vogel, et al. (1998). "Identification of linked

Legionella pneumophila genes essential for intracellular growth

and evasion of the endocytic pathway." Infection and immunity

66(3): 950-958.

Aravind, L., R. L. Tatusov, et al. (1998). "Evidence for massive gene

exchange between archaeal and bacterial hyperthermophiles."

Trends in genetics : TIG 14(11): 442-444.

Arneodo, J. D., A. Bressan, et al. (2008). "Ultrastructural detection of an

unusual intranuclear bacterium in Pentastiridius leporinus

(Hemiptera: Cixiidae)." Journal of invertebrate pathology 97(3):

310-313.

Audic, S., C. Robert, et al. (2007). "Genome analysis of Minibacterium

massiliensis highlights the convergent evolution of water-living

bacteria." PLoS Genet 3(8): e138.

Baldridge, G. D., N. Y. Burkhardt, et al. (2007). "Transposon insertion

reveals pRM, a plasmid of Rickettsia monacensis." Appl Environ

Microbiol 73(15): 4984-4995.

Banks, D. J., S. B. Beres, et al. (2002). "The fundamental contribution of

phages to GAS evolution, genome diversification and strain

emergence." Trends in microbiology 10(11): 515-521.

Beare, P. A., N. Unsworth, et al. (2009). "Comparative genomics reveal

extensive transposon-mediated genomic plasticity and diversity

among potential effector proteins within the genus Coxiella."

Infect Immun 77(2): 642-656.

Beckstrom-Sternberg, S. M., R. K. Auerbach, et al. (2007). "Complete

genomic characterization of a pathogenic A.II strain of Francisella

tularensis subspecies tularensis." PLoS One 2(9): e947.

Benson, D. A., I. Karsch-Mizrachi, et al. (2012). "GenBank." Nucleic Acids

Res 40(Database issue): D48-53.

Benson, G. (1999). "Tandem repeats finder: a program to analyze DNA

sequences." Nucleic Acids Res 27(2): 573-580.

Beranek, A., M. Zettl, et al. (2004). "Thirty-eight C-terminal amino acids

of the coupling protein TraD of the F-like conjugative resistance

plasmid R1 are required and sufficient to confer binding to the

substrate selector protein TraM." J Bacteriol 186(20): 6999-7006.

Berglund, E. C., A. C. Frank, et al. (2009). "Run-off replication of host-

adaptability genes is associated with gene transfer agents in the

genome of mouse-infecting Bartonella grahamii." PLoS genetics

5(7): e1000546.

Blanc, G., M. Ngwamidiba, et al. (2005). "Molecular evolution of

rickettsia surface antigens: evidence of positive selection."

Molecular biology and evolution 22(10): 2073-2083.

Blanc, G., H. Ogata, et al. (2007). "Lateral gene transfer between

obligate intracellular bacteria: evidence from the Rickettsia

massiliae genome." Genome research 17(11): 1657-1664.

Blanc, G., H. Ogata, et al. (2007). "Reductive genome evolution from the

mother of Rickettsia." PLoS genetics 3(1): e14.

Blatch, G. L. and M. Lassle (1999). "The tetratricopeptide repeat: a

structural motif mediating protein-protein interactions." BioEssays

: news and reviews in molecular, cellular and developmental

biology 21(11): 932-939.

Bliven, K. A. and A. T. Maurelli (2012). "Antivirulence genes: insights

into pathogen evolution through gene loss." Infect Immun 80(12):

4061-4070.

Bordenstein, S. R. and W. S. Reznikoff (2005). "Mobile DNA in obligate

intracellular bacteria." Nature reviews. Microbiology 3(9): 688-

699.

Bork, P. (1993). "Hundreds of ankyrin-like repeats in functionally diverse

proteins: mobile modules that cross phyla horizontally?" Proteins

17(4): 363-374.

Boyd, E. F. and H. Brussow (2002). "Common themes among

bacteriophage-encoded virulence factors and diversity among the

bacteriophages involved." Trends in microbiology 10(11): 521-529.

Boyd, E. F., B. M. Davis, et al. (2001). "Bacteriophage-bacteriophage

interactions in the evolution of pathogenic bacteria." Trends in

microbiology 9(3): 137-144.

Braeken, L., B. Van der Bruggen, et al. (2006). "Flux decline in

nanofiltration due to adsorption of dissolved organic compounds:

model prediction of time dependency." The journal of physical

chemistry. B 110(6): 2957-2962.

Burns, D. L. (2003). "Type IV transporters of pathogenic bacteria."

Current opinion in microbiology 6(1): 29-34.

Casadevall, A. (2008). "Evolution of intracellular pathogens." Annual

review of microbiology 62: 19-33.

Casjens, S. (2003). "Prophages and bacterial genomics: what have we

learned so far?" Molecular microbiology 49(2): 277-300.

Caturegli, P., K. M. Asanovich, et al. (2000). "ankA: an Ehrlichia

phagocytophila group gene encoding a cytoplasmic protein antigen

with ankyrin repeats." Infection and immunity 68(9): 5277-5283.

Cazalet, C., L. Gomez-Valero, et al. (2010). "Analysis of the Legionella

longbeachae genome and transcriptome uncovers unique

strategies to cause Legionnaires' disease." PLoS Genet 6(2):

e1000851.

Cazalet, C., C. Rusniok, et al. (2004). "Evidence in the Legionella

pneumophila genome for exploitation of host cell functions and

high genome plasticity." Nature genetics 36(11): 1165-1173.

Chen, I., P. J. Christie, et al. (2005). "The ins and outs of DNA transfer in

bacteria." Science 310(5753): 1456-1460.

Cho, N. H., H. R. Kim, et al. (2007). "The Orientia tsutsugamushi genome

reveals massive proliferation of conjugative type IV secretion

system and host-cell interaction genes." Proceedings of the

National Academy of Sciences of the United States of America

104(19): 7981-7986.

Christie, P. J. (2001). "Type IV secretion: intercellular transfer of

macromolecules by systems ancestrally related to conjugation

machines." Molecular microbiology 40(2): 294-305.

Christie, P. J. and J. P. Vogel (2000). "Bacterial type IV secretion:

conjugation systems adapted to deliver effector molecules to host

cells." Trends in microbiology 8(8): 354-360.

Claverie, J. M. and H. Ogata (2003). "The insertion of palindromic

repeats in the evolution of proteins." Trends in biochemical

sciences 28(2): 75-80.

Colson, P. and D. Raoult (2012). "Lamarckian evolution of the giant

Mimivirus in allopatric laboratory culture on amoebae." Frontiers

in cellular and infection microbiology 2: 91.

Corsaro, D., D. Venditti, et al. (1999). "Intracellular life." Critical reviews

in microbiology 25(1): 39-79.

D'Auria, G., N. Jimenez-Hernandez, et al. (2010). "Legionella

pneumophila pangenome reveals strain-specific virulence factors."

BMC genomics 11: 181.

Dai, L., N. Toor, et al. (2003). "Database for mobile group II introns."

Nucleic acids research 31(1): 424-426.

Darby, A. C., N. H. Cho, et al. (2007). "Intracellular pathogens go

extreme: genome evolution in the Rickettsiales." Trends in

genetics : TIG 23(10): 511-520.

Darling, A. E., B. Mau, et al. (2010). "progressiveMauve: multiple

genome alignment with gene gain, loss and rearrangement." PloS

one 5(6): e11147.

Degnan, P. H., A. B. Lazarus, et al. (2005). "Genome sequence of

Blochmannia pennsylvanicus indicates parallel evolutionary trends

among bacterial mutualists of insects." Genome research 15(8):

1023-1033.

Deng, W., L. Chen, et al. (1999). "VirE1 is a specific molecular chaperone

for the exported single-stranded-DNA-binding protein VirE2 in

Agrobacterium." Molecular microbiology 31(6): 1795-1807.

Douglas, A. E. (1989). "Mycetocyte symbiosis in insects." Biological

reviews of the Cambridge Philosophical Society 64(4): 409-434.

Dunning Hotopp, J. C., M. E. Clark, et al. (2007). "Widespread lateral

gene transfer from intracellular bacteria to multicellular

eukaryotes." Science 317(5845): 1753-1756.

Fares, M. A., A. Moya, et al. (2004). "GroEL and the maintenance of

bacterial endosymbiosis." Trends in genetics : TIG 20(9): 413-416.

Fares, M. A., M. X. Ruiz-Gonzalez, et al. (2002). "Endosymbiotic bacteria:

groEL buffers against deleterious mutations." Nature 417(6887):

398.

Felsheim, R. F., T. J. Kurtti, et al. (2009). "Genome sequence of the

endosymbiont Rickettsia peacockii and comparison with virulent

Rickettsia rickettsii: identification of virulence factors." PloS one

4(12): e8361.

Fernandez-Moreira, E., J. H. Helbig, et al. (2006). "Membrane vesicles

shed by Legionella pneumophila inhibit fusion of phagosomes with

lysosomes." Infection and immunity 74(6): 3285-3295.

Finlay, B. B. and S. Falkow (1997). "Common themes in microbial

pathogenicity revisited." Microbiology and molecular biology

reviews : MMBR 61(2): 136-169.

Fournier, P. E., K. El Karkouri, et al. (2009). "Analysis of the Rickettsia

africae genome reveals that virulence acquisition in Rickettsia

species may be explained by genome reduction." BMC genomics

10: 166.

Fournier, P. E., Y. Zhu, et al. (2004). "Use of highly variable intergenic

spacer sequences for multispacer typing of Rickettsia conorii

strains." Journal of clinical microbiology 42(12): 5757-5766.

Frank, A. C., H. Amiri, et al. (2002). "Genome deterioration: loss of

repeated sequences and accumulation of junk DNA." Genetica

115(1): 1-12.

Fraser-Liggett, C. M. (2005). "Insights on biology and evolution from

microbial genome sequencing." Genome research 15(12): 1603-

1610.

Friedland, J. S., R. J. Shattock, et al. (1993). "Phagocytosis of

Mycobacterium tuberculosis or particulate stimuli by human

monocytic cells induces equivalent monocyte chemotactic protein-

1 gene expression." Cytokine 5(2): 150-156.

Frost, L. S., R. Leplae, et al. (2005). "Mobile genetic elements: the agents

of open source evolution." Nature reviews. Microbiology 3(9): 722-

732.

Georgiades, K., M. A. Madoui, et al. (2011). "Phylogenomic analysis of

Odyssella thessalonicensis fortifies the common origin of

Rickettsiales, Pelagibacter ubique and Reclimonas americana

mitochondrion." PloS one 6(9): e24857.

Georgiades, K., V. Merhej, et al. (2011). "Gene gain and loss events in

Rickettsia and Orientia species." Biology direct 6: 6.

Georgiades, K. and D. Raoult (2010). "Defining pathogenic bacterial

species in the genomic era." Frontiers in microbiology 1: 151.

Georgiades, K. and D. Raoult (2011). "The rhizome of Reclinomonas

americana, Homo sapiens, Pediculus humanus and Saccharomyces

cerevisiae mitochondria." Biology direct 6: 55.

Gil, R., A. Latorre, et al. (2004). "Bacterial endosymbionts of insects:

insights from comparative genomics." Environmental microbiology

6(11): 1109-1122.

Gimenez, G., C. Bertelli, et al. (2011). "Insight into cross-talk between

intra-amoebal pathogens." BMC genomics 12: 542.

Gross, R., J. Hacker, et al. (2003). "The Leopoldina international

symposium on parasitism, commensalism and symbiosis--common

themes, different outcome." Molecular microbiology 47(6): 1749-

1758.

Hooper, S. D. and O. G. Berg (2003). "On the nature of gene innovation:

duplication patterns in microbial genomes." Molecular biology and

evolution 20(6): 945-954.

Horn, M., A. Collingro, et al. (2004). "Illuminating the evolutionary

history of chlamydiae." Science 304(5671): 728-730.

Hu, B., G. Xie, et al. (2011). "Pathogen comparative genomics in the

next-generation sequencing era: genome alignments, pangenomics

and metagenomics." Brief Funct Genomics 10(6): 322-333.

Hyatt, D., G. L. Chen, et al. (2010). "Prodigal: prokaryotic gene

recognition and translation initiation site identification." BMC

Bioinformatics 11: 119.

Klasson, L., Z. Kambris, et al. (2009). "Horizontal gene transfer between

Wolbachia and the mosquito Aedes aegypti." BMC genomics 10:

33.

Koonin, E. V. (2009). "Darwinian evolution in the light of genomics."

Nucleic acids research 37(4): 1011-1034.

Koonin, E. V. (2010). "The origin and early evolution of eukaryotes in the

light of phylogenomics." Genome biology 11(5): 209.

Koonin, E. V. and Y. I. Wolf (2008). "Genomics of bacteria and archaea:

the emerging dynamic view of the prokaryotic world." Nucleic

acids research 36(21): 6688-6719.

Labrador, M. and V. G. Corces (1997). "Transposable element-host

interactions: regulation of insertion and excision." Annu Rev Genet

31: 381-404.

Li, J., A. Mahajan, et al. (2006). "Ankyrin repeat: a unique motif

mediating protein-protein interactions." Biochemistry 45(51):

15168-15178.

Li, L., C. J. Stoeckert, Jr., et al. (2003). "OrthoMCL: identification of

ortholog groups for eukaryotic genomes." Genome Res 13(9):

2178-2189.

Lin, M., C. Zhang, et al. (2009). "Analysis of complete genome sequence

of Neorickettsia risticii: causative agent of Potomac horse fever."

Nucleic acids research 37(18): 6076-6091.

Marco, D. (2008). "Metagenomics and the niche concept." Theory in

biosciences = Theorie in den Biowissenschaften 127(3): 241-247.

Margulis, L. (1971). "The origin of plant and animal cells." American

scientist 59(2): 230-235.

Margulis, L. (1971). "Symbiosis and evolution." Scientific American

225(2): 48-57.

Margulis, L. and R. Fester (1991). Symbiosis as a Source of Evolutionary

Innovation: Speciation and Morphogenesis. The MIT Press.

Markowitz, V. M., I. M. Chen, et al. (2012). "IMG: the Integrated

Microbial Genomes database and comparative analysis system."

Nucleic Acids Res 40(Database issue): D115-122.

Mathew, M. J., G. Subramanian, et al. (2012). "Genome sequence of

Diplorickettsia massiliensis, an emerging Ixodes ricinus-associated

human pathogen." J Bacteriol 194(12): 3287.

Matthews, M. and C. R. Roy (2000). "Identification and subcellular

localization of the Legionella pneumophila IcmX protein: a factor

essential for establishment of a replicative organelle in eukaryotic

host cells." Infection and immunity 68(7): 3971-3982.

McCutcheon, J. P. and N. A. Moran (2007). "Parallel genomic evolution

and metabolic interdependence in an ancient symbiosis."

Proceedings of the National Academy of Sciences of the United

States of America 104(49): 19392-19397.

McCutcheon, J. P. and N. A. Moran (2012). "Extreme genome reduction

in symbiotic bacteria." Nature reviews. Microbiology 10(1): 13-26.

McNulty, S. N., J. M. Foster, et al. (2010). "Endosymbiont DNA in

endobacteria-free filarial nematodes indicates ancient horizontal

genetic transfer." PloS one 5(6): e11029.

Mediannikov, O., Z. Sekeyova, et al. (2010). "A novel obligate

intracellular gamma-proteobacterium associated with ixodid ticks,

Diplorickettsia massiliensis, Gen. Nov., Sp. Nov." PLoS One 5(7):

e11478.

Medini, D., C. Donati, et al. (2005). "The microbial pan-genome." Curr

Opin Genet Dev 15(6): 589-594.

Merhej, V., C. Notredame, et al. (2011). "The rhizome of life: the

sympatric Rickettsia felis paradigm demonstrates the random

transfer of DNA sequences." Molecular biology and evolution

28(11): 3213-3223.

Merhej, V. and D. Raoult (2011). "Rickettsial evolution in the light of

comparative genomics." Biological reviews of the Cambridge

Philosophical Society 86(2): 379-405.

Merhej, V., M. Royer-Carenzi, et al. (2009). "Massive comparative

genomic analysis reveals convergent evolution of specialized

bacteria." Biology direct 4: 13.

Miao, E. A. and S. I. Miller (1999). "Bacteriophages in the evolution of

pathogen-host interactions." Proceedings of the National Academy

of Sciences of the United States of America 96(17): 9452-9454.

Mira, A., H. Ochman, et al. (2001). "Deletional bias and the evolution of

bacterial genomes." Trends in genetics : TIG 17(10): 589-596.

Moliner, C., P. E. Fournier, et al. (2010). "Genome analysis of

microorganisms living in amoebae reveals a melting pot of

evolution." FEMS microbiology reviews 34(3): 281-294.

Moran, J. V., R. J. DeBerardinis, et al. (1999). "Exon shuffling by L1

retrotransposition." Science 283(5407): 1530-1534.

Moran, N. A. (1996). "Accelerated evolution and Muller's rachet in

endosymbiotic bacteria." Proceedings of the National Academy of

Sciences of the United States of America 93(7): 2873-2878.

Moran, N. A. (2002). "Microbial minimalism: genome reduction in

bacterial pathogens." Cell 108(5): 583-586.

Moran, N. A. and P. Baumann (2000). "Bacterial endosymbionts in

animals." Current opinion in microbiology 3(3): 270-275.

Moran, N. A., P. H. Degnan, et al. (2005). "The players in a mutualistic

symbiosis: insects, bacteria, viruses, and virulence genes."

Proceedings of the National Academy of Sciences of the United

States of America 102(47): 16919-16926.

Moran, N. A., H. E. Dunbar, et al. (2005). "Regulation of transcription in

a reduced bacterial genome: nutrient-provisioning genes of the

obligate symbiont Buchnera aphidicola." J Bacteriol 187(12): 4229-

4237.

Moran, N. A., J. P. McCutcheon, et al. (2008). "Genomics and Evolution

of Heritable Bacterial Symbionts." Annual Review of Genetics

42(1): 165-190.

Moran, N. A. and G. R. Plague (2004). "Genomic changes following host

restriction in bacteria." Current opinion in genetics & development

14(6): 627-633.

Moran, N. A. and J. J. Wernegreen (2000). "Lifestyle evolution in

symbiotic bacteria: insights from genomics." Trends in ecology &

evolution 15(8): 321-326.

Moriya, Y., M. Itoh, et al. (2007). "KAAS: an automatic genome

annotation and pathway reconstruction server." Nucleic Acids Res

35(Web Server issue): W182-185.

Mosavi, L. K., T. J. Cammett, et al. (2004). "The ankyrin repeat as

molecular architecture for protein recognition." Protein science : a

publication of the Protein Society 13(6): 1435-1448.

Nagai, H. and T. Kubori (2011). "Type IVB Secretion Systems of

Legionella and Other Gram-Negative Bacteria." Frontiers in

microbiology 2: 136.

Nakabachi, A., A. Yamashita, et al. (2006). "The 160-kilobase genome of

the bacterial endosymbiont Carsonella." Science 314(5797): 267.

Nora, T., M. Lomma, et al. (2009). "Molecular mimicry: an important

virulence strategy employed by Legionella pneumophila to subvert

host functions." Future microbiology 4(6): 691-701.

Ogata, H., S. Audic, et al. (2000). "Selfish DNA in protein-coding genes of

Rickettsia." Science 290(5490): 347-350.

Ogata, H., S. Audic, et al. (2001). "Mechanisms of evolution in Rickettsia

conorii and R. prowazekii." Science 293(5537): 2093-2098.

Ogata, H., B. La Scola, et al. (2006). "Genome sequence of Rickettsia

bellii illuminates the role of amoebae in gene exchanges between

intracellular pathogens." PLoS Genet 2(5): e76.

Ogata, H., P. Renesto, et al. (2005). "The genome sequence of Rickettsia

felis identifies the first putative conjugative plasmid in an obligate

intracellular parasite." PLoS biology 3(8): e248.

Ogata, H., C. Robert, et al. (2005). "Rickettsia felis, from culture to

genome sequencing." Annals of the New York Academy of Sciences

1063: 26-34.

Ohnishi, M., K. Kurokawa, et al. (2001). "Diversification of Escherichia

coli genomes: are bacteriophages the major contributors?" Trends

in microbiology 9(10): 481-485.

Parola, P. and D. Raoult (2001). "Ticks and tickborne bacterial diseases

in humans: an emerging infectious threat." Clin Infect Dis 32(6):

897-928.

Pearson, T., H. M. Hornstra, et al. (2013). "When Outgroups Fail;

Phylogenomics of Rooting the Emerging Pathogen, Coxiella

burnetii." Systematic biology 62(5): 752-762.

Pellegrini, M., E. M. Marcotte, et al. (1999). "Assigning protein functions

by comparative genome analysis: protein phylogenetic profiles."

Proc Natl Acad Sci U S A 96(8): 4285-4288.

Perez-Brocal, V., R. Gil, et al. (2006). "A small microbial genome: the end

of a long symbiotic relationship?" Science 314(5797): 312-313.

Peterson, J., S. Garges, et al. (2009). "The NIH Human Microbiome

Project." Genome Res 19(12): 2317-2323.

Pilsczek, F. H., A. Nicholson-Weller, et al. (2005). "Phagocytosis of

Salmonella montevideo by human neutrophils: immune adherence

increases phagocytosis, whereas the bacterial surface determines

the route of intracellular processing." The Journal of infectious

diseases 192(2): 200-209.

Rasko, D. A., M. J. Rosovitz, et al. (2008). "The pangenome structure of

Escherichia coli: comparative genomic analysis of E. coli

commensal and pathogenic isolates." J Bacteriol 190(20): 6881-

6893.

Renesto, P., H. Ogata, et al. (2005). "Some lessons from Rickettsia

genomics." FEMS microbiology reviews 29(1): 99-117.

Renvoise, A., V. Merhej, et al. (2011). "Intracellular Rickettsiales:

Insights into manipulators of eukaryotic cells." Trends in molecular

medicine 17(10): 573-583.

Rocha, E. P. (2003). "DNA repeats lead to the accelerated loss of gene

order in bacteria." Trends in genetics : TIG 19(11): 600-603.

Rocha, E. P. (2008). "Evolutionary patterns in prokaryotic genomes."

Curr Opin Microbiol 11(5): 454-460.

Rolain, J. M., M. Vayssier-Taussat, et al. (2013). "Partial disruption of

translational and posttranslational machinery reshapes growth

rates of Bartonella birtlesii." mBio 4(2): e00115-13.

Roux, V., M. Bergoin, et al. (1997). "Reassessment of the taxonomic

position of Rickettsiella grylli." Int J Syst Bacteriol 47(4): 1255-

1257.

Rubtsov, A. M. and O. D. Lopina (2000). "Ankyrins." FEBS letters 482(1-

2): 1-5.

Saeed, A. I., V. Sharov, et al. (2003). "TM4: a free, open-source system

for microarray data management and analysis." Biotechniques

34(2): 374-378.

Saisongkorh, W., C. Robert, et al. (2010). "Evidence of transfer by

conjugation of type IV secretion system genes between Bartonella

species and Rhizobium radiobacter in amoeba." PloS one 5(9):

e12666.

Saridaki, A. and K. Bourtzis (2010). "Wolbachia: more than just a bug in

insects genitals." Current opinion in microbiology 13(1): 67-72.

Sassera, D., T. Beninati, et al. (2006). "'Candidatus Midichloria

mitochondrii', an endosymbiont of the tick Ixodes ricinus with a

unique intramitochondrial lifestyle." International journal of

systematic and evolutionary microbiology 56(Pt 11): 2535-2540.

Schandel, K. A., M. M. Muller, et al. (1992). "Localization of TraC, a

protein involved in assembly of the F conjugative pilus." J Bacteriol

174(11): 3800-3806.

Schmitz-Esser, S., N. Linka, et al. (2004). "ATP/ADP translocases: a

common feature of obligate intracellular amoebal symbionts

related to Chlamydiae and Rickettsiae." J Bacteriol 186(3): 683-

691.

Schroder, G. and E. Lanka (2003). "TraG-like proteins of type IV secretion

systems: functional dissection of the multiple activities of TraG

(RP4) and TrwB (R388)." J Bacteriol 185(15): 4371-4381.

Sheppard, S. K., X. Didelot, et al. (2013). "Progressive genome-wide

introgression in agricultural Campylobacter coli." Molecular

ecology 22(4): 1051-1064.

Shigenobu, S., H. Watanabe, et al. (2000). "Genome sequence of the

endocellular bacterial symbiont of aphids Buchnera sp. APS."

Nature 407(6800): 81-86.

Simek, K., J. Pernthaler, et al. (2001). "Changes in bacterial community

composition and dynamics and viral mortality rates associated

with enhanced flagellate grazing in a mesoeutrophic reservoir."

Appl Environ Microbiol 67(6): 2723-2733.

Simser, J. A., M. S. Rahman, et al. (2005). "A novel and naturally

occurring transposon, ISRpe1 in the Rickettsia peacockii genome

disrupting the rickA gene involved in actin-based motility."

Molecular microbiology 58(1): 71-79.

Snipen, L., T. Almoy, et al. (2009). "Microbial comparative pan-genomics

using binomial mixture models." BMC genomics 10: 385.

Stepkowski, T. and A. B. Legocki (2001). "Reduction of bacterial genome

size and expansion resulting from obligate intracellular lifestyle

and adaptation to soil habitat." Acta biochimica Polonica 48(2):

367-381.

Subramanian, G., O. Mediannikov, et al. (2012). "Diplorickettsia

massiliensis as a human pathogen." Eur J Clin Microbiol Infect Dis

31(3): 365-369.

Tamas, I., L. Klasson, et al. (2002). "50 million years of genomic stasis in

endosymbiotic bacteria." Science 296(5577): 2376-2379.

Tatusov, R. L., N. D. Fedorova, et al. (2003). "The COG database: an

updated version includes eukaryotes." BMC Bioinformatics 4: 41.

Toft, C. and S. G. Andersson (2010). "Evolutionary microbial genomics:

insights into bacterial host adaptation." Nature reviews. Genetics

11(7): 465-475.

van Belkum, A., S. Scherer, et al. (1998). "Short-sequence DNA repeats

in prokaryotic genomes." Microbiology and molecular biology

reviews : MMBR 62(2): 275-293.

van Ham, R. C., J. Kamerbeek, et al. (2003). "Reductive genome

evolution in Buchnera aphidicola." Proceedings of the National

Academy of Sciences of the United States of America 100(2): 581-

586.

Van Sluys, M. A., M. C. de Oliveira, et al. (2003). "Comparative analyses

of the complete genome sequences of Pierce's disease and citrus

variegated chlorosis strains of Xylella fastidiosa." J Bacteriol

185(3): 1018-1026.

Vogel, J. P., H. L. Andrews, et al. (1998). "Conjugative transfer by the

virulence system of Legionella pneumophila." Science 279(5352):

873-876.

von Dohlen, C. D., S. Kohler, et al. (2001). "Mealybug beta-

proteobacterial endosymbionts contain gamma-proteobacterial

symbionts." Nature 412(6845): 433-436.

Walsh, J. B. (1995). "How often do duplicated genes evolve new functions?" Genetics 139(1): 421-428.

Wernegreen, J. J. (2002). "Genome evolution in bacterial endosymbionts of insects." Nature reviews. Genetics 3(11): 850-861.

Wernegreen, J. J. (2005). "For better or worse: genomic consequences of intracellular mutualism and parasitism." Current opinion in genetics & development 15(6): 572-583.

Wernegreen, J. J., A. B. Lazarus, et al. (2002). "Small genome of Candidatus Blochmannia, the bacterial endosymbiont of Camponotus, implies irreversible specialization to an intracellular lifestyle." Microbiology 148(Pt 8): 2551-2556.

Werren, J. H., L. Baldo, et al. (2008). "Wolbachia: master manipulators of invertebrate biology." Nature reviews. Microbiology 6(10): 741-751.

Whitman, W. B. (2009). "The modern concept of the procaryote." J Bacteriol 191(7): 2000-2005; discussion 2006-2007.

Wilcox, J. L., H. E. Dunbar, et al. (2003). "Consequences of reductive evolution for gene expression in an obligate endosymbiont." Molecular microbiology 48(6): 1491-1500.

Wren, B. W. (2000). "Microbial genome analysis: insights into virulence, host adaptation and evolution." Nat Rev Genet 1(1): 30-39.

Wu, M., L. V. Sun, et al. (2004). "Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements." PLoS biology 2(3): E69.

Wu, S., Z. Zhu, et al. (2011). "WebMGA: a customizable web server for fast metagenomic sequence analysis." BMC genomics 12: 444.

Yang, F., J. Yang, et al. (2005). "Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery." Nucleic acids research 33(19): 6445-6458.

Zientz, E., T. Dandekar, et al. (2004). "Metabolic interdependence of obligate intracellular bacteria and their insect hosts." Microbiology and molecular biology reviews : MMBR 68(4): 745-770.

Acknowledgements

I thank God for giving me patience, persistence and perspiration. I thank everyone who stood by me while I completed my thesis. I could not have completed it without the help and support of countless people over the past three years.

I must express my sincere gratitude to my guide and director, Professor Didier Raoult, for his constant suggestions and guidance. I would also like to thank him for creating a scientific environment at URMITE in which I could learn and improve my skills, and for providing the financial support (AP-HM) that made my life in France easier.

I am indeed thankful to the core bioinformatics team for helping me solve various technical issues. My hearty thanks go to Ghislain, Gregory, Fabrice and Olivier for their constant support.

I would like to thank the reviewers of my thesis, Prof. Jérôme ETIENNE and Prof. Max MAURIN, for their scientific advice and detailed review during the preparation of my thesis. Their sincere suggestions helped me improve it. I thank Prof. Jean-Louis MEGE for his support and for honoring me by acting as president of my thesis jury.

Completing my thesis would have been much harder without the following people. I owe special thanks to Catherine Robert and her team, especially Thi-Tien Nguyen, for teaching me molecular biology techniques; I remember here their time and patience. I am thankful to Francine Simula, Valerie Filosa and Sylvain Buffet for their administrative support and constant help.

In the field of genomics, when I was naïve and lost, Roshan Padmanabhan taught me the skills and methods I needed. He not only gave me feedback but also, many times, helped me understand problems and write manuscripts. Without his guidance and constant feedback, this PhD would not have been achievable.

My friends in France and India (especially Sagar, Vishal, Sijo and Mayur), and those in other parts of the world, were my sources of laughter, joy, happiness and support. I am happy that our friendships have extended well beyond the times we shared. I owe special thanks to all of you for keeping me determined.

I must give special and sincere thanks to my wife Ripsy, who stood by me in both happy and difficult times. Last but not least, I would like to express my sincere gratitude to my mother Susamma Mathew, my father M J Mathew, my brother Mithun, Mummy Daisy Chacko, Papa K V Chacko, my brother-in-law Rohind, and my grandfather and grandmother for their unconditional love, blessings and support. I dedicate my thesis to three important people: my mummy (my first teacher and my inspiration), my wife (my better half) and my papa (who encouraged me).