Conserved domains and evolution of secreted phospholipases A2

Secreted phospholipases A2 (sPLA2s) are lipolytic enzymes present in organisms ranging from prokaryotes to eukaryotes but their origin and emergence are poorly understood. We identified and compared the conserved domains of 333 sPLA2s and proposed a model for their evolution. The conserved domains were grouped into seven categories according to the in silico annotated conserved domain collections of ‘cd00618: PLA2_like’ and ‘pfam00068: Phospholip_A2_1’. PLA2s containing the conserved domain cd04706 (plant‐specific PLA2) are present in bacteria and plants. Metazoan PLA2s of the group (G) I/II/V/X PLA2 collection exclusively contain the conserved domain cd00125. GIII PLA2s of both vertebrates and invertebrates contain the conserved domain cd04704 (bee venom‐like PLA2), and mammalian GIII PLA2s also contain the conserved domain cd04705 (similar to human GIII PLA2). The sPLA2s of bacteria, fungi and marine invertebrates contain the conserved domain pfam09056 (prokaryotic PLA2) that is the only conserved domain identified in fungal sPLA2s. Pfam06951 (GXII PLA2) is present in bacteria and is widely distributed in eukaryotes. All conserved domains were present across mammalian sPLA2s, with the exception of cd04706 and pfam09056. Notably, no sPLA2s were found in Archaea. Phylogenetic analysis of sPLA2 conserved domains reveals that two main clades, the cd‐ and the pfam‐collection, exist, and that they have evolved via gene‐duplication and gene‐deletion events. These observations are consistent with the hypothesis that sPLA2s in eukaryotes shared common origins with two types of bacterial sPLA2s, and their persistence during evolution may be related to their role in phospholipid metabolism, which is fundamental for survival.


Introduction
Phospholipases A 2
Phospholipases A 2 (PLA 2 s; EC 3.1.1.4) are a group of lipolytic enzymes that hydrolyze the sn-2 bond of phospholipids, such as phosphatidylcholine and phosphatidylethanolamine, resulting in the release of fatty acid and lysophospholipid. They have been isolated from organisms ranging from bacteria to mammals and are suggested to have emerged early in evolution. In general, PLA 2 s are classified into three broad categories: Ca 2+ -dependent secreted PLA 2 s (sPLA 2 s); Ca 2+ -dependent cytosolic PLA 2 s; and Ca 2+ -independent cytosolic PLA 2 s. Cytosolic PLA 2 s are known to play an important role in cellular signalling and prostanoid metabolism [1,2]. Secreted PLA 2 s are components of various body fluids, including blood plasma, pancreatic juice, tears, seminal fluid, and snake and other venoms, and they participate in diverse physiological and pathological functions, such as in the digestion of dietary phospholipids, in inflammatory reactions and in the defence against bacteria and other pathogens [1][2][3][4][5][6][7].

Secreted PLA 2 s
Secreted PLA 2 s are the product of distinct genes and they have been classified, according to their molecular structure, into the following groups: IA, IB, IIA, IIB, IIC, IID, IIE, IIF, III, V, IX, X, XIA, XIB, XIIA, XIIB, XIII and XIV [1]. Evidence of structural similarity among group (G)I, GII, GV and GX members was taken to suggest that they may form a distinct GI ⁄ II ⁄ V ⁄ X PLA 2 collection [8]. The GXIV PLA 2 s of bacteria and fungi differ in their primary structure and folding from the other sPLA 2 s [1].
Secreted PLA 2 s are small-molecular-mass proteins (14-18 kDa, 120-135 amino acid residues) that require the presence of Ca 2+ at millimolar concentrations for their catalytic activity. They share a conserved 3D structure that is stabilized by five to eight disulfide bonds. The structure of the GI, GII and GX PLA 2 s consists of three α-helices, a two-stranded β-sheet (the β-wing) and a conserved Ca 2+ -binding loop [9][10][11][12][13]. The conserved catalytic network of GIB and GIIA PLA 2 s consists of hydrogen-bonded side-chains formed by histidine, which is localized in the long α-helix1, tyrosine and aspartic acid residues, as well as the hydrophobic wall that shields it [14]. The sPLA 2 s of plants have a different 3D structure. In the GXIB PLA 2 of rice (Oryza sativa) the β-wing is absent and the C-terminal α-helix3 has a different orientation [15]. The sPLA 2 s of prokaryotes and fungi are characterized by a dominant α-helical fold [16]. The 3D structure of GIII PLA 2 differs from that of GI ⁄ II ⁄ V ⁄ X PLA 2 s but they share identical motifs [17].

Conserved domains of sPLA 2 s and their evolution
Functional motifs of sPLA 2 s that are well conserved include the Ca 2+ -binding and catalytic sites, and the conserved cysteine residues and disulfide bond pattern [1,2]. In silico database annotations identified two conserved signature patterns. The first, 'PA2_HIS' (PS00118, Phospholipase A 2 histidine active site: C-C-{P}-x-H-{LGY}-x-C, where x represents a nonconserved amino acid and amino acids within curly brackets are not allowed), contains the histidine residue of the sPLA 2 active site. The second, 'PA2_ASP' (PS00119, Phospholipase A 2 aspartic acid active site: [LIVMA]-C-{LIVMFYWPCST}-C-D-{GS}-{G}-{N}-x-{QS}-C, where x represents a nonconserved amino acid, amino acids within curly brackets are not allowed and amino acids within square brackets are allowed), is centred on the active site aspartic acid residue and localized towards the C-terminal portion of the molecule [18]. The Ca 2+ -binding motif, which, for example, in mammalian GIB sPLA 2 s is Y-x-G-x-G (where x represents a nonconserved amino acid), is localized before the histidine catalytic site, towards the N-terminus.
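The two signature patterns can be read as simple position-by-position rules, and a short script makes that reading concrete. The following is a minimal sketch, not the procedure used in this study: it assumes that each pattern is translated into a regular expression, it handles only the element types that occur in these two patterns (fixed residues, x, square brackets and curly brackets), and the function names and the toy test sequence are hypothetical.

```python
import re

# Pattern strings as quoted in the text.
PA2_HIS = "C-C-{P}-x-H-{LGY}-x-C"
PA2_ASP = "[LIVMA]-C-{LIVMFYWPCST}-C-D-{GS}-{G}-{N}-x-{QS}-C"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Only the elements used in the two patterns above are supported;
    repetition counts such as x(2) are deliberately not handled.
    """
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")                          # any residue
        elif element.startswith("{") and element.endswith("}"):
            parts.append("[^" + element[1:-1] + "]")   # residues not allowed
        else:
            parts.append(element)                      # [allowed set] or fixed residue
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (1-based start position, matched residues) for each hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "AAACCQAHDNCAAA"  # hypothetical sequence, for illustration only
    print(prosite_to_regex(PA2_HIS))
    print(find_motifs(toy, PA2_HIS))
```

Because `finditer` reports non-overlapping matches, a sequence with overlapping candidate sites would need a lookahead-based search instead; that refinement is omitted here for brevity.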
Studies of sPLA 2 s have focussed mainly on eukaryotes [1][2][3][4]. However, the availability of molecular data from phylogenetically distant organisms, and the identification of new sPLA 2 s, challenges the current classification system, especially when applied to invertebrate and prokaryote sPLA 2 s [1][19][20][21]. Despite their potential evolutionary relationships [22,23], the origin and emergence of sPLA 2 s are still poorly understood. Although all sPLA 2 s share the same catalytic mechanism involving the canonical histidine residue, there is considerable variation in their sequence identity. Based on their identity, members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection have been designated 'conventional sPLA 2 s' and are believed to be evolutionarily close, whereas GIII and GXII PLA 2 s are classified as 'atypical sPLA 2 s' and are evolutionarily more distant [4,8]. In bacteria and fungi there are sPLA 2 s that show only limited sequence identity with other sPLA 2 s [1,16], and to date the evolutionary connection between these structurally distant proteins has not been established.
The present study aimed to contribute to the understanding of the origin and evolution of sPLA 2 s based upon the identification of their conserved domains in a representative collection of prokaryotes and eukaryotes. Recently, based on published data and the collection of well-annotated multiple sequence alignment models of their conserved domains, the family hierarchy of sPLA 2 s was comprehensively classified in two general collections: 'cd00618 PLA 2 _like: Phospholipase A 2 , a superfamily of secretory and cytosolic phospholipases A 2 ' and 'pfam00068: Phospholip_A2_1, Phospholipase A 2 ' [24,25]. In the present study, secreted PLA 2 protein sequences retrieved from public databases were classified according to their conserved domain structures. Their sequences were compared and specific motifs within each subfamily group were identified, and, based upon their sequence and structure similarity, a model for their origin and evolution was proposed.
All conserved domains of sPLA 2 s identified are present in representatives of the Kingdom Animalia with the exception of cd04706 of the sPLA 2 s of bacteria and plants (Table 1). Within this Kingdom, sPLA 2 s containing the conserved domain cd00125 are widespread in organisms ranging from basal metazoa (Porifera, Placozoa, Cnidaria and Rotifera), protostomes (Insects, Nematodes, Molluscs and Arthropods) and early deuterostomes (Echinodermata, Cephalochordata and Tunicata) to teleosts and tetrapods (Amphibia, Aves and Mammalia). The conserved domain cd00125 is typical of GIA, IB, IIA, IIB, IIC, IID, IIE, IIF, V and X sPLA 2 s. The conserved domain cd04707 was found in the N- and C-terminal regions of vertebrate otoconins. The conserved domains cd04704 and cd04705 are present in metazoan GIII PLA 2 s, the latter domain exclusively in mammals. The sPLA 2 s of bacteria contain either cd04706 or pfam09056 (present in XIV PLA 2 s) and, exceptionally, pfam06951. In contrast, no sPLA 2 s were found in archaea. Pfam09056 is present in the sPLA 2 s of fungi. The majority of plant sPLA 2 s contained the conserved domain cd04706 (identified in XIA and XIB PLA 2 s). The conserved domain pfam06951 is present in the GXIIA PLA 2 s of unicellular and multicellular organisms, including marine algae and bacteria (Tables 1 and S1).
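The relationship between the seven conserved domains, the two collections and the conventional group numbers can be summarized as a small lookup table. The sketch below is only an illustrative encoding of the assignments stated in the text and in Table 1; the data structure and the helper names (`DomainInfo`, `collection_of`, `domains_for_group`) are hypothetical and are not part of any database interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainInfo:
    collection: str    # "cd" or "pfam", as used in the phylogenetic analysis
    description: str   # short label taken from the text
    groups: tuple      # conventional sPLA2 group numbers, where stated


# Assignments as described in the text; cd04707 (otoconin) has no group number.
CONSERVED_DOMAINS = {
    "cd00125": DomainInfo("cd", "GI/II/V/X PLA2 collection",
                          ("IA", "IB", "IIA", "IIB", "IIC", "IID",
                           "IIE", "IIF", "V", "X")),
    "cd04704": DomainInfo("cd", "bee venom-like PLA2", ("III",)),
    "cd04705": DomainInfo("cd", "similar to human GIII PLA2 (mammals only)", ("III",)),
    "cd04706": DomainInfo("cd", "plant-specific PLA2 (bacteria and plants)",
                          ("XIA", "XIB")),
    "cd04707": DomainInfo("cd", "otoconin", ()),
    "pfam06951": DomainInfo("pfam", "GXII PLA2", ("XIIA", "XIIB")),
    "pfam09056": DomainInfo("pfam", "prokaryotic PLA2", ("XIV",)),
}


def collection_of(accession: str) -> str:
    """Return 'cd' or 'pfam' for a conserved-domain accession."""
    return CONSERVED_DOMAINS[accession.lower()].collection


def domains_for_group(group: str):
    """Return the sorted accessions of the domains assigned to a group number."""
    return sorted(acc for acc, info in CONSERVED_DOMAINS.items()
                  if group in info.groups)


if __name__ == "__main__":
    print(collection_of("cd04706"))
    print(domains_for_group("III"))
```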

Conserved domain cd04706 of PLA 2 s of bacteria and plants
The sPLA 2 s of bacteria and plants share a number of conserved structural features for the histidine and aspartic acid catalytic motifs and Ca 2+ -binding domains (Fig. 1). The conserved domain cd04706 was identified in the sPLA 2 s of bacteria of the class Alphaproteobacteria of the phylum Proteobacteria and also in those of the phylum Firmicutes, which includes both Gram-negative and Gram-positive bacteria, such as the human pathogens Streptococcus pyogenes, Clostridium perfringens, Clostridium botulinum and Bacillus cereus [26] (Table S1). Secreted PLA 2 s containing cd04706 were also identified in numerous plants. The sPLA 2 s of O. sativa are well characterized and classified as GXI PLA 2 s [15,27]. In general, several GXI PLA 2 isoforms were identified in plants, which may have resulted from gene-duplication events, for example, four GXI PLA 2 s were identified in O. sativa (Table S1). The GXIB PLA 2 of O. sativa is a 16.6 kDa protein in which the Ca 2+ -binding site contains tyrosine, glycine and aspartic acid residues, and the histidine residue of the active site is centred in the catalytic site motif (Fig. 1A).
Table 1 (footnotes). a Group number according to the classification in Schaloske and Dennis [1]. b cd04705 is found only in mammalian GIII PLA 2 s.
Sequence comparisons revealed that the plant (O. sativa) and bacterial (S. pyogenes) sPLA 2 s share 17% amino acid sequence identity. In both proteins, there is an asparagine residue in the C-terminal portion of the molecule, which probably contributes to the catalytic function [15], instead of the more commonly occurring aspartic acid residue. In the GXIB PLA 2 of O. sativa there are 12 cysteine residues that form six disulfide bonds [15], while the conserved domain of S. pyogenes sPLA 2 contains five cysteine residues, four of which (C 1 -C 2 and C 3 -C 5 ) align with the plant GXIB PLA 2 cysteines (C 4 -C 8 and C 9 -C 10 ) and form conserved disulfide bonds, which is suggestive of structural similarity (Fig. 1B).
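Sequence identities such as the 17% quoted above depend on how the alignment columns are counted. The following minimal sketch shows one common convention and is not necessarily the one applied in this study: it assumes that the two sequences have already been aligned, that gaps are written as '-', and that identity is the number of identical residues divided by the number of columns in which neither sequence has a gap. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are excluded from
    both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue              # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0                # no comparable columns
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment for illustration only: 4 identical residues in 5 ungapped columns.
    print(percent_identity("ACDE-G", "ACEEFG"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values for gapped alignments, so identities from different sources should be compared with care.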

Conserved domain cd00125 of PLA 2 of animals
The conserved domain cd00125 was identified in organisms ranging from basal invertebrates (Placozoa) to mammals and was prevalent in the vertebrate GI ⁄ II ⁄ V ⁄ X PLA 2 collection, where the functional sites and cysteine residues are highly conserved (Fig. 2). Members of this collection share 26-50% amino acid sequence identity within the conserved domain region. Among the members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection, the catalytic histidine is centred in the active site motif and the aspartic acid residue of the active site is present in the C-terminal portion of the molecule. In the Ca 2+ -binding site, Ca 2+ is bound by a tyrosine and two glycines, and by an aspartic acid adjacent to the catalytic histidine (Fig. 2).
Homologues of GI ⁄ II ⁄ V ⁄ X PLA 2 collection members are also present in invertebrate genomes, such as that of the sea anemone Nematostella vectensis, where six genes were identified (Table S1) [28]. The sPLA 2 of the sea anemone Adamsia carciniopados (also called Adamsia palliata) contains the pancreatic loop characteristic of vertebrate GI PLA 2 s and lacks the C-terminal extension found in GII PLA 2 s but shares homology with both GI and GII PLA 2 s [19]. Comparison of the disulfide bond positions of the A. carciniopados sPLA 2 with those of the human GI ⁄ II ⁄ V ⁄ X PLA 2 collection indicates that the sea anemone sPLA 2 shares five conserved disulfide bonds with the vertebrate homologues, suggesting structural, and possibly functional, conservation across the phylogenetically distant organisms of Cnidaria and Mammalia (Fig. 3).

Conserved domain cd04707 of otoconins
Otoconins containing the conserved domain cd04707 were identified in a number of vertebrates, from fish to mammals, and two conserved domains within the N-terminal (OtoN) and C-terminal (OtoC) regions of the same collection were identified within the mature protein sequence (Tables 1 and S1). The human otoconin-90 is a 53 kDa protein, and the OtoN and OtoC domains are 37% identical and structurally closely related to the members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection, suggesting common ancestry (Fig. 2). The human OtoN and OtoC domains are 36% and 34% identical, respectively, when compared with GIB PLA 2 . Moreover, the disulfide bond patterns of human otoconin and GIB PLA 2 are identical (Fig. 3). Both OtoN and OtoC domains contain a histidine residue within the conserved catalytic site. However, they are believed to be catalytically inactive as a result of mutations in the Ca 2+ -binding sites (Fig. 2), leading to loss of the usual Ca 2+ -binding residues in the OtoN domain, and in the OtoC domain the conserved second glycine of the mammalian GIB sPLA 2 s (Y-x-G-x-G) is replaced with a glutamic acid, although PLA 2 activity remains to be investigated [29].

Conserved domain cd04704 of group III PLA 2 s
Secreted PLA 2 s containing the conserved domain cd04704 (bee venom-like PLA 2 ) were identified in arthropods, reptiles and vertebrates, including humans. The honeybee Apis mellifera venom GIII PLA 2 shares considerable sequence identity with the members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection (e.g. 44% identity with the human GIB PLA 2 ) and is related in its 3D structure and catalytic mechanism to GI and GII PLA 2 s [11,17]. The backbone of the GIII PLA 2 molecule contains the conserved Ca 2+ -binding loop with tryptophan, glycine and aspartic acid residues and the conserved catalytic histidine and aspartic acid residues (Fig. 4). Human GIII PLA 2 is a 57 kDa protein in which cd04704 is localized in the middle part of the molecule and flanked by N- and C-terminal extensions [30]. The cd04704 of human GIII PLA 2 displays features similar to the arthropod GIII PLA 2 s, including the Ca 2+ -binding and catalytic site residues and also the 10 conserved cysteines that form five disulfide bonds at similar locations, and shares 44% sequence identity with honeybee venom GIII PLA 2 .

Conserved domain cd04705 of group III PLA 2
Proteins containing the conserved domain cd04705 (similar to human group III PLA 2 ) were found only in mammals (Table 1). The conserved domain cd04705 is localized in the C-terminal part of the GIII PLA 2 molecule and is structurally unrelated to cd04704 (Fig. 2). In contrast to cd04704, cd04705 is considered to be catalytically inactive and its function is unknown [30]. There are four cysteines that form two putative disulfide bonds, including a bond that connects the catalytic and the Ca 2+ -binding sites (Fig. 5).
In addition to bacteria, sPLA 2 s containing pfam09056 were also identified in both unicellular and multicellular fungi. The sPLA 2 of the fungus Tuber borchii [32] is a 23 kDa protein that contains two conserved disulfide bonds analogous to those of the sPLA 2 of S. violaceoruber (Fig. 5), and the two conserved domains are 42% identical.
There are pfam09056-containing sPLA 2 s in aquatic invertebrates of the phylum Cnidaria, including the sea anemone N. vectensis and the hydrozoan Hydra magnipapillata. Cnidarian sPLA 2 s have the conserved catalytic histidine and aspartic acid residues but contain more numerous cysteine residues and putative disulfide bonds than the corresponding bacterial and fungal sPLA 2 s (Fig. 5).
Recently, expressed sequence tags (ESTs) coding for potential sPLA 2 s containing the conserved domain pfam09056 were identified in the protist Astrammina rara (phylum Foraminifera) and the scallop Mizuhopecten yessoensis (phylum Mollusca) [33]. Previously, a related sPLA 2 was isolated from the venom of another mollusc, the marine snail Conus magus, and classified as GIX PLA 2 [34], but no conserved domain was identified for this protein in the present study, probably because of the incompleteness of its sequence. However, conserved histidine and aspartic acid active-site motifs are present in the pfam09056-containing sPLA 2 s and in the sPLA 2 of the snail C. magus (Fig. 5). Furthermore, the GIX PLA 2 of C. magus shares 13-33% sequence identity with the bacterial, fungal and cnidarian pfam09056 PLA 2 s, but lower identity of 9-13% with the cd04706, cd04704 and cd00125 PLA 2 s.

Conserved domain pfam06951 of GXII PLA 2 s
GXIIA PLA 2 was first cloned from human [35]. In the present study, the conserved domain pfam06951 was identified in the GXII PLA 2 s of a large number of vertebrate and invertebrate species, including the basal metazoans N. vectensis (Cnidaria), Trichoplax adhaerens (Placozoa) and Brachionus plicatus (Rotifera), as well as in the nonmetazoan eukaryotes (protists), such as the amoeba Naegleria gruberi (Heterolobosea), Monosiga ovata (Choanoflagellida), Euglena gracilis (Euglenozoa), Phytophthora infestans (Oomycetes), Thalassiosira pseudonana (Bacillariophyta), Capsaspora owczarzaki (Ichthyosporea), Chromera velia (Alveolata) and Thecamonas trahens (Apusozoa). Furthermore, pfam06951 was also identified in the sPLA 2 s of the marine algae Micromonas (Viridiplantae) and the prokaryote Planctomyces maris (Bacteria) (Table S1). These observations demonstrate that the conserved domain pfam06951 is widely distributed not only in the sPLA 2 s of higher animals but also in the sPLA 2 s of simple eukaryotes and prokaryotes. Sequence alignment of the GXIIA PLA 2 conserved domains of organisms ranging from protists to mammals reveals high conservation of the Ca 2+ -binding and catalytic sites. The canonical histidine catalytic site C-C-x-x-H-x-x-C motif is highly conserved. In the aspartic acid catalytic motif, the cysteine and aspartic acid residues are also conserved, with the exception of E. gracilis (Euglenozoa), Micromonas pusilla (Viridiplantae) and P. infestans (Oomycetes), in which aspartic acid is replaced by glutamic acid and the first cysteine is absent in all the sequences (Fig. S1). Comparison of the human GXIIA PLA 2 (pfam06951) with GIB PLA 2 (cd00125) indicated 37% amino acid sequence identity.
Another gene product closely related to GXIIA PLA 2 is GXIIB PLA 2 [36]. Human GXIIB PLA 2 shares 46% sequence identity with GXIIA PLA 2 . GXIIB PLA 2 is a catalytically inactive protein as a result of the substitution of histidine with leucine in the catalytic site (Fig. 2).

Phylogenetic analysis of sPLA 2 s
Phylogenetic analysis of the conserved domains of sPLA 2 s, carried out with the maximum likelihood (ML) and neighbour-joining (NJ) methods, produced similar tree topologies, suggesting that the members of this protein family have a common and ancient evolutionary origin (Fig. 6). Two major sPLA 2 groups exist: the cd-collection (which includes the cd00125, cd04704, cd04705, cd04706 and cd04707 of unicellular and multicellular organisms) and the pfam-collection (which contains the sPLA 2 s with the annotated pfam06951 and pfam09056 domains). They underwent distinct trajectories during evolution.
Fig. 6. Phylogenetic tree of the sPLA 2 conserved domains, constructed using the NJ method (see Table S1 for details). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches, and branches with bootstrap support lower than 50 were collapsed. Acap_56, Ajellomyces capsulata (Fungi); AmelIII_04, Apis mellifera (Insecta); Aory2_56, Aspergillus oryzae (Fungi); Bbac_56, Bdellovibrio bacteriovorus (Deltaproteobacteria); BtauIII_05, Bos taurus (Mammalia); CfamIII_05, Canis familiaris (Mammalia); Cglo_56, Chaetomium globosum (Fungi); Cimm_56, Coccidioides immitis (Fungi); Crei_06, Chlamydomonas reinhardtii (Viridiplantae); Dcar_06, Dianthus caryophyllus (Viridiplantae); Dgeo_56, Deinococcus geothermalis (Deinococci); Dmel_25, Drosophila melanogaster (Insecta); DmelXIIA_51, Drosophila melanogaster (Insecta); Fsp_56, Frankia sp. (Actinobacteria); GgalOtoN_07, Gallus gallus (Aves); Hmag_56, Hydra magnipapillata (Cnidaria); HsapIII_05, Homo sapiens
The sPLA 2 forms of bacteria cluster within each PLA 2 subgroup, suggesting that the eukaryote and prokaryote sPLA 2 repertoires share a PLA 2 -like common ancestor molecule, which remains to be identified. The cd-collection contains the majority of the sPLA 2 sequences identified, and the clustering observed indicates that discrete relationships between the sPLA 2 protein groups exist (Fig. 6).
Members of the cd04704 and cd04705 groups seem to have evolved separately from the other cd-collection members, such as cd00125 and cd04707, and have only been identified in animals.

Discussion
PLA 2 activity was first reported in canine pancreatic juice [37] and sPLA 2 s have now been isolated from a large number of snake and other venoms, and also from cells, tissues and body fluids of various unicellular and multicellular organisms [38][39][40]. Initially, sPLA 2 s were classified into two major groups of GI and GII PLA 2 s [41], but subsequently sequence homology and conserved disulfide bonds were observed within the other members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection and the related protein otoconin [42]. This classification has been expanded and refined over the past two decades as information from phylogenetically distinct organisms has accumulated at an accelerating rate [1,2,8,23]. The recent systematic identification of the conserved domains of protein molecules [24,25] allows a novel classification of the sPLA 2 s ('cd ⁄ pfam classification') based on the identification of the conserved domains, and a meaningful phylogenetic analysis of the whole range of sPLA 2 s. The aim of the present study was to elucidate the origin and evolution of the sPLA 2 s. The enormous evolutionary span between the organisms expressing sPLA 2 s (ranging from prokaryotes to metazoan eukaryotes) by necessity introduces a degree of uncertainty in the comparison of protein sequences between such distant phyla. When PLA 2 sequences containing conserved domains of different groups are compared, the molecular structure of a particular PLA 2 reflects the divergent or convergent evolution of the conserved domain structure and the evolutionary history of the organism in question. In the present study, we investigated the structural variability of the conserved domains of sPLA 2 s in a wide range of organisms, from bacteria to mammals. A total of 358 conserved domains were identified among 333 sPLA 2 sequences.
In viruses, sPLA 2 -like proteins (GXIII PLA 2 ) have also been reported but their molecular structure, including the conserved domain and the enzymatic activity, differ from those of sPLA 2 s [43] and thus were not included in the current analysis.
Two main distinct forms of sPLA 2 s were identified in bacteria: those containing cd04706 (bacteria ⁄ plant-specific subfamily of PLA 2 ) and those containing pfam09056 (prokaryotic ⁄ fungal PLA 2 ). Notably, no sPLA 2 s were identified in the Archaea. A marked difference between Bacteria and Archaea is that the predominant lipid constituents of the archaeal membranes are prenyl ether lipids, whereas the bacterial membranes contain acyl ester lipids [44][45][46]. The phospholipid metabolism of bacteria is driven by phospholipases that specifically hydrolyze the acyl ester bonds but are incapable of hydrolyzing the prenyl ether lipids. In light of this observation, it is hypothesized that the sPLA 2 s of the higher organisms may have their evolutionary origin in Bacteria rather than in Archaea.
Members of the sPLA 2 s are proposed to share a common ancestry and to have emerged early in evolution, and several theories based upon their sequence similarities and predicted molecular structure have suggested that they evolved rapidly. The GI and GII sPLA 2 s of Elapidae and Crotalidae snake venoms, respectively, have distinct molecular structures and it has been proposed that they share a common ancestor [22]. Snake venom sPLA 2 s are suggested to have evolved at an accelerated rate that has resulted in the presence of many variant PLA 2 molecules produced by the venom glands [40,[47][48][49].
A recent example of the rapid evolution of sPLA 2 genes is the bovine GIID PLA 2 . In cattle, five duplications have been identified, while a single gene copy is present in the human and rodent genomes. The bovine GIID PLA 2 s are expressed in the mammary gland and up-regulated during the lactating period, and are suggested to participate in the innate immune response [50]. In human and mouse, GIIA, GIIC, GIID, GIIE, GIIF and GV PLA 2 genes are also the result of gene-duplication events within the same chromosome. In human they are localized in chromosome 1, whereas GIB and GX genes map to chromosomes 12 and 16, respectively. However, comparisons between the gene homologues in human and mouse reveal that species-specific events may occur; for example, the human GIIC PLA 2 is a pseudogene, whereas the mouse homologue encodes an enzymatically active protein [51]. Another example of the functional diversity within the members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection is provided by the human and mouse GIIA PLA 2 s, which are efficient bactericidal enzymes involved in the innate immune response, whereas the closely related digestive enzyme GIB PLA 2 is only marginally bactericidal [6,52]. In vitro assays demonstrated that Gram-positive bacteria are killed by sPLA 2 s and that human GIIA PLA 2 is highly potent in this respect, whereas GIB PLA 2 is of low efficacy [52].
In Cnidaria, a sister group of the vertebrate clade that diverged more than 500 million years ago [53,54], sPLA 2 s have been reported that structurally resemble the vertebrate GI and GII PLA 2 s [19][20][21]28]. Our current observations indicate that a cnidarian sPLA 2 [19] containing cd00125 has disulfide bonds at locations identical to those of the human GV PLA 2 and conserved in other members of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection. Group I PLA 2 s of elapid snake venoms have lost the ancestral pancreatic loop present in the cnidarian sPLA 2 and also in the mammalian GIB PLA 2 . Such structural changes have resulted in the appearance of novel functions, including toxicity of the venom sPLA 2 s [55]. However, the numbers of cysteine residues and disulfide bonds of sPLA 2 s vary among closely related animals such as the sea anemones Adamsia carciniopados, Urticina crassicornis, Condylactis gigantea and N. vectensis [19][20][21]28,56]. The variation in the disulfide patterns and other structural features seems to preclude the exact placement of these cnidarian sPLA 2 s in the currently recognized groups of the GI ⁄ II ⁄ V ⁄ X PLA 2 collection [19][20][21].
In the current study, homologues of the vertebrate GXII members with highly conserved domain regions were identified for the first time in unicellular organisms, and their function in such organisms remains to be established. In human, GXIIA PLA 2 is expressed in T lymphocytes and seems to be involved in the regulation of the immune response [57]. Human GXIIB PLA 2 was recently shown to be involved in the triglyceride metabolism in the liver [58] and is proposed to activate specific receptors that remain to be identified [36].
Life as we know it can be divided into the prokaryotic (cellular organisms that lack a nucleus) Domains of Bacteria (Eubacteria) and Archaea (Archaebacteria), and the Domain of Eukaryota (organisms consisting of nucleated cells, such as animals, plants and fungi) [44]. Prokaryotes are the oldest cellular life forms on Earth, dating back 3.5-4 billion years and predating eukaryotes by 1 billion years. The current phylogenetic analysis of the conserved domains of sPLA 2 s of the representatives of the major prokaryote and eukaryote taxa supports the hypothesis that the sPLA 2 s of the eukaryotes, including the Metazoa, Viridiplantae and Fungi, may have shared a common origin with their homologues in bacteria. Two sPLA 2 groups (the cd-collection and the pfam-collection) emerged early in evolution and underwent distinct evolutionary trajectories. Based upon the retrieved data and phylogenetic relatedness, a model for the evolution of the two sPLA 2 s group members is proposed centred around gene-duplication and gene-deletion events (Fig. 7A,B). While the two members of the pfam-collection are present in representatives of the Eubacteria and Animalia kingdoms and maintained throughout evolution (Fig. 7B), the cd-collection members seem to have mainly emerged in the Animalia kingdom (Fig. 7A) and their expansion may be associated with the gene duplications that are proposed to have contributed to the increase in organismal complexity during eukaryote evolution [59]. The cd04706 protein members are exclusively found in Eubacteria and Plantae kingdoms and were lost from other life forms. Despite the lack of data from representatives of all clades and the failure to identify homologues it can be hypothesized that in the kingdom Animalia, sPLA 2 s with the conserved domains cd00125 and cd04704 were the first members to emerge. 
Subsequently, several independent gene or genome-duplication events occurred and resulted in the emergence of two novel family members of vertebrate cd04704 and mammalian cd04705. The sequence similarity observed between the vertebrate sPLA 2 s containing conserved domains cd04707 and cd00125 and also the mammalian cd04705 with cd04704 suggests that they have a shared common origin and that cd04707 emerged from a gene-duplication event of the cd00125-like ancestral gene at the time of vertebrate emergence and that cd04705 resulted from a later gene-duplication event of the cd04704-like ancestral gene precursor within the mammalian lineage (Fig. 7A). In contrast, only fungi, protists and aquatic invertebrates, such as those of the phyla Cnidaria and Mollusca, have acquired pfam09056-containing sPLA 2 s and the homologue gene seems to have been deleted from Protostomes and Deuterostome genomes (Fig. 7B).  The present analysis is based on the highly conserved peptide motifs directly involved in the catalytic function of sPLA 2 s, including the Ca 2+ -binding site and the catalytic centre. However, the functional roles of the surrounding domains are at present less well understood (e.g. the functions of the domains flanking the central catalytically active cd04704 of mammalian GIII PLA 2 s are unknown) [3] and their study may provide novel insight into the catalytic and noncatalytic functions of sPLA 2 s. For instance, binding of sPLA 2 s to specific cellular receptors is independent of their catalytic activity, and the protein domains involved in the receptor activation are not yet fully resolved [4]. Other examples are the toxicity of some snake venom PLA 2 s, which does not correlate with their catalytic activity [8], and the bactericidal effects of sPLA 2 s lacking catalytic activity [60].
It is concluded that the sPLA 2 s of eukaryotes share their evolutionary origin with two distinct types of bacterial sPLA 2 s. Their evolution and prevalence in genomes seem to be related to the functional constraints of phospholipid metabolism, which is a fundamental and conserved process in organisms. Although relatively little is known about the function of prokaryotic sPLA 2 s, the large number of distinct sPLA 2 isoforms in metazoans reflects the wide variation of substrate types encountered in the extracellular environment, where the enzyme plays many important roles in nutrition, reproduction and immunity.

Database mining and data collection
Secreted PLA 2 sequences were retrieved from the publicly available protein databases of NCBI (http://www.ncbi.nlm.nih.gov) and Swiss-Prot (http://www.uniprot.org) using the Basic Local Alignment Search Tool (BLASTp) algorithm [61] with default settings. Database searches were performed using the peptide sequences of the human GIB PLA 2 (P04054), honeybee venom GIII PLA 2 (P00630), human GXIIA PLA 2 (Q9BZM1), O. sativa GXIB PLA 2 (Q9XG81) and S. violaceoruber PLA 2 (Q6UV28). In addition, PLA 2 and PLA 2 -like protein sequences were identified in the Microbial and Eukaryotic Genome database (http://www.ncbi.nlm.nih.gov/genome) following a similar strategy. Searches were also performed on nucleotide data using tBLASTn; these covered all completed genomes in the NCBI genome database (1359 bacterial, 79 archaeal and 231 eukaryotic genomes; October 2010 release) as well as the available EST data. Sequence matches with an e-value of < 10 were retrieved and analysed. The deduced protein sequences were obtained using the BCM Search Launcher (http://searchlauncher.bcm.tmc.edu/seq-util/Options/sixframe.html) and compared with available homologue data.
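The step of deducing protein sequences from tBLASTn nucleotide hits can be illustrated with a six-frame translation. The following is a minimal sketch in Python, not the BCM Search Launcher implementation: it assumes the standard genetic code, plain A/C/G/T input, and hypothetical function names (`six_frame_translation`, `translate`) chosen only for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations of frames +1..+3 and -1..-3, keyed by frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not taken from any genome.
    for frame, peptide in six_frame_translation("ATGTGTTGCCAT").items():
        print(frame, peptide)
```

Each of the six deduced peptides can then be compared with known homologues, as described above, to decide which frame (if any) encodes a PLA 2 -like sequence.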

In silico sequence annotations
The conserved domains of sPLA 2 s were identified using the NCBI Conserved Domains Database CDD-27036 PSSMs (http://www.ncbi.nlm.nih.gov/cdd) annotation, with the sequences identified in this study as queries. Each result includes an alignment between the query and the search model consensus sequence, the expect value (E-value) for the alignment, the identity (name) of the conserved domain and the location of the conserved domain in the query sequence [25]. Histidine active-site and aspartic acid active-site protein motifs were identified based on the PROSITE database (http://au.expasy.org/prosite) pattern annotation. The localization of disulfide bonds was retrieved from the Swiss-Prot database and from published data.
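PROSITE pattern annotation of the kind described above amounts to matching a short residue pattern against each sequence. The sketch below, in Python, converts a PROSITE-style pattern into a regular expression and scans a peptide with it. It is an illustration under stated assumptions: the pattern string shown for the PLA 2 histidine active site is quoted from memory and should be checked against the current PROSITE release, the helper name `prosite_to_regex` is hypothetical, the N- and C-terminal anchors (`<`, `>`) are not handled, and the test peptide is synthetic.

```python
import re

# Assumed pattern for the PLA2 histidine active site; verify against PROSITE.
PA2_HIS_PATTERN = "C-C-{P}-x-H-{LGY}-x-C"


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern such as 'C-x(2,4)-[DE]' to a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core in ("x", "X"):
            token = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = core                     # single residue or [allowed set]
        if repeat:
            token += "{" + repeat + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (start, matched_text) for the first hit, or None if absent."""
    hit = re.search(prosite_to_regex(pattern), sequence.upper())
    return (hit.start(), hit.group()) if hit else None


if __name__ == "__main__":
    synthetic_peptide = "GSACCQTHDNCYKK"  # made-up sequence for illustration
    print(prosite_to_regex(PA2_HIS_PATTERN))
    print(find_motif(synthetic_peptide, PA2_HIS_PATTERN))
```

The start position returned by `find_motif` is zero-based; add one to compare it with residue numbering in database entries.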

Sequence comparisons and phylogenetic analysis
Pairwise and multiple protein sequence alignments were carried out using the ClustalW2 program [62] available from EBI (http://www.ebi.ac.uk/Tools/msa/clustalw2) with the default parameters. The percentages of sequence similarity (based on the observed substitutions of one amino acid for another in homologous proteins) and identity between the sPLA 2 s were calculated from the protein alignments using the GeneDoc interface (http://www.psc.edu/biomed/genedoc). Phylogenetic analysis of the conserved domain sequences of bacterial, fungal and metazoan PLA 2 s was performed using 53 representative taxa. The protein alignment produced was submitted to ProtTest analysis (http://darwin.uvigo.es/software/prottest.html) to select the model of protein evolution that best fits the data set [63]. Phylogenetic analyses were conducted using the NJ [64] and ML methods, and reliability of internal branching was assessed using the bootstrap method [65]. NJ analysis was performed using the MEGA4 programme [66] and the p-distance amino acid model with 1000 bootstrap replicates. Positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons (pairwise deletion option), and a total of 215 positions were analyzed in the final data set. The ML tree (PhyML, v3.0 aLRT) [67] was constructed in the Phylogeny.fr web interface (http://phylogeny.lirmm.fr/phylo_cgi/index.cgi). The WAG substitution model was selected, assuming an estimated proportion of invariant sites (0.167) and four gamma-distributed rate categories to account for rate heterogeneity across sites. The gamma shape parameter was estimated directly from the data (c = 1.804) and analysis was performed using 100 bootstrap replicates. Graphical representation and editing of the phylogenetic tree were performed with TreeDyn (v198.3, http://www.treedyn.org).
Both methods produced similar tree topologies; the NJ bootstrap consensus tree was selected, and clades with support values < 50% were collapsed.
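The p-distance with pairwise deletion used for the NJ analysis can be sketched as follows. This Python example is an illustration, not the MEGA4 implementation: it assumes that gaps are written as `-` and missing data as `?`, and the function names and the three-sequence toy alignment are hypothetical.

```python
from itertools import combinations

MISSING = {"-", "?"}  # assumed symbols for alignment gaps and missing data


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping sites gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # pairwise deletion: drop the site for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_1, name_2): p-distance} for every pair in the alignment."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration only.
    toy = {"a": "ACDE-G", "b": "ACNEFG", "c": "AC-EFG"}
    for pair, distance in distance_matrix(toy).items():
        print(pair, round(distance, 3))
```

In the toy alignment, sequences `a` and `b` differ at one of five comparable sites. Under complete deletion, the third column would be discarded for all pairs because `c` has a gap there, and that difference would be lost; pairwise deletion retains it.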

Supporting information
The following supplementary material is available: Fig. S1. Alignment of the conserved domain pfam06951 sequences of 20 GXIIA PLA 2 s. Table S1. Conserved domains of 333 prokaryotic and eukaryotic secreted phospholipases A 2 .
This supplementary material can be found in the online version of this article.