
A new approach to graphical and numerical analysis of links between plant chemotaxonomy and secondary metabolism from HPLC data smoothed by a simplex mixture design


Chemoecology 17: 139–156 (2007); 0937-7409/07/030139-18; © Birkhäuser Verlag, Basel, 2007; DOI 10.1007/s00049-007-0374-z

CHEMOECOLOGY

Nabil Semmar (1), Maurice Jay (2) and Saïd Nouira (1)

(1) Institut Supérieur des Sciences Biologiques Appliquées de Tunis (ISSBAT), 9 Rue Zouhair Essafi, 1006 Tunis, Tunisia
(2) Laboratoire de Phytochimie et Biologie Micromoléculaire, Université Claude Bernard, 43 BD 11 Novembre, 69100 Villeurbanne, France

Summary. HPLC analysis of secondary metabolites is an efficient tool for studying plant chemical diversity from different angles: chemotaxonomy, metabolomics, adaptive responses to ecological factors, etc. Statistical analyses of HPLC databases, e.g. correlation analysis between HPLC peaks, can provide information on the degree of similarity or dissimilarity between chemical compounds. Similarities, corresponding to positive correlations, can be interpreted in terms of analogies between chemical structures, synchronous metabolism, or co-evolution of two compounds under certain environmental conditions. In metabolic terms, positive correlations can reflect precursor-product relationships between compounds, whereas negative correlations can indicate competition between two compounds for a common precursor, enzyme or substrate. Furthermore, correlation analysis from a metabolic standpoint can help to understand the biochemical origins of a polymorphism observed in a plant species. To show this, we present a new approach based on a simplex mixture design (Scheffé matrix), which provides a correlation network making it possible to visualise graphically and to model numerically the metabolic trends between HPLC peaks. The principle of the approach consisted in mixing individual HPLC profiles representative of different phenotypes; then, from a complete set of mixtures, a series of average profiles was calculated to give a new database with low variability. Several iterations of the mixture design provided a smoothed final database from which the relationships between the secondary metabolites were analysed graphically and numerically. These relationships were scale-dependent, namely either deterministic or systematic: the first consisted of a monotonic global trend covering the whole variation range of each pair of metabolites; the second consisted of repetitive monotonic variations which gradually attenuated or intensified along a global trend. This new metabolomic approach was illustrated with 404 individual plants of Astragalus caprinus (Leguminosae), belonging to four chemical phenotypes (chemotypes) defined on the basis of the flavonoids analysed in their leaves. After smoothing, the relationships between flavonoids were fitted numerically using linear or polynomial models; the co-response coefficients were then readily interpreted in terms of metabolic affinities or competitions between flavonoids which would be responsible for the observed chemical polymorphism (the four chemotypes). The approach was validated statistically by comparing Pearson correlations calculated from the smoothed HPLC database with Spearman correlations calculated from the crude one. Moreover, the signs of the smoothed relationships were well supported by analogies and differences between the chemical structures of the flavonoids, leading to a straightforward interpretation in relation to the pathway architecture.

Key words. Flavonoids – HPLC profiles – simplex mixture design – smoothing – modelling – metabolomics – metabolic pathways – chemotypes – Astragalus caprinus

Introduction

Flavonoids are secondary metabolites that are widespread in angiosperms, gymnosperms, and ferns. Plants use them as feeding deterrents, feeding and pollinator attractants, and protectants against UV radiation and free radicals (Winkel-Shirley, 2002; Kanoun et al., 2001; Harborne and Baxter, 1999; Olsson et al., 1998; Harborne, 1994). Moreover, they have several biological activities, e.g. anti-oxidant, anti-inflammatory, anti-viral, anti-hypertensive, and cytotoxic (Oteiza et al., 2005; Shi et al., 2001; Lacaille-Dubois and Wagner, 1996).

Apart from their biological roles in cell protection, flavonoids are known to be good chemotaxonomic markers of plant taxa. Chemotaxonomy consists in studying the chemical polymorphism of a biological species through a statistical classification of its individuals into homogeneous chemical groups, called chemotypes. From an HPLC or GC database, these chemotypes can be highlighted and validated using different multivariate analyses, e.g. principal component, correspondence, cluster and discriminant analyses (Semmar et al., 2005; Fiehn et al., 2000). Each chemotype can be attributed to the major regulation of a metabolic pathway leading to a characteristic pattern in which some chemical compounds are major elements, as opposed to other, minor ones.

Correspondence to: Nabil Semmar, e-mail: [email protected]

Information about metabolite regulation can be obtained by combining chromatographic methods (HPLC, GC, TLC) with spectroscopic (IR, UV, mass spectrometry, NMR) or radioactivity detection techniques (Kose et al., 2001). The multivariate database resulting from the chromatographic and/or spectroscopic analyses can be explored by multivariate statistics, which highlight any clustering or similarity between the various samples (Lindon et al., 2006). In these cases, the clustering provides information on a chemical diversification which can result from the expression of distinct metabolic pathways in a whole biological population or system. Beyond descriptive analyses, mathematical modelling of metabolism has led to many algorithms that aim to engineer biochemical pathways in a more general way (Mendes and Kell, 1998).

In this paper, we present a novel metabolomic approach which aims to extract numerical relationships between all the metabolites from a single HPLC database. More precisely, from a large HPLC database representing the chemotypes of a plant species, our aim was to evaluate the inter-dependence relationships between its compounds (e.g. flavonoids) in order to interpret the metabolic pathways which would be at the origin of the observed chemical polymorphism. This question was approached statistically by combining the chemotypes of the studied plant species in order to extract a qualitative and quantitative information background from which the metabolic pathways could be readily interpreted.

For this, a complete series of mixtures between chemotypes was carried out using a simplex mixture design, known as the Scheffé matrix (Droesbeke et al., 1997; Sado and Sado, 1991; Scheffé, 1960; Scheffé, 1958). The HPLC profiles representative of different chemotypes and contributing to the same mixture were averaged to give a smoothed common HPLC pattern. Several iterations of this mixture design led to highly smoothed HPLC patterns, which served as a basis for the graphical and numerical analysis of the relationships between the chemical compounds.

The graphical analysis made it possible to visualize the shapes, the precision and the scale-dependence of the relationships between compounds. This basic qualitative information was completed by numerical modelling, which provided co-response coefficients between compounds and allowed the relationships to be interpreted in terms of metabolic control points.

This new approach is illustrated here with a database of 404 individual plants of Astragalus caprinus (Leguminosae) representing four chemotypes based on the flavonoids in the leaves (Semmar et al., 2005). The results were validated by comparing parametric Pearson correlations calculated on the smoothed database with non-parametric Spearman correlations calculated on the crude (initial) HPLC data (Semmar et al., 2001a; Camacho et al., 2005).

Moreover, the statistical results (correlations and co-response coefficients) given a priori by the new approach were compatible with analogies and differences in the chemical structures of the flavonoids, identified a posteriori.

Materials and methods

Sampling of plant material

The sampling concerned 404 individual plants of Astragalus caprinus (A.c.) (Leguminosae), collected in Tunisia during April-May of 1997 to 1999, these months corresponding to the maturity period of A.c. (Semmar et al., 2001a). The plants were left to dry for five days in the shade. The first year (1997) involved an intensive sampling of 304 individual plants collected in three geographical areas of Tunisia: north, center and south. This intensive sampling was completed by 100 further individual plants collected from the three geographical areas in 1998 (50 plants) and 1999 (50 plants), in order to check and confirm the integrity of the information from the first year, and also to increase the variability of the whole database. Some individual plants were collected in each year, and showed that the results for a given plant were identical whatever the sampling year. The three geographical areas represent three different climates: from north to south, atmospheric temperatures increase and rainfall decreases (Semmar et al., 2005). This information was later used to analyze the effect of the environment on the chemical results.

Extraction of flavonoids

From each of the 404 dried plants, 500 mg of leaves were weighed using an electronic balance (Mettler PM600), then crushed into a homogeneous powder using a coffee grinder. From the foliar powder, the flavonoids were extracted twice under reflux using a Liebig condenser system (William, 2006), with 50 ml (×2) of boiling MeOH-water (7:3) for 30 min. The 404 hydro-alcoholic extracts (100 ml) were filtered and evaporated under reduced pressure, then each residue was taken up in 5 ml of 80% methanol to be analyzed immediately by HPLC.

HPLC analysis of flavonoids

The individual plant extracts were placed in the autosampler of a Kontron liquid chromatograph (Autosampler 360, Pump 322); 20 µL of each concentrated methanolic residue were automatically injected into an ultrabase C-18 column (5 µm, 200 × 4.6 mm). The flavonoids were eluted using a program combining two solvents, water: acetic acid, 98: 2 (A) and acetonitrile: water: acetic acid, 79: 19: 2 (B). The analysis was run for 130 min using an alternating gradient-isocratic program: 0 to 30 min linear gradient 10 to 17% B, 30 to 45 min 17% B, 45 to 100 min linear gradient 17 to 35% B, 100 to 110 min 35% B, 110 to 112 min linear gradient 35 to 50% B, 112 to 114 min 50% B, 114 to 118 min linear gradient 50 to 10% B, 118 to 130 min 10% B, end. The flow rate was 0.8 ml/min and detection was monitored with a UV/VIS diode array detector (Waters 991) between 230 and 420 nm. The separated flavonoids were quantified (mg/g leaves dw) with reference to hyperoside, used as internal standard, after automatic integration of the HPLC peak areas (Semmar et al., 2001a).

HPLC data standardization

The HPLC chromatograms showed 14 major flavonoids (1-14) (Fig. 1). In each chromatogram, the areas of the 14 peaks were converted into relative values (percentages of the whole area of the chromatogram). The relative value gives the regulated part of each flavonoid within the total flavonoid metabolism. For example, the two different amount profiles {1, 2, 3, 4} and {2, 4, 6, 8} correspond to the same regulation profile {0.1, 0.2, 0.3, 0.4}. In other words, expressing flavonoid amounts as percentages of the total flavonoid pool allows the data to be interpreted in terms of flux-regulating points within the flavonoid metabolism. Moreover, from a methodological point of view, this normalization on a scale between 0 and 1 makes it possible to process the data within a simplex geometry and then to carry out mixtures of chromatograms, a basic concept of the approach proposed here.

[Figure 1: an HPLC chromatogram with 14 compound peaks (time axis 0 to 120 min); the integrated peak areas A1 to A14 are standardized into relative areas R1 to R14, giving the table of relative values excerpted below.]

$$R_i = \frac{A_i}{\sum_{i} A_i}$$

Relative areas (%) of compounds 1 to 14 in chromatograms 1 to 404 (excerpt):

Chromatogram    1    2    3    4    5    6    7    8    9   10   11   12   13   14
1               9   16   26    2    5    4    7    6    8    6    3    2    4    2
2              26    8   23    7    5    4    4    5    3    3    4    2    4    2
3               5   29   20    3    4    4    7    6   11    6    1    1    2    1
:               :    :    :    :    :    :    :    :    :    :    :    :    :    :
402            10   11   22    4   22    4    6    4    6    5    2    1    2    1
403            11    2    8   10    3  0.5    1    1    1  0.5   17   13   19   13
404            23   18   27    4    4    3    4    3    5    3    1    2    2    1

Fig. 1 Standardization of HPLC chromatograms by converting the absolute areas of their 14 compound peaks into relative values (%).
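The standardization step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `to_relative` is hypothetical, and the values are expressed as fractions between 0 and 1 rather than as percentages.

```python
def to_relative(areas):
    """Convert absolute peak areas A_i into relative values R_i = A_i / sum(A)."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return [a / total for a in areas]


# The two amount profiles used as an example in the text
profile_a = to_relative([1, 2, 3, 4])
profile_b = to_relative([2, 4, 6, 8])

# Both reduce to the same regulation profile {0.1, 0.2, 0.3, 0.4}
print(profile_a)
print(profile_b)
```

Because each relative profile sums to 1, any average of such profiles also sums to 1, which is what allows the profiles to be treated as points of a simplex in the steps that follow.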

Chemical structures of flavonoids

Among the fourteen flavonoids, ten (1, 2, 3, 5, 7, 8, 9, 10, 11, 13) were identified by 1H- and 13C-NMR (Fig. 2) (Semmar et al., 2001b, 2002a, 2002b). Three others (4, 12, 14) were identified by mass spectrometry and TLC control after acid and alkaline hydrolysis. The remaining compound (6) was identified only from its UV spectrum, because the amount of pure compound was insufficient for NMR (Semmar et al., 2005). All the compounds were flavonol glycosides and showed strong structural diversification in their aglycones, their degrees of glycosylation, and sometimes their acylation. According to the aglycones, we identified:

a) kaempferol glycosides (2, 3, 5-10)
b) one quercetin (1) and one methyl-3’-quercetin (or isorhamnetin) (4) glycoside
c) forms methylated at the 7-position, namely rhamnocitrin (12, 13), rhamnetin (11), and rhamnazin (14) glycosides.

According to the glycosylation, there were diglycosides (11) to (14) (galactose-apiose), a triglycoside (5) (galactose-rhamnose-xylose), and two sets of tetraglycosides with a branched sugar chain: compounds (1), (3) and (4) on the one hand, with (galactose-{[apiose], [rhamnose-xylose]}), and (2), (7-10) on the other hand, with (galactose-{[rhamnose], [rhamnose-xylose]}). According to the acylation, the glycosides (7-10) had an aromatic acyl group (ferulic or p-coumaric), and (11), (13), (14) had an aliphatic acyl group (3-Me, OH glutaric).

The chemotaxonomy of Astragalus caprinus

In a previous work, we highlighted four chemotypes of A.c. (Fig. 3) on the basis of HPLC analysis of the flavonoids in the leaves of 404 individual plants, using multivariate analyses: cluster analysis and correspondence analysis (Semmar et al., 2005). Each chemotype resulted from major regulations of some specific flavonoids compared with:

− all the other flavonoids in the same plant, and
− the same flavonoid(s) in all the other plants.

Chemotype I showed major regulations of the less polar compounds (11-14), which are methylated acylated diglycosides deriving from quercetin and kaempferol. It presented two patterns, in which compound (11) was either higher (chemotype Ia) or lower (chemotype Ib) than compound (12) (Fig. 3). Chemotype II represented a major regulation of quercetin tetraglycoside (1). In chemotype III, the major compound was kaempferol tetraglycoside (2), with a good regulation of its acylated derivatives (6-10). Finally, chemotype IV represented a high regulation of kaempferol triglycoside (5).

Flavonoid   R1     R2     R3    R4                 R5       R6
1           OH     OH     api   -rha 3-xyl         H        H
2           H      OH     rha   -rha 3-xyl         H        H
3           H      OH     api   -rha 3-xyl         H        H
4           OCH3   OH     api   -rha 3-xyl         H        H
5           H      OH     H     -rha 3-xyl         H        H
7           H      OH     rha   -rha 3-xyl         p-coum   H
8           H      OH     rha   -rha 3-xyl         H        p-coum
9           H      OH     rha   -rha 3-xyl         fer      H
10          H      OH     rha   -rha 3-xyl         H        fer
11          OH     OCH3   api   3-OH-3-CH3-glut    H        H
12          H      OCH3   api   H                  H        H
13          H      OCH3   api   3-OH-3-CH3-glut    H        H
14          OCH3   OCH3   api   3-OH-3-CH3-glut    H        H

api: D-apiosyl; rha: L-rhamnosyl; xyl: D-xylosyl; p-coum: p-coumaroyl; fer: feruloyl; glut: glutaroyl

[Structure drawing: general flavonol glycoside skeleton showing the positions of the substituents R1 to R6.]

Fig. 2 Chemical structures of flavonoids in the leaves of Astragalus caprinus.

In terms of environmental significance, these four chemotypes showed different geographical distributions reflecting adaptations to different climatic conditions: chemotype I was very abundant in the south of Tunisia (warmer and less rainy climate); chemotypes III and IV were relatively more frequent in the north (less hot and more rainy); chemotype II was especially well developed in the center (hot and rainy) (Semmar et al., 2005).

Preliminary trends between flavonoids based on Spearman correlations

After conversion of the HPLC peak areas into relative values, the trends between the pairs of flavonoids were first estimated by non-parametric Spearman correlations (ρ) (Jerrold, 1998):

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n}$$

where:

di = (rank of ai – rank of bi): difference between the ranks of peaks a and b in chromatogram i.

n: sample size (n = 404 in our case).

Spearman correlations provide trend estimates which can be interpreted in terms of opposition or affinity between the metabolic regulations of flavonoid pairs (Semmar et al., 2001a; Camacho et al., 2005). With 14 variables (14 flavonoids), 91 correlations (ρ) had to be calculated; their significance was assessed at an alpha risk ≤ 1%. These non-parametric correlations were taken as reference values for the comparison and validation of the results obtained with the proposed smoothing approach.

Smoothing of trends between flavonoids based on a mixture design approach

The question now is how to simulate a data set from the initial (noisy) HPLC database in order to extract smoothed relationships between flavonoids; such relationships are examined to gain a better understanding of how the plant metabolism is organized. This question was addressed on the basis of a complete set of average HPLC profiles calculated from the initial HPLC database and representing a smoothed database from which the relationships between flavonoids were visualized graphically and modelled statistically. Moreover, by considering in advance a plant classification criterion (e.g. chemotypes), the average HPLC profiles corresponded to barycentric profiles between plant groups (between chemotypes); this makes it possible to interpret the results in terms of the processes generating plant diversity.

[Figure 3: five stick-diagram panels (Chemotype Ia, Chemotype Ib, Chemotype II, Chemotype III, Chemotype IV); x-axis: flavonoids 1 to 14; y-axis: relative amounts (%).]

Fig. 3 Normalized stick diagrams representing the HPLC patterns of the four chemotypes I to IV of Astragalus caprinus, based on the leaves’ flavonoids.

Preparation of Scheffé mixture design

The individual HPLC profiles were first classified into q chemically homogeneous groups (here q = 4 chemotypes) representing the chemical diversity of the studied plant species (Fig. 4a, 4b) (Semmar et al., 2005). Then, these q groups were mixed by applying a mixture design, called the Scheffé simplex matrix (Scheffé, 1963; Scheffé, 1958): this design consisted of a complete set of N mixtures, each containing a constant number m of elements drawn from the q groups. In our case, the m elements of each mixture were m HPLC profiles representing m individual plants belonging to the q chemotypes (Fig. 4c). Each mixture was built by randomly taking ni individuals from each group i (i = 1 to q), so that ∑ni = m (m being the constant total number of individuals per mixture).

The number N of mixtures can be calculated from both the number q of groups and the total number m of individuals per mixture (Fig. 5a):

$$N = \frac{(m + q - 1)!}{(q - 1)!\, m!}$$

The mixture design (m, q) was generated using JMP statistical software (SAS, 1997).

Fig. 4 (a) Classification of the whole plant population into distinct taxa according to the flavonoid HPLC patterns, before application of a mixture design: four taxa (q = 4) were defined on the basis of four HPLC patterns; (b) The whole database of individual plants was classified in relation to these four chemical taxa; (c) Statistical mixtures between the taxa were carried out using a mixture design (Scheffé matrix); each mixture combined m = 10 HPLC profiles corresponding to m = 10 individual plants representative of the q = 4 taxa, and taken randomly from the whole HPLC database.

Geometrical meaning of the Scheffé mixture design

In the Scheffé matrix, each row can be represented by a point with q coordinates ni corresponding to the contributions of the q mixed groups i. Because the sum of each row is constant (= m), the contributions ni of the groups i (∑ni = m) can be converted into proportions wi (0 ≤ wi ≤ 1) by dividing ni by m, so that ∑wi = 1. This last relationship allows the N mixture points of the q groups (q coordinates) to be represented geometrically within a space of (q-1) dimensions, called a Scheffé simplex network (Fig. 5) (Droesbeke et al., 1997; Sado and Sado, 1991; Scheffé, 1963; Scheffé, 1958). In these simplex networks, each apex represents a “pure mixture” containing m elements of only one group i (100% of group i) (Fig. 5a). Each side (face) of the simplex combines (q-1) groups, therefore excluding one group; it represents all linear combinations between the (q-1) groups (Fig. 5b). The interior of the simplex covers the linear combinations between all groups in which none of them has a null contribution.

Application of Scheffé mixture design

To carry out the Scheffé mixture design, the number m of individual plants per mixture was fixed at 10. This total of 10 was chosen because (a) it is a reasonably high number, and (b) the total number N of mixtures increases drastically with m.

For q = 4 groups (chemotypes I-IV) and m = 10 random individual plants per mixture, there are N = 286 mixtures to carry out (Fig. 5a), which represent all possible combinations between 10 random individual plants belonging to four chemotypes. For example, the mixture (3, 4, 2, 1) consists of 3 individual plants from chemotype I, 4 from chemotype II, 2 from chemotype III and 1 from chemotype IV (Fig. 6a).
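The rows of such a design can be enumerated directly, which also gives a check on the count N. The sketch below is an illustration only (the paper generated its design with JMP); `scheffe_mixtures` is a hypothetical helper, and the descending sort is an assumption chosen so that the pure mixture of the first group comes first.

```python
from itertools import combinations
from math import comb


def scheffe_mixtures(m, q):
    """Yield every q-tuple of non-negative integers summing to m (stars and bars)."""
    for bars in combinations(range(m + q - 1), q - 1):
        prev = -1
        counts = []
        for b in bars:
            counts.append(b - prev - 1)      # stars between two consecutive bars
            prev = b
        counts.append(m + q - 2 - prev)      # stars after the last bar
        yield tuple(counts)


m, q = 10, 4
mixtures = sorted(scheffe_mixtures(m, q), reverse=True)
n_expected = comb(m + q - 1, q - 1)          # (m + q - 1)! / ((q - 1)! m!)

# Contributions n_i converted into proportions w_i = n_i / m (simplex coordinates)
proportions = [tuple(n / m for n in mix) for mix in mixtures]

apexes = [mix for mix in mixtures if max(mix) == m]    # pure mixtures
interior = [mix for mix in mixtures if min(mix) > 0]   # every group contributes

print(len(mixtures), n_expected, len(apexes), len(interior))
```

With this ordering, the first rows are (10, 0, 0, 0), (9, 1, 0, 0) and (9, 0, 1, 0), and the example mixture (3, 4, 2, 1) is one of the interior points of the tetrahedron.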

Calculation of average HPLC profiles as response of the Scheffé mixture design

From a mixture of m individual plants (m elementary HPLC profiles) representing the q = 4 chemotypes, an average HPLC profile was calculated as a smoothed response variable (Fig. 6b). Such an average HPLC profile contained 14 average flavonoid peaks, each one calculated from m = 10 elementary values:

$$\bar{C}_{ps} = \frac{1}{m} \sum_{i=1}^{q} \sum_{j=1}^{n_i} (C_{ps})_{ij}$$

where:

C̄ps: mean of compound p calculated from the m values of mixture s (s = 1 to N).
(Cps)ij: jth value of compound p, corresponding to the jth profile belonging to group i and contributing to mixture s.
p: index of the compounds in the HPLC profile (p = 1 to 14).
s: index of the mixtures (rows) in the Scheffé matrix (s = 1 to 286).
j: index of the profiles belonging to group i (j = 1 to ni).
ni: number of representative profiles of group i (0 ≤ ni ≤ m).
i: index of the groups (i = 1 to q).
m: total and constant number of plants combined in the same mixture (here m = 10).

Taking into account the contributions ni of each chemotype, this average profile calculated on m individual profiles also corresponds to a barycentric profile of the q chemotypes.

[Figure 5: (a) two tetrahedral simplex networks of mixture points for q = 4, one with m = 3 (N = 6!/(3! 3!) = 20 points, e.g. (2/3, 1/3, 0, 0)) and one with m = 10 (N = 13!/(3! 10!) = 286 points, e.g. (9/10, 1/10, 0, 0)), with apexes (10/10, 0, 0, 0), (0, 10/10, 0, 0), (0, 0, 10/10, 0) and (0, 0, 0, 10/10); (b) one side of the (m = 10, q = 4) network, labelled with the contributions (0 to 10) of the 1st, 2nd, 3rd and 4th classes.]

Fig. 5 Some Scheffé simplex networks (m, q) depending on both the number (q) of components and their mixture cardinal (m): (a) two tetrahedra, each representing a three-dimensional simplex network (q-1 = 3) in which q = 4 groups are combined within mixtures containing m = 3 and m = 10 elements, respectively; (b) One side of a tetrahedral simplex network (m = 10, q = 4) showing the contributions of the four groups in mixtures containing m = 10 elements.
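As a rough illustration of the averaging formula above, the following sketch draws n_i profiles from each group and averages them peak by peak. The function name `average_profile`, the toy two-peak data, and the choice of sampling without replacement are all assumptions made for the example; the source only states that the plants were taken randomly.

```python
import random


def average_profile(groups, counts, rng):
    """Barycentric profile of one mixture.

    groups: list of q groups, each a list of profiles (lists of relative areas)
    counts: contributions (n_1, ..., n_q) whose sum is m
    """
    drawn = []
    for profiles, n_i in zip(groups, counts):
        # n_i distinct plants from group i (assumption: no replacement)
        drawn.extend(rng.sample(profiles, n_i))
    m = len(drawn)
    n_peaks = len(drawn[0])
    # Mean of each peak over the m drawn profiles
    return [sum(profile[p] for profile in drawn) / m for p in range(n_peaks)]


# Toy data: two groups of identical two-peak profiles, so the result is deterministic
group_a = [[0.8, 0.2]] * 5
group_b = [[0.2, 0.8]] * 5
mean_profile = average_profile([group_a, group_b], (3, 1), random.Random(0))
print(mean_profile)  # close to [0.65, 0.35], up to floating-point rounding
```

With real data the drawn profiles differ from plant to plant, so repeating the same mixture gives a slightly different average each time, which is the variability the iteration step is meant to absorb.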

Smoothing of average profiles by iteration of the Scheffé matrix

The N (= 286) average HPLC profiles resulting from the N distinct mixtures of the Scheffé matrix represent N single responses, where only m (= 10) individual plants contributed to each response (Fig. 6b). These single responses (286 average profiles) need to be iterated to take into account the inter-individual variability which results from the random change of the m mixed individual plants. Therefore, the 286 mixtures of Scheffé’s mixture matrix S were iterated 50 times (Fig. 6c): at each iteration, each mixture s(n1, n2, n3, n4) (Σni = 10) was carried out by randomly taking m (= 10) new individual plants. This resulted in 50 response matrices, each one representing 286 single average profiles containing 14 average flavonoid peaks. Finally, the 50 response matrices [50 × (286 × 14)] were averaged to give a smoother final response matrix (286 × 14) (Fig. 6d):

$$\bar{\bar{C}}_{ps} = \frac{1}{50} \sum_{k=1}^{50} (\bar{C}_{ps})_k$$

with:

C̿ps: final mean value of flavonoid p in mixture s, calculated from 50 repetitions of this mixture.
(C̄ps)k: kth elementary mean value of flavonoid p, corresponding to the kth iteration of mixture s (sth row of the Scheffé matrix).
k: index of the iterations (k = 1 to 50).

[Figure 6: flow diagram in four panels: (a) Scheffé's matrix of the contributions ni of the four taxa, excerpted below; (b) response matrix of 286 barycenter profiles × 14 flavonoids, each row being the average profile of 10 individual profiles; (c) 50 iterations of the response matrix (k = 1 to 50); (d) final response matrix of 286 reference barycenter profiles × 14 flavonoids, obtained as the average of the 50 barycenter matrices.]

Contributions ni of the taxa (excerpt of Scheffé's matrix):

Mixture s      Taxon I   Taxon II   Taxon III   Taxon IV
s = 1          10        0          0           0
s = 2          9         1          0           0
s = 3          9         0          1           0
:              :         :          :           :
s = 92         3         4          2           1
:              :         :          :           :
s = N = 286    0         0          0           10

Fig. 6 Algorithm, based on the Scheffé simplex matrix, for simulating a complete set of reference HPLC profiles from a complete set of mixtures between four plant chemical taxa: (a) The Scheffé matrix indicates the N possible mixtures (N = 286), each one combining m random plants (m = 10) belonging to q taxa (q = 4). (b) The response variable resulting from these mixtures consisted of a matrix containing N barycentric HPLC profiles of flavonoids, each one calculated from the m random individual profiles making up the mixture. (c) Steps (a) and (b) were iterated 50 times to take into account the inter-individual variability of the random mixtures. (d) A unique final response matrix was calculated by averaging the 50 iterated elementary response matrices accumulated in step (c).
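The whole iteration can be sketched as follows. This is a toy-scale illustration under stated assumptions, not the authors' code: the helper names (`compositions`, `mixture_mean`, `smoothed_matrix`, `toy_group`) are hypothetical, the data are synthetic three-peak profiles, and the example uses q = 3 and m = 5 so that it runs instantly; the paper's setting corresponds to q = 4, m = 10 and 50 iterations.

```python
import random


def compositions(m, q):
    """Rows of the design: all q-tuples of non-negative integers summing to m."""
    if q == 1:
        return [(m,)]
    return [(first,) + rest
            for first in range(m, -1, -1)
            for rest in compositions(m - first, q - 1)]


def mixture_mean(groups, counts, rng):
    """Average profile of one random mixture (n_i profiles drawn from group i)."""
    drawn = [p for profiles, n_i in zip(groups, counts)
             for p in rng.sample(profiles, n_i)]
    return [sum(col) / len(drawn) for col in zip(*drawn)]


def smoothed_matrix(groups, m, n_iter, rng):
    """Average, over n_iter random repetitions, of the N x (number of peaks) response matrix."""
    design = compositions(m, len(groups))
    n_peaks = len(groups[0][0])
    total = [[0.0] * n_peaks for _ in design]
    for _ in range(n_iter):
        for s, counts in enumerate(design):
            for p, value in enumerate(mixture_mean(groups, counts, rng)):
                total[s][p] += value
    return design, [[v / n_iter for v in row] for row in total]


def toy_group(center, size, rng):
    """Synthetic noisy profiles around a center, renormalized to sum to 1."""
    profiles = []
    for _ in range(size):
        raw = [max(c + rng.gauss(0, 0.05), 0.001) for c in center]
        profiles.append([x / sum(raw) for x in raw])
    return profiles


rng = random.Random(0)
centers = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]]
groups = [toy_group(c, 30, rng) for c in centers]
design, final = smoothed_matrix(groups, m=5, n_iter=50, rng=rng)
print(len(design), final[0])
```

Each row of `final` is the mean of 50 averages of m profiles, so its random scatter is much smaller than that of a single profile, while its position along the design still reflects the contributions of the groups.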

Graphical and numerical analysis of smoothed relationships between flavonoids

From the final matrix of average HPLC profiles, scatter plots were used to visualize the smoothed relationships between all the flavonoid pairs. These smoothed relationships were analyzed graphically to highlight the kinds of relationships (sign, scale, shape) between flavonoids. They were then fitted numerically using linear or polynomial models, giving Pearson correlations and co-response coefficients between the flavonoids (Kose et al., 2001). Taking into account the chemical polymorphism of the plant species, these results helped to discuss how the relative variations of the flavonoids and their inter-dependences led to the emergence of the different chemotypes in A.c.
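A linear or polynomial fit of one smoothed flavonoid against another can be sketched with an ordinary least-squares solver. The code below is a self-contained illustration: `polyfit` and `predict` are hypothetical helpers written for this example, the data are synthetic, and reading the fitted slope as the co-response coefficient is an assumption made here for illustration, not a definition taken from the source.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [a0, a1, ..., ad] via the normal equations."""
    n = degree + 1
    # Normal equations (V^T V) a = V^T y, with V the Vandermonde matrix of x
    A = [[sum(xi ** (r + c) for xi in x) for c in range(n)] for r in range(n)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs


def predict(coeffs, x):
    """Evaluate a0 + a1*x + a2*x^2 + ... at x."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


# Synthetic smoothed levels of two flavonoids (relative values between 0 and 1)
x = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
y_linear = [0.5 - 0.8 * xi for xi in x]                   # competition-like trend
y_curved = [0.1 + 0.5 * xi - 0.3 * xi ** 2 for xi in x]   # curved trend

linear_fit = polyfit(x, y_linear, 1)   # [intercept, slope]
curved_fit = polyfit(x, y_curved, 2)   # [a0, a1, a2]
print(linear_fit, curved_fit)
```

In this toy case the negative slope of the linear fit would be read as an opposition between the two compounds, and a positive slope as an affinity.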

Results and Discussion

Preliminary correlations between flavonoids

The scatter-plot matrix representing all the flavonoid pairsshowed “noised” positive or negative trends due to a highvariability inherent to the initial HPLC database (Fig. 7a).Therefore, non parametric Spearman correlations (non sen-sitive to the high data dispersion) led to estimate the positiveand negative trends between all the pairs of flavonoids(Table 1) (Semmar et al., 2001a).

The positive Spearman correlations concerned three setof compounds:

a) set (2, 5, 6-10) representing kaempferol derivatives,b) set (1, 4) representing quercetin derivatives,c) set (11-14) representing methylated acylated derivatives.

These correlations concerned flavonoids that shared the same aglycone and/or identical substitutions (Fig. 2).

The negative correlations concerned three opposition cases:

a) the kaempferol derivatives (2, 5, 6-10) against the quercetin derivatives (1, 4),

b) the kaempferol derivatives (2, 5, 6-10) against the methylated acylated derivatives (11-14),

c) the kaempferol derivatives (2, 6-10) against the kaempferol derivative (3).

Furthermore, some Spearman correlation values showed non-significant relationships, particularly between kaempferol triglycoside (5) and both kaempferol tetraglycoside (3) and the acylated tetraglycosides of kaempferol (8-10).

Correlations between flavonoids after smoothing of their relationships

From the smoothed relationships resulting from the iterated mixture design, the Pearson correlations between the flavonoid pairs were all significant at a risk α ≤ 1%. Their absolute values varied from 0.08 (flavonoid pair (2, 5)) to higher than 0.9 (most cases) (Table 2). In general, the smallest absolute values of the Pearson correlations concerned all the pairs involving compound (5) (0.08 ≤ |r| ≤ 0.56). The highest values concerned the flavonoids (6-10) among themselves on the one hand, and (11-14) among themselves on the other.

Comparisons between Pearson and Spearman correlations showed generally identical algebraic signs, indicating comparable results between the final smoothed and the initial crude data, respectively. An exception concerned the relationships between compound (3) and compounds (2), (6-10): Pearson correlations showed strong positive relationships, whereas Spearman correlations indicated slight negative trends. These two elements of information were complementary rather than opposite, as will be shown in the graphical analysis section. Finally, the absolute values of the Pearson correlations were generally higher than those of the Spearman correlations, a direct consequence of the data smoothing (Tables 1, 2).
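The sign comparison can be illustrated with a handful of coefficients copied from Tables 1 and 2. The dictionary layout and the helper name below are choices made for this sketch only.

```python
# Coefficients copied from Table 1 (Spearman, crude data) and
# Table 2 (Pearson, smoothed data) for a few flavonoid pairs.
spearman_r = {
    (2, 3): -0.19, (3, 6): -0.19, (3, 7): -0.22, (3, 8): -0.14,
    (3, 9): -0.14, (3, 10): -0.12,
    (1, 2): -0.43, (1, 4): 0.56, (11, 12): 0.67,
}
pearson_r = {
    (2, 3): 0.63, (3, 6): 0.85, (3, 7): 0.86, (3, 8): 0.93,
    (3, 9): 0.93, (3, 10): 0.95,
    (1, 2): -0.86, (1, 4): 0.89, (11, 12): 0.98,
}


def sign_flips(crude, smoothed):
    """Pairs whose correlation changes sign between the two tables."""
    return sorted(pair for pair in crude
                  if crude[pair] * smoothed[pair] < 0)


flipped = sign_flips(spearman_r, pearson_r)
```

For this subset, the pairs flagged are exactly those combining compound (3) with compounds (2) and (6-10).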

Graphical analysis of the smoothed relationships between flavonoids

The scatter plots of the smoothed HPLC data showed fine structures corresponding to strong positive or negative relationships between the flavonoids (Fig. 7b). The data smoothing was due to a compression and orientation of the simplex space along the co-response direction of each flavonoid pair. The compression level, or thinness, of the simplex space can be considered a good graphical index of the degree of correlation between the two flavonoids concerned.

The different trends (positive or negative) corresponding to these variations were compatible with analogies or differences between the chemical structures of the compounds: negative trends opposed all kaempferol glycosides (2, 3, 5-10) to quercetin-derivative glycosides (1, 4). Positive trends, however, concerned three blocks of flavonoids: (a) the kaempferol glycosides (2, 3, 5-10) among themselves, (b) the quercetin glycosides (1, 4) among themselves and (c) the methylated acylated derivatives (11-14) among themselves.

Beyond a simple interpretation of correlation values, the scatter plots had the advantage of visualizing the global shapes and the fine structures of the relationships between flavonoids. A general examination of these smoothed relationships showed three scale-dependent cases:

− Strong deterministic relationships corresponding to monotonic trends covering the whole variation space (Fig. 8a-8c). In this case, the tetrahedral simplex space was compressed and reduced to a very thin oriented continuum of points.

− Double-scale relationships combining a global deterministic relationship with local systematic variations; the local variations consisted of successive or repeated blocks of points corresponding to regular variations which gradually intensified or attenuated along a global monotonic trend (Fig. 8d-8h). In these cases, the tetrahedral simplex space was not entirely compressed, giving a principal direction (the global trend) that included internal repetitive variations (local systematic variations). The sign of the local variations was independent of that of the global trend.

− Multi-directional relationships consisting of a tetrahedron shape in which the variations of points can be examined along different directions (Fig. 8i). This


Fig. 7 Scatter plot matrices showing the preliminary trends (a) and the smoothed relationships (b) between all pairs of flavonoids from the initial HPLC database (a) and after iteration of the simplex mixture design (b).


small compression of the simplex space could indicate non-linear relationships between some flavonoids; this is particularly true for compound (5), which showed the lowest Spearman and Pearson correlations (Tables 1, 2).

Strong deterministic relationships

Strong deterministic relationships consisted of scatter plots with points condensed within a thin continuum shape. This case concerned particularly the acylated glycosides of kaempferol (6-10) among themselves (Fig. 7b, 8a), the scatter plots of which were very thin, indicating very precise relationships. The thinness of the clouds, associated with high Pearson correlations (0.96 ≤ r ≤ 0.99), could indicate highly synchronized metabolisms of these compounds due to the same regulation gene(s) and/or enzyme(s). In fact, these acylated compounds have very similar chemical structures consisting of a kaempferol with the same tetraglycoside sequence; they differ only by the position or the nature of the acyl moiety (p-coumaric or ferulic). Linear models fitting these relationships provided more information about the regulation levels of these acyls; for the four identified acyls (7-10), the twelve linear models were:

(7) = −0.98 + 1.42(8)     (7) = −0.44 + 0.70(9)     (7) = −0.88 + 1.20(10)
(8) = 0.84 + 0.66(7)      (9) = 0.91 + 1.35(7)      (10) = 0.96 + 0.77(7)
(8) = 0.42 + 0.49(9)      (8) = 0.07 + 0.85(10)     (9) = −0.65 + 1.72(10)
(9) = −0.70 + 2.00(8)     (10) = −0.02 + 1.16(8)    (10) = 0.43 + 0.57(9)

The positive slopes of the explained variables increased in the order: (8) < (10) < (7) < (9). The first two compounds (8), (10) were acylated at the 4-position (R6) of galactose, whereas the last two (7), (9) corresponded to an acylation at the 3-position (R5) of galactose (Fig. 2). The higher slopes of (7) and (9) could indicate a stronger regulation level of the acyl-3-galactose than of the acyl-4-galactose. Moreover, for the same substitution position on the galactose, the slopes of compounds acylated by ferulic acid were higher than those of compounds acylated by p-coumaric acid (slopes: (9) > (7) and


Table 1 Spearman correlations between the 14 flavonoids calculated from the initial standardized HPLC database (%) of 404 individual plants

Flavonoids      1      2      3      4      5      6      7      8      9     10     11     12     13
2           -0.43
3           -0.20  -0.19
4           +0.56  -0.36  -0.22
5           -0.33  +0.15     NS  -0.26
6           -0.31  +0.49  -0.19  -0.23  +0.11
7           -0.30  +0.48  -0.22  -0.22  +0.10  +0.68
8           -0.32  +0.34  -0.14  -0.21     NS  +0.75  +0.48
9           -0.29  +0.35  -0.14  -0.16     NS  +0.42  +0.74  +0.52
10          -0.30  +0.24  -0.12  -0.16     NS  +0.53  +0.44  +0.73  +0.61
11          +0.22  -0.36  -0.44  +0.20  -0.12  -0.32  -0.34  -0.24  -0.32  -0.23
12          +0.19  -0.30  -0.38  +0.20  -0.12  -0.35  -0.35  -0.26  -0.28  -0.26  +0.67
13          +0.16  -0.37  -0.31  +0.14  -0.13  -0.35  -0.34  -0.27  -0.27  -0.23  +0.66  +0.77
14          +0.11  -0.32  -0.33  +0.14  -0.10  -0.31  -0.30  -0.20  -0.20  -0.12  +0.67  +0.70  +0.76

NS: non-significant correlation at α = 5%

Table 2 Pearson correlations between the 14 flavonoids calculated from the final smoothed data obtained by iterative combinations of the initial data using a simplex mixture design

Flavonoids      1      2      3      4      5      6      7      8      9     10     11     12     13
2           -0.86
3           -0.76  +0.63
4           +0.89  -0.79  -0.96
5           -0.56  +0.08  +0.41  -0.43
6           -0.88  +0.92  +0.85  -0.94  +0.21
7           -0.88  +0.93  +0.86  -0.94  +0.22  +0.99
8           -0.84  +0.84  +0.93  -0.97  +0.25  +0.97  +0.97
9           -0.81  +0.83  +0.93  -0.96  +0.20  +0.96  +0.97  +0.99
10          -0.82  +0.81  +0.95  -0.97  +0.26  +0.96  +0.96  +0.99  +0.99
11          +0.81  -0.73  -0.99  +0.98  -0.37  -0.91  -0.91  -0.97  -0.96  -0.98
12          +0.80  -0.73  -0.98  +0.97  -0.37  -0.91  -0.92  -0.96  -0.96  -0.97  +0.98
13          +0.81  -0.73  -0.99  +0.98  -0.37  -0.91  -0.92  -0.97  -0.97  -0.98  +0.99  +0.99
14          +0.80  -0.73  -0.99  +0.98  -0.36  -0.92  -0.92  -0.96  -0.97  -0.98  +0.99  +0.99  +0.99


(10) > (8)). These metabolic regulations can be illustrated by the HPLC pattern of chemotype III (Fig. 3).
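The ordering of the slopes can be checked from the twelve linear models listed for compounds (7-10). The sketch below summarises each explained variable by the mean of its three slopes; averaging is a summary chosen here for illustration and is not stated as the authors' criterion.

```python
# (explained compound, explanatory compound): slope of the linear model,
# copied from the twelve models fitted for compounds (7) to (10).
slopes = {
    (7, 8): 1.42, (7, 9): 0.70, (7, 10): 1.20,
    (8, 7): 0.66, (9, 7): 1.35, (10, 7): 0.77,
    (8, 9): 0.49, (8, 10): 0.85, (9, 10): 1.72,
    (9, 8): 2.00, (10, 8): 1.16, (10, 9): 0.57,
}


def mean_slope_by_explained(models):
    """Mean slope of each explained variable over its linear models."""
    groups = {}
    for (explained, _explanatory), slope in models.items():
        groups.setdefault(explained, []).append(slope)
    return {c: sum(v) / len(v) for c, v in groups.items()}


means = mean_slope_by_explained(slopes)
ranking = sorted(means, key=means.get)  # compounds by increasing mean slope
```

Under this summary the ranking is (8), (10), (7), (9), in agreement with the order given in the text, and the ferulic derivatives (9) and (10) rank above their p-coumaric counterparts (7) and (8), respectively.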

Strong deterministic relationships also concerned positive correlations between kaempferol tetraglycoside (2) and the acylated kaempferol tetraglycosides (6-10) (Fig. 7b, 8b). From the chemical structures, the compounds (6-10) can be derived directly by acylation of (2). The positive relationships between compounds (2) and (6-10) were well fitted by second degree polynomials. Two numerical models representative of all the others are presented, namely (2) vs (7) and (7) vs (2):

(2) = 2.59 – 0.27(7) + 0.57(7)² (r = 0.95)
(7) = + 0.51(2) – 0.013(2)² (r = 0.95)

The positive relationship of (2) vs (7) was due to a high second degree coefficient against a small negative first degree coefficient. In contrast, the positive relationship of (7) vs (2) was due to a high first degree coefficient against a small second degree coefficient. The first case (2 vs 7) showed that compound (2) increases when its acylated derivative (7) reaches a high regulation level: this could correspond to a situation where compound (2) has to be highly regulated as a precursor when its acylated derivative(s) (7) (and 8-10) are highly regulated. However, the negative first degree coefficient (-0.27) could be the signature of a slight metabolic pressure on compound (2), particularly when its acylated derivative (7) is weakly regulated (<0.5%): the expression [-0.27(7) + 0.57(7)²] gives a negative

[Fig. 8 scatter-plot panels, both axes in %: (a) flavonoid 10 vs 9; (b) 7 vs 2; (c) 7 vs 1; (d) 12 vs 11; (e) 14 vs 11; (f) 7 vs 3; (g) 4 vs 3; (h) 4 vs 1; (i) 3 vs 5.]

Fig. 8 Some scatter plots showing the three kinds of relationships between flavonoids smoothed by iteration of a simplex mixture design combining the four chemotypes of Astragalus caprinus. (a-c) Strong deterministic relationships consisting of thin monotonic variation fields of flavonoid pairs. (d-h) Double-scale relationships showing both a global deterministic trend and local systematic variations between flavonoids. (i) A multi-directional relationship showing a non-linear relationship between two flavonoids


balance if compound (7) has a regulation level < 0.5%. More generally, the negative first degree and positive second degree coefficients were compatible with a situation where compound (2) is necessary to its acylated derivatives on the one hand, and is increasingly favored by their increase on the other. This metabolic relationship may be responsible for the development of chemotype III (Fig. 3, 9). Concerning the relationship (7) vs (2), its high positive first degree coefficient showed that compound (2) would be invested in the biosynthesis of (7) as long as the regulation level of (2) is lower than 39.23% (= 0.51/0.013). Beyond this level, the negative second degree term becomes larger than the positive first degree term, resulting in a negative balance for compound (7). This can reflect an attenuation of the acylation process at very high regulation levels of compound (2): such an attenuation could be due, e.g., to a saturation of acyl-transferase by compound (2), which would limit (control) the acylation levels.
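The two threshold levels quoted above follow directly from the fitted coefficients: the sum of a first degree and a second degree term of opposite signs changes sign at the ratio of the two coefficients. A minimal check, with a helper name chosen for this sketch:

```python
def sign_change_level(first_degree, second_degree):
    """Positive x at which first_degree*x + second_degree*x**2 changes sign."""
    if first_degree * second_degree >= 0:
        raise ValueError("coefficients must have opposite signs")
    return -first_degree / second_degree


# Model (2) vs (7): -0.27(7) + 0.57(7)^2 is negative below this level of (7).
level_7 = sign_change_level(-0.27, 0.57)
# Model (7) vs (2): 0.51(2) - 0.013(2)^2 is positive below this level of (2).
level_2 = sign_change_level(0.51, -0.013)
```

The first level is about 0.47%, consistent with the rounded limit of 0.5% given in the text, and the second is the 39.23% limit.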

A last case of strong deterministic relationship can be illustrated by the negative relationships between quercetin tetraglycoside (1) and the kaempferol tetraglycosides (acylated or not) (2), (6-10) (Fig. 7b, 8c). These negative relationships are compatible with a structural dissimilarity between the aglycones (quercetin and kaempferol) which appears early in flavonoid metabolism. Therefore, after the biosynthesis of quercetin and kaempferol, the glycosylation of these aglycones can be at the origin of two metabolic pathways competing for glycosyl-transferases. Numerically, the relationships between compound (1) and compounds (2), (6-10) were well fitted by second degree polynomials. They are illustrated by two cases, (1) vs (2) and (2) vs (1), representative of all the others:

(1) = 21.91 – 1.79 (2) + 0.05 (2)² (r = −0.89)

(2) = 18.31 – 1.00 (1) + 0.01 (1)² (r = −0.86)

The negative relationship between the quercetin and kaempferol pathways can be numerically illustrated by high negative first degree coefficients, reflecting a metabolic competition which takes place from the lowest regulation levels. This can be confirmed by the high intercept values (21.91 and 18.31), indicating that each compound can benefit from the “absence” of the other. Such strong metabolic competitions could contribute to reinforcing the identities of chemotypes II and III, i.e. the chemical polymorphism of


[Fig. 9 diagram labels: dihydro-kaempferol; kaempferol; quercetin; glycosyl-transferases; rhamnosyl-transferase; apiosyl-transferase; acyl-transferases; methyl-transferases; Tri-GK (5); Tetra-GK (2); Tetra-GK (3); acylated Tetra-GK (6-10); Tetra-GQ (1); methylated Tetra-GQ (4); methylated acylated diglycosides (11-14); chemotypes I, Ia, II, III and IV. Tri-GK, Tetra-GK: triglycoside and tetraglycoside of kaempferol; Tetra-GQ: tetraglycoside of quercetin.]

Fig. 9 Synthesis figure showing the link between the four chemotypes and the metabolic pathways of flavonoids in Astragalus caprinus


A.c. (Fig. 9). Concerning the second degree coefficients, their very weak positive values could indicate that the two competitive compounds tend to be less and less opposed as their regulation levels reach higher values. This metabolic process could tolerate the expression of minor compounds under the major regulation of others.
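As a worked illustration of this attenuation, the local slope of the fitted polynomial (1) vs (2) can be written by differentiating the model; the symbols $y_1$ and $x_2$, standing for the levels (%) of compounds (1) and (2), are notation introduced here and do not appear in the original models.

$$y_1 = 21.91 - 1.79\,x_2 + 0.05\,x_2^2 \quad\Longrightarrow\quad \frac{dy_1}{dx_2} = -1.79 + 0.10\,x_2$$

The slope is therefore about −1.29 at $x_2 = 5\%$ and about −0.29 at $x_2 = 15\%$: the opposition between the two compounds weakens as the level of (2) rises.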

Global deterministic trends with systematic variations

Some relationships between flavonoids showed a double-scale structure: in the scatter plots, this consisted of many blocks of points which were oriented along a global deterministic trend and which showed local systematic variations that progressively faded or intensified along the global space (Fig. 8d-8h). After numerical fitting, the co-response coefficients of the global trend were used to quantify the global affinities or oppositions between two flavonoids within the whole pool of compounds. In addition, the co-response coefficients of the repetitive systematic variations were used to detect other relationships between the two flavonoids, which would otherwise be hidden by the global effect of the metabolic pool.

A first double-scale relationship concerned the methylated compounds (11-14), which shared positive deterministic trends in which blocks of points showed positive or negative systematic variations (Fig. 7b, 8d, 8e). Numerical fittings of these positive global trends gave strong linear models with correlation coefficients r > 0.98:

(11) = + 0.87(12)          (11) = + 0.30(13)          (11) = –0.12 + 1.41(14)
(12) = 0.06 + 1.11(11)     (13) = 0.06 + 3.30(11)     (14) = 0.10 + 0.70(11)
(12) = 0.03 + 0.34(13)     (12) = −0.10 + 1.59(14)    (13) = −0.37 + 4.70(14)
(13) = + 2.93(12)          (14) = 0.08 + 0.62(12)     (14) = 0.09 + 0.21(13)
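The reported slopes can be cross-checked against the stated correlation level. For two ordinary least-squares lines fitted in both directions on the same data, the product of the two slopes equals r². The sketch below assumes that the models above are such fits, which the text does not state explicitly; the helper name is chosen here.

```python
import math

# For each pair (a, b): slope of (a) on (b), then slope of (b) on (a),
# copied from the twelve linear models of compounds (11) to (14).
reciprocal_slopes = {
    (11, 12): (0.87, 1.11),
    (11, 13): (0.30, 3.30),
    (11, 14): (1.41, 0.70),
    (12, 13): (0.34, 2.93),
    (12, 14): (1.59, 0.62),
    (13, 14): (4.70, 0.21),
}


def implied_r(slope_a_on_b, slope_b_on_a):
    """Correlation implied by two reciprocal least-squares slopes."""
    product = slope_a_on_b * slope_b_on_a
    return math.copysign(math.sqrt(product), slope_a_on_b)


implied = {pair: implied_r(*s) for pair, s in reciprocal_slopes.items()}
```

Under this assumption every implied coefficient lies between 0.98 and 1, in line with the r > 0.98 reported for these global trends.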

Apart from the positive slopes, the intercepts were either non-significant or very close to zero, indicating highly synchronized metabolisms of these methylated derivatives. These strong positive global trends could be attributed to cumulative effects of glycosyl- and methyl-transferases which would be profitable to these methylated diglycosides. In fact, the compounds (11-14) had the strong common feature of being both diglycosides (with an apiose-galactose sequence) and methylated (at the 7-position of the aglycone), in contrast to all the other compounds, which were generally non-methylated tri- or tetra-glycosides (Fig. 2). Moreover, compounds (11), (13) and (14) had a common methyl-glutaric acyl which can contribute to their positive global relationships. The synchronized metabolisms of compounds (11-14) represent a strong process which would be responsible for the development of chemotype I (Fig. 3, 9). Apart from this positive global trend, compounds (11) and (12) (Fig. 8d) showed negative systematic variations which corresponded to negative linear models with correlation coefficients 0.5 ≤ |r| ≤ 0.7 and slopes varying around -0.62, but with intercepts increasing from 0.94-1.04 to 8.50-10.43 as the regulation levels of (11) and (12) increase, respectively:

At the low levels of (12): (11) = 0.94 – 0.53(12)
At the high levels of (12): (11) = 8.50 – 0.63(12)

At the low levels of (11): (12) = 1.04 – 0.62(11)
At the high levels of (11): (12) = 10.43 – 0.63(11)

The negative slopes of these systematic variations can indicate that, independently of all the other flavonoids, compounds (11) and (12) are competitive because they originate from two competitive metabolic pathways (quercetin and kaempferol, respectively): the methylation at the 7-position of quercetin and kaempferol provides the aglycones of compounds (11) and (12), respectively (Fig. 2). This local metabolic competition between (11) and (12) can be graphically checked by the fact that compounds (11) and (14), both deriving from quercetin, showed positive systematic variations (Fig. 8e). Finally, the higher intercept of the explained variable when the explanatory variable increased can mean that each compound tends to be more competitive as the regulation level of the other increases, i.e. each compound tends to benefit more from any regulation “failure” of the other. In terms of the resulting chemical polymorphism, the global positive trend between compounds (11) and (12) would be responsible for their simultaneous expression leading to chemotype I; however, their negative systematic variations reflected a local metabolic competition resulting in two patterns (Ia and Ib) within this chemotype (Fig. 3, 9).

A double-scale structure can also be illustrated by the relationships between kaempferol tetraglycoside (3) and the kaempferol tetraglycosides (2, 6-10) (Fig. 8f): the global shapes of the clouds showed positive deterministic trends, indicating a certain advantage shared by these compounds (Fig. 7b). Numerically, the global positive trends can be illustrated by the flavonoid pair (2, 3), which was better fitted using square-root transformed second degree polynomials:

√(2) = −6.76 + 2.78√(3) – 0.19(3) (r = 0.74)
√(3) = −0.87 + 4.14√(2) – 0.58(2) (r = 0.84)

The high positive first degree coefficients reflected two metabolic regulations which benefited reciprocally from each other. This could be due to numerous structural similarities which allow these two kaempferol tetraglycosides to take advantage of a common metabolic pathway: in fact, compounds (3) and (2, 6-10) had both a common aglycone (kaempferol) and the same triglycoside sequence (xylose-rhamnose-galactose) (Fig. 2). This positive global trend between compound (3) and compounds (2, 6-10) can be illustrated by chemotype III (Fig. 3). However, the low negative second degree coefficients reflected a gradual attenuation of the positive global trend as the metabolic regulations increased. This could be due to a difference in the fourth (lateral) sugar, rhamnose in (2) and apiose in (3), leading to two competitive sub-pathways between rhamnosyl- and apiosyl-transferases, respectively. In chemotype III, the effect of competition between compounds (2) and (3) could be illustrated by a high regulation level of (2) with a more moderate level of (3) (Fig. 3).

Moreover, this competition can be confirmed by negative systematic variations which intensified at the higher regulation levels of compounds (2) and (3) (Fig. 8f). In fact, the linear models fitting these systematic variations showed increasingly negative slopes of compound (2) as the regulation level of (3) increased, and vice versa:


At lower levels of (3): (2) = 21.02 – 0.66(3) (r = −0.60)
At middle levels of (3): (2) = 63.32 – 1.50(3) (r = −0.77)
At higher levels of (3): (2) = 99.25 – 1.95(3) (r = −0.90)

At lower levels of (2): (3) = 22.65 – 0.26(2) (r = −0.60)
At middle levels of (2): (3) = 39.73 – 0.39(2) (r = −0.77)
At higher levels of (2): (3) = 49.92 – 0.42(2) (r = −0.90)
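Local models of this kind can be obtained by fitting one straight line per block of points. The sketch below splits the points into equal-sized groups along the explanatory variable and fits each group by ordinary least squares; the equal-sized split and the function names are assumptions made for illustration, since the way the blocks were delimited is not described here.

```python
def ols(xs, ys):
    """Ordinary least-squares line: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_by_level(points, n_levels=3):
    """Fit one line per level (low to high) of the explanatory variable.

    points: list of (x, y) pairs; the split into equal-sized groups of
    sorted x values is a hypothetical way of delimiting the blocks.
    """
    pts = sorted(points)
    size = len(pts) // n_levels
    fits = []
    for level in range(n_levels):
        start = level * size
        stop = len(pts) if level == n_levels - 1 else start + size
        xs = [x for x, _ in pts[start:stop]]
        ys = [y for _, y in pts[start:stop]]
        fits.append(ols(xs, ys))
    return fits
```

Comparing the slopes returned for the successive levels shows whether the local opposition intensifies, as reported above for compounds (2) and (3).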

In metabolic terms, this intensification could take place if the lateral sugar (rhamnose in (2), apiose in (3)) is attached to the galactose before the biosynthesis of the common triglycoside sequence is complete (Fig. 2). In fact, an early substitution of the lateral sugar can lead to different molecular configurations which can generate enzymatic competitions between the two metabolic “sub-pathways” (3 and 2, 6-10) (Fig. 9). The signature of this systematic competition can be illustrated by (Fig. 3):

− chemotypes I, II, IV, where compound (3) is more abundant at the expense of (2), and

− chemotype III, which showed a better regulation of compound (2).

A third double-scale relationship can be illustrated by the flavonoid pairs (3, 1) and (3, 4) linking a kaempferol tetraglycoside (3) to a quercetin tetraglycoside (1) and an isorhamnetin tetraglycoside (4) (methylated quercetin tetraglycoside) (Fig. 8g): it consisted of a negative global trend with positive local variations. Numerically, the negative global trend can be illustrated by the relationships between (3) and (4), which were fitted using second degree polynomials:

(3) = 49.88 – 2.64(4) – 0.14(4)² (r = −0.96)
(4) = 14.24 – 0.35(3) + 0.002(3)² (r = −0.96)

These negative relationships were mainly influenced by the first degree coefficients, meaning that compounds (3) and (4) are competitive especially when their regulation levels are low. This could reflect two competitive pathways due to the different aglycones of (3) and (4), i.e. kaempferol and quercetin, respectively: in fact, the biosynthesis pathways of these aglycones are known to compete for a common precursor, dihydro-kaempferol (Fig. 9). Concerning the second degree coefficient of (3) vs (4), its negative value (−0.14) could mean that compound (4) remains competitive against (3) even at high regulation levels. This metabolic competition in favor of compound (4) can be illustrated by the HPLC pattern of chemotype Ia, where the quercetin tetraglycosides (4) and (1) are well developed in the presence of relatively low levels of kaempferol tetraglycoside (3) (Fig. 3). However, in the polynomial model (4) vs (3), the second degree coefficient was slightly positive (+0.002), meaning a slight attenuation of the competition of (3) against (4) as the regulation level of (3) increases. This can be confirmed by the systematic variations of (4) vs (3), which were significantly positive at the highest regulation levels of (3), allowing compound (4) to take advantage (Fig. 8g): these local variations were fitted by linear models the slopes of which were significantly positive (0.10 to 0.17), especially when the regulation of (3) was higher than 30%; below 30% of (3), the slopes of the linear model (4) vs (3) became weak or non-significant. The significantly positive slopes could be linked to the common tetraglycosylation of (3) and (4), and more particularly to the presence of apiose in both compounds (Fig. 2): among all the kaempferol glycosides, compound (3) is specifically characterized by the presence of apiose; therefore, high regulation levels of (3) within the kaempferol pathway are compatible with an important activity of apiosyl-transferase; this could be indirectly advantageous for the quercetin derivatives (4) and (1), which both have apiose and a glycoside sequence similar to that of (3) (Fig. 2). This metabolic advantage concerning compounds (3), (1), (4) can be illustrated by the HPLC pattern of chemotype II (Fig. 3).

A fourth double-scale relationship can be illustrated by the pair of quercetin derivatives (1, 4), which presented both global and local positive relationships (Fig. 8h). The positive global relationship was well fitted by second degree polynomials:

(1) = 2.11 + 2.74(4) – 0.12(4)² (r = 0.81)

(4) = 1.78 – 0.15(1) + 0.03(1)² (r = 0.83)

In the equation of (1) vs (4), the positive correlation was due to a high first degree coefficient (+2.74), reflecting an important contribution of the low regulations of (4) to that of (1): compared with compound (1), compound (4) has the same chemical structure with just an additional methylation at the 3’-position of the aglycone (Fig. 2); in this sense, a low methyl-transferase activity, slightly converting compound (1) into (4), could stimulate the regulation of (1) as a needed precursor of (4). This advantage of (1) at the low regulation levels can be confirmed by the positive systematic variations, where compound (1) increased faster than (4) (Fig. 8h). However, the low negative (-0.12) second degree coefficient showed that the positive relationship (1) vs (4) can be attenuated as the regulation levels of (4) increase: the increase of methyl-transferase activity could be advantageous for (4) at the expense of (1). This can be illustrated by chemotype Ia, which showed the highest regulation levels of compound (4) (Fig. 3, 9).

Concerning the polynomial model (4) vs (1), its positive correlation was due to a positive (+0.03) second degree coefficient; this can mean that compound (1), as precursor of (4), becomes more beneficial for (4) as its regulation levels increase. However, the first degree coefficient was negative (−0.15), meaning that compound (1) has to reach a certain sufficient level before it is favorably converted into (4): in fact, the balance between the first (-0.15) and second (+0.03) degree terms becomes positive (in favor of compound (4)) when the regulation level of (1) is higher than 5%. This can be illustrated by the chemotypes Ia and II, where compound (1) is well regulated (> 5%), leading to a good expression of its derivative (4) (Fig. 3, 9). Moreover, the chemotypes Ib, III and IV, in which the levels of (1) were < 5%, showed negligible regulations of (4) (Fig. 3).
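The 5% limit follows from the two coefficients of the model (4) vs (1). Writing $x_1$ for the level (%) of compound (1), a symbol introduced here for the worked step, the variable part of the model is positive when

$$-0.15\,x_1 + 0.03\,x_1^2 > 0 \iff x_1\,(0.03\,x_1 - 0.15) > 0 \iff x_1 > \frac{0.15}{0.03} = 5 \quad (x_1 > 0).$$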


Multi-directional relationships

A last class of relationships between flavonoids consisted of multi-directional relationships in which the simplex space was neither compressed nor clearly oriented. These cases concerned particularly compound (5) toward all the other flavonoids (Fig. 7b, 8i). Indeed, all pairs containing compound (5) showed a less thin variation field than the other pairs. This can be due to the fact that this compound has the simplest chemical structure, which allows it to be easily transformed into different compounds: in fact, compound (5) is a kaempferol triglycoside which can be transformed into different tetraglycosides, i.e. compounds (2), (3), (6-10). Consequently, compound (5) can be exposed to many transformations resulting in the biosynthesis of different kaempferol tetraglycosides, leading to its multi-directional variation. This can be illustrated by the relationship between compounds (5) and (3), which includes three states (Fig. 8i): (a) at the low levels of (5), compound (3) can increase strongly, reflecting an important biosynthesis of (3) from (5); this could result from a strong activity of apiosyl-transferase (Fig. 2, 9). (b) Inside the tetrahedron scatter plot, many horizontal blocks of points showed that compound (5) can increase after its derivative (3) reaches a certain regulation level. (c) A third relationship consisted of a positive co-evolution of these two compounds, but relatively more beneficial for compound (5); in other words, compound (5) increases simultaneously with compound (3), but more rapidly than in (a). This can be due to three possible metabolic and/or cellular processes:

− an increase of the glycosyl-transferase(s) responsible for the biosynthesis of (5);

− an efficient accumulation of compound (5) due to a compartmentation process;

− a decrease of apiosyl-transferase activity resulting in a relative reduction of the biosynthesis of compound (3) from (5) (thus, more accumulation of (5)).

These metabolic states, where compounds (5) and (3) had major regulations, can be illustrated by chemotype IV (Fig. 3, 9).

[Fig. 10 panels: four scatter plots of flavonoid 2 (%) against flavonoid 5 (%), one per chemotype (I to IV); in each panel the density ellipses are labelled with their contribution value (0 to 10) and the panel is annotated "Development of chemotype" followed by the chemotype number.]

Fig. 10 Visualization of the contribution values ni (0 ≤ ni ≤ 10) of each chemotype to all the mixtures, through their density ellipses (95%) within a multi-directional cloud of two flavonoids; each ellipse contained 95% of the mixture points representing the same contribution (ni). The density ellipses were projected on the simplex space of the relationship: kaempferol tetraglycoside (2) vs kaempferol triglycoside (5). For a given chemotype (I to IV), the trajectory analysis of the eleven ellipses from ni = 0 to ni = 10 helps to interpret the metabolomic “constraints” which led to the emergence of the chemotype.


Although the relationships of compound (5) were not precisely oriented, they make it possible to analyze the gradual development of the chemotypes through the variations of their contributions (weights) within the simplex space. This analysis is presented in the next section.

Graphical analysis of the link between chemotaxonomy and metabolic plasticity

Within the tetrahedron spaces representing the scatter plots of different flavonoids vs compound (5), each point represents a mixture, or combination, of the four chemotypes in which each chemotype i has a contribution ni (0 ≤ ni ≤ 10). For a given chemotype, the closer ni is to 10, the more representative the mixture is of this chemotype. In other words, if ni = 10, the resulting mixture gives an average HPLC profile typically representative of chemotype i. Conversely, if ni = 0, the resulting mixture is representative of a state where chemotype i is not represented, i.e. not developed. In order to analyze how a given chemotype i emerged from the metabolomic relationships between flavonoids, the mixture points representing the same contribution (weight) ni (0 ≤ ni ≤ 10) were delimited by a 95% density ellipse; consequently, for each chemotype, 11 density ellipses (ni = 0 to 10) were visualized and their relative locations were analyzed. This graphical analysis of the development of the four chemotypes was illustrated and discussed on the basis of the relationship between kaempferol triglycoside (5) and kaempferol tetraglycoside (2) (Fig. 10):

The development of chemotype I resulted from a decrease in the regulation levels of both compounds (5) and (2) (Fig. 10). Thus, the maximal contribution of this chemotype (ni=I = 10) was reached when compounds (5) and (2) were at their minimal regulation levels. As chemotype I represents a major regulation of methylated aglycone compounds (11-14), a preliminary or intermediate methylation of kaempferol (and quercetin) works to the detriment of the non-methylated flavonoids (e.g. compounds (2) and (5)) (Figs. 3, 9).

The development of chemotype II resulted from a strong decrease of compound (5) in the presence of a moderate regulation level of compound (2) (around 7-8%) (Fig. 10). This chemotype represents major regulation levels of the quercetin derivatives pathway. Therefore, under the strong competitive effect of the quercetin pathway, the kaempferol pathway may be subjected to an “inside pressure” allowing kaempferol triglycoside (5) to be transformed into tetraglycoside (2) (and/or other kaempferol tetraglycosides), resulting in a very low level of (5) (Fig. 3).

The development of chemotype III was the consequence of a strong decrease of compound (5) in favor of a strong increase of compound (2) up to a maximal regulation level (Fig. 10). Thus, the maximal contribution value of chemotype III (ni=III = 10) was observed when compound (2) reached a regulation level of around 18%. Chemotype III represented a major regulation of compound (2), which has a lateral rhamnose, in contrast to compound (5) (Fig. 2). Therefore, the high regulation of (2) could result from a high activity of rhamnosyl-transferase (Fig. 9). The expression of this enzyme during the intermediate metabolism may be exclusively in favor of compound (2), thus preventing the biosynthesis of compound (5). In the case where the lateral rhamnosyl substitution is a final metabolic step, the rhamnosyl-transferase would strongly transform compound (5), maintaining it at relatively low levels. This is all the more plausible since compound (2) needs to be synthesized to give many acylated derivatives (6-10).

Finally, the development of chemotype IV would be the result of an important increase in the regulation levels of compound (5) (>9%) under a moderate regulation level of compound (2) (9%) (Fig. 10). This is consistent with the hypothesis that compound (5) accumulates when its derivatives (e.g. (2)) present relatively low regulation levels, e.g. due to a low activity of rhamnosyl-transferase for the lateral rhamnose substitution (Fig. 9).
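The density ellipses used in this graphical analysis can be illustrated with a minimal sketch. It assumes that the points of each contribution level ni are approximately bivariate normal, so that a 95% ellipse follows from the sample covariance and the chi-square quantile with 2 degrees of freedom; the statistical software used for the figures may compute its ellipses differently. The function names `density_ellipse` and `ellipses_by_contribution` are hypothetical, and the data are synthetic values generated only for illustration, not the HPLC measurements of the study.

```python
import math
import numpy as np


def density_ellipse(points, coverage=0.95):
    """Fit a bivariate-normal density ellipse to an (n, 2) array of points.

    Returns (center, semi_axes, angle_deg): the centroid, the semi-axis
    lengths (major axis first) and the orientation of the major axis.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)            # 2 x 2 sample covariance
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    quantile = -2.0 * math.log(1.0 - coverage)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)         # guard against round-off
    semi_axes = np.sqrt(quantile * eigvals[::-1])
    major = eigvecs[:, -1]                        # direction of largest variance
    angle_deg = math.degrees(math.atan2(major[1], major[0]))
    return center, semi_axes, angle_deg


def ellipses_by_contribution(x5, x2, n_i, min_points=3):
    """Group mixture points by contribution n_i and fit one ellipse per level."""
    x5 = np.asarray(x5, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n_i = np.asarray(n_i)
    ellipses = {}
    for level in np.unique(n_i):
        mask = n_i == level
        if mask.sum() >= min_points:
            cloud = np.column_stack([x5[mask], x2[mask]])
            ellipses[int(level)] = density_ellipse(cloud)
    return ellipses


# Synthetic illustration only: both compounds decrease as n_i increases,
# loosely mimicking the trend described for chemotype I.
rng = np.random.default_rng(0)
n_i = rng.integers(0, 11, size=2000)                      # levels 0..10
x5 = 9.0 - 0.6 * n_i + rng.normal(0.0, 0.4, n_i.size)     # compound (5), %
x2 = 9.0 - 0.5 * n_i + rng.normal(0.0, 0.6, n_i.size)     # compound (2), %

for level, (center, semi_axes, angle) in ellipses_by_contribution(x5, x2, n_i).items():
    print(level, np.round(center, 2), np.round(semi_axes, 2), round(angle, 1))
```

Following the ellipse centers from ni = 0 to ni = 10 gives the kind of trajectory that is read from Fig. 10 for each chemotype.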

References

Bohm B (1998) Introduction to Flavonoids. Singapore: Harwood Academic Publishers

Camacho D, de la Fuente A, Mendes P (2005) The origin of correlations in metabolomics data. Metabolomics 1: 53–63

Droesbeke JJ, Fine J, Saporta G (1997) Plans d’expériences: applications à l’entreprise. Paris: Technip

Fiehn O, Kopka J, Dörmann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite profiling for plant functional genomics. Nature Biotechnol 18: 1157–1161

Harborne JB, Baxter H (1999) The Handbook of Natural Flavonoids. Chichester: Wiley & Sons

Harborne JB (1994) Introduction to Ecological Biochemistry. London: Academic Press

Jaakola L, Määttä K, Pirttilä AM, Törrönen R, Kärenlampi S, Hohtola A (2002) Expression of genes involved in anthocyanin biosynthesis in relation to anthocyanin, proanthocyanidin and flavonol levels during bilberry fruit development. Plant Physiol 130: 729–739

Jerrold HZ (1998) Biostatistical analysis, fourth edition. USA: Prentice Hall

Kanoun M, Goulas MJP, Biolley JP (2001) Effect of a chronic and moderate ozone pollution on the phenolic pattern of bean leaves Phaseolus vulgaris L. cv Nerina: relations with visible injury and biomass production. Biochem Sys Ecol 29: 443–457

Kose F, Weckwerth W, Linke T, Fiehn O (2001) Visualizing plant metabolomic correlation networks using clique-metabolite matrices. Bioinformatics 17: 1198–1208

Lacaille-Dubois MA, Wagner H (1996) Composés phénoliques des végétaux. Acta bot Gallica 143: 555–562

Lindon JC, Holmes E, Nicholson JK (2006) Metabonomics Techniques and Applications to Pharmaceutical Research & Development. Pharmaceutical Research 23: 1075–1088

Mendes P, Kell DB (1998) Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 14: 869–883

Olsson LC, Veit M, Weissenböck G, Bornman JF (1998) Differential flavonoid response to enhanced UV-B radiation in Brassica napus. Phytochemistry 49: 1021–1028

Oteiza PI, Erlejman AG, Verstraeten SV, Keen CL, Fraga CG (2005) Flavonoid-membrane interactions: A protective role of flavonoids at the membrane surface? Clinical and Developmental Immunology 12: 19–25

Sado G, Sado MC (1991) Les plans d’expériences, de l’expérimentation à l’assurance qualité. Paris: Afnor Technique

SAS Institute Inc. (1997) JMP 3.2. SAS Institute, Cary, North Carolina

Scheffé H (1958) Experiments with Mixtures. J R Stat Soc Series B 20: 344–360


Scheffé H (1963) The simplex-centroid design for experiments with mixtures. J R Stat Soc Series B 25: 235–263

Semmar N, Jay M, Chemli R (2001a) Chemical diversification trends in Astragalus caprinus (Leguminosae), based on the flavonoid pathway. Biochem Sys Ecol 29: 727–738

Semmar N, Fenet B, Lacaille-Dubois M, Glucoff-Fiasson K, Chemli R, Jay M (2001b) Two new glycosides from Astragalus caprinus. J Nat Prod 64: 656–658

Semmar N, Fenet B, Glucoff-Fiasson K, Hasan A, Jay M (2002a) Four New Flavonol Glycosides from the leaves of Astragalus caprinus. J Nat Prod 65: 576–579

Semmar N, Fenet B, Glucoff-Fiasson K, Compte G, Jay M (2002b) Four New acylated kaempferol glycosides in the leaves of Astragalus caprinus. Chem Pharm Bull 50: 981–984

Semmar N, Jay M, Farman M, Chemli R (2005) Chemotaxonomic analysis of Astragalus caprinus (Fabaceae) based on the flavonic patterns. Biochem Sys Ecol 33: 187–200

Shi YQ, Fukai T, Sakagami H, Chang WJ, Yang PQ, Wang FP, Nomura T (2001) Cytotoxic Flavonoids with Isoprenoid Groups from Morus mongolica. J Nat Prod 64: 181–188

Jensen WB (2006) The Origin of the Liebig Condenser. J Chem Educ 83: 23

Winkel-Shirley B (2002) Biosynthesis of flavonoids and effects of stress. Curr Opin Plant Biology 5: 218–223

Received 26 August 2006; accepted 22 February 2007. Published Online First 12 June 2007.
