C. R. Acad. Sci. Paris, Ser. I 348 (2010) 449–453

Statistics

A semiparametric test of independence in copula models for censored data

Test d’indépendance semiparamétrique dans des modèles de copule pour les données censurées

Salim Bouzebda a, Amor Keziou b,a

a LSTA-université Paris 6, 175, rue du Chevaleret, boîte 158, 75013 Paris, France
b Laboratoire de mathématiques (FRE 3111) CNRS, université de Reims, Reims, France

Article info

Article history: Received 15 April 2009. Accepted after revision 7 February 2010.

Presented by Wendelin Werner

Abstract

We propose a semiparametric test of independence in copula models for bivariate survival censored data. We give the limit laws of the estimate of the parameter and the proposed test statistic under the null hypothesis of independence.

© 2010 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

Résumé

We propose a test of independence in copula models for censored data. We obtain the asymptotic laws of the proposed estimator and test statistic when the parameter is a boundary point of its domain.

Abridged French version

We develop a two-stage semiparametric estimation procedure for the association parameter in copula models for censored data, following the estimation procedure proposed by [16]. We establish the asymptotic properties of the maximum pseudo-likelihood estimator when the parameter lies on the boundary of its domain. We show that the classical limit laws no longer hold. We propose the generalized likelihood ratio statistic for testing independence. We obtain the limit law of the proposed statistic under the null hypothesis of independence of the margins. This work extends the results of [3,2] to the case of censored data.

1631-073X/$ – see front matter © 2010 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved. doi:10.1016/j.crma.2010.02.013

1. Introduction and motivations

Many useful multivariate models for dependence between failure times $T_1$ and $T_2$ are generated by parametric families of copulas of the form $\{C_\theta : \theta \in \Theta\}$, typically indexed by a parameter $\theta \in \Theta \subseteq \mathbb{R}$ (see, e.g., [10,11,7]). One advantage of copula models is that the margins are left unspecified and do not depend on the choice of the dependence structure, which allows the dependence and the margins to be estimated separately. The reader may refer to the books [13] and [8] for excellent expositions of the basics of copula theory. In order to estimate the unknown true value of the parameter $\theta \in \Theta$, which we denote throughout by $\theta_T \in \Theta$, semiparametric estimation procedures based on the maximization, over the parameter space $\Theta$, of a properly chosen pseudo-likelihood criterion were proposed by [14] and studied by [6,16,18,17], among others. In each of these papers, asymptotic normality is established for $\sqrt{n}(\theta_n - \theta_T)$, where $\theta_n$ denotes a properly chosen estimator of $\theta_T$, provided that $\theta_T$ lies in the interior, denoted by $\mathring{\Theta}$, of the parameter space $\Theta \subseteq \mathbb{R}$. The case where $\theta_T \in \partial\Theta := \overline{\Theta} \setminus \mathring{\Theta}$ is a boundary value of $\Theta$ has been studied in [2,3] for complete data. For censored data, however, the case where $\theta_T$ is a boundary value of the parameter space has not yet been studied systematically. Denote by $\theta_0$ the boundary value of the parameter and assume, without loss of generality, that the parameter space is of the form $\Theta := [\theta_0, +\infty)$. The boundary case $\theta_T = \theta_0$ is of particular interest since, for the majority of copula models, it corresponds to the hypothesis of independence of the margins; see, e.g., [13] and [8]. Motivated by this, we study the asymptotic properties of the maximum pseudo-likelihood estimate when $\theta_T = \theta_0$. We also propose a test of independence of the margins based on the generalized pseudo-likelihood ratio statistic, and we give its limit law under the null hypothesis of independence. We show in particular that the limit laws of the estimate and of the test statistic are not the classical ones. The problems connected to this type of "non-regularity", for parametric models of densities with complete data, have been considered by several authors; see, e.g., [5,12,4,15,1].

The remainder of this Note is organized as follows. In the next section we present the estimation procedure, and we study the asymptotic properties of the estimate under the null hypothesis of independence. In Section 3, we give the limit law of the test statistic under independence. Section 4 reports some concluding remarks and possible developments. All proofs are postponed to Appendix A.

2. Estimation

Suppose that $C_\theta$ is a distribution function with density $c_\theta$ on $(0,1)^2$ with respect to the Lebesgue measure for any $\theta \in \Theta$. Let $(T_1, T_2)$ denote the paired failure times, and let $(S_1, S_2)$ and $(f_1, f_2)$ denote, respectively, the corresponding marginal survival functions and density functions. If $(T_1, T_2)$ comes from the $C_{\theta_T}$ copula for some $\theta_T \in \Theta$, then the joint survival function and density of $(T_1, T_2)$ are given by

$$S(t_1, t_2) = C_{\theta_T}\big(S_1(t_1), S_2(t_2)\big), \qquad f(t_1, t_2) = c_{\theta_T}\big(S_1(t_1), S_2(t_2)\big)\, f_1(t_1)\, f_2(t_2), \qquad t_1, t_2 \ge 0. \tag{1}$$

We recall the principle of the maximum pseudo-likelihood procedure studied by [16]. Let $(C_1, C_2)$ denote paired censoring times. For $j = 1, \ldots, n$, assume that $(T_{1,j}, T_{2,j})$ and $(C_{1,j}, C_{2,j})$ are independent random samples with continuous survival functions $S$ and $G$, respectively. For each $j$ and $i = 1, 2$, we observe $X_{i,j} := T_{i,j} \wedge C_{i,j}$ and $\delta_{i,j} := \mathbf{1}_{\{X_{i,j} = T_{i,j}\}}$. We estimate $S_1$ and $S_2$ by the Kaplan–Meier estimators [9], denoted by $S_{1,n}$ and $S_{2,n}$. For $j = 1, \ldots, n$, write $(u_j, v_j)$ for $(S_1(X_{1,j}), S_2(X_{2,j}))$. Then, given $(u_j, v_j)$, $j = 1, \ldots, n$, the likelihood of $\theta$ is

$$\prod_{j=1}^{n} L(\theta, u_j, v_j) = \prod_{j=1}^{n} c_\theta(u_j, v_j)^{\delta_{1,j}\delta_{2,j}} \left[\frac{\partial C_\theta(u_j, v_j)}{\partial u_j}\right]^{\delta_{1,j}(1-\delta_{2,j})} \left[\frac{\partial C_\theta(u_j, v_j)}{\partial v_j}\right]^{\delta_{2,j}(1-\delta_{1,j})} C_\theta(u_j, v_j)^{(1-\delta_{1,j})(1-\delta_{2,j})}. \tag{2}$$
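As an illustration of how the four censoring patterns select the active factor in (2), the following minimal Python sketch evaluates the logarithm of one factor $L(\theta, u, v)$. The function name `log_lik_contribution` and the dictionary `INDEPENDENCE` are hypothetical names introduced here; the copula ingredients are passed as plain callables. For the independence copula $C(u,v) = uv$ the log-factor reduces to $(1-\delta_1)\log u + (1-\delta_2)\log v$, which the usage lines check.

```python
import math

def log_lik_contribution(c, dC_du, dC_dv, C, u, v, d1, d2):
    """Log of one factor of the censored likelihood (2).

    c, dC_du, dC_dv, C are callables (u, v) -> float for the copula density,
    its two first-order partial derivatives, and the copula itself.
    d1, d2 are the censoring indicators (1 = failure observed, 0 = censored).
    """
    if d1 == 1 and d2 == 1:      # both failure times observed
        return math.log(c(u, v))
    if d1 == 1 and d2 == 0:      # second coordinate censored
        return math.log(dC_du(u, v))
    if d1 == 0 and d2 == 1:      # first coordinate censored
        return math.log(dC_dv(u, v))
    return math.log(C(u, v))     # both coordinates censored

# Illustrative ingredients: the independence copula C(u, v) = u * v.
INDEPENDENCE = dict(
    c=lambda u, v: 1.0,
    dC_du=lambda u, v: v,
    dC_dv=lambda u, v: u,
    C=lambda u, v: u * v,
)

if __name__ == "__main__":
    u, v = 0.3, 0.8
    for d1 in (0, 1):
        for d2 in (0, 1):
            value = log_lik_contribution(u=u, v=v, d1=d1, d2=d2, **INDEPENDENCE)
            print(d1, d2, value)
```

Working on the log scale, one observation at a time, avoids forming the product in (2) directly, which would underflow for moderately large $n$.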

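Let $\ell(\theta, u_j, v_j)$ denote the logarithm of $L(\theta, u_j, v_j)$, and $U_\theta(\theta, u_j, v_j)$ the score function of $\theta$, i.e., the derivative of the logarithm of (2) with respect to $\theta$. The semiparametric maximum likelihood estimator $\theta_n$ of $\theta_T$ is the solution of the estimating equation

$$U_\theta(\theta, S_{1,n}, S_{2,n}) := \sum_{j=1}^{n} \frac{\partial \ell\big(\theta, S_{1,n}(X_{1,j}), S_{2,n}(X_{2,j})\big)}{\partial \theta} = 0. \tag{3}$$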
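The first stage of the procedure, replacing $S_1$ and $S_2$ in (3) by Kaplan–Meier estimates, can be sketched as follows. This is a minimal product-limit implementation using only the Python standard library; the function name `kaplan_meier` is hypothetical, and the sketch does not treat the practical issue that the estimate can equal zero at the largest uncensored observation, where the logarithms in the pseudo-likelihood are undefined.

```python
def kaplan_meier(times, events):
    """Return a function t -> S_n(t), the product-limit estimate.

    times  : observed values X_j = min(T_j, C_j)
    events : indicators delta_j (1 = failure observed, 0 = censored)
    """
    # Distinct times at which at least one failure was observed.
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    steps = []
    surv = 1.0
    for s in failure_times:
        at_risk = sum(1 for t in times if t >= s)
        deaths = sum(1 for t, e in zip(times, events) if t == s and e == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((s, surv))

    def survival(t):
        # Right-continuous step function: take the last step at or before t.
        value = 1.0
        for s, val in steps:
            if s <= t:
                value = val
            else:
                break
        return value

    return survival

if __name__ == "__main__":
    S = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
    print([S(t) for t in (0.5, 1, 2, 3, 4)])
```

The pseudo-observations entering (3) are then obtained by evaluating the returned function at each $X_{i,j}$, separately for each margin.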

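The following notation will be needed. Let

$$W_\theta(\theta, u, v) := \frac{\partial \ell(\theta, u, v)}{\partial \theta}, \qquad V_\theta(\theta, u, v) := \frac{\partial^2 \ell(\theta, u, v)}{\partial \theta^2},$$

$$V_{\theta,1}(\theta, u, v) := \frac{\partial^2 \ell(\theta, u, v)}{\partial \theta\, \partial u}, \qquad V_{\theta,2}(\theta, u, v) := \frac{\partial^2 \ell(\theta, u, v)}{\partial \theta\, \partial v},$$

$$t_{01} := \sup\{t : P(T_1 > t, C_1 > t) > 0\} \quad \text{and} \quad t_{02} := \sup\{t : P(T_2 > t, C_2 > t) > 0\}.$$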

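Let

$$\tau_1^2 := E\big[-V_\theta\big(\theta_T, S_1(X_{1,1}), S_2(X_{2,1})\big)\big] = \int_A -V_\theta\big(\theta_T, S_1(t_1), S_2(t_2)\big)\, dH_{\theta_T}(t_1, t_2, \delta_1, \delta_2),$$

$$\tau_2^2 := E\big[\{I_1(X_{1,1}, \delta_{1,1}, \theta_T) + I_2(X_{2,1}, \delta_{2,1}, \theta_T)\}^2\big] = \int_A \{I_1(t_1, \delta_1, \theta_T) + I_2(t_2, \delta_2, \theta_T)\}^2\, dH_{\theta_T}(t_1, t_2, \delta_1, \delta_2), \tag{4}$$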

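where $H_{\theta_T}$ is the joint distribution of $(X_{1,j}, \delta_{1,j})$ and $(X_{2,j}, \delta_{2,j})$, and $A := [0, t_{01}] \times [0, t_{02}]$. For $j = 1, \ldots, n$, $I_1$ and $I_2$ are defined by

$$I_1(X_{1,j}, \delta_{1,j}, \theta_T) := \int_A V_{\theta,1}\big(\theta_T, S_1(t_1), S_2(t_2)\big)\, I_{01}(X_{1,j}, \delta_{1,j})(t_1)\, dH_{\theta_T}(t_1, t_2, \delta_1, \delta_2),$$

$$I_2(X_{2,j}, \delta_{2,j}, \theta_T) := \int_A V_{\theta,2}\big(\theta_T, S_1(t_1), S_2(t_2)\big)\, I_{02}(X_{2,j}, \delta_{2,j})(t_2)\, dH_{\theta_T}(t_1, t_2, \delta_1, \delta_2),$$

where

$$I_{01}(X_{1,j}, \delta_{1,j})(t_1) := -S_1(t_1)\left[\int_0^{t_1} \frac{1}{P(T_1 \ge u, C_1 \ge u)}\, dN_{1,j}(u) - \int_0^{t_1} \frac{\mathbf{1}_{\{X_{1,j} \ge u\}}}{P(T_1 \ge u, C_1 \ge u)}\, d\Lambda_1(u)\right],$$

$$I_{02}(X_{2,j}, \delta_{2,j})(t_2) := -S_2(t_2)\left[\int_0^{t_2} \frac{1}{P(T_2 \ge u, C_2 \ge u)}\, dN_{2,j}(u) - \int_0^{t_2} \frac{\mathbf{1}_{\{X_{2,j} \ge u\}}}{P(T_2 \ge u, C_2 \ge u)}\, d\Lambda_2(u)\right],$$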

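and

$$N_{i,j}(u) := \mathbf{1}_{\{X_{i,j} \le u,\ \delta_{i,j} = 1\}}, \qquad \Lambda_i := -\log S_i \ \text{(the cumulative hazard function of } T_i\text{)}, \qquad i = 1, 2;\ j = 1, \ldots, n.$$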

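Under conditions (C.1)–(C.2) below, and when $\theta_T$ is an interior point of $\Theta$, [16] show that, as $n \to \infty$,

$$\sqrt{n}(\theta_n - \theta_T) \to N(0, \tau^2) \tag{5}$$

in distribution, with variance $\tau^2 := (\tau_1^2 + \tau_2^2)/\tau_1^4$. In the sequel, all derivatives of $\ell(\theta, \cdot, \cdot)$ with respect to $\theta$ are taken on the appropriate side, i.e., they are one-sided derivatives at the boundary point.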

To describe the limiting behavior of $\theta_n$, we will make use of the following conditions.

(C.1) Standard regularity conditions for the parametric maximum likelihood estimate;
(C.2) $W_\theta(\theta, S_1(t_1), S_2(t_2))$, $V_\theta(\theta, S_1(t_1), S_2(t_2))$, $V_{\theta,1}(\theta, S_1(t_1), S_2(t_2))$ and $V_{\theta,2}(\theta, S_1(t_1), S_2(t_2))$ are continuous and bounded for $(t_1, t_2) \in A$.

The asymptotic properties of $\theta_n$ can then be summarized as follows.

Theorem 2.1. Let the conditions (C.1)–(C.2) be fulfilled, and assume that $\theta_T = \theta_0$ lies on the boundary of $\Theta := [\theta_0, +\infty)$. Then, as $n \to \infty$, we have the convergence in distribution

$$\sqrt{n}(\theta_n - \theta_T) \xrightarrow{d} Z_+ := Z\, \mathbf{1}_{\{Z > 0\}}, \tag{6}$$

where $Z \sim N(0, \sigma^2)$ denotes a centered normal random variable with variance $\sigma^2 := 1/\tau_1^2$.
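The shape of the limit law in (6), with half of its mass at zero and a half-normal positive part, can be illustrated on a much simpler toy problem than the copula model. The sketch below is only an analogue, not the estimator of this Note: it simulates the maximum likelihood estimator of a normal mean $\mu$ constrained to $[0, +\infty)$ when the true value is the boundary point $\mu = 0$, in which case the constrained estimator is $\max(0, \bar{X}_n)$. The function name `boundary_mle_draws` and all default values are hypothetical.

```python
import math
import random
import statistics

def boundary_mle_draws(n=50, reps=2000, seed=0):
    """Draws of sqrt(n) * max(0, sample mean) for N(0, 1) samples of size n.

    Toy analogue of (6): the parameter mu lives in [0, +inf) and the true
    value mu = 0 sits on the boundary, so the constrained MLE is max(0, mean).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        mean = statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        draws.append(math.sqrt(n) * max(0.0, mean))
    return draws

if __name__ == "__main__":
    draws = boundary_mle_draws()
    positive = [d for d in draws if d > 0.0]
    print("fraction of exact zeros:", 1.0 - len(positive) / len(draws))
    print("mean of positive part:", statistics.fmean(positive))
    print("half-normal mean sqrt(2/pi):", math.sqrt(2.0 / math.pi))
```

In this toy setting $\sqrt{n}\,\bar{X}_n$ is exactly standard normal, so about half of the draws are exact zeros and the positive draws follow a half-normal law, which mirrors the structure $Z\,\mathbf{1}_{\{Z>0\}}$ with $\sigma = 1$.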

Remark 1. The asymptotic variance $\sigma^2$ in Theorem 2.1 may be consistently estimated by its empirical counterpart, as was done in [16, pp. 1389–1390]. Specifically, it may be obtained by replacing $H_{\theta_T}$ by its empirical distribution function $H_n$, and $S_1$, $S_2$ and $\theta_T$ by $S_{1,n}$, $S_{2,n}$ and $\theta_n$.
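To give a feel for the quantity $\tau_1^2$ that determines $\sigma^2$, here is a worked special case under assumptions that are not part of this Note: the Clayton family $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ restricted to $\Theta = [0, +\infty)$ (so that $\theta_0 = 0$ gives $C_0(u,v) = uv$), no censoring, and known margins. A first-order expansion of $\log c_\theta$ at $\theta = 0$ then gives the score and the information in closed form.

```latex
% Assumptions (illustrative only): Clayton family, theta_0 = 0, no censoring.
% With a = -\log u and b = -\log v,
\log c_\theta(u,v)
  = \log(1+\theta) + (1+\theta)(a+b)
    - \Big(2 + \tfrac{1}{\theta}\Big)\log\big(e^{\theta a} + e^{\theta b} - 1\big)
  = \theta\,(1-a)(1-b) + O(\theta^2),
% so the score at independence is
W_\theta(\theta_0, u, v) = (1 + \log u)(1 + \log v).
% For independent U, V uniform on (0,1), -\log U is standard exponential,
% hence E[1 + \log U] = 0 and E[(1 + \log U)^2] = 1, which gives
\tau_1^2 = E\big[W_\theta(\theta_0, U, V)^2\big] = 1,
\qquad \sigma^2 = 1/\tau_1^2 = 1 .
```

With censoring, the censored factors of the likelihood also contribute and the value of $\tau_1^2$ changes, which is why the empirical counterpart described in Remark 1 is used in practice.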

Remark 2. When $\theta_T$ approaches the value corresponding to independence, i.e., $\theta_T \to \theta_0$, by integration by parts and by noting that

$$E\big(W_\theta(\theta_0, u, v)\, \partial \ell(\theta_0, u, v)/\partial u\big) = E\big(W_\theta(\theta_0, u, v)\, \partial \ell(\theta_0, u, v)/\partial v\big) = 0,$$

[16] showed that $I_1$ and $I_2$ converge to zero. Consequently, at independence, $\tau_2^2$ converges to zero, which implies that $\tau^2$ tends to $1/\tau_1^2$; hence $\theta_n$ is an asymptotically efficient estimate of $\theta_T$ when $\theta_T$ approaches $\theta_0$.

3. Test of independence

In this section, we consider the problem of testing the independence of the margins in the parametric copula models considered above. The null hypothesis to be tested is

$$H_0 : C_{\theta_T}(u_1, u_2) = u_1 u_2 \quad \text{for all } u_1, u_2 \in (0,1),$$

which is equivalent to $H_0 : \theta_T = \theta_0$, where $\theta_0$ is the boundary value of the parameter space $\Theta$. The alternative hypothesis, $H_1 : \theta_T \ne \theta_0$ (that is, $\theta_T > \theta_0$, since $\Theta = [\theta_0, +\infty)$), is naturally composite. The corresponding generalized pseudo-likelihood ratio statistic is given by

$$T_n := T_n(\theta_0, \theta_n) := 2\sum_{j=1}^{n} \ell\big(\theta_n, S_{1,n}(X_{1,j}), S_{2,n}(X_{2,j})\big) - 2\sum_{j=1}^{n} \ell\big(\theta_0, S_{1,n}(X_{1,j}), S_{2,n}(X_{2,j})\big).$$
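A minimal sketch of how $T_n$ could be computed is given below for one concrete family. Everything specific in it is an assumption made for illustration: the Clayton family $C_\theta(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ with $\theta_0 = 0$, the hypothetical function names, the upper search bound `theta_max`, and a golden-section search that presumes the pseudo-log-likelihood is unimodal on $[0, \texttt{theta\_max}]$. The input `data` is a list of tuples $(u_j, v_j, \delta_{1,j}, \delta_{2,j})$ of pseudo-observations in $(0,1)$ with their censoring indicators.

```python
import math

def clayton_loglik(theta, u, v, d1, d2):
    """Log of one factor of the censored likelihood for the Clayton copula."""
    lu, lv = math.log(u), math.log(v)
    if theta < 1e-8:
        # Independence limit: c = 1, dC/du = v, dC/dv = u, C = u * v.
        log_c, log_cu, log_cv, log_C = 0.0, lv, lu, lu + lv
    else:
        # log(u^-theta + v^-theta - 1), via expm1/log1p for small theta.
        log_s = math.log1p(math.expm1(-theta * lu) + math.expm1(-theta * lv))
        log_C = -log_s / theta
        log_cu = -(theta + 1.0) * lu - (1.0 / theta + 1.0) * log_s
        log_cv = -(theta + 1.0) * lv - (1.0 / theta + 1.0) * log_s
        log_c = (math.log1p(theta) - (theta + 1.0) * (lu + lv)
                 - (1.0 / theta + 2.0) * log_s)
    return (d1 * d2 * log_c + d1 * (1 - d2) * log_cu
            + d2 * (1 - d1) * log_cv + (1 - d1) * (1 - d2) * log_C)

def pseudo_loglik(theta, data):
    """Sum of the log-factors over all observations."""
    return sum(clayton_loglik(theta, u, v, d1, d2) for u, v, d1, d2 in data)

def lr_statistic(data, theta_max=20.0, tol=1e-6):
    """Return (theta_n, T_n), maximizing over [0, theta_max]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, theta_max
    while b - a > tol:  # golden-section search for a maximum
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if pseudo_loglik(c, data) < pseudo_loglik(d, data):
            a = c
        else:
            b = d
    theta_hat = 0.5 * (a + b)
    l_hat = pseudo_loglik(theta_hat, data)
    l_null = pseudo_loglik(0.0, data)
    if l_hat <= l_null:  # maximum attained at the boundary theta_0 = 0
        return 0.0, 0.0
    return theta_hat, 2.0 * (l_hat - l_null)

if __name__ == "__main__":
    # Strongly dependent, uncensored toy data: u_j = v_j on a regular grid.
    data = [(i / 51.0, i / 51.0, 1, 1) for i in range(1, 51)]
    print(lr_statistic(data))
```

Returning exactly zero when the constrained maximum sits at $\theta_0$ reflects the atom at zero in the limit law of $T_n$; very large values of `theta_max` combined with very small pseudo-observations could overflow in `expm1`, which this sketch does not guard against.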

The following theorem gives the limiting law of the statistic $T_n$ under $H_0$.

Theorem 3.1. Assume that the conditions of Theorem 2.1 hold. Then, under the null hypothesis $H_0$, the statistic $T_n$ converges in distribution, as $n \to \infty$, to the random variable $W^2\, \mathbf{1}_{\{W > 0\}}$, where $W \sim N(0,1)$ is a standard normal random variable.

Remark 3. Applying Theorem 3.1, we reject the null hypothesis of independence $H_0 : \theta_T = \theta_0$ whenever the value of the statistic $T_n$ exceeds $q_{1-\alpha}$, namely, the $(1-\alpha)$-quantile of the law of the random variable $W^2\, \mathbf{1}_{\{W > 0\}}$. The corresponding test is then asymptotically of level $\alpha$ as $n \to \infty$. The critical region is, accordingly, given by

$$CR := \{T_n > q_{1-\alpha}\}.$$
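The quantile $q_{1-\alpha}$ has a simple closed form: for $x > 0$, $P(W^2\, \mathbf{1}_{\{W>0\}} > x) = P(W > \sqrt{x}) = 1 - \Phi(\sqrt{x})$, so that $q_{1-\alpha} = \{\Phi^{-1}(1-\alpha)\}^2$ for $0 < \alpha < 1/2$. The short Python sketch below computes the critical value and an asymptotic p-value with the standard library only; the function names `critical_value` and `p_value` are hypothetical.

```python
import math
from statistics import NormalDist

def critical_value(alpha):
    """(1 - alpha)-quantile of W^2 * 1{W > 0} with W standard normal."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2)")
    return NormalDist().inv_cdf(1.0 - alpha) ** 2

def p_value(t_n):
    """Asymptotic p-value P(W^2 * 1{W > 0} >= t_n) for an observed T_n."""
    if t_n <= 0.0:
        return 1.0
    return 1.0 - NormalDist().cdf(math.sqrt(t_n))

if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, critical_value(alpha))
    print(p_value(3.0))
```

At level $\alpha = 0.05$ the critical value is about 2.71, smaller than the usual chi-square value 3.84 with one degree of freedom, because half of the limiting mass sits at zero.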

4. Concluding remarks and possible developments

We have addressed the problem of testing the independence of margins in parametric copula models, with unknown and nonparametric margins, for censored data. For the majority of copula models, the value of the parameter corresponding to the null hypothesis of independence is a boundary value of the parameter space. We have derived the limit law of the semiparametric likelihood statistic under the null hypothesis; it is shown that the limit law of the generalized pseudo-likelihood ratio statistic is a mixture of a chi-square law with one degree of freedom and the Dirac measure at zero. A test of independence, based on this statistic, is then proposed. It would be interesting to study the asymptotic properties of the statistic under the alternative hypothesis and its optimality in some sense.

Acknowledgements

The authors are grateful to the referees for their useful suggestions and constructive criticisms on earlier drafts of this work.

Appendix A

Proof of Theorem 2.1. Using arguments similar to those in [15], we can show that $\sqrt{n}(\theta_n - \theta_T) = O_P(1)$ when $\theta_T$ is an interior or a boundary point of its domain $\Theta := [\theta_0, +\infty)$. At independence, i.e., when $\theta_T = \theta_0$, by a Taylor expansion, we obtain for any $\theta$ satisfying $\theta - \theta_0 = O_P(1/\sqrt{n})$,

$$\frac{1}{n}\sum_{j=1}^{n} \ell\big(\theta, S_{1,n}(X_{1,j}), S_{2,n}(X_{2,j})\big) = \frac{1}{n}\sum_{j=1}^{n} \ell\big(\theta, S_1(X_{1,j}), S_2(X_{2,j})\big) + o_P(1/n), \tag{7}$$

and $\theta_n - \tilde{\theta}_n = o_P(1/\sqrt{n})$, where $\tilde{\theta}_n$ is the parametric maximum likelihood estimator, i.e.,

$$\tilde{\theta}_n := \arg\max_{\theta \in \Theta} \frac{1}{n}\sum_{j=1}^{n} \ell\big(\theta, S_1(X_{1,j}), S_2(X_{2,j})\big). \tag{8}$$

Furthermore, we can write by a Taylor expansion

$$\frac{1}{n}\sum_{j=1}^{n} \ell\big(\theta, S_1(X_{1,j}), S_2(X_{2,j})\big) - \frac{1}{n}\sum_{j=1}^{n} \ell\big(\theta_0, S_1(X_{1,j}), S_2(X_{2,j})\big) = -\frac{\tau_1^2}{2}\big(Z_n - (\theta - \theta_0)\big)^2 + \frac{1}{2}\, n^{-2}\tau_1^{-2}\, U_\theta(\theta_0, S_1, S_2)^2 + O_P(1)\,|\theta - \theta_0|^3,$$

where $Z_n := n^{-1}\tau_1^{-2}\, U_\theta(\theta_0, S_1, S_2)$. Since $\Theta = [\theta_0, +\infty)$ is a convex set, making use of [15, Lemma 1], it holds that $\tilde{\theta}_n - \bar{\theta}_n = o_P(1/\sqrt{n})$, where

$$\bar{\theta}_n := \arg\max_{\theta \in \Theta}\, -\big(Z_n - (\theta - \theta_0)\big)^2 \tau_1^2. \tag{9}$$

Observe that the maximum of the concave quadratic function in (9) is achieved at $\bar{\theta}_n$ satisfying $\sqrt{n}(\bar{\theta}_n - \theta_0) = \sqrt{n} Z_n\, \mathbf{1}_{\{\sqrt{n} Z_n > 0\}}$. Hence, the limit distribution of $\sqrt{n}(\theta_n - \theta_0)$ is the distribution of the random variable $Z\, \mathbf{1}_{\{Z > 0\}}$, where $Z \sim N(0, \sigma^2)$ denotes a centered normal random variable with variance $\sigma^2 := 1/\tau_1^2$. □

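Proof of Theorem 3.1. As above, by a Taylor expansion, we can show that

$$T_n(\theta_0, \theta_n) = \sup_{\theta \in \Theta}\Big\{ n\tau_1^2 Z_n^2 - \tau_1^2\big(\sqrt{n} Z_n - \sqrt{n}(\theta - \theta_0)\big)^2 \Big\} + o_P(1), \tag{10}$$

where $Z_n := n^{-1}\tau_1^{-2}\, U_\theta(\theta_0, S_1, S_2)$. Note that the supremum of the quadratic function in (10), on $\Theta := [\theta_0, +\infty)$, is achieved at $\theta = \theta_0 + Z_n\, \mathbf{1}_{\{Z_n > 0\}}$, so that $T_n(\theta_0, \theta_n) = n\tau_1^2 Z_n^2\, \mathbf{1}_{\{Z_n > 0\}} + o_P(1)$. Since $\tau_1 \sqrt{n} Z_n$ converges in distribution to a standard normal random variable $W$, the limit distribution of $T_n(\theta_0, \theta_n)$, as $n$ tends to infinity, is the distribution of $W^2\, \mathbf{1}_{\{W > 0\}}$. □

References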

[1] D.W.K. Andrews, Estimation when a parameter is on a boundary, Econometrica 67 (6) (1999) 1341–1383.
[2] S. Bouzebda, A. Keziou, A test of independence in some copula models, Math. Methods Statist. 17 (2) (2008) 123–137.
[3] S. Bouzebda, A. Keziou, A new test procedure of independence in copula models via $\chi^2$-divergence, Comm. Statist. Theory Methods 39 (1) (2010) 1–20.
[4] D. Chant, On asymptotic tests of composite hypotheses in nonstandard conditions, Biometrika 61 (1974) 291–298.
[5] H. Chernoff, On the distribution of the likelihood ratio, Ann. Math. Statist. 25 (1954) 573–578.
[6] C. Genest, K. Ghoudi, L.-P. Rivest, A semiparametric estimation procedure of dependence parameters in multivariate families of distributions, Biometrika 82 (3) (1995) 543–552.
[7] H. Joe, Parametric families of multivariate distributions with given margins, J. Multivariate Anal. 46 (2) (1993) 262–282.
[8] H. Joe, Multivariate Models and Dependence Concepts, Monogr. Statist. Appl. Probab., vol. 73, Chapman & Hall, London, 1997.
[9] E.L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J. Amer. Statist. Assoc. 53 (1958) 457–481.
[10] G. Kimeldorf, A. Sampson, One-parameter families of bivariate distributions with fixed marginals, Comm. Statist. 4 (1975) 293–301.
[11] G. Kimeldorf, A. Sampson, Uniform representations of bivariate distributions, Comm. Statist. 4 (7) (1975) 617–627.
[12] P.A.P. Moran, Maximum-likelihood estimation in non-standard conditions, Proc. Cambridge Philos. Soc. 70 (1971) 441–450.
[13] R.B. Nelsen, An Introduction to Copulas, Lecture Notes in Statist., vol. 139, Springer-Verlag, New York, 1999.
[14] D. Oakes, Multivariate survival distributions, J. Nonparametr. Stat. 3 (3–4) (1994) 343–354.
[15] S.G. Self, K.-Y. Liang, Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions, J. Amer. Statist. Assoc. 82 (398) (1987) 605–610.
[16] J.H. Shih, T.A. Louis, Inferences on the association parameter in copula models for bivariate survival data, Biometrics 51 (4) (1995) 1384–1399.
[17] H. Tsukahara, Semiparametric estimation in copula models, Canad. J. Statist. 33 (3) (2005) 357–375.
[18] W. Wang, A.A. Ding, On assessing the association for bivariate current status data, Biometrika 87 (4) (2000) 879–893.