ANALYSIS OF SURVIVAL DATA: A COMPARISON OF THREE MAJOR STATISTICAL PACKAGES (SAS, SPSS, BMDP)
Corey J. Pelz and John P. Klein, Medical College of Wisconsin
Corey J. Pelz, Medical College of Wisconsin, 8701 Watertown Plank Rd., Milwaukee, WI 53226
KEY WORDS: Survival analysis, SAS, SPSS, BMDP

Survival analysis techniques have become standard tools for the statistician in medical
research. The application of survival models to data is valid when the endpoint of interest
is the “time to the occurrence of a particular event.” Survival models may be applied to a
variety of fields such as biology, medicine, engineering, and economics. With modern
computing technology, the analysis of “time-to-event” data has become inexpensive in terms
of time. There are several statistical packages on the market today that can be used to do
survival analyses. The most commonly used packages are SAS, SPSS, and BMDP. These three
packages are compared based upon their capabilities, accuracy, and user-friendliness as
applied to survival analysis. Example data sets are used to demonstrate standard and
nonstandard conditions that occur when modelling survival data in each of the packages.
Several survival analysis applications are presented to determine the agreement among the
three packages. Both the univariate and multivariate survival analysis procedures are
presented for each package.

1. INTRODUCTION

The application of survival models to data is valid when the endpoint of interest is the
“time to the occurrence of a particular event.” Survival models may be applied to a variety
of fields such as biology, medicine, engineering, and economics. An example of an
application in engineering is to model the time it takes for a ball bearing to wear. The
focus here will be on applications in biology and medicine, where the event of interest may
be death or another particular event such as relapse of a disease. The standard statistical
techniques for data analysis are usually not applicable to survival data. First of all,
survival data are typically not symmetric. A histogram of survival times will indicate that
they tend to be positively skewed. As a result, it is not reasonable to assume data of this
type to be normally distributed. Another feature of survival data that makes it difficult to
use standard techniques is that survival times are frequently “censored.” The survival time
of an individual is said to be censored when the endpoint of interest has not been observed.

Right censoring, which is the most common form, occurs when the exact survival time is not
known. All that is known is that the exact survival time exceeds the recorded value. This
type of situation can occur if the subjects have not experienced the event of interest when
the study terminates or if they are lost to follow-up. Such data cannot be analyzed by
ignoring the censored observations because, in general, those who tend to live longer are
more likely to be censored.

Another feature of survival data is the potential for truncation. Under left truncation,
only subjects that experience a certain intermediate event are made known to the
investigator. For example, if the focus of the study is to look at relapse of leukemia prior
to death, left truncation occurs because only those who experience the intermediate event
(relapse) are observed.

There are both parametric and nonparametric techniques available to model survival data.
The parametric methods of estimation assume that the probability density function of the
time to a particular event follows a specific distribution, such as the exponential
distribution, while the nonparametric methods do not. The three major statistical packages
(SAS, SPSS, and BMDP) are compared for both parametric and nonparametric survival analysis
methods. Recommendations are given as to when each package is superior under both standard
and nonstandard conditions. Several datasets are analyzed by each of the packages so that
direct comparisons can be made. These include: (1) ovarian cancer data, Edmunson et al.
(1979); (2) the Stanford heart transplant data, Crowley and Hu (1977); (3) larynx cancer
data, Kardaun (1983); (4) breast feeding data, National Longitudinal Survey of Youth (NLSY);
and (5) melanoma data, Lee (1992).
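As an illustration of why censored observations cannot simply be dropped, the following is a minimal Python sketch of the product-limit (Kaplan-Meier) estimate. It is not taken from any of the three packages. The sample values and the function name `kaplan_meier` are assumptions introduced here purely for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  -- observed times (event or censoring)
    events -- 1 if the event was observed, 0 if right censored
    Returns a list of (event_time, number_at_risk, survival_estimate).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    out = []
    for t in sorted(deaths):
        # Risk set: everyone still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        out.append((t, at_risk, surv))
    return out


# Hypothetical sample: six subjects, two of them right censored (event = 0).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

km = kaplan_meier(times, events)

# Naive analysis that discards the censored subjects altogether.
kept = [t for t, e in zip(times, events) if e == 1]
naive = kaplan_meier(kept, [1] * len(kept))

for (t, n, s), (_, n0, s0) in zip(km, naive):
    print(f"t={t}: at risk {n}, KM {s:.3f} | censored dropped: at risk {n0}, S {s0:.3f}")
```

On this sample the estimate that keeps the censored subjects in the risk sets is never below the one that drops them, which is the downward bias described above.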
2. COMPARISON OF THE PACKAGES

There are a number of similarities between SAS (version 6.09), SPSS (version 5.0), and BMDP
(version 1990) in terms of computational methods for survival analysis. For the most part,
the three packages agree with one another with respect to parameter estimation and
calculation of available statistical tests. Table 1 lists the procedures in each of the
three statistical packages that perform the major survival analysis techniques:
Kaplan-Meier method (Kaplan and Meier, 1958), life table methods (Gehan, 1969), Cox
proportional hazards models (Cox, 1972), and the accelerated failure time model (Andersen,
Borgan, Gill, Keiding, 1993). The life table method is not considered in this discussion
since it is no longer commonly used in medical applications. Each of the packages can
handle right censored data easily. The major differences among the packages are summarized
in Table 2.

2.1 Kaplan-Meier Estimates and Tests

The Kaplan-Meier estimates of the survival function are available in all three packages
along with standard errors of the survival function calculated by Greenwood’s formula
(Greenwood, 1926). The three packages provide the results of the log-rank (Collett, 1994)
and the Wilcoxon tests (Gehan, 1969) for comparing the survival of two or more groups. The
Tarone-Ware test (Tarone and Ware, 1977) is available in SPSS and BMDP but not in SAS. The
Peto-Prentice (Peto and Peto, 1972) test is available only in BMDP. SPSS has the ability to
calculate all pairwise comparisons among the groups by issuing a single command
(/compare=pairwise). More work is needed in SAS and BMDP to obtain the pairwise results.

SAS conflicts with SPSS and BMDP in the way the mean survival times are calculated. All
three packages underestimate the mean when the last observation is censored, but SAS does
so by a larger amount. SAS estimates the mean only up to the last event time, whereas both
SPSS and BMDP estimate the mean up to the last observation. The packages agree in
estimating the median survival time and its variance. The number at risk provided as
standard output is incorrect for each of the packages. The risk set should be determined an
instant before each of the event times. Therefore the numbers at risk should all be
increased by one in each package’s output. Both SPSS and BMDP have built-in modules for the
test for trend and for stratified analyses (Klein and Moeschberger, 1996). SAS/IML can be
used to obtain these tests in SAS by programming the proper test statistics. The results
agree among the three packages for stratified analyses and the test for trend.

2.2 Cox Proportional Hazards Model

Each of the packages fits the Cox proportional hazards model using the Newton-Raphson
procedure to maximize the partial likelihood. The estimated regression parameters and their
standard errors are provided as standard output. SAS’s PROC PHREG has four likelihoods
(Breslow, exact, discrete, and Efron) that adjust for ties in the survival times. The
Breslow likelihood is due to Breslow (1974). The exact likelihood is due to Kalbfleisch and
Prentice (1980). The discrete likelihood replaces the proportional hazards model by the
discrete logistic model. The Efron likelihood is due to Efron (1977). All four are
equivalent when there are no ties. When there are a large number of tied survival times,
the exact and the discrete likelihoods are preferred. Both perform well, but the discrete
likelihood is computationally easier and faster in terms of computer time. When there are
only a few ties, both the Efron and Breslow likelihoods perform well. The Breslow likelihood
is usually preferred to Efron’s since it is quite straightforward to compute. The SAS
default is the Breslow likelihood. This is the only likelihood that is available in SPSS
and BMDP.

A nice feature of each of the packages is the ability to construct 2 df tests. SAS uses a
“test” statement to obtain the results of the Wald test of linear hypotheses about the
regression parameters. SPSS uses its “/categorical” and “/contrast” subcommands to obtain
2 df Wald tests. In BMDP, 2 df Wald, likelihood ratio, and score tests are available in the
“/TEST” paragraph using the “ELIM” statement.

Each package also allows the output of other useful information such as the estimated
survival and hazard functions, the martingale residuals, and the number of subjects at risk
for inspection or graphics. Each package can account for time-dependent covariates, and
these can be incorporated to test the proportional hazards assumption. A graphical method
is also available in each package to test the
proportional hazards assumption. SAS, SPSS, and BMDP each have procedures to do a
stratified analysis. The results agree among the packages. SPSS has a feature
(/missing=include) that includes cases in the analysis that have missing values for
covariates that are specified in the model. SAS and BMDP automatically exclude such cases
from the analysis. The results of using the “/missing=include” subcommand are peculiar.
What SPSS does is actually use the value of the covariate even though it is defined as
missing. For example, a covariate that has -9 as its defined missing value is used in the
analysis as a -9. We recommend not using this option. Use the default option
“/missing=exclude”. These results agree with those of SAS and BMDP using Breslow’s
likelihood.

2.3 Left Truncated Data

Left truncated data frequently arise in the analysis of medical data. If the truncation is
ignored, the results may be severely biased (Klein and Zhang, 1996).

Only BMDP has a built-in routine to handle left truncation. BMDP uses time-dependent strata
to modify the risk sets to adjust for left truncation. It uses the “PASS” and “LEVEL”
commands together to modify the risk sets. We have discovered a method that allows analysis
of left truncated data in SAS. An augmented data set is constructed where, at each death
time, a record is added for each individual who entered the study at an earlier time and is
still under observation. On the added records the censoring variable is set to zero (it
stays one on the record of the individual who dies at that time), and a variable named
“strata”, equal to the death time, is included. A stratified Cox model is then fitted,
stratifying on the constructed variable “strata”. Both the SAS and BMDP code for left
truncated data follow:

SAS code using augmented data:

    proc phreg;
      strata strata;               /* one stratum per death time */
      model time*censor(0)=group;  /* censor = 0 marks a censored record */
    run;

Original Data:

    Time  Censor  Group  Enter
    10    1       1      8
    12    1       0      6
    15    1       1      11
    20    1       0      14

Augmented Data:

    Time  Censor  Group  Strata
    10    1       1      10
    12    0       0      10
    12    1       0      12
    15    0       1      12
    15    1       1      15
    20    0       0      15
    20    1       0      20

BMDP code using original data:

    /REGRESS
    ADD = treat.
    AUX = group, enter.
    PASS = 2.
    /TEST
    ELIM = treat.
    STAT = wald, lratio, score.
    /FUNCTION
    LEVEL = 1.
    if (enter gt time) then LEVEL=0.
    treat=group.
    /END

The results of running a stratified Cox model on the “augmented data” are identical to the
results obtained using the “original data” in BMDP’s 2L. The method is not very practical
at this time because the augmented data set must be built by hand. If the augmented data
set can be constructed in SAS/IML, the method will be very useful to the SAS user.

2.4 Model Building

There are automated variable selection routines (forward, backward, and stepwise) available
in each of the statistical packages for model building. The routines are very similar among
the packages and tend to give the same results in terms of a final model. A nice feature of
SPSS is the ability to use a 2 df Wald test to determine if a block of levels of a covariate
should enter the model. This feature is superior to those in SAS and BMDP, where only 1 df
tests are available during the model-building process.

2.5 Nonstandard Conditions

There are several nonstandard conditions that may arise when modelling survival data. Two
of the most commonly encountered ones are: 1) all the event times of one level of a
covariate occur before the first event time of another level and 2) one covariate is a
linear combination of another covariate. Under the first condition, the estimate of the
regression parameter is ± infinity. The second condition leads to a singular matrix that is
not invertible, and therefore the regression parameters cannot be estimated. The packages
differ in how they handle these nonstandard conditions. Both SPSS and BMDP give warning
messages that one of the nonstandard conditions is present. SAS provides results without
providing any information that a nonstandard condition is present. For example, when
condition 1 is present, SAS provides the results of the estimated regression parameters
after 15 iterations. These are not the correct estimates because the parameter estimate for
the covariate with the condition present is diverging. This can be seen by using the
“/itprint” option, but no warning message is given. BMDP provides references that give
information on how to remedy the nonstandard conditions.

2.6 The Accelerated Failure Time Model

Only SAS and BMDP allow the use of the accelerated failure time model. Their results agree.
SAS allows the use of the generalized gamma distribution, which can be useful in choosing
which underlying distribution to use to model the data. One must be careful in interpreting
the results of both packages, because the estimates that are provided are for the logarithm
of survival time. You will want to transform the estimates back to their original units.
This can be done using the delta method.

3. CONCLUSION

SAS, SPSS, and BMDP are all very good packages for survival analysis applications. Each
package has its advantages and pitfalls. There is not one identifiable superior package.
Under an ideal situation, a researcher would want to have all three packages available. The
choice of which package to buy is determined by the nature of the analyses that will be
done. For the ordinary researcher any of the packages would be sufficient.

SAS and SPSS both have matrix languages that allow the implementation of applications and
tests that are not a standard part of the package. Even though BMDP does not have a built-in
matrix language, it has incorporated techniques geared toward biomedical applications such
as left truncation.

ACKNOWLEDGMENTS

Research supported by contract 5 R01 CA54706-05 from the National Cancer Institute and the
U.S. Army Medical Research and Development Command.

BIBLIOGRAPHY

Andersen, P.K., Borgan, Ø., Gill, R.D., and Keiding, N. (1993). Statistical Models Based on
Counting Processes. Springer-Verlag, New York.

BMDP Statistical Software Manual, Volume 2 (1992). University of California Press,
California.

Breslow, N.E. (1972). Contribution to the discussion of a paper by D.R. Cox. Journal of the
Royal Statistical Society, B, 34, 216-7.

Collett, D. (1994). Modelling Survival Data in Medical Research. Chapman and Hall, London.

Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the Royal
Statistical Society, B, 34, 187-220.

Cox, D.R. (1975). Partial likelihood. Biometrika, 62, 269-76.

Crowley, J. and Hu, M. (1977). Covariance analysis of heart transplant survival data.
Journal of the American Statistical Association, 72, 27-36.

Edmunson, J.H., Fleming, T.R., Decker, G.D., Malkasan, G.D., Jorgensen, J.A., Jefferies,
J.A., Webb, M.J., Kvols, L.K. (1979). Different chemotherapeutic sensitivities and host
factors affecting prognosis in advanced ovarian carcinoma versus minimal residual disease.
Cancer Treatment Reports, 63, 241-7.

Efron, B. (1977). The efficiency of Cox’s likelihood function for censored data. Journal of
the American Statistical Association, 72, 557-65.

Gehan, E.A. (1969). Estimating survival functions for the life table. Journal of Chronic
Diseases, 21, 629-44.

Greenwood, M. (1926). The errors of sampling of the survivorship tables. Reports on Public
Health and Statistical Subjects, number 33, Appendix 1, HMSO, London.

Lee, E.T. (1992). Statistical Models and Methods for Lifetime Data. Wiley, New York.

Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data.
Wiley, New York.

Kaplan, E.L. and Meier, P. (1958). Nonparametric estimation from incomplete observations.
Journal of the American Statistical Association, 53, 457-81.

Kardaun, O. (1983). Statistical survival analysis of male larynx-cancer patients -- A case
study. Statistica Neerlandica, 37, 103-25.

Klein, J.P. and Zhang, M.J. (1996). Statistical challenges in comparing chemotherapy and
bone marrow transplantation as a treatment for leukemia. Lifetime Data: Models in
Reliability and Survival Analysis, N.P. Jewell, 175-85.

Klein, J.P. and Moeschberger, M.L. (1996). Survival Analysis. Springer-Verlag, New York (in
press).

Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant procedures. Journal of
the Royal Statistical Society, A, 135, 185-207.

SAS/STAT User’s Guide, Version 6, Volume 1 (1990). SAS Institute, Inc., Cary, NC.

SAS/STAT User’s Guide, Version 6, Volume 2 (1990). SAS Institute, Inc., Cary, NC.

SPSS for Unix, Advanced Statistics, Release 5 (1993). SPSS, Inc.

SPSS Statistical Algorithms, 2nd Edition (1993). SPSS, Inc.

Tarone, R.E. and Ware, J. (1977). On distribution-free tests for equality of survival
distributions. Biometrika, 64, 156-60.

Table 1: Listing of the procedures by survival analysis topic and statistical package.

Survival Analysis Topic           SAS             BMDP   SPSS
Kaplan-Meier Method               PROC LIFETEST   1L     KM
Life Tables Method                PROC LIFETEST   1L     SURVIVAL
Cox Proportional Hazards Model    PROC PHREG      2L     COXREG
Accelerated Failure Time Model    PROC LIFEREG    2L     N/A*

* N/A = Not Available

Table 2: Survival analysis applications available in SAS, SPSS, and BMDP with recommendations and
comments.

                                Packages            Agree in  Recommended
Application                     available in        results   package(s)   Comments
Kaplan-Meier estimates          SAS, SPSS, BMDP     YES       N/D**        Mean survival time is underestimated
                                                                           to a greater degree in SAS
Testing equality of strata      SAS, SPSS, BMDP     YES       SPSS         SPSS: Tarone-Ware test and pairwise
                                                                           comparisons
Test for trend                  SAS*, SPSS, BMDP    YES       SPSS, BMDP   Not built into SAS
Stratified tests                SAS*, SPSS, BMDP    YES       SPSS, BMDP   Not built into SAS
Estimation in the Cox model     SAS, SPSS, BMDP     YES       N/D          Each uses maximum partial likelihood
                                                                           estimation
Likelihoods for ties            SAS                 N/A       SAS          SAS has four likelihoods
Test of regression parameters   SAS, SPSS, BMDP     YES       BMDP         BMDP has Wald, score, and likelihood
                                                                           ratio tests for 2 df tests
Time-dependent covariates       SAS, SPSS, BMDP     YES       SAS          Syntax easier in SAS
Left truncation                 SAS, SPSS, BMDP     YES       BMDP         Built into BMDP
Model building techniques       SAS, SPSS, BMDP     YES       SPSS         SPSS: 2 df Wald tests during selection
                                                                           process
Testing proportional hazards    SAS, SPSS, BMDP     YES       N/D          2 methods available in each package
Stratified analysis             SAS, SPSS, BMDP     YES       N/D          Easy to implement in each package
Nonstandard conditions          N/A***              NO        BMDP         BMDP: warnings and references
Parametric regression           SAS, BMDP           YES       SAS          SAS includes generalized gamma
                                                                           distribution

* Available in the package with additional programming (SAS/IML) or calculation.
** N/D = Not Distinguishable
*** N/A = Not Applicable
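
The augmented-data construction of Section 2.3, which underlies the “Left truncation” row of Table 2, can be sketched in a few lines. The Python below is an illustration only and is not the authors' SAS/IML or BMDP code. The function name `augment` and the tuple layout are assumptions made here. It applies the rule stated in Section 2.3 (one stratum per death time, containing every individual who entered earlier and is still under observation) to the four original records listed there.

```python
def augment(records):
    """Build the stratified 'augmented' data set for left-truncated data.

    records -- list of (time, censor, group, enter) tuples, where censor is
               1 for a death and 0 for a censored time, and enter is the
               left-truncation (study entry) time.
    Returns a list of (time, censor, group, strata) tuples.
    """
    death_times = sorted({t for t, c, _, _ in records if c == 1})
    out = []
    for d in death_times:
        for time, censor, group, enter in records:
            # At risk at d: entered before d and still under observation at d.
            if enter < d <= time:
                # Only a death occurring exactly at d counts as an event in
                # this stratum; every other record is censored (0).
                event = 1 if (censor == 1 and time == d) else 0
                out.append((time, event, group, d))
    return out


# Original data from Section 2.3: (time, censor, group, enter).
original = [(10, 1, 1, 8), (12, 1, 0, 6), (15, 1, 1, 11), (20, 1, 0, 14)]

augmented = augment(original)
for row in augmented:
    print(*row)
```

The seven rows printed match the augmented data listed in Section 2.3, so the output could be passed to a Cox model stratified on the last column.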

