Interest in signal processing with wavelets has prompted statisticians to advance the ideas of C. Stein, C. Mallows, and M. Pinsker on estimation when the parameter space is of high or infinite dimension. This seminar treats consequences of this recent work for fitting linear models with many regressors.

It has been known since the 1950s that maximum likelihood or related estimators may "overfit" a high-dimensional parametric model, keeping bias small at the cost of excessive variance. Stein (1956) proved that, under quadratic loss and in dimension three or more, the best unbiased estimator of the mean is inadmissible when the data consist of a discrete-time signal plus Gaussian white noise. In his rarely read 1966 paper, Stein demonstrated the merits of the following idea: express the data in terms of a suitable orthonormal basis; shrink the coefficients of the data towards zero so as to reduce estimated risk within a certain class of procedures; then use the shrunken coefficients to construct an estimator of the mean. This approach might justly be called a signal processing method.
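The three steps just described can be sketched in a few lines. The example below is only an illustration under stated assumptions, not Stein's own procedure: the class of procedures is the simplest possible one (a single common shrinkage factor f in [0, 1]), the noise variance sigma2 is taken as known, and the cosine basis and the function names dct_basis and uniform_shrinkage are hypothetical choices made for this sketch.

```python
import numpy as np

def dct_basis(n):
    """Rows form an orthonormal cosine (DCT-II) basis of R^n."""
    k = np.arange(n)
    B = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(k, k + 0.5) / n)
    B[0] /= np.sqrt(2.0)          # rescale the constant row to unit length
    return B

def uniform_shrinkage(y, sigma2, basis):
    """Shrink all basis coefficients by one factor chosen from estimated risk.

    Assumes y = mean + white noise with known variance sigma2.
    """
    z = basis @ y                  # step 1: coefficients of the data
    n = z.size
    energy = float(z @ z)
    # Estimated risk of f*z is n*f^2*sigma2 + (1 - f)^2 * (energy - n*sigma2).
    # Its minimiser over f in [0, 1] is the positive part of 1 - n*sigma2/energy.
    f = max(0.0, 1.0 - n * sigma2 / energy)   # step 2: shrink
    return f, basis.T @ (f * z)    # step 3: rebuild the estimate of the mean

y = np.array([3.0, 4.0, 0.0, 0.0])
f, mean_hat = uniform_shrinkage(y, sigma2=1.0, basis=dct_basis(4))
```

With a single common factor the choice of orthonormal basis does not affect the result; the basis starts to matter once different coefficients receive different shrinkage factors.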

For fitting a linear model, Mallows (1973) discussed using the submodel fit that minimizes estimated risk (the CL criterion). Estimated risks may similarly be used to select the order of nested principal component regression or the tuning parameter in ridge regression. Each of these fitting techniques is equivalent to shrinking the least squares regression coefficients in a suitable canonical form of the linear model.
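To make the equivalence concrete, the sketch below takes the singular value decomposition of the design matrix as one possible canonical form, so that ridge regression and nested principal component regression both act by multiplying the canonical least squares coefficients by factors in [0, 1]. It is a minimal illustration under assumptions: the noise variance sigma2 is treated as known, the estimated risk is a generic unbiased risk estimate rather than necessarily the exact form of Mallows's criterion, and all function names are hypothetical.

```python
import numpy as np

def canonical_form(X, y):
    """Return U, singular values d, and canonical coefficients z = U'y."""
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    return U, d, U.T @ y

def ridge_factors(d, lam):
    """Ridge regression with penalty lam shrinks coefficient i by d_i^2 / (d_i^2 + lam)."""
    return d**2 / (d**2 + lam)

def pcr_factors(p, k):
    """Principal component regression of order k keeps the first k coefficients."""
    return (np.arange(p) < k).astype(float)

def estimated_risk(f, z, sigma2):
    """Unbiased estimate of the quadratic risk of the fit U @ (f * z)."""
    return float(np.sum(f**2 * sigma2 + (1.0 - f)**2 * (z**2 - sigma2)))

def best_by_estimated_risk(candidates, z, sigma2):
    """Index of the candidate factor vector with smallest estimated risk."""
    risks = [estimated_risk(f, z, sigma2) for f in candidates]
    return int(np.argmin(risks)), risks

# Artificial data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([2.0, 1.0, 0.0, 0.0, 0.0]) + rng.normal(size=30)
U, d, z = canonical_form(X, y)

# Choose the order of nested principal component regression.
k_best, _ = best_by_estimated_risk([pcr_factors(5, k) for k in range(6)], z, 1.0)
fit = U @ (pcr_factors(5, k_best) * z)
```

The same best_by_estimated_risk call could be given a grid of ridge_factors(d, lam) vectors to pick the ridge tuning parameter.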

Pinsker (1980) showed that shrinkage estimators for the mean of a Gaussian process can be asymptotically minimax, among all estimators, over ellipsoids in the parameter space. His result distinguishes usefully among competing model-selection or shrinkage procedures.

Beran and Dümbgen (1997) studied shrinkage estimators that monotonically taper the coefficients of the data toward zero. Subject to the monotonicity constraint, the shrinkage factors are chosen to minimize estimated risk. When applied to a suitable canonical form of the linear model with many regressors, the monotone shrinkage procedure yields a better fit (smaller quadratic risk) than either principal component regression or ridge regression. Numerical implementation of monotone shrinkage relies on the pool-adjacent-violators (PAV) algorithm for isotonic regression. Fitted linear models obtained by monotone principal component shrinkage have an attractive asymptotic minimax property of Pinsker type and yield asymptotic confidence sets for the true mean vector.
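A minimal sketch of how PAV enters, under stated assumptions: the coefficients z are already ordered so that later ones should be shrunk at least as much as earlier ones, the noise variance sigma2 is known, and every z_i is nonzero. Minimizing the estimated risk sum of f_i^2 sigma2 + (1 - f_i)^2 (z_i^2 - sigma2) over nonincreasing f is then the same as a weighted least squares fit of a nonincreasing sequence to g_i = (z_i^2 - sigma2) / z_i^2 with weights z_i^2, followed by clipping to [0, 1]. The function names are hypothetical and the code is not taken from the paper.

```python
import numpy as np

def pav_nonincreasing(y, w):
    """Weighted least squares fit of a nonincreasing sequence (pool adjacent violators)."""
    blocks = []                               # each block: [mean, weight, length]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool while the newest block mean exceeds the previous one.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

def monotone_shrinkage(z, sigma2):
    """Nonincreasing shrinkage factors in [0, 1] minimising estimated risk."""
    z = np.asarray(z, dtype=float)
    w = z**2                                  # weights; assumes no z_i is exactly zero
    g = (w - sigma2) / w                      # unconstrained risk-minimising factors
    f = np.clip(pav_nonincreasing(g, w), 0.0, 1.0)
    return f, f * z

f, z_shrunk = monotone_shrinkage([4.0, 3.0, 1.0, 2.0, 0.5], sigma2=1.0)
```

In this small example the third and fourth coefficients violate the ordering and are pooled into a common factor, and the last factor is clipped to zero.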

The new methods for fitting linear models will be illustrated on real and artificial data.

