Statistical Training and Research / Formation et Recherche Statistiques Author(s): J. Nelder Source: International Statistical Review / Revue Internationale de Statistique, Vol. 40, No. 3 (Dec., 1972), pp. 384-388 Published by: International Statistical Institute (ISI) Stable URL: http://www.jstor.org/stable/1402476 . Accessed: 10/06/2014 20:15 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . International Statistical Institute (ISI) is collaborating with JSTOR to digitize, preserve and extend access to International Statistical Review / Revue Internationale de Statistique. http://www.jstor.org This content downloaded from 185.44.79.127 on Tue, 10 Jun 2014 20:15:24 PM All use subject to JSTOR Terms and Conditions

Int. Stat. Rev., Vol. 40, No. 3, 1972, pp. 384-388/Longman Group Ltd/Printed in Great Britain

Statistical Training and Research / Formation et Recherche Statistiques

International

International conference on characterizations of probability distributions and their applications to theoretical statistics and applied fields

An international conference on characterizations of probability distributions and their applications to theoretical statistics and applied fields is tentatively planned for the summer of 1973, to be held either in Europe or in the United States at the Pennsylvania State University. The exact time and place will be fixed later.

The purpose of the conference is to bring together research workers investigating problems that are motivated by scientific concepts and formulations, or that have actual or potential application to statistical theory and applied fields. It is tentatively planned that the conference will consist of two parts.

Part I will consist of ten consolidated review sessions to be held in the mornings. Each topic selected for a consolidated review session will have two lectures followed by discussion. Colleagues who have had considerable experience and expertise in subject areas of characterizations will be invited to present unified, self-contained and critical reviews with suggestions for future directions.

Part II of the conference will be devoted to new contributions to the field of characterizations and their applications. Afternoon sessions are planned for this purpose. All new contributions will be welcome.

Plans are being made for publication of the proceedings in a suitable form. For improved interaction and quality, every potential presentation will have the benefit of advice and review from a couple of referees and/or pre-planned discussants.

An organizing committee has been formed and is likely to be expanded. The following colleagues have expressed interest and willingness to serve on the organizing committee: W. L. Harkness (U.S.A.), I. Kotlarski (U.S.A.), Yu. V. Linnik (U.S.S.R.), E. Lukacs (U.S.A.), P. A. P. Moran (Australia), J. E. Mosimann (U.S.A.), G. P. Patil (U.S.A., Chairman), C. R. Rao (India), and Henry Teicher (U.S.A.).

Comments and suggestions are invited on any matter pertaining to the proposed conference. One or two copies of relevant literature in the form of reprints, reports, lecture notes, dissertations, books, etc., would also be welcome for display during the conference. Correspondence may be addressed to: Professor G. P. Patil, Characterizations Project, 330 McAllister Building, The Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A.

International

Liaison Commission on Statistical Ecology

A Liaison Commission on Statistical Ecology has been established among the International Statistical Institute, the Biometric Society and the International Association for Ecology. The purpose of the Commission is to communicate and coordinate activities relating to statistical ecology, primarily among these organizations. Communication and coordination with all organizations interested in statistical ecology and/or ecological statistics would also be explored and welcomed. While the Commission is expected to be enlarged, its present members are: D. G. Chapman, D. R. Cox, G. P. Patil (Chairman), C. R. Rao and J. G. Skellam. One of its first functions is to develop programme suggestions for the World Congress of Ecology, which is being organized by the International Association for Ecology and is scheduled for September 1974 in the Netherlands. The Commission has begun its activities by co-sponsoring the important ongoing research training programme of the Advanced Institutes on Statistical Ecology around the World, initiated by the Statistical Ecology Section of the International Association for Ecology.

Suggestions on future activities of the Commission would be welcome. Please write to: Professor G. P. Patil, Chairman, Liaison Commission on Statistical Ecology, The Pennsylvania State University, University Park, Pennsylvania 16802, U.S.A.

Brazil - Brésil

Course on agricultural statistics

The Brazilian Center of Agricultural Statistics (CBEA) of the Brazilian Institute of Statistics sponsored a course on Agricultural Statistics to improve knowledge of the statistical techniques needed for the development of the Brazilian Agricultural Statistical Program. The purpose of the course was to learn about and discuss the agricultural statistics systems and methods used in several countries, and to prepare a team of CBEA technical personnel who will be in charge of implementing the future Basic National Plan for Improvement of Agricultural Statistics.

The Course, which ran from 5 November to 8 December 1969, was held in the IBGE auditorium. It was divided into two stages: first, a general examination, in the form of a seminar, of all items in the session's technical documents; and second, sessions of working groups, each responsible for studying specific items of the same documents and then presenting its solutions to the full session.

In the closing session of the course, the Director of the CBEA spoke, recounting the history of the operations of the Center and stating what it planned to achieve in 1970.

Source: Revista Brasileira de Estatistica, year 30, No. 120, Rio de Janeiro, October-December 1969

Dominican Republic - République Dominicaine

ONE's training programmes

As a supplement to its work in the field of investigation, the National Statistical Office (ONE) has maintained a constant interest in training, not only of its own staff but also of personnel of other agencies that collaborate or participate in statistical work. During the last few years it has sponsored or conducted various training courses.

In May of 1968 training courses were conducted simultaneously in the ONE and in the Municipal Office of the District of Sabana Grande de Palenque, for personnel that would participate in the experimental census of that District.

Between 17 and 27 November 1969 in the City of Santo Domingo a Training Course on the 1970 Population and Housing Census was carried out for officers of the ONE, teachers, employees of the administration and representatives from the governors' offices of the principal provinces, all of whom later taught similar courses for enumerators and team leaders. Through this programme around 35,000 persons were trained over the entire country.

The ONE conducted an intensive programming course from October to December 1969 for personnel entrusted with the direct responsibility for the organization, direction and management of the Computer Center.

As a step preceding the planning and programming of the Agricultural Census, the personnel of the Division responsible for that census attended a course on Programming of Activities by the Method of the Critical Approach. At the same time, and also in connection with this census, a course on Enumeration Techniques was conducted and attended by professionals of the Secretariat of State of Agriculture, the Agricultural Bank, the University Geographic Institute and the ONE.

The training course for UNIVAC key-punch operators was conducted from May to June 1970.

In 1968 and 1969 the ONE conducted training courses for field personnel and for persons in charge of data processing for the Survey on Family Income and Expenditures in the City of Santo Domingo. These were attended by officers of the Institute of Social Development and of the ONE, selected for their experience in this type of work. Personnel for the demographic survey carried out by the ONE since 1969, in collaboration with the National Service for Eradication of Malaria, also received special training.

Source: Acontecimientos Más Sobresalientes de la ONE, Octubre 1967 - Enero 1971, ONE, January 1971

Venezuela

Seminar on statistics

The First Venezuelan Seminar on Statistics was held in Caracas, 6-11 October 1969, at the headquarters of the College of Engineers. Its aims were to increase interchange and technical cooperation among the professional and technical organizations that play a part in the national statistical process, to evaluate experience in solving different statistical problems, and to develop policies that could serve as a basis for reorienting the national statistical service. Six observers and 114 delegates attended the Seminar, which was organized by the College of Statisticians and Actuaries of Venezuela, the Central University, the Venezuelan Society of Statistics and the Ministry of Development (through the General Bureau of National Statistics and Censuses), with the cooperation of other public and private organizations.

The work of the Seminar was distributed among three working committees: first, the Committee on National Statistical Policy, which had two sub-committees (the Venezuelan Program on Basic Statistics, and the Census List as a Source of Current Statistics); second, the Committee on Professional and Technical Training, which also worked through two sub-committees (Orientation and Field of Application, and Problems of the Statistical Profession); and third, the Committee on Studies of a Statistical Nature.

The proceedings, along with recommendations and resolutions, appear in the Acta Final, which summarizes all developments.

Among other things, the First Seminar recommended appointing a committee to prepare the Minimal Program of Basic National Statistics, using the Inter-American Program of Basic Statistics (PIEB) as a working document, and initiating efforts to update the National Statistical Inventory. It also recommended requesting the Executive Branch to consider a new draft reform of the present National Statistics and Census Law and the creation of a higher-level, autonomous body responsible for the National Statistical System; approaching the Ministry of Education and the Committee on Education of the National Congress to request their support for introducing statistics courses in secondary schools; and creating a group in each university to plan and direct statistical instruction.

Source: Acta Final Primer Seminario Venezolano de Estadistica

United Kingdom - Royaume-Uni

Rothamsted Experiment Station: Statistics Department in 1970

The department continued to build up a unified, flexible set of computer programs for all kinds of statistical analysis, for use both inside and outside the department. Work on statistical methodology goes on alongside much practical application of statistical techniques to a great range of problems originating both inside and outside the Station. Long-established collaboration with the agricultural advisory service of the Ministry of Agriculture continued, and the service for Commonwealth agricultural institutes was expanded.

Statistical programming

The Genstat system of statistical programs (described in the 1969 Report) was further developed and tested. The new 4-70 computer was not usable until November, so all testing had to be done on the IBM 360 machine at Edinburgh through our direct link. Assembling into one system many component parts that had been programmed independently exposed some anomalies requiring correction and modification, but in general the specifications laid down for programmers worked well and the system was assembled quite smoothly.

A user's manual was given a limited distribution, and useful comments have been received and acted on. Standardizing the conventions that govern how the user presents data and accompanying instructions greatly reduced the bulk of documentation for our programs. The manual will contain an appendix of worked examples with the associated output, to supplement the formal description. The facilities described in the manual cover:

The GENSTAT language. This is a flexible, free-format language oriented towards statistics and controlled by a set of directives, such as "PRINT" or "REGRESS", that define the different operations. All data are described by standard structures, which can be named by the user; instructions are translated by a compiler into a coded form that drives the complementary programs.
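The directive-driven design can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions, not GENSTAT: the line syntax, the hypothetical `MEAN` directive and the functions `compile_program` and `run` are invented for illustration. Only the general idea (named data structures, directives that act on them, and a compile step before execution) comes from the description above.

```python
from statistics import mean

def compile_program(text):
    """Translate free-format source text into a list of (directive, arguments) pairs."""
    code = []
    for line in text.splitlines():
        tokens = line.replace(",", " ").split()
        if tokens:  # blank lines are ignored: layout is free-format
            code.append((tokens[0].upper(), tokens[1:]))
    return code

def run(code, data):
    """Execute compiled directives against a dict of named data structures."""
    output = []
    for directive, args in code:
        if directive == "PRINT":
            output.extend(f"{name} = {data[name]}" for name in args)
        elif directive == "MEAN":  # hypothetical directive: MEAN source, target
            data[args[1]] = mean(data[args[0]])
        else:
            raise ValueError(f"unknown directive: {directive}")
    return output

data = {"YIELD": [4.0, 5.0, 6.0]}
print(run(compile_program("mean YIELD, M\nPRINT M"), data))  # ['M = 5.0']
```

Separating compilation from execution means a whole program can be checked for unknown directives before any data are touched, which is one plausible reason for the compile step the report describes.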

Input-output. The "READ" directive allows data to be accepted in many different forms, in fixed or free format, serially or in parallel, on one card or many, with or without missing values, etc.
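A free-format reader of the kind the "READ" directive describes can be sketched in a few lines. The function name `read_free_format`, the `*` missing-value marker and the `parallel` flag are assumptions made for illustration; they are not GENSTAT conventions.

```python
import math

def read_free_format(text, n_variates, parallel=True, missing="*"):
    """Read numbers laid out in free format (any mix of spaces and newlines)
    into n_variates columns; the missing-value token becomes NaN."""
    tokens = text.split()
    if len(tokens) % n_variates:
        raise ValueError("number of values is not a multiple of n_variates")
    values = [math.nan if t == missing else float(t) for t in tokens]
    if parallel:
        # parallel layout: each unit supplies one value for every variate in turn
        return [values[j::n_variates] for j in range(n_variates)]
    # serial layout: all values of the first variate, then all of the second, ...
    n_units = len(values) // n_variates
    return [values[j * n_units:(j + 1) * n_units] for j in range(n_variates)]

x, y = read_free_format("1 2\n3 *\n5 6", 2)
print(x, y)  # [1.0, 3.0, 5.0] [2.0, nan, 6.0]
```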

The "PRINT" directive similarly provides for the printing out of results. Standard formats and layouts are defined, but the user may modify these, e.g. to permute the ordering of tables, restrict the page width, suppress labelling, etc.

Both directives have been extensively used, tested by many people, and continuously refined.

Derived variates. Standard mathematical operations are provided, with a useful extension of the customary notation to allow various kinds of patterned operation to be written down simply. Matrix operations and a general calculus for multiway tables are provided.

Regression. The user can specify linear models containing mixtures of qualitative and quantitative independent variables. Terms can be added, dropped or interchanged, and the best new term to add, or the worst old term to drop, can be determined. Regressions can either include an intercept or be constrained to pass through the origin.
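Choosing the best term to add, or the worst to drop, amounts to comparing residual sums of squares across candidate models. The sketch below assumes ordinary least squares with NumPy and a dictionary of named terms, where a qualitative term is a block of indicator columns; the helper names are hypothetical, and the report does not describe GENSTAT's own selection criterion.

```python
import numpy as np

def rss(terms, columns, y, intercept=True):
    """Residual sum of squares from least squares of y on the chosen terms.
    A quantitative term is one column; a qualitative term (factor) is an
    n x (levels - 1) block of indicator columns."""
    blocks = [np.ones(len(y))] if intercept else []
    blocks += [columns[t] for t in terms]
    if not blocks:
        return float(y @ y)
    X = np.column_stack(blocks)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def best_to_add(current, candidates, columns, y, intercept=True):
    """Candidate term whose addition gives the smallest residual sum of squares."""
    return min(candidates, key=lambda t: rss(current + [t], columns, y, intercept))

def worst_to_drop(current, columns, y, intercept=True):
    """Current term whose removal increases the residual sum of squares least."""
    return min(current, key=lambda t: rss([c for c in current if c != t], columns, y, intercept))

x = np.arange(10.0)
columns = {
    "dose": x,                                     # quantitative term
    "block": np.eye(3)[np.arange(10) % 3][:, 1:],  # 3-level factor as 2 indicators
}
y = 3.0 * x + 1.0
print(best_to_add([], ["dose", "block"], columns, y))    # dose
print(worst_to_drop(["dose", "block"], columns, y))      # block
```

Setting `intercept=False` gives the regression through the origin mentioned above.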

Analysis of variance. The Wilkinson algorithm required modification to cope with the accumulation of rounding errors that occurs on machines of the IBM 360 type. Some parts of the calculation must be done in double precision, and the necessary changes were made. The algorithm can deal with complex designs and produces full information on standard errors in tables having many different kinds of comparison.
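The kind of rounding-error accumulation mentioned here can be demonstrated without the Wilkinson algorithm itself, which this sketch does not reproduce. It computes a corrected sum of squares by the one-pass formula Σx² - (Σx)²/n, once with single-precision and once with double-precision arithmetic (NumPy `float32` and `float64`, only roughly analogous to the 360's short and long words). The data values are invented.

```python
import numpy as np

def corrected_ss(x, dtype):
    """Corrected sum of squares by the one-pass formula sum(x^2) - (sum x)^2 / n,
    with every operation carried out in the given floating-point type."""
    x = np.asarray(x, dtype=dtype)
    n = dtype(len(x))
    s = x.sum(dtype=dtype)
    s2 = (x * x).sum(dtype=dtype)
    return float(s2 - s * s / n)

# Hypothetical plot yields with a large mean and a small spread; the exact answer is 0.05.
yields = [10000.1, 10000.2, 10000.3, 10000.4]
print(corrected_ss(yields, np.float64))  # very close to 0.05
print(corrected_ss(yields, np.float32))  # far from 0.05: cancellation destroys the digits
```

In single precision both Σx² and (Σx)²/n are near 4 × 10⁸, where adjacent representable numbers are 32 apart, so their difference cannot resolve a true value of 0.05.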

Classification. The user can define various coefficients of similarity between individuals with various measured characteristics, and cluster them using single-linkage cluster analysis. Operations on the similarity matrix are provided.
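A minimal sketch of this workflow, assuming the simple matching coefficient for binary characters (one of many possible similarity coefficients; the report does not say which ones GENSTAT offers) and SciPy's hierarchical clustering routines for the single-linkage step. Each similarity s is converted to a dissimilarity 1 - s before clustering; the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def simple_matching(X):
    """Proportion of binary characters on which each pair of individuals agrees."""
    X = np.asarray(X, dtype=bool)
    return (X[:, None, :] == X[None, :, :]).mean(axis=2)

def single_linkage_groups(S, threshold):
    """Single-linkage clustering on 1 - S, cut at the given dissimilarity."""
    D = 1.0 - np.asarray(S, dtype=float)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="single")
    return fcluster(Z, t=threshold, criterion="distance")

# Four individuals scored on four presence/absence characters.
X = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
print(single_linkage_groups(simple_matching(X), threshold=0.5))
```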

Multivariate analysis. Principal coordinate and principal component analysis are provided, including extensive facilities for keeping parts of the output as new data structures for future use.
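Principal coordinate analysis in its classical form double-centres the matrix of -½d² and takes its leading eigenvectors, scaled by the square roots of the eigenvalues. A minimal NumPy sketch, assuming distances that embed in Euclidean space (any negative eigenvalues are simply clipped to zero); it is not GENSTAT's implementation, and the function name is hypothetical.

```python
import numpy as np

def principal_coordinates(D, k=2):
    """Coordinates (n x k) whose Euclidean distances approximate the distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = J @ (-0.5 * D ** 2) @ J              # doubly centred matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]       # keep the k largest
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
coords = principal_coordinates(D)            # recovers the rectangle up to rotation
```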

Storage on disc and magnetic tape. Data, with all their associated labels and plain-language descriptions, can easily be stored as a computer file for future use; they can be recovered and merged with new data in several ways. In addition, the user can inspect the list of contents of GENSTAT files. These facilities give powerful general techniques for handling data and, as a bonus, provide a way of "dumping" a GENSTAT program at any stage during its execution so that it can be used later, e.g. to recover from a machine failure.

Extensions to GENSTAT. Various extensions to the system are being developed. The language is to contain a "skip" directive for conditional and unconditional jumps, logical operations in derived variates, facilities for plotting graphs, ways of building up the values of one structure from parts of others, and a special directive "OWN" to allow a programmer to add his own private program to the system. The regression section is being expanded to allow for hierarchical structure in data and to incorporate iterative weighted regression. Covariance analysis has been specified to augment the analysis of variance. Canonical variate analysis will be provided more simply once the regression extensions are complete. A program is being developed to act as an interface between GENSTAT and the General Survey Program. It will allow tables to be moved between the two programs, so that the special facilities of each system can augment those of the other.
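Iterative weighted regression fits models such as log-linear models for counts by repeatedly solving a weighted least-squares problem on a working response. The sketch below assumes a Poisson error with log link as the example model; the report does not say which models the extended regression section was to cover, and the function name `irls_poisson` is hypothetical.

```python
import numpy as np

def irls_poisson(X, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta for Poisson counts by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                     # starting fitted values (avoids log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu      # working response for the log link
        WX = X * mu[:, None]         # working weights equal mu for Poisson with log link
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

x = np.arange(5.0)
X = np.column_stack([np.ones_like(x), x])
print(irls_poisson(X, [2, 3, 6, 7, 12]))     # intercept and slope on the log scale
```

Each pass is an ordinary weighted regression, which is why such a facility fits naturally into an existing regression section.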

Maximum likelihood program. This program does the same job as the Orion MLP but offers a syntax compatible with GENSTAT. It contains a suite of minimization algorithms, with standard routines for curve and distribution fitting, probit analysis and genetic problems. Users can insert their own Fortran subroutines. Some of the techniques used in this program have been published.
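Probit analysis by maximum likelihood, one of the standard routines listed, reduces to minimizing a binomial negative log-likelihood. A minimal sketch with SciPy, assuming the model P(response) = Φ(a + b·dose) and using SciPy's Nelder-Mead simplex as the minimizer; the report does not name the algorithms in the program's suite, `probit_fit` is a hypothetical name, and the dose-response counts are constructed for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_fit(dose, n, r, start=(0.0, 1.0)):
    """Maximum-likelihood estimates of (a, b) in P(response) = Phi(a + b * dose),
    given n subjects and r responders at each dose."""
    dose, n, r = (np.asarray(v, dtype=float) for v in (dose, n, r))

    def negloglik(theta):
        eta = theta[0] + theta[1] * dose
        # log Phi(eta) for responders, log(1 - Phi(eta)) = log Phi(-eta) otherwise
        return -np.sum(r * norm.logcdf(eta) + (n - r) * norm.logcdf(-eta))

    result = minimize(negloglik, x0=np.asarray(start, dtype=float), method="Nelder-Mead",
                      options={"xatol": 1e-7, "fatol": 1e-9, "maxiter": 5000})
    return result.x, result

dose = np.arange(5.0)
n = np.full(5, 1000)
r = np.round(1000 * norm.cdf(-2.0 + dose))   # counts close to a = -2, b = 1
estimates, result = probit_fit(dose, n, r)
print(estimates)
```

Using `norm.logcdf` on both tails keeps the log-likelihood finite even at doses where the fitted response probability is extremely close to 0 or 1.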

A program for producing contour maps on a line printer using interpolated values from equally spaced data values was written primarily to study multi-dimensional likelihood functions, but was also used to represent pictorially the concentration patterns of chemicals injected into the soil.
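A line-printer contour map can be produced by interpolating the gridded values at each character position and printing a symbol that identifies the band between successive contour levels. A minimal sketch, assuming bilinear interpolation and a cycling set of band symbols; the report does not say which interpolation the Rothamsted program used, and the function names are hypothetical. Both `width` and `height` must be at least 2.

```python
import numpy as np

def bilinear(grid, x, y):
    """Interpolate equally spaced values grid[row, col] at fractional column x, row y."""
    i0 = min(int(y), grid.shape[0] - 2)
    j0 = min(int(x), grid.shape[1] - 2)
    fy, fx = y - i0, x - j0
    top = (1 - fx) * grid[i0, j0] + fx * grid[i0, j0 + 1]
    bottom = (1 - fx) * grid[i0 + 1, j0] + fx * grid[i0 + 1, j0 + 1]
    return (1 - fy) * top + fy * bottom

def printer_contour(grid, levels, width=40, height=20, symbols=".:-=+*#%@"):
    """Render a contour map as text: each character shows which band between
    successive contour levels contains the interpolated value."""
    rows, cols = grid.shape
    lines = []
    for r in range(height):
        y = r * (rows - 1) / (height - 1)
        line = ""
        for c in range(width):
            x = c * (cols - 1) / (width - 1)
            band = int(np.searchsorted(levels, bilinear(grid, x, y)))
            line += symbols[band % len(symbols)]
        lines.append(line)
    return "\n".join(lines)

u = np.linspace(-2.0, 2.0, 9)
grid = -(u[:, None] ** 2 + u[None, :] ** 2)   # illustrative dome-shaped surface
print(printer_contour(grid, levels=[-6.0, -4.0, -2.0, -1.0, -0.25]))
```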

Theory. The extensions to general-purpose computing languages required to meet the needs of statistical computing were considered, both for internal data structures and for the input/output of data. Structure-defining (extensible) languages such as Algol 68, though they provide powerful new programming tools, may not reduce duplicated programming effort unless statisticians (or any other class of users) carefully define their own standards for data structures.

An algorithm was developed for defining groups by removing links from a minimum spanning tree. This has useful practical applications in the analysis of large sets of data, and helps to make feasible the classification method proposed by A. W. F. Edwards, by restricting the number of splits of the data set to be examined to splits of the minimum spanning tree.
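A minimal sketch of grouping by link removal, assuming Prim's algorithm for the minimum spanning tree and the simplest removal rule: delete the k - 1 longest links to leave k groups (which reproduces single-linkage groups). The report does not state the rule actually used, and the function names are hypothetical.

```python
import math

def minimum_spanning_tree(D):
    """Prim's algorithm on a full distance matrix; returns the n - 1 links as (d, i, j)."""
    n = len(D)
    in_tree = [False] * n
    best = [math.inf] * n    # shortest link from each vertex to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    links = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            links.append((D[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v], parent[v] = D[u][v], u
    return links

def groups_by_removing_links(D, k):
    """Label the units after deleting the k - 1 longest links of the minimum spanning tree."""
    n = len(D)
    kept = sorted(minimum_spanning_tree(D))[: n - k]
    root = list(range(n))    # union-find forest over the kept links

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    for _, i, j in kept:
        root[find(i)] = find(j)
    return [find(i) for i in range(n)]

positions = [0.0, 1.0, 2.0, 10.0, 11.0]
D = [[abs(a - b) for b in positions] for a in positions]
print(groups_by_removing_links(D, 2))   # two groups: units {0, 1, 2} and {3, 4}
```

Each single link removed from the tree defines one split of the data, so a tree on n units offers only n - 1 candidate splits, compared with 2ⁿ⁻¹ - 1 splits of the full set. This is the restriction the report credits with making Edwards's method feasible.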

Statistical theory

When measurements exist of a set of characters for a set of individuals (or species, etc.), each individual can be represented by a point in a many-dimensional space, and the similarity between any two individuals can be described by the distance between their corresponding points. This distance depends not only on the set of characters chosen but also on their relative scales and on the particular definition of distance used. Two analyses using different constellations of characters give two sets of distances, and one may ask how similar these descriptions are. An analytical technique was devised whereby one configuration of points is rotated and scaled to minimize the sum of squared distances between corresponding points (R²), the value of R² being a measure of the discrepancy. The technique is relevant to a wide range of problems in multivariate analysis.
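The rotational fit can be written as an orthogonal Procrustes problem solved with a singular value decomposition. This sketch assumes both configurations are first centred (removing translation) and allows reflections as well as rotations; both are common choices, but the report does not specify them. The function name is hypothetical, and R² is returned as the residual sum of squared distances after fitting.

```python
import numpy as np

def procrustes_r2(X, Y):
    """Translate, rotate and uniformly scale configuration Y to best match X.
    Returns R^2 (residual sum of squared distances) and the fitted Y."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Yc = np.asarray(Y, dtype=float) - np.mean(Y, axis=0)
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    Q = U @ Vt                          # orthogonal matrix maximizing trace(Q' Yc' Xc)
    scale = s.sum() / np.sum(Yc ** 2)   # least-squares scale factor
    Y_fit = scale * Yc @ Q
    return float(np.sum((Xc - Y_fit) ** 2)), Y_fit

theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
Y = 2.0 * X @ R + np.array([5.0, -1.0])   # rotated, scaled and shifted copy of X
r2, _ = procrustes_r2(X, Y)
print(r2)                                  # essentially zero: the shapes agree exactly
```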

It adds greatly to the power of an analytical method when an initial analysis (say, of principal components) using a given set of measurements on given sampling units can be modified stepwise to take account of adding or dropping a sampling unit or measurement, thus avoiding complete recalculation from the beginning. Stepwise procedures for principal components, principal coordinates, canonical variate and canonical correlation analysis were worked out and found to depend on a single basic algorithm.
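The report does not describe the single basic algorithm, so the sketch below illustrates only one ingredient of a stepwise scheme: updating the mean vector and the matrix of sums of squares and products (SSP) when one sampling unit is added or dropped, so that the SSP matrix never has to be rebuilt from the raw data. The eigen-analysis is still recomputed from the updated matrix. The class name `RunningSSP` is hypothetical.

```python
import numpy as np

class RunningSSP:
    """Mean vector and matrix of sums of squares and products (SSP) about the mean,
    updated as sampling units are added or dropped."""

    def __init__(self, p):
        self.n = 0
        self.mean = np.zeros(p)
        self.ssp = np.zeros((p, p))

    def add(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        d = x - self.mean
        self.mean = self.mean + d / self.n
        self.ssp = self.ssp + np.outer(d, x - self.mean)

    def drop(self, x):
        x = np.asarray(x, dtype=float)
        if self.n <= 1:              # dropping the last unit empties the accumulator
            self.__init__(len(self.mean))
            return
        d = x - self.mean
        self.n -= 1
        self.mean = self.mean - d / self.n
        self.ssp = self.ssp - np.outer(d, x - self.mean)

    def principal_components(self):
        """Eigenvalues (largest first) and eigenvectors of the covariance matrix."""
        vals, vecs = np.linalg.eigh(self.ssp / (self.n - 1))
        return vals[::-1], vecs[:, ::-1]

data = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [3.0, 7.0], [0.0, 1.0]])
acc = RunningSSP(2)
for row in data:
    acc.add(row)
acc.drop(data[2])                    # remove the third sampling unit
variances, loadings = acc.principal_components()
```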

Similarities are often calculated from multivariate ecological data recording the presence or absence of species in a set of quadrats. Such similarities do not correctly allow for the rarity or commonness of a species, and it is better to exclude very rare or very common species first, as these species can provide no information on possible associations.
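A minimal sketch of the recommended pre-screening, assuming a quadrats-by-species 0/1 matrix, frequency cut-offs of 5% and 95% (chosen only for illustration) and the Jaccard coefficient between species. The report names neither the coefficient nor the cut-offs, and the function name is hypothetical.

```python
import numpy as np

def species_similarities(P, min_frac=0.05, max_frac=0.95):
    """Jaccard similarities between species after dropping very rare and very common ones.
    P is a quadrats x species matrix of presence (1) / absence (0).
    Returns the indices of the retained species and their similarity matrix."""
    P = np.asarray(P, dtype=bool)
    freq = P.mean(axis=0)                 # fraction of quadrats containing each species
    keep = (freq >= min_frac) & (freq <= max_frac)
    Q = P[:, keep].astype(int)
    both = Q.T @ Q                        # number of quadrats containing both species
    present = np.diag(both)
    either = present[:, None] + present[None, :] - both
    return np.flatnonzero(keep), both / either

# Species 0 occurs in every quadrat, so it cannot reveal any association.
P = [[1, 1, 1, 0],
     [1, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1]]
kept, S = species_similarities(P)
print(kept)    # [1 2 3]
```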

Gower considered further problems of weighting characters in similarity coefficients, and of treating correlations between characters in classification techniques. Krzanowski investigated measures of distance when the material concerns relative proportions of different attributes, such as blood groups in a set of populations.
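One standard way of weighting characters is Gower's general similarity coefficient: each character contributes a score between 0 and 1, the scores are averaged with character weights, and comparisons involving a missing value are skipped. The sketch below assumes that form, with range-scaled scores for quantitative characters and exact matching for categorical ones. It does not reproduce whatever conclusions on weighting the report alludes to, and `gower_similarity` is a hypothetical name.

```python
import numpy as np

def gower_similarity(X, kinds, weights=None):
    """Weighted general similarity coefficient between the rows of X.
    kinds[k] is 'q' (quantitative) or 'c' (categorical); NaN marks a missing value."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    num = np.zeros((n, n))
    den = np.zeros((n, n))
    for k in range(p):
        col = X[:, k]
        valid = ~np.isnan(col)
        both = valid[:, None] & valid[None, :]   # comparison possible only if both present
        if kinds[k] == "q":
            rng = np.nanmax(col) - np.nanmin(col)
            s = 1 - np.abs(col[:, None] - col[None, :]) / rng if rng > 0 else np.ones((n, n))
        else:
            s = (col[:, None] == col[None, :]).astype(float)
        num += w[k] * np.where(both, s, 0.0)
        den += w[k] * both
    return num / den

X = [[1.0, 0], [3.0, 0], [5.0, 1]]    # one quantitative and one categorical character
print(gower_similarity(X, kinds="qc"))
print(gower_similarity(X, kinds="qc", weights=[1, 3]))
```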

In regression models of the second kind, the parameters in a model are assumed to have a probability distribution, rather than the observations. Thus, in a fertilizer experiment, variation in the yields obtained might be treated as arising from variation in the parameters connecting yield to amount of fertilizer applied. The estimation of parameters from models of this kind is complicated by singularities in the likelihood function, and the nature of these is being investigated. Iterative weighted linear regression techniques show promise.

Practical applications

Members of the department applied a range of statistical techniques to diverse data; the following examples indicate the scope of their work. Multivariate techniques were again in demand, and canonical analysis was applied to aphid populations (Entomology Department), nematode populations (Nematology Department) and soil profiles (Weed Research Institute). Classification methods and principal coordinate analysis were helpful in analyses of amino acids in proteins (Biochemistry Department) and of botanical data collected as part of the International Biological Programme (Aberdeen University). The rotational fit technique described above was used to compare different multivariate analyses of measurements on skulls from six hominid populations with eight recognizable constellations of characters. Cluster analysis was used in an ecological study of organisms in polluted waters, in the search for associated groups of organisms (Water Pollution Laboratory).

An unusual problem in simulation concerned the ordering of octahedral cations in micas, when random linking is constrained by Pauling's rules for the structures of ionic compounds (Pedology Department). The simulated structures are being investigated to try to explain some of the properties of micas.

The patterns of diseased plants in a crop may be complex and methods are required to summarize and describe them (Plant Pathology Department). Methods for studying non-randomness in spatial patterns are being surveyed, with special attention to the derivation of the various contagious distributions.
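A common first check for non-randomness in counts of diseased plants per quadrat is the variance-to-mean ratio. Under a random (Poisson) pattern, Σ(x - x̄)²/x̄ is approximately χ² on n - 1 degrees of freedom. The sketch assumes SciPy for the χ² tail probability and also returns the moment estimate of the negative binomial exponent k, k = x̄²/(s² - x̄), for one of the classical contagious distributions. The report does not say which methods the survey covers, and the function name and counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def dispersion_summary(counts):
    """Variance/mean ratio, dispersion statistic with its upper-tail p-value under
    a random (Poisson) pattern, and the moment estimate of negative binomial k."""
    x = np.asarray(counts, dtype=float)
    n, m = len(x), x.mean()
    var = x.var(ddof=1)
    stat = np.sum((x - m) ** 2) / m          # approx. chi-squared on n - 1 d.f. if random
    p_upper = chi2.sf(stat, df=n - 1)        # small values suggest aggregation
    # the moment estimate of k is meaningful only when var > mean; inf signals no clumping
    k = m ** 2 / (var - m) if var > m else np.inf
    return var / m, stat, p_upper, k

ratio, stat, p_upper, k = dispersion_summary([0, 0, 0, 10, 0, 0, 0, 10])
print(ratio, stat, p_upper, k)              # strongly clumped: ratio well above 1
```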

Various curves have been proposed to describe crop response to fertilizers, but little has been done to compare the different curves on the same data. Curves based on the exponential (Mitscherlich) relation are being compared with Nelder's inverse polynomials and with the conjunction of two straight lines proposed by Boyd, which are representable as a limiting case of a simple extension of the inverse polynomial curves. Results from experiments by the Ministry of Agriculture's advisory service on the nitrogen manuring of spring barley are being used initially for this work.
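As an illustration of comparing two response curves on the same data, the sketch below fits one member of the inverse polynomial family, assumed here to be the inverse quadratic x/y = a0 + a1·x + a2·x². It also fits a Mitscherlich curve, assumed here as y = A - B·exp(-c·x), by nonlinear least squares with SciPy's `curve_fit`, and compares residual sums of squares on the yield scale. The inverse quadratic is fitted by ordinary least squares on the linearized scale, a shortcut rather than the weighted fitting one would use in practice. The nitrogen rates and yields are generated from an inverse quadratic purely for illustration, not taken from the barley experiments, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_inverse_quadratic(x, y):
    """Fit x / y = a0 + a1 x + a2 x^2 by least squares on the linearized scale."""
    A = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(A, x / y, rcond=None)
    return coef, x / (A @ coef)                # coefficients and fitted yields

def mitscherlich(x, A, B, c):
    return A - B * np.exp(-c * x)

def fit_mitscherlich(x, y):
    p0 = [y.max(), y.max() - y.min(), 0.1]     # rough starting values
    params, _ = curve_fit(mitscherlich, x, y, p0=p0, maxfev=10000)
    return params, mitscherlich(x, *params)

# Hypothetical nitrogen rates and yields generated from an inverse quadratic.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y = x / (1.0 + 0.1 * x + 0.002 * x ** 2)
coef, fit_inv = fit_inverse_quadratic(x, y)
params, fit_mit = fit_mitscherlich(x, y)
rss_inv = np.sum((y - fit_inv) ** 2)
rss_mit = np.sum((y - fit_mit) ** 2)
print(rss_inv, rss_mit)
```

Any Mitscherlich curve is monotone, so it cannot follow the slight fall in yield at the highest rate that an inverse quadratic allows. This is one concrete way in which the two families can differ on the same data.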

Data from the National Survey of Health and Development were used in two papers on measures of educational attainment and on the effect of various factors on the attainment of children between the ages of 8 and 18. Discriminant analysis was the principal statistical technique used.

Rothamsted, J. Nelder
