In Chapter 6 we discussed the concept of sufficient statistics at some length. For any given problem the sufficient statistic is not unique, but we saw that it is possible to define a minimal sufficient statistic, which is unique up to one-to-one transformations. It turns out that ancillary statistics are not unique either.
It would be appealing to define an analogous concept of a maximal ancillary statistic. We would like sufficient statistics to be as small as possible so as to eliminate all the irrelevant information. In contrast, we would like ancillary statistics to be as large as possible so that we are taking into account all the relevant conditioning variables.
There are some examples where a maximal ancillary statistic exists. Suppose there is no further reduction of the problem by sufficiency, so that the minimal sufficient statistic is the whole sample X1, . . . , Xn (equivalently, its order statistics); in a location model, for example, the configuration of differences between the observations is then ancillary. Moreover, it can be shown that this ancillary statistic cannot be expanded in any way and that any other ancillary statistic can be mapped into it; that is, it is a maximal ancillary statistic. Note that the normal distribution is excluded from this discussion, or is a trivial special case of it, because in that case there is a two-dimensional minimal sufficient statistic and hence no possibility of simplifying the problem by conditioning.
However, the Cauchy distribution, for instance, is an example of a problem for which the minimal sufficient statistic is the set of order statistics. In other cases, however, maximal ancillary statistics do not exist. It is quite possible that two ancillary statistics C1 and C2 exist, but the combined statistic (C1, C2) is not ancillary. There is nothing the least bit pathological about this: it simply reflects the fact that the marginal distributions of two random variables do not determine their joint distribution. In this case, however, there is no uniquely specified course of action. Adoption of the conditionality principle implies that conditioning on either one of C1 or C2 would be superior to conditioning on neither, but it still leaves us with at least two plausible procedures and no indication of which is preferable.
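The underlying fact, that marginal distributions do not determine the joint distribution, is easy to check numerically. The following sketch (the numerical values are purely illustrative and not taken from the text) exhibits two joint distributions for a pair of binary statistics with identical margins but different joints:

```python
from itertools import product

# Two joint distributions for a pair (C1, C2) of binary statistics.
# Under P1 the components are independent; under P2 they are perfectly
# correlated. Both give the same uniform margins.
P1 = {(a, b): 0.25 for a, b in product([0, 1], repeat=2)}
P2 = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

def margin(P, index):
    """Marginal distribution of one component of the pair."""
    m = {}
    for outcome, prob in P.items():
        m[outcome[index]] = m.get(outcome[index], 0.0) + prob
    return m

# Identical margins for C1 and for C2 ...
assert margin(P1, 0) == margin(P2, 0)
assert margin(P1, 1) == margin(P2, 1)
# ... yet the joint distributions differ.
assert P1 != P2
```

Each of C1 and C2 could thus be ancillary marginally while the pair carries information jointly.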
Some further ideas on this point are presented in Problem 9. In general, identifying an appropriate ancillary statistic may be difficult in the first place. As we shall see in Chapter 9, much of modern statistical theory is based on notions of inference which respect the conditionality principle without requiring explicit specification of the conditioning ancillary. If one uses power as the sole criterion for deciding between two tests, then in our example concerning laboratory testing there are at least some circumstances where one would prefer to use Procedure 1, but this may not be sensible for other reasons.
Another point of view might be to adopt a Bayesian approach. As we saw in Chapter 3, Bayesian procedures always try to minimise the expected loss based on the observed data and do not take account of other experiments that might have been conducted but were not. Thus in the situation with two machines discussed above, a Bayesian procedure would always act conditionally on which machine was actually used, so the kind of conflict that we saw between the two statisticians would not arise. However, Fisher did not accept Bayesian methods, because of the seeming arbitrariness of choosing the prior distribution, and so this would not have resolved the difficulty for him.
However, in other cases the justification for fiducial inference is much less clear-cut, and as a result there are few modern scholars who adhere to this point of view. Although many modern statisticians support the principles of the Fisherian approach, including, in particular, the principle of conditionality, there is no universal agreement about how best to implement it. The joint density of X1, . . . , Xn is a function of (W1, W2), so (W1, W2) is a sufficient statistic.
It is in fact the minimal sufficient statistic. Intuitively, therefore, it seems appropriate to condition the inference on the observed value of Y2 , to ensure relevance of any probability calculation to the data at hand. But, in fact, from the perspective of hypothesis testing, failure to condition can lead to illogical conclusions, as we now demonstrate. Which one is better? In such cases, there may still be reason to construct a test based on the conditional distribution of S given C.
The reason is that such a test will then be similar. The concept of similar tests has something in common with that of unbiased tests. Moreover, in many cases where part (b) of the definition of an ancillary statistic holds, but not part (a), we can demonstrate that a test which is UMP among all tests based on the conditional distribution of S given C is UMP amongst all similar tests.
Thus we have seen two quite distinct arguments for conditioning. In the first, when the conditioning statistic is ancillary, we have seen that the failure to condition may lead to paradoxical situations in which two analysts may form completely different viewpoints of the same data, though we also saw that the application of this principle may run counter to the strict Neyman-Pearson viewpoint of maximising power.
The second point of view is based on power, and shows that under certain circumstances a conditional test may satisfy the conditions needed to be UMP similar or UMPU. We also write Ti in place of ti(X). In fact such tests do turn out to be UMPU, though we shall not attempt to fill in the details: the somewhat intricate argument is given by Ferguson. In cases where the distribution of T1 is continuous, the optimal one-sided test will then be of the following form.
Note that, in contrast to what has been assumed throughout the rest of the chapter, in this case the distribution is discrete and therefore it may be necessary to consider randomised tests, but this point has no effect on the general principles concerning the desirability of a conditional test. We will use this terminology extensively in a later chapter. The main difficulty with building an entire theory around pivotal quantities is that for many problems no pivotal quantity exists.
This prompts us to seek more general constructions. Thus there is a one-to-one correspondence between confidence sets of a given level and hypothesis tests of the corresponding size. In cases where there are nuisance parameters, there is a good argument for using similar tests, because in this case the coverage level of the confidence set for the parameter of interest will not depend on the true values of the nuisance parameters.
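The correspondence between tests and confidence sets can be made concrete by inverting an exact test. As an illustration (the Clopper-Pearson construction is my choice here, not an example taken from the text), one can invert the two one-sided exact binomial tests: p belongs to the confidence set exactly when neither tail test rejects at level alpha/2.

```python
from math import comb

def binom_tail_ge(n, y, p):
    """P(X >= y) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y, n + 1))

def binom_tail_le(n, y, p):
    """P(X <= y) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, y + 1))

def clopper_pearson(y, n, alpha=0.05, tol=1e-10):
    """Confidence interval for p obtained by inverting the two
    one-sided exact binomial tests of size alpha/2."""
    def bisect(f, lo, hi):
        # f is True at lo and False at hi; find the switch point
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if f(mid):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    if y == 0:
        lower = 0.0
    else:
        # largest p whose upper-tail probability is still <= alpha/2
        lower = bisect(lambda p: binom_tail_ge(n, y, p) <= alpha / 2, 0.0, 1.0)
    if y == n:
        upper = 1.0
    else:
        # smallest p whose lower-tail probability is <= alpha/2
        upper = 1.0 - bisect(lambda q: binom_tail_le(n, y, 1.0 - q) <= alpha / 2, 0.0, 1.0)
    return lower, upper

lo, hi = clopper_pearson(4, 5)
assert lo < 4 / 5 < hi   # the point estimate lies inside the interval
```

By construction the interval has coverage at least the nominal level for every p, which is precisely the test-inversion duality in action.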
One motivation for doing this in the context of confidence sets is as follows. In the more general case, where a confidence interval is replaced by a confidence set, the expected length of the interval is replaced by the expected measure of the set. However, one-sided tests lead to one-sided confidence intervals (that is, intervals of the form considered in Section 7.). The alternative is to construct two-sided intervals from two-sided tests. In that case, we saw that the concept of unbiasedness was useful in restricting the class of tests considered, and we can often find UMPU tests.
In the confidence set context, we make the following definition. If we can find a UMPU test, and invert this to obtain an unbiased confidence set, then this test will minimise the expected measure of the confidence set among all unbiased confidence sets of a given level. Find the form of this conditional distribution. What is this conditional distribution?
Suppose instead that we condition on the outcome of the coin toss in construction of the tests. Can you verify from first principles that the test is in fact UMPU? Show that, as p and q range over [0, 1], the joint distributions of X and Y form an exponential family. In an experiment to test the efficacy of a new drug for treatment of stomach ulcers, five patients are given the new drug and six patients are given a control drug.
Of the patients given the new drug, four report an improvement in their condition, while only one of the patients given the control drug reports improvement. Is the confidence set ever empty? Some definitions and basic properties of maximum likelihood estimators are given in Section 8. In regular cases this is given by solving the likelihood equation(s) formed by setting the first-order derivative(s) of log L to 0. We establish in Section 8. Under certain circumstances the CRLB is attained exactly (there is an interesting connection with the theory of exponential families here), though in general the CRLB is only an approximate guide to what is attainable in practice.
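The drug-trial exercise above can be analysed by a conditional test: conditioning on the margins of the 2 x 2 table, the number of improvements in the new-drug group follows a hypergeometric distribution under the null hypothesis of no treatment effect. A minimal sketch (the one-sided p-value is the standard Fisher-type calculation):

```python
from math import comb

# Margins from the trial: 5 patients on the new drug, 6 on control,
# 5 improvements in total; 4 of them occurred in the new-drug group.
n_new, n_ctrl, total_improved, observed = 5, 6, 5, 4

def hypergeom_pmf(k):
    """P(k improvements in the new-drug group), conditional on the margins."""
    return comb(n_new, k) * comb(n_ctrl, total_improved - k) / comb(n_new + n_ctrl, total_improved)

# One-sided p-value: probability of at least the observed count under H0
p_value = sum(hypergeom_pmf(k) for k in range(observed, min(n_new, total_improved) + 1))
# p_value = 31/462, approximately 0.067: suggestive, but not significant at the 5% level
```

Note that the conditioning eliminates the nuisance parameter (the common improvement probability under H0) entirely, which is exactly the motivation for similar tests discussed earlier.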
After some preliminaries about convergence of random variables in Section 8., the main asymptotic results are developed. Detailed proofs are given only for certain one-dimensional cases, though the basic results apply in any dimension. These results are at the centre of much modern theory based on maximum likelihood estimation. Section 8.
After observing x, the likelihood takes the form (8.). A number of questions immediately arise: (i) Do the likelihood equations (8.) have a solution? (ii) If so, is the solution unique? (iii) Is it a local maximum? To answer this we need to check second derivatives. (iv) Is it a global maximum? Each of the questions (i)-(iv) may have a negative answer, though we shall be concentrating on so-called regular problems for which the maximum likelihood estimator is indeed given by a local maximum of the log-likelihood function. For many problems it turns out that the MLE is biased, though it is asymptotically unbiased and efficient (see Section 8.).
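A standard instance in which the likelihood equation fails is the uniform model (my choice of illustration; it is not necessarily the example used in the text). For X1, . . . , Xn i.i.d. Uniform(0, theta), the score is strictly negative wherever the likelihood is positive, so no stationary point exists and the maximum sits on the boundary at the sample maximum:

```python
import math

# For X1,...,Xn iid Uniform(0, theta), the log-likelihood is
#   l(theta) = -n*log(theta)  for theta >= max(x), and -infinity otherwise.
# Its derivative, -n/theta, is strictly negative, so the likelihood
# equation l'(theta) = 0 has no solution; the MLE is max(x).
x = [0.9, 2.4, 1.7, 0.3, 3.1]
n = len(x)
mle = max(x)

def loglik(theta):
    return -n * math.log(theta) if theta >= max(x) else float("-inf")

# The log-likelihood is decreasing on [max(x), infinity): any theta
# above the sample maximum does strictly worse.
assert all(loglik(mle) >= loglik(mle + d) for d in (0.01, 0.5, 2.0))
```

This answers question (i) in the negative while questions of uniqueness and global maximality are still perfectly well posed.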
In Example 8., equation (8.) has no solution, so in this case the MLE is not given by solving the likelihood equation. In statistical inference, the objective is to draw conclusions about the underlying distribution of a random variable X, on the basis of its observed value x. The likelihood principle is a formal expression of this idea: if two models give rise to proportional likelihood functions for the observed data, then they should lead to identical inferences.
Under this principle, quantities that depend on the sampling distribution of a statistic, which is in general not a function of the likelihood function alone, are irrelevant for statistical inference. The Taylor expansion is generalised to a function of several variables in a straightforward manner. It is valid for most regular problems, but there are counterexamples!
However, there is no guarantee that any estimator exists which achieves this lower bound exactly, a point to which we return momentarily. We can argue as follows.
The inequality in (8.) follows. Note that this is different from the natural parametrisation referred to throughout Chapters 5-7. While there have been many occasions when we have used exponential family models as convenient examples, this is the first time we have indicated that some property of a problem requires that the model be of exponential family form. For all the discussion which follows we require some regularity conditions on the family, though the exact conditions required differ from one result to the next (for example, consistency is treated in Section 8.).
At a minimum the latter properties require that the MLE is given by the solution of the likelihood equation and that the conditions required for the CRLB are satisfied. These requirements exclude, for instance, Example 8. We say that the estimator is strongly consistent if it converges to the true parameter value with probability 1, and weakly consistent if the convergence is in probability. Moreover, the inequality in (8.) holds. Note that this is a very general argument which does not require differentiability of the log-likelihood.
For example, in the case of Example 8. This argument still says nothing about the uniqueness of such an estimator but an obvious consequence is that, if it can be shown by other means that with probability 1 the solution of the likelihood equations is unique for all sufficiently large n, then the solution defines a strongly consistent MLE. By the arguments just given in Section 8.
To show (8.), we argue as follows. In this sense, the MLE is asymptotically efficient. We have therefore established the three fundamental properties which are primarily responsible for the great popularity of the maximum likelihood method in practical statistics: consistency, asymptotic normality and asymptotic efficiency. These properties are valid under quite weak conditions, but the conditions on differentiability of the log-likelihood, validity of the CRLB and so forth do exclude some settings: dependent data problems, random processes, etc., require separate treatment. The multidimensional analogue of (8.) then takes the corresponding form.
Most of them use some form of Newton or quasi-Newton optimisation. At first sight it may seem that it would be preferable to use the theoretical Fisher information matrix, if it were easy to calculate, but in fact an extensive body of theory and practice suggests that in most cases the inverse of the observed information matrix gives a better approximation to the true covariance matrix of the estimators, and this is therefore the preferred method in most applications.
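A minimal sketch of such a Newton iteration, returning both the MLE and a standard error computed from the inverse of the observed information, is the following (the log-odds model and its data are illustrative choices of mine, not taken from the text):

```python
import math

# Scalar Newton-Raphson for the MLE, with the observed information used
# for the standard error. Toy model: log-odds beta for y successes in
# n Bernoulli trials, with l(beta) = y*beta - n*log(1 + e^beta).
y, n = 35, 100

def score(beta):
    p = 1.0 / (1.0 + math.exp(-beta))
    return y - n * p                    # first derivative of l

def observed_info(beta):
    p = 1.0 / (1.0 + math.exp(-beta))
    return n * p * (1.0 - p)            # minus the second derivative of l

beta = 0.0                              # starting value
for _ in range(50):
    step = score(beta) / observed_info(beta)
    beta += step
    if abs(step) < 1e-12:
        break

se = 1.0 / math.sqrt(observed_info(beta))
# Closed form for this model: beta-hat = log(y / (n - y))
assert abs(beta - math.log(y / (n - y))) < 1e-8
```

In this exponential family the observed and expected information coincide at the MLE; in general they differ, and the discussion above explains why the observed version is usually preferred.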
An especially important reference here is Efron and Hinkley. It is possible to use the results of Section 8. Instead, for certain types of hypotheses, there is a very direct and straightforward method based on likelihood ratio tests. The distribution of a likelihood ratio test statistic, when the null hypothesis is true, can in rare cases be computed exactly, but in most cases one has to resort to asymptotic theory. The result involves arguments similar to those in Section 8. With suitable reparametrisation and reordering of the components, many hypotheses can be expressed in this way.
Note that, if we have the wherewithal to calculate maximum likelihood estimates, then there is hardly any additional difficulty in calculating likelihood ratio statistics as well, since the maximisations in 8. Suppose H0 is true. Consider just the one-parameter case for simplicity. Assume that 8. Once again, the extension to the multiparameter case is essentially straightforward: some details are given in Section 8. There is yet a third procedure, which is essentially the locally most powerful test mentioned briefly in Chapter 4.
Again, the extension to the multiparameter case is straightforward: see Section 8. Chapter 9 gives a review of such procedures. Many of these tests implicitly involve conditioning on exactly or approximately ancillary statistics, thus indirectly calling on arguments of the kind given in Section 7. There are many ways of testing H0. Again, the log-likelihood may be written as l_n if it is to be stressed that it is based on a sample of size n. Then, using the formulae for the inverse of a partitioned matrix, as given in Section 8., we proceed as follows.
The vector case follows similarly. In practice, when the three tests are evaluated numerically, they often lead to substantially different answers. There is no clear-cut theory to establish which procedure is best, but there is a substantial body of literature pointing towards the conclusion that, of the three, the likelihood ratio procedure has the best agreement between the true and asymptotic distributions.
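The three statistics are easy to compute side by side. A hedged sketch for testing H0: p = p0 in a binomial model (the data values are illustrative only): all three are asymptotically chi-squared with one degree of freedom under H0, yet their finite-sample values differ.

```python
import math

# Likelihood ratio, Wald and score statistics for H0: p = p0,
# based on y successes in n Bernoulli trials.
y, n, p0 = 7, 20, 0.5
p_hat = y / n

def loglik(p):
    return y * math.log(p) + (n - y) * math.log(1 - p)

def fisher_info(p):
    # expected information for the Binomial(n, p) model
    return n / (p * (1 - p))

lr    = 2 * (loglik(p_hat) - loglik(p0))
wald  = (p_hat - p0) ** 2 * fisher_info(p_hat)
u0    = y / p0 - (n - y) / (1 - p0)        # score evaluated at p0
score = u0 ** 2 / fisher_info(p0)

# The statistics agree to first order but not exactly.
assert lr > 0 and wald > 0 and score > 0
assert not (lr == wald == score)
```

Here the three values are roughly 1.83, 1.98 and 1.80 respectively, illustrating the numerical disagreement referred to above.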
Cox and Hinkley surveyed the literature on this field up to the time that book was published; a much more up-to-date but also much more advanced treatment is Barndorff-Nielsen and Cox. In (i), the number of heads has a binomial distribution, while in (ii) the number of tosses performed has a negative binomial distribution. What is the likelihood function for p in the two cases?
The likelihood principle would demand that identical inferences about p be drawn in the two cases. What is the asymptotic distribution of the maximum likelihood estimator? Is it consistent? Is Tn a sensible estimator in practice? This is an example of asymptotic superefficiency. This is a general feature of an asymptotically superefficient estimator of a one-dimensional parameter.
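Returning to the coin-tossing comparison, the proportionality of the two likelihoods can be verified directly. In the sketch below the numbers (12 tosses, 3 heads) are my illustrative choices: the binomial and negative binomial likelihoods for p differ only by a constant factor, so the likelihood principle demands the same inference about p from either experiment.

```python
from math import comb

# Scenario (i): toss N = 12 times, observe 3 heads (binomial count).
# Scenario (ii): toss until the 3rd head, which arrives on toss 12
# (negative binomial number of tosses).
N, heads = 12, 3

def lik_binomial(p):
    return comb(N, heads) * p**heads * (1 - p)**(N - heads)

def lik_neg_binomial(p):
    # the last toss is the 3rd head; the first 11 tosses contain 2 heads
    return comb(N - 1, heads - 1) * p**heads * (1 - p)**(N - heads)

# The ratio of the two likelihoods is constant in p
ratio = lik_binomial(0.1) / lik_neg_binomial(0.1)
for p in (0.2, 0.37, 0.5, 0.9):
    assert abs(lik_binomial(p) / lik_neg_binomial(p) - ratio) < 1e-12
# here the constant is C(12,3)/C(11,2) = 220/55 = 4
```

Sampling-theory quantities such as unbiasedness or significance levels do, however, differ between the two experiments, which is exactly the source of the conflict with frequentist procedures.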
It is our primary purpose in this chapter to describe refinements to that asymptotic theory, our discussion having two main origins. One motivation is to improve on the first-order limit results of Chapter 8, so as to obtain approximations whose asymptotic accuracy is higher by one or two orders. The other is the Fisherian proposition that inferences on the parameter of interest should be obtained by conditioning on an ancillary statistic, rather than from the original model.
We introduce also in this chapter some rather more advanced ideas of statistical theory, which provide important underpinning of statistical methods applied in many contexts.
Some mathematical preliminaries are described in Section 9. The concept of parameter orthogonality, and its consequences for inference, are discussed in Section 9., and parametrisation invariance in Section 9. Two particularly important forms of asymptotic expansion, the Edgeworth expansion and the saddlepoint expansion, are described in Section 9. The Laplace approximation method for approximation of integrals is described briefly in Section 9.
The remainder of the chapter is concerned more with inferential procedures. This formula leads to adjusted forms of the signed root likelihood ratio statistic, relevant to inference on a scalar parameter of interest and distributed as N(0, 1) to a high degree of accuracy. Conditional inference to eliminate nuisance parameters in exponential families, as discussed already in Section 7., is developed further. The chapter concludes in Section 9. To illustrate the use of the notation, key results from Chapter 8 may be described in terms of it as follows. These results are easily verified by careful accounting, in the analysis given in Chapter 8, of the orders of magnitude of the various terms in the expansions.
Actually, it is necessary that these quantities are precisely of these stated orders for the results of Chapter 8 to hold. The third and fourth cumulants are called the skewness and kurtosis respectively. For the normal distribution, all cumulants of third and higher order are 0. Also, if X1, . . . , Xn are independent, the cumulants of their sum are obtained by adding the individual cumulants. Extension of these notions to multivariate X involves no conceptual complication: see Pace and Salvan (Chapter 3). Here we consider first two important general ideas, those of asymptotic expansion and stochastic asymptotic expansion. Asymptotic expansions typically arise in the following way.
We will concentrate here on asymptotic expansions for densities, but describe some of the key formulae in distribution function estimation. Now take expectations. In order to draw inferences about the parameter of interest, we must deal with the nuisance parameter. Conditional and marginal likelihoods are particular instances of pseudo-likelihood functions.
The term pseudo-likelihood is used to indicate any function of the data which depends only on the parameter of interest and which behaves, in some respects, as if it were a genuine likelihood (so that the score has zero null expectation, the maximum likelihood estimator has an asymptotic normal distribution, etc.). It may, of course, also be difficult to find such a statistic T.
Note that we make two assumptions here about S. Note that factorisations of the kind that we have assumed in the definitions of conditional and marginal likelihoods arise essentially only in exponential families and transformation families. Outside these cases more general notions of pseudo-likelihood must be found. A particular use of the principle of parametrisation invariance is to decide between different test procedures. For example, of the three test procedures based on likelihood quantities that we have described, the likelihood ratio test and the score test are parametrisation invariant, while the Wald test is not.
Extensions to the multivariate and discrete cases are straightforward and are summarised, for example, by Severini (Chapter 2). The asymptotic expansion 9. The leading term in the Edgeworth expansion is the standard normal density, as is appropriate from the CLT. The remaining terms may be considered as higher-order correction terms. In particular, Edgeworth approximations tend to be poor, and may even be negative, in the tails of the distribution, as x increases.
Integrating the Edgeworth expansion (9.) term by term yields a corresponding expansion for the distribution function. Further details and references are given by Severini (Chapter 2). The derivation of the Edgeworth expansion stems from the result that the density of a random variable can be obtained by inversion of its characteristic function. For details, see Feller. In essence, the Edgeworth expansion (9.) is obtained in this way. The saddlepoint expansion is quite different in form from the Edgeworth expansion. The leading term in (9.) suggests that the main correction for skewness has been absorbed by the leading term, which is in fact the case. Observe that, crucially, the saddlepoint expansion is stated with a relative error, while the Edgeworth expansion is stated with an absolute error.
The approximation obtained from the leading term of 9. In particular, the saddlepoint approximation tends to be much more accurate than an Edgeworth approximation in the tails of the distribution. In distributions that differ from the normal density in terms of asymmetry, such as the Gamma distribution, the saddlepoint approximation is extremely accurate throughout the range of x.
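This accuracy is easy to verify in a Gamma example. For the sum of n unit exponentials the saddlepoint equation solves in closed form, and (the values below are illustrative, with n = 5) the relative error of the approximation turns out to be constant in x, so a single renormalisation makes it exact:

```python
import math

# Saddlepoint approximation to the density of S = X1 + ... + Xn for
# Xi iid Exponential(1), so that S ~ Gamma(n, 1) and the cumulant
# generating function is K(t) = -n*log(1 - t), t < 1.
n = 5

def saddlepoint_density(x):
    t_hat = 1.0 - n / x                      # solves K'(t) = n/(1 - t) = x
    K = -n * math.log(1.0 - t_hat)
    K2 = n / (1.0 - t_hat) ** 2              # K''(t_hat)
    return math.exp(K - t_hat * x) / math.sqrt(2.0 * math.pi * K2)

def exact_density(x):
    return x ** (n - 1) * math.exp(-x) / math.factorial(n - 1)

# The ratio of approximate to exact density is constant in x for the
# Gamma family, even far into the tails (where Edgeworth would fail).
ratios = [saddlepoint_density(x) / exact_density(x) for x in (1.0, 5.0, 20.0)]
assert max(ratios) - min(ratios) < 1e-9
```

The constant ratio here is simply the Stirling-approximation error in Gamma(n), about 1.017 for n = 5, confirming that renormalisation recovers the exact density.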
For scalar random variables this happens only in the case of the normal, Gamma and inverse Gaussian distributions. The latter will be considered in a later chapter. The saddlepoint approximation is usually derived by one of two methods. The first (Daniels) uses the inversion formula (9.). We sketch instead a more statistical derivation, as described by Barndorff-Nielsen and Cox. Then (9.) follows. It is not easy to integrate the right-hand side of the saddlepoint approximation (9.) analytically. The other cases may be treated in a similar manner.
The Laplace approximations are particularly useful in Bayesian inference: see Section 9. Under a transformation model, the maximal invariant statistic serves as the ancillary.
In a full (m, m) exponential model the MLE is minimal sufficient and no ancillary is called for. Example 9. A statistic a is, broadly speaking, approximately ancillary if its asymptotic distribution does not depend on the parameter. Useful approximate ancillaries can often be constructed from signed log-likelihood ratios or from score statistics. See Severini (Section 6.).
One particularly important approximate ancillary is the Efron-Hinkley ancillary (Efron and Hinkley). A particularly powerful result, which we will not prove but which amplifies the comments made in Chapter 8 about the use of observed rather than Fisher information being preferable, is the following. A simple example of construction of this ancillary is provided by the exponential hyperbola. Under this model, the pairs (X1, Y1), . . . , (Xn, Yn) are observed. Equation 9. The location-scale model provides a prototypical example, with the configuration statistic as the ancillary. Among models for which (9.) holds.
Comparing 9. The above is expressed in terms of a one-parameter model. We have commented that this is difficult in general, outside full exponential family and transformation models. This distribution is widely assumed in many biological and agricultural problems. Hence, the conditional density approximation is exact for this model.
As an example of calculation of the derivatives required by this approximation, and as used also, for example, in (9.), consider the following. Note that an Edgeworth or saddlepoint approximation to the marginal distribution of U is easy to obtain in the case when U is a sum of independent, identically distributed variates. The log-likelihood based on the full data x1, . . . , xn is as follows.
We consider an approximation to the marginal distribution of S, based on a saddlepoint approximation to the density of S, evaluated at its observed value s. This notion is developed in detail in Section 9. Note that the effect of the Bartlett correction is due to the special character of the likelihood ratio statistic, and the same device applied to, for instance, the score test does not have a similar effect.
Modified profile likelihood is intended as a remedy for this type of problem. An instructive example for grasping the notation is the case of X1, . . . , Xn. Similarly, under (9.), a corresponding result holds. Proofs of both results are given by Barndorff-Nielsen and Cox (Chapter 8). In this case, (9.) applies directly. The version of modified profile likelihood defined by (9.) is easy to construct and seems to give reasonable results in applications.
It is easier to compute than (9.). A simple Bayesian motivation for (9.) is obtained by arguing as follows. In many circumstances, it will be adequate to have easily computed analytic approximations to these.
In this section we review briefly the asymptotic theory of Bayesian inference. The results provide a demonstration of the application of asymptotic approximations discussed earlier, in particular Laplace approximations. Key references in such use of Laplace approximation in Bayesian asymptotics include Tierney and Kadane, and Tierney, Kass and Kadane. The key result is that the posterior distribution given data x is asymptotically normal. Then we have the following. A more accurate approximation to the posterior is provided by the following.
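As a simple numerical illustration of the Laplace device in this Bayesian setting (the Beta integral below is my choice, made because the exact answer is available in closed form; it is not an example from the text), approximate the normalising integral of an un-normalised posterior by expanding its log about the mode:

```python
import math

# Laplace approximation to I = integral_0^1 p^a (1-p)^b dp = B(a+1, b+1),
# the normalising constant of a posterior with uniform prior and
# binomial likelihood. Write h(p) = a*log(p) + b*log(1-p); then
# I is approximately exp(h(p*)) * sqrt(2*pi / (-h''(p*))) at the mode p*.
a, b = 30, 20

p_mode = a / (a + b)                                  # mode of h
h_mode = a * math.log(p_mode) + b * math.log(1 - p_mode)
h2 = -a / p_mode**2 - b / (1 - p_mode) ** 2           # h''(p_mode) < 0
laplace = math.exp(h_mode) * math.sqrt(2 * math.pi / -h2)

exact = math.factorial(a) * math.factorial(b) / math.factorial(a + b + 1)

# Laplace is accurate to relative order O(1/n), n = a + b
assert abs(laplace / exact - 1) < 0.05
```

The relative error here is about 2 per cent at n = 50 and shrinks like 1/n; the Tierney-Kadane device mentioned above improves this further by taking a ratio of two such approximations, so that leading error terms cancel.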
We can rewrite this as follows. Recall that such expectations arise as the solutions to Bayes decision problems. If the integrals are approximated in their unmodified form, the result is not as accurate. Find the log-likelihood function. Find the score function and the expected and observed information. Derive the conditional distribution of Y1, . . . , Yn. Show that the profile log-likelihood is invariant under this reparametrisation. Obtain the form of the profile log-likelihood. Show that the profile score has an expectation which is non-zero.
In this model the maximum likelihood estimators are not sufficient and an ancillary statistic is needed. What is the distribution of S? Sometimes, interest lies not in the parameters themselves but in assessing future, unobserved values from the same probability distribution, typically the next observation. We saw in Section 3.
However, a variety of other approaches to prediction have been proposed. The prediction problem is as follows. As a simple case, we might have X formed from independent and identically distributed random variables X1, . . . , Xn. A more complicated example is that of time series prediction, where the observations are correlated and prediction of a future value depends directly on the observed values as well as on any unknown parameters that have to be estimated.
Apart from the fully Bayesian approach of Section 3., several other methods are available, and in this chapter we provide brief outlines of each of these methods. Book-length treatments of predictive inference are due to Aitchison and Dunsmore, and Geisser; both focus primarily on the Bayesian approach. Cox, and Barndorff-Nielsen and Cox, described two general approaches for constructing an exact prediction set.
The first idea, due to Guttman, uses similar tests. A second method uses pivotal statistics: for given X1, . . . , Xn, a pivotal quantity involving the future observation can be inverted to give a prediction set. Details are in Problem . However, it stands to reason that this cannot be the only criterion for comparing two predictive procedures. For example, there may be two different ways of constructing a prediction interval, both having exactly the desired coverage probability, but the first of which always results in a shorter interval than the second.
In this case it seems obvious that we would prefer the first method, but our discussion so far does not incorporate this as a criterion. An alternative viewpoint is to define a loss function between the true and predicted probability densities, and apply the standard concepts of decision theory that we have discussed in Chapter 3. Typically X will be a random sample X1, . . . , Xn. Of course, this could have been anticipated from the very general properties of Bayesian decision procedures given in Chapter 3, but much of the interest in this topic is that Bayesian predictive densities often have desirable properties even when evaluated from a non-Bayesian perspective.
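The pivotal construction mentioned above can be sketched for a normal sample. In the illustration below (sample size, quantile and simulation settings are my choices), the pivot (X_{n+1} - Xbar)/(S*sqrt(1 + 1/n)) has an exact t distribution with n - 1 degrees of freedom, so the resulting interval has exact coverage, which we confirm by simulation:

```python
import math
import random

# Exact prediction interval for a future normal observation, from the
# pivotal quantity (X_{n+1} - Xbar) / (S * sqrt(1 + 1/n)) ~ t_{n-1}.
# The 0.975 quantile of t_9, approximately 2.262, is hard-coded to keep
# the sketch dependency-free.
random.seed(1)
n, t_crit, reps = 10, 2.262, 20000

covered = 0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    future = random.gauss(0.0, 1.0)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    half = t_crit * s * math.sqrt(1.0 + 1.0 / n)
    covered += xbar - half <= future <= xbar + half

coverage = covered / reps
assert 0.93 < coverage < 0.97    # nominal level 0.95, up to Monte Carlo error
```

Note that coverage here is exact for every n, unlike the naive "estimative" interval that plugs in the estimated parameters and ignores estimation error.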
Hence the bound on the left-hand side follows. Those two cases are apparently the only known examples in which a strict ordering holds for every sample size n. Harris developed an alternative approach related to the bootstrap (Chapter 11) and showed that this also improves asymptotically on the estimative approach in the case of exponential families; this approach is described further in Section . In a very broadly based asymptotic approach, Komaki derived a general construction for improving estimative approaches by means of a shift in a direction orthogonal to the model.
He also defined general conditions under which this shift is achieved by a Bayesian predictive density. Unfortunately a full presentation of this approach lies well beyond the scope of the present discussion. On the other hand, the concept has been criticised on the grounds that frequentist coverage probabilities of predictive likelihood-based prediction intervals or prediction sets do not necessarily improve on those of more naive procedures; we briefly discuss that aspect at the end of this section. We do not attempt to review all the different methods, but concentrate on some of the leading developments in roughly chronological order.
To distinguish predictive likelihood from its Bayesian relatives, we shall use the symbol L(z | x). Lauritzen and Hinkley defined an alternative approach by exploiting the properties of sufficient statistics. Assume T is determined uniquely by (R, S); then the predictive likelihood of T is again defined as before, and the conditions of Definition 2 are now satisfied. Suppose (X, Z) is transformed to (R, U), where R is minimal sufficient and the components of U are locally orthogonal to each other and to R. For other cases, including those where X and Z are dependent, Butler suggested a modification. For situations in which no nontrivial minimal sufficient statistic exists, various approximations to predictive likelihood were suggested by Leonard, Davison and Butler. Davison's approximation is equivalent to approximating a Bayesian predictive density under a flat prior.
The appearance of Jacobian terms in the numerator and denominator is notable. On the other hand, there is no direct justification for a uniform prior. In this respect, the approximate predictive likelihood suffers from the same objection as has often been voiced against the Bayesian predictive likelihood: there is no clear-cut basis for assigning a prior density, and assuming a uniform prior does not resolve this difficulty. Therefore, they cannot be expected to correct for the dominant term in an expansion of the coverage error of a prediction interval.
On the other hand, Butler argued that conditional forms of calibration (conditioning on an exact ancillary statistic if one is available, or otherwise on an approximate ancillary) are more appropriate than unconditional calibration. The central idea is as follows. Suppose we have some statistic T, a function of X1, . . . , Xn. In most problems, however, no exact pivotal exists. To simplify the notation, we do not indicate the possible dependence on x. The implication is that, if we perform Bayesian prediction with a matching prior, the resulting prediction interval has coverage probability very close to the nominal value.
Full details of the asymptotic calculations use second-order asymptotics (Chapter 9), which lie beyond the scope of the present discussion. The argument follows Smith, and extends an earlier argument of Cox and of Barndorff-Nielsen and Cox. We also assume that asymptotic expressions may be differentiated term by term, without rigorously justifying the interchange of limit and derivative. This kind of expansion typically holds for maximum likelihood estimators, Bayesian estimators, etc. Although in this case the asymptotic calculation is not needed to provide an accurate prediction interval, the example has been given here to illustrate a very general approach.
The present section, however, describes some simple forms of parametric bootstrap applied to prediction, and can be read independently of Chapter .