# Talk:Optimal design

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

WikiProject Mathematics (Rated C-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics (historical)

I cleaned up this page, because Winterfors and this humble author seem to have implemented all the previously discussed changes.

(If anybody (!) wants to see previous discussions, please refer to the history of Talk:Optimal design.)

Kiefer.Wolfowitz (talk) 22:13, 27 May 2009 (UTC)

NOTICE: I was wrong to remove such discussions. Wikipedia has robots that archive old topics, so there is no need to delete things. In fact, such deletions violate the talk guidelines, I have been told. (I have resolved to go and sin no more.) Kiefer.Wolfowitz (talk) 13:42, 29 June 2009 (UTC)

In the article the matrix X'X is called the information matrix, but is it not rather the covariance (the inverse of the information matrix)? Judging by the notation, the matrix is X transposed times X, which is how one would usually compute a covariance when X holds zero-mean observations. —Preceding unsigned comment added by 130.235.3.30 (talk) 14:54, 15 November 2010 (UTC)
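A minimal sketch, assuming the usual homoscedastic linear model y = Xβ + ε with Var(ε) = σ²I, where X is the design matrix (one row per run) rather than a matrix of observations. Under that assumption the covariance of the least-squares estimate is σ²(X'X)^{-1}, so X'X (scaled by 1/σ²) plays the role of the information matrix and its inverse is the covariance of the estimator. The simulation below checks this for a hypothetical four-run design for a straight line; the function names, the design points, and β are illustrative only, and σ = 1 so the σ² factor is 1.

```python
import random


def xtx_inverse(xs):
    """Return (X'X)^{-1} for the straight-line model with rows (1, x)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx
    return [[sxx / det, -sx / det], [-sx / det, n / det]]


def ols_line(xs, ys):
    """Least-squares (b0, b1) from the normal equations (X'X) b = X'y."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return ((sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det)


def simulated_covariance(xs, beta=(1.0, 2.0), sigma=1.0, reps=20000, seed=0):
    """Empirical covariance of the OLS estimates over repeated noisy runs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys = [beta[0] + beta[1] * x + rng.gauss(0.0, sigma) for x in xs]
        estimates.append(ols_line(xs, ys))
    m0 = sum(e[0] for e in estimates) / reps
    m1 = sum(e[1] for e in estimates) / reps
    c00 = sum((e[0] - m0) ** 2 for e in estimates) / (reps - 1)
    c11 = sum((e[1] - m1) ** 2 for e in estimates) / (reps - 1)
    c01 = sum((e[0] - m0) * (e[1] - m1) for e in estimates) / (reps - 1)
    return [[c00, c01], [c01, c11]]


design = [-1.0, -0.5, 0.5, 1.0]  # hypothetical four-run design on [-1, 1]
print("(X'X)^-1 (times sigma^2 = 1):", xtx_inverse(design))
print("simulated Cov(beta_hat):     ", simulated_covariance(design))
```

The simulated covariance should agree with (X'X)^{-1} up to Monte Carlo error, which is why the inverse, not X'X itself, is the covariance of the estimates.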

## Mathematical Details: Matrix semidefinitions, and vexing concavities

THREE problems remain:

In the footnote on the Loewner order, I forgot to explain Kiefer and Loewner optimality! So the whole point is lost! (I'll have to fix that another day.)

#### Matrix theory

I wrote the matrix theory section off the top of my head. It would be useful to revise the matrix terminology to be consistent with Pukelsheim (and to be internally consistent). I avoided disasters in the main text, but could somebody with fresh eyes (and ideally a better memory of our master's magnum opus) examine the footnotes on matrix theory, please?

#### Maximizing concave (information) functions (versus Minimizing convex variance functions)

The article discusses maximizing Fisher information in the theory section and then reverts to convex minimization (without mentioning that minimizing a convex function is equivalent to maximizing its negative, a concave function). Notwithstanding Pukelsheim's preference for concavity, we should use convex minimization (because that is standard in science, especially in non pre-Rockafellar optimization, and certainly in Wikipedia). Kiefer.Wolfowitz (talk) 22:12, 27 May 2009 (UTC)
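The concave/convex equivalence can be written out for the D-criterion. The notation here is assumed for illustration rather than taken from the article: $\xi$ is an approximate design (a probability measure on the design region), $f$ the regression function, and $M(\xi)=\int f(x)f(x)'\,d\xi(x)$ its moment (information) matrix.

```latex
\begin{aligned}
M\bigl((1-t)\,\xi_0 + t\,\xi_1\bigr) &= (1-t)\,M(\xi_0) + t\,M(\xi_1), \qquad 0 \le t \le 1, \\
\log\det \text{ is concave on positive definite matrices}
  &\;\Longrightarrow\; \xi \mapsto -\log\det M(\xi) \text{ is convex}, \\
\arg\max_{\xi}\ \log\det M(\xi)
  &= \arg\min_{\xi}\ \bigl(-\log\det M(\xi)\bigr)
   = \arg\min_{\xi}\ \log\det M(\xi)^{-1}.
\end{aligned}
```

So maximizing the concave information function and minimizing the convex "variance" function $\log\det M(\xi)^{-1}$ select the same designs; the difference is a sign convention.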

## Examples (e.g. graphical) are needed

The article needs examples, e.g. Kôno-Kiefer or Atkinson (or Gosset) designs for RSM. Graphics would be great! —Preceding unsigned comment added by Kiefer.Wolfowitz (talkcontribs) 16:43, 8 June 2009 (UTC)

## Introduction: Providing sufficient context for Wikipedia readers?

Responding to Melcombe's just concerns, I wrote an introductory paragraph. Winterfors wrote a similar paragraph 1-2 months ago, I believe, which was somewhat restrictive (linear regression) and so was deleted. Is the new introduction satisfactory? Kiefer.Wolfowitz (talk) 13:40, 29 June 2009 (UTC)

There are three points to consider: (i) understandability; (ii) summary of content; (iii) delimitation of context. For (iii), there is the problem of having a short title like "optimal design" while the content contains only (I think) material about "experiments" as opposed to "surveys" (where there are results about things like optimal strategies for stratification), but possibly design of surveys is meant to be (or already is) covered. For (i), it may be worth borrowing some phrases from the intro to design of experiments to say what it is that an experimenter can adjust. And, on a more general level, is it not part of the practical considerations of applying the theory of optimal design to recognise that, for a given specified number of samples, there may be a design with only a few extra samples that has much improved properties (particularly if several criteria are considered jointly)? Melcombe (talk) 09:05, 30 June 2009 (UTC)
Regarding the short title (iii), I get a headache just imagining moving this article to Optimal design of experiments, which already redirects here, because of the necessary updating of many linking pages.
* It would be better to create an article on "Optimal design of sampling plans" or something similar, because observational studies are not (Baconian) experiments. The literature on optimal allocation of observations (following Gustav Elfving, and being most used for response surface designs) is closest to survey sampling, and I could add a section on optimal allocations and say a word about sampling.
Now, let me address the question of other criteria, a concern raised, for example, by Box's polemic on "alphabetic optimality". You are concerned that a design optimal for one criterion may not be optimal for other criteria. This concern is dealt with already, in the discussion of Flexible optimality criteria and convexity (see e.g. Atkinson et alia or Pukelsheim, or the uncited papers by S. Gutmair and by Alberto Seeger), and more accessibly in Model dependence and robustness.
* [BTW, it is well known that a Pareto optimum for twice-differentiable, convex objective functions can be supported as an optimum of a convex combination of those objective functions. This is one reason why I discussed convex combinations (or positive linear combinations) of convex functions. (But maybe a footnote about Pareto efficiency of vector criteria would be appropriate, given that this article is rather technical already.)]
More concretely, optimal designs can be (and often are, I'll state here) more robust on other criteria than what Box and his copycats call "traditional designs". See the references to Wu & Hamada and to Atkinson for Box's principal RSM designs, Box-Behnken designs being particularly inefficient by most criteria in dimensions greater than 3. Robustness with respect to changing models is discussed in the article's footnotes, where I reference papers by Kiefer on RSM, giving page references: "See Kiefer ("Optimum Designs for Fitting Biased Multiresponse Surfaces" pages 289--299)". I tried to keep the article's POV neutral, and just stated that it's important to consider alternative models and alternative criteria, and to be informed by practical experience.
Please let me know if the article's existing content doesn't address your concerns sufficiently. I am sure that the existing content can be made more accessible to readers, of course, and it would be useful to add additional content, especially practical examples with graphics. I believe that it would be most informative to compare a Box-Behnken design with an approximately optimal design of Atkinson/Kiefer/Kôno or an optimal fixed-run design from Hardin & Sloane. I can look for an industrial example from the unpublished papers of Hardin & Sloane. Best regards, Kiefer.Wolfowitz (talk) 11:20, 30 June 2009 (UTC)
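The bracketed remark on Pareto optima and convex combinations of criteria can be illustrated on a small convex problem. This sketch assumes quadratic regression f(x) = (1, x, x²) and, purely for illustration, restricts attention to symmetric approximate designs with weight w at ±1 and 1 - 2w at 0. For that family, det M(w) = 4w²(1 - 2w) and trace M(w)^{-1} = 1/(w(1 - 2w)) (worked out by hand for this example). Minimizing the convex combination λ·(-log det M) + (1 - λ)·trace M^{-1} should move from the A-optimal weight of this family (w = 1/4 at λ = 0) to the D-optimal weight (w = 1/3 at λ = 1); the function names are hypothetical.

```python
import math


def neg_log_det(w):
    """-log det M(w) for weights (w, 1-2w, w) on x = -1, 0, 1, model (1, x, x^2)."""
    return -math.log(4.0 * w * w * (1.0 - 2.0 * w))


def trace_inverse(w):
    """trace M(w)^{-1} for the same symmetric three-point design."""
    return 1.0 / (w * (1.0 - 2.0 * w))


def minimize_convex_1d(f, lo, hi, tol=1e-10):
    """Ternary search; valid here because f is convex (hence unimodal) on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2  # minimum cannot lie in (m2, hi]
        else:
            lo = m1  # minimum cannot lie in [lo, m1)
    return 0.5 * (lo + hi)


def compromise_weight(lam):
    """Minimize lam * D-criterion + (1 - lam) * A-criterion over 0 < w < 1/2."""
    def objective(w):
        return lam * neg_log_det(w) + (1.0 - lam) * trace_inverse(w)
    return minimize_convex_1d(objective, 1e-6, 0.5 - 1e-6)


for lam in (0.0, 0.5, 1.0):
    print(lam, round(compromise_weight(lam), 4))
```

Each λ in [0, 1] gives a design that is Pareto-optimal for the pair (D, A) within this family; the convexity of both criteria is what lets every Pareto optimum be reached this way.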
First, regarding renaming: it is possible to ask administrators for help with anything even moderately complicated. You will find details on the help page referenced within the page brought up by the "Move" tab. It is mainly a question of working out what needs to be moved where, and setting this out. (As for dealing with "what links here" articles... this might actually be simpler than you think if you have not noticed that these include all those linking via a navbox.) If you don't think it is worth renaming, it would be good to find an existing article as a helpful starting point for those looking for other possible interpretations of the title. Second, the point I was trying to raise wasn't directly about changing/comparing criteria but about the effect of changing the required sample size on the optimality criteria, with the consequence that a small extra cost in terms of sample size might yield a big benefit. Perhaps I am thinking of the situation where it is either possible or impossible to find designs with a given sample size that yield uncorrelated estimates of contrasts, but it might well be that a plot of the optimal value of a criterion against sample size has sharp jumps (and need it be monotonic?). Melcombe (talk) 12:48, 30 June 2009 (UTC)
A short answer to your question is that the Wald & Kiefer approach traditionally ignored the cost of experimental runs, leading to probability measures as solutions. For the Kiefer-Kôno design, the optimal design can be supported on a subset of the cube with 3 levels; that's the good news. The bad news is (I recall) that Kiefer's RSM design for the cube had algebraic allocations in $\mathbb{Q}(\sqrt{17})$! That's a mathematically optimal design, which is out of reach for the rational allocations needed in practice. Then Pukelsheim and others have described discretization methods, which lead to approximately optimal designs. Heuristics using continued-fraction approximants can suggest some natural numbers of replications, but in general a simultaneous rational approximation requires Lenstra-Lenstra-Lovász (LLL) style basis-reduction algorithms, which don't seem to have been implemented anywhere.
In practice, I would suggest that people look at the catalogs available in Hardin & Sloane's gosset system for various replication/run sizes, and look around, which is what you suggest. (Earlier today or yesterday I added a short description of Hardin & Sloane's approach, and listed it above the Wald-Kiefer approach, because gosset seems to be more practical for most readers.) Kiefer.Wolfowitz (talk) 13:36, 30 June 2009 (UTC)
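One discretization method of the kind mentioned above is, as I understand it, the "efficient rounding" rule of Pukelsheim and Rieder: start from n_i = ⌈(n - ℓ/2)·w_i⌉ for the ℓ support points, then add a run where n_i/w_i is smallest, or remove one where (n_i - 1)/w_i is largest, until the counts sum to n. The sketch below is an unofficial transcription under those assumptions (all weights positive, at least one run per support point); it rounds one weight vector at a time and does not address the simultaneous rational-approximation problem.

```python
import math


def efficient_rounding(weights, n):
    """Round approximate-design weights to replication counts summing to n.

    Sketch of the efficient-rounding rule: multiplier n - l/2, then
    adjust one run at a time.
    """
    l = len(weights)
    if n < l or any(w <= 0 for w in weights):
        raise ValueError("need positive weights and n >= number of support points")
    counts = [math.ceil((n - l / 2) * w) for w in weights]
    while sum(counts) < n:
        # add a run where the count is smallest relative to its weight
        j = min(range(l), key=lambda i: counts[i] / weights[i])
        counts[j] += 1
    while sum(counts) > n:
        # remove a run where the count is largest relative to its weight
        j = max(range(l), key=lambda i: (counts[i] - 1) / weights[i])
        counts[j] -= 1
    return counts


print(efficient_rounding([1 / 3, 1 / 3, 1 / 3], 4))  # equal weights, 4 runs
print(efficient_rounding([0.2, 0.3, 0.5], 10))       # hypothetical weights
```

With equal weights and n not divisible by the number of support points, some point necessarily receives an extra run, which is one source of the efficiency jumps discussed in this thread.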

Following Melcombe's suggestion, I added a paragraph about the prudence of examining optimal designs with numbers of runs greater than or equal to the number initially contemplated. (I find discussions here and on the other, more mathematical-statistics topics much more productive than discussions of foundational schismatics; I thank the discussants past and present for catching errors and for their good work!) Kiefer.Wolfowitz (talk) 19:05, 30 June 2009 (UTC)
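Melcombe's question about sharp jumps and monotonicity can be checked on a toy case. This sketch assumes quadratic regression f(x) = (1, x, x²) on [-1, 1] and, as a simplifying assumption, only considers exact n-run designs supported on {-1, 0, 1} (exact optimal designs for some n might use other points). It reports the best D-efficiency (det M_n / det M*)^{1/3}, where M_n = X'X/n and det M* = 4/27 is the approximate D-optimal value with weight 1/3 at each point. The function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def info_matrix(counts, points=(-1.0, 0.0, 1.0)):
    """X'X for f(x) = (1, x, x^2) with counts[i] runs at points[i]."""
    m = [[0.0] * 3 for _ in range(3)]
    for c, x in zip(counts, points):
        f = (1.0, x, x * x)
        for i in range(3):
            for j in range(3):
                m[i][j] += c * f[i] * f[j]
    return m


def best_d_efficiency(n):
    """Best D-efficiency of an n-run design on {-1, 0, 1} relative to det M* = 4/27."""
    best = 0.0
    for a in range(n + 1):          # runs at -1
        for b in range(n + 1 - a):  # runs at 0; the rest go to +1
            best = max(best, det3(info_matrix((a, b, n - a - b))))
    return (best / n ** 3 / (4.0 / 27.0)) ** (1.0 / 3.0)


for n in range(3, 10):
    print(n, round(best_d_efficiency(n), 4))
```

The unnormalized det X'X never decreases when a run is added, but the per-run efficiency is 1 when n is a multiple of 3 and drops otherwise, so the curve has jumps and is not monotone in n.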

## Move Optimal design ?

Melcombe has suggested moving the page. (I suppose that I can do it sometime this month.) What is the best name? Kiefer.Wolfowitz (talk) 13:42, 30 June 2009 (UTC)

I agree that a name change would be a good idea, and that the two main options are "Optimal experimental design" or "Optimal design of experiments". According to Google, the first term is about 60% more commonly used than the second, but I would still go with "Optimal design of experiments" to be consistent with the "Design of experiments" article. Winterfors (talk) 22:08, 30 June 2009 (UTC)

## Optimality criteria

I've moved the material explaining D-optimality, E-optimality, etc., from the footnote to the main body of the article. Please review.

I couldn't find this material anywhere else in WP, and I thought it shouldn't be hidden in a footnote. Have also created redirects for D-optimal design and E-optimal design to this page. Regards, --JN466 16:45, 8 July 2009 (UTC)
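To make the D-, A- and E-criteria concrete, here is a small sketch using their standard textbook definitions: D maximizes det(X'X), A minimizes trace (X'X)^{-1}, and E maximizes the smallest eigenvalue of X'X. It assumes a straight-line model with rows (1, x), so X'X is 2×2 and everything has a closed form; the two four-run designs and the function names are hypothetical examples.

```python
import math


def info_matrix_line(xs):
    """X'X for the straight-line model with rows (1, x)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[float(n), sx], [sx, sxx]]


def alphabetic_criteria(m):
    """D, A and E criteria of a symmetric positive definite 2x2 matrix m."""
    (a, b), (_, d) = m
    det = a * d - b * b
    trace_inv = (a + d) / det  # for 2x2, trace(m^{-1}) = trace(m) / det(m)
    half_gap = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    smallest_eig = (a + d) / 2.0 - half_gap
    return {"D": det, "A": trace_inv, "E": smallest_eig}


ends = [-1.0, 1.0, -1.0, 1.0]                # runs only at the ends of [-1, 1]
spread = [-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0]  # equally spaced runs
print(alphabetic_criteria(info_matrix_line(ends)))
print(alphabetic_criteria(info_matrix_line(spread)))
```

For fitting a straight line, the design with all runs at the ends is better on all three criteria (larger D and E, smaller A); for richer models the criteria can disagree, which is where the choice of criterion starts to matter.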

Your move adds more substance to the article. (My moving the optimality criteria to the footnote was to protect the reader afraid of mathematics. However, in retrospect, this move left the article very abstract and perhaps useless for most readers.) Good work! Cheers, Kiefer.Wolfowitz (talk) 20:57, 8 July 2009 (UTC)
Thanks, glad you didn't mind. Given how much coverage there is of D-optimal design, E-optimal design etc. we could or should probably have dedicated articles for them at some point. JN466 12:10, 13 July 2009 (UTC)
Why separate articles? An alternative would be to have what was there (before moving the stuff out of a footnote) as an "overview" section, with additional new sections giving fuller details of individual approaches. This would have the advantage of not having to define the same background material in each separate article. Melcombe (talk) 12:53, 13 July 2009 (UTC)
The first thing would be to expand the sections in this article, as you say. Once they grow beyond what can comfortably be accommodated here (if they ever grow that much), then it might make sense to spin out standalone articles and retain shorter summaries here. We are very far away from that though. First things first. Cheers, JN466 13:07, 13 July 2009 (UTC)

## Facts on Wald

There were a couple of "[citation needed]" tags: I provided a quotation and detailed references for the relation of optimal design to Wald. The other fact concerned Wald and sequential analysis, which is so well known that I just removed it; see the many references in this paragraph or master's-level textbooks like Hogg and Craig. (Apparently Brainard or somebody had some unpublished method in WWII in the U.K., but Wald had publications, without errors.) Kiefer.Wolfowitz (talk) 17:48, 11 February 2010 (UTC)

I lifted the Wald citation from the article on sequential analysis.
Citing the primary source (Wald) looks like "original research" but it is substantiated in so many sources that it's a waste of time to footnote another "authority", imho. Kiefer.Wolfowitz (talk) 18:05, 11 February 2010 (UTC)

## Is the information matrix X'X in $|(X'X)^{-1}|$ the Fisher information matrix?

As you know, the Fisher information matrix is defined in terms of the log-probability, but for D-optimality the article says:

"D-optimality (determinant): A popular criterion is D-optimality, which seeks to minimize $|(X'X)^{-1}|$, or equivalently maximize the determinant of the information matrix X'X of the design. This criterion results in maximizing the differential Shannon information content of the parameter estimates."

Are this information matrix and the Fisher information matrix the same? If so, how is that proven, given that the Fisher information matrix is defined through the derivatives of the log-probability with respect to the parameters? — Preceding unsigned comment added by 217.219.244.140 (talk) 11:19, 25 August 2013 (UTC)
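Under the Gaussian linear model $y \sim N(X\beta, \sigma^2 I)$ with known $\sigma^2$ (an assumption; the article's X'X is usually stated up to this factor), the two notions coincide up to the factor $1/\sigma^2$. The same calculation accounts for the entropy statement in the quoted passage; here $p$ denotes the number of columns of $X$, an assumed symbol.

```latex
\begin{aligned}
\ell(\beta) &= -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\,(y - X\beta)'(y - X\beta), \\
\nabla_\beta\, \ell(\beta) &= \tfrac{1}{\sigma^2}\, X'(y - X\beta), \\
\nabla^2_\beta\, \ell(\beta) &= -\tfrac{1}{\sigma^2}\, X'X, \\
\mathcal{I}(\beta) &= -\mathbb{E}\bigl[\nabla^2_\beta\, \ell(\beta)\bigr] = \tfrac{1}{\sigma^2}\, X'X, \\
\hat\beta &\sim N\bigl(\beta,\ \sigma^2 (X'X)^{-1}\bigr), \\
h(\hat\beta) &= \tfrac{1}{2}\log\Bigl((2\pi e)^p \det\bigl(\sigma^2 (X'X)^{-1}\bigr)\Bigr)
  = \tfrac{p}{2}\log(2\pi e\,\sigma^2) - \tfrac{1}{2}\log\det(X'X).
\end{aligned}
```

So in this model the Fisher information is X'X/σ², and maximizing det(X'X) minimizes the differential entropy of the estimate, which is what "maximizing the differential Shannon information" refers to.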