Talk:German tank problem

This article is of interest to the following WikiProjects:
WikiProject Mathematics (Rated B-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B-Class, Low-importance
Field: Probability and statistics
WikiProject Statistics (Rated B-class, Low-importance)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion. This article has been rated as B-Class on the quality scale and as Low-importance on the importance scale.
WikiProject Military history (Rated C-Class)
This article is within the scope of the Military history WikiProject. If you would like to participate, please visit the project page, where you can join the project and see a list of open tasks. This article has been rated as C-Class on the quality assessment scale.

(init)

Title added for discussion —Nils von Barth (nbarth) (talk) 23:37, 16 February 2009 (UTC)

Hi guys, I really like this problem, and I didn't see it on Wikipedia, so I decided to write one myself! I hope I didn't screw things up too much :P Themandotcom (talk) 18:25, 24 March 2008 (UTC)

Title?

I’ve seen this problem at various times, to illustrate differences in estimation, but I haven’t heard a pithy name – has it one?

(I named it “Maximum of a discrete uniform distribution” to start to avoid the sesquipedalian “Estimation of the maximum of a discrete uniform distribution”, which is admittedly more precise.)

—Nils von Barth (nbarth) (talk) 16:41, 16 February 2009 (UTC)

Ok, looks like German tank problem already exists (and this was the pithy example that I had heard), so I’ve merged the article there (hat tip to Michael Hardy).
—Nils von Barth (nbarth) (talk) 23:40, 16 February 2009 (UTC)

Median discussion has errors

The median-unbiased estimator (with maximum concentration on symmetric convex sets) differs from the UMVU estimator: see van der Vaart's book, which has both examples. The UMVU estimator is not median-unbiased, and so there must be an error in the description of the UMVU estimator. Thank you. (I apologize for being brief today.) Sincerely, Kiefer.Wolfowitz (talk) 14:27, 27 July 2009 (UTC)

Be bold and remove or correct it! Bo Jacoby (talk) 08:35, 28 July 2009 (UTC).

iPhone production

I don't think it is appropriate for this article to include a discussion of iPhone production. This is an article on a mathematical problem. The connection to estimation of wartime production is appropriate because of the strong historical ties. There is no such connection to estimating the production of iPhones. It would be inappropriate to use this article as an archive of situations where this problem has arisen. I think we should restrict the scope of this article to (1) a mathematical treatment of the problem and (2) a discussion of the historical applications. iPhones meet neither of these criteria. Nippashish (talk) 16:50, 20 April 2010 (UTC)

Agree. (The iPhone production material has since been removed.) 98.210.208.107 (talk) 14:08, 19 February 2011 (UTC)
Totally disagree. The iPhone production estimate was a neat and high-profile use of the technique. 128.114.23.110 (talk) 06:03, 8 March 2011 (UTC)

cleanup needed

The article looks messy to my eyes. The structure needs clean-up. Some parts of the article tacitly assume that only one tank has been observed. The distinction between frequentist and Bayesian approaches is not clear. Somebody please help. Bo Jacoby (talk) 04:31, 29 May 2011 (UTC).

I have now done a lot of cleanup myself. The section 'Observing one tank' is not very enlightening, and I want to remove it. Any objections? Any comments? Bo Jacoby (talk) 18:22, 30 June 2011 (UTC).
Under "Specific data" it says "Applying the above formula..." but there's no formula above. — Preceding unsigned comment added by 94.237.38.23 (talk) 16:27, 31 October 2011 (UTC)

"circular reasoning"

The following could use some polish:

"Note that one cannot naively use m/k (or rather (m + m/k − 1)/k) as an estimate of the standard error SE, as the standard error of an estimator is based on the population maximum (a parameter), and using an estimate to estimate the error in that very estimate is circular reasoning."

If psi = f(theta), it is perfectly legitimate to estimate psi by f(theta-hat) ... it isn't circular reasoning. Might not produce a good estimator, but that's a different matter.
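
A minimal Python sketch of the plug-in idea described above, not a statement of what the article does. It assumes the usual variance of the estimator m + m/k − 1 under sampling without replacement, (N − k)(N + 1) / (k(k + 2)); the function names and the sample serial numbers are made up for illustration.

```python
import math

def umvu_estimate(serials):
    """Frequentist point estimate m + m/k - 1 from observed serial numbers."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

def standard_error(n_total, k):
    """Standard error of the estimate if the true maximum were n_total.

    Assumes the variance (N - k)(N + 1) / (k (k + 2)).
    """
    return math.sqrt((n_total - k) * (n_total + 1) / (k * (k + 2)))

# Hypothetical sample, not taken from the article.
sample = [19, 40, 42, 60]
n_hat = umvu_estimate(sample)                    # theta-hat
se_plugin = standard_error(n_hat, len(sample))   # f(theta-hat)
print(n_hat, round(se_plugin, 2))
```

The plug-in value is itself only an estimate of the standard error, so whether it is a good one is a separate question from whether it is circular.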

Floombottle (talk) 20:30, 4 April 2012 (UTC)

mean, sd, pmf

Does anyone have a reference for these? The pmf was clearly wrong until I changed it just now. I changed the normalizing constant so now it sums to 1 and agrees with the mean and sd that are given on the wiki page. Here is a verification in R.

> tanks <- function(n, k, m){(k-1) / k * choose(m-1, k-1) / choose(n, k)}
> m <- 14
> k <- 4
> n <- 14:1000000
> sum(tanks(n, k, m))
[1] 1
> sum(tanks(n, k, m)*n) # mean (with approximation error)
[1] 19.5
> (m-1)*(k-1) / (k-2) # mean from wikipedia
[1] 19.5
> sqrt(sum(tanks(n, k, m)*n^2) - sum(tanks(n, k, m)*n)^2) # sd (with approximation error)
[1] 10.35591
> sqrt(3 * 13 * 11 / (1 * 4)) # sd from wikipedia
[1] 10.35616

I was lazy and just verified these results, but I did not take the time to actually go through the math and prove that the normalizing constant is (k-1)/k instead of k/(k-1) (which is what it used to say).
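
As a complement to the numerical check in R, here is a small Python sketch that checks the constant in exact rational arithmetic. It compares partial sums of the pmf with a closed form obtained by telescoping, 1 − C(m−1, k−1) / C(upper, k−1), which tends to 1 as the upper limit grows. The function names are illustrative, and the closed form is something the code checks rather than takes for granted.

```python
from fractions import Fraction
from math import comb

def pmf(n, k, m):
    """Mass (k-1)/k * C(m-1, k-1) / C(n, k), as an exact fraction."""
    return Fraction(k - 1, k) * Fraction(comb(m - 1, k - 1), comb(n, k))

def partial_sum(k, m, upper):
    """Exact sum of pmf(n) for n = m .. upper."""
    return sum(pmf(n, k, m) for n in range(m, upper + 1))

def closed_form(k, m, upper):
    """Telescoped value of the same partial sum."""
    return 1 - Fraction(comb(m - 1, k - 1), comb(upper, k - 1))

k, m = 4, 14
for upper in (14, 100, 2000):
    # Exact equality, no floating-point tolerance needed.
    assert partial_sum(k, m, upper) == closed_form(k, m, upper)
print(float(partial_sum(k, m, 2000)))
```

With k/(k−1) in place of (k−1)/k the partial sums would approach (k/(k−1))² instead of 1, which is consistent with the old constant having been wrong.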

Can someone also give some intuition behind the pmf? I understand that the {m-1 \choose k-1} is coming from choosing where the observed serial numbers (except the maximum) come between 1 and m-1. I also understand the {n \choose k} since we are sampling k tanks from the n that we are considering. But I have no good intuition behind the (k-1)/k part.
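
One possible way to see where the factor comes from, offered as a sketch rather than a sourced derivation: assume the pmf is the likelihood {m-1 \choose k-1} / {n \choose k} renormalized over n ≥ m (that is, a flat prior on n, which is an assumption here). The reciprocal binomial coefficients telescope, so for k ≥ 2 the normalizing sum has a closed form:

```latex
\frac{1}{\binom{n}{k}}
  = \frac{k}{k-1}\left[\frac{1}{\binom{n-1}{k-1}} - \frac{1}{\binom{n}{k-1}}\right]
\quad\Longrightarrow\quad
\sum_{n=m}^{\infty} \frac{1}{\binom{n}{k}}
  = \frac{k}{k-1}\cdot\frac{1}{\binom{m-1}{k-1}}
```

Multiplying by {m-1 \choose k-1} gives k/(k-1), so the constant that makes the pmf sum to 1 is the reciprocal, (k-1)/k. On this reading the factor has no combinatorial meaning of its own; it is what is left over after the telescoping sum.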

AustenWHead (talk) 02:25, 11 July 2012 (UTC)

rounding mean value

Editor HugoMe just changed N \approx \mu \pm \sigma = 20 \pm 10 into N \approx \mu \pm \sigma = 19.5 \pm 10. The rounded value is to be preferred, IMO. Bo Jacoby (talk) 16:24, 28 December 2012 (UTC).

"The" frequentist estimate and "the" Bayesian estimate??

In this article in its present form, we are told that "the" frequentist estimate and "the" Bayesian estimate are certain expressions.

That's silly. Either an MLE or an unbiased estimate would be a frequentist estimate, and they're different. And there are as many Bayesian estimates as there are priors. Whoever wrote this didn't say which criteria were intended to be satisfied by "the" [sic] frequentist estimate or which prior was used in obtaining "the" [sic] Bayesian estimate. Omission of that sort of thing is disrespectful to the reader. Michael Hardy (talk) 21:23, 30 July 2013 (UTC)

I totally agree, though it is nice to contrast the two different approaches to the problem. I think it can be fixed by just making clear that the frequentist and Bayesian formulas refer to the estimates given in the text below. 84.93.172.231 (talk) 16:46, 10 September 2013 (UTC)
"There are as many Bayesian estimates as there are priors". If you suggest another prior then please include it. Bo Jacoby (talk) 04:06, 9 July 2014 (UTC).

Historical Problem

It seems to me that the Panther tank had two series of eight wheels on each side, so that there should be thirty-two wheels on each tank, and not forty-eight, as is written here. However, the total of ninety-six wheels could be correct if we suppose there were three tanks instead of two. By the way, the article says SHAEF "obtained" two tanks. It would be interesting to know by what means. Can we assume that those tanks were captured intact at Anzio? Preceding unsigned comment added by DrJosef (talk • contribs) 16:58, 24 August 2013 (UTC)

I agree on the number of wheels. I've found a citation that it was 2 tanks, so assuming no spare wheels were mounted and no wheels were lost, that gives 64 in total. I've edited the article. As for where these tanks were captured, the citation mentions Anzio, but does not specifically say that they were captured there.--Flexdream (talk) 11:33, 7 July 2014 (UTC)

Move 'Example' section?

The 'Example' section appears at the start of the body of the article. Shouldn't the 2 examples be moved to their relevant approaches i.e. the frequentist estimate and the Bayesian estimate? Rather than stand apart at the start?--Flexdream (talk) 11:10, 7 July 2014 (UTC)

Uh... "Credibility"??

I'm no expert but I've never heard "credibility" used as the Bayesian equivalent of probability. I've heard of "credibility intervals," but when talking about probabilities and distributions, everything I've ever read so far just uses "probability" in the usual way, which seems perfectly reasonable. This smells like a neologism to me, and might even count as original research. I'm going to leave this up for a few days and change it back to probability if nobody objects. Solemnavalanche (talk) 15:58, 23 September 2014 (UTC)

An event may be more or less probable to happen, and a hypothesis may be more or less credible to be true. Hypotheses are not more or less probable. So it is OK to talk about the probability of an event, and the credibility of a hypothesis. Bo Jacoby (talk) 22:41, 24 December 2015 (UTC).

Another approach

Another approach to solving the German tank problem is to assume that the average of the observed serial numbers is similar to the real average. Then multiply by two to give a figure, and compare this figure to the maximum of the observed serial numbers.

Here is an example. Say I have a series of serial numbers 5943 3641 5948 6592 6891 6967 5402 124 1131 8702 3947 1697 325 2164 2888 2755 6829 9760 6574 2737 4998 335 1556 3538 6152 5973 9036 3611 16 5462

The maximum serial number is 9760

The average is 4390, so the estimate is the maximum of (2 x 4390, 9760) = 9760. BernardZ (talk) 08:50, 22 December 2015 (UTC)

German tank production using the Zimmermann method

(I moved this conversation from my talk page. Bo Jacoby (talk) 22:13, 24 December 2015 (UTC).)

At work this is how we solve the German tank production problem. We assume that the average of the observed serial numbers is similar to the real average.

If so, then multiply this average by two to give a figure. Then compare this figure to the maximum of the serial numbers you have.

Here is an example, say I have a series of serial numbers 5943 3641 5948 6592 6891 6967 5402 124 1131 8702 3947 1697 325 2164 2888 2755 6829 9760 6574 2737 4998 335 1556 3538 6152 5973 9036 3611 16 5462

The maximum serial number is 9760

The average is 4390, so the estimate is the maximum of (2 x 4390, 9760) = 9760.

This is pretty close to the true value, 10000. BernardZ (talk) 08:55, 22 December 2015 (UTC)
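
The arithmetic above can be reproduced with a short Python sketch. The function name is illustrative and the code only restates the rule as described in this thread (the larger of twice the sample mean and the sample maximum); it is not taken from any reference.

```python
def mean_times_two_estimate(serials):
    """Return the larger of twice the sample mean and the sample maximum."""
    mean = sum(serials) / len(serials)
    return max(2 * mean, max(serials))

# Serial numbers quoted in the comment above.
serials = [
    5943, 3641, 5948, 6592, 6891, 6967, 5402, 124, 1131, 8702,
    3947, 1697, 325, 2164, 2888, 2755, 6829, 9760, 6574, 2737,
    4998, 335, 1556, 3538, 6152, 5973, 9036, 3611, 16, 5462,
]

sample_mean = sum(serials) / len(serials)
estimate = mean_times_two_estimate(serials)
print(round(sample_mean), estimate)
```

For this sample, twice the mean is smaller than the observed maximum, so the rule simply returns the maximum.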

I have three objections to your contribution. The first objection is formal: you did not provide a reference, and so the contribution is 'original research', which is not allowed on Wikipedia. See wp:OR. The second objection is that your method is theoretically unfounded. Once you know the number, k, of observed sequence numbers, and the highest observed sequence number, m, the other observed sequence numbers contain no additional information as to the estimation of the total number of tanks, N. The third objection is that your method does not estimate the uncertainty, so you cannot tell what 'pretty close' means. With your sample data, k=30 and m=9760, the Bayesian estimate is
N \approx \frac{(m-1)(k-1)}{k-2} \pm \frac{1}{k-2}\sqrt{\frac{(m-1)(k-1)(m-k+1)}{k-3}} = 10107.5 \pm 360.7
In this case your method is inferior to the Bayesian method. Your estimate, 9760, is 0.96 Bayesian standard deviations below the Bayesian mean, while the correct value, 10000, is only 0.30 Bayesian standard deviations below the Bayesian mean.
Bo Jacoby (talk) 11:26, 23 December 2015 (UTC).
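
The figures in the comment above can be checked with a minimal Python sketch. It assumes the Bayesian mean (m−1)(k−1)/(k−2) and variance (k−1)(m−1)(m−k+1) / ((k−3)(k−2)²) as quoted on this page; the function name is illustrative.

```python
import math

def bayes_mean_sd(k, m):
    """Posterior mean and standard deviation of N, given k tanks and maximum m.

    Uses the formulas quoted in this thread; the variance needs k > 3.
    """
    if k <= 3:
        raise ValueError("the standard deviation is only finite for k > 3")
    mean = (m - 1) * (k - 1) / (k - 2)
    variance = (k - 1) * (m - 1) * (m - k + 1) / ((k - 3) * (k - 2) ** 2)
    return mean, math.sqrt(variance)

mean, sd = bayes_mean_sd(k=30, m=9760)
print(round(mean, 1), round(sd, 1))

# Distance of each candidate value below the posterior mean, in units of sd.
z_sample_max = (mean - 9760) / sd
z_true_value = (mean - 10000) / sd
print(round(z_sample_max, 2), round(z_true_value, 2))
```

Only k and m enter the calculation, which is the point of the second objection: the other observed serial numbers do not change the result.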
Obviously you do not know much statistical history and modern methods. See http://wikieducator.org/Point_estimation_-_German_tank_problem and check out the "mean times 2" estimator. We use it at work to estimate max street numbers delivered by walkers. BernardZ (talk)
Suppose you have captured k=5 tanks with the highest serial number m=10. If the captured tanks had serial numbers 1, 2, 3, 4, and 10, then the mean value is 4 and your estimate is N ≃ max(8,10) = 10. If the captured tanks had serial numbers 6, 7, 8, 9, and 10, then the mean value is 8 and your estimate is N ≃ max(16,10) = 16. In either case you know that the tanks 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 have been produced. So the estimated values of N should not differ. The Bayesian estimate is N ≃ 12 ± 3.46. Your method is far inferior. You should do better. Bo Jacoby (talk) 20:42, 25 December 2015 (UTC).
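
The two five-tank samples in the comment above can be compared side by side. This is a Python sketch under the same assumptions as the thread: the "mean times 2" rule as described by BernardZ and the Bayesian formulas as quoted by Bo Jacoby. The function names are illustrative.

```python
import math

def mean_times_two_estimate(serials):
    """Larger of twice the sample mean and the sample maximum."""
    return max(2 * sum(serials) / len(serials), max(serials))

def bayes_mean_sd(serials):
    """Posterior mean and sd of N; depends on the sample only through k and m."""
    k, m = len(serials), max(serials)
    mean = (m - 1) * (k - 1) / (k - 2)
    variance = (k - 1) * (m - 1) * (m - k + 1) / ((k - 3) * (k - 2) ** 2)
    return mean, math.sqrt(variance)

sample_low = [1, 2, 3, 4, 10]    # same k and m as sample_high
sample_high = [6, 7, 8, 9, 10]

for sample in (sample_low, sample_high):
    mean, sd = bayes_mean_sd(sample)
    print(sample, mean_times_two_estimate(sample), mean, round(sd, 2))
```

Both samples have k=5 and m=10, so the Bayesian figures are identical, while the "mean times 2" rule gives two different answers.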
Whether you can do better is irrelevant; this is an encyclopedia, which should include all methods used. BernardZ (talk)
Should we document a method just because it is used? It is not helpful to include arithmetic errors in an encyclopedia. Bo Jacoby (talk) 04:53, 29 December 2015 (UTC).
As the page was written, the answer is yes. What I did was change the page slightly so that it is clear it is discussing this method only, which bypasses the problem. I do not know what you mean by arithmetic errors.

P.S. The Zimmermann method is quite accurate for very small sample sizes. BernardZ (talk)