
User talk:Stpasha

From Wikipedia, the free encyclopedia

Too much TeX

I think you're using TeX too much in this edit. I would much prefer to see ''x''<sub>i</sub> rather than the image rendered by <math>x_i</math>. "Displayed" TeX, as opposed to "inline" TeX, looks good on Wikipedia. With "inline" TeX the characters often look three or four times as big as the surrounding characters, the TeX is often conspicuously higher or lower than the surrounding text, and periods, commas, and the like get badly misaligned or even shifted to the next line.

See spot run
. Spot runs fast.

The period after the first sentence above should be on the same line as the word "run", not the next line. Michael Hardy (talk) 01:22, 2 July 2009 (UTC)[reply]

I totally agree that TeX should not be abused in inline formulas, although "three or four times" seems to me like an exaggeration; it is closer to 1.5 times in my browser. It might depend on browser settings though... do you have text size set to "normal" in your browser? Some old browsers (like IE) scale down the text size without scaling down picture sizes, which could probably produce your three-to-four-times difference.
However, in my defense I'd like to point out that not every <math></math> tag gets converted into a picture. Simple formulas get converted into HTML markup: compare for example ''x<sub>i</sub>'' (which produces xi) versus <math>x_i</math> (which produces xi in a serif font). The second form is both more readable when editing the article, and closer to true TeX output since it uses serif fonts.
In the "explanation of notation" section I deliberately forced formulas to render as pictures, with the idea that when they look larger they act more like captions to each bullet point. Well, maybe this idea wasn't too sound, but it seemed OK to me when I made the edit. // Stpasha (talk) 05:03, 2 July 2009 (UTC)[reply]

SmackBot is killing my HTML :(

SmackBot converts the following wiki code

<ul>
<li> item 1
     <p> blah-blah-blah
<li> item 2
</ul>

into

<ul>
<li> item 1

blah-blah-blah

<li> item 2 </ul>

Which renders differently in a browser:

  • item 1

    blah-blah-blah

  • item 2

versus

  • item 1 blah-blah-blah
  • item 2

// Stpasha (talk) 19:14, 2 July 2009 (UTC)[reply]

Hm, tricky.

You could look at using

* item 1 <br> blah-blah-blah
* item 2
  • item 1
    blah-blah-blah
  • item 2

or

* item 1 
:blah-blah-blah
* item 2
  • item 1
blah-blah-blah
  • item 2
Rich Farmbrough, 07:08, 3 July 2009 (UTC).[reply]

Words and mathematics

Regarding your recent comment in the article on bias, I can think of three reasons for combining English and mathematics. First, writing English sentences helps explain the meaning of the mathematics, and beginning mathematical writers often write formulas and arguments whose shortcomings would become apparent upon being spoken aloud. Second, the formula serves as a mnemonic, so it is useful as an addition to the English. Third, some visually impaired Wikipedia readers may be able to use the written text but not the symbols. It may be that in general wordiness can be removed; it may be that your edits are correct; but I just thought I'd say a word for the loyal opposition. Best regards, Kiefer.Wolfowitz (talk) 12:48, 7 July 2009 (UTC)[reply]

Hehe, I welcome opposition :) But I will vigorously defend myself! I do think that there are certain places where some spelling/pronunciation hints ought to be added. For example, when a formula contains one of the rarer Greek letters (ξ, ζ, υ, etc.), or even Hebrew (eg, ℵ0 is commonly used to denote the power of countable sets, but many people will be puzzled how to read this cartoon of a letter). In other cases words will not help and will even distract from understanding the formula. Compare “x2 + 5x = 6” and “an unbeknownst quantity being multiplied with itself and then taken together with the same quantity fivefold would yield six”. As for visually impaired people, the lead sentence of the article already states the definition using words only, so they should be ok. // Stpasha (talk) 23:28, 7 July 2009 (UTC)[reply]

thanks

Didn't realize; I just copied and pasted it from the Lvivske article. Fixed both now.--Львівське (talk) 19:04, 21 July 2009 (UTC)[reply]

Curly quotes

Howdy. At the MOS thread, you mentioned that the "Wikimedia foundation is unwilling to support Q html tags". I was wondering if you'd happen to have a handy link to wherein this is mentioned/alluded to/discussed? Much thanks. :) -- Quiddity (talk) 19:35, 23 July 2009 (UTC)[reply]

Hi Stpasha. If you haven't already noticed it, as you requested back in July: I have fixed it so that the curly quotation marks are grouped and locked together in the edittools. See MediaWiki talk:Edittools#Quotation marks. If you don't see it in the edit window, then you need to bypass your browser cache. (Some web browsers cache the Wikipedia javascripts for up to one month.)
--David Göthberg (talk) 22:50, 22 November 2009 (UTC)[reply]
Thanks! Although by now I have already figured out how to type them on a keyboard (see [1]), this is still a change in the right direction.  … stpasha »  23:32, 22 November 2009 (UTC)[reply]

MSE

Hi; thanks for your enhancements to the article. I've changed some of the Greek letters back to TeX b/c it looks much cleaner than HTML and is just as compact. Secondly, I have a copy of Cheng & Amin 1982 as a PDF which I bought. The article is also on JSTOR http://www.jstor.org/pss/2345411. I know JSTOR allows those with access to retrieve copies freely on a limited basis, so there should be no problem for me to e-mail you a copy of the pdf if you wish. -- Avi (talk) 02:08, 2 August 2009 (UTC)[reply]

And as an aside, I for one have no trouble reading Hebrew. -- Avi (talk) 02:09, 2 August 2009 (UTC)[reply]
Hehe, I learned how to read א and ב, and luckily no other Hebrew letters are currently used in mathematics — otherwise that would have been a disaster! :) I was also harboring vengeful ideas of using Ж, Ю, or Ъ as mathematical symbols in an article, but then I’d have to come up with an article first…
I, for one, used to like TeX more too, but then I noticed that it incorrectly renders certain letters in upright font instead of italics, sometimes inserts too many spaces, not to mention that certain simple TeX fails to render as simple HTML and gets converted into images which are larger than the surrounding text …
[A comparison table followed here, with columns “true” TeX, inline HTML, inline TeX, and \scriptstyle TeX; only the inline-HTML column is recoverable: θ; n; (n + 1)−1; θ ∈ Θ; σε2; ½ or 1⁄2.]
... stpasha » talk » 05:53, 2 August 2009 (UTC)[reply]

Hi, Stpasha. There is another option for TeX, using \scriptstyle. I've added it to your table above in blue, for comparison. -- Avi (talk) 15:19, 2 August 2009 (UTC)[reply]

Hm, nice find. I had already forgotten about it, since \scriptstyle’s purpose is to generate font at subscript size. Now regarding that paper you linked — I downloaded a copy myself from JSTOR — when discussing consistency it first says that consistency was proved in Cheng & Amin (1979), which is a technical report at Cardiff University (I’m not even sure the report exists anywhere except at that university), and then they say that in Cheng & Amin (1982) (another technical report at Cardiff) consistency was proven for 3 particular distributions: Weibull, gamma and lognormal. So basically they cite inaccessible papers. ... stpasha » talk » 17:12, 2 August 2009 (UTC)[reply]
True, the best we can do, I believe, is what I did: cite the accessible paper. The article makes no more claim than does the cited Cheng&Amin paper; beyond that, perhaps we can e-mail Dr. Cheng directly for a copy of those papers. I did e-mail him a number of months ago when starting the article. Oh, and I am completely jealous that you have JSTOR access :) Have you thought about adding yourself to Category:Wikipedians who have access to JSTOR so paupers like me can bombard you with requests :D ?-- Avi (talk) 17:47, 2 August 2009 (UTC)[reply]
Regarding the notation, x(0) = −∞ and x(n+1) = +∞ is used in Anatolyev and Kosenok’s article in disguise (that article can be downloaded freely from the authors’ website), and also by Ekström in the (2001) “Consistency of generalized maximum spacing estimates” (I can send you the pdf if you tell me your address; my Gmail address is the same as my Wikipedia username). That article also proves the consistency of the generalized MSE; we only need to figure out which of those conditions are superfluous in the “regular” MSE case.
I also wonder if there is a point in replacing simple ''θ'' with <math alt="theta">\theta</math>, and also what those “alt” attributes do (I thought the engine already generates alt text for images from the TeX code of the content). ... stpasha » talk » 23:30, 2 August 2009 (UTC)[reply]
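As an aside, the spacing construction under discussion is easy to sanity-check numerically. The sketch below is my own illustration (not taken from Cheng & Amin or Ekström): it fits an Exponential(θ) model by maximizing the average log-spacing, using the convention F(x(0)) = 0 and F(x(n+1)) = 1 mentioned above.

```python
import math
import random

random.seed(0)

# Maximum spacing estimation for an Exponential(theta) sample, with the
# convention x_(0) = -inf, x_(n+1) = +inf, so F(x_(0)) = 0, F(x_(n+1)) = 1
theta_true = 2.0
xs = sorted(random.expovariate(theta_true) for _ in range(2000))

def avg_log_spacing(theta):
    # Average of log[F(x_(i+1); theta) - F(x_(i); theta)] over all spacings
    F = [0.0] + [1.0 - math.exp(-theta * x) for x in xs] + [1.0]
    return sum(math.log(F[i + 1] - F[i]) for i in range(len(F) - 1)) / (len(F) - 1)

# Crude grid search; the maximizer should land near theta_true
grid = [1.0 + 0.01 * i for i in range(300)]
theta_hat = max(grid, key=avg_log_spacing)
assert abs(theta_hat - theta_true) < 0.2
```

With a continuous model and distinct observations every spacing is strictly positive, so the logarithms are well defined; the grid search is just a stand-in for a proper optimizer.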
Re: Ekstrom, e-mail on the way. Re: "alt", I am not sure, and there is a line on WP:ALT about adding manual alt notes, although that is impossible for display mode. If that is not necessary, I'd be glad to get rid of the alt text. -- Avi (talk) 23:36, 2 August 2009 (UTC)[reply]

Inkscape

Well, I guess I can console myself that I just started using it last week, what you did was fantastic. There is so much one can learn, and so little time to learn it. Do you have any suggestions for quick primers? Thank you again! -- Avi (talk) 22:45, 3 August 2009 (UTC)[reply]

Well, that’s my second picture in Inkscape; the first one was for the Ordinary least squares article (“geometry of OLS”). So far I have learned that it’s best to vectorize all text labels (the “Object to Path” function). An alternative would probably be to embed the definitions of all glyphs used into the image, but I haven’t figured out how to do that… ... stpasha » talk » 06:52, 4 August 2009 (UTC)[reply]

Can you please add a citation or a justification for your statement "This theorem follows from the continuous mapping theorem and the portmanteau theorem"? In Grimmett & Stirzaker it is really just an exercise without a solution, and the proofs I have seen in the literature are a bit more complicated than that. Please see this edit, which points out that it does not follow immediately from the continuous mapping theorem (which I wrote in the article earlier myself). Also, there is Slutsky's theorem in n dimensions, but I did not see it anywhere in infinite dimension, while the continuous mapping theorem and the portmanteau theorem do hold in infinite dimension. Thanks! Jmath666 (talk) 15:48, 13 September 2009 (UTC)[reply]

I was going to add the proof to the theorem yesterday, but fell asleep :) Now it’s there. The proof becomes quite simple once we establish the fact that (X_n, Y_n) converges in distribution to (X, c) — a fact which can be shown using the portmanteau lemma. As for infinite dimensions, I believe it does hold in that generality; only most textbooks never consider infinite-dimensional random variables, so there is no need for them to overexert themselves. ... stpasha » talk » 19:48, 13 September 2009 (UTC)[reply]
Thanks! If you know of a reference that shows that (X_n, Y_n) converges in distribution to (X, c) in infinite dimension, that would be nice. Jmath666 (talk) 22:25, 13 September 2009 (UTC)[reply]
I added the proof to the Convergence of random variables/Proofs#propB3 page; and it doesn’t seem it uses any finite-dimensional assumptions. Hope the article won’t get deleted by the time you get there :) ... stpasha » talk » 05:48, 14 September 2009 (UTC)[reply]
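As a purely numerical aside (my own sketch, not part of the proof), Slutsky's theorem is easy to watch in simulation: take X_n a standardized sample mean (converging in distribution to N(0,1) by the CLT) and Y_n the sample mean itself (converging in probability to the constant c); then X_n + Y_n should be approximately N(c, 1).

```python
import random
import statistics

random.seed(0)

# Slutsky: if X_n -> X in distribution and Y_n -> c in probability,
# then X_n + Y_n -> X + c in distribution.  Here X_n is a standardized
# sample mean (-> N(0,1) by the CLT) and Y_n is the sample mean itself
# (-> c in probability), so X_n + Y_n should look like N(c, 1).
c, n, reps = 5.0, 10000, 1000
sums = []
for _ in range(reps):
    xs = [random.expovariate(1.0 / c) for _ in range(n)]  # mean c, st.dev. c
    m = statistics.fmean(xs)
    x_n = (m - c) * (n ** 0.5) / c   # standardized sample mean
    y_n = m                          # converges to c in probability
    sums.append(x_n + y_n)

# Rough check of the first two moments of the limit N(c, 1)
assert abs(statistics.fmean(sums) - c) < 0.2
assert abs(statistics.stdev(sums) - 1.0) < 0.2
```

Note that X_n and Y_n here are dependent, which Slutsky's theorem does not mind as long as the Y-limit is a constant.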

mos for images

at normal distribution, you mention a MOS for images. Where is this? PDBailey (talk) 17:14, 23 September 2009 (UTC)[reply]

I’m not aware of a specific MoS for images, but insofar as an image contains text labels, those labels are governed by the usual WP:MOS and the conventions for displaying math formulas. There is also the tutorial WP:HCGWA, but it never emphasizes the point that if a graph is to be displayed as a thumbnail, it should be scaled down properly, so that all lines and labels remain legible. Some of the examples on that tutorial page show exactly the opposite situation.
As an example, take a look at the pictures in the “Convergence of random variables” article: the first one (convergence in distribution) is clear and legible, while the second (in the section on convergence in probability) is, quite to the contrary, barely visible, and its text labels can’t be seen at all (and that’s not even to mention that the picture in fact demonstrates almost-sure convergence, but those are details)  … stpasha »  17:57, 23 September 2009 (UTC)[reply]

Initial context-setting

Please look at this edit. I don't think "measure theory", by itself, tells the lay reader that mathematics is what it's about. "Number theory" or "algebra" or "geometry" or "calculus" suffices, but "measure theory" is something most lay readers have never heard of. Michael Hardy (talk) 00:38, 7 December 2009 (UTC)[reply]

Never confused?

If we leave only one definition (keeping the other as a short subsection describing the differences in the alternative formulation), it has the following advantages: (1) the reader will never get confused regarding which definition is used on the page,

I'm amazed that you say the reader will never get confused. We had years of experience with this at geometric distribution. The article was full of emphatic notices that there were two different definitions, and we still had idiots coming along incessantly, citing a textbook that gave a different mean or mgf from the one the article gave, expressing shock that such an error would be there, and then "correcting" it. Only putting BOTH columns there put an end to (most of) that. That's what happened when readers "never got confused". Michael Hardy (talk) 22:00, 12 December 2009 (UTC)[reply]

Ok, maybe “never confused” is an overstatement, yet I stand by my initial suggestion. Of course, the two-column infobox might fix the problem with overzealous idiots; however it also introduces some new problems, which may not be immediately obvious:
  • The infobox doesn't explain why there are two columns in it, or what they mean. Of course this is mentioned in the lead of the article, but as a rule an infobox must be self-sufficient and not rely on the surrounding text to put it into context.
  • The two-column infobox is confusing to new readers who come to the page without preliminary knowledge of what the geometric (or negative binomial) distribution is.
  • The infobox is too wide: on low-resolution monitors (or in browsers not opened fullscreen) it breaks the page layout, making the page difficult to read. As a general rule, the page should be readable on a 1024px-wide monitor.
  • There is no clear indication in the subsequent text regarding which of the two definitions is being used. For example, the very first section says “the expected value of a geometric distribution is 1/p”... Of course all such issues can be carefully fixed, however doing so will either unnecessarily bloat the amount of text on the page, or leave the reader with the impression that not all cases were actually covered.
  • The template for the 2-column infobox is not derived from the single-column template. As such, any changes introduced into the single-column template will not be applied to the 2-column template, which will gradually lead to style inconsistencies (well, actually it already has).
Now it seems to me that an alternative solution can be used to deter the self-confident editors: use a standard 1-column infobox, but put a nice noticeable warning in bold font right at the top of the infobox; a warning about the 2 (or more) alternative definitions.  … stpasha »  08:56, 13 December 2009 (UTC)[reply]
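To make the source of the confusion concrete, here is a small sketch of my own (not taken from any of the articles): the two common conventions for the geometric distribution genuinely disagree on basic quantities such as the mean.

```python
from math import isclose

# Two common conventions for the geometric distribution:
#   (a) number of trials up to and including the first success,
#       support {1, 2, ...}, mean 1/p
#   (b) number of failures before the first success,
#       support {0, 1, ...}, mean (1-p)/p
def pmf_trials(k, p):
    return (1 - p) ** (k - 1) * p

def pmf_failures(k, p):
    return (1 - p) ** k * p

p = 0.25
# Truncated expectations; the tails beyond k = 10000 are negligible
mean_a = sum(k * pmf_trials(k, p) for k in range(1, 10000))
mean_b = sum(k * pmf_failures(k, p) for k in range(10000))
assert isclose(mean_a, 1 / p)            # convention (a): mean 4
assert isclose(mean_b, (1 - p) / p)      # convention (b): mean 3
```

A reader checking the article's “mean = 1/p” against a textbook using convention (b) sees a flat contradiction, which is exactly the failure mode described above.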

Revert comment

I changed Ordinary least squares/Proofs to a redirect as part of an effort to clean up math articles with titles that do not conform to Wikipedia naming conventions. See the talk page for that article for more information. I apologize if I was hasty in my evaluation of the content of the article, However, I don't appreciate my change being marked as vandalism when it wasn't. In the future, please WP:Assume good faith.--RDBury (talk) 15:37, 16 January 2010 (UTC)[reply]

I'm sorry for misunderstanding your intentions; however, it is my view that the contents of the article should be kept, and I gave my reasons on the article’s talk page.  … stpasha »  17:56, 16 January 2010 (UTC)[reply]

Negative binomial distribution

You said:

Please, before reverting the edit and asserting that I “completely ignored the discussion on the talk page”, make sure you have read that discussion. There was a month-old proposal (section Major Changes) to simplify the exposition down to only 1 main definition, and that proposal was met with (cautious) support.
The main reason why we had the 2-column infobox is that readers were frequently confused when they saw apparent discrepancies between the article and their textbooks. Such confusion can be avoided either by bloating the infobox (and actually there are more than 2 possible definitions), or by including a very noticeable alertbox warning about potential discrepancies when comparing this info to existing textbooks, which is the way we are dealing with the problem right now. stpasha » 18:55, 19 January 2010 (UTC)[reply]

WP:EP says:

Be cautious with major changes: consider discussing them first. With large proposed deletions or replacements, it may be best to suggest changes in a discussion, to prevent edit warring and disillusioning either other editors or yourself (if your hard work is rejected by others). One person's improvement is another's desecration, and nobody likes to see their work "destroyed" without prior notice. If you choose to be very bold, take extra care to justify your changes in detail on the article talk page. This will make it less likely that editors will end up reverting the article back and forth between their preferred versions.

Whatever discussion there was was not explicitly about the types of changes you have made, which should have been proposed in detail. Melcombe (talk) 10:21, 20 January 2010 (UTC)[reply]

Sigma algebra too technical?

Hi Stpasha, please see talk:Sigma-algebra#Too technical?, thanks. Paul August 15:18, 20 January 2010 (UTC)[reply]

law of the unconscious statistician

Looks like I was unconscious when I added it. It was in the wrong place. It is that if Y = g(X), then E[Y] = E[g(X)] = ∫ g(x) f(x) dx, where f(x) is the pdf of X. 018 (talk) 19:00, 20 January 2010 (UTC)[reply]
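The statement checks out numerically; a quick illustrative sketch of my own, with X ~ N(0,1) and g(x) = x², so that ∫ g(x) f(x) dx = Var X = 1:

```python
import random

random.seed(0)

# LOTUS: for Y = g(X), E[Y] = E[g(X)] = ∫ g(x) f(x) dx, so the
# distribution of Y itself is never needed.  Take X ~ N(0,1), g(x) = x².
n = 200000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]
mc_estimate = sum(x * x for x in samples) / n  # Monte-Carlo E[g(X)]

# For the standard normal, ∫ x² f(x) dx equals the variance, i.e. 1
assert abs(mc_estimate - 1.0) < 0.02
```

The point of the "unconscious" law is precisely that the chi-squared distribution of Y = X² never has to be derived to compute E[Y].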

Your Proof is wrong !!!

In the consistency of maximum likelihood, you put

should be

which does not support your conclusion !!!

I know the 4 conditions are more general in supporting the conclusion, but I don't think they are intuitive enough for people to understand the proof.

I think my previous version makes more sense. —Preceding unsigned comment added by Xappppp (talkcontribs) 06:52, 25 January 2010 (UTC)[reply]

The proof is correct. We denote
and its probability limit when n → ∞ as
 … stpasha »  07:46, 25 January 2010 (UTC)[reply]
Even under your notation, the consistency is still not clear in your proof. Xappppp (talk) 15:17, 25 January 2010 (UTC)[reply]
Because this is not the proof of consistency, only its most crucial part. The proof given in the "Consistency" section establishes the remarkable fact that if the model is identified, then the likelihood function has a unique maximum. From that point on, consistency follows from the general theory for extremum estimators.  … stpasha »  06:54, 26 January 2010 (UTC)[reply]
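That "unique maximum" behavior can be illustrated numerically. The sketch below is my own (an Exponential(θ) model, not taken from the article): the average log-likelihood has a single peak, and its maximizer approaches the true parameter as n grows.

```python
import math
import random

random.seed(1)

# Exponential(theta0) model: f(x; theta) = theta * exp(-theta * x)
theta0 = 2.0
xs = [random.expovariate(theta0) for _ in range(200000)]
xbar = sum(xs) / len(xs)

def avg_loglik(theta):
    # (1/n) * sum_i log f(x_i; theta) = log(theta) - theta * mean(x)
    return math.log(theta) - theta * xbar

# The average log-likelihood has a unique maximum (at 1/mean(x)),
# an extremum estimator whose maximizer converges to theta0
grid = [0.5 + 0.01 * i for i in range(350)]   # theta in [0.5, 3.99]
theta_hat = max(grid, key=avg_loglik)
assert abs(theta_hat - theta0) < 0.1
```

The concavity of log θ − θ·x̄ in θ is what makes the maximum unique here; identification is what guarantees the population version peaks at θ0.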

MLE and Bayes: Le Cam and Ferguson

Dear Stpasha,

You reversed an edit of mine saying that Bayes estimators had better asymptotic properties than the mle, with the statement that my sources did not support that claim. This statement was unfortunate because Le Cam and Ferguson do support that claim, as I'll document.

My text referred to Le Cam---not Le Cam and Yang. Le Cam's 1986 book discusses Bayesian estimators e.g. in Chapter 17.7 and then goes from wine to dish-water with mle, showing that (no "the" btw) "mle is not that trustworthy" (page 623). These examples can also be found in the ISI article I listed, which is available in JSTOR.

I believe that you are referring to the second edition of Le Cam & Yang, which has an example of bad behavior of Bayes, albeit one using the axiom of choice's implication that the real line is the union of a set of measure zero and a set meager in the sense of Baire category. (This would be relevant to the discussion if the mle had better behavior, and I would be very surprised if Le Cam had any statement like that!)

I believe Le Cam comments that the log-normal distribution wasn't a mathematical demon conjured to bedevil Ronald Fisher!

Chapter 21 of Ferguson discusses the asymptotic normality of posterior distributions, and the Bernstein–von Mises theorem shows "something slightly stronger", about convergence of densities. Exercise 2 in that chapter is the uniform distribution on the open interval, where no mle exists but Bayes has no trouble, though he doesn't spell out this triviality. Ferguson does state that the mle is asymptotically equivalent to Bayes, when both exist. Ferguson gives examples where the mle doesn't exist. Ferguson's treatment of the mle follows Cramér --- if you insist on "maximum" likelihood estimation, you can only prove weaker results.

Kiefer.Wolfowitz (talk) 21:24, 26 January 2010 (UTC)[reply]

I'm sorry for this misunderstanding on my part. The reference said “Le Cam”, and I immediately assumed that it referred to the Le Cam book that I have >.<
Now regarding Le Cam's "MLE is not that trustworthy" comment (I haven't read it myself), I bet what he really means is that MLE is not a panacea for every occasion. For example, the consistency theorem contains 1 substantial and 3 technical conditions, and one can give examples of models where those conditions are violated, rendering MLE inapplicable. And of course estimators for those models can still be constructed (they can even be the Bayes estimates).
However, I don't think there is a single method which works in every single case. And I don't think that Bayesian methods are "superior" to MLE, in the sense of working within a strictly weaker set of assumptions. For one, they require additional knowledge of the prior, and the properties of the Bayesian estimator will depend upon whether the prior was good or not…
In any case, the possible comparison of MLE with other methods can be done in its own section, but probably shouldn't be interspersed in the main text.  … stpasha »  07:40, 27 January 2010 (UTC)[reply]
Dear Stpasha:
Thank you for your understanding. However, you are misinformed about Bayesian estimators (at least in terms of their asymptotic properties for finite-dimensional problems, which is the only setting in which I discussed them). The Bernstein–von Mises (Laplace) theorem concludes that the posterior asymptotics are independent of the prior (provided, obviously, that it is positive on the parameter space).
(This is why Laplace was able to reduce asymptotics to the Fisher (sic.) information!)
I forget the Freedman-Diaconis example sketched in Le Cam and Yang, but I believe it is infinite dimensional.
Le Cam proved a theorem that it suffices to start with any consistent estimator and take one Newton step to recover the (first-order) asymptotics of the mle (Ferguson). Such results falsify the article's uniqueness claims, which are widely believed despite 50 years of counter-examples.
A remaining problem is to check whether the article's asymptotic results are proved for only consistent sequences of zeros of the score function, although the results seem to be stated for only maxima (certainly in the introduction).
Best regards, Kiefer.Wolfowitz (talk) 12:27, 27 January 2010 (UTC)[reply]
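The one-step construction mentioned above is simple enough to sketch. The following is my own illustration (not taken from Le Cam or Ferguson), for the Cauchy location model, where the sample mean fails but the sample median is root-n consistent and the per-observation Fisher information equals 1/2.

```python
import math
import random
import statistics

random.seed(0)

# One Newton step on the score, starting from a consistent estimator
# (the sample median), in the Cauchy location model
theta_true, n = 2.0, 100000
xs = [theta_true + math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

theta0 = statistics.median(xs)  # root-n consistent starting point
# Score of the Cauchy log-likelihood evaluated at theta0
score = sum(2 * (x - theta0) / (1 + (x - theta0) ** 2) for x in xs)
fisher = 0.5                    # per-observation Fisher information
theta1 = theta0 + score / (n * fisher)  # the one-step estimator

# theta1 should sit within a few multiples of sqrt(2/n) of theta_true
assert abs(theta1 - theta_true) < 0.05
```

The resulting θ1 attains the mle's first-order asymptotics without ever maximizing the likelihood, which is the point of the uniqueness remark above.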

Please do not undo changes made to the Binomial distribution

I recently looked up the binomial formula on the corresponding Wikipedia page, and had to correct the formula. I quickly looked through the page history, and discovered you had been the last person to make the following change to the formula.

In the notation used on the page discussing the binomial distribution, the distribution describes the probability of having n "successes" after k trials. Thus the probability should read n choose k times p^n * (1-p)^(k-n). The formula after you edited it read: n choose k times p^k * (1-p)^(k-n).

Just briefly, the logic behind the formula is very simple. The binomial distribution can be thought of as a series of k independent trials, each with probability p of success. p(k;n) describes the probability of having n successes after k trials. Since the order of the failures/successes does not matter (just the total number of trials and successes), the factor of n choose k is there to account for all the possible orderings of the successes. Finally, the probability of any specific ordering of n successes in k trials is the probability of an individual success raised to the nth power (that is, p^n) times the probability of having k − n failures, which is (1-p)^(k-n).

Sorry for the wording and brevity, but I'm in a hurry (I was working on homework). Hopefully this makes sense; if not, go through some thought experiments (think about what trends you expect from the formula for simple examples like p = 0.9999, p = 0.00001, or N → infinity). -BOB
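For anyone comparing against a textbook, here is a sanity check in the standard parameterization (n trials, k successes; note that this swaps the roles n and k play in the comment above).

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    # P(K = k) = C(n, k) * p^k * (1 - p)^(n - k):
    # k successes in n independent trials with success probability p
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Thought experiments like those suggested above
assert isclose(binom_pmf(10, 10, 0.9999), 0.9999 ** 10)       # success nearly certain
assert isclose(binom_pmf(0, 10, 0.00001), (1 - 0.00001) ** 10)  # success nearly impossible
assert isclose(sum(binom_pmf(k, 10, 0.3) for k in range(11)), 1.0)  # pmf sums to 1
```

Whichever letters are used for the number of trials and the number of successes, the exponent on p must match the success count and the binomial coefficient must choose that count out of the number of trials.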