Talk:Bayesian information criterion: Difference between revisions
Revision as of 23:42, 3 January 2011

WikiProject Statistics: This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. This article has not yet received a rating on Wikipedia's content assessment scale or on the importance scale.
WikiProject Mathematics: This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. This article has been rated as Start-class on Wikipedia's content assessment scale and as Mid-priority on the project's priority scale.

SIC vs BIC

As far as I know, SC and BIC are different. What is described is BIC.

The formula given matches that derived by Schwarz (1978).
There does seem to be some confusion here though, as e.g. Bengtsson and Cavanaugh define SIC as given here, and BIC differently.
Can anyone provide some authoritative references to the modern use of the terms BIC and SIC?
--Ged.R 15:18, 22 January 2007 (UTC)[reply]
Should "SIC" in the first formula be "BIC"? The acronym "SIC" is never defined in the article.

The first formula is bizarre. Geez. "The formula for the BIC is exp(-SIC/2)"??? What's up with that?

The entire article sounds like it was written by wannabe experts who aren't exactly certain of what they're talking about. Read it from the perspective of someone who is trying to ascertain the basic formula for BIC. (That is my situation: I have a software package that purports to calculate AIC and BIC scores, but it doesn't say exactly what it is calculating. Neither does this article.) It introduces all these terms, x̄ and so on, and never says what they are. A "constant becoming trivial" is a pretty weird notion if you ask me; you should keep the constants that become trivial. This could potentially be a very important article as statistical model selection invades more and more fields, but as it currently stands it is useless as a first approximation. An article written by people who sort of know this stuff, for people who also sort of know this stuff, is useless.

81.231.127.12 (talk) 22:09, 18 December 2010 (UTC)[reply]

Schwartz criterion

I would like to redirect Schwartz criterion to Schwartz set rather than here, since this is a term used in voting theory. Is there any objection to this? It seems to me that it only redirects here in case of a spelling mistake. CRGreathouse 02:21, 20 July 2006 (UTC)[reply]

Well, unfortunately this is a very frequent spelling mistake. There are loads of books that use the wrong spelling. And the Schwarz Bayesian IC is rather important afaik. I'd prefer to leave the redirection like this or to create a disambiguation page... Gtx, Frank1101 11:00, 20 July 2006 (UTC)[reply]
From what I can tell (and what I was taught in my MS in stats program) BIC is the more common moniker for this. I think the article should reflect this. --Chrispounds 00:51, 29 October 2006 (UTC)[reply]
I agree with Chrispounds, and so does Google.
Any objection to the article being renamed to "Bayesian information criterion", replacing the current redirect with no history? John Vandenberg 07:24, 31 October 2006 (UTC)[reply]
I have made a proposal to reduce the confusion between Schwartz set and Schwarz criterion on Talk:Schwartz set#Schwarz criterion. John Vandenberg 01:14, 9 November 2006 (UTC)[reply]

Linear Model expression

The second formula,

:"Under the assumption that the model errors or disturbances are normally distributed, this becomes: BIC = n ln(σ̂_e²) + k ln(n)",

seems wrong to me; I derive a different expression, not the one stated there, right? --Ged.R 15:18, 22 January 2007 (UTC)[reply]

--

The expression BIC = n ln(σ̂_e²) + k ln(n) is correct because we are dealing with the maximized likelihood. For a linear model, we have y = Xβ + ε with ε ~ N(0, σ²I). The log-likelihood is of the form:

:ln L(β, σ²) = -(n/2) ln(2πσ²) - (1/(2σ²)) Σ_{i=1}^{n} (y_i - x_i′β)²

Evaluating this at the maximum likelihood estimates β̂ and σ̂² = RSS/n, we obtain:

:ln L(β̂, σ̂²) = -(n/2) ln(σ̂²) - (n/2)(1 + ln 2π)

This gives the above expression for BIC, up to an additive constant that depends only on n.
Wolf87 (talk) 01:40, 5 October 2008 (UTC)[reply]
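The derivation above can be checked numerically. A sketch in NumPy with synthetic data (all names and the simulated model are mine, purely for illustration): it verifies that -2 ln L at the Gaussian MLE equals n ln(σ̂²) plus a constant depending only on n.

```python
import numpy as np

# Simulate a simple linear model y = 2 + 3x + noise (hypothetical data).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Ordinary least squares fit, which is also the Gaussian MLE for beta.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = np.mean(resid**2)  # MLE of the error variance: RSS / n

# Exact maximized Gaussian log-likelihood.
lnL = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)

# -2 ln L should equal n*ln(sigma2_hat) plus the constant n*(1 + ln 2*pi),
# which depends only on n, not on the fitted model.
lhs = -2 * lnL
rhs = n * np.log(sigma2_hat) + n * (1 + np.log(2 * np.pi))
assert np.isclose(lhs, rhs)
```

So any two models fitted to the same n data points differ in -2 ln L exactly as they differ in n ln(σ̂²), which is why the constant can be dropped when comparing BIC values.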

Does anybody else find it fishy that the BIC here depends on the scaling of the data? Actually, wouldn't this be a problem using the likelihood function of any continuous domain probability distribution? —Preceding unsigned comment added by 216.15.124.160 (talk) 02:09, 6 December 2008 (UTC)[reply]

BIC does not depend upon the scaling of the data. BIC is defined only up to an additive constant that will be the same across all models being compared; that constant incorporates the scaling (at least in the linear model case given above) because any scaling factors come out as additive constants from the n ln(σ̂_e²) term. --Wolf87 (talk) 21:02, 14 March 2009 (UTC)[reply]
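This scale-invariance argument can be illustrated numerically. In the sketch below (synthetic data; the helper `bic_linear` is hypothetical, not from the article), rescaling the response shifts every candidate model's BIC by the same n ln(c²), so model comparisons are unaffected.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def bic_linear(y, X):
    """BIC = n*ln(sigma_hat^2) + k*ln(n) for a Gaussian linear model."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    n_obs, k = X.shape
    return n_obs * np.log(sigma2) + k * np.log(n_obs)

X1 = np.ones((n, 1))                   # intercept-only model
X2 = np.column_stack([np.ones(n), x])  # intercept + slope

c = 10.0  # arbitrary rescaling of the response
for X in (X1, X2):
    shift = bic_linear(c * y, X) - bic_linear(y, X)
    # Rescaling y by c multiplies sigma_hat^2 by c^2, so every model's
    # BIC shifts by the same additive constant n*ln(c^2).
    assert np.isclose(shift, n * np.log(c**2))
```

Because the shift is identical for both models, the BIC difference between them, which is all that matters for selection, is unchanged by the scaling.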

Bayesian?

This seems to be rather unbayesian, notably in the use of maximum likelihood, no prior distribution, the absence of any integration, and more. Compare this with Bayes factor. --Henrygb 17:08, 15 March 2007 (UTC)[reply]

It's Bayesian to the extent that it represents an approximation to integrating over the detailed parameters of the model (which are assumed to have a flat prior), to give the marginal likelihood for the model as a whole. The argument is that in the limit of infinite data, the BIC would approach the Bayesian marginal likelihood. That contrasts with the Akaike criterion, which attempts to find the most probable model parametrisation rather than the most probable model. It also contrasts with frequentists, who cannot integrate over nuisance parameters to compute marginal likelihoods.
But I'd agree, the article should spell out much more clearly how, exactly, BIC is an approximation to the Bayesian marginal likelihood. Jheald 18:00, 15 March 2007 (UTC).[reply]

What is the "dependent variable"?

I am confused by this sentence:

"It is important to keep in mind that the BIC can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared."

What is the dependent variable? And it is unclear why "variable" is singular while everything else is plural.

Imran 09:02, 12 April 2007 (UTC)[reply]

This formula for BIC may potentially confuse people who read the AIC entry.

The version of BIC as described here is not compatible with the definition of AIC in Wikipedia. There is a divisor n stated with BIC, but not with AIC, in the Wikipedia entries. It would save confusion if they were consistently defined!

I would favour not dividing by n, i.e.

BIC = -2 ln L + k ln(n)

AIC = -2 ln L + 2k

One can then clearly compare the two, and see that they are similar for small n, but that BIC favours more parsimonious models for large n. —The preceding unsigned comment was added by 128.243.220.42 (talk) 13:47, 10 May 2007 (UTC).[reply]
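The penalty comparison can be sketched in a few lines of illustrative code (the function names are mine). Setting k ln(n) = 2k gives a crossover at n = e² ≈ 7.4: below it the AIC penalty is larger, above it the BIC penalty dominates.

```python
import math

# Un-divided definitions, as favoured above.
def aic(lnL, k):
    return -2 * lnL + 2 * k

def bic(lnL, k, n):
    return -2 * lnL + k * math.log(n)

# With the same log-likelihood (taken as 0 so only penalties differ):
# at n = 7 (< e^2) BIC penalises less than AIC; at n = 8 (> e^2), more.
assert bic(0.0, 3, 7) < aic(0.0, 3)
assert bic(0.0, 3, 8) > aic(0.0, 3)
```

For realistic sample sizes (n in the hundreds or thousands), k ln(n) is far larger than 2k, which is the sense in which BIC favours parsimony.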

In fact, I have noticed that the formula was only changed recently, on 21 April 2007. It really needs changing back, I think, to what it was before!

--

I also believe that the definition without n is more common. See for example http://xxx.adelaide.edu.au/pdf/astro-ph/0701113, which gives a lucid, accessible review and comparison of the AIC, AICc, BIC and Deviance Information Criterion (DIC).

Every paper I have seen has it without the n.

The standard simplification used for model selection has been pointed out above, namely that for normally distributed errors -2 ln L = n ln(σ̂_e²) up to an additive constant. I think this is worth including on the page, as I had to go look in several journal articles to satisfy myself that this is the proper definition of the log-likelihood.

Velocidex 12:54, 25 June 2007 (UTC)[reply]

Definition of L

Hi,

Is L in the formula for the BIC really the log-likelihood? It seems to me that L is the likelihood, s.t. ln L would be the log-likelihood and the -2 ln L term is the same term as in the AIC. Am I missing something?

Mpas76 01:05, 17 October 2007 (UTC)[reply]


I think you are right. L is the likelihood function, and -2 ln(L) is the same as the corresponding term in the AIC formula. —Preceding unsigned comment added by Shaohuawu (talkcontribs) 16:37, 27 October 2007 (UTC)[reply]

Error variance

Possibly I haven't understood this properly, but surely the formula for the so-called 'error variance' in the article is wrong:

:σ̂_e² = (1/n) Σ_{i=1}^{n} (x_i - x̄)²

If the x's are datapoints, this appears to be the variance of the datapoints, whereas what we want is something like the RSS of the AIC article, presumably the mean squared error of the fitted model:

:σ̂_e² = (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²
93.96.236.8 (talk) 16:42, 11 September 2010 (UTC)[reply]
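The distinction being drawn here, the variance of the raw data versus the mean squared error of the fitted model, can be illustrated numerically. A sketch with synthetic data (all names and the simulated model are mine):

```python
import numpy as np

# Simulate data with a strong linear signal (hypothetical example).
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 5.0 * x + rng.normal(size=n)

# Fit an intercept + slope model by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

var_of_data = np.mean((y - y.mean()) ** 2)  # (1/n) * sum (y_i - ybar)^2
mse_of_fit = np.mean((y - X @ beta) ** 2)   # (1/n) * sum (y_i - yhat_i)^2

# For a model with real explanatory power the residual MSE is much smaller
# than the raw variance; only the residual MSE belongs in the BIC formula.
assert mse_of_fit < var_of_data
```

The two quantities coincide only for an intercept-only model, where the fitted value is the sample mean; for any model with predictors they differ, which is the reporter's point.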