Talk:Bayesian information criterion: Difference between revisions

Browse history interactively

← Previous edit Next edit →

Content deleted Content added

VisualWikitext

Inline

Revision as of 01:48, 5 October 2008

SIC vs BIC

As far as I know, SC and BIC are different. What is described is BIC.

The formula given matches that derived by Schwarz (1978)

There does some to be some confusion here though, as e.g. Bengtsson and Cavanaugh define SIC as given here, and BIC differently.

Can anyone provide some authoritative references to the modern use of the terms BIC and SIC?

--Ged.R 15:18, 22 January 2007 (UTC)[reply]

Schwartz criterion

I would like to redirect Schwartz criterion to Schwartz set rather than here, since this is a term used in voting theory. Is there any objection to this? It seems to me that it only redirects here in case of a spelling mistake. CRGreathouse 02:21, 20 July 2006 (UTC)[reply]

Well, unfortunately this is a very frequent spelling mistake. There are loads of books who use the wrong spelling. And the Schwarz Bayesian IC is rather important afaik. I'd prefer to leave the redirection like this or to create a disambiguation page... Gtx, Frank1101 11:00, 20 July 2006 (UTC)[reply]

From what I can tell (and what I was taught in my MS in stats program) BIC is the more common moniker for this. I think the article should reflect this. --Chrispounds 00:51, 29 October 2006 (UTC)[reply]

I agree with Chrispounds, and so does google:

"Bayesian information criterion": 110,000
"Schwarz criterion": 59,600
"Schwarz information criterion": 15,400
"Schwartz criterion": 11,900
"Schwartz information criterion": 531

Any objection to the article being renamed to "Bayesian information criterion", replacing the current redirect with no history? John Vandenberg 07:24, 31 October 2006 (UTC)[reply]

I have made a proposal to reduce the confusion between Schwartz set and Schwarz criterion on Talk:Schwartz set#Schwarz criterion. John Vandenberg 01:14, 9 November 2006 (UTC)[reply]

Linear Model expression

The second formula:

Under the assumption that the model errors or disturbances are normally distributed, this becomes:
 $\mathrm {SIC} =n\ln \left({\mathrm {RSS}  \over n}\right)+k\ln(n).\$

seems wrong to me, $-2\cdot \ln {L}=\left({\mathrm {RSS} \over \sigma ^{2}}\right)$ , right? And not $n\ln \left({\mathrm {RSS} \over n}\right)$ as stated here. --Ged.R 15:18, 22 January 2007 (UTC)[reply]

--

n\ln \left({\mathrm {RSS}  \over n}\right)

is correct because we are dealing with the maximized likelihood. For a linear model, we have

{\hat {\sigma }}^{2}={\mathrm {RSS}  \over n}

. The loglikelihood is of the form:

l(\beta ,\sigma ^{2};Y)=-{\frac {n}{2}}\log(2\pi \sigma ^{2})-{\frac {1}{2\sigma ^{2}}}\sum _{i=1}^{n}\varepsilon _{i}^{2}

Evaluating this at the maximum likelihood estimates for

\sigma ^{2}

and

\beta

, we obtain:

l({\hat {\beta }},{\hat {\sigma }}^{2};Y)=-{\frac {n}{2}}\log(2\pi {\hat {\sigma }}^{2})-{\frac {1}{2{\hat {\sigma }}^{2}}}\sum _{i=1}^{n}{\hat {\varepsilon }}_{i}^{2}

=-{\frac {n}{2}}\log(2\pi {\mathrm {RSS}  \over n})-{\frac {1}{2}}{\frac {n}{\mathrm {RSS} }}\sum _{i=1}^{n}{\hat {\varepsilon }}_{i}^{2}

=-{\frac {n}{2}}\log(2\pi {\mathrm {RSS}  \over n})-{\frac {1}{2}}{\frac {n}{\mathrm {RSS} }}\mathrm {RSS}

=-{\frac {n}{2}}\log(2\pi {\mathrm {RSS}  \over n})-{\frac {n}{2}}

This gives the above expression for

-2\cdot \ln {L}

given previously, up to an additive constant that depends only on

n

.

Wolf87 (talk) 01:40, 5 October 2008 (UTC)[reply]

Bayesian?

This seems to be rather unbayesian, notably in the use of maximum likelihood, no prior distribution, the absence of any integration, and more. Compare this with Bayes factor. --Henrygb 17:08, 15 March 2007 (UTC)[reply]

It's Bayesian to the extent that it represents an approximation to integrating over the detailed parameters of the model (which are assumed to have a flat prior), to give the marginal likelihood for the model as a whole. The argument is that in the limit of infinite data, the BIC would approach the Bayesian marginal likelihood. That contrasts with the Akaike criterion, attempts to find the most probable model parametrisation, rather than the most probable model. It also contrasts with frequestists, who cannot integrate over nuisance parameters to compute marginal likelihoods.

But I'd agree, the article should spell out much more clearly how, exactly, BIC is an approximation to the Bayesian marginal likelihood. Jheald 18:00, 15 March 2007 (UTC).[reply]

What is the "dependent variable"?

I am confused by this sentence:

"It is important to keep in mind that the BIC can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared."

What is the dependent variable? And it unclear why "variable" is singular and everything else is plural.

Imran 09:02, 12 April 2007 (UTC)[reply]

This formula for BIC may potentially confuse people who read the AIC entry.

The version of BIC as described here is not compatible with the definition of AIC in wikipedia. There is a divisor n stated with BIC, but not AIC in the Wikipedia entries. It would save confusion if they were consistently defined!

I would favour not dividing by n: i.e.

BIC = -2log L + k ln(n)

AIC = -2log L + 2k

One can then clearly compare the two, and see they are similar for small n, but BIC favours more parsiminious models for large n. —The preceding unsigned comment was added by 128.243.220.42 (talk) 13:47, 10 May 2007 (UTC).[reply]

In fact I have noticed that the formula was only changed recently on 21st April, 2007. It really needs changing back I think to what it was before!

--

I also believe that the definition without n is more common. See for example http://xxx.adelaide.edu.au/pdf/astro-ph/0701113, which gives a lucid, accessible review and comparison of the AIC, AICc, BIC and Deviance Information Criterion (DIC).

Every paper I have seen has it without the n.

The standard simplification for using $\chi ^{2}$ for model selection has been pointed out above, namely that $-2\log L=\chi ^{2}$ . I think this is worth including on the page, as I had to go look in several journal articles to satisfy myself that this is the proper definition of log-likelihood.

Velocidex 12:54, 25 June 2007 (UTC)[reply]

Definition of L

Hi,

Is L in the formula for the BIC really the log-likelihood? It seems to me that L is the likelihood, s.t. ln L would be the log-likelihood and the -2 ln L term is the same term as in the AIC. Am I missing something?

Mpas76 01:05, 17 October 2007 (UTC)[reply]

I think you are right. L is the likelihood function and -2*ln(L) is the same as that in AIC formula. —Preceding unsigned comment added by Shaohuawu (talk • contribs) 16:37, 27 October 2007 (UTC)[reply]

@@ Line 30: / Line 30: @@
 --[[User:Ged.R|Ged.R]] 15:18, 22 January 2007 (UTC)
+--
-<math>n\ln\left({\mathrm{RSS} \over n}\right)</math> is correct because we are dealing with the maximized likelihood. For a linear model, we have <math>\hat{\sigma}^2 = {\mathrm{RSS} \over n}</math>. The loglikelihood is of the form:
+:<math>n\ln\left({\mathrm{RSS} \over n}\right)</math> is correct because we are dealing with the maximized likelihood. For a linear model, we have <math>\hat{\sigma}^2 = {\mathrm{RSS} \over n}</math>. The loglikelihood is of the form:
 :<math>l(\beta, \sigma^2; Y) = -\frac{n}{2} \log (2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n \varepsilon_i^2</math>
-Evaluating this at the maximum likelihood estimates for <math>\sigma^2</math> and <math>\beta</math>, we obtain:
+:Evaluating this at the maximum likelihood estimates for <math>\sigma^2</math> and <math>\beta</math>, we obtain:
-:<math>l(\hat{\beta}, \hat{\sigma}^2; Y) = -\frac{n}{2} \log (2 \pi \hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2} \sum_{i=1}^n \hat{\varepsilon}_i^2</math>
+::<math>l(\hat{\beta}, \hat{\sigma}^2; Y) = -\frac{n}{2} \log (2 \pi \hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2} \sum_{i=1}^n \hat{\varepsilon}_i^2</math>
-:<math>= -\frac{n}{2} \log (2 \pi {\mathrm{RSS} \over n}) - \frac{1}{2} \frac{n}{\mathrm{RSS}} \sum_{i=1}^n \hat{\varepsilon}_i^2</math>
+::<math>= -\frac{n}{2} \log (2 \pi {\mathrm{RSS} \over n}) - \frac{1}{2} \frac{n}{\mathrm{RSS}} \sum_{i=1}^n \hat{\varepsilon}_i^2</math>
-:<math>= -\frac{n}{2} \log (2 \pi {\mathrm{RSS} \over n}) - \frac{1}{2} \frac{n}{\mathrm{RSS}} \mathrm{RSS}</math>
+::<math>= -\frac{n}{2} \log (2 \pi {\mathrm{RSS} \over n}) - \frac{1}{2} \frac{n}{\mathrm{RSS}} \mathrm{RSS}</math>
-:<math>= -\frac{n}{2} \log (2 \pi {\mathrm{RSS} \over n}) - \frac{n}{2}</math>
+::<math>= -\frac{n}{2} \log (2 \pi {\mathrm{RSS} \over n}) - \frac{n}{2}</math>
-This gives the above expression for <math>-2 \cdot \ln{L}</math> given previously, up to an additive constant that depends only on <math>n</math>.
+:This gives the above expression for <math>-2 \cdot \ln{L}</math> given previously, up to an additive constant that depends only on <math>n</math>.
-[[User:Wolf87|Wolf87]] ([[User talk:Wolf87|talk]]) 01:40, 5 October 2008 (UTC)
+:[[User:Wolf87|Wolf87]] ([[User talk:Wolf87|talk]]) 01:40, 5 October 2008 (UTC)
 ==Bayesian?==