# Talk:Cross entropy

WikiProject Statistics (Rated Start-class, Low-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

Start  This article has been rated as Start-Class on the quality scale.
Low  This article has been rated as Low-importance on the importance scale.
WikiProject Mathematics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
 Start Class
 Low Importance
Field: Probability and statistics
WikiProject Physics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Physics, a collaborative effort to improve the coverage of Physics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start  This article has been rated as Start-Class on the project's quality scale.
Low  This article has been rated as Low-importance on the project's importance scale.

## Untitled

This article uses the notation KL(p, q) and also DKL(p || m) when talking about Kullback-Leibler divergence. Are these notations two ways of expressing the same idea? If so, the article may want to indicate this equivalence.

The log-likelihood of the training data for a multinomial model is the same as the cross-entropy of the data. (Elements of Statistical Learning, page 32)

L(theta) = sum (all classes k) I(G=k) log Pr(G=k | X = x)

I guess "I(G=k)" is p and Pr(G=k | X=x) is q here.