User:Zachary kaplan/sandbox
Al-Nuwayri Article Evaluation:
This stub-class article could benefit from a great deal more editing; in general, it gives only the bare minimum about Al-Nuwayri's life, and no sources are cited for what little is provided. Although there are sources in the citations list, they are not tied to the article's claims, so the claims cannot be verified by following the links. Though the information in the article is not dated (since it is a historical biography), it is correct; however, that correctness is not easy to check. Additionally, the talk page is empty, which suggests that very few Wikipedians are interested in improving this article.
Article Editing Selection
- Option One: Al-Nuwayri. The article is currently a stub, so expanding it would make sense in the context of the course (and especially given the instructor).
- Option Two: scATAC-seq Bioinformatics Assays. An area where I may have more specialist knowledge than many other editors.
- Option Three: Stochastic Gradient Langevin Dynamics (SGLD). No article exists on this topic yet.
Stochastic Gradient Langevin Dynamics
Stochastic Gradient Langevin Dynamics (abbreviated as SGLD) is an optimization technique combining characteristics of Stochastic Gradient Descent, a Robbins-Monro optimization algorithm, and Langevin Dynamics, a mathematical extension of molecular dynamics models. Like Stochastic Gradient Descent, SGLD is an iterative optimization algorithm; it differs in that it injects additional noise into the stochastic gradient estimator used by SGD when optimizing a differentiable objective function.[1] Unlike traditional SGD, SGLD can be used for Bayesian learning, since the method produces samples from a posterior distribution over parameters given the available data. First described by Welling and Teh in 2011, the method has applications in many settings that require optimization, and is most notably applied in machine learning problems.
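The relationship to SGD can be made concrete in code. The following is a minimal sketch (the function names and structure are illustrative assumptions, not taken from the cited paper): the SGLD step uses the same drift term as a plain stochastic-gradient step on an estimated log-posterior gradient, plus Gaussian noise whose variance matches the step size.

```python
import numpy as np

def sgd_step(theta, grad_log_post, step_size):
    # Plain stochastic gradient ascent on an estimate of the log-posterior.
    return theta + 0.5 * step_size * grad_log_post

def sgld_step(theta, grad_log_post, step_size, rng):
    # Same drift term, plus N(0, step_size) noise; the injected noise is what
    # turns the optimizer into an approximate posterior sampler.
    noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post + noise

# Tiny usage example with made-up numbers.
rng = np.random.default_rng(0)
theta = np.zeros(3)
g = np.array([0.5, -1.0, 2.0])   # a stochastic estimate of the log-posterior gradient
print(sgd_step(theta, g, 1e-2))
print(sgld_step(theta, g, 1e-2, rng))
```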
Formal Definition
Given some parameter vector \(\theta\), its prior distribution \(p(\theta)\), and a set of data points \(X = \{x_i\}_{i=1}^{N}\), Stochastic Gradient Langevin Dynamics samples from the posterior distribution \(p(\theta \mid X) \propto p(\theta) \prod_{i=1}^{N} p(x_i \mid \theta)\) by updating the chain:

\[
\Delta\theta_t = \frac{\epsilon_t}{2}\left(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\right) + \eta_t
\]

where \(\eta_t \sim \mathcal{N}(0, \epsilon_t)\) is Gaussian noise, \(n\) is the size of the minibatch drawn at step \(t\), and the step sizes \(\epsilon_t\) satisfy the following conditions:

\[
\sum_{t=1}^{\infty} \epsilon_t = \infty, \qquad \sum_{t=1}^{\infty} \epsilon_t^{2} < \infty .
\]
For early iterations of the algorithm, each parameter update mimics Stochastic Gradient Descent; however, as the algorithm approaches a local minimum or maximum, the gradient shrinks toward zero, the injected noise dominates the update, and the chain produces samples surrounding the maximum a posteriori mode, allowing for posterior inference.
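As a concrete, runnable illustration of the update rule above (the toy model, prior, and constants below are assumptions chosen for demonstration, not from the cited paper), the following sketch uses SGLD to sample the posterior over the mean of a Gaussian with known unit variance, using minibatch gradient estimates and a decaying step size \(\epsilon_t = a(b+t)^{-\gamma}\), which satisfies both conditions for \(\gamma \in (0.5, 1]\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1,000 observations from a Gaussian with unknown mean and unit variance.
N, n = 1_000, 50                          # data set size and minibatch size
data = rng.normal(loc=2.0, scale=1.0, size=N)

theta = 0.0                               # initial parameter value
prior_std = 10.0                          # assumed N(0, prior_std^2) prior on theta
a, b, gamma = 1e-3, 10.0, 0.55            # illustrative step-size schedule constants
samples = []

for t in range(1, 5001):
    eps_t = a * (b + t) ** (-gamma)       # decreasing step size
    batch = rng.choice(data, size=n, replace=False)

    grad_log_prior = -theta / prior_std**2
    grad_log_lik = (N / n) * np.sum(batch - theta)   # scaled minibatch gradient

    noise = rng.normal(0.0, np.sqrt(eps_t))
    theta += 0.5 * eps_t * (grad_log_prior + grad_log_lik) + noise
    samples.append(theta)

# Early iterations behave like SGD moving toward the mode; later iterations are
# kept as approximate posterior samples.
posterior = np.array(samples[1000:])
print(f"posterior mean ~ {posterior.mean():.3f}, std ~ {posterior.std():.3f}")
```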
Application
SGLD is applicable in any optimization context for which it is desirable to quickly obtain posterior samples instead of only a maximum a posteriori mode. In doing so, the method retains the computational efficiency of stochastic gradient descent relative to traditional gradient descent, while providing additional information about the landscape around the critical point of the objective function. In practice, SGLD can be applied to the training of neural networks in deep learning, a task in which the method provides a distribution over model parameters. By introducing information about the variance of these parameters, SGLD provides a way to characterize the generalizability of these models at certain points in training.[2] Additionally, obtaining samples from a posterior distribution permits uncertainty quantification by means of credible intervals, a feature that is not possible using traditional stochastic gradient descent.
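As a small illustration of this last point (the linear model and the stand-in parameter samples below are assumptions, not from the cited sources), posterior samples produced by SGLD can be turned into interval estimates for a prediction by evaluating the model under each sampled parameter vector and taking percentiles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for SGLD output: 2,000 sampled weight vectors for a linear model
# with 3 features (in practice these would come from the sampler's later iterations).
param_samples = rng.normal(loc=[1.0, -0.5, 0.3], scale=0.05, size=(2000, 3))

x_new = np.array([0.2, 1.5, -0.7])                   # a new input to predict for
preds = param_samples @ x_new                        # one prediction per posterior sample

mean_pred = preds.mean()
lower, upper = np.percentile(preds, [2.5, 97.5])     # 95% credible interval
print(f"prediction {mean_pred:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```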
References
- ^ "Bayesian Learning via Stochastic Gradient Langevin Dynamics" (PDF).
- ^ Chaudhari, Pratik; Choromanska, Anna; Soatto, Stefano; LeCun, Yann; Baldassi, Carlo; Borgs, Christian; Chayes, Jennifer; Sagun, Levent; Zecchina, Riccardo (2017). "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys". ICLR 2017. arXiv:1611.01838.