= Dwell time (information retrieval) =

In information retrieval, dwell time is the time a user spends viewing a document or other piece of content after clicking a link on a search engine results page (SERP) or after being served it in a "feed" in environments such as Instagram or TikTok. Formal descriptions of modern dwell time systems first appear in patent filings from early 2010. The term "dwell time," however, did not enter common use until a year later, when it was popularized by Duane Forrester (a Senior Project Manager at Bing) in 2011. It gained currency in the multimedia library context in 2012, when YouTube adopted it as a primary ranking coefficient in its then-new algorithm.

Notation for expressing dwell time concepts mathematically varies, particularly between private-sector "white paper" output from sources like Microsoft Research or Google/YouTube and peer-reviewed academic work.

== Basics of Dwell Time and Early Modeling Approaches ==
Dwell time is the interval between the moment a user clicks a search engine result (or is served a piece of content) and the moment the user returns from or abandons that content. It serves as an indicator of whether the result or content satisfied the user's intent: short dwell times indicate that the intent was not satisfied by viewing the result, while long dwell times indicate that it was. Google has used dwell time in page ranking, and YouTube adopted dwell time as its dominant ranking coefficient in 2012.

Implementations of the dwell time concept vary and are often proprietary or guarded as trade secrets, but academic researchers have shared several. Among the earliest well-documented formulations is the following, from Karl T. Muth's February 2010 seminar at the University of Chicago:

$R_A(i) = \sum_{j=1}^{n} \left( \frac{D_{ij}}{L_i} \cdot \alpha_j \right) + \beta_R$

where $R_A(i)$ is the resulting score for media item $i$, the sum runs over $n$ users, and $D_{ij}$ represents the attention units spent by user $j$ on media item $i$. The variable $L_i$ is the total duration of the media item, so that $D_{ij}/L_i$ is a "completion ratio," while $\alpha_j$ is a simple weight assigned to the user's historical retention patterns. Finally, $\beta_R$ is the contextual relevance of the item to the search query. Dwell time systems as adopted by YouTube in 2012 and TikTok in 2016 refined this relevance iteratively over successive time steps ($t_x \to t_{x+1} \to t_{x+n}$) using observed user experience data.
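The formula above can be evaluated directly. The following is a minimal sketch, not a reproduction of any production system: the function name `attention_score`, the argument names, and the sample values are all assumptions made for illustration, with $D_{ij}$ and $L_i$ taken to be in seconds.

```python
from typing import Sequence


def attention_score(dwell: Sequence[float], length: float,
                    weights: Sequence[float], beta_r: float) -> float:
    """Evaluate R_A(i) = sum_j (D_ij / L_i) * alpha_j + beta_R for one item i.

    dwell   -- D_ij for each user j (hypothetical unit: seconds)
    length  -- L_i, total duration of the item, in the same unit
    weights -- alpha_j, one retention weight per user
    beta_r  -- beta_R, contextual relevance term added once, outside the sum
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if len(dwell) != len(weights):
        raise ValueError("dwell and weights must have the same length")
    return sum((d / length) * a for d, a in zip(dwell, weights)) + beta_r


# Hypothetical data: three users view a 200-second item.
# Completion ratios are 0.25, 1.0 and 0.5.
score = attention_score(dwell=[50.0, 200.0, 100.0], length=200.0,
                        weights=[1.0, 0.5, 2.0], beta_r=0.25)
print(score)
```

With these illustrative inputs the weighted completion ratios sum to 1.75, and adding $\beta_R = 0.25$ gives a score of 2.0. Note that $\beta_R$ is added once per item rather than once per user, as in the formula.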

== Statistical Modeling of Dwell Time ==
Dwell time distributions observed in web search logs are strictly positive and highly right-skewed: users most frequently exhibit very short reading times, while a long tail of users stays for extended periods. Empirical studies in information retrieval indicate that dwell time $t$ is therefore modeled more accurately by right-skewed distributions on the positive reals, such as the Weibull distribution or the Gamma distribution, than by a normal distribution.

In a Weibull model, the probability density function for a user's dwell time is given by:

$f(t; k, \lambda) = \frac{k}{\lambda} \left(\frac{t}{\lambda}\right)^{k-1} e^{-(t/\lambda)^k}$

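where $k > 0$ is the shape parameter and $\lambda > 0$ is the scale parameter. The shape parameter governs how the instantaneous likelihood of leaving (the hazard rate) changes over time: for $k < 1$ it decreases with $t$, for $k = 1$ the model reduces to the exponential distribution with a constant hazard rate, and for $k > 1$ it increases with $t$. Research has reported that $k$ can help distinguish between different types of user tasks. For instance, informational queries exhibit different decay rates in user attention than navigational queries, which allows search algorithms to infer user intent from the shape of the dwell time distribution.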
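The role of the shape parameter can be made concrete with a short sketch. The code below evaluates the Weibull density given above, together with the corresponding hazard rate $h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}$, which is the density divided by the survival function $e^{-(t/\lambda)^k}$. The function names and the parameter values are illustrative assumptions, not fitted values from any search log.

```python
import math


def weibull_pdf(t: float, k: float, lam: float) -> float:
    """Weibull density f(t; k, lambda) for a strictly positive dwell time t."""
    if t <= 0 or k <= 0 or lam <= 0:
        raise ValueError("t, k and lam must all be positive")
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def weibull_hazard(t: float, k: float, lam: float) -> float:
    """Hazard rate h(t) = f(t) / S(t), where S(t) = exp(-(t / lambda)^k)."""
    if t <= 0 or k <= 0 or lam <= 0:
        raise ValueError("t, k and lam must all be positive")
    return (k / lam) * (t / lam) ** (k - 1)


# Hypothetical parameters: scale of 10 seconds, three different shapes.
for k in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, k, 10.0)
    late = weibull_hazard(10.0, k, 10.0)
    trend = "falls" if late < early else "rises" if late > early else "is constant"
    print(f"k={k}: hazard {trend} between t=1 and t=10")
```

Under these assumed values, the hazard falls over time for $k = 0.5$, is constant for $k = 1$, and rises for $k = 2$. In practice $k$ and $\lambda$ would be estimated from observed dwell times, for example by maximum likelihood, which this sketch does not attempt.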

== Integration into Probabilistic Click Models ==
Standard click models, such as the Position-Based Model (PBM) or the Cascade Model, traditionally treat clicks as binary events: a click is either present or absent. To refine relevance estimation, more recent models introduce dwell time as a continuous variable and use it to estimate the probability of user satisfaction $S$ given a click $C$.

A simple threshold-based approach treats a click as a "satisfactory click" (often termed a "long click") if the dwell time $t$ meets or exceeds a query-dependent threshold $\tau$:

$P(S=1 | C=1, t) = \begin{cases} 1 & \text{if } t \ge \tau \\ 0 & \text{if } t < \tau \end{cases}$
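As a minimal sketch of this rule, the function below labels a click as satisfied exactly when the dwell time reaches the threshold. The function name, the query categories, and the threshold values in `HYPOTHETICAL_TAU` are assumptions chosen for illustration; real thresholds would be tuned per query type.

```python
def long_click_label(dwell_seconds: float, tau: float) -> int:
    """Return P(S=1 | C=1, t) under the binary rule: 1 if t >= tau, else 0."""
    if dwell_seconds < 0 or tau < 0:
        raise ValueError("dwell_seconds and tau must be non-negative")
    return 1 if dwell_seconds >= tau else 0


# Hypothetical query-dependent thresholds, in seconds.
HYPOTHETICAL_TAU = {"navigational": 10.0, "informational": 30.0}

# (query type, observed dwell time in seconds) for three clicks.
clicks = [("navigational", 12.0), ("informational", 12.0), ("informational", 30.0)]
labels = [long_click_label(t, HYPOTHETICAL_TAU[q]) for q, t in clicks]
print(labels)
```

The same 12-second dwell time counts as a long click for the navigational query but not for the informational one, which shows why the threshold is query-dependent. A dwell time exactly equal to $\tau$ is labeled satisfied, matching the $t \ge \tau$ case.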

Because a strict binary threshold fails to capture the nuance of moderate dwell times, more advanced frameworks, such as the Time-Aware Click Model (TCM), treat the probability of satisfaction as a continuous function of time. These models use a cumulative distribution function (CDF) to assign a fractional relevance score to documents:

$P(S=1 | C=1, t) = 1 - e^{-\alpha t}$

where $\alpha > 0$ is a rate constant tuned to the specific search vertical or document length. This expression is the CDF of an exponential distribution: it equals 0 at $t = 0$ and approaches 1 as $t$ grows, with the remaining gap $e^{-\alpha t}$ decaying exponentially. By scoring $t$ in this way, search algorithms can penalize "pogo-sticking" behavior (where a user rapidly returns to the SERP or "home feed") while proportionately rewarding content elements or media items that retain user attention.
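The continuous score can be sketched in a few lines. The function name and the choice of $\alpha$ below are assumptions for illustration only: $\alpha = \ln 2 / 30$ is picked so that a 30-second dwell time maps to a score of 0.5, not because any published model uses that value.

```python
import math


def satisfaction_probability(dwell_seconds: float, alpha: float) -> float:
    """Return P(S=1 | C=1, t) = 1 - exp(-alpha * t)."""
    if dwell_seconds < 0:
        raise ValueError("dwell_seconds must be non-negative")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return -math.expm1(-alpha * dwell_seconds)


# Hypothetical rate constant: a 30-second dwell yields a score of 0.5.
alpha = math.log(2) / 30.0

# A quick bounce, a moderate visit, and a long visit.
scores = {t: satisfaction_probability(t, alpha) for t in (3.0, 30.0, 120.0)}
for t, p in scores.items():
    print(f"t={t:>5.1f}s  P(S=1)={p:.4f}")
```

Under this assumed $\alpha$, a 3-second bounce receives a score below 0.1, while a 120-second visit receives $1 - 2^{-4} = 0.9375$. Unlike the binary threshold, moderate dwell times receive intermediate credit.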
