# Activity selection problem

The activity selection problem is a mathematical optimization problem concerning the selection of non-conflicting activities to perform within a given time frame, given a set of activities each marked by a start time (si) and finish time (fi). The problem is to select the maximum number of activities that can be performed by a single person or machine, assuming that a person can only work on a single activity at a time.

A classic application of this problem is in scheduling a room for multiple competing events, each having its own time requirements (start and end time), and many more arise within the framework of operations research.

## Formal definition

Assume there exist n activities with each of them being represented by a start time si and finish time fi. Two activities i and j are said to be non-conflicting if sifj or sjfi. The activity selection problem consists in finding the maximal solution set (S) of non-conflicting activities, or more precisely there must exist no solution set S' such that |S'| > |S| in the case that multiple maximal solutions have equal sizes.

## Optimal Solution

The activity selection problem is notable in that using a greedy algorithm to find a solution will always result in an optimal solution. A pseudocode sketch of the algorithm and a proof of the optimality of its result are included below.

### Algorithm

Sort the set of activities by finishing time (f[i])
S = {1}
f = f[1]
for i=2 to n
if s[i] ≥ f
S = S U i
f = f[i]
end if

### Proof of optimality

Let $S = \{1, 2, \ldots , n\}$ be the set of activities ordered by finish time. Thus activity 1 has the earliest finish time.

Suppose A is a subset of S is an optimal solution and let activities in A be ordered by finish time. Suppose that the first activity in A is k ≠ 1, that is, this optimal solution does not start with the "greedy choice." We want to show that there is another solution B that begins with the greedy choice, activity 1.

Let $B = (A \setminus \{k\}) \cup \{1\}$. Because $f_1 \leq f_k$, the activities in B are disjoint and since B has same number of activities as A, i.e., |A| = |B|, B is also optimal.

Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S, then $A^\prime = A \setminus \{1\}$ is an optimal solution to the activity-selection problem $S' = \{i \in S: s_i \geq f_1\}$.

Why? If we could find a solution B′ to S′ with more activities then A′, adding 1 to B′ would yield a solution B to S with more activities than A, contradicting the optimality.

### Weighted Activity Selection Problem

The generalized version of the activity selection problem involves selecting an optimal set of non-overlapping activities such that the total weight is maximized. Unlike the unweighted version, there is no greedy solution to the weighted activity selection problem. However, a dynamic programming solution can readily be formed using the following approach:[1]

Consider an optimal solution containing activity $k$. We now have non-overlapping activities on the left and right of $k$. We can recursively find solutions for these two sets because of optimal sub-structure. As we don't know $k$, we can try each of the activities. This approach leads to an $O(n^3)$ solution. This can be optimized further considering that for each set of activities in $(i, j)$, we can find the optimal solution if we had known the solution for $(i, t)$, where $t$ is the last non-overlapping interval with $j$ in $(i, j)$. This yields an $O(n^2)$ solution. This can be further optimized considering the fact that we do not need to consider all ranges $(i, j)$ but instead just $(1, j)$. The following algorithm thus yields an $O(n log(n))$ solution:

procedure WeightedActivitySelection( S : list of activities )
sort s by finish time
opt[0] = 0
for i = 1 to n
t = binary search to find activity with finish time <= start time for i
opt[i] = MAX(opt[i-1], opt[t] + w(i))
return opt[n]
end procedure