# Activity selection problem

The activity selection problem is a combinatorial optimization problem concerning the selection of non-conflicting activities to perform within a given time frame, given a set of activities each marked by a start time (si) and finish time (fi). The problem is to select the maximum number of activities that can be performed by a single person or machine, assuming that a person can only work on a single activity at a time.

A classic application of this problem is in scheduling a room for multiple competing events, each having its own time requirements (start and end time), and many more arise within the framework of operations research.

## Formal definition

Assume there exist n activities with each of them being represented by a start time si and finish time fi. Two activities i and j are said to be non-conflicting if sifj or sjfi. The activity selection problem consists in finding the maximal solution set (S) of non-conflicting activities, or more precisely there must exist no solution set S' such that |S'| > |S| in the case that multiple maximal solutions have equal sizes.

## Optimal Solution

The activity selection problem is notable in that using a greedy algorithm to find a solution will always result in an optimal solution. A pseudocode sketch of the iterative version of the algorithm and a proof of the optimality of its result are included below.

### Algorithm

 1 Greedy-Iterative-Activity-Selector(A, s, f):
2
3     Sort A by finish times stored in f'
4
5     S = {A[1]}
6     k = 1
7
8     n = A.length
9
10     for i = 2 to n:
11        if s[i] ≥ f[k]:
12            S = S U {A[i]}
13            k = i
14
15     return S


#### Explanation

Line 1: This algorithm is called Greedy-Iterative-Activity-Selector, because it is first of all a greedy algorithm, and then it is iterative. There's also a recursive version of this greedy algorithm.

• ${\displaystyle A}$ is an array containing the activities.
• ${\displaystyle s}$ is an array containing the start times of the activities in ${\displaystyle A}$.
• ${\displaystyle f}$ is an array containing the finish times of the activities in ${\displaystyle A}$.

Note that these arrays are indexed starting from 1 up to the length of the corresponding array.

Line 3: Sorts in increasing order of finish times the array of activities ${\displaystyle A}$ by using the finish times stored in the array ${\displaystyle f}$. This operation can be done in ${\displaystyle O(n\cdot \log _{2}n)}$ time, using for example merge sort, heap sort, or quick sort algorithms.

Line 5: Creates a set ${\displaystyle S}$ to store the selected activities, and initialises it with the first activity ${\displaystyle A[1]}$. Note that, since the ${\displaystyle A}$ has already been sorted according to the finish times in ${\displaystyle f}$, ${\displaystyle A[1]}$ is the activity with the smallest finish time.

Line 6: Creates a variable ${\displaystyle k}$ that keeps track of the index of the last selected activity.

Line 10: Starts iterating from the second element of that array ${\displaystyle A}$ up to its last element.

Line 11: If the start time ${\displaystyle s[i]}$ of the ${\displaystyle ith}$ activity (${\displaystyle A[i]}$) is greater or equal to the finish time ${\displaystyle f[k]}$ of the last selected activity (${\displaystyle A[k]}$), then ${\displaystyle A[i]}$ is compatible to the selected activities in the set ${\displaystyle S}$, and thus it can be added to ${\displaystyle S}$; this is what is done in line 12.

Line 13: The index of the last selected activity is updated to the just added activity ${\displaystyle A[i]}$.

### Proof of optimality

Let ${\displaystyle S=\{1,2,\ldots ,n\}}$ be the set of activities ordered by finish time. Thus activity 1 has the earliest finish time.

Suppose A is a subset of S is an optimal solution and let activities in A be ordered by finish time. Suppose that the first activity in A is k ≠ 1, that is, this optimal solution does not start with the "greedy choice." We want to show that there is another solution B that begins with the greedy choice, activity 1.

Let ${\displaystyle B=(A\setminus \{k\})\cup \{1\}}$. Because ${\displaystyle f_{1}\leq f_{k}}$, the activities in B are disjoint and since B has same number of activities as A, i.e., |A| = |B|, B is also optimal.

Once the greedy choice is made, the problem reduces to finding an optimal solution for the subproblem. If A is an optimal solution to the original problem S, then ${\displaystyle A^{\prime }=A\setminus \{1\}}$ is an optimal solution to the activity-selection problem ${\displaystyle S'=\{i\in S:s_{i}\geq f_{1}\}}$.

Why? If we could find a solution B′ to S′ with more activities then A′, adding 1 to B′ would yield a solution B to S with more activities than A, contradicting the optimality.

### Weighted Activity Selection Problem

The generalized version of the activity selection problem involves selecting an optimal set of non-overlapping activities such that the total weight is maximized. Unlike the unweighted version, there is no greedy solution to the weighted activity selection problem. However, a dynamic programming solution can readily be formed using the following approach:[1]

Consider an optimal solution containing activity ${\displaystyle k}$. We now have non-overlapping activities on the left and right of ${\displaystyle k}$. We can recursively find solutions for these two sets because of optimal sub-structure. As we don't know ${\displaystyle k}$, we can try each of the activities. This approach leads to an ${\displaystyle O(n^{3})}$ solution. This can be optimized further considering that for each set of activities in ${\displaystyle (i,j)}$, we can find the optimal solution if we had known the solution for ${\displaystyle (i,t)}$, where ${\displaystyle t}$ is the last non-overlapping interval with ${\displaystyle j}$ in ${\displaystyle (i,j)}$. This yields an ${\displaystyle O(n^{2})}$ solution. This can be further optimized considering the fact that we do not need to consider all ranges ${\displaystyle (i,j)}$ but instead just ${\displaystyle (1,j)}$. The following algorithm thus yields an ${\displaystyle O(nlog(n))}$ solution:

 1 Weighted-Activity-Selection(S):  // S = list of activities
2
3     sort S by finish time
4     opt[0] = 0
5
6     for i = 1 to n:
7         t = binary search to find activity with finish time <= start time for i
8         opt[i] = MAX(opt[i-1], opt[t] + w(i))
9
10    return opt[n]