Algorithmic Thinking: A Problem-Based Introduction Homepage for the book Algorithmic Thinking: A Problem-Based Introduction

Solving a Segment Tree Problem, World of World of Warcraft: The Conclusion

Hi everyone,

Last week, we started solving a segment tree problem from the 2010 Canadian Computing Olympiad (CCO). To get up to speed:

  1. Refresh yourself on Chapter 7 of the Algorithmic Thinking book. That’s all the segment tree material we’ll need.
  2. Check out Last week’s post.

Last week, I presented some initial code for solving this thing. We used a segment tree in a cool way to be able to handle each of the three operations quickly, and to respond correctly to the “Q” operations.

You’ll recall though that we were stuck with three issues:

  1. We couldn’t support the “M” (modify rating) operation at all, because we didn’t know how to go from a friend’s name to their current rating. Without their current rating, we couldn’t find that old rating in the tree and zero it out. We need to add something to our code that lets us retrieve the rating for a specified friend; that is, to go from a friend to their rating.

  2. We could only partially support the “Q” operation, because we were outputting the $k$th-highest rating, not the friend with that rating. The problem was that we had no way to go from a rating to the friend with that rating. We need to add something to our code that lets us retrieve the friend that has a specified rating; that is, to go from a rating to the friend with that rating. Wait wait – what if there are multiple friends with the same rating? No, no: the problem description disallows that.

  3. We could only support ratings between 1 and 1 million. We need to support ratings up to 100 million – but we couldn’t, because, the way we were doing it, the segment tree would outstrip the available memory.

Let’s address the first two of these issues first. Then we’ll come back for the third.

Friend to Rating, Rating to Friend

To solve the first two issues, we need a way to go from a friend to their rating, and a way to go from a rating to the friend with that rating. We’re going to add two arrays to do this: friend_to_rating to go from a friend to their rating, and rating_to_friend to go the other way (from a rating to the corresponding friend). We’ll keep these arrays up-to-date whenever a rating changes, and we’ll use them to fully implement the “M” and “Q” operations.

Here’s the updated code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RATING 1000000
#define MAX_FRIENDS 1000000


typedef struct segtree_node {
  int left, right;
  int num_ratings; // Number of people with ratings in the range left..right
} segtree_node;

void init_segtree(segtree_node segtree[], int node,
                  int left, int right) {
  int mid;
  segtree[node].left = left;
  segtree[node].right = right;
  if (left == right)
    return;
  mid = (left + right) / 2;
  init_segtree(segtree, node * 2, left, mid);
  init_segtree(segtree, node * 2 + 1, mid + 1, right);
}

int query_segtree(segtree_node segtree[], int node, int k) {
  int in_right;

  if (segtree[node].left == segtree[node].right)
    return segtree[node].left;

  in_right = segtree[node * 2 + 1].num_ratings;
  if (in_right >= k)
    return query_segtree(segtree, node * 2 + 1, k);
  else
    return query_segtree(segtree, node * 2, k - in_right);
}

void bump_rating_segtree(segtree_node segtree[], int node, int rating,
                         int up_down) {
  int left_max_index;

  segtree[node].num_ratings += up_down;

  if (segtree[node].left == segtree[node].right)
    return;

  left_max_index = segtree[node * 2].right;
  if (rating <= left_max_index)
    bump_rating_segtree(segtree, node * 2, rating, up_down);
  else
    bump_rating_segtree(segtree, node * 2 + 1, rating, up_down);
}

int main(void) {
  static segtree_node segtree[MAX_RATING * 4 + 1];
  static int friend_to_rating[MAX_FRIENDS + 1];
  static int rating_to_friend[MAX_RATING + 1];
  int n, i, friend, rating, k, old_rating, result;
  char code;

  init_segtree(segtree, 1, 1, MAX_RATING);

  scanf("%d", &n);
  for (i = 0; i < n; i++) {
    scanf(" %c", &code);
    if (code == 'N') { // New friend
      scanf("%d%d", &friend, &rating);
      friend_to_rating[friend] = rating;
      rating_to_friend[rating] = friend;
      bump_rating_segtree(segtree, 1, rating, 1);
    } else if (code == 'M') { // Modify friend rating
      scanf("%d%d", &friend, &rating);
      old_rating = friend_to_rating[friend];
      bump_rating_segtree(segtree, 1, old_rating, -1);
      bump_rating_segtree(segtree, 1, rating, 1);
      friend_to_rating[friend] = rating;
      rating_to_friend[old_rating] = 0;
      rating_to_friend[rating] = friend;
    } else { // Query kth highest rating
      scanf("%d", &k);
      result = query_segtree(segtree, 1, k);
      printf("%d\n", rating_to_friend[result]);
    }
  }
  return 0;
}

All of the new action is in the main function. Our new arrays are defined near the beginning of the function (57-58). The rating_to_friend array has one index per rating, so we have another problem here if we try to support the full 100 million ratings (rather than only a million ratings). Don’t worry about that for now… remember that we’re going to solve that particular issue later.

Now let’s talk about the changes that we needed to make to the code for each operation.

OK! Two issues down, one to go.

Supporting 100 Million Ratings

(I flippin’ love this technique…)

It’s true that the ratings for friends can be between 1 and 100 million. Someone could have a rating of 1. Someone else could have a rating of 100 million. But – here’s the leverage we have – only up to 1 million of those ratings can actually be used. That’s because there are at most 1 million operations in the test case, and each operation can refer to at most one new rating.

So, only one million out of a possible 100 million ratings can actually be used; that is, a maximum of 1 per cent of the ratings. That’s very sparse usage of the ratings. Trying to support 100 million unique ratings, even if we could somehow do it, is therefore overkill.

Let’s go back to the test case from last week:

8
N 3 700
N 1 500
Q 2
N 7 1000
Q 2
M 3 45000
Q 2
Q 1

The only ratings in use here are 700, 500, 1000, and 45000. Only four ratings! Yet, with the techniques developed to this point, we’d need a segment tree to support the ratings from 1 to 45000. All but four of those ratings would go unused!

There’s no reason, however, for us to be stuck with those huge ratings. All that matters is that we keep the ratings the same relative to each other. As long as we know that 500 is the smallest rating, 700 is next, 1000 is next, and 45000 is the biggest, we’ll have lost nothing. It doesn’t matter what we call the ratings, as long as their order is maintained.

The idea, then, is to assign our own small rating to each of the original ratings. Our ratings will start at 1. We’ll use 1 for the smallest rating, 2 for the second-smallest rating, 3 for the third-smallest, and so on. Here, we’ll assign 1 to rating 500, 2 to rating 700, 3 to rating 1000, and 45000 to rating 4. Then we’ll translate as needed between the original, big ratings and our new, small ratings. Someone wants to change the 45000 rating? No — they’re really referring to our small, 4 rating. That is, we will translate a big rating from the outside to our own small rating that we use internally.

One problem: how do we know what all of the used ratings are? We need them all, up front, so that we can assign a small rating (1, 2, 3, etc.) to each of them. But we read each operation, one at a time, and then that operation’s info goes away. What do we do?

Simple: we process each operation twice. The first time we process the operations, we are just looking through the stuff to identify all of the ratings that are in play. We store the operations so that we can run through them again. (If we didn’t store them, then they’d be gone: there’s certainly no way to back up and read the operations again from the input.) The second time we process the operations, we actually perform them, translating between big ratings and small ratings as needed.

We’re ready at last to fully solve this problem. Here’s the final code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRIENDS 1000000
#define MAX_OPERATIONS 1000000

typedef struct segtree_node {
  int left, right;
  int num_ratings; // Number of people with ratings in the range left..right
} segtree_node;

typedef struct operation {
  char code;
  int arg1, arg2;
} operation;

void init_segtree(segtree_node segtree[], int node,
                  int left, int right) {
  int mid;
  segtree[node].left = left;
  segtree[node].right = right;
  if (left == right)
    return;
  mid = (left + right) / 2;
  init_segtree(segtree, node * 2, left, mid);
  init_segtree(segtree, node * 2 + 1, mid + 1, right);
}

int query_segtree(segtree_node segtree[], int node, int k) {
  int in_right;

  if (segtree[node].left == segtree[node].right)
    return segtree[node].left;

  in_right = segtree[node * 2 + 1].num_ratings;
  if (in_right >= k)
    return query_segtree(segtree, node * 2 + 1, k);
  else
    return query_segtree(segtree, node * 2, k - in_right);
}

void bump_rating_segtree(segtree_node segtree[], int node, int rating,
                         int up_down) {
  int left_max_index;

  segtree[node].num_ratings += up_down;

  if (segtree[node].left == segtree[node].right)
    return;

  left_max_index = segtree[node * 2].right;
  if (rating <= left_max_index)
    bump_rating_segtree(segtree, node * 2, rating, up_down);
  else
    bump_rating_segtree(segtree, node * 2 + 1, rating, up_down);
}

int compare(const void *v1, const void *v2) {
  const int n1 = *(const int *)v1;
  const int n2 = *(const int *)v2;
  return n1 - n2;
}

int main(void) {
  static segtree_node segtree[MAX_OPERATIONS * 4 + 1];
  static int friend_to_rating[MAX_FRIENDS + 1];
  static int rating_to_friend[MAX_OPERATIONS + 1];
  static operation operations[MAX_OPERATIONS + 1];
  static int all_ratings[MAX_OPERATIONS];
  int n, i, friend, rating, k, result, total_ratings;
  int rating_location, old_rating_location;

  init_segtree(segtree, 1, 1, MAX_OPERATIONS);

  scanf("%d", &n);
  total_ratings = 0;

  // Learn about the ratings used. Store the operations for later
  all_ratings[total_ratings++] = 0;
  for (i = 0; i < n; i++) {
    scanf(" %c", &operations[i].code);
    if (operations[i].code == 'N') {
      scanf("%d%d", &operations[i].arg1, &operations[i].arg2);
      all_ratings[total_ratings++] = operations[i].arg2;
    } else if (operations[i].code == 'M') {
      scanf("%d%d", &operations[i].arg1, &operations[i].arg2);
      all_ratings[total_ratings++] = operations[i].arg2;
    } else {
      scanf("%d", &operations[i].arg1);
    }
  }

  qsort(all_ratings, total_ratings, sizeof(int), compare);

  // Now we do the operations for real
  for (i = 0; i < n; i++) {
    if (operations[i].code == 'N') { // New friend
      friend = operations[i].arg1;
      rating = operations[i].arg2;
      rating_location = (int *)bsearch(&rating, all_ratings, total_ratings,
                                       sizeof(int), compare) - all_ratings;
      friend_to_rating[friend] = rating_location;
      rating_to_friend[rating_location] = friend;
      bump_rating_segtree(segtree, 1, rating_location, 1);
    } else if (operations[i].code == 'M') { // Modify friend rating
      friend = operations[i].arg1;
      rating = operations[i].arg2;
      old_rating_location = friend_to_rating[friend];
      bump_rating_segtree(segtree, 1, old_rating_location, -1);
      rating_location = (int *)bsearch(&rating, all_ratings, total_ratings,
                                       sizeof(int), compare) - all_ratings;
      bump_rating_segtree(segtree, 1, rating_location, 1);
      friend_to_rating[friend] = rating_location;
      rating_to_friend[old_rating_location] = 0;
      rating_to_friend[rating_location] = friend;
    } else { // Query kth highest rating
      k = operations[i].arg1;
      result = query_segtree(segtree, 1, k);
      printf("%d\n", rating_to_friend[result]);
    }
  }
  return 0;
}

We have a new type to store each operation as we read it (13-16). We use code for the operation’s code (N, M, or Q). We use arg1 and arg2 for the second and (for N and M) third piece of information from the operation’s line.

Again, the new stuff is in the main function, so let’s jump there now. Instead of one pass through the operations, we now have two. The first pass stores the operations as it reads them in the new operations array, and stores each rating in the new all_ratings array (85). That all_ratings array starts with a 0 rating (80), just so each real rating gets placed at an index that starts at 1 (not 0).

Now we’ve got all of the ratings stored in the all_ratings array. Our next step is to sort them (94). This gives us an array where each original rating in sorted order is assigned to the indices 1, 2, 3, and so on. The location (index) of an original rating in this array tells us the small rating that it corresponds to. For example, if all_ratings were [0, 500, 700, 1000, 45000], then the original rating of 1000 would correspond to our new, small rating of 3.

Now for the second pass through the ratings, this time performing them for real. Let’s see how the code for each operation has changed.

In Closing

In one way, the techniques we used today (an array to go from friend to rating, an array to go from rating to friend, a binary search on a sorted array of ratings) are nothing new. But combined with a segment tree, they enable us to solve a very challenging problem. Really, today’s material had nothing to do specifically with a segment tree; the data structure knowledge was already deployed last week. So if you’re ever struggling with a problem that a data structure seems to almost solve, think about whether you can just augment it a little to get it to do what you want. Of course, it’s possible that it’s the wrong data structure, but don’t throw away what you have until you’re sure!

Thank you for joining me! Until next time…