Algorithmic Thinking: A Problem-Based Introduction Homepage for the book Algorithmic Thinking: A Problem-Based Introduction

[Chapter 3, Dynamic Programming] Solving Baltic OI 2005- Ancient Manuscript

Hi everyone,

At the end of Chapter 3 of Algorithmic Thinking, I included one example of using memoization to compute the number of valid solutions to a problem. I’ve gotten a few requests to cover another example, so I’ll do that here! I’ll assume that you’re familiar with the memoization/dynamic programming material from Chapter 3.

The problem we’ll solve is Baltic OI 2005 - Ancient Manuscript. It’s a competitive programming problem from the Baltic Olympiad in Informatics (BOI).

In the book, I rewrote each problem for consistent presentation and editing, so I’ll do that here, too. I do encourage you to read the original problem statement so that you can practice reading problem statements in their original form. Following the problem description, we’ll look at some test cases, and then work our way through solving the problem. (In the language of competitive programming, what I’m providing here is often called a problem editorial. If you’re ever stuck on a competitive programming problem and are looking for a hint, searching for an editorial for the problem might help.)

The Problem

Baltic scientists are trying to understand an ancient word. Some characters of the ancient word are present, but others are damaged and are therefore unknown. A damaged character is written as an asterisk *.

The scientists want to recover the ancient word by fixing the damaged characters, and they know the rules for what the recovered word must look like:

In how many ways can the word be recovered?

Input

The input contains one test case, the information for which is spread over two lines as follows:

Output

Output the number of ways that the word can be recovered. The answer is guaranteed to fit in a long long signed integer.

The time limit for solving the test case is two seconds.

Two Examples

I encourage you to work through the sample inputs and outputs provided in the original problem description. I’ll work through two examples here.

First, consider:

1 1 1 1
a**

Here we are allowed at most one vowel in a row and at most one consonant in a row. The first * is therefore constrained to be a consonant: if we let it be a vowel, then we have two vowels in a row which is not allowed. With the first * replaced by a consonant, the second * is constrained to be a vowel. So we have a choice of 21 consonants for the first * and 5 vowels for the second *. If we multiply these, we get our answer: $21*5$, or 105 ways to recover the word. (The reason we multiply here is that for each choice of consonant, we have 5 choices of vowel. So for b we have 5 choices, for c we have 5 choices, for d we have 5 choices and so on. The total number of choices is therefore obtained by adding 5 a total of 21 times or, equivalently, multiplying 21 by 5.)

Second, consider:

1 2 1 1
a**

This time, we’re allowed two vowels in a row but only one equal vowel in a row. And, like last time, we’re allowed only one consonant in a row.

The first * can be a consonant. If we do that, then the second * must be a vowel. This gives us $21*5 = 105$ solutions.

But that’s not all, because the first * can be set to one of four vowels: e, i, o, or u. It can’t be a, because that would be two equal vowels in a row, which isn’t allowed. For each of these 4 vowels, we can choose any consonant for the second *, giving us $4*21 = 84$ more solutions.

The total number of solutions is therefore $105+84=189$.

What do Solutions Look Like?

Suppose we’re considering a character of the ancient word. That character tells us what the solutions are:

  1. If the character is a vowel (a, e, i, o, or u), then there’s no choice for what to do: we must use that specific vowel here. Using that vowel means that, when we continue, we’re allowed to use one fewer vowel in a row. If this vowel is the same as the previous character that we used, then we’re allowed to use one fewer equal vowel in a row as well; otherwise, we can use one less than the maximum number of equal vowels in a row. (For example, if we’re allowed to use 3 vowels in a row and 2 equal vowels in a row, and if the previous vowel was e and this vowel is e too, then after this value we’re allowed to use 2 vowels in a row and 1 equal vowel in a row.) As for consonants, since we have a vowel here, we’re back to being able to use the maximum number of consonants in a row and the maximum number of equal consonants in a row.

  2. If the character is a consonant, then there’s no choice for what to do: we must use that specific consonant here. Using that consonant means that, when we continue, we’re allowed to use one fewer consonant in a row. If this consonant is the same as the previous character that we used, then we’re allowed to use one fewer equal consonant in a row as well; otherwise, we can use one less than the maximum number of equal consonants in a row. As for vowels, since we have a consonant here, we’re back to being able to use the maximum number of vowels in a row and the maximum number of equal vowels in a row.

  3. If the character is a *, then we have a lot of options!

    • If the last character we used was a vowel, then we can try using the same vowel again.
    • We can try using each of the other vowels in turn.
    • If the last character we used was a consonant, then we can try using the same consonant again.
    • We can try using each of the other consonants in turn.

A Memoized Solution

In the book, I suggest working through these sorts of problems using a four-step procedure. I encourage you to work through each of those steps here for extra practice, but here I’ll cut to the chase and present a memoized solution. The key function is the solve function. It takes quite a few parameters; the purpose of each is provided in the code.

Unlike in the book, here I’m using some global variables. The solve function has so many parameters already that I didn’t want to add even more.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 15 // Max number of chars in word
#define MAX_CONSEC 4 // Max number of relevant consecutive chars
#define MAX_CHAR 256 // Maximum int value of vowels/consonants

#define VOWELS "aeiou"
#define CONSONANTS "bcdfghjklmnpqrstvwxyz"

int max_consec_con, max_consec_vow;
int max_consec_con_eq, max_consec_vow_eq;
char word[SIZE + 2];
long long memo[SIZE + 2][MAX_CONSEC + 1][MAX_CONSEC + 1][MAX_CONSEC + 1]\
              [MAX_CONSEC + 1][MAX_CHAR];

// Return 1 if ch is a vowel, 0 otherwise
int is_vowel(int ch) {
  return strchr(VOWELS, ch) != NULL;
}

// Return 1 if ch is a consonant, 0 otherwise
int is_consonant(int ch) {
  return strchr(CONSONANTS, ch) != NULL;
}


// Return the solution, given:
// i: the current position in word
// consec_con: the number of allowed consecutive consonants
// consec_vow: the number of allowed consecutive vowels
// consec_con_eq: the number of allowed consecutive equal consonants
// consec_vow_eq: the number of allowed consecutive equal vowels
// last: the last character of word that we processed

long long solve(int i, int consec_con, int consec_vow,
                int consec_con_eq, int consec_vow_eq, int last) {
  long long result = 0;

  // Constraints not satisfied -> 0 solutions
  if (consec_con < 0 || consec_vow < 0 ||
      consec_con_eq < 0 || consec_vow_eq < 0)
    return 0;

  // Processed entire word -> found one solution
  if (i == 0)
    return 1;

  // Answer already in memo -> return answer immediately
  if (memo[i][consec_con][consec_vow][consec_con_eq][consec_vow_eq][last] != -1)
    return memo[i][consec_con][consec_vow][consec_con_eq][consec_vow_eq][last];

  // Current char is a vowel -> have to use it
  if (is_vowel(word[i])) {
    if (word[i] == last)
      result = solve(i - 1, max_consec_con, consec_vow - 1,
                     max_consec_con_eq, consec_vow_eq - 1, word[i]);
    else
      result = solve(i - 1, max_consec_con, consec_vow - 1,
                     max_consec_con_eq, max_consec_vow_eq - 1, word[i]);
    memo[i][consec_con][consec_vow][consec_con_eq][consec_vow_eq][last] = result;
    return result;
  }

  // Current char is a consonant -> have to use it
  if (is_consonant(word[i])) {
    if (word[i] == last)
      result = solve(i - 1, consec_con - 1, max_consec_vow,
                     consec_con_eq - 1 , max_consec_vow_eq, word[i]);
    else
      result = solve(i - 1, consec_con - 1, max_consec_vow,
                     max_consec_con_eq - 1, max_consec_vow_eq, word[i]);
    memo[i][consec_con][consec_vow][consec_con_eq][consec_vow_eq][last] = result;
    return result;
  }

  // Here we know that current char is '*'. Fill it in all possible ways

  if (is_vowel(last))
    result += solve(i - 1, max_consec_con, consec_vow - 1,
                    max_consec_con_eq, consec_vow_eq - 1, last);

  for (int v = 0; v < strlen(VOWELS); v++)
    if (VOWELS[v] != last)
      result += solve(i - 1, max_consec_con, consec_vow - 1,
                      max_consec_con_eq, max_consec_vow_eq - 1, VOWELS[v]);

  if (is_consonant(last))
    result += solve(i - 1, consec_con - 1, max_consec_vow,
                    consec_con_eq - 1, max_consec_vow_eq, last);

  for (int c = 0; c < strlen(CONSONANTS); c++)
    if (CONSONANTS[c] != last)
      result += solve(i - 1, consec_con - 1, max_consec_vow,
                      max_consec_con_eq - 1, max_consec_vow_eq, CONSONANTS[c]);

  memo[i][consec_con][consec_vow][consec_con_eq][consec_vow_eq][last] = result;
  return result;
}


int main(void) {
  scanf("%d%d", &max_consec_vow_eq, &max_consec_vow);
  scanf("%d%d\n", &max_consec_con_eq, &max_consec_con);

  // Starting word at index 1 so that I can use 0 to indicate
  // that the entire string has been processed
  gets(word + 1);

  memset(memo, -1, sizeof(memo));
  long long res = solve(strlen(word + 1), max_consec_con, max_consec_vow,
                        max_consec_con_eq, max_consec_vow_eq, ' ');
  printf("%lld\n", res);
  return 0;
}

The memo array here (15) is six dimensions. Yeah, six! The examples in the book use only one, two, or three dimensions, but remember from that discussion that the problem can dictate the number of dimensions that we need. Here, we have six things to remember:

  1. What index in the word are we looking at? We need this so that we know when we’re done processing the word.
  2. How many consonants in a row are allowed?
  3. How many vowels in a row are allowed?
  4. How many equal consonants in a row are allowed?
  5. How many equal vowels in a row are allowed?
  6. What was the last character that we used? We need this so that we can make the next character the same or different than this character, as needed.

The final dimension of memo is a character, not an integer, so you might wonder why we’re using the value 256 for its number of elements. The reason is that, in C, characters are stored and can be used as small integers. The lowercase characters that we’re using are all less than 256. (You could even scale them to the numbers 0-25 and save some space on this dimension. Try it!)

In the solve function, we begin by checking whether the current state is invalid (42). It’s invalid if we go negative on the number of allowed or equal consonants/vowels in a row. If we’ve processed the entire word without any of those values going negative, then we’ve found one valid solution (47).

Before continuing and possibly doing unnecessary work, we check whether we’ve already solved this subproblem (51), returning immediately if so.

The rest of the code for this function determines the number of solutions based on whether the current character is a vowel, consonant, or *. The trickiest part of the code is getting the arguments straight in the recursive calls. For example, let’s consider the case when the current character is a vowel (55). There are two subcases:

  1. The last character and the current character are the same. Notice in this case that we subtract 1 from both the number of allowed vowels in a row and the number of allowed equal vowels in a row.
  2. The last character and the current character are different. Notice this time that while we still subtract 1 from the number of allowed vowels in a row, we now set the number of allowed equal vowels in a row to 1 less than the maximum (because we have used only 1 equal vowel in a row now!).

Do We Really Need Memoization?

This solution requires memoization to be accepted within the time limit. Without memoization, already this test case takes way too long on my laptop:

1 1 1 1
********

That’s because it’s very easy to end up in a repeated state. For example, if we use b for the first character, c for the second , and d for the third, we end up in the same state had we used c for the first character, b for the second, and d for the third.

Thank You

Thank you for reading. I hope this example was helpful. Please feel free to let me know how this worked out for you. I appreciate the ongoing feedback!