CSSS Coding Competition

CSSS Coding Competition -- October 2002

Problem 1

A rule (x,y) consists of two strings of letters x and y. A rule (x,y) can be applied to a string s if x is a substring of s, in which case x is replaced by y in s. For example, the rule (ab,ba) can be applied to the string "abcab" in two ways: replacing the first occurrence of "ab" gives "bacab" and replacing the second occurrences of "ab" gives "abcba".

Given a set of rules, a source string, and a target string, the problem is to find the shortest sequence of rules that transform the source string into the target string.

The input file consists of: the number of rules, (four in this example); the rules, one on each line; the source string; and the target string.

The output file consists of: the source string; the intermediate strings obtained by applying one rule at a time; and the target string, with each string on a separate line.

Here is a typical input file:

        4
        ab ba
        ab bcd
        aa d
        db bb
        abab
        bbb

And this is the corresponding output:

        abab
        baab
        bdb
        bbb

Test Data

First:

4
ab bcd
ab ba
cdc a
d bb
abab
bbbba

Expected output:

abab
abbcd
babcd
bbcdcd
bbad
bbabb
bbbab
bbbba

Second:

5
a ab
ab ba
baa bab
ba ab
bab bcb
abaaaa
bcbcbcb

Expected output:

abaaaa
abbaaaa
babaaaa
bababaa
bababab
bcbabab
bcbcbab
bcbcbcb

Discussion

Things to watch out for in this problem are: exponential explosion (several new strings can be derived from each string generated) and looping. There may be "smart" solutions to this problem but there isn't enough time to try them out in a competition. The best strategy is to start with a "brute force" technique and hope that the test data sets are not too large.

Imagine a tree with the starting string at the root; the subtrees of a node are the result of applying each applicable rule to that node. The target string must be somewhere in the tree. Depth first search will almost certainly fail, since there will be many infinite paths in the tree. But breadth first search will find the target string and also provide the shortest path to it.

One way to implement breadth first search is to set up a queue. The first entry of the queue is the start string. We retrieve a queue entry, apply the rules to it, and insert the new strings into the queue. To avoid looping, we never put the same string into the queue twice. When the target string appears in the queue, we are finished.

Fortunately, I coded the solution and tried it before the competition. In its original form, my second test took several minutes of CPU time! I simplified the problem, and the second data set above requires about 20 seconds of CPU time, during which it puts 4,942 strings in the queue.

The remaining problem is to print out the intermediate strings. To do this, we add a pointer to each queue entry, pointing back to the string from which this string was derived. We can then use a recursive function (report() below) to print out the intermediate strings in the correct sequence.

Although this problem required the most code, I do not consider it the most difficult problem of the four (although that may be because I wrote it myself!). The solution uses standard algorithms and data structures, and can be written quite quickly. However, this is the only problem for which I actually introduced classes, because I felt that I needed an abstraction to avoid getting bogged down in too many levels of detail.

Code

#include <iostream.h>
#include <fstream.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

ifstream in;
ofstream out;

class Node
{
public:
   Node(char *str, Node *from = NULL, Node *next = NULL)
      : str(str), from(from), next(next) {}
   char *str;
   Node *from;
   Node *next;
};

class Queue
{
public:
   Queue(char *str)
   {
      first = new Node(str);
      last = first;
      curr = NULL;
   }
   Node *has(char *patt)
   {
      Node *p = first;
      while (p)
      {
         if (strcmp(p->str, patt) == 0)
            return p;
         p = p->next;
      }
      return NULL;
   }
   Node *next() 
   {
      if (curr == NULL)
         curr = first;
      else
         curr = curr->next;
      return curr;
   }
   void add(char *str, Node *from)
   {
      if (has(str))
         return;
      Node *n = new Node(str, from, NULL);
      last->next = n;
      last = n;
   }
   void show()
   {
      cout << "Queue: ";
      Node *p = first;
      while (p)
      {
         cout << p->str << " ";
         p = p->next;
      }
      cout << endl;
   }
private:
   Node *first;
   Node *last;
   Node *curr;
};

char *readstr()
{
   const int LEN = 200;
   char buffer[LEN];
   in >> buffer;
   int len = strlen(buffer);
   char *p = new char[len + 1];
   strcpy(p, buffer);
   return p;
}

const int MAX = 100;
char *src[MAX];
char *dst[MAX];
char *start;
char *target;

bool match(char *p, char *q)
{
   while (true)
   {
      if (*p == '\0')
      {
         return true;
      }
      if (*q == '\0')
      {
         return false;
      }
      if (*p != *q)
      {
         return false;
      }
      p++;
      q++;
   }
}

char *replace(char *patt, char *p, char *src, char *dst)
{
   char buff[100];
   char *b = buff;
   while (patt < p)
      *b++ = *patt++;
   for (char *q = src; *q; q++)
      patt++;
   char *d = dst; 
   for (q = dst; *q; q++)
      *b++ = *d++;
   while (*patt)
      *b++ = *patt++;
   *b++ = '\0';
   char *r = new char(strlen(buff) + 1);
   strcpy(r, buff);
   return r;
}

void report(Node *p)
{
   if (p)
   {
      report(p->from);
      out << p->str << endl;
   }
}

void main(int argc, char *argv[])
{
   in.open(argv[1]);
   out.open(argv[2]);
   int numrules;
   in >> numrules;
   for (int n = 0; n < numrules; n++)
   {
      src[n] = readstr();
      dst[n] = readstr();
   }
   start = readstr();
   target = readstr();

   Queue q(start);
   while (true)
   {
      if (q.has(target))
      {
         report(q.has(target));
         break;
      }
      Node *s = q.next();
      if (s == NULL)
         break;
      char *patt = s->str;
      for (n = 0; n < numrules; n++)
      {
         for (char *p = patt; *p; p++)
         {
            if (match(src[n], p))
            {
               q.add(replace(patt, p, src[n], dst[n]), s);
            }
         }
      }
   }
}