<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0074)http://www.cs.concordia.ca/ccss/events/coding_compet_2002/codingprob2.html --><HTML><HEAD><script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- MyFirstUnitAd -->
<ins class="adsbygoogle"
     style="display:inline-block;width:970px;height:250px"
     data-ad-client="ca-pub-5778386704669218"
     data-ad-slot="1503492166"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
<TITLE>CSSS Coding Competition</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312"><LINK rev=made 
href="mailto:grogono@cs.concordia.ca">
<META content="MSHTML 6.00.2800.1141" name=GENERATOR>
<META content="Peter Grogono" name=author>
<META content="Course Description" name=description>
<META content=Keywords name=keywords>
<STYLE type=text/css>P {
	FONT-FAMILY: Arial, Helvetica, Geneva, sans-serif
}
UL {
	FONT-FAMILY: Arial, Helvetica, Geneva, sans-serif
}
LI {
	FONT-FAMILY: Arial, Helvetica, Geneva, sans-serif
}
TABLE {
	FONT-FAMILY: Arial, Helvetica, Geneva, sans-serif
}
TR {
	FONT-FAMILY: Arial, Helvetica, Geneva, sans-serif
}
TD {
	FONT-FAMILY: Arial, Helvetica, Geneva, sans-serif
}
</STYLE>
</HEAD>
<BODY text=#800000 vLink=#800080 aLink=#ff0000 link=#0000ff bgColor=#f5deb3>
<H1 align=center>CSSS Coding Competition -- October 2002</H1>
<HR>

<H2>Problem 2</H2>
<P>Write a program to find the longest repeating substring of a given 
arbitrarily long character string. For example, in the string <PRE>        The 123 computers used in this competition to perform matrix
        computations are supplied by Commercial Computer Corporation.
</PRE>
<P>the repeating substrings include "co", "com", "comp", and ""comput". (There 
are also many trivial substrings consisting of one letter.) The longest 
repeating substring is "comput". Note that the comparisons are not case 
sensitive --- "Commercial" and "Computer" are included in these repeating 
substrings. Only upper and lower case letters are considered to be parts of 
substrings; blanks and punctuation symbols are ignored. 
<P>The input file consists of a string which may extend over more than one line. 
It ends with a blank line. 
<P>The output file consists of a single word: the longest repeating substring in 
the input. 
<P>Example input file: <PRE>        The 123 computers used in this competition to perform matrix
        computation are supplied by Commercial Computer Corporation.
</PRE>
<P>The corresponding output file: <PRE>        comput
</PRE>
<HR>

<H3>Test Data</H3>
<P>First: <PRE>The 123 computers used in this competition to perform matrix
computation are supplied by XYZ computer corporation
</PRE>
<P>Expected output: <PRE>comput
</PRE>
<P>Second: <PRE>The 123 computers used in this competition to perform matrix
computation are supplied by commercial computers corp.
</PRE>
<P>Expected output: <PRE>computers
</PRE>
<P>Third: <PRE>a abcd d def abcd def abc gh abcd swa abc def abcd ss
</PRE>
<P>Expected output: <PRE>abcd
</PRE>
<HR>

<H3>Discussion</H3>
<P>I made the mistake of changing this problem shortly before the competition: 
the original problem had specified case-sensitive matches and I changed this to 
case-insensitive matches (to make it slightly more difficult). That made the 
problem statement incorrect (because the answer should have been "computer" 
rather than "comput") and this may have confused some teams. 
<P>Jay supplied the first test data set; I added the second. After the 
competition started, I decided that both test sets were rather similar to the 
given example, and I added the third test set. 
<P>When solutions started coming in, I realized that some teams were reading 
only one line of input. The sample data suggests that there could be more than 
one line of input, but perhaps not strongly enough. I removed new lines from the 
test data for subsequent trials when I realized that some teams were stuck on 
this detail. The program, of course is much easier to write if there is only one 
line of text. Once you realize that there might be several lines of input, it is 
not hard to write a program that reads to the end of the file, concatenating 
lines as it goes. 
<P>The key to solving this problem quickly is to imagine two copies of the 
string, one sliding past the other: <PRE>The 123 computers used in this competition to perform matrix
                       The 123 computers used in this competition to perform matrix
</PRE>
<P>Starting a scan from "T" in the lower string, we find that "c" is in the same 
place in both strings. We move along until we discover that "e" and "u" are 
different and, at that point, we have "comp" as a repeating substring. If this 
substring is longer than one we have seen before, we record it. 
<P>There is a bug in the solution below, but it would nevertheless have passed 
in the competition. 
<HR>

<H3>Code</H3><PRE>#include &lt;iostream.h&gt;
#include &lt;fstream.h&gt;
#include &lt;strstrea.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;ctype.h&gt;

ifstream in;
ofstream out;

void main(int argc, char *argv[])
{
   in.open(argv[1]);
   out.open(argv[2]);
   char str[4000] = "";
   while(in)
   {
      char buf[500];
      in.getline(buf, 500);
      strcat(str,buf);
      strcat(str, " ");
   }

   for (char *p = str; *p; p++)
      *p = tolower(*p);

   char *fst = NULL;
   char *lst = NULL;
   for (unsigned int d = 1; d &lt; strlen(str); d++)
   {
      char *p = str;
      char *r = str;
      for (char *q = str + d; *q; p++, q++)
      {
         if (!isalpha(*p) || !isalpha(*q) || *p != *q)
         {
            if (p - r &gt; lst - fst)
            {
               fst = r;
               lst = p;
            }
            r = p;
         }
      }
   }
   for (p = fst; p &lt; lst; p++)
      out &lt;&lt; *p;
   out &lt;&lt; endl;
}
</PRE></BODY></HTML>
