<html>

<head><script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- MyFirstUnitAd -->
<ins class="adsbygoogle"
     style="display:inline-block;width:970px;height:250px"
     data-ad-client="ca-pub-5778386704669218"
     data-ad-slot="1503492166"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>

<meta http-equiv="Content-Language" content="zh-cn">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>CFGReader</title>
</head>

<body>



<p align="center"><span lang="en-ca"><font size="6" color="#FF0000"><b>CFG 
Reader</b></font></span></p>

<div align="left">
  <pre><b><font color="#ff0000" size="5">A. First Edition</font></b></pre>
</div>
<div align="left">
  <pre><b>This is first<span lang="en-ca"> edition of my </span></b><span lang="en-ca"><b>CFG Reader which stands for Context-Free-Grammar Reader. </b></span></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">B</span>.<span lang="en-ca"><a name="problem"></a>The problem</span></font></b></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>A CFG is the basic rule for writing a parser, however, even the grammar itself should be read in automatically.</b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>And what's more, the grammar should be accessable in all parsing procedure. So, this is the reason I want to write</b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>such a seemingly stupid class. It not only turns input of grammar automatic, but in future whenever grammar is</b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>supposed to change, everything should be able to update. Or do you understand the &quot;YACC&quot; is also doing like this </b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>way? Though I never use it and have no idea how it looks like, in my mind, it should be something like this.</b></span></pre>
</div>
<div align="left">
  <font FACE="CMSLTT10">
  <p ALIGN="LEFT">program -&gt; stmt-sequence<br>
  stmt-sequence -&gt; stmt-sequence ; statement | statement<br>
  statement -&gt; if-stmt | repeat-stmt | assign-stmt | read-stmt | write-stmt<br>
  if-stmt -&gt; if boolean-exp then stmt-sequence end<br>
  | if boolean-exp then stmt-sequence else stmt-sequence end<br>
  repeat-stmt -&gt; repeat stmt-sequence until boolean-exp<br>
  assign-stmt -&gt; identifier := integer-exp<br>
  read-stmt -&gt; read identifier<br>
  write-stmt -&gt; write integer-exp<br>
  boolean-exp -&gt; integer-exp comparison-op integer-exp<br>
  comparison-op -&gt; &lt; | =<br>
  integer-exp -&gt; integer-exp addop term | term<br>
  addop -&gt; + | -<br>
  term -&gt; term mulop factor | factor<br>
  mulop -&gt; * | /<br>
  factor -&gt; ( integer-exp ) | number | identifier</font></div>
<div align="left">
  <font FACE="CMSLTT10">
  <p ALIGN="LEFT">　</font></div>
<div align="left">
  <b><font color="#ff0000" size="5"><span lang="en-ca"><a name="explain"></a>C</span>.<span lang="en-ca">The
  </span></font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>idea of 
  program</b></font></span></div>
<div align="left">
  　</div>
<div align="left">
  <b><span lang="en-ca">The program is hard to write, except the design of 
  representation of the grammar. I used a rather</span></b></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>naive way: A structure called Token stores the all basic 
  information of a token---its name, if it is</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>terminal or non-terminal, the index of all its 
  production rule in the array of productions, and the </b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>number of all those productions. (Actually I can use the 
  -1 to represent the end of production rule </b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>index, but I am tired of using while loops.)</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>There is a two-dimensioned array to represent the 
  production rule with each row as a rule and the</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>value stored in each row is just index of Token*. Of 
  course you can imagine there should be an array</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>of Token*. It seems quite primitive but it suits my 
  current purpose. In future, I will try to write</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>more methods to enable to change production rules so 
  that left-recursion would be automatically </b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>replaced. </b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>There is only one painful job to mention: suppose you 
  have one production rule separated by sever &quot;|&quot;</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>and what's worse, the rule is so long that it occupies 
  more than one line. Do you expect my class to</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>read the correct rule, instead mis-translating them? 
  Surely you can see I test the first two strings</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>to check if a &quot;-&gt;&quot; is used. If yes, it means this is a 
  new derivation. If not, I assume you are using</b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>the same old derivation. What's more, rules only start 
  new when &quot;|&quot; is encountered. Then you don't </b></span></div>
<div align="left">
  　</div>
<div align="left">
  <span lang="en-ca"><b>have to worry about the across line parts.</b></span></div>
<div align="left">
  <pre><b><font color="#ff0000" size="5">D.<span lang="en-ca"><a name="Method"></a>The </span>major functions</font></b></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">E</span>.</font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>Further improvement</b></font></span></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">F</span>.</font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>File listing</b></font></span></pre>
</div>
<div align="left">
  <pre><b><font size="3"><span lang="en-ca">1. CFGReader.h</span></font></b></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>2. </b></font></span><b><font size="3"><span lang="en-ca">CFGReader</span></font></b><span lang="en-ca"><font size="3"><b>.cpp  </b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>3. main.cpp (main)</b></font></span></pre>
  <pre>　</pre>
</div>
<div align="left">
  <pre>　</pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3" color="#FF0000"><b>file name: CFGReader.h</b></font></span></pre>
</div>
<pre>const int BufferLength=256;
const int MaxTokenCount=50;
const int MaxRHSCount=8;
const int MaxRuleCount=50;
const int MaxTokenLength=10;
const int MaxProduction=10;

struct Token
{
	bool terminal;
	char* name;
	int production[MaxProduction];//pointer to the production it gives
	int count;
};


class CFGReader
{
private:
	char buffer[BufferLength];
	FILE* stream;
	Token* token[MaxTokenCount];
	int tokenCount;
	int curIndex;//the index of current token
	int curPos;//the position at production rule
	int production[MaxRuleCount][MaxRHSCount];
	int prodCount;
	void newRule();
	void initialize();
	void readRule();
	int tokenIndex(const char* str);
	void addRule(const char* str);
	int addToken(const char* str, bool isTerminal=true);
	void printRule(int index);
public:
	void readFromFile(const char* fileName);
	void print();
};

</pre>
<pre>　</pre>
<pre></pre>
<pre><span lang="en-ca"><font size="3" color="#FF0000"><b>file name: CFGReader</b></font></span><font size="3" color="#FF0000"><b>.cpp </b></font></pre>
<pre>#include &lt;iostream&gt;
#include &quot;CFGReader.h&quot;

using namespace std;

const char* deliStr=&quot; |\n&quot;;

void CFGReader::readFromFile(const char* fileName)
{
	if ((stream=fopen(fileName, &quot;r&quot;))==NULL)
	{
		cout&lt;&lt;&quot;cannot open file &quot;&lt;&lt;fileName&lt;&lt;endl;
	}
	else
	{
		initialize();
		readRule();		
	}
}

void CFGReader::readRule()
{
	char* ptr=buffer;
	char* hold;
	int count=0;

	while (!feof(stream))
	{
		count=0;
		fgets(buffer, BufferLength-1, stream);
		ptr=buffer;	
		
		while ((ptr=strtok(ptr, &quot; \n&quot;))!=NULL)
		{
			if (strcmp(ptr, &quot;|&quot;)!=0)
			{
				//test the first two token to see if old rule continues
				if (count==0)				
				{
					hold=ptr;
				}
				if (count==1)
				{
					if (strcmp(&quot;-&gt;&quot;, ptr)==0)
					{						
						curIndex=addToken(hold, false);//the index of cur token	
						token[curIndex]-&gt;terminal=false;
						newRule();					
					}
					else
					{								
						addRule(hold);
						addRule(ptr);															
					}
				}
				if (count&gt;=2)//always add
				{
					addRule(ptr);
				}
				count++;
			}
			else
			{
				//this is an alternative production rule
				newRule();						
			}
			ptr=NULL;
		}		
	}
}

void CFGReader::newRule()
{
	prodCount++;
	//add one pointer
	token[curIndex]-&gt;production[token[curIndex]-&gt;count++]=prodCount;
	curPos=0;//reset to restart;
}

int CFGReader::tokenIndex(const char* str)
{
	for (int i=0; i&lt;tokenCount; i++)
	{
		if (strcmp(str, token[i]-&gt;name)==0)
		{
			return i;
		}
	}
	return -1;
}

void CFGReader::addRule(const char* str)
{
	int index;
	index=addToken(str);
	production[prodCount][curPos++]=index;
	production[prodCount][curPos]=-1;//set end
}

int CFGReader::addToken(const char* str, bool isTerminal)
{
	int index;
	index=tokenIndex(str);
	if (index==-1)
	{
		index=tokenCount;
	}
	else
	{		
		return index;
	}
	token[index]=new Token;
	token[index]-&gt;terminal=isTerminal;
	token[index]-&gt;name= new char[strlen(str)+1];
	strcpy(token[index]-&gt;name, str);
	token[index]-&gt;count=0;
	tokenCount++;
	return index;
}



void CFGReader::initialize()
{
	tokenCount=curIndex=curPos=0;
	prodCount=-1;//in order to ++ blindly
}


void CFGReader::printRule(int index)
{
	int nodeIndex=0;
	while (production[index][nodeIndex]!=-1)
	{
		cout&lt;&lt;token[production[index][nodeIndex]]-&gt;name&lt;&lt;&quot; &quot;;
		nodeIndex++;
	}
}

void CFGReader::print()
{
	for (int i=0; i&lt;tokenCount; i++)
	{		
		if (!token[i]-&gt;terminal)
		{
			cout&lt;&lt;token[i]-&gt;name&lt;&lt;&quot; ==&gt; &quot;;
			for (int j=0; j&lt;token[i]-&gt;count; j++)
			{			
				printRule(token[i]-&gt;production[j]);			
			}
			cout&lt;&lt;&quot;\n&quot;;
		}		
	}
}



</pre>
<pre>
</pre>
<pre><span lang="en-ca"><font size="3" color="#FF0000"><b>file name: main</b></font></span><font size="3" color="#FF0000"><b>.cpp (main)</b></font></pre>
<pre>#include &lt;iostream&gt;
#include &quot;CFGReader.h&quot;

using namespace std;

int main()
{
	CFGReader R;
	R.readFromFile(&quot;c:\\TinyCFG.txt&quot;);
	R.print();
	return 0;
}</pre>
<pre>
</pre>
<pre></pre>
<pre></pre>
<pre><font color="#0000FF"><b>Here is the result:<span lang="en-ca"> The input file is <a href="TinyCFG.txt">&quot;c:\TinyCFG.txt&quot;.</a>  </span></b></font></pre>

<pre>program ==&gt; stmt-sequence
stmt-sequence ==&gt; stmt-sequence ; statement statement
statement ==&gt; if-stmt repeat-stmt assign-stmt read-stmt write-stmt
if-stmt ==&gt; if boolean-exp then stmt-sequence end if boolean-exp then stmt-sequence else stmt-sequen
ce end
repeat-stmt ==&gt; repeat stmt-sequence until boolean-exp
assign-stmt ==&gt; identifier := integer-exp
read-stmt ==&gt; read identifier
write-stmt ==&gt; write integer-exp
boolean-exp ==&gt; integer-exp comparison-op integer-exp
integer-exp ==&gt; integer-exp addop term term
comparison-op ==&gt; &lt; =
addop ==&gt; + -
term ==&gt; term mulop factor factor
mulop ==&gt; * /
factor ==&gt; ( integer-exp ) number identifier
Press any key to continue









</pre>

<pre></pre>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                                   
&nbsp;&nbsp;&nbsp; <a href="WhoAmI.htm">                  







                       <img src="picture/back.gif" style="border: medium none" alt="back.gif (341 bytes)" width="32" height="35"></a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
<a href="index.htm"><img src="picture/up.gif" style="border: medium none" alt="up.gif (335 bytes)" width="35" height="32"></a>       &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                         
<img src="picture/next.gif" style="border: medium none" alt="next.gif (337 bytes)" width="32" height="35">          


</p>

</body>

</html>