<html>

<head><script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- MyFirstUnitAd -->
<ins class="adsbygoogle"
     style="display:inline-block;width:970px;height:250px"
     data-ad-client="ca-pub-5778386704669218"
     data-ad-slot="1503492166"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>

<meta http-equiv="Content-Language" content="zh-cn">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>inexact match</title>
<style>
<!--
	table td.head{ 
		background-color: #3a6ba5;
		border: 1px #000000 solid;
		font-family: Verdana;
		font-weight: bold;
		font-size: 14px;
		color: #f79c19;
		padding: 6px;
	}

	table td.body{ 
		border-bottom: 1px #6699CC dotted;
		text-align: left;
		font-family: Verdana, sans-serif, Arial;
		font-weight: normal;
		font-size: 14px;
		color: #3A6BA5;
		background-color: #fafafa;
		padding: 6px;
	}
	
-->
</style>
</head>

<body>



<p align="left"><font size="6" color="#FF0000"><span lang="en-ca"><b>&nbsp; 
 
</b></span><b>&nbsp;&nbsp;&nbsp; <span lang="en-ca">&nbsp;&nbsp; </span></b></font>
<span lang="en-ca"><font size="6" color="#FF0000"><b>Inexact Match (DFS)</b></font></span></p>

<div align="left">
  <pre><b><font color="#ff0000" size="5">A. <span lang="en-ca">Second </span>Edition</font></b></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>This is a small test program as suggested by Mr. Li. It is a kind of simplified inexact match. The idea is a kind</b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>of straight forward and it is like a kind of scanner. This edition only solves the problem exposed by first edition,</b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>the ambiguous grammar. (Sometimes I really feel it is a great waste for my precious web space to allow almost </b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>exactly same project to be stored there. There is little modification between two editions!)</b></span></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">B</span>.<span lang="en-ca"><a name="problem"></a>The problem</span></font></b></pre>
  <pre><span lang="en-ca"><b>The problem is like following and pay attention to the last line of explanation. They want to solve this trivial</b></span></pre>
  <pre><span lang="en-ca"><b>problem and it is one of the simplest DFS.</b></span></pre>
  <pre><span lang="en-ca"><b>In China, many people use their mobile phone as their address book. They may store contacts with UNICODE or ASCII</b></span></pre>
  <pre><span lang="en-ca"><b>code. So, the searching problem is a bit complicated. </b></span></pre>
  <pre><span lang="en-ca"><b>We assume following conditions:</b></span></pre>
  <pre><span lang="en-ca"><b>1. The index for ASCII code is using its own ASCII string.</b></span></pre>
  <pre><span lang="en-ca"><b>2. The UNICODE string is using their &quot;pinyin&quot; string and especially in embedded system the storage resource is</b></span></pre>
  <pre><span lang="en-ca"><b>very tight and the developer may not want to store index as &quot;pinyin&quot; along with UNICODE string. Since there are</b></span></pre>
  <pre><span lang="en-ca"><b>only limited &quot;pinyin&quot; combinations, they are willing to consider that conversion from UNICODE to &quot;pinyin&quot; is fast.</b></span></pre>
  <pre><span lang="en-ca"><b>3. The order between a ASCII code string and a UNICODE string with same letter is that ASCII goes before UNICODE.</b></span></pre>
  <pre><span lang="en-ca"><b>i.e. The &quot;pinyin&quot; for my name of UNICODE is &quot;huang qing zhe&quot;. If I also input an ASCII code string &quot;huang qing zhe&quot;,</b></span></pre>
  <pre><span lang="en-ca"><b>I will expect to find ASCII code before UNICODE. And the UNICODE is before ASCII string of &quot;goo&quot; since ASCII has</b></span></pre>
  <pre><span lang="en-ca"><b>a letter &quot;g&quot; which is after &quot;pinyin&quot; of &quot;h&quot;.</b></span></pre>
  <pre><span lang="en-ca"><b>4. An inexact match is performed by accepting user's input string and should return all matched string. </b></span></pre>
  <pre><span lang="en-ca"><b>i.e. UNICODE string with &quot;pinyin&quot; of &quot;huang qing zhe&quot; will be considered to be match with &quot;hqz&quot;, &quot;huqizh&quot;, </b></span></pre>
  <pre><span lang="en-ca"><b>&quot;huangqz&quot;, or &quot;hqzhe&quot;. And the following is NOT matched: &quot;haqz&quot;, &quot;hugqingzhe&quot; etc.</b></span></pre>
  <pre><span lang="en-ca"><b>5. However, this kind of grammar has one ambiguity: i.e. Suppose we have an ASCII string &quot;shang hai shi&quot;. And the</b></span></pre>
  <pre><span lang="en-ca"><b>searching string is &quot;shs&quot;. There can be two explanation: one is that it should be matched since &quot;shs&quot; are all </b></span></pre>
  <pre><span lang="en-ca"><b>starting letter for all words. The other is that it is not matched since &quot;sh&quot; is matched with first word &quot;shang&quot;.</b></span></pre>
  <pre><span lang="en-ca"><b>And letter &quot;s&quot; will not be matched in second word &quot;hai&quot; any more. </b></span></pre>
  <pre><span lang="en-ca"><b>This kind of ambiguity is usually solved by pre-defined priority of searching pattern.</b></span></pre>
    
  <b><font color="#ff0000" size="5"><span lang="en-ca"><a name="explain"></a>C</span>.<span lang="en-ca">The
  </span></font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>idea of 
  program</b></font></span><p><span lang="en-ca"><b><font face="Times New Roman">
  Originally Mr. Li suggested me to draw the possible DFA. But I feel it is a 
  kind of more difficult than coding. </font></b></span>
  <p><b><font face="Times New Roman">The dictionary is input by a file named &quot;nick.txt&quot; 
  and it should be the same directory along with executable file.</font></b><p>
  <span lang="en-ca"><b><font face="Times New Roman">The input file format is 
  like following:</font></b></span><p><span lang="en-ca"><b>
  <font face="Times New Roman">[number of words in name]&nbsp; [flag for 
  UNICODE, 1==UNICODE] [string1, string2 ...]</font></b></span><p>
  <span lang="en-ca"><b><font face="Times New Roman">In this second edition, I 
  want to express the algorithms in some pseudo-code for explanation purpose. 
  Therefore you will find some useless pseudo-code at top of code.</font></b></span></div>
<pre><b><font color="#ff0000" size="5">D.<span lang="en-ca"><a name="Method"></a>The </span>major functions</font></b></pre>
<div align="left">
  <pre><span lang="en-ca"><b>The findFirst function is returning the insertion point for new-added name.</b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>The function &quot;unico2pinyin&quot; is only for simplification and you should take advantage of some library in your </b></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>platform.</b></span></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">E</span>.</font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>Further improvement</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><b>The searching for matching can be improved if you make use of the property that the address book is already sorted.</b></span></pre>
</div>
<div align="left">
  <pre>กก</pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">F</span>.</font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>File listing</b></font></span></pre>
</div>
<div align="left">
  <pre><font size="3"><b><span lang="en-ca">1. addressbook.cpp</span></b></font></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3" color="#FF0000"><b>file name: addressbook.h</b></font></span></pre>
</div>
<pre>#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;string.h&gt;

using namespace std;

const int LongestWordLength=10;
const int MaxWordNumber=10;
const int MaxNameNumber=100;

char* fileName=&quot;nick.txt&quot;;

char* unico2pinyin(char* str);


//let's define our language as following:
enum LetterType
{
	AlphaType, DelimiterType, EndOfStringType, ErrorType
};

LetterType letter2Type(char ch)
{
	if (ch&gt;='a'&amp;&amp;ch&lt;='z')
	{
		return AlphaType;
	}
	if (ch==' ')
	{
		return DelimiterType;
	}
	if (ch=='\0')
	{
		return EndOfStringType;
	}
	return ErrorType;
}


//let's define the states for our state machine
//and it is a kind high-level abstract machine
enum StateType
{
	InitiateState, ProcessState, AcceptState, RejectState
};

//isEOF stands for the sourcestring EOF
StateType stateMachine(StateType currentState, bool isMatch, bool isEOF)
{
	if (isEOF)
	{
		return AcceptState;
	}
	if (!isMatch)
	{
		return RejectState;
	}
	//so from here, we can assume all state below is &quot;isMatch==true&quot;
	//and both AcceptState and RejectState is our final States, so
	//the automata will halt at both states.
	switch (currentState)
	{
	case InitiateState:
		return ProcessState;
		break;
	case ProcessState:
		return ProcessState;
		break;
	}
}

//then our job is to define what is a match
//to me it is almost the simplest Depth-First-Search problem
//it is the &quot;real&quot; state machine we are using. Since it is a 
//special case of DFS---with exactly two choices in each state.
/*
struct State
{
	int wordIndex1;	   //this is for the word
	int letterIndex1;   //this is within word, the index of letter
	bool match1;
	int wordIndex2;
	int letterIndex2;
	bool match2;
	int stringIndex;   //this is the source letter index

};
*/

struct State
{
	int wordIndex;
	int letterIndex;
	bool match;
};




struct MyWord
{
	bool isUNICODE;	
	int letterCount;
	char myword[LongestWordLength+1];
	bool isSmaller(MyWord&amp; other);
	void display();

	char* toString();
};

struct MyName
{
	int wordCount;
	MyWord mywords[MaxWordNumber];
	void display();
	bool match(char* str);
	bool doMatch(State&amp; state, char* str, int index);

};

class MyBook
{
private:
	MyName mynames[MaxNameNumber];
	int nameCount;

public:
	void initialize();
	MyBook();
	//void addNewName(MyName&amp; newName);
	void addNewName(bool isUnico, int wordCount, char newWords[][LongestWordLength]);
	int findFirst(bool isUnico, char* str);
	void display();
	void findMatch(char* str);


};

int main()
{
	char buf[LongestWordLength];
	MyBook book;
	book.initialize();
	book.display();
	printf(&quot;now let's find the match\n&quot;);
	book.findMatch(&quot;shs&quot;);
	do
	{
		printf(&quot;input your searching string&gt;&quot;);
		scanf(&quot;%s&quot;, buf);
		if (strcmp(buf, &quot;exit&quot;)==0)
		{
			break;
		}
		book.findMatch(buf);
	}
	while (true);


	
	return 0;
}

void MyBook::findMatch(char* str)
{
	char* ptr;
	for (int i=0; i&lt;nameCount; i++)
	{
		ptr=mynames[i].mywords[0].toString();
	
		if (mynames[i].match(str))
		{
			printf(&quot;find a match:&quot;);
			mynames[i].display();
		}
		
	}
}

bool MyName::doMatch(State&amp; state, char* str, int index)
{
	State subState;
	bool result1, result2;
	if (!state.match)
	{
		return false;
	}
	if (str[index]=='\0')
	{
		return true;
	}

	//first case
	subState.wordIndex=state.wordIndex;		
	subState.letterIndex=state.letterIndex+1;
	if (subState.letterIndex==mywords[subState.wordIndex].letterCount)
	{
		//subState.match=false;
		//this choice is dead and we only need to test next jump method
		result1=false;
	}
	else
	{
		subState.match=mywords[subState.wordIndex].myword[subState.letterIndex]==str[index];
		if (subState.match)
		{
			result1=doMatch(subState, str, index+1);
		}
		else
		{
			result1=false;
		}
	}
	//now the jumping
	if (state.wordIndex==wordCount)
	{		
		result2=false;
	}
	else
	{
		subState.wordIndex=state.wordIndex+1;
		subState.letterIndex=0;
		subState.match=mywords[subState.wordIndex].myword[subState.letterIndex]==str[index];
		if (subState.match)
		{
			result2=doMatch(subState, str, index+1);
		}
		else
		{
			result2=false;
		}
	}
	return result1||result2;
}


//Let's simulate it with a state machine
bool MyName::match(char* str)
{
	State state;
	int index=0;
	//this is the very first letter, and it must be a match!!!
	//here we can define it as &quot;state 0&quot;
	state.letterIndex=0;
	state.wordIndex=0;

	if (mywords[0].myword[0]!=str[0]&amp;&amp;str[0]!='\0')
	{
		return false;
	}
	else
	{
		state.match=true;
	}
	index=1;

	return doMatch(state, str, index);
}




MyBook::MyBook()
{
	nameCount=0;
}

void MyBook::initialize()
{
	ifstream in;
	int count, unicoFlag;
	char names[MaxWordNumber][LongestWordLength];
	in.open(fileName);
	do
	{
		in&gt;&gt;count&gt;&gt;unicoFlag;
		for (int i=0; i&lt;count; i++)
		{
			in&gt;&gt;names[i];
		}
		addNewName(unicoFlag==1, count, names);
	}
	while (!in.eof());
}
	

void MyName::display()
{
	if (mywords[0].isUNICODE)
	{
		printf(&quot;unicode and corresponding pinyin is:&quot;);
	}
	for (int i=0; i&lt;wordCount; i++)
	{
		mywords[i].display();
		printf(&quot;,&quot;);
	}
	printf(&quot;\n&quot;);
}

void MyWord::display()
{
	if (isUNICODE)
	{
		printf(&quot;%s&quot;, unico2pinyin(myword));
	}
	else
	{
		printf(&quot;%s&quot;, myword);
	}
}


void MyBook::display()
{
	for (int i=0; i&lt;nameCount; i++)
	{
		mynames[i].display();
	}
}


void MyBook::addNewName(bool isUnico, int wordCount, char newWords[][LongestWordLength])
{
	char* ptr;
	int index;
	if (isUnico)
	{
		ptr=unico2pinyin(newWords[0]);
	}
	else
	{
		ptr=newWords[0];
	}
	index=findFirst(isUnico, ptr);
	//shift array to insert
	for (int i=nameCount; i&gt;index; i--)
	{
		mynames[i]=mynames[i-1];
	}
	mynames[i].wordCount=wordCount;
	for(i=0; i&lt;wordCount; i++)
	{
		mynames[index].mywords[i].isUNICODE=isUnico;
		strcpy(mynames[index].mywords[i].myword, newWords[i]);
		mynames[index].mywords[i].letterCount=strlen(mynames[index].mywords[i].toString());
	}
	nameCount++;
}



//this is the simplest conversion function, I just want to simplify my
//operation.
char* unico2pinyin(char* str)
{
	return str;
}

char* MyWord::toString()
{
	if (isUNICODE)
	{
		return unico2pinyin(myword);		
	}
	else
	{
		return myword;
	}
}

bool MyWord::isSmaller(MyWord&amp; other)
{
	char* myself, *theOther;
	myself=toString();
	theOther=other.toString();
	//same type, simple comparison
	if (isUNICODE&amp;&amp;other.isUNICODE||!isUNICODE&amp;&amp;!other.isUNICODE)
	{
		return strcmp(myself, theOther)&lt;=0;
	}
	else
	{
		//here is different type
		//different type and different &quot;class&quot;
		if (myself[0]!=theOther[0])
		{
			return strcmp(myself, theOther)&lt;0;
		}
		else
		{
			//same &quot;class&quot;, the unicode is bigger in index
			return !isUNICODE;
		}
	}
}

int MyBook::findFirst(bool isUnico, char* str)
{
	//you can use binary search too
	//char* ptr;
	//bool afterUnico=false;
	MyWord temp;
	temp.isUNICODE=isUnico;
	strcpy(temp.myword, str);
	for (int i=0; i&lt;nameCount; i++)
	{
		if (temp.isSmaller(mynames[i].mywords[0]))
		{
			return i;
		}
		/*
		//just skip until first non-unico or end of address book
		if (afterUnico)
		{
			if (!mynames[i].mywords[0].isUNICODE)
			{
				return i;			
			}
			else
			{
				ptr=unico2pinyin(mynames[i].mywords[0]);
				if (strcmp(str, ptr)&gt;0)
				{
					return i;
				}
			}
		}
		else
		{
			//in case we have only unico in address book
			if (mynames[i].mywords[0].isUNICODE)
			{
				ptr=unico2pinyin(mynames[i].mywords[0].myword);
				if (ptr[0]==ch)
				{
					afterUnico=true;
					continue;
				}
			}
			else
			{
				if (mynames[i].mywords[0].myword[0]==ch)
				{
					return i;
				}
			}
		}
		*/
	}
	//this is the case of end of address book

	return i;
}



</pre>
<pre>
</pre>
<pre>กก</pre>

<p><span lang="en-ca"><font color="#0000FF"><b>running result: (This is the 
running results of 10 seconds. The position of each quantum is</b></font></span></p>

<p><span lang="en-ca"><font color="#0000FF"><b>just approximation since I am 
trying to use integer to map double.)</b></font></span></p>

<p><font color="#0000FF"><span lang="en-ca">The input file is named &quot;nick.txt&quot; 
and it is like this:</span></font></p>

<p><span lang="en-ca">3 1 huang qing zhe<br>
2 0 xiang dong<br>
3 1 zheng shu zhao<br>
2 1 yong jun<br>
4 0 xiao xiao yu er<br>
5 1 this is a unicode test<br>
1 0 ascii<br>
2 1 many test<br>
3 1 yang zi wen<br>
3 0 hong yang yang<br>
3 0 tian an men<br>
2 0 zhang qing<br>
3 1 li yi hong<br>
3 1 xiao yong jun<br>
3 0 xiao jian dong<br>
3 1 li qing qing<br>
2 0 xu jin<br>
2 1 huang yong<br>
3 1 huang chang zheng<br>
3 1 zhu chun ming<br>
3 0 zhu chun min<br>
3 0 zhu jian min<br>
3 1 ni xiang li<br>
3 0 wu hui min<br>
3 1 li teng long<br>
3 0 chen zhi xiong<br>
3 1 hong xiang yang<br>
3 0 su cong wei<br>
3 1 su dong po<br>
2 1 yang yang<br>
2 1 yang tao<br>
3 1 yang jian zhong<br>
3 1 huang hong dao<br>
3 1 wu qi wei<br>
3 1 jiang jie shi<br>
3 0 mao ze dong<br>
2 1 zhu de<br>
3 1 ren bi shi<br>
3 0 du yu min<br>
2 0 xiao sun<br>
2 0 zhang yan<br>
3 0 li ke xin<br>
3 0 wo shi shui<br>
3 0 huang shi ren<br>
3 0 xu jia hui<br>
3 0 bei jin shi<br>
3 0 shang hai shi<br>
3 1 yang zhen ning<br>
3 1 li zheng dao<br>
3 1 wu de w<br>
4 0 a pei a wang<br>
2 0 cheng long<br>
3 1 ma jun ren<br>
2 0 liang zhan<br>
3 1 li deng hui<br>
3 0 su jin qiang<br>
3 1 huang xi jin<br>
3 0 huang tian ci<br>
2 0 tian ci<br>
3 0 zhang yun lei <br>
3 0 ma jia jun</span></p>

<p><font color="#0000FF"><span lang="en-ca">The following is the running result:</span><br>
</font>
<br>
<br>
a,pei,a,wang,<br>
ascii,<br>
bei,jin,shi,<br>
chen,zhi,xiong,<br>
cheng,long,<br>
du,yu,min,<br>
hong,yang,yang,<br>
huang,tian,ci,<br>
huang,shi,ren,<br>
unicode and corresponding pinyin is:hong,xiang,yang,<br>
unicode and corresponding pinyin is:huang,xi,jin,<br>
unicode and corresponding pinyin is:huang,hong,dao,<br>
unicode and corresponding pinyin is:huang,chang,zheng,<br>
unicode and corresponding pinyin is:huang,yong,<br>
unicode and corresponding pinyin is:huang,qing,zhe,<br>
unicode and corresponding pinyin is:jiang,jie,shi,<br>
li,ke,xin,<br>
liang,zhan,<br>
unicode and corresponding pinyin is:li,deng,hui,<br>
unicode and corresponding pinyin is:li,zheng,dao,<br>
unicode and corresponding pinyin is:li,teng,long,<br>
unicode and corresponding pinyin is:li,qing,qing,<br>
unicode and corresponding pinyin is:li,yi,hong,<br>
ma,jia,jun,<br>
mao,ze,dong,<br>
unicode and corresponding pinyin is:ma,jun,ren,<br>
unicode and corresponding pinyin is:many,test,<br>
unicode and corresponding pinyin is:ni,xiang,li,<br>
unicode and corresponding pinyin is:ren,bi,shi,<br>
shang,hai,shi,<br>
su,jin,qiang,<br>
su,cong,wei,<br>
unicode and corresponding pinyin is:su,dong,po,<br>
tian,ci,<br>
unicode and corresponding pinyin is:this,is,a,unicode,test,<br>
wo,shi,shui,<br>
wu,hui,min,<br>
unicode and corresponding pinyin is:wu,de,w,<br>
unicode and corresponding pinyin is:wu,qi,wei,<br>
xiang,dong,<br>
xiao,sun,<br>
xiao,jian,dong,<br>
xiao,xiao,yu,er,<br>
xu,jia,hui,<br>
xu,jin,<br>
unicode and corresponding pinyin is:xiao,yong,jun,<br>
unicode and corresponding pinyin is:yang,zhen,ning,<br>
unicode and corresponding pinyin is:yang,jian,zhong,<br>
unicode and corresponding pinyin is:yang,tao,<br>
unicode and corresponding pinyin is:yang,yang,<br>
unicode and corresponding pinyin is:yang,zi,wen,<br>
unicode and corresponding pinyin is:yong,jun,<br>
zhang,yun,lei,<br>
zhang,yan,<br>
zhang,qing,<br>
zhu,jian,min,<br>
zhu,chun,min,<br>
unicode and corresponding pinyin is:zheng,shu,zhao,<br>
unicode and corresponding pinyin is:zhu,de,<br>
unicode and corresponding pinyin is:zhu,chun,ming,<br>
now let's find the match<br>
find a match:shang,hai,shi,<br>
input your searching string&gt;shs<br>
find a match:shang,hai,shi,<br>
input your searching string&gt;zcm<br>
find a match:zhu,chun,min,<br>
find a match:unicode and corresponding pinyin is:zhu,chun,ming,<br>
input your searching string&gt;zhch<br>
find a match:zhu,chun,min,<br>
find a match:unicode and corresponding pinyin is:zhu,chun,ming,<br>
input your searching string&gt;</p>

<pre></pre>

<pre></pre>

<pre></pre>

<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                                   
&nbsp;&nbsp;&nbsp; <a href="PocketRuler.htm">                  







                       <img src="picture/back.gif" style="border: medium none" alt="back.gif (341 bytes)" width="32" height="35"></a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
<a href="index.htm">
<img src="picture/up.gif" style="border: medium none" alt="up.gif (335 bytes)" width="35" height="32"></a>       &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                         
<img src="picture/next.gif" style="border: medium none" alt="next.gif (337 bytes)" width="32" height="35">          


</p>

</body>

</html>