<html>

<head>

<meta http-equiv="Content-Language" content="en">
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<title>regular-sampling-quick-sort (MPI)</title>
</head>

<body>



<p align="left"><span lang="en-ca"><font size="6" color="#FF0000"><b>
Regular-Sampling Quick-Sort (MPI)</b></font></span></p>

<div align="left">
  <pre><b><font color="#ff0000" size="5">A. First Edition</font></b></pre>
</div>
<div align="left">
  <pre><b><font size="3">This is a simple practice program; I wrote it to prove a point.</font></b></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">B</span>. <span lang="en-ca"><a name="problem"></a>The problem</span></font></b></pre>
</div>
<div align="left">
  <p ALIGN="LEFT"><span lang="en-ca"><b>Frankly, the problem is not 
	intellectually deep: it is a simple variant of merge sort, one of the 
	external sorting algorithms used extensively in databases to sort lists 
	too big to fit entirely in memory.</b></span></p></div>
<div align="left">
  <p ALIGN="LEFT">　</div>
<div align="left">
  <b><font color="#ff0000" size="5"><span lang="en-ca"><a name="explain"></a>C</span>. <span lang="en-ca">The
  </span></font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>idea of 
  the program</b></font></span></div>
<div align="left">
  　</div>
<p><span lang="en-ca"><b>In the usual MPI formulation of sorting, the list is 
scattered across a series of worker nodes before sorting. After sorting, the 
segments remain scattered across those machines, except that each segment is 
sorted and the segments are globally ordered: if some element in segment A is 
bigger than some element in segment B, then no element in A is smaller than any 
element in B. (Is it complicated? No, it just says that the segments, read in 
order, form one sorted list.)</b></span></p>
<div align="left">
  <pre><b><font color="#ff0000" size="5">D. <span lang="en-ca"><a name="Method"></a>The </span>major functions</font></b></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>I modified the algorithm a little, because it seems pointless for the sorted list to remain split into</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>segments on different machines. So I require the whole list to be assembled on one node at the end.</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>1. firstPhase: sort the local segment and take regular samples from the sorted list.</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>2. secondPhase: the samples are sent to one machine, say the master node, sorted there, and the chosen pivots</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>are broadcast to all working nodes.</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>3. thirdPhase: each working node uses the pivots to slice its sorted list into sublists and sends the lengths</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>of these sublists back to the master node, so that the master can arrange to receive each sublist at the</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>corresponding position in the next phase.</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>4. fourthPhase: the master receives the sublists into temporary buffers and merges them into one final array.</b></font></span></pre>
</div>
<div align="left">
  <pre><span lang="en-ca"><font size="3"><b>5. zeroPhase: added so that the speed can be compared with sorting on a single machine.</b></font></span></pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">E</span>. </font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>Further improvement</b></font></span></pre>
</div>
<div align="left">
  <pre>　</pre>
</div>
<div align="left">
  <pre><b><font color="#ff0000" size="5"><span lang="en-ca">F</span>. </font></b><span lang="en-ca"><font size="5" color="#FF0000"><b>File listing</b></font></span></pre>
</div>
<div align="left" style="width: 898; height: 86">
  <pre><font size="3"><b><span lang="en-ca">1. quicksort</span>.cpp</b></font></pre>
  <pre>　</pre>
  <pre><font size="3" color="#FF0000"><b><span lang="en-ca">file name: quick</span>.cpp</b></font></pre>
  <pre>#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;time.h&gt;
#include &lt;math.h&gt;
#include &quot;mpi.h&quot;

int myRank;
int mySize;

const int SingleArrayLength=10000000;
const int MaxArrayElement=1000000;
const int MinArrayElement=100;
int* myArray;

const int WorkerNumber=7;

const int SampleOffset=SingleArrayLength/WorkerNumber;
const int StartSampleOffset=WorkerNumber/2-1;


int* sampleArray;
int**mergeBuf;

MPI_Request* sampleRequests;

const int SAMPLE=100;
const int PIVOT=101;
const int MaxIntegerPrintLength=10;

void myExit()
{
	//printf(&quot;\n\nrank %d finishes\n&quot;, myRank);
}


void printArray(int* array, int length, const char* comment=&quot;Array print out:&quot;);

void printArray(int* array, int length, const char* comment)
{
	char* buf;
	char temp[MaxIntegerPrintLength];
	int commentLength=0;
	commentLength=strlen(comment)+5;
	buf=new char[length*MaxIntegerPrintLength+commentLength];
	sprintf(buf, &quot;rank[%d] %s:*****&quot;, myRank, comment);
	for (int i=0; i&lt;length; i++)
	{
		sprintf(temp, &quot;%d,&quot;, array[i]);
		strcat(buf, temp);
	}
	strcat(buf, &quot;\n&quot;);
	printf(&quot;%s&quot;, buf);
	delete []buf;
}


void initialize()
{
	int i;
	atexit(myExit);
	srand(myRank*time(0));
	if (myRank==0)
	{
		sampleArray=new int[WorkerNumber*WorkerNumber];
		sampleRequests=new MPI_Request[WorkerNumber];
		myArray=new int[SingleArrayLength*WorkerNumber];
		mergeBuf=new int*[WorkerNumber];
		for (i=0; i&lt;WorkerNumber; i++)
		{
			mergeBuf[i]=new int[SingleArrayLength];
		}
	}
	else
	{
		sampleArray=new int[WorkerNumber];
		sampleRequests=new MPI_Request[1];
		//srand(time(0));
		myArray=new int[SingleArrayLength];
		for (i=0; i&lt;SingleArrayLength; i++)
		{
			myArray[i]=rand()%MaxArrayElement+MinArrayElement;
		}
	}

	

}


int intComp(const void* first, const void* second)
{
	return *(int*)first - *(int*)second;
}

void zeroPhase()
{
	int i;
	double start, end;
	if (myRank==0)
	{
		for (i=0; i&lt;WorkerNumber; i++)
		{
			sampleRequests[i]=MPI_REQUEST_NULL;
			MPI_Irecv(myArray+i*SingleArrayLength, SingleArrayLength, MPI_INT, i+1, 0, MPI_COMM_WORLD, sampleRequests+i);
		}
		MPI_Waitall(WorkerNumber, sampleRequests, MPI_STATUSES_IGNORE);
		start=MPI_Wtime();
		qsort(myArray, WorkerNumber*SingleArrayLength, sizeof(int), intComp);
		end=MPI_Wtime();
		printf(&quot;single machine sorting array of length %d takes %f\n&quot;,SingleArrayLength*WorkerNumber, end-start);
				
	}
	else
	{
		MPI_Send(myArray, SingleArrayLength, MPI_INT, 0, 0, MPI_COMM_WORLD);
	}
}		



void firstPhase()
{
	if (myRank!=0)
	{
		qsort(myArray, SingleArrayLength, sizeof(int), intComp);
		//printArray(myArray, SingleArrayLength, &quot;worker data array print out&quot;);
		//retrieve samples
		for (int i=0; i&lt;WorkerNumber; i++)
		{
			sampleArray[i]=myArray[i*SampleOffset];
		}		
	}
}


void secondPhase()
{
	int i;
	if (myRank==0)
	{
		for (i=0; i&lt;WorkerNumber; i++)
		{
			MPI_Irecv(sampleArray+i*WorkerNumber, WorkerNumber, MPI_INT, i+1, SAMPLE, MPI_COMM_WORLD, sampleRequests+i);
			//MPI_Recv(sampleArray+i*WorkerNumber, WorkerNumber, MPI_INT, i+1, SAMPLE, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
		}
		MPI_Waitall(WorkerNumber, sampleRequests, MPI_STATUSES_IGNORE);
		qsort(sampleArray, WorkerNumber*WorkerNumber, sizeof(int), intComp);
		//printArray(myArray, SingleArrayLength, &quot;worker's data array\n&quot;);
		/*
		for (i=0; i&lt;WorkerNumber*WorkerNumber; i++)
		{
			printf(&quot;sample[%d]=%d\n&quot;, i, sampleArray[i]);
		}
		*/
		//printArray(sampleArray, WorkerNumber*WorkerNumber);
		

		for (i=1; i&lt;WorkerNumber; i++)
		{
			sampleArray[i-1]=sampleArray[WorkerNumber*i+StartSampleOffset];			
		}
		MPI_Bcast(sampleArray, WorkerNumber-1, MPI_INT, 0, MPI_COMM_WORLD);
		//printArray(sampleArray, WorkerNumber-1, &quot;this is the sampel data broadcasted&quot;);

	}
	else
	{
		MPI_Ssend(sampleArray, WorkerNumber, MPI_INT, 0, SAMPLE, MPI_COMM_WORLD);
		MPI_Bcast(sampleArray, WorkerNumber-1, MPI_INT, 0, MPI_COMM_WORLD);
	}
	/*
	for (i=0; i&lt;WorkerNumber-1; i++)
	{
		printf(&quot;rank[%d][%d]=%d\n&quot;, myRank, i, sampleArray[i]);
	}
	*/
	
}

//it returns the smallest index of which the number is bigger than or equal to the key
int binarySearch(int key, int* array, int length)
{
	int front=0, end=length-1;
	if (key&gt;array[end])
	{
		return length;
	}
	if (key&lt;array[front])
	{
		return 0;
	}

	int pos=(front+end+1)/2;
	while (front&lt;=end)
	{		
		if (key&gt;array[pos])
		{
			front=pos+1;
		}
		else
		{
			if (key&lt;array[pos])
			{
				end=pos-1;
			}
			else
			{
				break;
			}
		}
		pos=(front+end+1)/2;
	}
	
	return pos;
}



void thirdPhase()
{
	int i, flag;
	//do binary search
	if (myRank!=0)
	{
		for (i=0; i&lt;WorkerNumber-1; i++)
		{
			//printf(&quot;rank[%d]key=%d\n&quot;, myRank, sampleArray[i]);
			sampleArray[i]=binarySearch(sampleArray[i], myArray, SingleArrayLength);			
			//printf(&quot;rank[%d][%d]=%d and the data myArray[%d]=%d\n&quot;, myRank, i, sampleArray[i],sampleArray[i], myArray[sampleArray[i]] );//for testing
			//printf(&quot;before %d and after %d \n&quot;, myArray[sampleArray[i]-1], myArray[sampleArray[i]+1]); 
		}
		sampleArray[WorkerNumber-1]=SingleArrayLength;
		MPI_Ssend(sampleArray, WorkerNumber, MPI_INT, 0, PIVOT, MPI_COMM_WORLD);
		//printArray(sampleArray, WorkerNumber, &quot;worker sampleArray print out in third phase&quot;);
	}
	else
	{
		for (i=0; i&lt;WorkerNumber; i++)
		{
			sampleRequests[i]=MPI_REQUEST_NULL;
			MPI_Irecv(sampleArray+i*WorkerNumber, WorkerNumber, MPI_INT, i+1, PIVOT, MPI_COMM_WORLD, sampleRequests+i);
			//MPI_Recv(sampleArray+i*WorkerNumber, WorkerNumber, MPI_INT, i+1, PIVOT, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
			//sampleArray[(i+1)*WorkerNumber]=WorkerNumber;
		}
		//printArray(sampleArray, WorkerNumber*WorkerNumber, &quot;master sampleArray print out in third phase&quot;);
		MPI_Waitall(WorkerNumber, sampleRequests, MPI_STATUSES_IGNORE);
		/*
		for (i=0; i&lt;WorkerNumber*WorkerNumber; i++)
		{
			printf(&quot;sample[%d]=%d\n&quot;, i, sampleArray[i]);
		}
		*/
		
		
	}
}

void doMerge(int** mergeBuf, int* lengthArray,int&amp; currentPos)
{
	int indexArray[WorkerNumber];
	int i, candidate, candidateIndex;
	bool beFirst=true, allOver=false;
	for (i=0; i&lt;WorkerNumber; i++)
	{
		indexArray[i]=0;
	}
	do
	{		
		beFirst=true;
		allOver=true;
		for (i=0; i&lt;WorkerNumber; i++)
		{
			if (indexArray[i]&lt;lengthArray[i])
			{
				allOver=false;
				if (beFirst)
				{
					beFirst=false;
					candidate=mergeBuf[i][indexArray[i]];
					candidateIndex=i;
				}
				else
				{
					if (candidate&gt;mergeBuf[i][indexArray[i]])
					{
						candidate=mergeBuf[i][indexArray[i]];
						candidateIndex=i;
					}
				}
			}
		}
		if (allOver)
		{
			break;
		}
		myArray[currentPos]=candidate;
		currentPos++;
		indexArray[candidateIndex]++;
	}
	while (true);
}
		
						
			
		


void fourthPhase()
{
	int i, j, flag;
	int* sizePtr;
	int* dataPtr;
	int currentPos=0;
	int previous, current;
	int length;
	//MPI_Request* tempRequests;
	int lengthArray[WorkerNumber];
	if (myRank==0)
	{
		//tempRequests=new MPI_Request[WorkerNumber*WorkerNumber];
		//printArray(sampleArray, WorkerNumber*WorkerNumber, &quot;before 4th phase, let' see sample Array\n&quot;);
		for (i=0; i&lt;WorkerNumber; i++)//the index of  worker node
		{
			for (j=0; j&lt;WorkerNumber; j++)//the index within worker node index
			{
				sizePtr=sampleArray+j*WorkerNumber+i;
				if (i==0)
				{
					previous=0;	
					current=*sizePtr;
				}
				else
				{
					if (i==WorkerNumber-1)
					{
						current=SingleArrayLength;
					}
					else
					{
						current=*sizePtr;
					}
					previous=*(sampleArray+j*WorkerNumber+i-1);
				}
				//printf(&quot;\n current=%d, previous=%d\n&quot;, current, previous);
				lengthArray[j]=current - previous;
				//currentPos+=length;
				//dataPtr=myArray+currentPos;
				sampleRequests[j]=MPI_REQUEST_NULL;
				//printf(&quot;\nmaster begins\n&quot;);
				if (lengthArray[j]&gt;0)
				{
					//MPI_Irecv(dataPtr, length, MPI_INT, j+1, j*10+i, MPI_COMM_WORLD, tempRequests+j*WorkerNumber+i);
					//printf(&quot;\nmaster begin to recv data from rank %d of length %d\n&quot;, j+1, lengthArray[j]);
					MPI_Irecv(mergeBuf[j], lengthArray[j], MPI_INT, j+1, j*10+i, MPI_COMM_WORLD, sampleRequests+j);
					//MPI_Recv(mergeBuf[j], lengthArray[j], MPI_INT, j+1, j*10+i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
				}

				//printf(&quot;\nmaster after prints\n&quot;);

			}
			MPI_Waitall(WorkerNumber, sampleRequests, MPI_STATUSES_IGNORE);
			doMerge(mergeBuf, lengthArray, currentPos);
			/*
			printf(&quot;\nmaster after tests of %d\n&quot;, i+1 );
			for (int k=0; k&lt;WorkerNumber; k++)
			{	
				//printf(&quot;\nmaster going to print %d\n&quot;, lengthArray[k]);			
				if (lengthArray[k]&gt;0)
				{
					printArray(mergeBuf[k], lengthArray[k], &quot;Master receive segment&quot;);
				}
			}
			*/

		}
		//MPI_Testall(WorkerNumber*WorkerNumber, tempRequests, &amp;flag, MPI_STATUSES_IGNORE);
	}
	else
	{
		for (i=0; i&lt;WorkerNumber; i++)
		{
			sizePtr=sampleArray+i;
			if (i==0)
			{
				previous=0;
				current=*sizePtr;
			}
			else
			{
				if (i==WorkerNumber-1)
				{
					current=SingleArrayLength;
				}
				else
				{
					current=*sizePtr;
				}
				previous=*(sampleArray+i-1);

			}
			dataPtr=myArray+previous;
			length=current-previous;
			currentPos+=length;
			if (length&gt;0)
			{
				//printArray(dataPtr, length, &quot;going to send segment&quot;);
				MPI_Send(dataPtr, length, MPI_INT, 0, (myRank-1)*10+i, MPI_COMM_WORLD);
			}
		}
	}
}



void testArray()
{
	int previous=myArray[0], current=myArray[0];
	for (int i=1; i&lt;WorkerNumber*SingleArrayLength; i++)
	{		
		
		current=myArray[i];
		if (current&lt;previous)
		{
			printf(&quot;sorting error at %d with %d &gt; %d\n&quot;, i, previous, current);
			//exit(4);
		}
		previous=current;
		//printf(&quot;rank[%d][%d]=%d\n&quot;, myRank, i, myArray[i]);
	}
}

		





int main(int argc, char** argv)
{
	double start, end;
	MPI_Init(&amp;argc, &amp;argv);                 
   	MPI_Comm_rank(MPI_COMM_WORLD, &amp;myRank);
    	MPI_Comm_size(MPI_COMM_WORLD, &amp;mySize);  
	initialize();
	
	zeroPhase();
	if (myRank==0)
	{
		start=MPI_Wtime();
	}


	firstPhase();
	//printf(&quot;\nfirst phase ends\n&quot;);

	secondPhase();	
	//printf(&quot;\nsecond phase ends\n&quot;);

	thirdPhase();
	//printf(&quot;\nthird phase ends\n&quot;);

	fourthPhase();

	//printf(&quot;\nfourth phase ends\n&quot;);

	
	if (myRank==0)
	{
		end=MPI_Wtime();
		printf(&quot;distributing system sorting takes %f\n&quot;, end-start);
	}
	

	


	return 0;
}</pre>
  <pre>　</pre>
	<pre>　</pre>
	<pre><font color="#FF0000" size="3">Running results:</font></pre>
	<pre><font size="3" color="#FF0000">single machine sorting array of length 7000 takes 0.003865
distributing system sorting takes 0.901712
0.000u 0.012s 0:01.69 0.5% 0+0k 0+0io 0pf+0w

single machine sorting array of length 70000 takes 0.042404
distributing system sorting takes 0.899044
0.000u 0.008s 0:01.99 0.0% 0+0k 0+0io 0pf+0w


single machine sorting array of length 700000 takes 0.499217
distributing system sorting takes 0.818472
0.004u 0.012s 0:02.23 0.4% 0+0k 0+0io 0pf+0w


single machine sorting array of length 7000000 takes 10.278250
distributing system sorting takes 3.047682
0.000u 0.004s 0:18.02 0.0% 0+0k 0+0io 0pf+0w


single machine sorting array of length 70000000 takes 66.025549
distributing system sorting takes 13.169787
0.004u 0.004s 1:30.91 0.0% 0+0k 0+0io 0pf+0w
</font></pre>
	<pre>　</pre>
</div>
<pre>　</pre>

<pre><span lang="en-ca">				</span> <a href="game24.htm"><img src="picture/back.gif" style="border: medium none" alt="back.gif (341 bytes)" width="32" height="35"></a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a href="index.htm"><img src="picture/up.gif" style="border: medium none" alt="up.gif (335 bytes)" width="35" height="32"></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img src="picture/next.gif" style="border: medium none" alt="next.gif (337 bytes)" width="32" height="35"> </pre>


</body>

</html>