Java
LAB 2: MPI (C++)
Submissiondue8/31/ 20- 11:55 pm
NOTE: Skeletons are provided for Part Bonly. Data files will be provided for all problems.
NOTE: UseMPI_Scatterto distribute the data.
Part A (Emulate a distributed system using MPI messaging):
You are given a grayscale image with 8-bit pixels (as matrix D). You goal is to compute an intensity histogram of D in a distributed way. Use the BFS algorithm (see below) to visit every node and establish parent-child relationship. A parent node collects partial histograms of its children and adds its own before sending its partial histogram to its parent. A leaf node just returns its histogram.
Your program will emulate a distributed system by only sending messages between neighboring node. You are given a matrix A describing the DS connectivity (an adjacency bit matrix). Rowiin the matrix A describes nodes that nodeiis connected to (a 1 in position A[i,j] means node j is a neighbor ofi). A node can send messagesonlyto its neighbors (per the adjacency matrix).You can to useMPI_Recvspecifying source and tag as MPI_ANY_SOURCE, MPI_ANY_TAG.These are wild cards, used when source is unknown (as in a message from a parent).After the blockingRecvreturns you can find out the sender and tag in the status returned byRecv.You can also use non-blockingMPI_IrecvwithMPI_Iprobe.Send/Recvroutines are the only ones you can use after initialization.
An intensity histogram calculates the number of pixels that have the same intensity in an image, e.g. the same pixel value, for every pixel value between 0 and 255.
The computation is initiated by node X, where X is read from the input file.
Your implementation will takefourcommand line arguments (in this order): Grayscale PGM image, path to text file with the adjacency matrix, initiator rank, and output file name.
Use the grayscale PGM image from Lab 1. You can use the code from the lab to read an image into the matrix D.
A sample text file with an adjacency matrix with 4 nodes will look as follows:
0
1
0
1
1
0
1
0
0
1
0
1
1
0
1
0
It shows that Node 1 is connected to 2 and 4,Node 2 is connected to 1 and 3, Node 3 is connected to 2 and 4, and Node 4 is connected to 1 and 3.
Print histogram output into the output file where each line will represent the # of pixels for each intensity. Therefore, the output file MUST have 256 lines only.
The BFS algorithm is shown at the end of this document.
Part B1 (Word Frequency: Reduction):
Implement a simple and parallel version of an algorithm to count the word frequency using MPI.To aggregate your results in the Master process you must useMPI_Reduce.
intMPI_Reduce(constvoid *sendbuf, void *recvbuf,intcount,MPI_Datatypedatatype,MPI_Opop,introot,MPI_Commcomm)
Part B2 (Word Frequency: Ring topology):
Use the same implementation of the algorithm as PartB1 but instead of usingMPI_reduce, pass the partial result between the tasks as messages. You will use a point-to-point communication between adjacent tasks, e.g.iand i-1.
Look at the provided skeleton for CLargsand for choosing between part B1 and B2.
Timing (Part B1 and B2):
Report the timings for Part B1 and B2 with 2, 4, 8, and 16 processes. Be careful when timing your code, you want to time only the processing time, not reading the input. Only the Master process should time the execution, not the worker threads. Create a graph from that timing data and explain what they show.
Also explain:
1.What is the topology used by MPI to implements its Reduce?
2.How does the ring topology affect the runtime?
3.Is there a case where the ring topology may be faster?
4.How many messages are passed between machines when program is run with 2, 4, 8, and 16 processes?
See below for the submission instructions. How to time the implementation will be discussed in the discussion.
NOTE: See theMPI How-Toguide on the assignment web page before running your implementations onopenlabservers.
Submit via Canvas:
YOUMUSTUSE the file names below
1.Implementation of your C++ program:ImplementationA.cppandImplementationB.cpp(includes both B1 and B2)
2.Analysis of the timing for Part B1 and B2:Timing.pdf
Point Breakdown:
Part A: Implementation -> 40pts
Part B1 and B2: Implementation-> 50pts (25pts each), Timing -> 10pts
USEFUL LINKS:
MPI Tutorial
NOTE:The TA cannot solve the exercises for you, but he can answer your questions!
The BFS Algorithm:
Assumption:Anundirectednetwork of processes.
The algorithm performs a complete network traversal as in
https://www.cs.yale.edu/homes/aspnes/pinewiki/SynchronousBreadthFirstSearch.html