Once you have implemented on your own laptop the program, in order to test it on andromeda (the 8 core, 16 contexts machine remotely available, do the following steps:
To increase the time spent in the f(x) computed by the pipeline stage or by the map workers, use a loop such as
for(int i=0; i<LARGEVALUE; i++) x = sin(x);
For large values of iterations (millions) it is easy to get a dely ranging in fractions of seconds.
Using C (or C++) and pthread (only), implement a three stage pipeline where:
Each stage should be implemented as a concurrent activity (a pthread).
The stream should be implemented in such a way a special tag “EOS” is propagate when no more stream items are available. Concurrent activities receiving (or generating, in case of the first stage) the EOS should propagate (if the case) the EOS and then terminate.
The communication channels implementing the stream should be properly implemented using any of the standard data structures available in C/C++ or in the standard libraries.
Proper “timing” code should be used to print the amount of time spent computing the whole pipeline.
Using a code similar to the one used to implement the farm (the one discussed in class lessons), implement a map where a floating point vector of N items is processed by *NW* workers to obtain the vector with each item being the square of its original value. As for the other exercise, you should implement the map using (only) pthreads. Once it works, modify the code in such a way a stream of vectors is processed, rather than a single one.
Set up a pipeline filtering a stream of integers from 2 to MAX, such that:
2 [GEN] ---> [CHK( )] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 3 [GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 4 3 [GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 5 (4) 3 4 discarded [GEN] ---> [CHK(2)] ---> [CHK( )] ---> [CHK( )] ---> [DISCARD] 6 5 [GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK( )] ---> [DISCARD] 7 (6) 5 6 discarded [GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK( )] ---> [DISCARD] 8 7 5 [GEN] ---> [CHK(2)] ---> [CHK(3)] ---> [CHK(5)] ---> [DISCARD]
A pipeline may be created and stages may be added by invoking the method add_stage, e.g. as in
myPipe.add_stage(new Stage(...))
Consider the possibility to list all the prime numbers in an interval [MIn, MAX] by:
This implies using a completely different parallel pattern with respect to the previous exercise, of course.
Sample solutions for the prime number examples here
Implement a C++ program computing the standard ijk matrix multiplication algorithm between two matrices A and B of size MxP and PxN, respectively.
Then parallelize the code using the FastFlow ParallelFor and ParalleForReduce patterns. Implement three versions:
The seq ijk matrix multiplication code looks as follows:
for(size_t i=0; i<M; i++) for(size_t j=0; j<P; j++) { C[i][j] = 0.0; for(size_t k=0; k<P; k++) C[i][j] += A[i][k]*B[k][j]; }
See the matmul example in the FastFlow tutorial source code.
Implement by using the FastFlow MDF pattern (ff_mdf), the matrix multiplication algorithm starting from the following sequential code (matrices are of size NxN, double precision elements) :
for(size_t i=0;i<N; ++i) for(size_t j=0;j<N;++j) { A[i*N+j] = i+j; B[i*N+j] = i*j; } for(size_t i=0;i<N; ++i) for(size_t j=0;j<N;++j) { C[i*N +j] = 0; for(size_t k=0;k<N;++k) { C[i*N + j] += A[ i*N+k ]*B[ j*N+k ]; // B is transposed ! } }
The initialization of the two matrices A and B has to be overlapped with the computation of the elements of the C matrix.
Sample code here.
Sample code here,