Day | Hours | Topic | Blackboard | Multimedia |
Sept. 21 | 0 | Lessons will start on Sept. 22nd |
Sept. 22 | 2 | Course introduction | blackboard | Video |
Sept. 23 | 2 | Parallel programming: concurrent activity and orchestration concepts. Sample use cases (with patterns) | blackboard | Video1h Video2h |
Sept. 28 | 2 | Threadpool with C and C++ (Code) | blackboard | Video1h Video2h |
Sept. 29 | 2 | Threadpool with Java. Time measurement tools. Performance measures. (Code) | blackboard | Video1h Video2h |
Sept. 30 | 2 | Concurrent activity graph. Non-functional concerns. | blackboard | Video1h Video2h |
Oct. 5 | 2 | More on non-functional concerns: checkpointing (fault tolerance), load balancing (performance) | blackboard | Video1h Video2h |
Oct. 6 | 2 | Task and data parallelism (with CA graphs) | blackboard | Video1h Video2h |
Oct. 7 | 2 | Performance modelling of parallel patterns: pipeline. | blackboard | Video1h Video2h |
Oct. 12 | 2 | More on stream parallel patterns: task farm. Implementation of task farm: E-W*-C and master-worker. Using the same implementation schema for data parallel map. (TODO: have a look at OpenMP 4.0 by Wednesday, only the parts agreed during the lesson) | blackboard | Video1h Video2h |
Oct. 13 | 2 | More on data parallel patterns: reduce. Using the farm template (E-string(W)-C) to implement a reduce. Algorithmic skeletons: introduction and historical perspective. | blackboard | Video1h (Video2h not available) |
Oct. 14 | 2 | OpenMP: classroom discussion | blackboard | Video1h Video2h |
Oct. 19 | 2 | More on OpenMP: variable clauses (private, lastprivate, firstprivate) and reduction clauses. Forking of tasks to the threadpool. Skeleton frameworks: compositionality | blackboard | Video1h Video2h |
Oct. 20 | 2 | Sample code in OpenMP. Template-based skeleton framework implementation | blackboard | Video1h Video2h |
Oct. 21 | 2 | More on template-based implementation for skeleton frameworks: optimizing parameters (parallelism degree). Macro data flow implementation. | blackboard | Video1h Video2h |
Oct. 26 | 2 | Macro Data Flow optimizations. Introduction to Skandium | blackboard | Video1h Video2h |
Oct. 27 | 2 | Skandium: performance, impact of the implementation schema on performance, stream parallel rewritings. Functional semantics of skeletons and motivations for the rewriting rules. | blackboard | Video1h Video2h |
Oct. 28 | 2 | More on rewriting rules. Normal form as a sequence of rewritings. Searching a tree of equivalent skeleton trees. Assignment of Game of Life exercise | blackboard | Video1h Video2h |
Nov. 9 | 2 | Access to Xeon Phi and compiler workflow. Introduction to vectorization: principles, conditions for vectorization, icc compiler flags. | blackboard | Video1h Video2h |
Nov. 10 | 2 | Introduction to FastFlow. Building a simple pipeline. ClassWork1. | Slides | Video1h |
Nov. 11 | 2 | Building a task-farm in FastFlow. Mixing pipeline and task-farm. ClassWork2. Assignment of ClassWork3. | Slides | not available |
Nov. 16 | 2 | More on vectorization: memory alignment and flags (see the linked Intel vectorization material). Introduction to SkePU. | blackboard | Video1h Video2h (sorry, no audio due to an ALSA problem) |
Nov. 17 | 2 | More on FastFlow task-farm pattern. Master-Worker computation, feedback channels, scheduling policies. How to define your own scheduler for the farm. ClassWork2 discussion. Assignment of ClassWork4. | Slides | Video1h |
Nov. 18 | 2 | FastFlow ParallelFor and ParallelForReduce patterns. ClassWork4 discussion. Assignment of ClassWork5. | Slides | Video1h |
Nov. 23 | 2 | Autonomic management of non-functional concerns | blackboard | Video1h Video2h |
Nov. 24 | 2 | Autonomic management of non-functional concerns: hierarchical management, multiple concern management. | blackboard | Video1h Video2h |
Nov. 25 | 2 | Parallel design patterns | blackboard | Video1h Video2h |
Nov. 30 | 2 | ParallelFor* iteration scheduling policies. FastFlow map. Nesting data-parallel computations inside pipeline and task-farm patterns. ClassWork5 discussion. Assignment of ClassWork6. | Slides | Video1h |
Dec. 1 | 2 | Discussion of some parallel applications developed using FastFlow. Completion of previous class works. | Slides | Video1h |
Dec. 2 | 2 | Debugging and profiling tools. How to use Intel VTune Amplifier (matrix multiplication example). FastFlow memory allocator. | Slides | not available |
Dec. 3 | 2 | Parallel design patterns: how to use the design space hierarchy. Implementation of parallel applications on COW/NOW: principles, client/server paradigm, port assignment, discovery, socket syscalls. | blackboard | Video1h Video2h (first part missing due to my mistake while recording) |
Dec. 7 | 2 | Sample code for the name server for channel-address association with TCP/IP sockets. Implementation of a pipeline on a COW with sockets. RPC and rpcgen. | blackboard | Video1h Video2h |
Dec. 9 | 2 | Discussion of the final project. More on rpcgen (sequential and multithreaded execution). RMI in Java (outline) | blackboard | Video1h Video2h |
Dec. 14 | 2 | Accessing the Xeon Phi as a coprocessor: using sockets, using SCIF, with offloading pragmas. | blackboard | Video1h-start Video1h-end Video2h |
Dec. 15 | 2 | Structuring RTS: RISC parallel building blocks. | blackboard | Video12h |