1- What does each of these clauses do: copy(list), copyin(list), copyout(list), create(list)?
2- What was the reason for the poor performance in the following code (page 27 of presentation 1)? Modify the code so that the performance improves, and explain.
3- Explain the following code (page 15 of presentation 1). What is the restrict keyword for?
4- How would you parallelise the following code in OpenMP? In OpenACC? (page 9 of presentation 1; the solution for this example is not entirely in the slides) How many kernels does this code launch when parallelised with OpenACC?
5- This kernel belongs to the dot.cu program; explain the main idea behind it. (kernel in dot.cu) What does the loop at the end of the kernel do?
6- What command do you use to run a program (say matrixMulCublas) on node 2 of our local cluster? On node 17? Suppose that node 2 has the Fermi Tesla M2090 card whereas node 17 has the Kepler card (K20), and that the code of our program makes use of CUDA 5. Where do you expect to see better performance? How much of an increase in performance (measured in GFLOPS) do you expect to see?
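For question 1, here is a minimal sketch in C. It assumes an OpenACC compiler (for example `nvc -acc`). A plain C compiler ignores the `#pragma acc` lines and runs the loops serially with the same result. The function `data_clause_demo` and its arrays are hypothetical.

Each clause controls device allocation and host/device transfers for the arrays listed in it:

- `copyin` allocates device memory and copies host to device on entry to the region.
- `copyout` allocates device memory and copies device to host on exit.
- `copy` does both.
- `create` only allocates device memory, with no transfer in either direction, which suits scratch arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example illustrating the four OpenACC data clauses.
 * in  : only read on the device       -> copyin  (host to device at entry)
 * out : only written on the device    -> copyout (device to host at exit)
 * acc : read and updated on the device -> copy   (both directions)
 * tmp : scratch space on the device    -> create (allocated, never transferred)
 * Returns 0 on success, -1 if the scratch allocation fails. */
int data_clause_demo(int n, const float *in, float *acc, float *out)
{
    assert(n > 0);
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    #pragma acc data copyin(in[0:n]) copy(acc[0:n]) copyout(out[0:n]) create(tmp[0:n])
    {
        /* Stage 1: tmp is only ever used on the device. */
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            tmp[i] = 2.0f * in[i];

        /* Stage 2: acc is read and updated, out is produced. */
        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            acc[i] += tmp[i];
            out[i] = acc[i] - in[i];
        }
    }

    free(tmp);
    return 0;
}
```

Choosing the narrowest clause matters for performance. Marking `in` as `copy` instead of `copyin`, for instance, would add a useless device-to-host transfer at the end of the region.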
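For the restrict part of question 3: `restrict` (C99) is a promise from the programmer. It says that, for the lifetime of the pointer, the object it points to is accessed only through that pointer. The compiler may therefore assume that the arrays do not overlap (no aliasing). This lets it keep values in registers, reorder loads and stores, and vectorise the loop or, under OpenACC, parallelise it, all without runtime overlap checks.

The sketch below uses a SAXPY-style loop purely as an illustration. The actual code on page 15 is not reproduced here, and the name `saxpy` is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a*x[i] + y[i].
 * restrict tells the compiler that x and y never overlap, so every y[i]
 * can be computed independently (safe to vectorise or offload).
 * Calling this with overlapping x and y is undefined behaviour. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Without `restrict`, an OpenACC compiler analysing the `kernels` region may have to assume that `x` and `y` alias. In that case it may decline to parallelise the loop.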
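For question 4, the page 9 code is not reproduced here. What follows is only a generic sketch under the assumption that the code consists of independent loops over arrays; the function `two_loops` is hypothetical.

- **OpenMP:** each loop gets `#pragma omp parallel for`.
- **OpenACC:** wrapping the loops in `#pragma acc kernels` lets the compiler choose the parallelisation. It typically generates one GPU kernel per parallelisable loop nest, so two loops usually means two kernel launches. The compiler feedback (for example `-Minfo=accel` with the NVIDIA/PGI compilers) reports how many kernels were generated.

The `_OPENACC` and `_OPENMP` macros select the variant. With neither enabled, the pragmas are ignored and the loops run serially.

```c
#include <assert.h>

/* Hypothetical loop pair: b = a*a, then c = b + a. */
void two_loops(int n, const float *restrict a,
               float *restrict b, float *restrict c)
{
    assert(n >= 0);
#if defined(_OPENACC)
    /* kernels region: the compiler typically emits one kernel per loop
     * nest, so this region usually launches two kernels, run in order. */
    #pragma acc kernels copyin(a[0:n]) copyout(b[0:n], c[0:n])
    {
        for (int i = 0; i < n; i++)
            b[i] = a[i] * a[i];
        for (int i = 0; i < n; i++)
            c[i] = b[i] + a[i];
    }
#else
    /* OpenMP: each loop is split among the threads of a team; the
     * implicit barrier after the first loop guarantees b is complete. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        b[i] = a[i] * a[i];
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = b[i] + a[i];
#endif
}
```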
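For question 5, this answer assumes that the dot.cu kernel follows the common shared-memory pattern:

1. Each thread accumulates a partial sum with a grid-stride loop.
2. Each thread stores its partial sum in a per-block `__shared__` array.
3. The block repeatedly halves the active range until one value remains.

Under that assumption, the loop at the end of the kernel is a tree reduction over the shared array.

The sketch below emulates this serially in C so the arithmetic can be checked on a CPU. The names `block_dot`, `dot` and `THREADS_PER_BLOCK` are hypothetical. On the GPU, every iteration over `t` runs in a separate thread, with `__syncthreads()` between phases and between reduction rounds.

```c
#include <assert.h>
#include <stddef.h>

#define THREADS_PER_BLOCK 8
_Static_assert((THREADS_PER_BLOCK & (THREADS_PER_BLOCK - 1)) == 0,
               "the halving loop needs a power-of-two block size");

/* Serial emulation of one block of a shared-memory dot-product kernel.
 * cache[] plays the role of __shared__ float cache[THREADS_PER_BLOCK]. */
float block_dot(size_t n, const float *a, const float *b,
                int block, int num_blocks)
{
    float cache[THREADS_PER_BLOCK];
    size_t stride = (size_t)THREADS_PER_BLOCK * (size_t)num_blocks;

    /* Phase 1: each "thread" t sums a strided slice of the products. */
    for (int t = 0; t < THREADS_PER_BLOCK; t++) {
        float temp = 0.0f;
        for (size_t tid = (size_t)t + (size_t)block * THREADS_PER_BLOCK;
             tid < n; tid += stride)
            temp += a[tid] * b[tid];
        cache[t] = temp;
    }
    /* (GPU: __syncthreads() here) */

    /* Phase 2, the final loop: in each round the first half of the
     * threads adds in the values held by the second half, so after
     * log2(THREADS_PER_BLOCK) rounds cache[0] holds the block's sum. */
    for (int i = THREADS_PER_BLOCK / 2; i != 0; i /= 2) {
        for (int t = 0; t < i; t++)
            cache[t] += cache[t + i];
        /* (GPU: __syncthreads() after each round) */
    }
    return cache[0]; /* thread 0 would write this to c[blockIdx.x] */
}

/* The host then adds up the per-block partial sums. */
float dot(size_t n, const float *a, const float *b, int num_blocks)
{
    assert(num_blocks > 0);
    float sum = 0.0f;
    for (int blk = 0; blk < num_blocks; blk++)
        sum += block_dot(n, a, b, blk, num_blocks);
    return sum;
}
```

If `THREADS_PER_BLOCK` were not a power of two, the halving loop would skip some elements. This is why kernels written this way fix the block size to a power of two.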