Shared-Memory Programming (edited revision): content summary

...wer should be

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Race Condition Time Line

  Thread A        Thread B        Value of area
                                  11.667
  + 3.765
                  + 3.563
                                  11.667
                                  15.432
                                  15.230

Both threads add to the same starting value, 11.667, so one update is lost: area ends at 15.230 rather than 11.667 + 3.765 + 3.563 = 18.995.

critical Pragma
• Critical section: a portion of code that only one thread at a time may execute
• We denote a critical section by putting the pragma
      #pragma omp critical
  in front of a block of C code

Correct, But Inefficient, Code

    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
    #pragma omp critical
        area += 4.0 / (1.0 + x*x);
    }
    pi = area / n;

Source of Inefficiency
• Update to area is inside a critical section
• Only one thread at a time may execute the statement; i.e., it is sequential code
• Time to execute the statement is a significant part of the loop
• By Amdahl's Law we know speedup will be severely constrained

Reductions
• Reductions are so common that OpenMP provides support for them
• May add reduction clause to parallel for pragma
• Specify reduction operation and reduction variable
• OpenMP takes care of storing partial results in private variables and combining partial results after the loop

reduction Clause
• The reduction clause has this syntax:
      reduction (<op> :<variable>)
• Operators
  +     Sum
  *     Product
  &     Bitwise and
  |     Bitwise or
  ^     Bitwise exclusive or
  &&    Logical and
  ||    Logical or
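The operator list above covers more than sums. The following is a minimal sketch, not taken from the slides, of three of the other operators in use. The function names product_upto, all_positive, and combine_flags are hypothetical and exist only for illustration. Each private copy of the reduction variable starts at the operator's identity value (1 for * and &&, 0 for |), which is why the shared variables are initialized the same way.

```c
#include <assert.h>

/* Hypothetical helper: product 1 * 2 * ... * n using reduction(*:prod). */
double product_upto(int n)
{
    double prod = 1.0;   /* identity for multiplication */
    int i;

    assert(n >= 0);
    #pragma omp parallel for reduction(*:prod)
    for (i = 1; i <= n; i++)
        prod *= i;
    return prod;
}

/* Hypothetical helper: 1 if every element is > 0, using reduction(&&:ok). */
int all_positive(const int *a, int n)
{
    int ok = 1;          /* identity for logical and */
    int i;

    #pragma omp parallel for reduction(&&:ok)
    for (i = 0; i < n; i++)
        ok = ok && (a[i] > 0);
    return ok;
}

/* Hypothetical helper: bitwise or of all flag words, using reduction(|:all). */
unsigned combine_flags(const unsigned *flags, int n)
{
    unsigned all = 0u;   /* identity for bitwise or */
    int i;

    #pragma omp parallel for reduction(|:all)
    for (i = 0; i < n; i++)
        all |= flags[i];
    return all;
}
```

With GCC or Clang the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the loops run sequentially with the same results.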
π-finding Code with Reduction Clause

    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for \
            private(x) reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x*x);
    }
    pi = area / n;

Performance Improvement 1
• Too many fork/joins can lower performance
• Inverting loops may help performance if
  - Parallelism is in inner loop
  - After inversion, the outer loop can be made parallel
  - Inversion does not significantly lower cache hit rate

Performance Improvement 2
• If loop has too few iterations, fork/join overhead is greater than time savings from parallel execution
• The if clause instructs the compiler to insert code that determines at runtime whether the loop should be executed in parallel; e.g.,
      #pragma omp parallel for if(n > 5000)

Performance Improvement 3
• We can use the schedule clause to specify how iterations of a loop should be allocated to threads
• Static schedule: all iterations allocated to threads before any iterations executed
• Dynamic schedule: only some iterations allocated to threads at beginning of loop's execution. Remaining iterations allocated to threads that complete their assigned iterations.

Static vs. Dynamic Scheduling
• Static scheduling
  - Low overhead
  - May exhibit high workload imbalance
• Dynamic scheduling
  - Higher overhead
  - Can reduce workload imbalance
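Returning to the π computation, the reduction clause and the if clause described above can be combined on a single pragma. The sketch below wraps the loop in a function so it can be called directly; the name estimate_pi is hypothetical. The threshold of 5000 is the one used as an example in the slides, not a tuned value, and the best cutoff depends on the machine.

```c
#include <assert.h>

/* Hypothetical wrapper around the slides' pi loop.
 * Midpoint-rule estimate of the integral of 4/(1+x^2) over [0,1]. */
double estimate_pi(int n)
{
    double area = 0.0;
    double x;
    int i;

    assert(n > 0);

    /* reduction(+:area): each thread sums into a private copy,
     * and the copies are added together after the loop.
     * if(n > 5000): run in parallel only when the loop is long
     * enough to be worth the fork/join cost. */
    #pragma omp parallel for private(x) reduction(+:area) if(n > 5000)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x * x);
    }
    return area / n;
}
```

Because floating-point addition is not associative, the parallel result may differ from the sequential one in the last few digits, depending on how the partial sums are combined.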
Chunks
• A chunk is a contiguous range of iterations
• Increasing chunk size reduces overhead and may increase cache hit rate
• Decreasing chunk size allows finer balancing of workloads

schedule Clause
• Syntax of schedule clause
      schedule (<type>[,<chunk> ])
• Schedule type required, chunk size optional
• Allowable schedule types
  - static: static allocation
  - dynamic: dynamic allocation
  - guided: guided self-scheduling
  - runtime: type chosen at runtime based on value of environment variable OMP_SCHEDULE

Scheduling Options
• schedule(static): block allocation of about n/t contiguous iterations to each thread
• schedule(static,C): interleaved allocation of chunks of size C to threads
• schedule(dynamic): dynamic one-at-a-time allocation of iterations to threads
• schedule(dynamic,C): dynamic allocation of C iterations at a time to threads
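The scheduling options above can be sketched in two small pieces. Both function names are hypothetical. The first, static_chunk_owner, computes which thread would own iteration i under schedule(static,C) with t threads, assuming the round-robin assignment of chunks in thread-number order that the OpenMP specification describes for this case. The second, triangular_work, is a loop whose iterations grow in cost with i, the kind of imbalance where schedule(dynamic,C) may help; the chunk size of 4 is an arbitrary choice for illustration.

```c
#include <assert.h>

/* Hypothetical helper: thread that owns iteration i under
 * schedule(static, chunk) with nthreads threads.
 * Chunk k (iterations k*chunk .. k*chunk + chunk - 1) goes to
 * thread k mod nthreads. */
int static_chunk_owner(int i, int chunk, int nthreads)
{
    assert(i >= 0 && chunk > 0 && nthreads > 0);
    return (i / chunk) % nthreads;
}

/* Hypothetical imbalanced loop: iteration i does i + 1 units of work,
 * so a block allocation would give the last thread far more to do.
 * schedule(dynamic, 4) hands out 4 iterations at a time to whichever
 * thread is free. Returns the total work, n*(n+1)/2. */
long triangular_work(int n)
{
    long total = 0;
    int i, j;

    #pragma omp parallel for private(j) schedule(dynamic, 4) reduction(+:total)
    for (i = 0; i < n; i++)
        for (j = 0; j <= i; j++)
            total += 1;
    return total;
}
```

For example, with C = 2 and t = 3, iterations 0 and 1 go to thread 0, 2 and 3 to thread 1, 4 and 5 to thread 2, and 6 and 7 back to thread 0. Under a dynamic schedule the assignment of chunks to threads can change from run to run, so only the final total is predictable.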