Shared-Memory Programming (edited and revised draft): content summary
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Race Condition Time Line
    Value of area    Thread A    Thread B
    11.667           + 3.765
    11.667                       + 3.563
    15.432
    15.230
  Both threads read area = 11.667 before either one writes its sum back. Thread A stores 15.432, then thread B overwrites it with 15.230, so thread A's contribution is lost. Applying both updates would give 11.667 + 3.765 + 3.563 = 18.995.

critical Pragma
  Critical section: a portion of code that only one thread at a time may execute.
  We denote a critical section by putting the pragma
      #pragma omp critical
  in front of a block of C code.

Correct, But Inefficient, Code
    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for private(x)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
    #pragma omp critical
        area += 4.0 / (1.0 + x * x);
    }
    pi = area / n;

Source of Inefficiency
  The update to area is inside a critical section.
  Only one thread at a time may execute the statement; i.e., it is sequential code.
  The time to execute the statement is a significant part of the loop.
  By Amdahl's Law we know the speedup will be severely constrained.

Reductions
  Reductions are so common that OpenMP provides support for them.
  We may add a reduction clause to the parallel for pragma.
  We specify the reduction operation and the reduction variable.
  OpenMP takes care of storing partial results in private variables and combining the partial results after the loop.

reduction Clause
  The reduction clause has this syntax:
      reduction (op : variable)
  Operators:
    +     Sum
    *     Product
    &     Bitwise and
    |     Bitwise or
    ^     Bitwise exclusive or
    &&    Logical and
    ||    Logical or
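To make the Reductions slide concrete, here is a minimal sketch of roughly what the reduction clause automates: each thread accumulates into its own private partial sum, and the partial sums are combined once per thread after the loop. The names pi_manual_reduction and partial are hypothetical, chosen only for illustration, and a real OpenMP implementation may combine partial results differently.

```c
#include <assert.h>

/* Hypothetical helper: estimates pi with the midpoint rule,
 * combining per-thread partial sums by hand instead of using
 * the reduction clause. */
double pi_manual_reduction(int n)
{
    double area = 0.0;              /* shared result */

    assert(n > 0);

    #pragma omp parallel
    {
        double partial = 0.0;       /* private: declared inside the region */
        double x;
        int i;

        #pragma omp for
        for (i = 0; i < n; i++) {
            x = (i + 0.5) / n;
            partial += 4.0 / (1.0 + x * x);
        }

        /* One protected update per thread, not one per iteration. */
        #pragma omp critical
        area += partial;
    }

    return area / n;
}
```

Calling pi_manual_reduction(1000000) should return a value close to 3.14159. The critical section is now entered once per thread rather than n times, which is why this pattern avoids the bottleneck of the inefficient version. If the compiler is run without OpenMP enabled (for example, GCC without -fopenmp), the pragmas are ignored and the function still computes the same estimate serially.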
π-finding Code with Reduction Clause
    double area, pi, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel for \
        private(x) reduction(+:area)
    for (i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        area += 4.0 / (1.0 + x * x);
    }
    pi = area / n;

Performance Improvement 1
  Too many fork/joins can lower performance.
  Inverting loops may help performance if:
    parallelism is in the inner loop;
    after inversion, the outer loop can be made parallel;
    inversion does not significantly lower the cache hit rate.

Performance Improvement 2
  If a loop has too few iterations, the fork/join overhead is greater than the time saved by parallel execution.
  The if clause instructs the compiler to insert code that determines at run time whether the loop should be executed in parallel, e.g.,
      #pragma omp parallel for if(n > 5000)

Performance Improvement 3
  We can use the schedule clause to specify how the iterations of a loop should be allocated to threads.
  Static schedule: all iterations are allocated to threads before any iterations are executed.
  Dynamic schedule: only some iterations are allocated to threads at the beginning of the loop's execution. The remaining iterations are allocated to threads that complete their assigned iterations.

Static vs. Dynamic Scheduling
  Static scheduling
    Low overhead
    May exhibit high workload imbalance
  Dynamic scheduling
    Higher overhead
    Can reduce workload imbalance

Chunks
  A chunk is a contiguous range of iterations.
  Increasing the chunk size reduces overhead and may increase the cache hit rate.
  Decreasing the chunk size allows finer balancing of workloads.
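The loop inversion described under Performance Improvement 1 can be sketched as follows. The array shape, the recurrence a[i][j] = 2 * a[i-1][j], and all function names are assumptions made for illustration; they do not appear in the slides. Rows depend on the previous row, so only the j loop is parallel. Before inversion the parallel loop is the inner one and a fork/join happens once per row; after inversion there is a single fork/join.

```c
#include <assert.h>

#define ROWS 4
#define COLS 1000

/* Before inversion: the parallel loop is the inner one, so a
 * fork/join occurs on every iteration of the outer i loop. */
void fill_inner_parallel(double a[ROWS][COLS])
{
    int i, j;
    for (i = 1; i < ROWS; i++) {
        #pragma omp parallel for
        for (j = 0; j < COLS; j++)
            a[i][j] = 2.0 * a[i - 1][j];
    }
}

/* After inversion: the parallel j loop is outermost, so there is
 * only one fork/join. i must be private to each thread. */
void fill_outer_parallel(double a[ROWS][COLS])
{
    int i, j;
    #pragma omp parallel for private(i)
    for (j = 0; j < COLS; j++)
        for (i = 1; i < ROWS; i++)
            a[i][j] = 2.0 * a[i - 1][j];
}

/* Hypothetical check: returns 1 if both versions produce the
 * same array from the same first row, 0 otherwise. */
int check_inversion(void)
{
    static double a[ROWS][COLS], b[ROWS][COLS];
    int i, j;

    assert(ROWS >= 2 && COLS >= 1);

    for (j = 0; j < COLS; j++)
        a[0][j] = b[0][j] = (double)j;

    fill_inner_parallel(a);
    fill_outer_parallel(b);

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            if (a[i][j] != b[i][j])
                return 0;
    return 1;
}
```

Note the trade-off named on the slide: the inverted version walks down columns of a row-major array, so on large arrays it may lower the cache hit rate enough to cancel the savings from fewer fork/joins.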
schedule Clause
  Syntax of the schedule clause:
      schedule (type[, chunk])
  The schedule type is required; the chunk size is optional.
  Allowable schedule types:
    static     static allocation
    dynamic    dynamic allocation
    guided     guided self-scheduling
    runtime    type chosen at run time based on the value of the environment variable OMP_SCHEDULE

Scheduling Options
  schedule(static): block allocation of about n/t contiguous iterations to each thread
  schedule(static,C): interleaved allocation of chunks of size C to threads
  schedule(dynamic): dynamic one-at-a-time allocation of iterations to threads
  schedule(dynamic,C): dynamic allocation of C iterations at a time to threads
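As an illustration of the schedule clause, the sketch below counts primes by trial division. The cost of each iteration grows with i, so a block allocation would give the last thread the most expensive iterations, and a dynamic schedule may balance the load better. The function names and the chunk size of 64 are assumptions picked for the example, not values taken from the slides.

```c
#include <assert.h>

/* Hypothetical helper: trial division, so larger k costs more.
 * Intended for small k only (d * d must not overflow int). */
static int is_prime(int k)
{
    int d;
    if (k < 2)
        return 0;
    for (d = 2; d * d <= k; d++)
        if (k % d == 0)
            return 0;
    return 1;
}

/* Counts the primes in [0, n]. Iterations are handed out 64 at a
 * time to whichever thread finishes its current chunk first. */
int count_primes(int n)
{
    int i, count = 0;

    assert(n >= 0);

    #pragma omp parallel for schedule(dynamic, 64) reduction(+:count)
    for (i = 0; i <= n; i++)
        count += is_prime(i);

    return count;
}
```

Changing the clause to schedule(static), schedule(static, 64), or schedule(guided) leaves the result unchanged and only alters how iterations are assigned, which makes it easy to time the alternatives. With schedule(runtime), the choice is read from the OMP_SCHEDULE environment variable instead of being fixed in the source.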