高级人工智能第十二章

高级人工智能第十二章内容摘要：

PTree 项条件模式基条件 FPtree 生成的频繁模式 I5 {(I2 I1:1),(I2 I1 I3:1)} I2:2,I1:2 I2 I5:2, I1 I5:2, I2 I1 I5:2 I4 {(I2 I1:1),(I2:1)} I2:2 I2 I4:2 I3 {(I2 I1:2,(I2:2),(I1:2)} I2:4,I1:2, I1:2 I2 I3:4, I1 I3:2, I2 I1 I3:2 I1 {(I2:4)} I2:4 I2 I1:4 2020/11/4 史忠植关联规则 53 挖掘 FPTree Procedure FP_growth(Tree,) (1) If Tree contains a single path P then (2) for each bination (denote as ) of the nodes in the path P (3) generate pattern  with support = minisup of nodes in 。 (4) Else for each ai in the header of Tree { (5) generate pattern =ai  with support =。 (6) construct , s conditional pattern base and then ’conditional FP_tree Tree。 (7) IF Tree248。 then (8) call FP_growth(Tree, )。 } 2020/11/4 史忠植关联规则 54 由事务数据库构建 FP树 {} f:4 c:1 b:1 p:1 b:1 c:3 a:3 b:1 m:2 p:2 m:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 TID Items bought (ordered) frequent items 100 {f, a, c, d, g, i, m, p} {f, c, a, m, p} 200 {a, b, c, f, l, m, o} {f, c, a, b, m} 300 {b, f, h, j, o, w} {f, b} 400 {b, c, k, s, p} {c, b, p} 500 {a, f, c, e, l, p, m, n} {f, c, a, m, p} 1. 扫描 DB一次 ,找到频繁 1项 (单一项模式 ) 2. 按支持度降序对频繁项排序为 Flist 3. 再次扫描 DB,构建 FPtree Flist=fcabmp 2020/11/4 史忠植关联规则 55 划分模式和数据库  频繁模式根据 Flist可以被划分为若干子集  Flist=fcabmp  包含 p的模式  包含 m 但包含 p的模式  …  包含 c 但丌包含 a ,b, m, p的模式  模式 f  完整性和非冗余性 2020/11/4 史忠植关联规则 56 从 P的条件数据库找出包含 P的模式  从 FPtree的索引表的频繁项开始  沿着每个频繁项 p的链接遍历 FPtree  累积项 p的所有转换前缀路径来形成的 p的条件模式基条件模式基项条件模式基 c f:3 a fc:3 b fca:1, f:1, c:1 m fca:2, fcab:1 p fcam:2, cb:1 {} f:4 c:1 b:1 p:1 b:1 c:3 a:3 b:1 m:2 p:2 m:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3 2020/11/4 史忠植关联规则 57 递归 : 挖掘每个条件 FPtree {} f:3 c:3 a:3 m条件 FPtree “am”的条件模式基 : (fc:3) {} f:3 c:3 am条件 FPtree “cm”的条件模式基 : (f:3) {} f:3 cm条件 FPtree “cam”的条件模式基 : (f:3) {} f:3 cam条件 FPtree 2020/11/4 史忠植关联规则 58 一个特例 : FPtree中的单一前缀路径  假定 (条件的 ) FPtree T 有一个共享的单一前缀路径 P  挖掘可以分为两部分  将单一前缀路径约简为一个结点  将两部分的挖掘结果串联  a2:n2 a3:n3 a1:n1 {} b1:m1 C1:k1 C2:k2 C3:k3 b1:m1 C1:k1 C2:k2 C3:k3 r1 + a2:n2 a3:n3 a1:n1 {} r1 = 2020/11/4 史忠植关联规则 59 通过 DB 投影 (Projection)使 FPgrowth可伸缩  FPtree 丌能全放入内存 ?—DB 投影  首先将一个数据库划分成一个由若干投影(Projected)数据库组成的集合  然后对每个投影数据库构建和挖掘 FPtree  Parallel projection vs. Partition projection 技术  Parallel projection is space costly 2020/11/4 史忠植关联规则 60 Partitionbased Projection  Parallel projection 需要很多磁盘空间  Partition projection 节省磁盘空间 Tran. DB fcamp fcabm fb cbp fcamp pproj DB fcam cb fcam mproj DB fcab fca fca bproj DB f cb … aproj DB fc … cproj DB f … fproj DB … amproj DB fc fc fc cmproj DB f f f … 2020/11/4 史忠植关联规则 61 改进途径  使用哈希表存储候选 k项集的支持度计数  移除丌包含频繁项集的事务  对数据采样  划分数据  若一个项集是频繁的 ,则它必定在某个数据分区中是频繁的 . 2020/11/4 史忠植关联规则 62 FPtree 结构的优点  完整性  保持了频繁项集挖掘的完整信息  没有打断仸何事务的长模式  紧密性  减少丌相关的信息 —丌频繁的项没有了  项按支持度降序排列 : 越频繁出现 ,越可能被共享  决丌会比原数据库更大 (丌计结点链接和计数域 )  对 Connect4 数据库 , 压缩比率可以超过 100 2020/11/4 史忠植关联规则 63 内容提要  引言  Apriori 算法  FPgrowth 算法  并行关联规则挖掘  多维关联规则挖掘  相关规则  关联规则改进  总结 Three parallel algorithms: CD, DD, CaD based on Apriori Discovering frequent itemsets (1) is much more expensive than generating rules (2) Phase 1: Each node generates candidate kitemsets locally from the frequent (k1)itemsets  how to partition? Phase 2: The match candidates itemsets and transactions collect the local counts  how to distribute? Phase 3: determine the global counts for itemsets  how to find? find frequent kitemsets and replicate in all nodes 并行关联规则挖掘 2020/11/4 史忠植关联规则 64 kitemset An itemset having k items Lk Set of frequent kitemsets (those with minimum support) Each member of this set has 2 fields: itemset and support count Ck Set of candidate kitemsets (potentially frequent itemsets) Each member of this set has 2 fields: itemset and support count Pi Processor with idi Di The dataset local to the processor Pi D Ri The dataset local to the processor Pi after repartitioning Cik The candidate set maintained with the processor Pi during the kth pass (there are k items in each candidate) 并行关联规则挖掘 2020/11/4 史忠植关联规则 65 Objective: minimizing munication Techniques: Straightforward parallelization of Apriori Carry out redundant duplicate putations in parallel to avoid munication Only requires municating count values (no data tuples are exchanged) Processors can scan the local data asynchronously in parallel 计数分布 CD 2020/11/4 史忠植关联规则 66 Algorithm: Pass 1: (1) Each processor Pi generates its local candidate itemset Ci1 depending on the items present in its local data partition Di (2) Develop and Exchange local counts Ci1 (3) Develop global support counts C1 计数分布 CD 2020/11/4 史忠植关联规则 67 Algorithm: Pass k1: (1) Pi generates the plete Ck using the plete Lk1 created at the end of pass (k1). Each processor has the identical Lk1 thus generates identical Ck and puts its count values in a mon order into a count array (2) Pi makes a pass over data partition Di and develop local support counts for candidates in Ck (3) Pi exchanges local Ck counts with all other processors to develop global Ck counts. All processors must synchronize. (4) Pi putes Lk from Ck (5) Pi independently decide to terminate or continue to the next pass 计数分布 CD 2020/11/4 史。

阅读剩余 0%

本站所有文章资讯、展示的图片素材等内容均为注册用户上传(部分报媒/平媒内容转载自网络合作媒体)，仅供学习参考。用户通过本站上传、发布的任何内容的知识产权归属用户或原始著作权人所有。如有侵犯您的版权，请联系我们反馈本站将在三个工作日内改正。

标签：第十二章人工智能高级

高级人工智能第十二章

相关推荐

密码登录

账号注册