Translation of foreign FPGA literature: A New Packing, Placement and Routing Tool for FPGA Research (edited draft). Content summary:

[...]ion routines all work only with this routing resource graph, so adding new routing architecture features only involves changing the subroutines that build this graph. Although VPR was initially developed for island-style FPGAs [2, 3], it can also be used with row-based FPGAs [4]. VPR is not currently capable of targeting hierarchical FPGAs [5], although adding an appropriate placement cost function and the required routing resource graph building routines would allow it to target them. VPR’s built-in graphics allow interactive visualization of the placement, the routing, the available routing resources and the possible ways of interconnecting the routing resources.

The VPACK Logic Block Packer / Netlist Translator

VPACK reads in a blif format netlist of a circuit that has been technology-mapped to LUTs and flip-flops, packs the LUTs and flip-flops into the desired FPGA logic block, and outputs a netlist in VPR’s netlist format. VPACK can target a logic block consisting of one LUT and one FF, as shown in Figure 2, as this is a common FPGA logic element. VPACK is also capable of targeting logic blocks that contain several LUTs and several flip-flops, with or without shared LUT inputs [6]. These “cluster-based” logic blocks are similar to those employed in recent FPGAs by Altera, Xilinx and Lucent Technologies.

2 Placement Algorithm

VPR uses the simulated annealing algorithm [7] for placement. We have experimented with several different cost functions, and found that what we call a linear congestion cost function provides the best results in a reasonable computation time [8]. The functional form of this cost function is

$$\mathrm{Cost} = \sum_{n=1}^{N_{nets}} q(n) \left[ \frac{bb_x(n)}{C_{av,x}(n)} + \frac{bb_y(n)}{C_{av,y}(n)} \right]$$

where the summation is over all the nets in the circuit. For each net, bbx and bby denote the horizontal and vertical spans of its bounding box, respectively. The q(n) factor compensates for the fact that the bounding box wire length model underestimates the wiring necessary to connect nets with more than three terminals, as suggested in [10].
Its value depends on the number of terminals of net n; q is 1 for nets with 3 or fewer terminals, and slowly increases to [...] for nets with 50 terminals. Cav,x(n) and Cav,y(n) are the average channel capacities (in tracks) in the x and y directions, respectively, over the bounding box of net n. This cost function penalizes placements which require more routing in areas of the FPGA that have narrower channels. All the results in this paper, however, are obtained with FPGAs in which all channels have the same capacity. In this case Cav is a constant and the linear congestion cost function reduces to a bounding box cost function.

A good annealing schedule is essential to obtain high-quality solutions in a reasonable computation time with simulated annealing. We have developed a new annealing schedule which leads to very high-quality placements, and in which the annealing parameters automatically adjust to different cost functions and circuit sizes. We compute the initial temperature in a manner similar to [11]. Let Nblocks be the total number of logic blocks plus the number of I/O pads in a circuit. We first create a random placement of the circuit. Next we perform Nblocks moves (pairwise swaps) of logic blocks or I/O pads, and compute the standard deviation of the cost of these Nblocks different configurations. The initial temperature is set to 20 times this standard deviation, ensuring that virtually any move is accepted at the start of the anneal.

As in [12], the default number of moves evaluated at each temperature is [...]. This default number can be overridden on the command line, however, to allow different CPU time / placement quality tradeoffs. Reducing the number of moves per temperature by a factor of 10, for example, speeds up placement by a factor of 10 and reduces final placement quality by only about 10%.

When the temperature is so high that almost any move is accepted, we are essentially moving randomly from one placement to another and little improvement in cost is obtained.
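Before the schedule discussion continues, the cost function and the initial-temperature rule described so far can be sketched in Python. This is a minimal illustration under stated assumptions, not VPR's actual code: the names `q_factor`, `placement_cost` and `initial_temperature` are hypothetical, the linear ramp used for q(n) is a stand-in for the table from [10], the endpoint 2.79 at 50 terminals is recalled from the published paper rather than taken from this excerpt, bounding box spans are counted in grid cells (max − min + 1), and channel capacity is taken as uniform, as in the paper's experiments.

```python
import random
import statistics


def q_factor(num_terminals, q_at_50=2.79):
    """Correction factor q(n): 1 up to 3 terminals, then a slow rise.

    The linear ramp and the 2.79 endpoint are assumptions for illustration.
    """
    if num_terminals <= 3:
        return 1.0
    capped = min(num_terminals, 50)
    return 1.0 + (q_at_50 - 1.0) * (capped - 3) / 47.0


def placement_cost(nets, placement, cav_x=1.0, cav_y=1.0):
    """Linear congestion cost with one uniform channel capacity per direction.

    nets: list of nets, each a list of block names.
    placement: dict mapping block name -> (x, y) grid location.
    """
    total = 0.0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        bbx = max(xs) - min(xs) + 1  # horizontal span (assumed convention)
        bby = max(ys) - min(ys) + 1  # vertical span (assumed convention)
        total += q_factor(len(net)) * (bbx / cav_x + bby / cav_y)
    return total


def initial_temperature(nets, placement, rng=None):
    """Return 20x the standard deviation of cost over Nblocks random swaps."""
    rng = rng or random.Random(0)
    trial = dict(placement)  # work on a copy; the caller's placement is untouched
    blocks = list(trial)     # needs at least two blocks to swap
    costs = []
    for _ in range(len(blocks)):
        a, b = rng.sample(blocks, 2)
        trial[a], trial[b] = trial[b], trial[a]  # pairwise swap
        costs.append(placement_cost(nets, trial))
    return 20.0 * statistics.pstdev(costs)
```

With uniform capacities the cost is just a scaled, q-weighted bounding box sum, which matches the remark that the linear congestion cost reduces to a bounding box cost when Cav is constant.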
Conversely, if very few moves are being accepted (due to the temperature being low and the current placement being of fairly high quality), there is also little improvement in cost. With this motivation in mind, we propose a new temperature update schedule which increases the amount of time spent at temperatures where a significant fraction of, but not all, moves are being accepted. A new temperature is computed as Tnew = α Told, where the value of α depends on the fraction of attempted moves that were accepted (Raccept) at Told, as shown in Table [...].

It was shown in [12, 13] that it is desirable to keep Raccept near [...] for as long as possible. We accomplish this by using the value of Raccept to control a range limiter: only interchanges of blocks that are less than or equal to Dlimit units apart in the x and y directions are attempted. A small value of Dlimit increases Raccept by ensuring that only blocks which are close together are considered for swapping. These “local swaps” tend to result in relatively small changes in the placement cost, increasing their likelihood of acceptance.

Initially, Dlimit is set to the entire chip. Whenever the temperature is reduced, the value of Dlimit is updated according to [...], and then clamped to the range 1 ≤ Dlimit ≤ maximum FPGA dimension. This results in Dlimit being the size of the entire chip for the first part of the anneal, shrinking gradually during the middle stages of the anneal, and being 1 for the low-temperature part of the anneal.

The anneal is terminated when T < [...] * Cost / Nnets. The movement of a logic block will always affect at least one net. When the temperature is less than a small fraction of the average cost of a net, it is unlikely that any move that results in a cost increase will be accepted, so we terminate the anneal.

3 Routing Algorithm

VPR’s router is based on the Pathfinder negotiated congestion algorithm [...]
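Returning to the placement schedule of Section 2, its three adaptive pieces (temperature update, range limiter and exit test) can be sketched in Python. This excerpt has lost the table of α values and several constants, so every number below is an assumption recalled from the published VPR paper and should be checked against it: the four α bands, the target acceptance rate of 0.44, the update rule Dlimit_new = Dlimit_old · (1 − 0.44 + Raccept), and the exit fraction 0.005. The function names are hypothetical.

```python
def alpha_for(r_accept):
    """Temperature multiplier chosen from the acceptance rate.

    The band edges and multipliers are assumed values, not taken
    from the excerpt (its table is missing).
    """
    if r_accept > 0.96:
        return 0.5   # almost everything accepted: cool quickly
    if r_accept > 0.8:
        return 0.9
    if r_accept > 0.15:
        return 0.95  # productive range: cool slowly
    return 0.8       # almost nothing accepted: cool faster again


def update_range_limit(d_limit, r_accept, max_dim, target=0.44):
    """Scale Dlimit by (1 - target + r_accept), then clamp to [1, max_dim]."""
    scaled = d_limit * (1.0 - target + r_accept)
    return max(1.0, min(scaled, float(max_dim)))


def should_stop(temperature, cost, num_nets, fraction=0.005):
    """Exit test: T below a small fraction of the average cost per net."""
    return temperature < fraction * cost / num_nets


def next_schedule_state(temperature, d_limit, r_accept, max_dim):
    """One schedule step: return the new temperature and the new Dlimit."""
    new_temperature = alpha_for(r_accept) * temperature
    new_d_limit = update_range_limit(d_limit, r_accept, max_dim)
    return new_temperature, new_d_limit
```

Under these assumptions, an acceptance rate above the target grows Dlimit (until it is clamped at the chip size) and a rate below the target shrinks it, which is what drives Raccept back toward the target for as long as the clamp allows.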