Network-on-Chip with Multi-Level Adaptive Voltage for Power-Aware Memory Management in Multicore Processors
Subject areas: Electrical and Computer Engineering
Seyedeh Masoumeh Momeni 1, Hadi Shahriar Shahhoseini 2 *
1 - School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
2 - School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
Keywords: multiprocessor, system-on-chip, data transfer management, cache memory, energy management.
Abstract:
Voltage scaling is a widely used technique for reducing energy consumption; in multiprocessor systems-on-chip, its cost is increased network latency. To limit this performance penalty on the network and on the system as a whole, the amount of data movement and communication in the network must be reduced. In memory-intensive and communication-intensive applications, a considerable share of network latency stems from traffic caused by cache misses. In this paper, we apply voltage scaling in an adaptive, multi-level manner, while the free space in the input buffers of the network-on-chip nodes is used to reduce the traffic caused by cache misses. The proposed method therefore increases memory efficiency and reduces the energy consumption of the chip. To obtain the maximum benefit from voltage scaling, the supply voltage is set to one of three levels according to the average free space in the network buffers: when the buffers are nearly full, voltage scaling is stopped. On average, the proposed method reduces cache misses by 16% and energy consumption by 12.5%.
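The three-level policy described above can be sketched as a small controller that reads the average buffer occupancy and picks a supply voltage. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the voltage values (1.0 V, 0.9 V, 0.8 V), the two occupancy thresholds, and the function names `average_free_fraction` and `select_voltage` are all hypothetical, since the abstract gives only the number of levels and the rule that scaling stops when the buffers are nearly full.

```python
from typing import Sequence

# Hypothetical supply-voltage levels in volts (not taken from the paper).
NOMINAL_V = 1.0  # no scaling
MEDIUM_V = 0.9   # mild scaling
LOW_V = 0.8      # aggressive scaling

# Hypothetical thresholds on the average free-buffer fraction
# (0.0 = every slot occupied, 1.0 = every slot free).
NEAR_FULL_THRESHOLD = 0.2
MOSTLY_FREE_THRESHOLD = 0.6


def average_free_fraction(free_slots: Sequence[int], capacity: int) -> float:
    """Average fraction of free slots across all router input buffers."""
    if capacity <= 0 or not free_slots:
        raise ValueError("need at least one buffer with positive capacity")
    return sum(free_slots) / (len(free_slots) * capacity)


def select_voltage(avg_free: float) -> float:
    """Map the average free-buffer fraction to one of three voltage levels."""
    if avg_free < NEAR_FULL_THRESHOLD:
        # Buffers are close to full: stop voltage scaling.
        return NOMINAL_V
    if avg_free < MOSTLY_FREE_THRESHOLD:
        # Moderate occupancy: apply a mild voltage reduction.
        return MEDIUM_V
    # Buffers are mostly empty: apply the lowest voltage level.
    return LOW_V
```

For example, four input buffers of capacity 8 with 1, 0, 2 and 1 free slots give an average free fraction of 4/32 = 0.125, which is below the near-full threshold, so `select_voltage` returns the nominal 1.0 V and no scaling is applied. A real controller would likely also sample occupancy over a time window so that it does not oscillate between levels; that detail is likewise an assumption, not something the abstract specifies.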
Voltage scaling is a widely used technique for energy saving, but it increases network latency in multiprocessor systems-on-chip (MPSoCs). To overcome this challenge, the volume of communication in the network should be reduced. In memory-intensive and communication-intensive applications, a considerable part of the network latency is due to traffic originating from cache misses. In this paper, we apply voltage scaling adaptively, while the free space of the NoC input buffers is used to reduce the traffic caused by cache misses. The proposed method therefore increases memory efficiency and reduces the energy consumption of the chip. To make the approach adaptive, the voltage is adjusted according to the average free space of the NoC buffers, and voltage scaling stops when the buffers are close to full. We achieve a 16% reduction in miss penalty on average, and a 12.5% reduction in power consumption.
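The second ingredient, using the free space of the NoC input buffers to absorb cache-miss traffic, can be illustrated with a small model of a single router input buffer whose idle slots hold copies of cache lines. The sketch is hypothetical: the class `InputBufferWithCache`, its methods, LRU replacement among cached lines, and the rule that in-flight packets always evict cached lines are assumptions chosen for illustration, in the spirit of opportunistic in-network caching; the abstract does not describe the paper's actual lookup or replacement policy.

```python
from collections import OrderedDict, deque
from typing import Optional


class InputBufferWithCache:
    """Hypothetical router input buffer whose idle slots hold cached lines.

    In-flight packets always take priority: a cached line is evicted
    whenever a packet needs its slot.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.packets: deque = deque()            # packets awaiting forwarding (FIFO)
        self.lines: OrderedDict = OrderedDict()  # address -> data, in LRU order

    def free_slots(self) -> int:
        return self.capacity - len(self.packets) - len(self.lines)

    def enqueue_packet(self, packet) -> bool:
        """Accept a packet, evicting the least recently used cached line if needed."""
        if len(self.packets) >= self.capacity:
            return False  # buffer truly full: back-pressure upstream
        if self.free_slots() == 0:
            self.lines.popitem(last=False)  # evict the LRU cached line
        self.packets.append(packet)
        return True

    def dequeue_packet(self):
        """Forward the oldest waiting packet."""
        return self.packets.popleft()

    def try_cache(self, address: int, data: bytes) -> None:
        """Opportunistically keep a copy of a reply's cache line in an idle slot."""
        if address in self.lines:
            self.lines.move_to_end(address)
            self.lines[address] = data
        elif self.free_slots() > 0:
            self.lines[address] = data

    def lookup(self, address: int) -> Optional[bytes]:
        """Return cached data for a miss request, or None if it must go to memory."""
        data = self.lines.get(address)
        if data is not None:
            self.lines.move_to_end(address)
        return data
```

With a capacity of 4, one waiting packet and one cached line leave two free slots, and a later miss request for the cached address is answered locally by `lookup`. Once three more packets arrive, the cached line is evicted to make room, so the next request for that address must travel to memory again. This mirrors the abstract's point that the extra caching capacity exists only while the buffers have free space.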