Investigating the Effect of a Heterogeneous Cache Hierarchy in Data Center Processors
Subject areas: Electrical and Computer Engineering
Adnan Nasri 1, Mahmood Fathy 2 *, Ali Broumandnia 3
1 - Islamic Azad University, Science and Research Branch, Tehran
2 - Iran University of Science and Technology
3 - Islamic Azad University, South Tehran Branch
Keywords: cloud data center, processor, cache hierarchy, non-volatile memory, CloudSuite benchmark
Abstract:
This paper examines the effect of using non-volatile memories in the cache hierarchy of data center processors in the dark silicon era. System-wide power consumption has been identified as one of the key constraints for extreme-scale high-performance computing systems, and as energy consumption has become a key issue in the operation and maintenance of cloud data centers, cloud service providers have grown seriously concerned about it. Emerging non-volatile memory technologies are a suitable replacement for conventional memories. Here, we employ a non-volatile memory, spin-transfer torque random access memory (STT-RAM), as the on-chip L2 (last-level) cache and compare it with a conventional SRAM L2 cache. High density, fast read access, near-zero leakage power, and non-volatility make STT-RAM an important technology for on-chip memories. Reducing memory energy consumption requires addressing both leakage and dynamic energy. Previous studies of non-volatile memories have mainly examined specific schemes on conventional benchmarks and do not provide a thorough analysis of emerging cloud benchmarks, known as scale-out workloads, across multiple design options. We therefore run scale-out workloads from the CloudSuite benchmark suite and discuss the performance and energy efficiency of cloud processors. Experimental results on CloudSuite show that using STT-RAM instead of SRAM as the last-level cache reduces L2 cache energy consumption by up to 59%.
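The abstract's point that both leakage and dynamic energy matter can be illustrated with a minimal first-order sketch. It assumes cache energy is leakage power multiplied by runtime, plus a fixed energy per read and per write. The `CacheTech` type, the function names, and every numeric value below are hypothetical placeholders chosen for illustration; they are not the paper's model, parameters, or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheTech:
    """First-order energy parameters of a cache technology (hypothetical)."""
    name: str
    leakage_power_w: float   # static (leakage) power in watts
    read_energy_nj: float    # dynamic energy per read access, in nanojoules
    write_energy_nj: float   # dynamic energy per write access, in nanojoules


def cache_energy_j(tech: CacheTech, runtime_s: float, reads: int, writes: int) -> float:
    """Total energy in joules: leakage over the runtime plus per-access dynamic energy."""
    leakage_j = tech.leakage_power_w * runtime_s
    dynamic_j = (reads * tech.read_energy_nj + writes * tech.write_energy_nj) * 1e-9
    return leakage_j + dynamic_j


def energy_saving(baseline_j: float, candidate_j: float) -> float:
    """Fraction of the baseline energy saved by the candidate."""
    return 1.0 - candidate_j / baseline_j


# Placeholder values only: SRAM with high leakage, STT-RAM with low leakage
# but a more expensive write. None of these numbers come from the paper.
sram = CacheTech("SRAM", leakage_power_w=2.0, read_energy_nj=0.5, write_energy_nj=0.5)
stt_ram = CacheTech("STT-RAM", leakage_power_w=0.1, read_energy_nj=0.5, write_energy_nj=2.0)

if __name__ == "__main__":
    runtime_s, reads, writes = 1.0, 10**8, 2 * 10**7
    e_sram = cache_energy_j(sram, runtime_s, reads, writes)
    e_stt = cache_energy_j(stt_ram, runtime_s, reads, writes)
    print(f"SRAM: {e_sram:.3f} J, STT-RAM: {e_stt:.3f} J, "
          f"saving: {energy_saving(e_sram, e_stt):.1%}")
```

Under this simplified model, the saving depends on the workload's write intensity and runtime: a read-dominated workload benefits from the low leakage, while a write-heavy one gives some of that benefit back through the higher write energy.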