Modeling and monitoring of information system infrastructure

Scientific article in the specialty «Electrical engineering, electronic engineering, information technologies»

CC BY

Authors of the scientific work: Babkin Oleg Vyacheslavovich, Varlamov Aleksandr Aleksandrovich, Gorshunov Roman Aleksandrovich, Dos Evgenii Vladimirovich, Kropachev Artemii Vasilyevich


MODELING AND MONITORING OF INFORMATION SYSTEM INFRASTRUCTURE

Babkin O.V.1, Varlamov A.A.2, Gorshunov R.A.3, Dos E.V.4, Kropachev A.V.5, Zuev D.O.6

1Babkin Oleg Vyacheslavovich - Strategy Consultant, IBM;

2Varlamov Aleksandr Aleksandrovich - CTO, SHARXDC LLC, MOSCOW;

3Gorshunov Roman Aleksandrovich - Solution Architect, AT&T, BRATISLAVA, SLOVAKIA;

4Dos Evgenii Vladimirovich - Lead DevOps Architect, EPAM, MINSK, REPUBLIC OF BELARUS;

5Kropachev Artemii Vasilyevich - Principal Architect, LI9 TECHNOLOGY SOLUTIONS, NORTH CAROLINA;

6Zuev Denis Olegovich - Independent Consultant, NEW JERSEY, USA

Abstract: the development of a power and temperature monitoring system is a key aspect of ensuring data center performance. It is important to build an accurate model of the scalable server room power consumption system and a cost-effective cooling facility. Systematic analysis demonstrates that server power consumption is always correlated with the key workload parameters of shared storage, memory, computational capability and network bandwidth. To manage temporal and spatial temperature variations and keep the servers stable, an accurate temperature map modeling algorithm has to be developed. Computational fluid dynamics simulation was shown to be the most effective instrument of analysis, since it uses mathematical methods to build a precise fluid flow model. However, it proved to be a very complex model, because it is based on a differential equation with no analytical solution, so resource-intensive numerical procedures have to be used. An algorithm is proposed that decreases the complexity of computational fluid dynamics simulation and builds an accurate temperature map. The proposed algorithm is based on building heat- and air-flow graphs. The simplified temperature model for servers is oriented on the computational and memory sockets of the servers, as well as on the heat removal capability related to fan speed changes. The model includes a thermal RC network scheme of the system, which is based on the connection between thermal and electrical losses. The cooling facility of the developed model includes a cooling tower, chiller, server room air conditioning and server room air handling. To estimate the temperature map of data center servers, it is necessary to take into account the interactions of multiple servers' heat and air flows within the bounds of the server room. This procedure made it possible to develop an accurate heat recirculation scheme of the data center platform.

Keywords: data center, temperature map, power consumption, cooling facility, servers' room, computational fluid dynamics simulation, RC network.

1. Introduction

The performance of modern scalable data center platforms is one of the most important tasks in the development of the IT area. Building models of power and temperature facilities has proved to be an effective instrument for ensuring the stability of data center server rooms. This task can be solved by developing a mathematical model of the server room power consumption system.

To identify the main aspects of the problem, a systematic analysis of recent studies and publications was carried out. High-level power models of data center servers used to estimate key workload parameters were analyzed [1-3]. To address the organization of the fan-based electrical cooling complex, works [4, 14], which demonstrate that this system accounts for a significant share of data center infrastructure power utilization, were studied.

Computational fluid dynamics simulation was analyzed as an effective instrument for developing the servers' thermal map [5, 14], as were methods that decrease the complexity of this simulation [6-8]. Comparative analysis of cooling power under varying processor utilization, which changes the server room temperature [10-14], was also considered. The systematic analysis shows that it is possible to develop an effective model based on the heat recirculation scheme of the data center platform.

2. Data center power system modeling

The development of an efficient power and temperature monitoring system is a key aspect of ensuring data center performance. It is necessary first to build an accurate model of the data center server room power consumption and cooling facility, and then to work on scalable and cost-effective power and temperature monitoring systems.

The most accurate power models usually simulate and analyze individual components of servers, but for large-scale data centers these algorithms would be resource-intensive and the simulation would be slow. Our goal is therefore to simulate large clusters of servers in the data center infrastructure network (Figure 1).

[Figure 1: block diagram of the server resources (CPU, memory, storage, network) with the server power model $P_S = P_0 + P_1 U_1 + P_2 U_2 + P_3 U_3 + P_4 U_4$ and the cooling system with the fan power model $P_f = P_0 + P_F S_F^3$]

Fig. 1. Data center power system modeling scheme

It was demonstrated [1-3] that power models are widely used to monitor and estimate the power consumption of servers. The analysis shows that the power consumption of a given server is always correlated with key workload parameters: shared storage, memory (RAM and cache memory), computational capability (CPU) and network bandwidth (Figure 1).

To estimate this relationship, various experimental studies of high-level power models of data center servers were carried out [1-3]. Basically, the model should use a simplified linear or nonlinear regression equation that estimates the server power consumption from the occupancy level of its resources:

$$P_S = P_0 + \sum_i P_i U_i^{r_i}, \quad r_i \geq 1, \quad U_i \in [0..100\%], \qquad (1)$$

where $P_S$ is the server power consumption, $U_i$ is the utilization level of the $i$-th physical resource and $P_i$ is a set of fitting parameters, which varies according to the type of physical resource of the analyzed data center server system.

Evaluations for developing the high-level server power model can be conducted by comparing different forms of the power model, which correspond to different values of $i$ and $r_i$. For the most simplified model one can set $i = 1$ and analyze only the computational capability of the data center's servers ($r_i = 1$ is used for the linear model and $r_i > 1$ for the nonlinear one). For accurate simulation it is better to set $i = 1, \dots, 4$ and analyze the occupancy of all the servers' physical resources (CPU, RAM and storage workload intensity, as well as network bandwidth).

The fan-based electrical cooling complex accounts for a significant share of data center infrastructure power utilization. Fan power consumption has a cubic relationship with fan speed [4], as follows:

$$P_f = P_0 + P_F S_F^3, \qquad (2)$$

where $P_0$ and $P_F$ are fitting parameters and $S_F$ is the fan speed. Thus, lowering the fan speed significantly reduces power consumption (Figure 1).
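The two regression models of this section can be illustrated with a minimal Python sketch. The function names and all numeric fitting parameters below are hypothetical values chosen for illustration; they are not taken from the cited studies.

```python
from typing import Optional, Sequence


def server_power(p0: float,
                 coeffs: Sequence[float],
                 utilizations: Sequence[float],
                 exponents: Optional[Sequence[float]] = None) -> float:
    """Eq. (1): P_S = P_0 + sum_i P_i * U_i ** r_i, with U_i in percent."""
    if exponents is None:
        exponents = [1.0] * len(coeffs)  # linear model, r_i = 1
    if not (len(coeffs) == len(utilizations) == len(exponents)):
        raise ValueError("coeffs, utilizations and exponents must have equal length")
    if any(u < 0.0 or u > 100.0 for u in utilizations):
        raise ValueError("utilization must lie in [0, 100] percent")
    if any(r < 1.0 for r in exponents):
        raise ValueError("exponents r_i must be >= 1")
    return p0 + sum(p * u ** r for p, u, r in zip(coeffs, utilizations, exponents))


def fan_power(p0: float, pf: float, speed: float) -> float:
    """Eq. (2): P_f = P_0 + P_F * S_F ** 3."""
    return p0 + pf * speed ** 3


# Hypothetical fitting parameters for CPU, RAM, storage and network (W per %).
idle_power = 100.0
fit = [1.5, 0.4, 0.2, 0.1]
load = [80.0, 50.0, 30.0, 20.0]
print("server power, W:", server_power(idle_power, fit, load))
print("fan power at full speed, W:", fan_power(5.0, 1e-5, 100.0))
print("fan power at half speed, W:", fan_power(5.0, 1e-5, 50.0))
```

Because of the cubic term, halving the fan speed cuts the speed-dependent part of the fan power by a factor of eight, which is why fan speed is such an effective control variable in this model.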

3. Data center temperature control system modeling

To keep temporal and spatial temperature variations stable, it is necessary to develop an accurate temperature model. It significantly reduces the expense of placing thermal sensors throughout a large data center server room and prevents problems caused by their frequent failures. Computational fluid dynamics (CFD) simulation has proved to be an effective instrument for developing the servers' thermal map. It uses mathematical methods and algorithms for precise analysis of the fluid flow model. CFD-based thermal modeling [5] is based on the following equation:

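$$\frac{\partial(\rho\varphi)}{\partial t} + \frac{\partial(V\rho\varphi)}{\partial x\,\partial y\,\partial z} = \frac{\partial\left(D\,\frac{\partial\varphi}{\partial x\,\partial y\,\partial z}\right)}{\partial x\,\partial y\,\partial z} + S(\varphi), \qquad (3)$$

where $\rho$ is the air density, $x$, $y$, $z$ are the coordinates, $V$ is the velocity in each direction, $S(\varphi)$ is the source for each variable and $\varphi$ is a variable that can stand for the following properties: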

• mass;

• velocity;

• temperature;

• turbulence.

$D$ is the diffusion coefficient, which can be estimated as

$$D = \frac{A\,T^{3/2}\sqrt{\frac{1}{M_1} + \frac{1}{M_2}}}{p\,\sigma^2\,\Omega}, \qquad (4)$$

where $A$ is a coefficient, $M_1$ and $M_2$ are the molar masses of the molecules in the gaseous mixture, $T$ is the absolute temperature, $p$ is the pressure, $\sigma$ is the average collision diameter and $\Omega$ is a temperature-dependent collision integral.

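It should be noted that the four terms of Eq. (3) correspond to the main parts of the air flow transport process model:

• transient: $\frac{\partial(\rho\varphi)}{\partial t}$;

• convection: $\frac{\partial(V\rho\varphi)}{\partial x\,\partial y\,\partial z}$;

• diffusion: $\frac{\partial\left(D\,\frac{\partial\varphi}{\partial x\,\partial y\,\partial z}\right)}{\partial x\,\partial y\,\partial z}$;

• source: $S(\varphi)$.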

CFD simulation shows high accuracy, but it is very complex: the differential equation has no analytical solution, so it has to be solved by numerical procedures, which prove to be resource-intensive.
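To give a feeling for what such a numerical procedure involves, the following minimal Python sketch advances a strongly simplified version of the transport equation by one explicit finite-difference step. It is not the CFD tool from [5]: it assumes one spatial dimension, constant density, a constant non-negative velocity, a constant diffusion coefficient and a periodic grid, and the function name is hypothetical. A real three-dimensional solver repeats similar updates over millions of cells and many time steps, which is where the cost comes from.

```python
from typing import List, Sequence


def step_transport_1d(phi: Sequence[float],
                      velocity: float,
                      diffusion: float,
                      source: Sequence[float],
                      dx: float,
                      dt: float) -> List[float]:
    """One explicit step of d(phi)/dt + v*d(phi)/dx = D*d2(phi)/dx2 + S.

    Upwind differencing is used for convection (v >= 0 assumed) and
    central differencing for diffusion, on a periodic 1-D grid.
    """
    if velocity < 0.0:
        raise ValueError("this sketch assumes a non-negative velocity")
    # Stability condition of the explicit upwind/central scheme.
    if velocity * dt / dx + 2.0 * diffusion * dt / dx ** 2 > 1.0:
        raise ValueError("time step too large for a stable explicit update")
    n = len(phi)
    new_phi = [0.0] * n
    for k in range(n):
        left = phi[(k - 1) % n]
        right = phi[(k + 1) % n]
        convection = velocity * (phi[k] - left) / dx
        diffusive = diffusion * (right - 2.0 * phi[k] + left) / dx ** 2
        new_phi[k] = phi[k] + dt * (-convection + diffusive + source[k])
    return new_phi


# A hot spot in an otherwise uniform temperature field is carried
# downstream and smoothed out over successive steps.
field = [20.0, 20.0, 40.0, 20.0, 20.0]
no_source = [0.0] * len(field)
for _ in range(3):
    field = step_transport_1d(field, 1.0, 0.1, no_source, dx=1.0, dt=0.5)
print(field)
```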

This study presents a solution based on works [6, 8, 14] that decreases the complexity of CFD simulation and builds an accurate temperature map. The algorithm is based on building heat- and air-flow graphs. The simplified temperature model for servers is oriented on the CPU and RAM blocks of the servers, as well as on the heat removal capability related to fan speed changes. The model includes a thermal RC network scheme of the system (Figure 2) based on the connection between thermal and electrical losses [7].

Fig. 2. RC network based temperature model which includes CPU and memory sockets

Figure 2 demonstrates that the CPU socket RC network scheme includes:

• power consumption of each core in a socket;

• lateral thermal resistance;

• vertical thermal resistance;

• thermal resistance of the heat spreader;

• case-to-ambient thermal resistance of the heat sink $R_{CA}^{CPU}$;

• thermal capacitance of the die;

• thermal capacitance of the heat spreader;

• thermal capacitance of the heat sink;

• junction temperature.

It should be noted that [...] is usually neglected, while $R_{CA}^{CPU}$ can be obtained as the sum of the thermal resistance of the heat sink and the convective resistance, which is a function of the fan speed $S_F$:

$$R_{CA}^{CPU} = R_{HS}^{CPU} + R_{Conv}^{CPU}(S_F), \qquad R_{Conv}^{CPU} \propto (EA \cdot S_F^{\alpha})^{-1}, \quad \alpha \in [80..100\%], \qquad (5)$$

where the estimation of $R_{Conv}^{CPU}$ is based on the effective area $EA$ and the factor $\alpha$. On the other hand, the memory socket RC network scheme includes the following components and definitions (Figure 2):

• power consumption of each RAM chip;

• thermal resistance of each RAM chip;

• thermal capacitance of each RAM chip;

• junction temperature of each RAM chip;

• case-to-ambient thermal resistance of the memory;

• number of ranks of each RAM chip $N$.

The temperature of the memory socket is correlated with the temperature of the CPU socket because of the air flows inside a server. Air that absorbs heat at the CPU socket affects the temperature of the RAM socket, which is equivalent to raising the temperature at the memory socket. Thermal coupling should be modeled as follows:

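$$q = \frac{T_{H}^{CPU}}{R_{CA}^{CPU}}, \qquad (6)$$

where $q$ is the dependent coupling heat source of the memory, $T_{H}^{CPU}$ is the CPU heat sink temperature and $R_{CA}^{CPU}$ is the case-to-ambient thermal resistance of the CPU.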
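The idea behind the RC network can be sketched in Python with a single lumped thermal node. This is a deliberate simplification of the multi-node scheme of Figure 2: the case-to-ambient resistance is taken as the heat sink resistance plus a convective term inversely proportional to the effective area times the fan speed raised to a power $\alpha$, and the temperature follows $C\,dT/dt = P - (T - T_{amb})/R$. The proportionality constant `k`, the function names and all numeric values are assumptions made for illustration only.

```python
from typing import List


def convective_resistance(effective_area: float, fan_speed: float,
                          alpha: float = 0.9, k: float = 1.0) -> float:
    """R_conv = k / (EA * S_F ** alpha); k is a hypothetical constant."""
    if effective_area <= 0.0 or fan_speed <= 0.0:
        raise ValueError("effective area and fan speed must be positive")
    return k / (effective_area * fan_speed ** alpha)


def case_to_ambient_resistance(r_heat_sink: float, effective_area: float,
                               fan_speed: float, alpha: float = 0.9,
                               k: float = 1.0) -> float:
    """R_CA = R_HS + R_conv(S_F)."""
    return r_heat_sink + convective_resistance(effective_area, fan_speed, alpha, k)


def simulate_rc(power: float, resistance: float, capacitance: float,
                t_ambient: float, dt: float, steps: int) -> List[float]:
    """Forward-Euler integration of C * dT/dt = P - (T - T_amb) / R."""
    temperature = t_ambient
    history = [temperature]
    for _ in range(steps):
        heat_out = (temperature - t_ambient) / resistance
        temperature += dt * (power - heat_out) / capacitance
        history.append(temperature)
    return history


# Hypothetical socket: 100 W load, ambient air at 25 degrees.
r_slow = case_to_ambient_resistance(0.1, 2.0, fan_speed=30.0)
r_fast = case_to_ambient_resistance(0.1, 2.0, fan_speed=100.0)
for label, r in (("slow fan", r_slow), ("fast fan", r_fast)):
    final = simulate_rc(100.0, r, 10.0, 25.0, dt=0.05, steps=4000)[-1]
    print(label, "resistance:", round(r, 4), "final temperature:", round(final, 2))
```

In steady state the node settles at the ambient temperature plus power times resistance, so a higher fan speed lowers the resistance and therefore the temperature, at the price of the cubic growth in fan power.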

3. Data center computing facility and cooling facility modeling

For precise estimation of the temperature map of data center servers, it is necessary to take into account the interactions of multiple servers' heat and the hot air flows from the bottom to the top of the server room. This procedure makes it possible to develop a heat recirculation scheme of the data center. The recirculation model can be built from a cross-interference matrix:

$$\Phi_{N \times N} = \begin{pmatrix} \varphi_{1-1} & \cdots & \varphi_{1-N} \\ \vdots & \varphi_{i-j} & \vdots \\ \varphi_{N-1} & \cdots & \varphi_{N-N} \end{pmatrix}, \qquad (7)$$

where the parameter $\varphi_{i-j}$ refers to the outlet heat rate of the $i$-th server in the inlet heat rate of the $j$-th server of the data center and $N$ is the number of servers in the server room.

Let us suppose that $H_i^{out}$ is the outlet heat of the $i$-th server and $H_j^{in}$ is the inlet heat of the $j$-th server. $H_j^{in}$ can be calculated from the server room environment heat $H_{env}$, the power $P_j$ consumed by the $j$-th server and the values $\varphi_{i-j}$ (Figure 3):

$$H_j^{in} = \sum_{i=1}^{N} H_i^{out}\,\varphi_{i-j} + H_{env} + P_j. \qquad (8)$$

The heat rate makes it possible to estimate the temperature at each server within a server room using the temperature map models described in the previous section.
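The recirculation sum can be written as a short Python sketch: for each server, the inlet heat is the outlet heat of every server weighted by the matching cross-interference coefficient, plus the environment heat and the server's own power. The function name and the three-server coefficients below are hypothetical and serve only to show the bookkeeping.

```python
from typing import List, Sequence


def inlet_heat(outlet_heat: Sequence[float],
               cross_interference: Sequence[Sequence[float]],
               env_heat: float,
               power: Sequence[float]) -> List[float]:
    """H_in[j] = sum_i H_out[i] * phi[i][j] + H_env + P[j]."""
    n = len(outlet_heat)
    if len(power) != n or len(cross_interference) != n:
        raise ValueError("all inputs must describe the same number of servers")
    if any(len(row) != n for row in cross_interference):
        raise ValueError("cross-interference matrix must be N x N")
    return [
        sum(outlet_heat[i] * cross_interference[i][j] for i in range(n))
        + env_heat + power[j]
        for j in range(n)
    ]


# Hypothetical rack of three servers, bottom to top: hot air rises, so
# lower servers contribute to the inlets of the servers above them.
phi = [
    [0.00, 0.10, 0.05],
    [0.00, 0.00, 0.10],
    [0.00, 0.00, 0.00],
]
h_out = [300.0, 320.0, 350.0]
p_srv = [250.0, 260.0, 270.0]
print(inlet_heat(h_out, phi, env_heat=20.0, power=p_srv))
```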

Fig. 3. Scheme of estimation of data center server temperature level

The cooling facility model usually includes the following functional components:

• cooling tower;

• chiller;

• server room air conditioning (SRAC);

• server room air handling (SRAH).

Thus, the heat generated by the data center's servers is absorbed by the SRAC-conditioned air supplied by the SRAH, and is then drawn off by the SRAH system. The SRAH exchanges the heat with cold air (or water) supplied by a chiller based on a refrigeration cycle. A comparative analysis of cooling power should be carried out for varying processor utilization, which changes the server room temperature [10-14].

With this model, power usage effectiveness (PUE), the ratio of the total power utilized by the data center to the power utilized by the servers, can be evaluated as a function of the server temperature set-point (Figure 4), which depends on the SRAH efficiency [13, 14]:

$$E_{SRAH} = \frac{T_{SRAH}^{air} - T_{room}}{T_{SRAH}^{air} - T_{SRAH}^{water}}, \qquad (9)$$

where $T_{SRAH}^{air}$ refers to the temperature of the air exhausted from the server room and $T_{SRAH}^{water}$ is the temperature of the chilled water flowing into the SRAH.

[Figure 4: power usage effectiveness (vertical axis, 1.0 to 1.6) versus server temperature set-point (horizontal axis, ticks 15 to 35), with curves for electrical cooling and free cooling]

Fig. 4. Power usage effectiveness of data center server room in electrical and free cooling

Therefore, those parameters can be calculated from the server power consumption and an estimate of the outside temperature. Since $E_{SRAH} < 1$, it should be noted that $T_{room}$ always has to be higher than the temperature of the chilled water flowing into the SRAH.
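The quantities of this section can be tied together in a small Python sketch. PUE is computed as total power divided by server power. For the air handler, the sketch assumes the common heat-exchanger effectiveness form, in which the efficiency is the ratio of the actual air temperature drop to the largest possible drop down to the chilled water temperature; this form, the function names and the numbers are assumptions for illustration.

```python
def power_usage_effectiveness(server_power: float, cooling_power: float,
                              other_power: float = 0.0) -> float:
    """PUE = total facility power / power utilized by the servers."""
    if server_power <= 0.0:
        raise ValueError("server power must be positive")
    return (server_power + cooling_power + other_power) / server_power


def air_handler_efficiency(t_air: float, t_room: float, t_water: float) -> float:
    """Assumed form: E = (T_air - T_room) / (T_air - T_water)."""
    if t_air <= t_water:
        raise ValueError("exhaust air must be warmer than the chilled water")
    return (t_air - t_room) / (t_air - t_water)


def room_temperature(t_air: float, t_water: float, efficiency: float) -> float:
    """Invert the assumed efficiency relation for the room set-point."""
    return t_air - efficiency * (t_air - t_water)


# Hypothetical operating point: 35-degree exhaust air, 10-degree chilled water.
t_room = room_temperature(t_air=35.0, t_water=10.0, efficiency=0.6)
print("room set-point:", t_room)
print("efficiency check:", air_handler_efficiency(35.0, t_room, 10.0))
print("PUE:", power_usage_effectiveness(server_power=1000.0, cooling_power=400.0))
```

With an efficiency below one, the computed room temperature always stays above the chilled water temperature, which matches the constraint stated above.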

4. Conclusions

It was shown that the development of a power and temperature monitoring system is a key aspect of ensuring data center performance. It is therefore important to build an accurate model of the scalable server room power consumption system and a cost-effective cooling facility. The analysis demonstrates that server power consumption is always correlated with the key workload parameters of shared storage, memory, computational capability and network bandwidth.

To manage temporal and spatial temperature variations and keep the servers stable, an accurate temperature map simulation algorithm has to be developed. Computational fluid dynamics simulation has proved to be an effective instrument of analysis, since it uses mathematical methods to build a precise fluid flow model. However, it is a very complex model, because it is based on a differential equation with no analytical solution, so resource-intensive numerical procedures have to be used. An algorithm was proposed that decreases the complexity of computational fluid dynamics simulation and builds an accurate temperature map. The algorithm is based on building heat- and air-flow graphs. The simplified temperature model for servers is oriented on the computational and memory sockets of the servers, as well as on the heat removal capability related to fan speed changes. The model includes a thermal RC network scheme of the system, which is based on the connection between thermal and electrical losses.

The cooling facility model includes a cooling tower, chiller, server room air conditioning and server room air handling. To estimate the temperature map of data center servers, it is necessary to take into account the interactions of multiple servers' heat and air flows within the bounds of the server room. This procedure made it possible to develop a precise heat recirculation scheme of the data center.

References

1. Kralicek E, 2016. Physical vs. Virtual Server Environments. The Accidental SysAdmin Handbook. 121-134.

2. Rivoire S., Ranganathan P. and Kozyrakis C. "A Comparison of High-Level Full-System Power Models," HotPower8, 2008. 3-3.

3. Pedram M. and Hwang I. "Power and performance modeling in a virtualized server system," in Parallel Processing Workshops (ICPPW), 2010. 39th International Conference on. Pp. 520-526. IEEE, 2010.

4. Megdiche M., 2014. Dependability Engineering for Data Center Infrastructures. Data Center Handbook. 275-305. doi:10.1002/9781118937563.ch15.

5. Choi J., Kim Y., Sivasubramaniam A., Srebric J., Wang Q. and Lee J. "A CFD-based tool for studying temperature in rack-mounted servers," Computers, IEEE Transactions on 57, no. 8 (2008): 1129-1142.

6. Heath T., Centeno A.P., George P., Ramos L., Jaluria Y. and Bianchini R. "Mercury and freon: temperature emulation and management for server systems," in ACM SIGARCH Computer Architecture News. Vol. 34, no. 5, Pp. 106-116. ACM, 2006.

7. IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information, 2013. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(2).

8. Ayoub R., Nath R. and Rosing T. "JETC: Joint energy thermal and cooling management for memory and CPU subsystems in servers," in High Performance Computer Architecture (HPCA), 2012. IEEE 18th International Symposium on. Pp. 1-12. IEEE, 2012.

9. Pakbaznia and Pedram M. "Minimizing data center cooling and server power costs," in Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design. Pp. 145-150. ACM, 2009.

10. Hwang D.C., Manno V.P., Hodes M. and Chan G.J. "Energy savings achievable through liquid cooling: A rack level case study," in Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), 2010 12th IEEE Intersociety Conference on. Pp. 1-9. IEEE, 2010.

11. Breen T.J., Walsh E.J., Punch J., Shah A.J. and Bash C.E. "From chip to cooling tower data center modeling: Part I influence of server inlet temperature and temperature rise across cabinet," in Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), 2010. 12th IEEE Intersociety Conference on. Pp. 1-10. IEEE, 2010.

12. Gao T., Samadiani E., Schmidt R. & Sammakia B., 2013. Dynamic Analysis of Hybrid Cooling Data Centers Subjects to the Failure of CRAC Units. Volume 2: Thermal Management; Data Centers and Energy Efficient Electronic Systems.

13. Kim J., Ruggiero M. and Atienza D. "Free cooling-aware dynamic power management for green datacenters," in High Performance Computing and Simulation (HPCS), 2012. International Conference on. Pp. 140-146. IEEE, 2012.

14. Kim J., Sabry M.M., Ruggiero M. & Atienza D., 2015. Power-Thermal Modeling and Control of Energy-Efficient Servers and Datacenters. Handbook on Data Centers. 857-913.
