Hacker News new | past | comments | ask | show | jobs | submit login

Great points. We'll make sure to wire to separate power feeds that can both handle the entire load. Suggestions in how to calculate this? Taking the maximum rated load seems over the top.



I recently had to do this. The server I was putting up was rated for 3kW. To determine the expected load, I put it under a dummy load that I reasonably considered the maximum for what I would expect on the server (this was a dev machine, so I picked compiling the linux kernel as a benchmark). I ran that until the power stabilized (SuperMicro servers can measure power consumption in hardware and expose this via IPMI - very handy), because power consumption may keep creeping up for a few minutes as the fans adjust to the new operating temperature. I then repeated the same exercise with a CPU torture test (Prime95), just to see what the maximum I could possible get out of the machine was. The numbers turned out to be about 1.8kW for the linux kernel benchmark, and 2.3kW for the torture test. What I ended up doing was to provision for 2kW and use the BIOS' power limiting feature to enforce this. That would kill the machine if it exceeded its designed load, but that's usually better than tripping the breaker and killing the entire circuit. You may also want to talk to your data center provider about overrages. Some of the quotes I got wouldn't kill your circuit when you went over, but would just charge you (a crazy amount, but better than losing all your servers).

Hope that helps. Your deployment is larger than ours, so there may be other techniques, but that's what we did.


I love the idea of killing the server instead of the circuit, thanks!


Might be better to just throttle the CPU/etc when too much power is being used.


Prime95! I haven't seen that since 2000-2001! Solid tool for burning in a box and putting the CPU under maximum load.


Having designed and run colo data centers for many years my rule of thumb for calculating this for customers was to use the vendors tools if available or take 80% of power supply rating. Keep in mind that most servers will have a peak load during initial power on as the fans and components come online and run through testing. If a rack ever comes up on a single channel and the circuits are not rated right that breaker will just pop right back offline and you'll have to bring it up by unplugging servers and bringing them up in sets. Also most breakers are only rated at 80% load so you have to de-rate them for your load. So, e.g., a 20amp x 208 3 phase circuit really only has 8KW of constant power draw available.


+1. As tempting as it is to say "oh this server only needs full power when it boots" this may come back to bite you. As you grow usage in CPU and Disk, the power used by the server will increase substantially.


note that you can get around the 80% breaker limit by having your DC hardwire the power, if you have enough scale to have them do this for you.


Circuit breakers aren't just there for the amusement value of watching a clustered system go into split-brain mode...

I.e. you would also want to be sure that your wiring was rated for 100% utilization, and that other circuit-breaker-like functions exist.

Fire is an actual thing, and figuring out the best way to recharge a halon system isn't exactly what you want to be doing.


Yes. That's why they hardwire and use a breaker rated at 100%. In most jurisdiction, code requires breaker at 80% if a plug/receptacle is used. If you are hardwired you can use 100% of your capacity before tripping the breaker. You have incorrectly assumed that I suggested that you ignore breakers.

When you hardwire the circuit the electrical code allows you to use a 100% breaker.


Thank you for adding this detail... what you said makes WAY more sense to me now.


What exactly gets hardwired to what? This is surprisingly hard to search for details on.


Instead of your plug of your PDU going into a receptacle, the wires that would go into the plug are hardwired to a panel circuit breaker.

This is less common in DCs historically but more and more as folks do 208v 3phase 100A circuits.


This really is only a concern when you are paying a monthly recurring charge (MRC) by the breaker amp with many power drops.

For a deployment of this scale it should be metered power (For example 1 (or more) 3phase a+b drops to each cabinet) where you only pay a Non-Recurring setup Charge (NRC) and then the MRC is based on actual power draw.

3phase also means fewer physical PDU's (uses less space), but more physical breakers. Over-building delivery capability will eliminate any over-draw concerns for startup cycles.


Not really, although I agree with your reasoning. The other is issue is capex. When I deploy 240kW pods, if I use 80% breakers, I have to deploy 25% more PDUs than if I have 100% breakers.

Since my cabinet number is usually evenly divisible by N*PDUs, this impacts overall capital.


We are talking about 1 to 2 cab density here so capex doesn't carry that much weight.

Having a little headroom on your power circuits is also incredibility important, and not every facility will sell 100% rated breakers. It may make more sense to be in a facility with 80% rated breakers than 100%, even with the added capex of an extra PDU or two.

Goes back to my previous comment. What is important to you, at the pod / multiple pod level, isn't as important to the 1-2 cab deployment.


That's fair and accurate.

Similarly, ensure spare room in the cabinet for adjustments, that thing you forgot, and small growth. Much better to have 70% full and not need the space then to have no free RU and need the space.


Most DC PDUs will stagger your outlet power on to stagger the initially large power draws.


Some DCs provide PDU's and some don't.


At RS we would burn the servers in measure the load with a clamp meter. For large scale build-outs(swift, cloud servers) we would burn-in entire racks and measure the load reported by the DC power equipment.

I would recommend verifying everything is fault tolerant/HA as expected every step of the way. We ran into issues where the power strips on both sides were plugged into the same circuit(D'oh), wrong SST's, redundant routers getting cabled up to the same power strips, etc and you name it.

After a rack is setup have people at the DC(your own employees or the DC's techs) help simulate(create) failure in power, networking, and systems to verify everything is setup correct. It sounds like you have people coming onboard with experience provisioning/delivering physical systems though, so I would expect them to be on the ball with most of this stuff.


Thanks, that is very helpful. And we haven't made these hires yet but we hope we can in the coming months.


You can certainly do some math/estimates based on looking at individual component specifications, but I like using a power meter (built in to some PDUs - which is a feature worth having, or you can buy one, they are quite inexpensive).

A system at idle vs full CPU vs full cpu + all disk will produce very different measurements.

Also keep in mind 80% derating - many electrical codes will state that an X amp circuit should only be used at 0.8X on a continuous basis (and the circuit breakers will be sized accordingly).


For HP gear, use HP Power Advisor utility. For Dell, see the Data Center Capacity Planner. Not sure what SuperMicro has -- check with your VAR.


you want 2*N Power feeds where N = 1 or 2. I recommend ServerTech PDUs. You'll want to factor in the maximum load, but with the amount of servers and power you're doing, you should be able to negotiate metered power at under $250/kW/month. Give them a commit that's 1/3 of your TDW/reportd power load, and then pay for what you use.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: