Google Kubernetes Engine adds support for Arm nodes

wmf · on July 13, 2022

People would probably be more interested in the underlying Google Cloud ARM announcement: https://news.ycombinator.com/item?id=32084887

milesward · on July 13, 2022

Yup, my team did a little copy-paste lab if folks want to kick the tires: https://github.com/sadasystems/gke-multiarch-guide

Cidan · on July 13, 2022

Also, the PM's are going to do an AMA for this on reddit tomorrow:

https://www.reddit.com/r/googlecloud/comments/vy8hx3/ama_wit...

Come join us! Also, hi Moles :)

FooBarWidget · on July 13, 2022

Great. Hopefully Azure is next, so we can get ARM Github Actions runners on which I can build ARM binaries. I can already do it with qemu-user but it's slow.

cagataygurturk · on July 13, 2022

You can already self-host GH Runners on ARM.

jrockway · on July 13, 2022

What kind of ARM machine should I buy if I want to self-host an ARM server? My not-that-recent look at the market shows everyone kind of doing their own thing; Amazon makes Amazon's ARM servers, Apple makes Apple's ARM chips, etc. As some random guy who wants to test on ARM, it's annoying. Maybe things have improved recently, though?

(I'm guessing self-host realistically means "get a VM on AWS", which is probably fine for CI if you already use AWS. A little annoying to have another monthly bill to pay if you don't, though.)

cagataygurturk · on July 14, 2022

The ARM nodes which are exactly the topic of this submission could be a place where you can self-host your GH runners :)

https://github.com/actions-runner-controller/actions-runner-... works quite well with its autoscaling. You could create a zonal GKE cluster, which is in free tier, and create a small spot vm node pool with ARM nodes. It wouldn't be entirely free but it would cost quite low amount of money.

aseipp · on July 13, 2022

Your best options IMO are either 1) a Mac Mini with 16GB of RAM running a Linux VM or Asahi Linux connected OR 2) an Nvidia Jetson Xavier AGX; you can get the 32GB version for like $700 USD I think. (The newer Orin is a fair bit more expensive.)

You realistically want something with ARMv8.2 or better, these are relatively easy to acquire with the current supply chains, they're beefy, can be equipped with fast storage, and they are both small units you can put on your desk. Note that the Xavier will require you to fiddle with the usual Nvidia bullshit through their SDK but it's otherwise a standard Ubuntu machine. It should be possible to get another distro on there too. The M1 will almost certainly perform better overall watt-for-watt though.

psanford · on July 13, 2022

Buy a Mac mini and install Linux on it? That feels like it is (or will soon be) the best option.

FooBarWidget · on July 13, 2022

The point is to not have to self-host. It costs money, is a pain because maintenance is required, and there are security issues with running CI for public pull requests.

Klasiaster · on July 14, 2022

With garm (https://github.com/cloudbase/garm) you can spawn ephemeral runner VMs, be it as Azure/GCE ARM instances or as lxc VMs (or lxc containers).

alexeldeib · on July 13, 2022

Azure already has ARM VMs (as well as AKS)?

The equivalent would be cloud build, not GKE <-> GH actions, right? does cloud build support arm runners?

markdog12 · on July 13, 2022

Looks like it's only available in NA central for now, boooo: https://cloud.google.com/compute/docs/regions-zones#availabl...

milesward · on July 13, 2022

Where did you want it?

BillinghamJ · on July 13, 2022

In every zone, obviously. AWS has had widespread ARM support for many years.

ksec · on July 14, 2022

I mean first gen Graviton was announced in late 2018 during reInvent. And it wasn’t even generally available until mid 2019. It was underwhelming, but layering some required software foundation. Graviton 2 was in late 2019 and wasn’t GA and available in many region until 2021.

Hardly widespread support for many years.

Klasiaster · on July 13, 2022

It's quite light on details: What properties does the (virtualized) ARM CPU that one gets have? Can it do KVM itself (meaning the underlying real CPU has support for nested KVM)? Can it do ARMv8.3 pointer authentication?

my123 · on July 13, 2022

It's an Ampere Altra for those first instances. So no on both.

Klasiaster · on July 13, 2022

They could have named Ampere and give credit to them, I think it's common in cloud instance offerings to have AMD or Intel also being named as underlying physical CPUs, even with the generation.

milesward · on July 13, 2022

They did in the docs, 2nd paragraph of the blog post, etc

https://cloud.google.com/blog/products/compute/tau-t2a-is-fi...

Klasiaster · on July 13, 2022

Thanks, yes, makes sense to look at the GCE announcement instead of the GKE announcement for this info.

bogomipz · on July 13, 2022

I'd be curious to hear what the use case is for nested KVM. What hardware support is required for that as well?

vocram · on July 13, 2022

The instance exposed by GCE is virtualized. If you want to run any hw virtualized workload inside it, you need nested virtualization.

bogomipz · on July 13, 2022

I'd be curious to hear more about your Kubernetes workloads. What virtualized hardware do your pods require?

dilyevsky · on July 14, 2022

Any untrusted workloads (say CI runners running your clients arbitrary code) better be run inside kata containers so you can’t use t2a vms for that

bogomipz · on July 14, 2022

In GKE you can just enable GKE Sandbox/gVisor on a node pool to run your untrusted workloads. gVisor serves the same purpose as Kata containers.

dilyevsky · on July 14, 2022

Yes except slow io

bogomipz · on July 14, 2022

Can you elaborate? What type of I/O, network, disk? What is the issue exactly?

dilyevsky · on July 14, 2022

You can refer to gvisor performance docs - https://gvisor.dev/docs/architecture_guide/performance/#file... throughput is really terrible, same deal with networking and also if your userland issues a lot of syscalls

bogomipz · on July 15, 2022

Thanks for the link. I'm curious, how is the I/O performance with Kata? Does it use VirtIO?

dilyevsky · on July 16, 2022

It does and I believe with dax (if your kernel has it) it’s basically same speed as regular runc. 9p is still default tho i think

Aissen · on July 13, 2022

Pricing is in line with the cloud: expensive. That's it, they made it: now Arm servers are mainstream.

jonatron · on July 13, 2022

I always found the CPU pricing of cloud to be relatively reasonable. It's everything else that's expensive, and egress bandwidth just ridiculous.

runlevel1 · on July 13, 2022

I've done some cost analyses between our AWS and DC infrastructure.

To come up with our on-prem compute costs, we baked in the cost of power, real estate, staff, taxes, network infrastructure, servers (both in-use and in reserve), etc. On the AWS side, we used 3 year RIs and Savings Plan. After all that, there was around a 30% cost advantage on-prem. That's non-trivial, but not as big as one might think.

Outbound networking, however, is ludicrously cheaper on-prem. It's about 85% cheaper on-prem than in AWS. Bandwidth is not expensive outside the public cloud.

In fact, egress volume is the #1 cost driver for us moving a service on-prem or building it there to begin with. Some of the AWS managed services are also very pricey, but nowhere near the egregious markup of egress bandwidth.

redox99 · on July 13, 2022

85% cheaper seems little. In my case (collocation) bandwidth is 95% cheaper (i.e AWS is 20x as expensive) than AWS.

sigmaml · on July 14, 2022

A quick question.

Have you also included:

  - storage costs (equivalent of EBS, S3 and Glacier) and
  - cost of analytics pipelines (equivalent of EMR, Athena, SageMaker, ...)

in the above price comparison?

Would you have some insights there? Thanks.

api · on July 13, 2022

Typical cloud bandwidth pricing is known as "roach motel pricing" after the old roach motel pest control slogan of "roaches check in but they never check out." The idea is to make ingress free but egress expensive to make it easy to move all your data in but costly and hard to move it out.

dilyevsky · on July 13, 2022

Meh cpus/ram most likely have lighter margins for cloud vendors compared to storage and bandwidth (which is just ridiculous) but they have to keep a lot of spare capacity for scaling so if you don’t scale by much you can do way better on metal even on cpus

mathnode · on July 13, 2022

I am very ignorant of the current ARM cloud offerings. Is it similar prices but less power usage? Or have they had to ramp up the watts to compete?

justizin · on July 13, 2022

not sure about pricing here, but graviton on aws has generally offered more performance at a lower price point, which is likely linked to lower power usage and perhaps lower cost of custom silicon vs intel.

the notion, "the cloud is expensive", ignores the fact that the cloud is not just rented hardware, but staff, facilities, planning, management, etc..

there are businesses where it makes more sense to own hardware and employ your own staff, but if you just want generic compute and storage, you're unlikely to do it as well for less.

also you cannot easily source arm hardware commercially, there is the honeycomb lx2, and its' lead time is months for a single unit. if you want hundreds of nodes, you're gonna use a cloud provider who manufactures their own silicon.

vbezhenar · on July 13, 2022

> you cannot easily source arm hardware commercially

Buy Mac Minis and run virtual linux inside?

api · on July 13, 2022

You can (in theory) run Linux straight on Apple Silicon. It's not locked like the iPad and iPhone. The M1 Ultra would be a pretty solid high volume server if you can manage to plug a 10-40gbps LAN dongle into it. I believe USB-C and/or thunderbolt connectors like that exist.

In theory so far because I'm not sure if there are mature installers and such yet.

dochtman · on July 13, 2022

The Asahi Linux installer worked flawlessly on my MacBook Pro M1 Max, though it asks you to do some low level things like repartitioning the drive.

throwaway894345 · on July 13, 2022

I've been passively curious about this for a homelab--does linux virtualize on apple silicon today? I was of the impression this didn't work when they announced m1 in 2020, but I could be misremembering.

vbezhenar · on July 13, 2022

Absolutely. It worked from day one. There are issues with emulating x86 Linux, but that’s a different story. For ARM Linux — just ordinary qemu works fine.

throwaway894345 · on July 13, 2022

Oh, interesting. That's great to hear!

goodpoint · on July 13, 2022

> the notion, "the cloud is expensive", ignores the fact that the cloud is not just rented hardware, but staff, facilities, planning, management, etc..

Not at all. Any organization that runs its datacenters can calculate a TCO.

It's really business 101.

runlevel1 · on July 13, 2022

Dell and HP are still nowhere to be seen in the on-prem Arm server market. Is there not enough of a cost advantage for there to be demand?

wmf · on July 14, 2022

https://www.hpe.com/us/en/newsroom/press-release/2022/06/hew...