> *Note that the next hop’s IP address is in the router’s memory only: it does n...

jcrawfordor · on Sept 10, 2020

To give a simplified but largely accurate summation: IP and Ethernet were each designed in different time periods and largely without knowledge of the other. Ethernet was historically used in such a fashion that multiple hosts (more than 2) occupied the same collision domain, that is, they were physically connected to the same cable, or through hubs that repeated frames to all interfaces without routing. This means that Ethernet required an addressing scheme so that hosts on the same media knew which frames were for them (higher-level protocols at the time did not necessarily handle this).

Ethernet's addressing scheme was not designed to accommodate large hierarchical networks and so is unsuitable for the IP use case, but more importantly, IP was designed completely separately from Ethernet, and was not used primarily with Ethernet until later, so IP could not "assume" that the layer below it handled addressing (typically there was either no layer below [point-to-point] or only a very simple one).

The result is that Ethernet and IP duplicate functionality to some extent. It is theoretically possible, although not common, to build a network which uses only layer 3 routing without any reliance on Ethernet addressing. A significant reason this is rare, arguably the most significant reason, is that IP is now carried over Ethernet a significant majority of the time and L2 Ethernet devices (like switches) require the use of Ethernet addressing for the network to function. You usually see "pure IP" in virtual networking environments where the IP is encapsulated in, well, more IP, but even then Ethernet frames are sometimes used because, well, just like network hardware, operating system network stacks generally expect them (examine, e.g., the linux bridge implementation). It is completely possible to build network stacks and network appliances which do not require the use of Ethernet but it is expensive and there's not much of a motivation to do so, and you'd run into issues with any kind of equipment not so designed.

Addressing is not the only duplicate functionality between Ethernet and IP, and it's one of the less significant ones since Ethernet addressing does provide utility even if not strictly required. Ethernet frames are checksummed, and IP headers are also checksummed, even though the Ethernet checksum already covers them. The IP header checksum exists because IP was historically carried over lower layers that did not provide integrity checking. This is basically pure wasted space in typical networks, so IPv6 drops the header checksum to remove the overhead.
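To make the duplicated work concrete, here is a minimal sketch of the IPv4 header checksum (the 16-bit one's-complement sum described in RFC 1071). The function name and the sample header bytes are illustrative choices, not taken from the comment above.

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """Return the 16-bit one's-complement checksum of an IPv4 header.

    When computing a checksum for transmission, the checksum field
    (bytes 10-11) must be zero in `header`.
    """
    if len(header) % 2:
        raise ValueError("IPv4 headers are a multiple of 4 bytes long")
    # Sum the header as big-endian 16-bit words.
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    # Fold any carries back into the low 16 bits (one's-complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample 20-byte header (UDP, 192.168.0.1 -> 192.168.0.199), checksum field zeroed.
EXAMPLE_HEADER = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_header_checksum(EXAMPLE_HEADER)
print(f"{checksum:#06x}")
```

A receiver runs the same sum over the header with the checksum field filled in and expects zero. Every router has to redo this per hop because the TTL changes, on top of whatever integrity check the link layer already performs.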

In general, though, network protocols tend to make more sense when you have some awareness of the history of their development. When you try to view the modern internet as an elegant, monolithic design, as some authors attempt, a lot of things won't make sense because they simply are that way for historic reasons. Ethernet and IP were each designed in the '70s, but separately, and their use has accumulated significant cruft since then, including some radical changes in the ways that they were used. One example is the transition of Ethernet from shared media to point-to-point, which occurred de facto earlier but became largely formalized with the introduction of GbE, which prohibits more than two hosts in a collision domain. Another, ironically, is the introduction of multiple hosts in a collision domain as an even larger issue with wireless protocols, which requires additional handling below, or actually in lieu of, the Ethernet layer: 802.11 is a replacement for Ethernet that happens to behave similarly in many ways for compatibility.

Finally, the OSI model is something that tends to add complexity and confusion to these discussions, which is why I doggedly discourage its use in teaching. The OSI Model describes the OSI protocols, which were contemporary competitors to the TCP/IP protocols. Arguably, one of the reasons that the OSI protocols fell out of use (in favor of IP) is exactly because they assumed seven layers, and each was fairly complex. Some OSI protocols are still in use, for example IS-IS (a routing protocol that runs directly over layer 2) in the telecom industry and some backbone IP transit, but in niches and generally being replaced with IP. IP is intentionally simpler, and can be fully described using four layers, what's usually referred to as the TCP/IP model.

The OSI layers do not map 1:1 to the TCP/IP layers, even if you simply ignore the ones that map more poorly as instructors often do. Even worse, many instructors and textbook authors feel such a strong compulsion to map modern networks to the obsolete OSI model that they cram application-layer protocols into OSI layers 5 and 6 in order to have examples of them. I have seen cases as extreme as an instructor claiming that HTTP cookies represent the session layer. This kind of thing is nonsense and hinders understanding rather than contributing to it. If the OSI model is taught (not a bad idea at all as students should realize that TCP/IP is merely the popular way, and certainly not the only way), it should be taught specifically by contrasting it to the different TCP/IP model. Unfortunately few instructors and website authors today seem to even be aware that the OSI protocol stack existed separately from IP.

And, if you are wondering, yes, Ethernet can be used in a switched network completely independently from IP (although not really in a routed network unless you are generous about how you define routing). This was more common decades ago; the only equipment I have ever personally encountered that used bare Ethernet was a very outdated CNC setup.

jwatzman · on Sept 10, 2020

Along with the above fantastic comment, I found https://apenwarr.ca/log/20170810 an interesting (if inflammatory/divisive) essay on the subject and its history.

jcrawfordor · on Sept 11, 2020

Yes, that essay is outstanding! I largely left out mention of IPv6 because it's a whole different can of worms, but as that article presents, it aims to make the situation radically simpler but in practice, well, doesn't. Cue the XKCD about making a new standard.

A bit ago I touched on various competitors to IP on my blog-thing (https://computer.rip/) but I need to find time to give the topic a more thorough treatment. As with a lot of fields, you can probably learn more about what really matters in networking by studying the protocols that didn't make it than by studying the ones that did. It's hard for most people that entered the computing field in the last couple of decades to imagine IP and TCP/UDP not being the clearly correct design, but in the '80s to early '90s the expansion of microcomputers was accompanied by a flourishing of network protocols for use with them. There are multiple reasons that TCP/IP over Ethernet eventually became dominant, but in the end it's mostly happenstance; it's pretty easy to imagine XNS becoming the norm if ARPANET had gone a little differently. Imagine the problems we'd be talking about today in that parallel universe: "XNSv6 adoption is such a mess."

I'm honestly a bit sad to see the "all-IP" trend working its way through the telecom industry. It's reducing use of protocols like MPLS that I think are very cool. But now software-defined networking brings a whole new world of strange network technologies that we'll find ill-advised in fifty years.

therealcamino · on Sept 11, 2020

Besides the choice between using IP or "bare" Ethernet, there are alternatives to IP as the layer on top of Ethernet that are used in routed networks. Two of the more-common examples historically are Novell Netware (IPX/SPX) and DECnet.

tssva · on Sept 12, 2020

Another historic alternative was VINES IP which was used by Banyan Vines systems. Like IPX/SPX it was inspired by XNS.

What makes it particularly interesting is that Vines was based upon AT&T UNIX System V, which means it was a widely deployed commercial Unix implementation which did not use TCP/IP for its network stack.

bnjms · on Sept 11, 2020

Beautiful rant.

Request. Do TLS next (if it’s in your wheelhouse). I’ve been looking for a good summary of ECC and selected curves in TLS 1.2.

jcrawfordor · on Sept 11, 2020

I don't know, it's hard to get that far with TLS because you get mired down pointing out all the problems and failed potential solutions in the CA infrastructure first. ;)

swinglock · on Sept 10, 2020

> It doesn't matter which subnet the next hop's IP is in, as the routing table isn't consulted for it anyway - it's only used in ARP)

You can only ARP for hosts on the same subnet as you, terrible hacks excluded.

> This leaves the question, why the indirection and why the mucking around with ARP and IPs that are never used as the destination to anything?

Because it was designed in layers so that different layers could be replaced. We didn't know we'd end up with mostly only IP and Ethernet in LANs back then.

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

It could have been done in any number of ways. It's not that much complexity though, and it would bake Ethernet MACs into everything IP, even in the cases where it's not needed.

AlphaSite · on Sept 10, 2020

Fiddling with ARP comes up more often than you’d think, especially as a quick easy way to handle HA.

james412 · on Sept 10, 2020

IP addresses sharing a route have a common prefix. This is not true of MAC addresses. They are allocated essentially randomly. If you wanted to route solely using MAC addresses, every router in the world would need a lookup table containing every MAC address, and route aggregation would be impossible.

That's not /the/ reason why a MAC address is involved. It's because that's the address for a physical device at a lower layer in the stack. As others mention, IP is media-independent; it cannot depend on a lower tier addressing scheme without becoming fused to that medium.
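The aggregation point can be sketched with Python's standard `ipaddress` module. The prefixes and next hops below are made-up documentation values, and `longest_prefix_match` is a hypothetical helper, not how any real router stores its table.

```python
import ipaddress

# Four adjacent /24 networks that all sit behind the same next hop.
subnets = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# Because they share a 22-bit prefix, they collapse into a single route.
aggregated = list(ipaddress.collapse_addresses(subnets))

def longest_prefix_match(routes, destination):
    """Return the most specific (network, next_hop) covering destination, or None."""
    dest = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in routes.items() if dest in net]
    return max(candidates, key=lambda item: item[0].prefixlen, default=None)

# Two entries cover every possible IPv4 destination.
routes = {
    ipaddress.ip_network("0.0.0.0/0"): "203.0.113.1",
    aggregated[0]: "192.0.2.1",
}
print(aggregated, longest_prefix_match(routes, "10.1.3.7"))
```

Nothing comparable works for MAC addresses: two NICs on the same segment generally share no meaningful prefix, so each one would need its own table entry.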

mrkstu · on Sept 11, 2020

In an alternative universe where Novell continued to dominate networking, we'd be talking about how IPX uses the MAC directly to ID the host and had a separate network ID to uniquely identify the LAN the host is connected to.

It is actually a pretty reasonable way of integrating hardware MACs directly into the internetworking stack.
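As a rough sketch of that layout: an IPX address is a 32-bit network number, a 48-bit node number (the MAC on Ethernet), and a 16-bit socket. The class, helper, and dotted display format below are illustrative assumptions, not anything from Novell's tooling.

```python
from typing import NamedTuple

class IpxAddress(NamedTuple):
    network: int  # 32-bit network number identifying the LAN segment
    node: bytes   # 48-bit node number; on Ethernet, the NIC's MAC address
    socket: int   # 16-bit socket number, roughly analogous to a port

    def __str__(self) -> str:
        # Display format chosen for this example only.
        return f"{self.network:08X}.{self.node.hex().upper()}.{self.socket:04X}"

def ipx_address_for(network: int, mac: str, socket: int) -> IpxAddress:
    """Build an address from a colon-separated MAC; no ARP-style lookup is needed."""
    node = bytes.fromhex(mac.replace(":", ""))
    if len(node) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return IpxAddress(network, node, socket)

addr = ipx_address_for(0x0000ABCD, "00:00:5e:00:53:01", 0x0451)
print(addr)
```

Routers only need to look at the network number, while the last hop can copy the node field straight into the frame's destination MAC.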

yabones · on Sept 10, 2020

The reason for that is because IP is not 'integrated' with layer-2 tech like Ethernet. In fact, for a very long time Ethernet was only really used on local networks. Point-to-Point Protocol (PPP) [1] is a completely separate data link layer technology with no real concept of MAC addresses, because there can only be two devices on the bus.

Most of the very expensive 'multilayer' switches [2] do a form of this where they associate a next-hop IP with a MAC address entry and store that in the TCAM or data layer. It's not used as much because Cisco has a ton of patents on this type of technology, and also because general purpose hardware has gotten quick enough that it's not as important as it was ~15 years ago...

[1] https://en.wikipedia.org/wiki/Point-to-Point_Protocol

[2] https://en.wikipedia.org/wiki/Multilayer_switch#Layer-3_swit...

yardstick · on Sept 11, 2020

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

One reason why using an IP is still important is the IP can move to a different router, so the MAC for that IP can change. For example, the hardware might be swapped out, the network admin might manually move the IP, or some HA system (one that isn’t VRRP, which uses a virtual MAC) might dynamically move IPs to other routers.

Usability: it’s a lot easier imo to read a routing table with IP next hop than MAC as you don’t have to remember what MAC every machine is. The IP also conveys visually which port the traffic is (probably) going out. E.g.:

Port 1 - 192.168.1.0/24
Port 2 - 192.168.2.0/24

If my next hop for 1.1.1.1 is via 192.168.2.254 I know immediately it’s going out port 2. If it was a MAC I’d have no clue unless I memorised all MACs in my networks.
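The same reasoning a human applies to that table can be sketched in a few lines of Python. The port names and the `egress_port` helper are hypothetical; the subnets are the ones from the example above.

```python
import ipaddress

# Directly connected subnets per port, as in the example above.
CONNECTED = {
    "port1": ipaddress.ip_network("192.168.1.0/24"),
    "port2": ipaddress.ip_network("192.168.2.0/24"),
}

def egress_port(next_hop: str):
    """Return the port whose connected subnet contains the next hop, else None."""
    hop = ipaddress.ip_address(next_hop)
    for port, subnet in CONNECTED.items():
        if hop in subnet:
            return port
    return None  # next hop is not on any connected subnet, so it cannot be ARPed for

print(egress_port("192.168.2.254"))
```

With a bare MAC as the next hop there is no prefix to match against, so the outgoing port would have to be stored explicitly or remembered.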

w7 · on Sept 10, 2020

You can have network segments which do not use Ethernet and therefore have no MAC addresses, but still use IP addressing and need to be routable. It doesn't make sense to tie the next hop in a table to MAC addresses, which are an implementation detail of a lower layer. A good, popular example of this that you can test yourself without obscure hardware is WireGuard.

monocasa · on Sept 10, 2020

A lot of protocols don't end up using Ethernet as the physical layer, even ones you still use today.

Qemu (and I think Docker too?) use SLIRP internally for access between VMs which is ultimately an IP layer bridge.

On the WAN side (at least at one point, I could be out of date here) they didn't use Ethernet, but instead IP layer routing as well, on top of stuff like PPP and SONET.

starfallg · on Sept 11, 2020

>Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

This is exactly what Cisco Express Forwarding (and similar layer 3 switching technology) does. The adjacency table keeps all of the layer 2 information to be used for fast routing of packets. This was implemented on the CPU back in the day, but now usually done in the switching ASICs.

However, you still need layer 3 next-hop information in the routing table (and dynamic routing protocols). The reason being 1. ethernet is one of many layer 2 technologies that IP supports and 2. MAC addresses can change for a particular IP address due to various reasons including hardware replacement and HA.
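A toy model of the split described above, with a layer 3 routing table and a separate adjacency table. This is only a sketch of the idea, not Cisco's data structures; the table contents, MAC addresses (from the documentation range), and function names are invented for illustration.

```python
import ipaddress

# Routing table: prefix -> next-hop IP (layer 3, what routing protocols maintain).
routing_table = {
    ipaddress.ip_network("0.0.0.0/0"): "192.168.2.254",
    ipaddress.ip_network("10.0.0.0/8"): "192.168.2.254",
    ipaddress.ip_network("172.16.0.0/12"): "192.168.1.254",
}

# Adjacency table: next-hop IP -> layer 2 rewrite information (here, a MAC).
adjacency = {
    "192.168.2.254": "00:00:5e:00:53:01",
    "192.168.1.254": "00:00:5e:00:53:02",
}

def forward(destination: str):
    """Longest-prefix match, then resolve the next hop to its layer 2 address."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in routing_table if dest in net]
    best = max(matches, key=lambda net: net.prefixlen)
    hop = routing_table[best]
    return hop, adjacency[hop]

def replace_hardware(next_hop: str, new_mac: str) -> None:
    """Hardware swap or HA failover: one adjacency entry changes, no route does."""
    adjacency[next_hop] = new_mac

print(forward("10.2.3.4"))
```

If the routes stored MACs directly, `replace_hardware` would have to rewrite every route that pointed at the old device; with the indirection it touches a single entry.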

wmf · on Sept 10, 2020

Historically, some links didn't have MAC addresses and different link types have different address types so it's easier for the routing protocols to work in terms of IP addresses.

jlgaddis · on Sept 11, 2020

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

Several others have already answered your question -- the key points being "the OSI model" (e.g., layer 2 vs. layer 3) and the multitude of other layer 2 protocols which don't use MAC addresses -- so I'll mention one other important detail.

---

Although the Ethernet protocol itself has been around for ~40 years now, for the majority of that time it mostly only existed "in the LAN".

In fact, when it comes to "on the WAN", Ethernet is still a relative newcomer. Before ~15 years or so ago, pretty much no one was using Ethernet "on the WAN" -- instead, it was X.25 and frame relay and HDLC and PPP and ATM and POS on analog "leased lines" and ISDN and DS-{1,3}s and OC-{3,12,48,192}s.

Along came MPLS, MetroE, EoMPLS, Carrier Ethernet, etc., and soon enough everyone was "tunneling" Ethernet between sites but we were still mostly using those "legacy" protocols "on the WAN".

Over time, technology advanced to the point that "native" Ethernet eventually became feasible "on the WAN" -- in no small part because 1) Ethernet speeds kept increasing by an order of magnitude (!) every few years, 2) standardizing on Ethernet everywhere drove the costs down, and 3) Ethernet was "easy" (compared to all of those "WAN" protocols we were using up until this point) -- everybody already "knew" Ethernet because, by this time, everybody had been using it in their LANs for a decade or more!

Although ATM and SONET (at least) are still around in (some parts of) some service provider networks, they are now the exception and Ethernet -- to butcher a phrase -- "has eaten the world" but, as I mentioned, Ethernet "on the WAN" is still a relatively new thing.

---

So, I'll offer an alternative answer to your question:

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

Sure, if you had done it about 30 years earlier!

notyourday · on Sept 10, 2020

> Couldn't you simply put the next hop's MAC address (instead of IP address) into the routing table and be able to route packets just as well, with a lot less complexity?

No, because a MAC address only makes sense for Ethernet-like layer 2 protocols, and IP can run over any number of layer 2 protocols, including point-to-point protocols.

rmetzler · on Sept 10, 2020

If you put the next hop's MAC address in the routing table and the device fails and needs to be replaced, all the routing tables would need to be rewritten, because MACs are supposed to be unique. You couldn’t just take a spare device, configure it accordingly and be done with it.

bluecmd · on Sept 10, 2020

IPv6 commonly does that. Your next hop is installed as a link-local fe80:: entry which is derived from the MAC address. Not exactly what you're after, but it removes the need for IP numbering.
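For reference, the classic derivation is the modified EUI-64 scheme: flip the universal/local bit of the MAC, insert `ff:fe` in the middle, and prepend `fe80::/64`. The sketch below assumes that scheme; many current operating systems generate randomized or stable-privacy interface identifiers instead, so a real router's link-local address may not match. The function name is made up for this example.

```python
import ipaddress

def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive an fe80::/64 address from a 48-bit MAC using modified EUI-64."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to get a 64-bit interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + interface_id)

print(link_local_from_mac("00:11:22:33:44:55"))
```

Neighbor discovery is still used to resolve the link-local address to a MAC, so the indirection remains; only the manual numbering goes away.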