Well, a “fun” day with an Apple X-Serve that bears repeating (well, writing down). If your server suddenly stops functioning with no real changes worth noting, you know it's going to be an interesting day.
I got called in by a friend and fellow consultant. He told me that the X-Serve would be connected to the network for a short while and then stop. He could make it work by disconnecting the ethernet cable and re-connecting it. That made MacOS reset the connection and it would work for a short while.
After looking over endless settings, preference files, and notes online, plus a call to Apple support, we reluctantly concluded that we should re-install the OS. We really did not want to do this. (It felt like surrendering, but since it was the biggest stick we could hit the problem with, we thought it would fix it.)
WRONG! After re-installing and following all the instructions from the documentation, the setup wizards and Apple’s phone support, IT STILL DROPPED OFF THE NETWORK.
In a moment of inspiration (desperation), I thought that the problem might be with the router (a current-model Apple Airport Extreme base station) or with some other equipment hooked up to it all. So it was time to start testing the network.
I started up a tcpdump job on the X-Serve to see if it showed something noticeably weird. There was quite a bit of traffic, but the only thing that stood out was the slowdown in packets once it dropped off the network. One thing was strange: once the X-Serve could no longer load a web page and no desktop could connect to it, I could still successfully ping the X-Serve from my laptop. WHA?!
OK, at some point, I noticed that “arp -a” on the X-Serve would return a strange entry for x.x.x.255. (Something like ff:ff:ff:ff:ff:ff if I recall.) This seemed symptomatic of the problem, but deleting it accomplished nothing. But AHA! The result of arp -a on my laptop showed that the X-Serve’s IP address was associated with an INCORRECT hardware address (MAC address).
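For anyone who wants to repeat that check without squinting at the terminal, here is a minimal sketch in Python that parses "arp -a" style output and compares one IP against the MAC address you expect. The sample text, the IP addresses and the MAC addresses are all made up for illustration, and the line format is assumed to be the one macOS prints; other systems may differ.

```python
import re

# Assumed line format (macOS style), for example:
#   ? (192.168.1.10) at 0:1f:5b:aa:bb:b2 on en0 ifscope [ethernet]
ARP_LINE = re.compile(r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9a-fA-F:]+)")


def normalize_mac(mac):
    """Pad each octet to two lowercase hex digits (macOS drops leading zeros)."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def parse_arp_table(text):
    """Return {ip: mac} for every resolved entry; '(incomplete)' entries are skipped."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = normalize_mac(match.group("mac"))
    return table


def mac_matches(table, ip, expected_mac):
    """True only if the table maps ip to the expected hardware address."""
    return table.get(ip) == normalize_mac(expected_mac)


# Hypothetical output captured on the laptop while the server was unreachable.
SAMPLE = """\
? (192.168.1.1) at 0:1e:52:11:22:33 on en0 ifscope [ethernet]
? (192.168.1.10) at 0:1f:5b:aa:bb:b2 on en0 ifscope [ethernet]
? (192.168.1.20) at (incomplete) on en0 ifscope [ethernet]
"""

table = parse_arp_table(SAMPLE)
if not mac_matches(table, "192.168.1.10", "00:1f:5b:aa:bb:b0"):
    print("Server IP is mapped to", table.get("192.168.1.10"))
```

In practice you would feed it the real output of arp -a (for instance via subprocess) instead of the SAMPLE string.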
SO! There’s ARP poison coming from somewhere. (Or some kind of hiccup in the Airport.) So, in short order, we disconnected EVERY other device in the network except the DSL modem, Airport and X-Serve. But it STILL happened! We even replaced the Airport with a brand-new spare of the same model. (Which happened to be there for expansion plans.) It was STILL getting dropped! (Now we know it's a good bug-hunt.)
The last bit needed to root it all out: that bad MAC address. It was always one past the X-Serve's own addresses. The X-Serve has two Ethernet connectors, and the information utilities show the two MAC addresses. In this case, they ended with :b0 and :b1. However, whenever the X-Serve dropped its connection, the arp entry on the laptop showed the same number, but ending with :b2. Resetting the X-Serve connection with the Ethernet cable unplug/plug trick would immediately cause it to show as :b0 on the laptop until the next drop.
Well, one more quick call to Apple and we found out that the X-Serve has two EXTRA MAC addresses for the Lights-Out Management (LOM) system. And yes, they were :b2 and :b3 for this X-Serve. This was something I hadn’t realized because I hadn’t looked into using LOM before. In fact, when I went through the setup wizard after the re-install, I had specifically opted out of setting this up.
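Had I known about the extra addresses, a few lines of code could have named the culprit right away. The sketch below assumes what we saw on this one X-Serve, namely that the two Ethernet ports and the two LOM ports use four consecutive MAC addresses starting at the first Ethernet port; I don't know that this holds for every unit. The MAC prefix is invented for the example.

```python
# Order assumed from this one machine: Ethernet ports first, then the LOM ports.
ROLES = ["Ethernet 1", "Ethernet 2", "LOM 1", "LOM 2"]


def mac_to_int(mac):
    """Convert aa:bb:cc:dd:ee:ff (leading zeros optional) to an integer."""
    return int("".join(part.zfill(2) for part in mac.split(":")), 16)


def classify_mac(first_ethernet_mac, observed_mac):
    """Name the port an observed MAC belongs to, based on its offset from the first port."""
    offset = mac_to_int(observed_mac) - mac_to_int(first_ethernet_mac)
    if 0 <= offset < len(ROLES):
        return ROLES[offset]
    return "unknown"


# Hypothetical addresses ending in :b0 and :b2, as in the story.
print(classify_mac("00:1f:5b:aa:bb:b0", "00:1f:5b:aa:bb:b2"))
```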
A quick trip to the Server Monitor program gave it all away. We set it to monitor the X-Serve, used the Server menu’s “Configure Local Machine” option to show LOM and yes, LOM ports 1 and 2 had been configured with the SAME IP address as the one we had specified manually in the wizard.
So, whenever LOM would do something, the ARP table in the Airport (and any other computer listening) would be updated with the wrong MAC address. (Kind of like ARP poison I suppose.)
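To make the mechanism concrete, here is a small sketch of a "last claim wins" ARP cache, plus a helper that flags any IP address claimed by more than one MAC address. It is a toy model, not how the Airport or MacOS actually implement their ARP tables, and the addresses are invented.

```python
from collections import defaultdict


def replay_arp_cache(observations):
    """Apply (ip, mac) claims in order; the newest claim overwrites the old one."""
    cache = {}
    for ip, mac in observations:
        cache[ip] = mac.lower()
    return cache


def find_ip_conflicts(observations):
    """Return {ip: sorted list of macs} for every IP claimed by more than one MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical sequence: the Ethernet port answers first, then LOM claims the same IP.
observations = [
    ("192.168.1.10", "00:1f:5b:aa:bb:b0"),  # Ethernet port 1
    ("192.168.1.1", "00:1e:52:11:22:33"),   # router
    ("192.168.1.10", "00:1f:5b:aa:bb:b2"),  # LOM port 1, same IP
]

print(replay_arp_cache(observations)["192.168.1.10"])
print(find_ip_conflicts(observations))
```

After the last claim, the cache points at the LOM address, which is the situation the laptop was in every time the server "dropped off".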
A quick change to the LOM settings and all was well. A minute or two later, and I had ARP entries for both the MAC addresses with their separate IP addresses. And the X-Serve has been happy ever since.
It's strangely nostalgic. I recall working on the same kind of problem many years ago (circa 1990), when the MacTCP driver for MacOS 6 had a “dynamic addressing” option. It would create an ARP poison situation too. It had been driving my group nuts, and when we found the reason we were able to just set manual addresses for all the computers and keep everyone working with no more dropouts.
Now for some rest.