Jan 7 activities

Had a very difficult time getting into my car on this bitterly cold day (the locks, door latches and doors themselves were all frozen), but eventually made it...

Anyway, discussed with Mike retiring several former Dell database servers from the BCF, keeping one spare 2850 and some of the "best" disks from the bunch.

minor help for Dmitry with his new 27" display.

Spoke briefly to Jeff about l4evp - it sounds like restoring it won't be a big deal for them if it has to be rebuilt from scratch (as I expect), and that ScLinux 6 will be acceptable.

Continued the move in 1006C - everything of mine is now either in my new area or in the discard pile that Bob and I will take later this week to the Excess Equipment Pool or to trash/recycling, as appropriate.  There are still some odds and ends on an excess bookshelf to sort through and clean up, but this should allow the other parties to proceed with the rearrangement.

dbbak replacement nodes arrived; brought them from 510 to the DAQ Room.

found and investigated three onlNN node problems:

  • onl32 had not properly booted after yesterday's power outage.  Simple enough reason - it had a Sc.Linux disk in its optical drive and had booted into the installation routine.  Easy fix.
  • onl33 had mysteriously dropped offline around 9am this morning.  Investigation shows only that NetworkManager, seemingly out of the blue, whacked eth1 offline at 9:07.  It rebooted cleanly, however, so for the moment this is a worrisome mystery.
  • onl26 was found running with only 3GB of RAM, despite having six 1GB DIMMs installed (and recognised).  Turns out they were not installed symmetrically across the two banks.  After re-sorting them, it came up with 4GB of RAM (more in line with expectation).  I'm stumped as to whether we received the correct RAM that was ordered for this node in December, after Mike identified a faulty module.  I had ordered (via Rachel) two of Crucial part number CT12872Y40B (registered RAM), but what I found was CT12872Z40B (unbuffered).  Since I wasn't present when Mike got the modules, I don't know whether I am looking at the modules that were actually delivered (meaning a mistake in the order) or whether Mike put the delivered modules somewhere I didn't find them (I asked him by email this evening).

Minor follow-up discussion with Jim Thomas about medm and font issues.  The sysuser account on the Slow Controls machines has dozens(?) of aliases defined for starting subsystem-specific medm screens, but, somewhat confusingly, it also has an alias for medm itself that passes "-displayFont scalable".  He suggests that medm itself should NOT be aliased; instead, each subsystem alias should explicitly set its own preferred value for the displayFont option.

learned about the chage -l option, which lists a user's password aging and account expiry information (how did I not know that by now?!)

And now, shortly after 9pm, I discover onl33 behaving weirdly (I first noticed it was not updating in Ganglia).  I can SSH in, but gmond won't start and applications can't resolve hostnames, even though nslookup, dig and host all work fine from the command line.  Clearing the hosts cache in nscd seems to have fixed things, but now I have to wonder whether something funky is going on with this machine.
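Note to self on why the command-line tools looked fine: nslookup, dig and host talk to the DNS servers directly, while ordinary applications resolve names through the C library's getaddrinfo(), which goes through NSS and, when nscd is running, through its hosts cache.  So a stale or bad nscd cache can break applications while dig keeps working.  A quick way to see what applications actually see is a small script that calls getaddrinfo() for each host.  This is a hypothetical sketch (not something I actually ran on onl33); the function name check_resolution and the default host list are made up for illustration.  For reference, `nscd -i hosts` invalidates nscd's hosts cache.

```python
"""Hypothetical check: does the system resolver (the path applications use) work?

socket.getaddrinfo() calls the C library's getaddrinfo(), which goes through
NSS (/etc/nsswitch.conf) and, when nscd is running, its hosts cache.
dig, host and nslookup bypass that path, so they can succeed while this fails.
"""
import socket
import sys


def check_resolution(hostnames):
    """Return a dict mapping each hostname to (ok, detail)."""
    results = {}
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
            # Collect the distinct addresses returned for this name.
            addrs = sorted({info[4][0] for info in infos})
            results[name] = (True, ", ".join(addrs))
        except socket.gaierror as exc:
            # gaierror is what an application sees as "name resolution failed".
            results[name] = (False, str(exc))
    return results


if __name__ == "__main__":
    # Placeholder default; pass real host names on the command line.
    names = sys.argv[1:] or ["localhost"]
    for name, (ok, detail) in check_resolution(names).items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}: {detail}")
```

If a name shows FAIL here while dig resolves it, the NSS/nscd path is the first suspect rather than DNS itself.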