How to fix yum when it hangs with no error messages


I was trying to install a package on an FC5 box at home today and discovered that yum was being uncooperative.  Well, I blame it on yum but it was actually rpm.  There were no error messages, no indication at all that something was wrong.  It just hung.  I noticed I also had a number of other hung yum processes left over from previous days' cron jobs.  Something wicked this way came.

Well, maybe not so wicked.  The only way to find out what was going on was to use strace on the process.  Since installing packages requires root privileges, I needed to run the whole thing as root, so I did this:
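A minimal sketch of that kind of invocation, run as root.  The function names are made up for illustration and the package name is whatever you were trying to install.  The second function assumes the hung process shows up under the name yum in the process list.

```shell
# Start yum under strace.  -f follows any child processes yum forks.
# With no -o option the trace is written to the terminal (stderr).
trace_yum_install() {
    strace -f yum install "$1"
}

# Or attach to a yum process that is already hung, oldest one first.
trace_hung_yum() {
    pid=$(pgrep -o yum) || { echo "no running yum process found" >&2; return 1; }
    strace -p "$pid"
}
```

For example, `trace_yum_install some-package` starts a fresh install under strace, and `trace_hung_yum` looks at one of the leftovers from cron instead.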

Strace will output directly to the screen if you don't use the -o option.  Well, after buckets of stuff spewed onto the screen it finally hung on a futex() call.  Never mind what futex() is; right above it was the file it was trying to read: /var/lib/rpm/Providename.  A Google search on futex and this file name came up with a page on the gmane mailing list (search the page for "yum hangs on futex()"), which pointed out the real problem.  RPM uses the Berkeley DB system for its database and occasionally this db gets corrupted (usually on hard reboots, though I don't remember any of those recently).  Fortunately, it's easy to fix.
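If the trace is saved to a file with -o instead of being read off the screen, the same detective work can be done with grep.  This is only a sketch: the function name is made up, and it assumes the interesting file is the last open() (or openat()) call before the final futex() call in the trace.

```shell
# Print the last file opened before the final futex() call.
# $1 is a trace file written with: strace -o FILE ...
last_open_before_futex() {
    trace="$1"
    # Line number of the last futex() call in the trace.
    n=$(grep -n 'futex(' "$trace" | tail -n 1 | cut -d: -f1)
    [ -n "$n" ] || { echo "no futex() call in $trace" >&2; return 1; }
    # Among the lines up to that point, show the most recent open()/openat().
    head -n "$n" "$trace" | grep -E 'open(at)?\(' | tail -n 1
}
```

Called as `last_open_before_futex /tmp/yum.trace` (the path is just an example), it prints the one line of the trace worth searching the web for.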

The instructions on that page were slightly out of sync with Fedora Core 5.  So here is the modified version:

cd /var/lib/rpm
/usr/lib/rpm/rpmdb_recover

And that's it.  Yum recovered nicely immediately after that.  Slick.  I thought I'd been hacked or something and was contemplating the ever-popular upgrade (I only do those every other year or so and it's not quite time for that).  Now I can continue happily on.  Without so much as a reboot.
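For anyone who wants to be a bit more careful, the two recovery commands can be wrapped so that a copy of the database is kept first.  This is a sketch, not something from the mailing list post: the function name and the RPM_DB and RECOVER variables are made up here, and the defaults are just the FC5 paths used above.

```shell
# Defaults are the FC5 paths; override them if your system differs.
RPM_DB="${RPM_DB:-/var/lib/rpm}"
RECOVER="${RECOVER:-/usr/lib/rpm/rpmdb_recover}"

recover_rpmdb() {
    # Keep a timestamped copy of the database before touching it.
    backup="${RPM_DB}.bak.$(date +%Y%m%d%H%M%S)"
    cp -a "$RPM_DB" "$backup" || return 1
    echo "backup saved to $backup"

    # Run the recovery tool from inside the database directory,
    # in a subshell so the caller's working directory is unchanged.
    ( cd "$RPM_DB" && "$RECOVER" )
}
```

Run `recover_rpmdb` as root, preferably after killing any yum or rpm processes that are still hung.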

Thanks to Bill McGonigle for his post on that mailing list.  I've put the same info here in the hopes it will help someone else down the line.

Update: 2007-04-03

The problem keeps showing up almost every day now, mostly when 0check4update is run from cron.daily.  There is a thread about similar problems in a bug report from Red Hat.  I'm having problems with the atrpms site at the moment (don't know if it's down or what) and yum can't find any mirrors for it, so I had to disable it.  I may have messed up my system by disabling a number of the repos for yum, then doing updates.  It might be time to migrate my laptop to FC6, just to clear this problem out once and for all.
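A less invasive way to skip an unreachable repo is yum's --disablerepo option, which only applies to the one command.  A minimal sketch, assuming the repo id is atrpms (the id is the name in square brackets in the file under /etc/yum.repos.d/); the function name and REPO variable are made up for illustration.

```shell
# Assumed repo id; set REPO=... if yours is named differently.
REPO="${REPO:-atrpms}"

check_update_without_repo() {
    # Skips the repo for this run only; the .repo file is left alone.
    yum --disablerepo="$REPO" check-update
}
```

The repo stays enabled in its config file, so it comes back by itself once the site is reachable again.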