I got an interesting question the other day that I thought might be of value to the general public. The requestor thought there might be a problem with the Linux scheduler, but that really wasn’t the issue:
I’m using Fedora 7 at work and I have run into a problem that I think is related to the scheduler. I have a core program that must run in the background, and I need to be able to configure it on the fly using an interface program that must run in the foreground and then go away when done. When I need the interface again, I just restart it, reconfigure the system and tell the background process to grab the configuration and deal with it. So I start the UI program, do what I need, and then send the new configuration to the background process via a socket, knowing that it should be in a state where it will read the new data from the socket. The problem is that the background process starts a read from the socket and never returns. This seems to be a scheduling problem. The question is 1) how to prove it and 2) how to fix it? Please accept my apology for the intrusion. Have you ever encountered a problem like this?
This likely isn’t a scheduler problem. It’s more likely a socket management problem. A blocking read on a socket normally doesn’t return until data arrives or the other end closes the connection, so you probably aren’t reading the EOF on the socket, or you have keep-alive (SO_KEEPALIVE) set on it, or something similar. In other words, your daemon (i.e., background) process is not closing the connection or is not getting the socket close from the foreground process.
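To make the EOF point concrete, here is a minimal sketch in Python (the original programs are presumably not written in it, and the names `send_config` and `read_until_eof` are made up for illustration). It assumes the foreground process sends the whole configuration and then shuts down its writing side, which is what lets the daemon’s read loop see end-of-file and return instead of blocking forever.

```python
import socket


def send_config(conn, payload):
    """Foreground side (hypothetical helper): send the config, then signal EOF."""
    conn.sendall(payload)
    # Without this shutdown (or a close), the reader has no way to know
    # that no more data is coming and keeps blocking in recv().
    conn.shutdown(socket.SHUT_WR)


def read_until_eof(conn, chunk_size=4096):
    """Daemon side (hypothetical helper): collect bytes until the peer signals EOF."""
    chunks = []
    while True:
        data = conn.recv(chunk_size)
        if not data:  # an empty result means the peer closed or shut down writing
            break
        chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # A connected socket pair stands in for the UI program and the daemon.
    ui_end, daemon_end = socket.socketpair()
    send_config(ui_end, b"rate=5\n")
    print(read_until_eof(daemon_end))
    ui_end.close()
    daemon_end.close()
```

If the connection has to stay open between updates, the usual alternative is to frame each message (a length prefix or a terminating delimiter) so the daemon knows when one configuration ends.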
A simpler solution is to have your daemon process check a configuration file periodically. It does a stat() on the file, and if the file’s modification time has changed since the last check, the daemon rereads the file. If the file has not been touched since the last check, the daemon skips it. The daemon just has to poll the file every now and then. The foreground process doesn’t care when the daemon polls the file: it just reads the file when it starts and writes the file when it’s done. No socket messiness.
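The polling approach can be sketched in a few lines. This is only an illustration in Python, not the requestor’s code: the class name `ConfigPoller`, the file path, and the five-second interval are all assumptions. It compares the modification time reported by `os.stat()` against the value seen on the previous check and rereads the file only when the two differ.

```python
import os
import time


class ConfigPoller:
    """Hypothetical helper: reread a config file only when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.last_mtime = None  # nothing has been read yet

    def reload_if_changed(self):
        """Return the file contents if it changed since the last check, else None."""
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None  # no config file yet; try again on the next poll
        if mtime == self.last_mtime:
            return None  # untouched since the last check, so skip it
        self.last_mtime = mtime
        with open(self.path) as f:
            return f.read()


if __name__ == "__main__":
    poller = ConfigPoller("/tmp/daemon.conf")  # assumed path, for illustration
    while True:
        new_config = poller.reload_if_changed()
        if new_config is not None:
            print("reloading configuration:", new_config)
        time.sleep(5)  # assumed polling interval
```

One design note: if the foreground program writes the new configuration to a temporary file and then renames it over the real one, the daemon is less likely to catch the file half-written.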
Of course, if your foreground process is not running on the same machine as the daemon, then you’ll need to stick with the socket-based solution. But it didn’t sound like that was the case here.