Tuesday, 7 May 2013

Local syslog using rsyslogd

Phase one of the new logging system is to use the local syslog daemon.  This turned out to be pretty easy to set up, as glibc has the appropriate functions, so the line

syslog(LOG_MAKEPRI(LOG_LOCAL0, LOG_NOTICE), "Some message");

was all that was needed, and suddenly the log reports from Polly were appearing in the syslog file at /var/log/syslog

That was easy, wasn't it.  But I want to put the log messages in a separate file (or two) so now I need to learn how to configure rsyslogd in order to filter them out.  A brief study of the interweb and I'd added the command

local0.*    /var/log/polly.log

into rsyslogd's configs.  But it didn't work.  I fiddled.  I tweaked.  I learned how to validate the config by typing rsyslogd -N1  but no joy.  I tried other syntaxes (Just to confuse matters, rsyslogd accepts configuration commands in three totally different formats, and you are free to mix them in the same configuration file.)

if $syslogfacility-text == 'local0' then /var/log/polly.log

But my new log file remained steadfastly empty.

Eventually I spotted a log message, hidden in auth.log, complaining I was providing an "unknown facility/priority: 405".  Aha, does that tell me anything?

The facility/priority number in syslog is defined as priority plus 8 * facility, and that's what LOG_MAKEPRI is supposed to be doing for me.  Facility LOCAL0 is 16, and priority NOTICE is 5, so the combined value should be 133 (decimal).  A couple of printf lines soon revealed that LOG_MAKEPRI(LOG_LOCAL0, LOG_NOTICE) was returning 1029 (0x405).  It was soon obvious that the LOG_LOCAL0 macro had already got the multiplication by 8 included.  I changed the call to syslog to

syslog( (LOG_LOCAL0 | LOG_NOTICE), "message");

and my filter was suddenly working.

As far as I can see this is a bug in the glibc documentation.  Hmph!  Anyone know how I report that?  Before doing so, I downloaded the latest development source for glibc and they've already fixed it!  Personally I would have altered the documentation to match the code, but there you are.

Anyway, I was now able to create a proper rsyslogd configuration file, located in /etc/rsyslog.d/polly.conf containing

if $syslogfacility-text == 'local0' and ($msg startswith ' ETHPIC') then /var/log/polly-ethpic.log
if $syslogfacility-text == 'local0' and not ($msg startswith ' ETHPIC') then /var/log/polly.log

# Don't pass it on
local0.*    ~

This splits out the rather busy ETHPIC log entries into a different file.  The last line discards all LOCAL0 messages, so they don't get passed on to the later rule which puts everything into the messages file.

I am still weighing up the pros and cons of phase two of the syslog project.  This would involve reconfiguring rsyslogd to pass the log messages to another server, with the aim of reducing wear and tear on the Raspberry Pi's SD card.  Qustions which need to be considered include
  • Is SD card wear a problem?
  • What happens if the logging system is down?



Monday, 6 May 2013

Result!

Well, my guess turned out to be correct, and the temporary version of the Polly server with no log files is responding instantly to all button presses and commands.  All I need to do now is create some kind of non-blocking log system.  That sounds like an awful lot of coding, but wait, someone's already been there:  Unix's syslog system already has the capability to log to a remote system and would appear to do all I need.  So all that remains is to find out how to drive it...

Saturday, 4 May 2013

Up and Running

Apart from one intentional shutdown 25 days ago, PollyPi has been controlling my house without any problems for over a month, so I officially declare it to be fully operational!

What's next?

The first thing to look at is an intermittent problem where the response time of the system seems excessively slow.  The philosophy of putting all the control logic in the server and none in the remote nodes means when I press a button on a control box, a message has to go up and down before the button lights up.  This delay was never noticeable on the PC based system but it is sometimes obvious now.  The fact that it only occurs sometimes suggests it's not just a plain performance issue.  My gut feeling, based on no evidence whatsoever, is that the multiple writes to log files are clogging things up.  I can quickly confirm or disprove this by producing a test build which doesn't log anything.  Stay tuned ...