[ruby-talk:444294] Alternative logging solutions

Presently our applications use the standard Logger package that comes with Ruby and it works just fine. We use it in conjunction with the standard unix logrotate. Up to now everything has been working as expected Until the logrotate hit in the middle of some significant processing and a client had a query. What we found was a 2 second gap in the logs. The end of the old log was the start of a transaction and the start of the new log was the end of a completely different transaction. So we have lost partial information from two transactions and it is possible that there was at least one transaction between these two that was lost completely. Each transaction takes around 1 second to process, sometimes less This time it was not too big an issue and there were other ways to investigate the problem Is there some other way of logging that is even less likely to lose data?

Couple of solutions come to mind: 1. If this seldom happens the easiest solution would be to do nothing 2. Logger gem has support for 'period-based rotated logging'. The 'daily' option seems perfect 3. Switch from logrotate to syslog-ng, I have not used this myself but it seems to have an option for lossless rotation 4. Add important events that you never want to miss to a table in the database 5. Use an elaborate logging solution consisting of for instance: fluent-bit to collect log entries from your application loki to ingest/store log entries grafana to visualize/monitor/etc logs When using (managed) kubernetes this is pretty easy to set up. If you are going this way it might also be a good idea to use the 'semantic-logger' gem to generate log entries in json format. This way you can easily add more context to log entries and use this in grafana. On 04-08-2023 11:55, Peter Hickman via ruby-talk wrote:
Presently our applications use the standard Logger package that comes with Ruby and it works just fine. We use it in conjunction with the standard unix logrotate. Up to now everything has been working as expected
Until the logrotate hit in the middle of some significant processing and a client had a query. What we found was a 2 second gap in the logs. The end of the old log was the start of a transaction and the start of the new log was the end of a completely different transaction. So we have lost partial information from two transactions and it is possible that there was at least one transaction between these two that was lost completely. Each transaction takes around 1 second to process, sometimes less
This time it was not too big an issue and there were other ways to investigate the problem
Is there some other way of logging that is even less likely to lose data?
______________________________________________ ruby-talk mailing list --ruby-talk@ml.ruby-lang.org To unsubscribe send an email toruby-talk-leave@ml.ruby-lang.org ruby-talk info --https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-talk.ml.ruby-lang.org...

1) There is the possibility that the missing information could be crucial. Clients are settling bets based on these processes. If things go tits up on a major event (think Football World Cup and the like) a lot of money could be at stake and if it is not our problem but we cannot prove it then we could take a financial and reputational hit. Besides not knowing what happened when it went wrong makes fixing it hard 2) We used to use the rotation feature that was built into the Logger package but we found in the past that when summer time happened the logger died because the new file already existed (not sure what was happening but moving the logfile with that days timestamp elsewhere fixed it). It was in our support calendar to check for this specifically until we had migrated everything to use the standard logrotate. This was years ago and this may be fixed now but testing it is not convenient when summer time changes happen only twice a year :) 3) syslog does look like the next thing to look at. I had a feeling that the most reliable alternative would be this. I will have to dig into it further I have considered logging to a queue and have a reader create the actual log file but syslog will probably be the simpler solution Thanks for you input

You are welcome. Ah, financial transactions. In that case it seems to me that you have two separate requirements: 1. Logging of requests, responses, errors. So the usual stuff to monitor the application 2. An audit trail of each financial event. In my opinion this does not belong in a logfile but in a database. So in a single database transaction: create the bet, lock the customer account, debit the customer account and create an audit entry. If the transaction succeeds happy days, if it fails try to create a failure audit entry and report back to the customer. If all fails (a Ruby exception) then hopefully the logfile will at least record the error. On 04-08-2023 13:59, Peter Hickman via ruby-talk wrote:
1) There is the possibility that the missing information could be crucial. Clients are settling bets based on these processes. If things go tits up on a major event (think Football World Cup and the like) a lot of money could be at stake and if it is not our problem but we cannot prove it then we could take a financial and reputational hit. Besides not knowing what happened when it went wrong makes fixing it hard
2) We used to use the rotation feature that was built into the Logger package but we found in the past that when summer time happened the logger died because the new file already existed (not sure what was happening but moving the logfile with that days timestamp elsewhere fixed it). It was in our support calendar to check for this specifically until we had migrated everything to use the standard logrotate. This was years ago and this may be fixed now but testing it is not convenient when summer time changes happen only twice a year :)
3) syslog does look like the next thing to look at. I had a feeling that the most reliable alternative would be this. I will have to dig into it further
I have considered logging to a queue and have a reader create the actual log file but syslog will probably be the simpler solution
Thanks for you input
______________________________________________ ruby-talk mailing list --ruby-talk@ml.ruby-lang.org To unsubscribe send an email toruby-talk-leave@ml.ruby-lang.org ruby-talk info --https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-talk.ml.ruby-lang.org...

We are not doing the transactions, we are settling the market that the bets are placed against. We say something like "Player X scored the first goal" and the client then pays out the customers who made that bet and takes money from those that bet on a different goal scorer. If we get it wrong the client has paid money out they cannot get back (reasons) and then they have to pay out the correct bet, "Player Y is the first goal scorer". There are conditions that should make us delay announcing the settlement (goal being challenged / reviewed). It's not enough to know that a goal was scored but we need to be sure that the goal is valid. So we log like crazy to make sure we know why we made our decisions Around 50 to 800 events that may affect the settlement of a market occur per match so we can log Gbs a day. But if something goes wrong sods law says it will be in the 2 second window that is logrotate Anyhow that was a bit off topic On Fri, 4 Aug 2023 at 13:37, Ivo Herweijer via ruby-talk < ruby-talk@ml.ruby-lang.org> wrote:
You are welcome.
Ah, financial transactions. In that case it seems to me that you have two separate requirements: 1. Logging of requests, responses, errors. So the usual stuff to monitor the application 2. An audit trail of each financial event. In my opinion this does not belong in a logfile but in a database. So in a single database transaction: create the bet, lock the customer account, debit the customer account and create an audit entry. If the transaction succeeds happy days, if it fails try to create a failure audit entry and report back to the customer. If all fails (a Ruby exception) then hopefully the logfile will at least record the error.
On 04-08-2023 13:59, Peter Hickman via ruby-talk wrote:
1) There is the possibility that the missing information could be crucial. Clients are settling bets based on these processes. If things go tits up on a major event (think Football World Cup and the like) a lot of money could be at stake and if it is not our problem but we cannot prove it then we could take a financial and reputational hit. Besides not knowing what happened when it went wrong makes fixing it hard
2) We used to use the rotation feature that was built into the Logger package but we found in the past that when summer time happened the logger died because the new file already existed (not sure what was happening but moving the logfile with that days timestamp elsewhere fixed it). It was in our support calendar to check for this specifically until we had migrated everything to use the standard logrotate. This was years ago and this may be fixed now but testing it is not convenient when summer time changes happen only twice a year :)
3) syslog does look like the next thing to look at. I had a feeling that the most reliable alternative would be this. I will have to dig into it further
I have considered logging to a queue and have a reader create the actual log file but syslog will probably be the simpler solution
Thanks for you input
______________________________________________ ruby-talk mailing list -- ruby-talk@ml.ruby-lang.org To unsubscribe send an email to ruby-talk-leave@ml.ruby-lang.org ruby-talk info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-talk.ml.ruby-lang.org...
______________________________________________ ruby-talk mailing list -- ruby-talk@ml.ruby-lang.org To unsubscribe send an email to ruby-talk-leave@ml.ruby-lang.org ruby-talk info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-talk.ml.ruby-lang.org...

In addition to the other suggestions, you might try switching logrotate from copytruncate mode to move/signal and reopen the log when the process gets signaled. Though I've never used that technique in ruby; I usually jump straight to syslog. (Even better: syslog local and forwarded to a collector over TCP/TLS with a local spool to buffer messages when the remote connection is down.) Another option is to log to stderr and let the service manager (daemontools, s6, systemd, etc.) or container runtime capture and preserve the messages.

On Fri, Aug 4, 2023 at 4:55 AM Peter Hickman via ruby-talk < ruby-talk@ml.ruby-lang.org> wrote:
Until the logrotate hit in the middle of some significant processing and a client had a query. What we found was a 2 second gap in the logs. The end of the old log was the start of a transaction and the start of the new log was the end of a completely different transaction. So we have lost partial information from two transactions and it is possible that there was at least one transaction between these two that was lost completely. Each transaction takes around 1 second to process, sometimes less
Can you provide your logrotate configuration or an example of it? It's been a while since I messed with logrotate, but I seem to recall that there are cases when using compression where there could be gaps in the saved data. I also want to say that there's a workaround for that problem. In any case, seeing your configuration may help us all understand what's going on in your case. -Jeremy

Here is the logrotate /home/results/results/shared/log/*.log { daily missingok rotate 30 compress delaycompress copytruncate notifempty dateext create 644 results results } On Fri, 4 Aug 2023 at 13:49, Jeremy Bopp <jeremy@bopp.net> wrote:
On Fri, Aug 4, 2023 at 4:55 AM Peter Hickman via ruby-talk < ruby-talk@ml.ruby-lang.org> wrote:
Until the logrotate hit in the middle of some significant processing and a client had a query. What we found was a 2 second gap in the logs. The end of the old log was the start of a transaction and the start of the new log was the end of a completely different transaction. So we have lost partial information from two transactions and it is possible that there was at least one transaction between these two that was lost completely. Each transaction takes around 1 second to process, sometimes less
Can you provide your logrotate configuration or an example of it? It's been a while since I messed with logrotate, but I seem to recall that there are cases when using compression where there could be gaps in the saved data. I also want to say that there's a workaround for that problem. In any case, seeing your configuration may help us all understand what's going on in your case.
-Jeremy

On Fri, Aug 4, 2023 at 8:13 AM Peter Hickman via ruby-talk < ruby-talk@ml.ruby-lang.org> wrote:
Here is the logrotate
/home/results/results/shared/log/*.log { daily missingok rotate 30 compress delaycompress copytruncate notifempty dateext create 644 results results }
I'm pretty sure copytruncate is the problem. As documented in the manpage for logrotate for that option: "Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost." The amount of data lost will depend on how rapidly data is being written to the file. You'll want to switch away from that option, and the postrotate option may be the best way to go if you want to keep using logrotate. However, your app would need to be modified to be able to respond to some kind of signal so that it can re-open the log file. If modifying your application in that way isn't feasible, then you'll want to consider one of the other solutions suggested earlier in this thread. I personally recommend against using the syslog protocol due to limitations around handling embedded newlines in your log messages, but if you can avoid or escape the newlines, it should be fine. -Jeremy

Sounds like a thing to test would be removing copytruncate and seeing what happens. Thanks for the insight

A super quick and very dirty test of concept: test.rb has one thread that writes to logger and another thread that renames the file and sends a HUP signal (representing logrotate). $ rm /tmp/loggerhup* ; ruby test.rb /tmp/loggerhup & tail -F /tmp/loggerhup [1] 44773 tail: cannot open '/tmp/loggerhup' for reading: No such file or directory do tail: '/tmp/loggerhup' has appeared; following new file # Logfile created on 2023-08-04 17:23:46 -0400 by logger.rb/v1.4.2 I, [2023-08-04T17:23:47.795742 #44773] INFO -- : 1 I, [2023-08-04T17:23:47.952905 #44773] INFO -- : 2 I, [2023-08-04T17:23:48.051453 #44773] INFO -- : 3 I, [2023-08-04T17:23:48.274491 #44773] INFO -- : 4 I, [2023-08-04T17:23:48.359070 #44773] INFO -- : 5 D, [2023-08-04T17:23:48.797585 #44773] DEBUG -- : Received signal, reopening... tail: '/tmp/loggerhup' has been replaced; following new file # Logfile created on 2023-08-04 17:23:52 -0400 by logger.rb/v1.4.2 I, [2023-08-04T17:23:52.951757 #44773] INFO -- : 14 I, [2023-08-04T17:23:53.281451 #44773] INFO -- : 15 I, [2023-08-04T17:23:53.457322 #44773] INFO -- : 16 I, [2023-08-04T17:23:54.426525 #44773] INFO -- : 17 D, [2023-08-04T17:23:54.799463 #44773] DEBUG -- : Received signal, reopening... end tail: '/tmp/loggerhup' has been replaced; following new file # Logfile created on 2023-08-04 17:23:56 -0400 by logger.rb/v1.4.2 I, [2023-08-04T17:23:57.443366 #44773] INFO -- : 22 I, [2023-08-04T17:23:57.884349 #44773] INFO -- : 23 I, [2023-08-04T17:23:58.713242 #44773] INFO -- : 24 ^C[1]+ Done ruby test.rb /tmp/loggerhup After the run, no messages are missing: $ grep . /tmp/loggerhup* /tmp/loggerhup:# Logfile created on 2023-08-04 17:23:56 -0400 by logger.rb/v1.4.2 /tmp/loggerhup:I, [2023-08-04T17:23:57.443366 #44773] INFO -- : 22 /tmp/loggerhup:I, [2023-08-04T17:23:57.884349 #44773] INFO -- : 23 /tmp/loggerhup:I, [2023-08-04T17:23:58.713242 #44773] INFO -- : 24 /tmp/loggerhup-005:# Logfile created on 2023-08-04 17:23:46 -0400 by logger.rb/v1.4.2 /tmp/loggerhup-005:I, [2023-08-04T17:23:47.795742 #44773] INFO -- : 1 /tmp/loggerhup-005:I, [2023-08-04T17:23:47.952905 #44773] INFO -- : 2 /tmp/loggerhup-005:I, [2023-08-04T17:23:48.051453 #44773] INFO -- : 3 /tmp/loggerhup-005:I, [2023-08-04T17:23:48.274491 #44773] INFO -- : 4 /tmp/loggerhup-005:I, [2023-08-04T17:23:48.359070 #44773] INFO -- : 5 /tmp/loggerhup-005:D, [2023-08-04T17:23:48.797585 #44773] DEBUG -- : Received signal, reopening... /tmp/loggerhup-008:# Logfile created on 2023-08-04 17:23:48 -0400 by logger.rb/v1.4.2 /tmp/loggerhup-008:I, [2023-08-04T17:23:49.313963 #44773] INFO -- : 6 /tmp/loggerhup-008:I, [2023-08-04T17:23:50.008360 #44773] INFO -- : 7 /tmp/loggerhup-008:I, [2023-08-04T17:23:50.131976 #44773] INFO -- : 8 /tmp/loggerhup-008:D, [2023-08-04T17:23:50.798364 #44773] DEBUG -- : Received signal, reopening... /tmp/loggerhup-013:# Logfile created on 2023-08-04 17:23:50 -0400 by logger.rb/v1.4.2 /tmp/loggerhup-013:I, [2023-08-04T17:23:50.988746 #44773] INFO -- : 9 /tmp/loggerhup-013:I, [2023-08-04T17:23:51.255564 #44773] INFO -- : 10 /tmp/loggerhup-013:I, [2023-08-04T17:23:52.133295 #44773] INFO -- : 11 /tmp/loggerhup-013:I, [2023-08-04T17:23:52.193747 #44773] INFO -- : 12 /tmp/loggerhup-013:I, [2023-08-04T17:23:52.499093 #44773] INFO -- : 13 /tmp/loggerhup-013:D, [2023-08-04T17:23:52.798819 #44773] DEBUG -- : Received signal, reopening... /tmp/loggerhup-017:# Logfile created on 2023-08-04 17:23:52 -0400 by logger.rb/v1.4.2 /tmp/loggerhup-017:I, [2023-08-04T17:23:52.951757 #44773] INFO -- : 14 /tmp/loggerhup-017:I, [2023-08-04T17:23:53.281451 #44773] INFO -- : 15 /tmp/loggerhup-017:I, [2023-08-04T17:23:53.457322 #44773] INFO -- : 16 /tmp/loggerhup-017:I, [2023-08-04T17:23:54.426525 #44773] INFO -- : 17 /tmp/loggerhup-017:D, [2023-08-04T17:23:54.799463 #44773] DEBUG -- : Received signal, reopening... /tmp/loggerhup-021:# Logfile created on 2023-08-04 17:23:54 -0400 by logger.rb/v1.4.2 /tmp/loggerhup-021:I, [2023-08-04T17:23:55.304150 #44773] INFO -- : 18 /tmp/loggerhup-021:I, [2023-08-04T17:23:56.029091 #44773] INFO -- : 19 /tmp/loggerhup-021:I, [2023-08-04T17:23:56.160495 #44773] INFO -- : 20 /tmp/loggerhup-021:I, [2023-08-04T17:23:56.719132 #44773] INFO -- : 21 /tmp/loggerhup-021:D, [2023-08-04T17:23:56.800298 #44773] DEBUG -- : Received signal, reopening... $ cat test.rb #!/usr/bin/env ruby require 'logger' require_relative 'loggerhup' logger = Logger.new(ARGV[0]) logger.level = Logger::DEBUG logger.reopen_on_signal puts 'do' counter = 0 sleep 1 Thread.new do loop do logger.info(counter+=1); sleep rand end end sleep 1 Thread.new do loop do File.rename(ARGV[0], "%s-%03d" % [ARGV[0], counter]) rescue nil; Process.kill('HUP', 0); sleep(2) end end sleep 10 puts 'end' $ cat loggerhup.rb class Logger def reopen_on_signal(signal = 'HUP') if @_hupq or @_hupt warn 'Ignoring repeated call to #reopen_on_signal' return end @_hupq ||= Queue.new @_hupt ||= Thread.new do loop do @_hupq.pop debug 'Received signal, reopening...' reopen end end Signal.trap(signal) do @_hupq.push(nil) end end end Can't call Logger#reopen directly from the signal handler since it uses a Mutex, hence the Queue hack: https://bugs.ruby-lang.org/issues/14222#note-3
participants (4)
-
Frank J. Cameron
-
Ivo Herweijer
-
Jeremy Bopp
-
Peter Hickman