LMSherlock shared this graph with me:
If you optimize after every 1,000 reviews, there is a ~75% chance that the log loss will improve. That said, this is just one collection, so the number could be different for other people.
Also, here are his thoughts:
Stochastic gradient descent is, as the name says, stochastic. There is no theory guaranteeing that the algorithm will find the global minimum. And you can see that the difference between the last weights and the current weights is very small.
And log loss is related to the average retention.
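Since log loss comes up in both the graph and his comments, here is a minimal sketch (in Python, not FSRS's actual code) of how log loss is computed over a batch of reviews. The predicted probabilities and outcomes below are made up purely for illustration.

```python
import math

def log_loss(predictions, outcomes):
    """Average negative log-likelihood over reviews (lower is better).

    predictions: predicted probabilities of recall, one per review
    outcomes: 1 if the card was actually recalled, 0 if it was forgotten
    """
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, 1e-15), 1 - 1e-15)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)

# Hypothetical example: predicted recall probabilities vs. actual outcomes.
preds = [0.95, 0.80, 0.70, 0.90]
actual = [1, 1, 0, 1]
print(log_loss(preds, actual))  # ≈ 0.396
```

Intuitively, the model is penalized heavily when it is confident and wrong (like the 0.70 prediction on a forgotten card above), which is why log loss is sensitive to how well the predicted retention matches your actual retention.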