I suppose we need to conduct an experiment. I want you to do the following:
- Choose any collection (your own is fine).
- Select n, where n < total number of reviews. For example, if your collection has 10,000 reviews, you can set n=1,000.
- Optimize parameters based on n reviews.
- Increment n by 1.
- Optimize parameters again. If the new parameters fit the new dataset better than the old ones do, keep them; otherwise keep the old parameters.
Repeat the increment-and-optimize steps until n equals the total number of reviews. To speed this up, you can increment n by 10 or 50 instead of 1.
I want you to do it twice: once with a fixed seed and once with n used as the seed. Then we’ll see whether there is a difference.
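
In case it helps, here is a rough Python sketch of the loop I have in mind. It is only an outline under assumptions: `toy_optimize` and `toy_loss` are made-up stand-ins (a seeded random search and a mean squared error over plain numbers), not the real optimizer or its metric, and the "reviews" are random floats. Swap in the actual optimizer and whatever measure of fit you use.

```python
import random
from typing import Callable, List, Sequence, Tuple

Review = float   # hypothetical stand-in for a real review record
Params = float   # hypothetical stand-in for a real parameter vector


def toy_loss(params: Params, reviews: Sequence[Review]) -> float:
    """Mean squared error of a one-number 'model' (placeholder metric)."""
    return sum((r - params) ** 2 for r in reviews) / len(reviews)


def toy_optimize(reviews: Sequence[Review], seed: int) -> Params:
    """Seeded random search (placeholder for the real optimizer)."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(20)]
    return min(candidates, key=lambda p: toy_loss(p, reviews))


def run_experiment(
    reviews: Sequence[Review],
    optimize: Callable[[Sequence[Review], int], Params],
    loss: Callable[[Params, Sequence[Review]], float],
    n_start: int,
    step: int,
    seed_for_n: Callable[[int], int],
) -> List[Tuple[int, Params]]:
    """Grow the dataset from n_start to all reviews, keeping the better parameters."""
    n = n_start
    best = optimize(reviews[:n], seed_for_n(n))
    history = [(n, best)]
    while n < len(reviews):
        n = min(n + step, len(reviews))          # increment n, capped at the total
        subset = reviews[:n]
        candidate = optimize(subset, seed_for_n(n))
        # Keep the new parameters only if they fit the new dataset better.
        if loss(candidate, subset) < loss(best, subset):
            best = candidate
        history.append((n, best))
    return history


# Toy data: 300 fake "reviews".
data_rng = random.Random(0)
reviews = [data_rng.random() for _ in range(300)]

# Run 1: fixed seed. Run 2: n used as the seed.
fixed = run_experiment(reviews, toy_optimize, toy_loss, 100, 50, lambda n: 42)
varied = run_experiment(reviews, toy_optimize, toy_loss, 100, 50, lambda n: n)

for (n, p_fixed), (_, p_varied) in zip(fixed, varied):
    print(n, round(p_fixed, 4), round(p_varied, 4))
```

The only thing that differs between the two runs is the `seed_for_n` argument, so any difference in the kept parameters comes from the seeding alone.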