While reading the benchmark page on @Expertium’s blog, I noticed the following section:
Consider the content of the cards: text, sound, and images. It would require adding another machine learning algorithm (or even several algorithms) just for text/audio/image recognition, and we wouldn’t be able to train it since Dae (the main Anki dev) can’t give us a dataset that has all of the content of cards. That is against Anki’s privacy policy, only scheduling data is available publicly.
It’s clear Damien won’t be able to provide a dataset with card content included, for obvious reasons, but what if users were asked to send it voluntarily themselves? Could that help gather the critical mass of data needed for training?
Where will the data be sent? Or are you saying AnkiWeb should have an opt-in for sharing card data? Don’t you think it’s a privacy issue if someone unknowingly opts into that?
To any server chosen by Expertium/L.M.Sherlock/dae. As to how, either through a form or through an option inside AnkiWeb/Anki/fsrs4anki helper.
In that case it would be off by default, and a warning would be displayed to make sure the user is fully aware before proceeding. If that’s not enough, users could be required to check consent boxes as well.
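A minimal sketch of the opt-in flow described above, assuming three separate consent steps (the option is switched on, the warning is acknowledged, a consent box is checked). The names `SharingConsent` and `cards_to_upload` are hypothetical and not part of Anki, AnkiWeb, or the fsrs4anki helper; no real upload is performed.

```python
from dataclasses import dataclass


@dataclass
class SharingConsent:
    """Hypothetical consent state; every field defaults to 'not agreed'."""
    enabled: bool = False               # sharing is off by default
    warning_acknowledged: bool = False  # user has read and dismissed the warning
    content_box_checked: bool = False   # user ticked the card-content consent box

    def may_upload(self) -> bool:
        # Uploading is allowed only when every explicit step was taken.
        return (
            self.enabled
            and self.warning_acknowledged
            and self.content_box_checked
        )


def cards_to_upload(cards: list[dict], consent: SharingConsent) -> list[dict]:
    """Return the cards that may leave the device; empty unless consent is complete."""
    if not consent.may_upload():
        return []
    return list(cards)
```

With this shape, a user who merely flips the option on but never confirms the warning or the consent box still shares nothing.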
@dae what do you think? If this is feasible I’m more than happy to offer my help. Take your time if you need to research the matter before responding, just wanted to make sure this thread didn’t get lost in your notifications.
I’m skeptical it will yield useful results, and I don’t have the time to update AnkiWeb to handle this. As users would have to opt in anyway, I suggest you implement it as an add-on that uploads the data somewhere instead.
Thank you for your feedback, dae, it’s appreciated. I’m about as skeptical, but I think it’s worth a try. If @L.M.Sherlock and @Expertium have no provider or plan of their own, I’ll look for appropriate providers to handle the data and then work on writing the add-on.
Yes, GDPR really does require that something like this is not enabled by default. As someone who adds/mines content from private conversations, I would find it quite problematic if those were exported by default.
Do you mean that, through text analysis, it would take into account the interference that the text of some cards could have on the retention of other cards? I suppose this would not work with various languages (like German). @L.M.Sherlock, you might want to see this.