Import Problems / Suggestions

Hi,

For my studies I built a tool to write my flashcards in plain text files (Markdown-like, Asciidoctor), so they can be part of my learning scripts, kept in order, and managed by git.
I then wrote a script (asciidoctor extension -> https://github.com/Leon0402/asciidoctor-flashcard) so all flashcards would be extracted to a format Anki understands.

While it was kinda usable, I noticed a bunch of limitations in what is possible, and it would be great if they could be improved.

Perhaps first the requirements I had (so my issues are easier to understand):

  • One file (or one Asciidoctor document, which could consist of several files joined by include statements) should correspond to one (Sub)Deck
  • All capabilities of asciidoctor such as images, tables, lists should be supported (Asciidoctor is compiled to html)
  • I want to be able to change the deck later. Therefore it must be possible to edit a flashcard in my asciidoctor file and anki must recognize that it is still the same card. It must also be possible to add or remove cards
  • It should be rather simple. So easy importing, no manual changing of anything …

My general idea to fit these requirements is to add an id to every flashcard, so anki knows when a flashcard changes. And it’s all html, which anki is able to render.

Major Limitations:

  • Anki wants my IDs to be unique in the whole collection rather than in the (Sub)Deck. It’s quite difficult to keep track of IDs I have already used, and the IDE cannot help by suggesting IDs automatically because it has no way of knowing which IDs are already taken. Furthermore, with deck sharing there is a high risk that cards are overwritten
    -> Proposal: Flashcard IDs only have to be unique per (Sub)Deck. So Anki looks for a matching (Sub)Deck + ID to determine whether the flashcard is new or not
  • There is no way to delete a card except manually doing it in Anki
    -> Proposal: Add an option to say “This import file is my complete deck”, delete cards which are missing
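
The “complete deck” proposal boils down to a set difference between the IDs already in the deck and the IDs in the import file. A minimal sketch, assuming every card carries a stable string ID; `plan_sync` and `SyncPlan` are hypothetical names for illustration, not part of Anki:

```python
from typing import Iterable, NamedTuple, Set


class SyncPlan(NamedTuple):
    to_add: Set[str]     # IDs only present in the import file
    to_update: Set[str]  # IDs present in both places
    to_delete: Set[str]  # IDs only present in the deck


def plan_sync(existing_ids: Iterable[str],
              imported_ids: Iterable[str],
              file_is_complete_deck: bool = False) -> SyncPlan:
    """Decide what an import would add, update, and (optionally) delete."""
    existing = set(existing_ids)
    imported = set(imported_ids)
    # Cards are only deleted when the user explicitly declares that the
    # import file represents the whole deck.
    to_delete = existing - imported if file_is_complete_deck else set()
    return SyncPlan(to_add=imported - existing,
                    to_update=imported & existing,
                    to_delete=to_delete)


plan = plan_sync({"a1", "a2", "a3"}, {"a2", "a3", "a4"},
                 file_is_complete_deck=True)
```

Keeping deletion behind an explicit flag means a partial import can never remove cards by accident.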

Minor Limitations / issues:

  • Media files are not automatically imported
    -> Proposal: Search for included media (Anki already has this functionality), look up the images based on the path provided, and import them automatically (display an error message if a file is not found)
  • The import process is quite manual: you need to select the correct deck and options, and it’s super easy to overwrite a different (Sub)Deck
    -> Proposal: Allow meta information in the first line of the .txt file such as the deck name, note type (Perhaps with the option to specify the note type for each card individually, which would really be powerful)
  • The note type needs to be created manually, including (!!!) the CSS style, so that the imported HTML actually looks alright
    -> Proposal: Add some functionality to import note types. This would allow me to export the css code and fields needed (such as my special layout with an “id” field).
  • I need to import all files manually. This is quite some work, as I made one deck for each subject and one subdeck for each topic in that subject. So there could easily be 50 files to import.
    -> Proposal: Add the possibility to import a complete folder (of .txt files) -> This only really makes sense if the “metadata” thing is added, so you can specify in each file where it should be imported.
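
The media proposal can be sketched with the standard library alone: collect every `<img src>` in a card’s HTML, resolve it against the folder of the import file, and report what is missing. This is only an illustration; `find_media` is a hypothetical helper and does not reflect how Anki handles media internally:

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Tuple


class _ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_media(card_html: str, base_dir: Path) -> Tuple[List[Path], List[str]]:
    """Return (files found on disk, src values that could not be resolved)."""
    collector = _ImgCollector()
    collector.feed(card_html)
    found: List[Path] = []
    missing: List[str] = []
    for src in collector.sources:
        if "://" in src:
            continue  # remote URL, nothing to copy
        candidate = base_dir / src
        if candidate.is_file():
            found.append(candidate)
        else:
            missing.append(src)
    return found, missing
```

An importer could copy everything in `found` into the media folder and show `missing` in the error message the proposal asks for.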

I know I’m asking quite a lot here :slight_smile: After working with it for 3 months, you just notice a lot of things that are not ideal yet. In general, I would also be up for providing some pull requests if there is some general agreement on what the improved import system should look like.

Can I get some opinions here? Or should I just implement it? I would rather avoid opening a PR that is then rejected because you don’t like the ideas.

Wouldn’t it be easier to use the CSV importing functionality?

Why does this have to be integrated into Anki natively? Sounds like it would work well as an add-on.

What exactly do you mean by csv importing functionality?

I actually already use txt separated by comma / tab (which kinda is csv). So I guess I’m already referring to csv above.

Why shouldn’t it?

I guess you could definitely realize some of this as an addon, like automatic media import. Let’s see:

  • Unique Ids → I guess you could possibly do this without changing Anki Core. For example by constructing the id based on the deck name (which is actually what I plan to do for now, but didn’t think about back then) → But imo handling uniqueness the way I suggested is more useful in general, so I see no particular reason to block a PR here. But it’s not as important as I originally stated anymore.

  • Delete Option → This is imo something for core Anki, it directly targets the import functionality of anki. Making it a plugin (if possible at all) doesn’t make sense here imo

  • Automatic import of media files → You could do this in a plugin. But as most of this is already in anki core (mainly searching for images in cards), it would just make sense to implement the rest as well in anki core

  • Txt Import format enhancements → Much easier to implement in Anki core and a huge improvement to the import functionality. You could do this as a plugin, but then all the import functionality should really be a plugin

  • Note Type import → Perhaps the best candidate for a plugin, as this sounds a little bit niche indeed.

  • Import folder → Seems like a pretty basic operation, see no reason to not do in Anki
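
The workaround mentioned for unique IDs, building the ID from the deck name, could look roughly like this. It is a sketch under assumptions: `collection_wide_id` is a hypothetical helper, and hashing is just one way to combine the two parts into a fixed-length key:

```python
import hashlib


def collection_wide_id(deck_name: str, local_id: str) -> str:
    """Combine a deck name and a per-deck ID into one collection-wide ID."""
    # A separator that cannot appear in normal text keeps
    # ("A", "B1") and ("AB", "1") from producing the same key.
    key = f"{deck_name}\x1f{local_id}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:16]


cells_id = collection_wide_id("Biology::Cells", "card-1")
genes_id = collection_wide_id("Biology::Genetics", "card-1")
```

The same local ID can then be reused in every (Sub)Deck, at the cost that renaming a deck changes all of its card IDs.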

In the end, it probably depends on what rules you have for what should be core and what should be a plugin. In my opinion most should be in core, some don’t have to be. But I would still implement all of these in core as importing overall just seems like something that belongs in core.

What exactly makes you think that these shouldn’t be in core? What in particular don’t you see in core?

The vast majority of users create their decks inside Anki or import premade decks. The minority who goes the extra mile to set up external tools for the creation of Anki cards can probably be bothered to install an add-on as well.
I’m not saying this shouldn’t be part of Anki, just that it might be better suited as an add-on since the number of users who would benefit from it is small compared to the cost, i.e. maintenance efforts and additional complexity for other users.

Yes, it’s hard to lock down a mechanism for totally open-ended imports. That’s why the CSV is there for something like that.

Thank you for the feedback Leon, I will bear your comments in mind when I update the importing code in the following months.

I’m basically agreeing with you here. Most users don’t need this import and they can be bothered with installing an extension. Although I would then say: Make the whole csv import a plugin. Or in general make all “additional import formats” except the basic (anki deck) a plugin and just make sure there is an API to actually realize such kind of plugins rather easily.
But as parts of it are already in anki core, I personally would prefer improving that. But obviously we could also move that to an extension. I just personally wouldn’t prefer it if functionality here is split up.

@dae Thanks for your answer! Do you already have plans to improve the import functionality? Is there a way to help you?
Not that I have too much free time, but perhaps some small contributions would already help :slight_smile:

Ah I actually found the GitHub Issue :wink:

Perhaps we could also think about what the end goal is i.e. standardize a new improved csv format, then it will be much easier for you and other people from the community to implement it, if there is some time.

My general idea here is: Add some header, where you can specify all the things, you would otherwise manually do in the import dialog (deck, note type, allowHtml …) → Although that wouldn’t be valid csv then anymore
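
To make the header idea concrete, here is a minimal sketch of how such a file could be split into metadata and rows. The `#key:value` syntax and the key names are assumptions made up for this illustration, not an existing Anki format:

```python
from typing import Dict, List, Tuple


def split_header(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Split leading '#key:value' lines from the remaining data lines."""
    meta: Dict[str, str] = {}
    lines = text.splitlines()
    index = 0
    while (index < len(lines)
           and lines[index].startswith("#")
           and ":" in lines[index]):
        # Split on the first colon only, so values such as
        # "Biology::Cells" stay intact.
        key, _, value = lines[index][1:].partition(":")
        meta[key.strip().lower()] = value.strip()
        index += 1
    return meta, lines[index:]


sample = (
    "#deck:Biology::Cells\n"
    "#notetype:Basic with id\n"
    "#html:true\n"
    "cell-1\tWhat is a cell?\tThe basic unit of life\n"
)
meta, rows = split_header(sample)
```

Everything after the header is still ordinary tab-separated text, so only the first few lines deviate from plain csv.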

It would be good to have a new CSV import system that creates the deck and populates the entries from a single CSV file with no need to create the deck to start with.
The header row of the CSV file could hold the names of the fields, assuming standard names, and the CSV file name itself could be the deck name.
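
A rough sketch of that convention, using Python’s `csv` module: the file name (without extension) becomes the deck name and the first row names the fields. `read_deck` is a hypothetical helper for illustration only:

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, Tuple


def read_deck(csv_text: str, file_name: str) -> Tuple[str, List[str], List[Dict[str, str]]]:
    """Derive deck name, field names, and notes from a single CSV file."""
    deck_name = Path(file_name).stem  # "Biology.csv" -> "Biology"
    reader = csv.DictReader(io.StringIO(csv_text))
    notes = [dict(row) for row in reader]
    fields = list(reader.fieldnames or [])
    return deck_name, fields, notes


deck, fields, notes = read_deck(
    "Front,Back\nWhat is a cell?,The basic unit of life\n",
    "Biology.csv",
)
```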

It’s a little bit more tricky to define the card type, Basic or Cloze and so on. Maybe that could be part of the import dialog. There’s also a need to pre-filter the incoming entries: let’s say the Cloze cards do not have any cloze deletions or they are malformed. It’s better to be told at import time than to open the deck and notice issues during training.
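
Such a pre-filter could be as small as the following sketch, which checks each row for at least one non-empty `{{c1::…}}`-style deletion before anything is imported. `rows_without_cloze` is a hypothetical helper, and the check is deliberately simplistic:

```python
import re
from typing import List, Sequence

# Matches a non-empty cloze deletion such as {{c1::mitochondrion}}.
CLOZE_PATTERN = re.compile(r"\{\{c\d+::.+?\}\}", re.DOTALL)


def rows_without_cloze(rows: Sequence[Sequence[str]], text_field: int = 0) -> List[int]:
    """Return the 1-based numbers of rows whose text has no cloze deletion."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if not CLOZE_PATTERN.search(row[text_field])
    ]


rows = [
    ["The {{c1::mitochondrion}} produces ATP."],
    ["No deletion here."],
    ["{{c1::}} is empty."],
]
bad_rows = rows_without_cloze(rows)
```

The returned row numbers can be shown in the import dialog so the source file can be fixed before the cards ever reach a review session.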

Anyway, just a thought. I’m sure someone wants custom decks/cards, then it’s getting complicated. In most cases it’s better to stick to a template or known format than open up the can-of-worms.

I think, if we define a new format, it should be kinda flexible. As you said when sticking with (standard) csv it can get quite tricky.

Perhaps the most flexible way of using csv is embedded metadata: Model for Tabular Data and Metadata on the Web. But honestly, I’m not sure how much of a standard that really is. It doesn’t seem to be very widespread.

So presumably using a different file format like yaml, toml, json would give the desired flexibility. Especially the first two would allow pretty much anything I guess, so it might be worth consideration.

The last alternative would be to define a custom file type. But imo it’s much better to use some existing standard (will also be easier to implement obviously)

JSON (and yaml, toml) is more flexible, but could be horrifying for non-programmers to set up (like a stray trailing comma in JSON). CSV at least is easy to manipulate in a spreadsheet, export as CSV, and then import.

I honestly wouldn’t go with json, I just mentioned it for completeness :slight_smile: My favourite currently would be yaml.

While yaml can get quite complex (like with anchors and stuff), it doesn’t have to be. I’m a developer and biased, but I would say the yaml syntax is at least as readable as csv, if not more.
But you’ve got a point. Obviously you can’t use it with spreadsheet programs, and it’s probably not suitable for non-programmers.
On the other hand: Who is the target group of this feature anyway?

  • Everyone for exporting & importing → As long as we use csv that probably won’t be the case, as you can’t put all the information in csv. With yaml this might become an alternative to the sql export. But I guess for just exporting & importing you don’t need to understand the format anyway (same as for sql).
  • Third Party stuff like my application → Should be no problem for these
  • For editing outside of Anki → Definitely a use case. With a new format this might even get easier (like using asciidoctor for instance). Using Spreadsheets is not possible anymore with that format… but was that really a thing in the first place? I might be wrong here, but I doubt anyone did that.

Anyway, introducing a new format doesn’t mean we have to remove the old one. I think it’s a good idea to define one format that can do everything (and is also future proof). Then there can be extensions which really just translate into that format.

For instance:
Say we use yaml, which offers everything we could possibly need. It should be rather straightforward to write an Anki extension which just converts csv to yaml. Same with other formats.
In the end this allows basically a good decoupling of anki. You can just always translate into the powerful format, instead of trying to build against the api of anki directly.
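
A converter of that kind could be sketched as follows. Python has no YAML writer in its standard library, so this example serializes with `json` instead; since YAML 1.2 was designed as a superset of JSON, the output should also be readable by a YAML 1.2 parser. The document layout (`deck`, `notetype`, `notes`) is an assumption for illustration, not a proposed standard:

```python
import csv
import io
import json
from typing import Any, Dict


def csv_to_document(csv_text: str, deck: str, note_type: str) -> Dict[str, Any]:
    """Translate plain csv (with a header row) into one structured document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        "deck": deck,
        "notetype": note_type,
        "notes": [dict(row) for row in reader],
    }


document = csv_to_document(
    "Front,Back\nWhat is a cell?,The basic unit of life\n",
    deck="Biology::Cells",
    note_type="Basic",
)
serialized = json.dumps(document, indent=2, ensure_ascii=False)
```

The importer itself would then only ever need to understand the structured document, while csv, Asciidoctor, or anything else is handled by a small translator like this one.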

I like yaml, too. But I suspect the target audience is not programmers, nor those who would figure out how to move something to yaml. My guess is that they are Anki users who want to make their own custom content or move content from some other source, and for whom it’s easy to put a table or otherwise structured content into a spreadsheet. From there they can export it as CSV, which is really one of the few easy formats to use right now. For example, if they use Google Sheets, then this is doable.