Bug with importer for files containing quotes

If the importer guesses the wrong delimiter and the file contains quotes, as in this .txt file that uses commas as the delimiter:

Row1,
Row2,"|""|"
Row3,"|""|"
...

The “new” importer will merge Row2 and Row3 into a single row that includes extra columns.
This means the entire file ends up being displayed in the sample.
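
To illustrate the row merging outside of Anki, here is a minimal sketch using Python's standard `csv` module. It is not Anki's importer, whose parser may behave differently in detail, and the pipe (`|`) as the wrongly guessed delimiter is an assumption on my part. The helper name `parse` is made up for the example.

```python
import csv
import io

# The sample file from above: commas are the intended delimiter.
DATA = 'Row1,\nRow2,"|""|"\nRow3,"|""|"\n'

def parse(text, delimiter):
    """Parse text with the given delimiter and return the list of rows."""
    # newline="" lets the csv module see the original line endings.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

# Intended delimiter: each line becomes its own two-column row.
correct_rows = parse(DATA, ",")

# Assumed wrong guess: with "|" as the delimiter, a quote that directly
# follows a "|" opens a quoted field, which swallows the line break,
# so Row2 and Row3 end up in the same record.
wrong_rows = parse(DATA, "|")

print(len(correct_rows), "rows with ','")
print(len(wrong_rows), "rows with '|'")
for row in wrong_rows:
    print(row)
```

With the comma, the three lines stay separate. With the pipe, the quoted fields run across the line breaks, which matches the merged row and extra columns described above.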

For a large enough file this results in the importer loading forever (the loading time grows exponentially with the file size).

The legacy importer handled this fine, but as it can no longer be used in the new version, users are forced to deal with file headers or avoid using other delimiters.

Are you able to provide an example file that fails to load in a reasonable time?

If you don’t want to use file headers, I suggest you pick a different delimiter, or quote all fields. Commas often appear in human text, so other characters will be preferentially chosen as the delimiter.
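
As a rough sketch of both options, the following Python snippet writes a file with every field quoted and, optionally, a separator header as the first line. It assumes the `#separator:Comma` file header described in the Anki manual; the function name `write_import_file` is hypothetical.

```python
import csv
import io

def write_import_file(rows, out, with_header=True):
    """Write rows as comma-separated text with every field quoted."""
    if with_header:
        # Assumption: Anki reads this file header and skips delimiter guessing.
        out.write("#separator:Comma\n")
    writer = csv.writer(
        out,
        delimiter=",",
        quoting=csv.QUOTE_ALL,  # quote every field, including empty ones
        lineterminator="\n",
    )
    writer.writerows(rows)

buf = io.StringIO()
write_import_file([["Row1", ""], ["Row2", '|"|'], ["Row3", '|"|']], buf)
text = buf.getvalue()
print(text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` in place of the `StringIO` buffer.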

Sure, here is an example which takes 25 seconds to load on my PC.

I’ll be using file headers from now on, though, so that I don’t have to keep switching delimiters in the file reader I use for datasets.

Thanks for that. I’ve logged the issue on GitHub (Bug with importer for files containing quotes, Issue #3588, ankitects/anki) so it doesn’t get lost. If you could leave that link active until someone has a chance to investigate, that would be appreciated.
