Update deck via extended .csv with new synonyms, avoiding old duplicates

I have a .csv file containing my vocabulary list that I import into an Anki deck. The .csv file has the following formatting for each row:

back_korean,front_english

For example:
어휘,vocabulary

I keep studying and expanding my vocabulary list, thus, my .csv file grows. After I expand the .csv file, I want to reimport it to my Anki deck.

I want to update the existing notes when the first field matches, however, I want to accept new synonyms.

For example, if my csv file now has fields:
어휘,vocabulary
단어,vocabulary

I would like both cards to be present in the set. Anki needs to discard re-importing the first one, since it already was added, and include the second one, as it is new.

I don’t see how I can instruct Anki to do this, I would think some sort of third field would be needed for the import, for example, counting the row of the line in the .csv file, so that it can refer to that ID and decide whether an already existing note needs to be updated or whether a new note needs to be created.

How can I achieve this? Is there any plugin for this?

Instead of entering your new vocabulary in a csv and re-importing, have you considered just adding your new words directly as notes in Anki? Saves you a step.

If you’re stuck with the csv, there’s lots of ways to approach this –

  • Use Anki’s importing system to avoid making duplicate cards – Text Files - Anki Manual
  • Make a new csv after each import “batch,” so you won’t be re-importing duplicates
  • Use export (Exporting - Anki Manual) to create the csv that you then add additional vocab to, so that when you re-import Anki will recognize the unique IDs of existing notes
2 Likes

I edit a single Excel file, which automatically generates a .csv file upon saving it (I have automated this process). Entering the values in Anki manually is much slower than editing the Excel file, from which I can also produce e.g. a PDF etc.

With the aim of automating this process as much as possible, re-importing the .csv in batches is a quicker way of getting my deck updated.

Imagine this scenario:

In batch 1, there is this pair on the CSV file:
어휘,vocabulary

In batch 2, I added a synonym of the word “vocabulary” in Korean, so that I now have 2 words that mean “vocabulary” in English:
어휘,vocabulary
단어,vocabulary

If I now upload the .csv file to Anki, it will update the “old” vocabulary note with the “new” one, resulting in only this ending up in Anki:
단어,vocabulary

Naturally, I want both words to be present. However, Anki has no way of telling whether it should replace the “old” one by the “new” one or import the “new” one and keep the “old” one: the “update existing notes when first field matches” option is forcing it to update the old note.

This is the problem: on the one hand, I want Anki to not re-import old notes from the CSV that have already been imported before (by checking whether they already existed, and if so, by updating them with the new field on the CSV, as I could have updated a note because of a faulty translation or an enriched addition), on the other hand, I want it to add new synonyms that I might have included.

The only way I have come up with so far consists of appending an ID to the CSV file every time a synonym is included:
어휘,vocabulary -1-
단어,vocabulary -2-

This is the way of getting a unique identifier for each note. It’s not a really elegant solution, but I don’t see any other possibility if Anki only allows its import to be customized based on the matches of the first field.

1 Like

If you are using the non-unique English word as your first field, the solution seems to be – use the unique Korean word as your first field instead.

Or (as I already suggested), use the unique IDs that Anki generates.

1 Like

:+1: Best and easiest solution, imo.
Cards should not be affected, they still refer to the same field names.

If you cannot do this, because the Korean field may also have duplicates, consider redesigning your note-type.

My language notes have a dedicated field to hold synonymes. This goes against the advice of keeping cards atomic, small and simple. But I find it easier to study languages this way, seeing synonyms on the same card. (I have a field for antonyms, too). There will be redundancy either way, so I decided to take this route.

2 Likes

@Danika_Dakika Exporting a CSV from Anki and then having to edit it manually adds a lot of overhead.

@BookBorsh As I want to keep cards atomic, small, and simple, adding fields for synonyms doesn’t work for me.

The “best compromise” I have found for dealing with this is adding an ID to each column of the CSV file based on the number of encounters along the column:
어휘,vocabulary -1-
단어,vocabulary -2-

The -#id- tag is, thus, added to both the English words and the Korean words. By doing this, I avoid having any repeated fields.

However, this leaves one case uncovered: if I delete an entry in the CSV file, it is still kept in Anki… There is no workaround other than manually deleting it with my method.

TBH, I would wish Anki to be able to ingest a ID in the CSV file for each entry, so that it is formatted in this way:
word,translation,unique_ID

That would solve all problems at once.

That’s exactly what Danika_Dakika suggested to you, right in the first answer to your question. And even directed you to the help topic that covers it.

Assuming you let Anki handle the unique ID, here is what you do:
You enable the GUID column, then open the file in Excel. I edited Korean 1 and I added Korean 3a:

Instead of deleting rows in Excel, you might want to add a column that holds a tag. Then in Anki, you select all imported notes with that tag and delete them.
On your next export they will be gone from your Excel file, too.

2 Likes

@BookBorsh thanks for your thorough reply!

I just updated Anki to the latest version, I was not doing that before because of the latest version being incompatible with the Card Retirement add-on.

With the latest version, I am able to export my Anki deck with the GUID field enabled. I then import it to Excel with the following menu:

Note that I don’t simply drag and drop the file into Excel because the Korean words in Hangul lose their formatting that way.

When uploading the file, I cannot select any import options for choosing a value of header lines == 3, though:

I have tried to also select “Transform Data”, which allows me to, at most, select the first row as the header:

In any case, I can simply delete the first three rows from the final table in a manual fashion.

This is indeed a very elegant way of working, I am going to explore it more. I guess I would have to generate myself a random GUID that hasn’t been before used in the table for any new rows that I further add to the table in Excel, before uploading it to Anki?

I have no Excel at hand. Your table looks good to me, though, you have the same 3 header rows I get in my spreadsheet program.

Not a good idea. Do yourself a favor and READ the help topic that Danika_Dakika linked you to. You need the #header rows on your way back to Anki.

Nope. Study my screenshot and realize that Anki generates the IDs. You add new rows with the GUID field left empty.

Like I said, my example assumes that you let Anki handle the IDs. If you are not comfortable with it, then follow the other solution given to you by Danika_Dakika.

2 Likes

I have been playing with the exporting/importing option and have made it work. It looks to me like the most elegant solution, thanks @Danika_Dakika and @BookBorsh .

To make it perfect, I would love to be able to write a Python program that, whenever sees that the TXT file gets updated, automatically imports it into the Anki deck, runs the deletion of the notes tagged for deletion, exports it from Anki into my PC and replaces the old TXT file with it. I will explore this possibility, maybe with something like the Anki CSV Importer

1 Like

I have been using the exporting/importing of the TXT file with the GUID generated by Anki and I am very happy with it.

However, today I am facing an odd scenario: I am trying to include a new note in my Excel table. Thus, I leave the “GUID” column blank, as Anki will fill it up for me. The note has a translation that matches another note (as they are synonyms):

GUID_1 \t Korean_term_1 \t Translation_1
\t Korean_term_2 \t Translation_1

For whatever reason, instead of importing the second note, assigning it a unique GUID, and treating it as a different note, Anki tells me that “1 note was used to update existing ones”, and, thus, it updates the old note, keeping only Korean_term_2, which overwrites Korean_term_1.

Why is Anki not taking into account the GUID field anymore? @BookBorsh

The TXT file I import into Anki (which is the one I get after exporting, just with the added note) looks like this (\t being a tab space, used as the separator, and the dots marking the long list of other terms not relevant to this example):

#separator:tab			
#html:false			
#guid column:1			
#tags column:4
.
.
GUID_1 \t Korean_term_1 \t Translation_1
.
.
.
\t Korean_term_2 \t Translation_1		

I am not quite getting what I am doing wrong here?

Trying to debug the error I have just found, which I have mentioned in my previous comment, I have been playing around with a simple test deck, which starts empty:

#separator:tab
#html:false
#guid column:1
#tags column:1

I then extend the TXT:

#separator:tab
#html:false
#guid column:1
#tags column:1
\t key1 \t val1 \t
\t key2 \t val2 \t

I import that into Anki:

I export it again, its associated TXT now looks like this:

#separator:tab
#html:false
#guid column:1
#tags column:3
b_6Wi?(RWH	key1	val1	
B^!`PV?x<R	key2	val2	

Now, I extend the TXT with another key-value pair, this time, we are re-using an existing key:

#separator:tab
#html:false
#guid column:1
#tags column:3
b_6Wi?(RWH	key1	val1	
B^!`PV?x<R	key2	val2	
\t key1 \t val3 \t

I import it into Anki again:

And here comes the problem:

Anki does not detect that the new line, for which a GUID field was missing, is a new note: instead of assigning a new note to it, it uses key1 for comparing the notes; since key1 has already an existing note, it simply updates it

This is what the deck looks like when exporting it now:

#separator:tab
#html:false
#guid column:1
#tags column:3
b_6Wi?(RWH	key1	val3	
B^!`PV?x<R	key2	val2	

Thus, it seems like this guidance does not seem to work for the case of synonyms, @BookBorsh:

How can I make it work, so that I can manually extend the TXT file and import it into Anki, with Anki assigning a new GUID to it, also for the case of synonyms?

The only workaround I can find to the problem I have mentioned in my previous comments is to manage the GUID field by myself, that is, by including a unique random sequence of e.g. 10 chars in the GUID field for every new word that I add to the TXT.

I’ve created quite a few decks and earlier on I learnt that having a unique identifier for each Note was a requirement.

I use a spreadsheet (LibreOffice Calc) and manage the unique ID (I call it NoteID, but you can call it “Harry”, if you wish). Each sheet in the spreadsheet starts with a unique ID in the first field and I just drag that field down so that each subsequent row of data has an ID that is one greater than the ID on the previous line.

I have a summary sheet that tracks the first and last IDs on each sheet (tab) to ensure that there is no overlap. The summary sheet also suggests the next unique ID I may use for any new tab (for a new deck) that I might use.

Set up something similar and you will find it really easy!

I am not an expert on all things Anki/Excel. I can only tell you what works for me. Like you, I continue using spreadsheets alongside Anki. Planmaker instead of Excel and Akelpad to view/edit plain text files.

The one thing that strikes me as odd, is your \t notation. None of my files has it. Anki does not have it either, on your screenshots, when you export from Anki.

How did you arrive at the \t in your txt files? Did Excel insert them?

I like your approach. Very creative. :+1:
Would not work for me, because I enter new notes on both ends, directly in Anki as well as in my spreadsheet editor. But this might indeed be very good advice for @jae-joong who wanted to keep track of IDs outside of Anki all along.

I did not add any tab characters. You must have selected a form of text file in your export of the data from your spreadsheet (or you saved it as such). You should choose CSV, so that commas are inserted rather than a tab character (‘\t’).

Note, Microsoft Excel is not always correct in its CSV file creation. LibreOffice Calc has been proven to be superior in this area. (And it is free!) Just a warning.

See how you go. I used to add some cards from my smart phone as well, but it soon became clear to me that I preferred them managed. Now all my new cards are created from a spreadsheet. Including any updates or corrections. In that way, I have decided that my spreadsheet is the master copy, and the master copy is only ever imported into the desktop version of Anki.

Yep, I ended up doing something similar, I however manage it with a Python script: whenever I save the Excel spreadsheet, a script is triggered that converts it to a TXT file. It then generates a random 10-character alphanumeric+char string and makes sure it did not get used previously for any note. It then assigns it to the new note. I then have a feedback loop so that the Excel file gets updated with this information. I found this to be the way to handle GUIDs for synonym with Anki.

Roger that, I was just double-checking in case I had missed something. The docs don’t go into much detail concerning the GUID ingestion, so I was left wondering, did I miss some step?

Concerning the \t: I am exporting the Excel table to TXT using the tab space as a separator (sep='\t'). This is a premeditated choice, as I have certain translations for Korean words that have a longer explanation that includes commas, so I am avoiding commas on purpose as a separator.

Naturally, the \t are not printed out in the TXT file, there is just simply a tab space in its place. I simply wrote it myself to highlight that there is a tab space there, not a regular space.

This is how the actual TXT file looks like for the example I mentioned above:

#separator:tab
#html:false
#guid column:1
#tags column:3
b_6Wi?(RWH	key1	val1	
B^!`PV?x<R	key2	val2	
	 key1 	 val3 	

As mentioned in my previosu comments, the use of a tab space for separating columns is premeditated, I am avoiding commas on purpose.

I fully agree with your way of working, I do exactly the same: notes are added on the master file, which is handled in Excel. The file gets then ingested into Anki Desktop, and from there, it syncs to the Anki servers.

You do know the the CSV protocol insists that fields containing commas be protected (usually with double quotation marks)? Your software should manage that upon export or save.