Advanced Spanish Words deck

SurpriseDog · June 8, 2024, 12:25am

This is the forum thread for the Advanced Spanish Words deck.

I started this deck after doing one of the top 5k Spanish decks and finding they were missing a lot of words, so I’ve been adding every advanced word I come across watching YouTube and other media.

Originally it was intended for personal use before I decided to share it, so it may be missing a few words that I already knew before starting Anki.

This is a work in progress and suggestions are welcome. Also if you have any free to use pictures or sentences for the slang deck, share them here.

suiyuan · June 8, 2024, 10:25am

Great job! Congratulations! That’s really a lot of work.

A couple of questions:

How did you get the ChatGPT examples? Did you use a specific add-on, or the API?
Did you happen to share your FPM program somewhere?

Thanks for sharing!

SurpriseDog · June 8, 2024, 2:32pm

This is the prompt I use:

I am going to give you a word in spanish and I would like you to define the word and make a series of example sentences in spanish with their english translation.

The format of each example sentence should be: spanish sentence ; english translation.

Please always use this format!

The sentences should be short and simple!

Use bullet points for each sentence. Here is the first word: indicado

===

Usually it will screw up the first time and you have to argue with it to make it follow instructions. Then once it settles into a good pattern, you can just keep supplying fresh words and it will keep generating what you see in the cards.
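Since the prompt asks for a fixed "spanish sentence ; english translation" bullet format, a reply can be split into pairs mechanically, and lines that ignore the format can be dropped. A minimal sketch, assuming the reply is pasted in as plain text; `parse_examples` and the sample reply are made up for illustration and are not part of the deck or of any add-on:

```python
import re

def parse_examples(reply: str) -> list[tuple[str, str]]:
    """Return (spanish, english) pairs from lines shaped like
    '- spanish sentence ; english translation'."""
    pairs = []
    for line in reply.splitlines():
        # Drop a leading bullet character (-, * or •) and surrounding spaces.
        text = re.sub(r"^\s*[-*•]\s*", "", line).strip()
        if ";" not in text:
            continue  # definition lines or chatter that ignore the format
        spanish, english = (part.strip() for part in text.split(";", 1))
        if spanish and english:
            pairs.append((spanish, english))
    return pairs

# Made-up reply, only to exercise the parser.
sample = """Indicado: adecuado o señalado.
- Este es el lugar indicado ; This is the right place.
• Tomé el medicamento indicado ; I took the indicated medicine.
- línea rota sin traducción"""

for es, en in parse_examples(sample):
    print(f"{es} -> {en}")
```

The pairs still need a human read-through; this only checks the shape of each line, not whether the sentence is any good.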

I’ll post a link to wordtree.py on GitHub once I get it uploaded.

suiyuan · June 8, 2024, 5:52pm

Cool, thanks a lot!

As a suggestion for the deck, I think it would be good to have a different field for examples (and images, I guess, even POS when present). That way they can be easily styled differently if needed.

Regarding ChatGPT, so do you do that manually and then copy each definition and example from the chat to each note? That’s a lot of work! ¡¡Ánimo!!

SurpriseDog · June 8, 2024, 10:46pm

Yeah, I thought about automating the sentence thing, but ChatGPT screws up frequently. More importantly, I’m also learning the cards as I create them, so I like to look at each one and read a few sentences for my own understanding.

My current workflow is to check each candidate word for frequency (and sometimes Google Trends, Quora hits…), give the word to ChatGPT, look for a free image and then copy-paste it all together. By far, finding relevant free images takes the most time, so that’s where I’m asking if anyone wants to contribute.

suiyuan · June 9, 2024, 10:24am

Makes sense, I’m sure the whole process will help you remember everything better.

Regarding images, I can think of two options. Using an addon like Batch Download Pictures From Google Images and setting it up so it looks for images on Wikimedia or custom websites like Pixabay or Freepik (using “site:pixabay.com”), for example.

Also, you could use AI to generate images, which I believe are royalty free. There are many (pretty fast) options for this now 1, 2, 3, 4, 5…

SurpriseDog · June 9, 2024, 5:32pm

Here’s a link to my word fpm program on Github: GitHub - SurpriseDog/WordTree: Look up any word and determine its FPM (Frequency Per Million) including ALL of it's conjugations.

I’ve fixed some bugs to get this working in Windows, but it’s only been thoroughly tested in Linux, where I use it every day. If you find a Windows or Mac bug, copy-paste the command line used and the terminal output to the forum thread or raise an issue on my GitHub.

This is the command line I usually run when checking out new words for ASW: ./wordtree.py --book 'Harry Potter. La coleccion completa - J.K. Rowling.txt' --anki ~/.local/share/Anki2/User\ 1/collection.anki2 --nostars

• The book is a collection of the first 10 books of Harry Potter in .txt (must be in text format)
• --anki points at my personal Anki collection so I can make sure a word isn’t already in it. (This location is different in Windows/Mac.)
• --nostars ignores conjugations with abnormally high fpms. 9/10 times these are nouns with another meaning that has nothing to do with the original verb. Let me know if you think I should make this the default.
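For anyone unsure what an FPM figure means, here is a rough sketch of counting frequency per million over a plain-text book. It assumes FPM is simply occurrences divided by total running words, times one million; `fpm_table` is a made-up helper and is not taken from wordtree's source, which may tokenize and count differently:

```python
import re
from collections import Counter

def fpm_table(text: str) -> dict[str, float]:
    """Map each lowercase word to its frequency per million running words."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total * 1_000_000 for word, count in counts.items()}

# Tiny made-up text: 10 running words, "perro" appears twice.
sample = "El perro come. El gato duerme y el perro ladra."
table = fpm_table(sample)
print(round(table["perro"]))  # 200000
```

On a ten-word sample the numbers are absurdly high, which is why a large text (a book collection or a subtitle corpus) is needed before the figures mean anything.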

suiyuan · June 9, 2024, 5:52pm

You mean like past participles turned nouns (like hecho) and things like trabajo meaning job instead of I work?

SurpriseDog · June 9, 2024, 6:01pm

Yes, but trabajo (861 fpm) doesn’t meet the threshold to set off trabajar (234 fpm). It’s based on how much more frequent a conjugation is vs the root verb.

A better example would be privar at 0.7 fpm vs privado at 44 fpm. Privado also means private, not just the past participle of privar, so its fpm is way too high.

By eliminating the star words, I have it reporting privar at a total of 5 fpm vs 88 with all the star words.

When you run the program, this will make more sense.
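To make the idea concrete, here is a sketch of a plain ratio test using the figures quoted above. The cutoff of 10 is a guess for illustration only: under a simple ratio test, the two examples only say the real cutoff would sit somewhere between roughly 3.7 (trabajo/trabajar) and 63 (privado/privar), and wordtree's actual rule may not be a single ratio at all. `STAR_RATIO` and `is_star` are hypothetical names:

```python
STAR_RATIO = 10.0  # hypothetical cutoff, NOT wordtree's real value

def is_star(form_fpm: float, root_fpm: float, ratio: float = STAR_RATIO) -> bool:
    """Flag a conjugation whose fpm is far above the root verb's fpm."""
    if root_fpm <= 0:
        # No data for the root: any attested form counts as out of proportion.
        return form_fpm > 0
    return form_fpm / root_fpm > ratio

print(is_star(861, 234))  # trabajo vs trabajar -> False (ratio about 3.7)
print(is_star(44, 0.7))   # privado vs privar   -> True  (ratio about 63)
```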

suiyuan · June 9, 2024, 6:12pm

Got it, thanks for explaining! That makes sense. And it sounds like it should probably be the default, yes.

I’ll try to play with it when I have a bit of time.

Btw, the link to the OpenSubtitles corpus at the end of the readme seems to be broken.

SurpriseDog · June 22, 2024, 1:57pm

Will this thread get locked if no one posts every 30 days? I’ve noticed quite a few threads have a note from a bot:

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

I intended this to be an open thread to gather feedback far into the future, so I’m just wondering what I have to do to keep it open.

Danika_Dakika · June 22, 2024, 3:01pm

No, in this category posts don’t automatically lock.

SurpriseDog · June 22, 2024, 3:11pm

So threads like this one were locked manually?

https://forums.ankiweb.net/t/bihevioral-table-of-elements-body-language/32134/2

Danika_Dakika · June 22, 2024, 3:19pm

That post was created in a different (wrong) category, and inherited the settings from there. If you scroll down the category page for Shared Decks - Anki Forums, you’ll see that only a scant few posts are locked.

otherdave · July 10, 2024, 6:48pm

I LOVE THIS and thanks so much for sharing the code.

A few quick questions:

1. Do you want questions about the repo here or maybe as issues in GitHub?
2. I think --csv is broken, or I’m not using it correctly. I can post details here or GH.
3. If I wanted to point wordtree to process every word in a file, what would the command line parameters be? What do I set min/max/threshold to?

Regarding #3 - I have been wanting to write a program that would let me give it a list of conjugated and pluralized words like:

Hablo
Hablaste
Tarjetas

And get back:

Hablar
Tarjeta

That is, get the infinitives and the singular form of nouns/adjectives/etc…
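The shape of that program could look like the sketch below. It assumes a form-to-lemma table is already available from somewhere; the three-entry `LEMMAS` dictionary is a hand-written stand-in, and `to_lemmas` is a made-up name, not a wordtree function. Output is lowercased and de-duplicated:

```python
# Hypothetical form -> lemma table; a real one would come from a
# conjugation/plural database, not from three hand-typed entries.
LEMMAS = {
    "hablo": "hablar",
    "hablaste": "hablar",
    "tarjetas": "tarjeta",
}

def to_lemmas(words, table=LEMMAS):
    """Map each word to its base form, keeping first-seen order without repeats."""
    seen = set()
    result = []
    for word in words:
        key = word.strip().lower()
        lemma = table.get(key, key)  # unknown words pass through unchanged
        if lemma and lemma not in seen:
            seen.add(lemma)
            result.append(lemma)
    return result

print(to_lemmas(["Hablo", "Hablaste", "Tarjetas"]))  # ['hablar', 'tarjeta']
```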

Your code is super-close already! So I wanted to start down the road of adding this functionality.

My reason for this is that I use LingQ to read and it lets you create a list of vocab words while you go. It doesn’t do anything to try and find common words. So if I tell it that I know the word “perro”, it will show me “perros” as an unknown word. I want to run my list of unknown words through an app that will generate the infinitives and roots of words and THEN make my flashcards from there.

SurpriseDog · July 11, 2024, 4:43pm

I fixed the output csv function. Looks like a minor bug.

If you want to process a file, the command line is: wordtree.py input.txt

Just put a new word on every line of the file. So input.txt would look like:

word1
word2
word3

The --min and --max options are not required. Personally I use --min 2 when I’m processing a list of words I collected from watching TV. In my experience, it could be months from when I see a 0-2 fpm word until I see it again, so they are not very useful. I also write down English words I don’t know, and for those I usually set --min 0.2 to avoid words that are outside my bailiwick.

--threshold just controls how much text appears in the terminal and has no effect on csv output.

otherdave · July 11, 2024, 8:37pm

This is exactly what I needed, thanks so much!

First of all, my words weren’t one per line, so that was bad on my part. After that, your updated code worked perfectly.

Export my LingQs → cut out the column of just the terms / words → wordtree-to-csv → sort and hack the CSV
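The "cut out the column" step is where the one-word-per-line requirement bites. A minimal sketch of that step, assuming the export is a CSV with a header row; the column name `term` and the sample rows are guesses for illustration, not LingQ's actual export layout:

```python
import csv
import io

def terms_from_csv(csv_text: str, column: str = "term") -> list[str]:
    """Pull one column out of a CSV export, skipping blanks and repeats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    terms = []
    for row in reader:
        term = (row.get(column) or "").strip()
        if term and term not in terms:
            terms.append(term)
    return terms

# Made-up export with a hypothetical 'term' column.
export = "term,hint\nperros,dogs\nhablaste,you spoke\nperros,dogs\n"

# One term per line, which is the input shape wordtree expects.
print("\n".join(terms_from_csv(export)))
```

Writing that output to input.txt gives a file with one word on every line.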

Now I’ve got my list of new terms and I’m ready to make flashcards.

wordtree only checks that the word isn’t already in Anki, correct? It doesn’t actually add it?

SurpriseDog · July 11, 2024, 9:27pm

Glad it worked for you!

And that is correct. I only open the Anki database in read-only mode to avoid causing any problems.

otherdave · July 12, 2024, 11:03am

Super. Does it end up checking all of the decks you have? (I’m not sure how the anki DB is structured). Also, it gives me errors and says it can’t open my DB:

$ ./wordtree.py input.txt --noentry --anki ./database.anki2                                          
Language set to default: es Spanish
Use --lang to change

Using frequency file: freq/es.xz
Loading anki database...
	Connecting to database: ./database.anki2
	Error: unable to open database file
	
Failed to connect to database.
	Make sure the filename is correct or try closing anki before proceeding.

This is on a mac and I copied my anki database to the same working directory as word tree to avoid any directory/spaces/etc… issues. When I try to open it from the actual DB location, I get the same error. Anything I can do to help debug?

SurpriseDog · July 12, 2024, 11:43am

Can you post the original full path of the filename you are trying to open? On my computer it’s '/home/username/.local/share/Anki2/User 1/collection.anki2'

I noticed yours was called database.anki2, but it should be called collection.anki2
