Advanced Spanish Words deck

Thanks, I fixed it locally. You will see the change in the next upload. Feel free to contribute anything else you find along the way.

The current LLM is Mistral Large 2411 Q3. That’s the biggest model I could fit locally in RAM, hence the Q3 quantization, but it’s also produced the best results so far. Most of the LLMs I tried don’t actually follow instructions, so I had a bunch of Python code checking the output to make sure there were exactly 10 sentences without extraneous text, but Mistral was much better behaved and didn’t need any retries.
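For the misbehaving models, the check was basically just counting bullet lines and retrying. Something along these lines (a simplified sketch, not my actual script; query_llm is a placeholder for whatever call sends the prompt to the local model):

import re

BULLET = re.compile(r"^\s*[-•*◦–‣]")

def count_bullets(response: str) -> int:
    # Each sentence pair is two bullets (Spanish, then English),
    # so a clean response has 2 * expected bullet lines.
    return sum(1 for line in response.splitlines() if BULLET.match(line))

def generate_with_retries(prompt: str, expected: int, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        response = query_llm(prompt)  # placeholder for the local model call
        if count_bullets(response) == 2 * expected:
            return response
    raise RuntimeError("model kept producing the wrong number of sentences")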

It will still vary the output a little bit, such as using different types of bullet points and random indentation, so I have some extra code to extract each bullet point and fix the indentation after the fact. The goal with an LLM isn’t to make perfect output, it’s to make output that’s good enough to be machine readable so my code can fix it up after the fact. For example, I extract the bullet points with re.match(r"^\s*[-•*◦–‣]", line) to catch every bullet type a query response might give. Bullet points are actually useful for text extraction, because otherwise the code won’t know how to differentiate the actual sentences from the extraneous text.
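The extraction step is then only a few lines. This is a simplified sketch of the idea rather than my exact code, and it assumes the model kept the Spanish-then-English bullet order from the prompt below:

import re

BULLET = re.compile(r"^\s*[-•*◦–‣]\s*")

def extract_pairs(response: str) -> list[tuple[str, str]]:
    # Keep only the bullet lines; strip the marker and any stray indentation.
    sentences = [BULLET.sub("", line).strip()
                 for line in response.splitlines()
                 if BULLET.match(line)]
    # The prompt asks for Spanish on one bullet and the English translation
    # on the next, so consecutive bullets pair up.
    return list(zip(sentences[0::2], sentences[1::2]))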

This is my current prompt.

prompt = f'''Write exactly {expected} Spanish-English sentence pairs using the Spanish word {word} in context. The sentences should accurately reflect how the word is used in real life. Vary the sentence structure using the word in different positions. Try using different conjugations of the word in different contexts.
	
	The definition of {word} should be similar to '{back}'
		
	The sentence pairs consist of a Spanish sentence using {word} in context along with its English translation.
	Make bullet points for each sentence. Only one sentence per bullet point.
	
	Procedure:
	Output the Spanish sentence on a bullet point.
	Then output the English translation on the next bullet point.
'''
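For context, {expected}, {word}, and {back} are just Python variables set from each card before that f-string is evaluated. The values below are made-up examples, not from the deck:

# Example values only; the real ones come from each note in the deck.
expected = 10
word = "alcanzar"               # the Spanish word on the front of the card
back = "to reach, to achieve"   # the definition on the back of the card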

For your prompt, I suggest breaking it up into two different prompts: one to make the sentences and another to provide the “cultural notes.” LLMs don’t handle slang as well, so I always exclude slang words from the automated run.
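If it helps, here is a rough sketch of that split in the same f-string style. The wording is only a suggestion and untested:

sentence_prompt = f'''Write exactly {expected} Spanish-English sentence pairs using the Spanish word {word} in context.
Make bullet points for each sentence. Only one sentence per bullet point.
'''

notes_prompt = f'''Write a short cultural note about the Spanish word {word}.
Mention the register (formal vs. informal) and any regional differences in how it is used.
'''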
