Running into a slight issue having to do with the custom unicase collation used in the new database schema. This is a simple case-folding library used in the backend to define the ordering on some text fields. Unfortunately, it plays havoc with sqldiff, which, for obvious reasons, doesn’t have the definition of this collation and thus errors out. See the output below.
The point of all this is to compute diffs using sqldiff, which outputs a “patch” in the form of a block of valid SQL commands, and then to parse that SQL and apply a minimal changeset to a filesystem representation of the collection. In order to do this, I’ve written a custom SQLite EBNF grammar in Lark that converts these commands into Python dataclasses. So I’ve basically rolled my own Anki-database-specific ORM. Those dataclasses are then used to make changes in the filesystem, but that’s out of scope for this question.
I’ve been fuzzing the parser I wrote using hypothesis. Basically, I’ve got a state machine that performs arbitrary, random operations on an initially empty collection via the anki Python API (stuff like adding notes, modifying notetypes, etc.). During this mutation process, I take two random snapshots of the database and diff them with sqldiff. Then I parse the diff and see whether it breaks the parser or any of the downstream logic.
To give some context, here’s a bit of example code from the state machine that adds a randomly generated note to the collection:
from anki.models import NotetypeDict
from anki.notes import Note
from hypothesis import strategies as st
from hypothesis.stateful import precondition, rule
from hypothesis.strategies import SearchStrategy

@precondition(lambda self: len(list(self.col.decks.all_names_and_ids())) >= 1)
@rule(data=st.data())
def add_note(self, data: st.DataObject) -> None:
    """Add a new note with random fields."""
    nt: NotetypeDict = data.draw(st.sampled_from(self.col.models.all()), "nt")
    note: Note = self.col.new_note(nt)
    # Draw one random field value per field in the chosen notetype.
    n: int = len(self.col.models.field_names(nt))
    fieldlists: SearchStrategy = st.lists(self.fields, min_size=n, max_size=n)
    note.fields = data.draw(fieldlists, "add note: fields")
    # Pick a random existing deck to add the note to.
    dids = list(map(lambda d: d.id, self.col.decks.all_names_and_ids()))
    did: int = data.draw(st.sampled_from(dids), "add note: did")
    self.col.add_note(note, did)
Now, the workaround I’ve found for the above issue is to downgrade the database to the v11 schema before running sqldiff. So we start with an empty collection, perform a bunch of operations that insert/update/delete notes, decks, models, and tags, and then snapshot: call col.close(downgrade=True), copy the .anki2 file, perform some more ops, snapshot again, and diff the two files.
Unfortunately, after profiling with pyinstrument, the col.close(downgrade=True) and subsequent col.reopen() calls are extremely slow in comparison to everything else pytest is doing. I believe that if I can avoid the downgrade step, a simple col.save() will do the trick, and that will speed things up dramatically. This is why I need to work around the unicase collation. Alternatively, a way of simply dumping the database file in the v11 format without closing the collection would work as well, but I’m not sure the current API supports that. I’m not afraid of digging into the Rust backend or rewriting the downgrade logic in Python, but I’m not sure whether that’s ill-advised.
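For what it’s worth, Python’s sqlite3 module can register a stand-in collation, which sidesteps the error for any diffing done in-process. This doesn’t help the sqldiff binary itself, which has no way to pick up a Python-side collation, and the lambda below only approximates case folding rather than reproducing Anki’s actual unicase semantics:

```python
import sqlite3

def open_with_unicase(path: str) -> sqlite3.Connection:
    """Open a collection with a stand-in unicase collation registered,
    so queries referencing the collation no longer error out."""
    conn = sqlite3.connect(path)
    # Approximation only: a comparator returning -1/0/1 on casefolded text.
    conn.create_collation(
        "unicase",
        lambda a, b: (a.casefold() > b.casefold()) - (a.casefold() < b.casefold()),
    )
    return conn
```

A hand-rolled differ could then ATTACH the second snapshot to such a connection and run the same comparison queries that sqldiff generates.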
I’d love to hear anyone’s thoughts on how best to approach this! Thanks!
sqldiff: SQL statement error: no such collation sequence: unicase
"SELECT B.id, 1, -- changed row
A.name IS NOT B.name, B.name,
A.mtime_secs IS NOT B.mtime_secs, B.mtime_secs,
A.usn IS NOT B.usn, B.usn,
A.config IS NOT B.config, B.config
FROM main.deck_config A, aux.deck_config B
WHERE A.id=B.id
AND (A.name IS NOT B.name
OR A.mtime_secs IS NOT B.mtime_secs
OR A.usn IS NOT B.usn
OR A.config IS NOT B.config)
UNION ALL
SELECT A.id, 2, -- deleted row
NULL, NULL,
NULL, NULL,
NULL, NULL,
NULL, NULL
FROM main.deck_config A
WHERE NOT EXISTS(SELECT 1 FROM aux.deck_config B
WHERE A.id=B.id)
UNION ALL
SELECT B.id, 3, -- inserted row
1, B.name,
1, B.mtime_secs,
1, B.usn,
1, B.config
FROM aux.deck_config B
WHERE NOT EXISTS(SELECT 1 FROM main.deck_config A
WHERE A.id=B.id)
ORDER BY 1;
"