Diffing .anki2 files since schema v15

langfield · November 13, 2022, 4:23pm

Running into a slight issue having to do with the custom unicase collation used in the new database schema. This is a simple case folding library used in the backend to define the ordering on some text fields.

Unfortunately, it plays havoc with sqldiff, which for obvious reasons, doesn’t have the definition of this function, and thus errors-out. See the output below.

The point of all this is to compute diffs using sqldiff, which outputs a “patch” in the form of a block valid SQL commands, and then parse that SQL to apply a minimal changeset in a filesystem representation of the collection. In order to do this, I’ve written a custom SQLite EBNF grammar in Lark to convert these commands into python dataclasses. So I’ve basically rolled-my-own Anki database-specific ORM. Those dataclasses are then used to make changes in the filesystem, but that’s out of scope for this question.

I’ve been fuzzing the parser I wrote using hypothesis. Basically, I’ve got a state machine that makes performs arbitrary, random operations on an initially empty collection via the anki python API (stuff like adding notes, modifying notetypes, etc.) and then I take two random snapshots of the database during this mutation process, and diff them with sqldiff. Then I parse the diff and see if it breaks the parser or any of the downstream logic.

To give some context, here’s a bit of example code from the state machine that adds a randomly generated note to the collection:

    @precondition(lambda self: len(list(self.col.decks.all_names_and_ids())) >= 1)
    @rule(data=st.data())
    def add_note(self, data: st.DataObject) -> None:
        """Add a new note with random fields."""
        nt: NotetypeDict = data.draw(st.sampled_from(self.col.models.all()), "nt")
        note: Note = self.col.new_note(nt)
        n: int = len(self.col.models.field_names(nt))
        fieldlists: SearchStrategy = st.lists(self.fields, min_size=n, max_size=n)
        note.fields = data.draw(fieldlists, "add note: fields")
        dids = list(map(lambda d: d.id, self.col.decks.all_names_and_ids()))
        did: int = data.draw(st.sampled_from(dids), "add note: did")
        self.col.add_note(note, did)

Now the workaround I’ve found for the above issue is to downgrade the database to v11 before running sqldiff. So we start with an empty collection, perform a bunch of operations that insert/update/delete notes, decks, models, and tags, and then snapshot, so call col.close(downgrade=True), copy the .anki2 file, and then perform some more ops, and snapshot again, and diff the two files.

Unfortunately, after profiling with pyinstrument, the col.close(downgrade=True) and subsequent col.reopen() calls are extremely slow in comparison to everything else pytest is doing. I believe if I can avoid the downgrade step, a simple col.save() will do the trick, and that will speed things up dramatically. This is why I need to workaround the unicase collation. Alternatively, a way of simply dumping the database file in v11 format without closing the collection would work as well, but I’m not sure the current API supports that. I’m not afraid of digging into the rust backend or rewriting the downgrade logic in python, but I’m not sure if that’s ill-advised or not.

I’d love to hear anyone’s thoughts on how best to approach this! Thanks!

sqldiff: SQL statement error: no such collation sequence: unicase
"SELECT B.id, 1, -- changed row
       A.name IS NOT B.name, B.name,
       A.mtime_secs IS NOT B.mtime_secs, B.mtime_secs,
       A.usn IS NOT B.usn, B.usn,
       A.config IS NOT B.config, B.config
  FROM main.deck_config A, aux.deck_config B
 WHERE A.id=B.id
   AND (A.name IS NOT B.name
        OR A.mtime_secs IS NOT B.mtime_secs
        OR A.usn IS NOT B.usn
        OR A.config IS NOT B.config)
 UNION ALL
SELECT A.id, 2, -- deleted row
       NULL, NULL,
       NULL, NULL,
       NULL, NULL,
       NULL, NULL
  FROM main.deck_config A
 WHERE NOT EXISTS(SELECT 1 FROM aux.deck_config B
                   WHERE A.id=B.id)
 UNION ALL
SELECT B.id, 3, -- inserted row
       1, B.name,
       1, B.mtime_secs,
       1, B.usn,
       1, B.config
  FROM aux.deck_config B
 WHERE NOT EXISTS(SELECT 1 FROM main.deck_config A
                   WHERE A.id=B.id)
 ORDER BY 1;
"

system · December 13, 2022, 4:24pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
[BUG] When modifying a note during studying, type in field is not updated Help	4	208	May 1, 2023
Python checksum != rust checksum? Help	23	1329	May 1, 2023
Updated SQLite but doesn't sync changes to AnkiWeb Syncing & AnkiWeb	3	224	February 9, 2024
Database documentation Development	6	4748	May 1, 2023
Anki2 database questions Development	4	558	July 12, 2023

Diffing .anki2 files since schema v15

Related topics