Details about the Database Structure

The Anki database structure is not described anywhere in the official Anki manual. There are some details on AnkiDroid’s GitHub wiki, but they cover only the v2 scheduler, not the v3 scheduler. Another source of information is this page.

I realize that documenting the database structure is probably not a priority, but I think it is still worth the effort.

  • It would make it easier for developers to develop add-ons.
  • It would make it easier for semi-developers (like me) to modify add-ons to suit their needs if the main developer is unwilling to make the change (or has abandoned the project).

Also, even if the main developers are unwilling to take up the task, they can create a GitHub issue and start accepting PRs.

Maybe we need to wait for the AnkiDroid wiki to be updated for the v3 scheduler.

I think that the correct place for this documentation is Anki’s GitHub page or ankiweb.net, because it relates to all versions of Anki, not only AnkiDroid. That said, I would still be satisfied if complete documentation became available on AnkiDroid’s GitHub page.

Secondly, I think that AnkiDroid developers would now have less incentive to update the documentation because they don’t have to reverse engineer Anki now (i.e. they can directly use Anki’s backend code).

I develop add-ons, so it would be useful to have it. But the official Anki team probably has no incentive to document the database structure. Desktop Anki has few contributors, so it is mostly written by the official team. There are also very few developers of Anki add-ons, and broken add-ons are rarely repaired by anyone other than their authors.

You are right. I have also found many inconsistencies between the database documentation and the current behavior. Reading the source code to resolve them takes a lot of time.

For example:

The interval is negative and in seconds for (re)learning cards in the v2 scheduler only. In the v3 scheduler, it is in days and positive.

I’d rather invest time in providing a good API for accessing the data, than encouraging people to read and write to the database directly. That way, any changes to the underlying database format don’t lead to breakages, as we can freely change the underlying format while preserving the API, and it avoids issues like this one that popped up today: Newly added notes are immediately deleted (broken deck / collection?).

The interval is negative and in seconds for (re)learning cards in the v2 scheduler only. In the v3 scheduler, it is in days and positive.

The Rust code contains a fair bit of documentation. If you look up RevlogEntry:

/// Positive values are in days, negative values in seconds.
#[serde(rename = "ivl", deserialize_with = "deserialize_int_from_number")]
pub interval: i32,

v3 will use seconds or days depending on whether it’s an intraday or interday learning card; v2 always used seconds. If you interpret the value based on the sign, the code should work for both v2 and v3.
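For add-on authors, the sign-based interpretation could look something like the sketch below. It only assumes what the Rust comment above states (positive values are days, negative values are seconds); the function name revlog_interval_to_timedelta is made up for illustration and is not part of Anki's API.

```python
from datetime import timedelta


def revlog_interval_to_timedelta(ivl: int) -> timedelta:
    """Interpret a revlog `ivl` value by its sign.

    Hypothetical helper, not part of Anki. Assumes the convention from
    the RevlogEntry doc comment: negative values are seconds,
    positive values are days.
    """
    if ivl < 0:
        # Negative: the magnitude is a number of seconds.
        return timedelta(seconds=-ivl)
    # Zero or positive: the value is a number of days.
    return timedelta(days=ivl)
```

Under that assumption, a stored value of -600 would be read as ten minutes and a stored value of 3 as three days, regardless of which scheduler wrote the entry.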

I’d rather invest time in providing a good API for accessing the data, than encouraging people to read and write to the database directly.

So, what do you think about opening a GitHub issue to attract PRs (for providing APIs)?

I’m afraid I have a backlog of other tasks I need to get through first before I’ll have the bandwidth to review/consider further API changes/requests. Some low-hanging fruit in the future will be ensuring we expose ways to get at the typed versions of Deck and Notetype, and encourage people to migrate over to using them instead of the current nested dictionaries that lack static typing.