Hi, lines and potentially arrows are on their way as part of the upcoming annotation improvements. That will allow you to label completely unlabeled images, but the labels would still be part of the image.
I don’t think the use case you mention, where the image contains no labels and the prompt and answer are rendered separately from it, is feasible with the current implementation. Consider that IO notes are effectively just an image cloze, and regular clozes also do not support this kind of separation between the context they sit in and the prompt location. In the text world, the solution would be a basic-like note type that splits the question and answer, and I think that remains the most viable way to implement image-based prompts like this: one image field plus multiple numbered fields, one per label, with a separate card conditionally generated for each label that is filled in.
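To make the note-type idea a bit more concrete, here is a rough Python sketch of the card-generation logic. This is not Anki's actual code: the field names `Image` and `Label1`…`LabelN`, the `Card` class, and the `generate_cards` helper are all made up for illustration. In Anki itself you would normally get the same effect with conditional field tags in the card templates, not with a script.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Card:
    ordinal: int  # which numbered label this card asks about
    front: str    # question side: the unlabeled image plus a prompt
    back: str     # answer side: the label text


def generate_cards(fields: dict[str, str], max_labels: int = 5) -> list[Card]:
    """Return one card per non-empty LabelN field (hypothetical field names)."""
    image = fields.get("Image", "").strip()
    if not image:
        # Without an image there is nothing to ask about.
        return []
    cards = []
    for n in range(1, max_labels + 1):
        label = fields.get(f"Label{n}", "").strip()
        if not label:
            # Empty label field: skip it, so no card is generated.
            continue
        cards.append(Card(ordinal=n, front=f"{image}\nWhat is marked {n}?", back=label))
    return cards


note = {
    "Image": '<img src="heart.png">',
    "Label1": "Aorta",
    "Label2": "",
    "Label3": "Left ventricle",
}
cards = generate_cards(note)
print([card.ordinal for card in cards])
```

With the sample note above, only labels 1 and 3 produce cards, because `Label2` is empty. The image itself would still need numbered markers on it so that "What is marked 3?" points at something.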
Generally speaking, since using prelabelled images or adding the labels to the image yourself covers most use cases, it’s a bit hard to make the case for a completely different prompting approach like this. Consider also that this solution has disadvantages of its own: the answer is not visible at a glance, and more visual parsing is needed.
If the visual clutter is secondary and the main thing you need is type-in-the-answer support, that’s something I am hoping to add in the new IO add-on I’m working on, which will be based on the native implementation.
(I might be able to explore your exact use case as well in the future, but for now I have to focus on the core and on getting the add-on out, as it’s already been too long.)