Many text formats aspire to simplicity, with the belief that data models are an "implementation detail". My inclination is to err in that direction, because I fear that trying to start discussion by agreeing on a serialized data model leads to this series of unfortunate reasoning:
- Let's agree on a data model before we agree on syntax
- Great, we have a data model, how do we serialize it?
- Why invent another data serialization format; why don't we use something like JSON or XML?
- Result: a large, complicated data hierarchy that is difficult/impossible to author with a text editor, and difficult to spot errors with human inspection.
Having seen the development of many "Document Object Models (DOMs)" over the years (including working closely with the folks defining a document object model for MediaWiki markup), I've been hesitant to tackle such a complicated issue so early in the development of a new format that seems so clear in my mind. However, I've come to realize that my ideas about the data that is "important" (or "interesting" to me) and the data that is "unimportant" (or "uninteresting" to me) may be very important to others, and I want to build consensus around my idea of what ABIF can be. After mulling over the discussions in several issues here (particularly issues #6 and #14 regarding the metadata format), it occurs to me that a core data model may be helpful.
Here's my take on a core data structure that ABIF files should resolve to, expressed as a partial JSON file (NOTE: this comment is subject to revision):
{
"metadata":
[
{
<key-1>: <value-1>,
<key-2>: <value-2>,
<key-3>: <value-3>,
...
<key-n>: <value-n>
}
],
"candidates":
[
{
<candidate-id-1>: <candidate-information-1>,
<candidate-id-2>: <candidate-information-2>,
<candidate-id-3>: <candidate-information-3>,
...
<candidate-id-n>: <candidate-information-n>
}
],
"ballot_bundles":
[
{
<ballot-bundle-id-1>: <ballot-bundle-1>,
<ballot-bundle-id-2>: <ballot-bundle-2>,
<ballot-bundle-id-3>: <ballot-bundle-3>,
...
<ballot-bundle-id-n>: <ballot-bundle-n>
}
[
Expressing this as JSON is tricky, because JSON dictionaries are unordered key-value pairs, and there's not a great way to stipulate "order matters!". Moreover, I would like to make sure it's possible to build the data structure above using a single-pass parser. That's going to have all sorts of really tricky implications. I think we can pull it off if we have keep a shared data model in mind, but we're going to have to do things that make people who love beautiful context-free grammars (CFGs) cringe.
Many text formats aspire to simplicity, with the belief that data models are an "implementation detail". My inclination is to err in that direction, because I fear that trying to start discussion by agreeing on a serialized data model leads to this series of unfortunate reasoning:
Having seen the development of many "Document Object Models (DOMs)" over the years (including working closely with the folks defining a document object model for MediaWiki markup), I've been hesitant to tackle such a complicated issue so early in the development of a new format that seems so clear in my mind. However, I've come to realize that my ideas about the data that is "important" (or "interesting" to me) and the data that is "unimportant" (or "uninteresting" to me) may be very important to others, and I want to build consensus around my idea of what ABIF can be. After mulling over the discussions in several issues here (particularly issues #6 and #14 regarding the metadata format), it occurs to me that a core data model may be helpful.
Here's my take on a core data structure that ABIF files should resolve to, expressed as a partial JSON file (NOTE: this comment is subject to revision):
Expressing this as JSON is tricky, because JSON dictionaries are unordered key-value pairs, and there's not a great way to stipulate "order matters!". Moreover, I would like to make sure it's possible to build the data structure above using a single-pass parser. That's going to have all sorts of really tricky implications. I think we can pull it off if we have keep a shared data model in mind, but we're going to have to do things that make people who love beautiful context-free grammars (CFGs) cringe.