Wednesday, December 4, 2013

What about dates? Solved?

You might think this is solved in our semantic digitized world.
But as the title suggests, in my opinion it isn't. It could have been, though, and thus it still can be.

Have you started digitizing your old photos from prints or negatives? If so, have you started adding the Exif metadata?
Even if you haven't (yet), think about what you remember of where and when each photo was taken.

The Exif 2.2 specification gives us:
DateTimeOriginal
The date and time when the original image data was generated. For a DSC the date and time the picture was taken
are recorded. The format is "YYYY:MM:DD HH:MM:SS" with time shown in 24-hour format, and the date and time
separated by one blank character [20.H]. When the date and time are unknown, all the character spaces except
colons (":") may be filled with blank characters, or else the Interoperability field may be filled with blank characters.
The character string length is 20 bytes including NULL for termination. When the field is left blank, it is treated as
unknown. 
This seems a correct solution if you know at least what the year was. If you don't know the year exactly you're already in trouble.
But what about the tools we have? Just a small example from the one I use for most of my graphics work, GraphicConverter.



Even if the specification contains some degree of flexibility, not all our tools take it into account.
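A tool that honours this flexibility could read the tag along these lines. This is only a sketch in Python: the function name parse_exif_datetime and the dictionary it returns are assumptions made for illustration, not part of the Exif specification or of any library. Only the fixed "YYYY:MM:DD HH:MM:SS" layout and the rule that blanks mean "unknown" come from the text quoted above.

```python
from typing import Dict, Optional

# Character positions of each part in "YYYY:MM:DD HH:MM:SS" (19 characters + NULL).
FIELDS = (
    ("year", 0, 4), ("month", 5, 7), ("day", 8, 10),
    ("hour", 11, 13), ("minute", 14, 16), ("second", 17, 19),
)


def parse_exif_datetime(value: str) -> Dict[str, Optional[int]]:
    """Split a DateTimeOriginal string into parts; blank parts become None."""
    # Drop the NULL terminator and pad short input so the slices always exist.
    text = value.rstrip("\x00").ljust(19)
    parts: Dict[str, Optional[int]] = {}
    for name, start, end in FIELDS:
        chunk = text[start:end].strip()
        parts[name] = int(chunk) if chunk.isdigit() else None
    return parts


# A fully known date, and a scanned print of which only the year is remembered.
known = parse_exif_datetime("2013:12:04 10:30:00")
year_only = parse_exif_datetime("1956:  :     :  :  ")
print(known)
print(year_only)
```

Keeping the unknown parts as None, instead of silently replacing them with January 1st or midnight, is the whole point: the tool then knows that "1956" is a period of a year and not a moment.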

And as most users will, I figured out a personal workaround. Compatibility? None. Yek.

I really would like the people who write standards (and the ones who make software) to think about what our real data world looks like. Not only in our all-digital world, but also at the frontier between the old analog world and the digital one.

The whole day long. How long is that? 24 or 48 hours?
A day takes about 48 hours to travel around the world through its 34 or so time zones.

Now look again at the Exif date definition above. Unless the date tag is also associated with the geolocation tags (which give us the time zone), this definition doesn't tell us exactly when the photo was taken. Even when specified to the second, its precision is about 24 hours. Yek again.

Thus a date without time zone information (direct or indirect) is a period of about 48 hours.
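The size of that period can be checked with a few lines of Python. The sketch below assumes that the extreme offsets in use are UTC+14:00 and UTC-12:00; the names EARLIEST, LATEST and date_window_utc are made up for this example. With these two assumed extremes the window comes out at 50 hours, a little more than the round 48.

```python
from datetime import datetime, timedelta, timezone

# Assumed extreme offsets: where a calendar date starts first and ends last.
EARLIEST = timezone(timedelta(hours=14))   # UTC+14:00
LATEST = timezone(timedelta(hours=-12))    # UTC-12:00


def date_window_utc(year: int, month: int, day: int):
    """Return the UTC instants between which the given date is 'today' somewhere."""
    start = datetime(year, month, day, tzinfo=EARLIEST)
    end = datetime(year, month, day, tzinfo=LATEST) + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


start, end = date_window_utc(2013, 12, 4)
print(start.isoformat(), end.isoformat(), end - start)
```

The same reasoning applied to a timestamp instead of a whole date leaves an uncertainty equal to the spread between the two offsets, which is the "about 24 hours" mentioned above.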

Let me take a little sidestep here.
In some ontologies a "point" in time is defined. Although mathematically correct, it is of little practical use. In my perception the only valid concept when talking about time is the period. It is the unit you're currently using that determines whether 1956 is a point in time or not. When an ontology defines a "point" in time, it assumes that the unit of time used to define the point will never be subdivided. 3rd Yek.

In my post about information extraction I was essentially talking about entity identification.
And information extraction of dates?

As we saw above, to increase the precision we need geographical information. But there is something more to it.

The Maya stela mentioned in the post gives us the birth date of Lady K'atun-Ajaw, Lady Namaan-Ajaw:
Source

tzik haab bolon pik lajchan winikhaab cha' tuun mih winal waklajun k'iin (spelling might vary)
Which is 9 Pih 12 K'atun 2 Tuns 0 Winal 16 K'in (Maya long count)
Using the FAMSI converter we obtain:
Friday, July 3, 674 CE
or
Sunday, July 5, 674 CE
or
Tuesday, July 7, 674 CE
or
... several other conversions are possible

Scientists do not agree on the correlation between the Maya calendar and our modern calendar (and that's fine, no Yek here). And even if they agreed today, would that still be the case tomorrow?
In this example, as the geographic coordinates are well known, the precision is related to the various possible correlations.
If we want to talk about this day of birth we have to indicate a period and the range of correlations taken into account. It might be 1 day if we take the most accepted correlation (in bold).

For the information extraction side of this example we should preserve the fact (9 Pih 12 K'atun 2 Tuns 0 Winal 16 K'in) and attach the derived information to it, so that we can calculate with it.
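As a sketch of what "preserve the fact, derive the rest" could look like: the Python below stores the long count as written and derives one candidate date per correlation. The three correlation constants (the Julian Day Number assigned to long count 0.0.0.0.0) are assumptions chosen for illustration, and the function names are invented for this example. The dates are expressed in the proleptic Gregorian calendar, because that is what Python's date type uses.

```python
from datetime import date

# Assumed correlation constants: Julian Day Number of long count 0.0.0.0.0.
CORRELATIONS = (584281, 584283, 584285)
# For Python's proleptic Gregorian ordinals: JDN = date.toordinal() + 1721425.
JDN_OF_ORDINAL_ZERO = 1721425
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def long_count_to_days(baktun: int, katun: int, tun: int, winal: int, kin: int) -> int:
    """Days elapsed since 0.0.0.0.0 (units of 144000, 7200, 360, 20 and 1 days)."""
    return (((baktun * 20 + katun) * 20 + tun) * 18 + winal) * 20 + kin


def candidate_dates(long_count):
    """Map each assumed correlation constant to the date it yields."""
    days = long_count_to_days(*long_count)
    return {c: date.fromordinal(c + days - JDN_OF_ORDINAL_ZERO) for c in CORRELATIONS}


fact = (9, 12, 2, 0, 16)  # the inscription itself: this is what we keep
for constant, day in candidate_dates(fact).items():
    print(constant, WEEKDAYS[day.weekday()], day.isoformat())
```

The long count stays untouched whatever happens to the correlations; a new or revised constant only adds or changes a derived value.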

If you want to play a bit with calendars, the Swiss Fourmilab has a nice tool for it.


To proleptic or to not proleptic.
Writing:
"On October 10, 1582 ..."
Where was the author?

It certainly wasn't in one of the Catholic European countries that adopted the reform right away (such as Spain, Portugal and most of Italy), as this day never existed there. They switched from the Julian to the Gregorian calendar, so that October 4 was followed by October 15, 1582.
Perhaps in England, which (together with its colonies, including what later became the USA) only switched in 1752.
The proleptic Gregorian calendar doesn't take these missing days into account. You can find more on it on the Wikipedia page, including some examples of software using the proleptic calendar.
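Python's own date type is one such proleptic example: it happily accepts October 10, 1582. The sketch below uses a standard integer formula to turn a Julian calendar date into a Julian Day Number and then into a (proleptic) Gregorian date; the function names are assumptions made for this illustration.

```python
from datetime import date

# For Python's proleptic Gregorian ordinals: JDN = date.toordinal() + 1721425.
JDN_OF_ORDINAL_ZERO = 1721425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date written in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """The same day expressed in the proleptic Gregorian calendar."""
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_OF_ORDINAL_ZERO)


# The last Julian day before the switch is the day before Gregorian October 15.
last_julian_day = julian_to_gregorian(1582, 10, 4)
# "October 10, 1582" written where the Julian calendar was still in use.
english_oct_10 = julian_to_gregorian(1582, 10, 10)
# The proleptic calendar accepts the "missing" day without complaint.
proleptic_oct_10 = date(1582, 10, 10)
print(last_julian_day, english_oct_10, proleptic_oct_10)
```

So the very same string denotes two different days, ten days apart, depending on where the author was. Without that context the software cannot know which conversion to apply.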


Modern programs/services can convert from one time system to another. So where are the remaining Yeks?

Almost all of the leftover Yeks are related to the context in which a particular date reference was produced. In other words, metadata. And this holds for computer-generated dates as well as for human-crafted ones.

Besides the geographic aspect, working with dates in our current calendar is fine most of the time.
But ...

What do we (= human beings) use as dates in our daily life? Does the Second World War ring a bell?
You probably know that neither the beginning nor the end was at the same moment for all the countries involved.
So what if I ask my future OK-system: "What volcanoes erupted during WWII?"
I'm convinced you can fill in the Yeks here.

"Yesterday" makes sense if I know when that was written, right? Else tomorrow never will be.

Please fill in your own examples:
....
....
....


Perhaps it is time to lend a helping hand to our upcoming AIs.

Imagine a timeline (not a calendar) with a unit of a second and the possibility to add a fractional part. It doesn't really matter if the unit is a second or some smaller unit as there will always be a fractional part.
Our next bigger unit would be the minute which is not decimal, so that doesn't seem to be a good choice.
Although the second was initially defined in relation to the duration of the day, this has been superseded by an atomic measurement.

Why is the choice of a non-Earth-related unit important? If we want such a timeline to be applicable at the various time scales we're using (astronomical, geological, archeological, historical, ...) we quickly notice that tying the unit to the rotation of the Earth is not the best thing we can come up with. You do know that the Earth is slowing down, don't you? And if you didn't, you do know now.

Handy would it also be (who talks like that?) if the timeline contained only positive numbers. The origin should then lie before the Big Bang. Let's say 100 billion years ago. This gives us a comfortable margin compared to the currently estimated 13.8 billion years.

A timeline of 64-bit signed integers counting whole seconds gives us room for another 192 billion years or so. We'll have some time ahead to think of a replacement system by then (using unsigned integers gives us room for 484 billion years in the future, also fine).
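The arithmetic behind these figures, as a small Python check. It assumes a unit of one second, a year of 365.25 days (31,557,600 seconds) and the origin 100 billion years in the past; the names are chosen for this example only.

```python
SECONDS_PER_YEAR = 31_557_600        # assumption: a year of 365.25 days
ORIGIN_YEARS_AGO = 100_000_000_000   # the origin proposed above


def years_of_future(max_value: int) -> int:
    """Whole years left after 'now' when counting seconds from the origin."""
    return max_value // SECONDS_PER_YEAR - ORIGIN_YEARS_AGO


signed_future = years_of_future(2**63 - 1)     # largest signed 64-bit integer
unsigned_future = years_of_future(2**64 - 1)   # largest unsigned 64-bit integer
print(f"signed:   about {signed_future / 1e9:.0f} billion years ahead")
print(f"unsigned: about {unsigned_future / 1e9:.0f} billion years ahead")
```

Note that these figures hold for whole seconds. Any fractional part has to be stored separately, or it eats into the range.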

Each calendar, or whatever time reference system is used, can be calibrated on such a timeline, and a unique bidirectional converter is needed for each system. These converters use the specific knowledge of their own reference system and are independent of all the other converters.
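A minimal sketch of two such converters in Python. Everything here is an assumption made for illustration, not the design of DatExpert or of any existing library: the class and method names, the calibration of the Unix epoch at exactly 100 billion years of 365.25 days after the origin, and the fact that leap seconds are ignored. Each converter maps a value of its own system to a period (start, end) on the timeline and back, and neither knows about the other.

```python
from datetime import date
from typing import Tuple

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 31_557_600  # assumption: a year of 365.25 days
# Assumed calibration: the Unix epoch lies 100 billion such years after the origin.
UNIX_EPOCH_ON_TIMELINE = 100_000_000_000 * SECONDS_PER_YEAR
UNIX_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


class UnixConverter:
    """Unix timestamps: each value is a period of one second."""

    def to_period(self, value: int) -> Tuple[int, int]:
        start = UNIX_EPOCH_ON_TIMELINE + value
        return start, start + 1

    def from_instant(self, instant: int) -> int:
        return instant - UNIX_EPOCH_ON_TIMELINE


class GregorianDayConverter:
    """Proleptic Gregorian dates in UTC: each value is a period of one day."""

    def to_period(self, value: date) -> Tuple[int, int]:
        days = value.toordinal() - UNIX_EPOCH_ORDINAL
        start = UNIX_EPOCH_ON_TIMELINE + days * SECONDS_PER_DAY
        return start, start + SECONDS_PER_DAY

    def from_instant(self, instant: int) -> date:
        days = (instant - UNIX_EPOCH_ON_TIMELINE) // SECONDS_PER_DAY
        return date.fromordinal(UNIX_EPOCH_ORDINAL + days)


# Converting between two systems always goes through the timeline.
day_start, day_end = GregorianDayConverter().to_period(date(2013, 12, 4))
unix_start = UnixConverter().from_instant(day_start)
print(day_start, day_end, unix_start)
```

With N reference systems this needs N converters instead of N x N pairwise conversions, and a date always comes back as a period whose length reflects the precision of the system it came from.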


The image above comes from my post of a year ago in the Linked Open Vocabularies community, which gives a few more technical details.

The first and only Woopah in this post.


Some decades ago (2, to be precise) I made such a system called DatExpert (a HyperCard extension). And oh, what a wonderful and precise world that was.

Doing it again? With pleasure and the sooner the better.
But not as a lonely wolf anymore and certainly as open source.

If you also feel the need to move our digital world forward a bit, and you're not the only one, I can set up a community for it.

Just re-share, +, and comment on this post.

2013-12-05 : edit corrected some typos