John Chamberlain
Developer Diary · You Heard It Here First · 19 December 2003
The Future of Data Semantics
Broad-based interest in data semantics is growing. Semantics are the subject of several W3C standards and global language efforts; a typical example is the Dublin Core Metadata Initiative, which is well known among database specialists. The use of semantics, and the interest in creating domain-specific systems that define data, are leading to greater interoperability and access to information. Are they also showing the way towards artificial intelligence?

When semantics are used, the definer creates symbols that describe other symbols. For example, a field in a database may be unnamed: it is just "Field 1", which tells us almost nothing. If it is called SST, that is more helpful; in the context of oceanography it would probably refer to sea surface temperature. If the field had a description attribute identifying it as "sea surface temperature", that would remove all doubt even without the context. If there were another attribute, units, with the value "degrees centigrade", there would be even more information. But what is "units"? For an English speaker the meaning is obvious; for a small child, more explanation might be necessary. By repeating this process to a sufficient degree, can intelligent awareness be created?
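To make the layering concrete, here is a minimal sketch in C of how each level of description might be attached to a raw field; the struct and the sample values are invented for illustration:

    /* The raw signal: a number that is meaningless by itself. */
    double field1 = 18.5;

    /* Each layer of metadata is one more symbol describing the
       symbols below it. */
    struct Field {
        const char *name;        /* "SST"                     */
        const char *description; /* "sea surface temperature" */
        const char *units;       /* "degrees centigrade"      */
        double      value;       /* the raw signal itself     */
    };

    struct Field sst = {
        "SST", "sea surface temperature", "degrees centigrade", 18.5
    };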

To answer this question, consider metadata more closely. Just what is it? To me the most basic example of metadata is string length. Modern languages like Java and VB implement strings by storing the length alongside the character data. It may seem obvious to a programmer today to tag a string with its length, but this was not always the case. In the old days other, non-metadata methods were used. For example, in the language C a string was, and still is, implemented with a delimiter: the value 0 (out of 256 possible byte values) indicates the end of the string. In this convention a particular data value has a coded significance. This defective implementation, which dates to around 1970 when programming technique was in its infancy, has had untold costs over the last 30 years. For one thing, every C string operation that must find the end of the string, a copy for example, is slower than in modern languages because a search must be done for the delimiter. Another huge cost has been the bugs: every time a programmer forgot to append a '\0' to the end of a string, a nasty, difficult-to-discover bug was created. It seems completely stupid, yet many programmers spend their whole lives using this defective implementation. The moral is that even the simplest possible use of metadata is not obvious, and neither is the importance of separating data from control information.
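The cost shows up directly in code. Here is a rough sketch in C of the two conventions; the length-prefixed struct is a simplification for illustration, not the actual layout any particular language uses:

    #include <string.h>

    /* Delimiter convention: the only way to find the end of the
       string is to scan for the '\0' terminator, byte by byte. */
    void copy_c_string(char *dst, const char *src) {
        while ((*dst++ = *src++) != '\0')
            ;  /* every byte is inspected on the way */
    }

    /* Length-prefix convention: the length travels with the data
       as metadata, so the copy is a single block move. */
    struct LString {
        size_t len;
        char   data[256];
    };

    void copy_l_string(struct LString *dst, const struct LString *src) {
        dst->len = src->len;
        memcpy(dst->data, src->data, src->len);  /* no scanning needed */
    }

The forgotten-terminator bug lives entirely in the first version: if src ever lacks its '\0', the loop runs off the end of the buffer.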

This is what semantics are: control information. In the beginning you start with the raw signal (the data), which in itself is meaningless. Then you add a control. By considering the control itself to be a signal, you can add new controls on top of it, and so on. Where does it end? Where is the meaning? The meaning comes with a relation. For example, when a human sees "units", they relate it to their knowledge ("aha, I know what a unit is") and the meaning is created. The same thing happens within the computer. For example, if the data is being plotted by an automatic plotter, the plotter logic might be: if a "units" attribute is found with the data, label the axis with it. The computer may have no idea what "degrees centigrade" is, but that does not matter. What is important is that the computer knows what to do with it. The computer has made a relation.
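That plotter logic can be sketched in a few lines of C; the attribute structure and the function names here are invented for illustration:

    #include <stdio.h>
    #include <string.h>

    struct Attribute {
        const char *name;
        const char *value;
    };

    /* Find an attribute by name. The program never needs to
       understand the value; it only recognizes the control symbol. */
    const char *find_attr(const struct Attribute *attrs, int n,
                          const char *name) {
        for (int i = 0; i < n; i++)
            if (strcmp(attrs[i].name, name) == 0)
                return attrs[i].value;
        return NULL;
    }

    /* The plotter's relation: if a "units" attribute is present,
       label the axis with its value, whatever that value may be. */
    void label_axis(const struct Attribute *attrs, int n) {
        const char *units = find_attr(attrs, n, "units");
        if (units != NULL)
            printf("axis label: %s\n", units);
    }

Given the attributes from the earlier example, label_axis prints "axis label: degrees centigrade" without ever knowing what a degree is.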

The computer makes a relation the same way the human does: a control signal is matched to internal logic. This is reason to think that by creating semantics we are on the path to creating intelligence. In quantity, computers still lag far behind living things: humans have millions of nerve endings while computers have only hundreds of I/O gates, and humans have billions of unique logic interconnections while the most advanced computer programs have only millions. In time the balance may change.

An example of this change can be seen in the Lord of the Rings films. The great battle scenes were created using small agent programs, each one trying to act as an individual warrior. These little programs were replicated thousands of times, each instance driving a single soldier on the cinematic battlefield. Even though the programs were relatively simple, the mass movements of the armies seen from afar have a realistic, life-like look that is attributable to the agents. Each one reacts as best it can to the inputs around it. Is it being attacked? Is it attacking? Is it afraid? Does it see an enemy soldier? The semantics of battle are spelled out in the simplest terms, yet that is enough to bring life to the screen. Will it one day bring life to the computer?
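The software actually used for those scenes is proprietary, but the basic idea can be sketched in C; the perceptions, states, and rules below are all invented for illustration:

    /* What one agent can perceive on a given tick (all hypothetical). */
    struct Perception {
        int under_attack;
        int enemy_visible;
        int allies_nearby;
    };

    enum Action { IDLE, ADVANCE, ATTACK, FLEE };

    /* One agent's entire mind: a handful of rules matching control
       signals to internal logic. Replicate it thousands of times and
       each instance drives one soldier on the field. */
    enum Action decide(const struct Perception *p) {
        if (p->under_attack && !p->allies_nearby)
            return FLEE;     /* afraid: attacked and alone    */
        if (p->under_attack)
            return ATTACK;   /* attacked with support: fight  */
        if (p->enemy_visible)
            return ADVANCE;  /* an enemy in sight: close in   */
        return IDLE;
    }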
