A fundamental change is occurring in the way scientific data is created, published and analyzed. A number of different groups are feverishly working on semantic protocols that will allow scientific data to be seamlessly exchanged. This will increase the amount and quality of the data researchers can access, and the speed with which they can access it. Many of the ad hoc methods scientists must currently use will go away and be replaced by integrated tools like the OPeNDAP Data Connector (ODC). To get a sense of the impact, compare the way data flows in a typical contemporary lab with the vision being worked on:
What Happens Now
------------------------------------
An instrument records the data
A technician downloads from the instrument, and ...
...preprocesses the raw data file
...cleans up the data and adds any special values or codes
...saves the data in a file with an ad hoc format
One or more other scientists/technicians process the data into one or more output products
(the output product might be in a standard format like HDF)
The user finds out about the data in any of a myriad of non-standard ways
The user has to figure out somehow which data file they want
The user downloads (or orders) the data file
The user possibly has to parse or process the file somehow
The user opens the file in their analysis program
The user repeats the above steps for different analysis programs
The user can only output whatever the analysis programs support
The Future
------------------------------------
An instrument streams data to a database
The data is automatically catalogued by the site
The site catalogue is automatically published to various global catalogues
The user uses an integrated client (like the ODC) to search for data
The source database is published by various streaming servers
The integrated client can browse, query and preview any data from any server
The integrated client can exchange data live with any analysis tool
The integrated client can re-publish data and output it to any generic format
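As a rough illustration of the "query" step above, here is a minimal sketch of how a client might ask a server for just a slice of a dataset, assuming the server accepts DAP2-style constraint expressions. The helper name dap_subset_url, the example.org address and the sst variable are hypothetical and are not taken from the ODC itself.

```python
from urllib.parse import quote


def dap_subset_url(dataset_url, variable, slices, response="dods"):
    """Build a DAP2-style request URL for a hyperslab of one array variable.

    slices holds one (start, stride, stop) tuple per dimension; in DAP2
    constraint expressions the stop index is inclusive.
    response is the suffix naming the kind of reply wanted (e.g. dds, das, dods).
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    # Keep the bracket and colon characters readable; escape anything unusual.
    constraint = quote(variable + hyperslab, safe="[]:")
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical dataset: first time step, every second point from index 10 to 20.
url = dap_subset_url(
    "http://example.org/opendap/sst.nc",  # assumed server address
    "sst",                                # assumed variable name
    [(0, 1, 0), (10, 2, 20)],
)
print(url)
```

The point of the sketch is that the user never downloads or parses a whole file: the subset is described in the request, and the server returns only that piece.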
Integrated clients like the OPeNDAP Data Connector are middleware that acts as a lingua franca for scientists, allowing them to collect, analyze and publish data with greater ease than was ever possible before. In a sense they are a kind of Napster for scientists. Ultimately the result will be an explosion in the quantity of scientific discoveries.