FAIR Data Principles & Digital Twins
IOTICS
In plays, films, books, and music, there is often a key moment where everything in the story comes together. Software and data engineers know these moments: after days of work, the code and data finally line up and you can write and run the "if" statement. A version of that statement might be: if river.level > X and rainfall.forecast > Y, then… When it comes to rainfall and rivers, the "then" part could involve millions of pounds of damage, weeks of transport disruption, and possible loss of life. We will use this flooding algorithm to explore how FAIR data principles and digital twins could interact and cooperate.
'In a FAIR world, computers can find and understand data, but we still can't program them with that "if" statement when the data is in large datasets.' -IOTICS
The "if" statement is our first kind of data interaction. A computer algorithm brings two pieces of data together so they can be compared, and some insight can be gained. But what those pieces of data are and how they get to the "if" statement is more complex than you might think.
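As a minimal sketch of that first data interaction, the comparison can be written as a small Python function. The thresholds X and Y are not given in this article, so the values below (2.5 m and 30 mm) are purely illustrative assumptions, as are the names FloodThresholds and flood_risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloodThresholds:
    """The X and Y of the article's "if" statement (values are assumptions)."""
    river_level_m: float
    rainfall_forecast_mm: float


def flood_risk(river_level_m: float,
               rainfall_forecast_mm: float,
               thresholds: FloodThresholds) -> bool:
    """Return True only when both readings exceed their thresholds."""
    return (river_level_m > thresholds.river_level_m
            and rainfall_forecast_mm > thresholds.rainfall_forecast_mm)


# Illustrative thresholds only; real values would come from hydrologists.
thresholds = FloodThresholds(river_level_m=2.5, rainfall_forecast_mm=30.0)

high_river_heavy_rain = flood_risk(2.8, 45.0, thresholds)
high_river_light_rain = flood_risk(2.8, 10.0, thresholds)
```

The function itself is trivial, which is the point: the hard part is getting trustworthy values for its two arguments.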
There's no search engine for data: not publicly and rarely within enterprises. There are attempts at searchability such as data.gov.uk, but they are intended for people, not algorithms. It is said that data scientists spend at least 50 percent of their time looking for data rather than looking at data. This epic waste of time arises because data is hidden, deliberately or unintentionally, in silos, in datasets, behind APIs, or in program-unfriendly formats, such as PDF. Such data is not findable by machines, but what if it were?
Access and interoperability are linked. A computer may be able to find some data but may not be able to understand it. Interoperability would improve if the data carried metadata indicating, for example, that the river level was measured in meters and the rainfall in millimeters. With that, we have find, access, and interoperate; the data interaction in the "if" statement then reuses that data for our new purpose.
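To illustrate why unit metadata matters, here is a hedged sketch in which each reading carries its unit and the algorithm converts it before comparing against a threshold. The Reading class, the convert function, and the conversion table are assumptions made for this example; they are not part of any FAIR specification.

```python
from dataclasses import dataclass

# Length of one unit expressed in meters (assumed, minimal table).
METERS_PER_UNIT = {"m": 1.0, "cm": 0.01, "mm": 0.001}


@dataclass(frozen=True)
class Reading:
    """A value plus the unit metadata needed to interpret it."""
    value: float
    unit: str


def convert(reading: Reading, target_unit: str) -> float:
    """Convert a length reading to target_unit, refusing unknown units."""
    for unit in (reading.unit, target_unit):
        if unit not in METERS_PER_UNIT:
            raise ValueError(f"unknown unit: {unit!r}")
    in_meters = reading.value * METERS_PER_UNIT[reading.unit]
    return in_meters / METERS_PER_UNIT[target_unit]


# A gauge that reports in centimeters, used by an algorithm that thinks in meters.
river_cm = Reading(value=280.0, unit="cm")
river_m = convert(river_cm, "m")
```

Without the unit field, 280 and 2.8 are indistinguishable to the algorithm; with it, a reading in an unrecognized unit fails loudly instead of silently producing a wrong comparison.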
This is the basis of the FAIR data principles, conceived by a consortium of leading scientists and organizations to ensure that scientific data sets could be found and used by machines, with minimal human intervention. FAIR stands for Findable, Accessible, Interoperable, and Reusable, and it's going mainstream.
In a FAIR world, computers can find and understand data, but we still can't program them with that "if" statement when the data is in large datasets. In our flooding scenario, our algorithm also needs the river level at a specific location and the rainfall forecast at a different location, probably well upstream from the place where the flood is likely to occur. So, even if our algorithm can find the right dataset, it still needs to know how to run a query against the dataset to find the data it wants.
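The extra querying step can be sketched as follows. The dataset layout, the station names, and the latest_value helper are all hypothetical; a real dataset would have its own schema and query interface, which is exactly the knowledge the algorithm would need.

```python
from typing import Optional

# A hypothetical dataset of river levels: many stations, many timestamps.
river_levels = [
    {"station": "town-bridge", "time": "2024-01-02T09:00", "level_m": 2.1},
    {"station": "town-bridge", "time": "2024-01-02T10:00", "level_m": 2.8},
    {"station": "mill-weir", "time": "2024-01-02T10:00", "level_m": 1.2},
]


def latest_value(rows: list, station: str, field: str) -> Optional[float]:
    """Return the most recent value of `field` for one station, or None."""
    matching = [row for row in rows if row["station"] == station]
    if not matching:
        return None
    # ISO 8601 timestamps in one consistent format sort chronologically as text.
    newest = max(matching, key=lambda row: row["time"])
    return newest[field]


level_at_bridge = latest_value(river_levels, "town-bridge", "level_m")
```

Even in this toy form, the algorithm has to know the station identifier, the timestamp format, and the field name before it can obtain a single number.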
The granularity of the data matters, and that's where digital twins come in. A digital twin is a virtualization of an asset's data, and the asset itself is a useful level of granularity here. Our algorithm needs to choose the appropriate rainfall forecasts and the required river levels. Metadata about the assets beyond their location might also be useful. Knowing who operates them would help our algorithm assign weight to the readings if some operators' data proved more reliable and accurate than others. Having some provenance showing that the data really comes from that twin, and that the twin really is the one operated by the Environment Agency, for example, would build trust in the output of our algorithm. The exchange of metadata between twins to establish trust and access is our second data interaction.
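One way to picture asset-level granularity with operator metadata is the sketch below. The Twin fields, the reliability weights, and the operator name "Community Sensor Network" are invented for illustration; only the Environment Agency appears in this article. Verifying provenance (for example with signatures) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Twin:
    """A hypothetical digital twin: one asset's data plus its metadata."""
    twin_id: str
    kind: str        # e.g. "river_level"
    operator: str    # who runs the underlying asset
    lat: float
    lon: float
    value: float     # latest reading, in the unit implied by `kind`


# Assumed reliability weights per operator (illustrative numbers only).
OPERATOR_WEIGHT = {
    "Environment Agency": 1.0,
    "Community Sensor Network": 0.5,
}


def weighted_reading(twins: list, kind: str) -> float:
    """Weighted average of readings of one kind from known operators."""
    selected = [t for t in twins
                if t.kind == kind and t.operator in OPERATOR_WEIGHT]
    total_weight = sum(OPERATOR_WEIGHT[t.operator] for t in selected)
    if total_weight == 0:
        raise ValueError(f"no trusted twins of kind {kind!r}")
    weighted_sum = sum(t.value * OPERATOR_WEIGHT[t.operator] for t in selected)
    return weighted_sum / total_weight


twins = [
    Twin("gauge-1", "river_level", "Environment Agency", 51.75, -1.26, 3.0),
    Twin("gauge-2", "river_level", "Community Sensor Network", 51.76, -1.27, 2.4),
    Twin("gauge-3", "river_level", "Unknown Operator", 51.77, -1.28, 9.9),
]
combined_level = weighted_reading(twins, "river_level")
```

Here the reading from the unrecognized operator is ignored entirely, and the less reliable operator contributes half as much as the trusted one; a real system might derive such weights from historical accuracy.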
The final step to get to the "if" statement is about timeliness. Homeowners won't appreciate being told on Wednesday that a flood would occur on Tuesday when their houses are already knee-deep in muddy water. The data needs to flow between the twins and the algorithm as close to real time as possible so that the predictions are available in a timely way. This is not just important in our flooding scenario; it's important in business, where latency between something happening and the business reacting to it can cost millions.
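A simple guard for timeliness is to reject readings that are too old before they reach the comparison. The 15-minute limit and the is_fresh helper below are assumptions for illustration; the right limit depends on how quickly the river responds to rainfall.

```python
from datetime import datetime, timedelta, timezone

# Assumed maximum acceptable age of a reading (illustrative value).
MAX_AGE = timedelta(minutes=15)


def is_fresh(observed_at: datetime,
             now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """True if the reading is not from the future and not older than max_age."""
    age = now - observed_at
    return timedelta(0) <= age <= max_age


now = datetime(2024, 1, 3, 10, 0, tzinfo=timezone.utc)      # "Wednesday"
recent = now - timedelta(minutes=5)
yesterday = now - timedelta(days=1)                          # "Tuesday"

recent_ok = is_fresh(recent, now)
yesterday_ok = is_fresh(yesterday, now)
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check easy to test.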
We have reached a point where we have an algorithm running, exchanging data with digital twins. But what does the algorithm do in the "then" part of the "if" statement? What if it could share the data back with other digital twins, or create new twins of the likely flood locations and have them share into a growing ecosystem of cooperative twins?
If the algorithm has its own digital twin, the model becomes simpler and symmetrical: everything is a twin. The twin of the algorithm interacts with the twins of the data sources. Data interactions are twin interactions, and twin interactions are the exchange of data and metadata between twins. If FAIR data principles and twins could interact and cooperate, imagine what transformations could be achieved.
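The symmetry of "everything is a twin" can be sketched with a tiny in-memory publish/subscribe model. This is an illustrative assumption, not the IOTICS API: the class names StreamingTwin and FloodAlgorithmTwin, the callback signature, and the thresholds are all invented here.

```python
from typing import Any, Callable, Dict, List


class StreamingTwin:
    """A hypothetical twin that pushes new values to its subscribers."""

    def __init__(self, twin_id: str, metadata: Dict[str, str]) -> None:
        self.twin_id = twin_id
        self.metadata = metadata
        self._subscribers: List[Callable[[str, Any], None]] = []

    def subscribe(self, callback: Callable[[str, Any], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, value: Any) -> None:
        for callback in self._subscribers:
            callback(self.twin_id, value)


class FloodAlgorithmTwin(StreamingTwin):
    """The algorithm is itself a twin: it consumes feeds and publishes a result."""

    def __init__(self, twin_id: str, river_m: float, rain_mm: float) -> None:
        super().__init__(twin_id, {"kind": "flood_algorithm"})
        self.river_threshold_m = river_m
        self.rain_threshold_mm = rain_mm
        self._latest: Dict[str, float] = {}

    def on_river(self, source_id: str, value: float) -> None:
        self._latest["river"] = value
        self._evaluate()

    def on_rain(self, source_id: str, value: float) -> None:
        self._latest["rain"] = value
        self._evaluate()

    def _evaluate(self) -> None:
        # Wait until both inputs have arrived at least once.
        if "river" in self._latest and "rain" in self._latest:
            at_risk = (self._latest["river"] > self.river_threshold_m
                       and self._latest["rain"] > self.rain_threshold_mm)
            self.publish(at_risk)  # the "then" part: share the result onward


river = StreamingTwin("river-gauge", {"kind": "river_level", "unit": "m"})
rain = StreamingTwin("rain-forecast", {"kind": "rainfall_forecast", "unit": "mm"})
algorithm = FloodAlgorithmTwin("flood-check", river_m=2.5, rain_mm=30.0)

river.subscribe(algorithm.on_river)
rain.subscribe(algorithm.on_rain)

warnings: List[bool] = []
algorithm.subscribe(lambda twin_id, at_risk: warnings.append(at_risk))

river.publish(2.8)
rain.publish(45.0)
```

Because the algorithm exposes the same subscribe and publish interface as the data sources, other twins can consume its flood warnings in exactly the way it consumes river and rainfall data.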