Are Data Historians Holding Back Industry 4.0?
Quix
People have been discussing Industry 4.0 for over a decade, but now it’s finally getting serious. The recent AI hype could be credited with reigniting the conversation.
Thanks to the extensive publicity around OpenAI, it’s commonly known that the more data you feed an AI model, the smarter it becomes. And who are the gatekeepers of data in the industrial landscape? Data historians.
The data historian market is aware of this and is rapidly modernizing. Yet, it’s still cumbersome to move data from offline OT systems into online IT systems.
In this article, we will look at the weaknesses of legacy data historians and how the market is adapting to overcome them. We will also look at whether data historians are always the best source of OT data, and when it’s better to bypass them.
So far, it’s been easy to tune out the Industry 4.0 hype. There’s been a lot of hot air, but we have also seen early adopters make eye-opening efficiency gains.
The trickle-down to small and medium-sized enterprises will soon become a flood. The SMEs that are still doing things “the way it’s always been done” will lose out because more flexible competitors will become much more efficient.
Some early adopters have figured out how to reliably and efficiently get their OT data into IT systems. Often this has involved building their own complex, proprietary solutions.
SMEs can’t afford to build their own software, so the data historian market evolved to make this transition easier. Unfortunately, most SMEs are still stuck with legacy data historians. This is a big problem.
Before criticizing legacy data historians, let’s start on a positive note. Data historians have always had a “very particular set of skills”. They are designed to record, store, and retrieve high-frequency time-series data from industrial control systems like SCADA, PLCs, and DCS.
Plant personnel can then review that data for live monitoring or to analyze historical trends. Data historians can do things that regular databases can’t, such as:
Parsing domain-specific data: OT systems like PLCs and SCADA systems have very specific ways of formatting and structuring data. Data historians are purpose-built for handling this kind of data and they’re often integrated with asset frameworks that map raw sensor signals to meaningful operational models. This makes it easier to interpret and use the data effectively.
Interfacing with older systems: Many industrial facilities rely on proprietary systems that are tightly coupled with data historians from the same vendor. This makes data historians invaluable for maintaining compatibility in environments with aging infrastructure while still providing access to critical operational data.
Tailoring the user experience to industrial engineers: Data historians often offer a comprehensive suite of features and are also designed to be intuitive for industrial engineers so they can get operational insights without advanced programming skills or IT support.
However, when it comes to Industry 4.0, they have a lot of weaknesses too, which is why the market is adapting.
There are many types of data historians, but for this article, I’ll group the product ecosystem into “traditional” and “ISV”. If you’re stuck with a legacy data historian, it’s more likely to be from a traditional vendor—although some ISVs have plenty of legacy versions floating around too.
Traditional data historians are more tightly coupled with specific hardware brands such as Siemens or Allen Bradley (Rockwell). They often rely on proprietary data formats, lack robust APIs, and require significant effort to integrate with modern IT systems and cloud-based platforms.
Some “traditional” vendors are modernizing their products to meet Industry 4.0 demands, with features like cloud integration and APIs. However, the older legacy versions (still widely in use) lack this interoperability.
Vendors in this category include the established automation hardware manufacturers, such as Siemens and Rockwell Automation (Allen Bradley).
Note that this categorization isn’t fully precise, because their data historian software offerings are often sold and function independently of their hardware lines.
The data historians marketed by ISVs tend to have a different design philosophy because they’re not tied to a specific OT system. They can adapt faster to changes in the industry and tend to prioritize flexibility, openness, and scalability.
Nowadays, they claim to offer broad interoperability, real-time data processing, and cloud integration, which makes them more naturally aligned with the goals of Industry 4.0.
Examples of ISVs include OSIsoft and Canary, both of which have been around for decades, as well as many newer vendors that have popped up to fill a growing demand for greater interoperability with cloud systems.
Both traditional vendors and ISVs are offering more “modern” data historians to address a fundamental set of problems that come with older legacy systems.
Hubert Yoshida, former CTO of Hitachi Vantara, famously wrote that “OT is from Mars and IT is from Venus”—a perfect analogy for the disconnect between operational technology (OT) and information technology (IT). Data historians sit squarely in this divide.
The strengths of older data historians are also their weaknesses. They excel at capturing and organizing domain-specific data for OT systems, such as SCADA or PLCs, but often struggle to translate this data into formats usable by IT systems.
These systems require data in structured formats like Parquet, ORC, or JSON, optimized for distributed processing and advanced analytics. Getting data from a historian into such formats typically involves bespoke ETL (Extract, Transform, Load) pipelines—processes that are both time-consuming and resource-intensive.
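To make the “bespoke ETL” point concrete, here is a minimal sketch of one such step, assuming a hypothetical CSV export from a historian with timestamp, tag name, and value columns. The file and column names are illustrative, not from any particular product.

```python
# Minimal ETL sketch: convert a hypothetical historian CSV export into Parquet so an
# IT analytics platform can query it efficiently. File and column names are illustrative.
import pandas as pd

# Load the export and parse timestamps (requires pandas plus a Parquet engine such as pyarrow).
df = pd.read_csv("historian_export.csv", parse_dates=["Timestamp"])

# Normalize the schema so downstream tools can rely on stable, lower-case column names.
df = df.rename(columns={"Timestamp": "timestamp", "TagName": "tag", "Value": "value"})

# Write a columnar file that distributed query engines can read efficiently.
df.to_parquet("historian_export.parquet", index=False)
```

Even a “simple” step like this hides the real work: getting the export out of a proprietary interface in the first place, and doing it continuously rather than as a one-off dump.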
While the broader software industry offers numerous ETL tools for IT-to-IT integrations, the OT world presents unique challenges. Data historians frequently use proprietary formats and interfaces, requiring custom integration for every specific implementation.
This lack of standardized interoperability is a significant obstacle to bridging OT and IT. The industry needs data historians that can seamlessly interface with both SCADA systems and modern IT platforms, eliminating the need for custom-built solutions.
As I said before, if you can’t efficiently pipe clean, reliable, and consolidated OT data into IT systems, you won’t be able to see the benefits of modern data-driven solutions.
The limitations of legacy data historians often lead to a phenomenon known as "data cherry-picking."
Since accessing data from these systems can be complex—due to proprietary interfaces and a lack of modern APIs—users focus on the easiest-to-access datasets, ignoring potentially valuable information. This piecemeal approach limits the scope of analysis and hinders innovation.
Legacy historians also frequently silo the data by physical assets or production areas, storing it across separate servers rather than in a unified, searchable system.
This fragmented architecture stems from older historians' inability to handle the volume of data generated by modern industry. To avoid overwhelming these systems, organizations resort to distributing data across different physical storage locations.
Many legacy systems require users to request access through system administrators, who may restrict access or discourage intensive queries to prevent system crashes. High licensing fees, often tied to the volume of monitored data, further encourage a narrow focus, discouraging comprehensive data collection.
Legacy historians also lack robust support for metadata and context. Without a framework to establish relationships between data points (e.g., which sensor belongs to which machine), users struggle to form a holistic understanding of their systems.
Instead, they rely on what’s immediately accessible and understandable, leaving critical insights unexplored.
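For illustration, here is a toy sketch of the kind of context layer that’s missing: a hand-maintained mapping from raw tag names to the assets and measurements they represent. The tag names and fields are hypothetical.

```python
# Toy illustration of asset context: mapping raw (hypothetical) historian tag names
# to the machine and measurement they belong to, so readings can be interpreted downstream.
ASSET_CONTEXT = {
    "PLC1.AI.0042": {"site": "plant-a", "machine": "compressor-3", "measurement": "vibration_mm_s"},
    "PLC1.AI.0043": {"site": "plant-a", "machine": "compressor-3", "measurement": "bearing_temp_c"},
}

def contextualize(tag: str, value: float) -> dict:
    """Attach asset metadata to a raw reading; fall back to 'unknown' if the tag isn't mapped."""
    return {**ASSET_CONTEXT.get(tag, {"machine": "unknown"}), "tag": tag, "value": value}

print(contextualize("PLC1.AI.0042", 4.7))
```

Without something like this—whether a spreadsheet, an asset framework, or a proper metadata service—every analysis starts with detective work.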
Data cherry-picking leads to inconsistent decisions and duplication of effort because everyone is working from a different dataset. If you can centralize data, you reduce the likelihood of cherry-picking.
So, are modern data historians the answer? Not entirely. While they represent a significant leap forward, they still have a few weaknesses:
Although modern historians are more user-friendly than traditional systems, they remain niche technologies requiring expertise to implement and manage effectively.
Professionals tasked with deploying these systems often lack the necessary training. This shortage of skills and familiarity can lead to incomplete implementations or inefficient data pipelines, ultimately diminishing the benefits these systems should deliver.
Modern historians can handle high-frequency data better than their predecessors, but for extremely granular data, they may still fall short. Real-time systems like Quix are often better suited for such use cases, where data needs to be processed as soon as it’s produced rather than after it’s stored.
Modern data historians’ pricing models are often tied to the volume of data signals monitored or the number of devices integrated, leading to significant expenses.
Open-source alternatives and general-purpose time-series databases may offer more cost-effective solutions, albeit without domain-specific optimizations. However, there’s still reason to be optimistic.
Some companies augment their data historians with a general-purpose time series database. It’s also possible to bypass the historian altogether.
However, I’m not telling you to ditch your current historian. I’m just saying you’ll need other tools to get to Industry 4.0. Some of them can extend your current historian, others will work alongside it.
Let’s take a closer look at the latter scenario. In some cases, you might want to ingest data closer to the source instead of from the historian. It can be a lot cheaper because you can use open-source systems that are designed for processing high-velocity data.
To understand how this works, let’s look at how data gets from machine to historian.
Whether traditional or modern, data historians rarely connect directly to machines. Instead, the path from machine to data historian typically involves several intermediary systems:
Machine → OPC Server → SCADA System → Data Historian
Each layer in this chain adds value by refining, tagging, and contextualizing data from its raw format. However, there are still situations where it’s better to bypass parts of this chain.
Each system in the chain often operates at a different sampling resolution, with granularity decreasing as data moves downstream. For example, a PLC or OPC server might expose readings many times per second, while the SCADA layer and historian may only record aggregated values every few seconds or minutes.
If your application requires high-resolution, real-time data—such as for vibration analysis, predictive maintenance, or equipment fault detection—pulling data directly from the OPC server is often the better choice. This avoids the loss of granularity introduced by downstream systems.
Ingesting data closer to the source also reduces latency and is generally cheaper. For instance, connecting directly to the OPC server bypasses licensing fees tied to data volumes stored in SCADA systems or historians.
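As a rough sketch of what “connecting directly to the OPC server” can look like, the snippet below polls a single node from an OPC UA server using the open-source asyncua library. The endpoint URL, node ID, and polling rate are placeholders; a production setup would typically use subscriptions, authentication, and proper error handling.

```python
# Sketch: poll a sensor value straight from an OPC UA server, bypassing SCADA and the historian.
# Requires the open-source "asyncua" package; the endpoint and node ID below are placeholders.
import asyncio
from asyncua import Client

OPC_URL = "opc.tcp://192.168.0.10:4840"               # hypothetical OPC UA endpoint
NODE_ID = "ns=2;s=Line1.Compressor3.Vibration"        # hypothetical node identifier

async def main():
    async with Client(url=OPC_URL) as client:
        node = client.get_node(NODE_ID)
        for _ in range(10):                            # poll ten samples at ~10 Hz
            value = await node.read_value()
            print(f"{NODE_ID} = {value}")
            await asyncio.sleep(0.1)

asyncio.run(main())
```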
It’s also cheaper to store the data because it is continuously aggregated in real-time rather than being stored in its most fine-grained form. In the IT world, this is called “shifting-left”.
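Here is a minimal, dependency-free sketch of that “shift-left” idea: aggregate raw readings into one-second windows as they arrive, so only the summaries need to be persisted. In practice this would run incrementally inside a stream processor rather than over an in-memory list, but the storage saving is the same.

```python
# Sketch: aggregate high-frequency readings into 1-second tumbling windows before storage,
# so the fine-grained stream never has to be persisted in full.
from collections import defaultdict
from statistics import mean

def tumbling_mean(samples, window_s=1.0):
    """Group (timestamp_seconds, value) pairs into windows and keep only each window's mean."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // window_s)].append(value)
    return {bucket * window_s: round(mean(values), 3) for bucket, values in sorted(buckets.items())}

raw = [(0.01, 20.1), (0.45, 20.3), (0.90, 20.2), (1.10, 20.8), (1.90, 21.0)]
print(tumbling_mean(raw))  # {0.0: 20.2, 1.0: 20.9}
```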
There are many paths to Industry 4.0, and sometimes data historians get in the way, but you can go around them. Historians are great, but their limitations can prevent companies from getting real-time insights and high-resolution analytics.
Data historians have improved, but they haven’t eliminated issues like granularity loss, proprietary complexity, and high costs. In many cases, the solution lies in rethinking how data is processed and analyzed—sometimes bypassing traditional pathways in favor of real-time capabilities.