
Are Data Historians Holding Back Industry 4.0?

Quix

- Last Updated: February 6, 2025

People have been discussing Industry 4.0 for over a decade, but now it’s finally getting serious. The recent AI hype could be credited with reigniting the conversation. 

Thanks to the extensive publicity around OpenAI, it’s now common knowledge that the more data you feed AI, the smarter it becomes. And who are the gatekeepers of data in the industrial landscape? Data historians.

The data historian market is aware of this and is rapidly modernizing. Yet, it’s still cumbersome to move data from offline OT systems into online IT systems.

In this article, we will look at the weaknesses of legacy data historians and how the market is adapting to overcome them. We will also look at whether data historians are always the best source of OT data, and when it's better to bypass them.

Adapt or Be Left Behind

So far, it’s been easy to tune out the Industry 4.0 hype. There’s been a lot of hot air, but we have also seen early adopters make eye-opening efficiency gains.

The trickle-down to small and medium-sized enterprises will soon become a flood. The SMEs that are still doing things “the way it’s always been done” will lose out because more flexible competitors will become much more efficient.

Some early adopters have figured out how to reliably and efficiently get their OT data into IT systems. Often this has involved building complex proprietary solutions of their own.

SMEs can’t afford to build their own software, so the data historian market evolved to make this transition easier. Unfortunately, most SMEs are still stuck with legacy data historians. This is a big problem.

Legacy Data Historians

Before criticizing legacy data historians, let’s start on a positive note. Data historians have always had a “very particular set of skills”. They are designed to record, store, and retrieve high-frequency time-series data from industrial control systems like SCADA, PLCs, and DCS. 

Plant personnel can then review that data for live monitoring or to analyze historical trends. Data historians can do things that regular databases can’t, such as:

Parsing domain-specific data: OT systems like PLCs and SCADA have very specific ways of formatting and structuring data. Data historians are purpose-built for handling this kind of data, and they’re often integrated with asset frameworks that map raw sensor signals to meaningful operational models (see the sketch after this list). This makes it easier to interpret and use the data effectively.

Interfacing with older systems: Many industrial facilities rely on proprietary systems that are tightly coupled with data historians from the same vendor. This makes data historians invaluable for maintaining compatibility in environments with aging infrastructure while still providing access to critical operational data.

Tailoring the user experience to industrial engineers:  Data historians often offer a comprehensive suite of features and are also designed to be intuitive for industrial engineers so they can get operational insights without advanced programming skills or IT support.
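To make the first capability more concrete, here is a minimal sketch of what an asset-framework-style mapping might look like in Python. The tag addresses, asset hierarchy, units, and scaling factors are all hypothetical; commercial historians implement this with far richer, vendor-specific models.

```python
# Hypothetical example: attaching asset context to raw OT tag values.
# Tag addresses, asset names, and scaling factors are invented for illustration.

RAW_TAGS = {
    "PLC1.DB20.DBD4": 7234.0,  # raw counts from a flow sensor
    "PLC1.DB20.DBD8": 81.2,    # raw reading from a temperature probe
}

ASSET_MODEL = {
    "PLC1.DB20.DBD4": {"asset": "Line 3 / Pasteurizer / Flow Meter FT-301",
                       "unit": "L/min", "scale": 0.01},
    "PLC1.DB20.DBD8": {"asset": "Line 3 / Pasteurizer / Temp Probe TT-302",
                       "unit": "degC", "scale": 1.0},
}

def contextualize(raw_tags: dict) -> list[dict]:
    """Map raw tag readings to engineering units and a named asset."""
    records = []
    for tag, raw_value in raw_tags.items():
        meta = ASSET_MODEL[tag]
        records.append({
            "asset": meta["asset"],
            "value": raw_value * meta["scale"],
            "unit": meta["unit"],
        })
    return records

print(contextualize(RAW_TAGS))
```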

However, when it comes to Industry 4.0, they have a lot of weaknesses too, which is why the market is adapting.

State of the Data Historian Market

There are many types of data historians, but for this article, I’ll group the product ecosystem into “traditional” and “ISV”. If you’re stuck with a legacy data historian, it’s more likely to be from a traditional vendor—although some ISVs have plenty of legacy versions floating around too.

Traditional

Traditional data historians are more tightly coupled with specific hardware brands such as Siemens or Allen Bradley (Rockwell). They often rely on proprietary data formats, lack robust APIs, and require significant effort to integrate with modern IT systems and cloud-based platforms.

Some “traditional” vendors are modernizing their products to meet Industry 4.0 demands, with features like cloud integration and APIs. However, the older legacy versions (still widely in use) lack this interoperability.

Vendors in this category include:

  • Rockwell Automation: FactoryTalk Historian (now leveraging AVEVA PI).
  • GE Vernova: Proficy Historian.
  • Siemens: SIMATIC PCS 7 Process Historian.
  • ABB: ABB Ability™ Symphony® Plus Historian.
  • Honeywell: Honeywell Batch Historian.
  • Emerson: DeltaV Batch Historian.

Note that this categorization isn’t fully precise because these data historian software offerings are often sold and function independently of their hardware lines.

Independent Software Vendors (ISVs)

The data historians marketed by ISVs tend to have a different design philosophy because they’re not tied to a specific OT system. They can adapt faster to changes in the industry and tend to prioritize flexibility, openness, and scalability. 

Nowadays, they claim to offer broad interoperability, real-time data processing, and cloud integration which makes them more naturally aligned with the goals of Industry 4.0.

Examples of ISVs include:

  • AVEVA: AVEVA Historian (integrating the older OSIsoft PI system).
  • dataPARC: PARCserver.
  • Canary Labs: Canary Historian.
  • Factry: Factry Historian.
  • eLynx Technologies: eLynx Data Historian.
  • Inductive Automation: Tag Historian Module.
  • Prosys OPC: Prosys OPC UA Historian.
  • VROC: DataHUB+.

Although some ISVs have been around for decades (e.g. OSIsoft and Canary), many newer ISVs have popped up to fill a growing demand for greater interoperability with cloud systems.

Challenges for Data Historian Vendors

Both traditional vendors and ISVs are offering more “modern” data historians to address a fundamental set of problems that come with older legacy systems.

Interoperability Between OT and IT Systems

Hubert Yoshida, former CTO of Hitachi Vantara, famously wrote that “OT is from Mars and IT is from Venus”—a perfect analogy for the disconnect between operational technology (OT) and information technology (IT). Data historians sit squarely in this divide.

The strengths of older data historians are also their weaknesses. They excel at capturing and organizing domain-specific data for OT systems, such as SCADA or PLCs, but often struggle to translate this data into formats usable by IT systems. 

IT systems require data in structured formats like Parquet, ORC, or JSON, optimized for distributed processing and advanced analytics. Getting data from a historian into such formats typically involves bespoke ETL (Extract, Transform, Load) pipelines—processes that are both time-consuming and resource-intensive.
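To illustrate the kind of bespoke pipeline involved, here is a minimal ETL sketch that reshapes historian-style records and writes them to Parquet. The extraction step is stubbed with fake data because every historian exposes a different query interface; the tag names are invented, and the sketch assumes pandas and pyarrow are installed.

```python
# Minimal ETL sketch: historian-style records -> Parquet for an IT analytics platform.
# The extract step is a stand-in for a historian query or export; tag names are invented.
import pandas as pd

def extract() -> list[dict]:
    # Placeholder for pulling (tag, timestamp, value) triples out of the historian.
    return [
        {"tag": "FT-301.PV", "timestamp": "2025-01-01T00:00:00Z", "value": 72.3},
        {"tag": "FT-301.PV", "timestamp": "2025-01-01T00:00:01Z", "value": 72.6},
    ]

def transform(records: list[dict]) -> pd.DataFrame:
    # Normalize timestamps so downstream tools can partition and query by time.
    df = pd.DataFrame(records)
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df

def load(df: pd.DataFrame, path: str = "ot_data.parquet") -> None:
    # Requires a Parquet engine such as pyarrow.
    df.to_parquet(path, index=False)

load(transform(extract()))
```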

While the broader software industry offers numerous ETL tools for IT-to-IT integrations, the OT world presents unique challenges. Data historians frequently use proprietary formats and interfaces, requiring custom integration for every specific implementation. 

This lack of standardized interoperability is a significant obstacle to bridging OT and IT. The industry needs data historians that can seamlessly interface with both SCADA systems and modern IT platforms, eliminating the need for custom-built solutions.

As I said before, if you can’t efficiently pipe clean, reliable, and consolidated OT data into IT systems, you won’t be able to see the benefits of modern data-driven solutions.

Data Cherry-Picking and Siloed Architectures

The limitations of legacy data historians often lead to a phenomenon known as "data cherry-picking." 

Since accessing data from these systems can be complex—due to proprietary interfaces and a lack of modern APIs—users focus on the easiest-to-access datasets, ignoring potentially valuable information. This piecemeal approach limits the scope of analysis and hinders innovation.

Legacy historians also frequently silo the data by physical assets or production areas, storing it across separate servers rather than in a unified, searchable system. 

This fragmented architecture stems from older historians' inability to handle the volume of data generated by modern industry. To avoid overwhelming these systems, organizations resort to distributing data across different physical storage locations.

Many legacy systems require users to request access through system administrators, who may restrict access or discourage intensive queries to prevent system crashes. High licensing fees, often tied to the volume of monitored data, further encourage a narrow focus, discouraging comprehensive data collection.

Legacy historians also lack robust support for metadata and context. Without a framework to establish relationships between data points (e.g., which sensor belongs to which machine), users struggle to form a holistic understanding of their systems. 

Instead, they rely on what’s immediately accessible and understandable, leaving critical insights unexplored.

Data cherry-picking leads to inconsistent decisions and duplication of effort because everyone is working from a different dataset. If you can centralize data, you reduce the likelihood of cherry-picking.

A Path to Industry 4.0?

Not entirely. While modern data historians represent a significant leap forward, they still have a few weaknesses:

Skill Gaps

Although modern historians are more user-friendly than traditional systems, they remain niche technologies requiring expertise to implement and manage effectively. 

Professionals tasked with deploying these systems often lack the necessary training. This shortage of skills and familiarity can lead to incomplete implementations or inefficient data pipelines, ultimately diminishing the benefits these systems should deliver.

Granularity Trade-Offs

Modern historians can handle high-frequency data better than their predecessors, but for extremely granular data, they may still fall short. Real-time systems like Quix are often better suited for such use cases, where data needs to be processed with minimal latency.

Cost

Modern data historians' pricing models are often tied to the volume of data signals monitored or the number of devices integrated, leading to significant expenses. 

Open-source alternatives and general-purpose time-series databases may offer more cost-effective solutions, albeit without domain-specific optimizations. However, there’s still reason to be optimistic.

Don’t Rely Exclusively on a Data Historian

Some companies augment their data historians with a general-purpose time series database. It’s also possible to bypass the historian altogether.

However, I’m not telling you to ditch your current historian. I’m just saying you’ll need other tools to get to Industry 4.0. Some of them can extend your current historian, others will work alongside it.

Let’s take a closer look at the latter scenario. In some cases, you might want to ingest data closer to the source instead of from the historian. It can be a lot cheaper because you can use open-source systems that are designed for processing high-velocity data. 

To understand how this works, let’s look at how data gets from machine to historian.

Both traditional and modern data historians rarely connect directly to machines. Instead, the path from machine to data historian typically involves several intermediary systems:

Machine → OPC Server → SCADA System → Data Historian

  • The OPC server handles protocol translation, converting raw signals into a standardized format.
  • The SCADA system organizes these signals (or "tags") into meaningful structures that align with operational workflows.
  • The data historian performs further transformations to place the data into a broader hierarchy or operational context.

Each layer in this chain adds value by refining, tagging, and contextualizing data from its raw format. However, there are still situations where it’s better to bypass parts of this chain.

Why Ingest Data Closer to the Source?

Each system in the chain often operates at a different sampling resolution, with granularity decreasing as data moves downstream. For example:

  • OPC servers may provide raw, high-frequency data at millisecond intervals.
  • SCADA systems aggregate this into lower-frequency data for operational purposes.
  • Historians may store aggregated data for long-term trends.

If your application requires high-resolution, real-time data—such as for vibration analysis, predictive maintenance, or equipment fault detection—pulling data directly from the OPC server is often the better choice. This avoids the loss of granularity introduced by downstream systems.
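As a rough sketch of what reading directly from the OPC server can look like, the snippet below polls a single node with the open-source asyncua OPC UA client for Python. The endpoint URL, node id, and polling interval are hypothetical placeholders; a production setup would also configure security and forward the samples into a stream processor rather than printing them.

```python
# Sketch: polling a high-frequency value straight from an OPC UA server with asyncua.
# The endpoint URL and node id below are placeholders, not real addresses.
import asyncio
from asyncua import Client

OPC_ENDPOINT = "opc.tcp://opc-server.local:4840"   # hypothetical endpoint
VIBRATION_NODE = "ns=2;s=Line3.Pump7.Vibration"    # hypothetical node id

async def poll_vibration(interval_s: float = 0.01) -> None:
    async with Client(url=OPC_ENDPOINT) as client:
        node = client.get_node(VIBRATION_NODE)
        while True:
            value = await node.read_value()
            print(value)  # in practice, forward this sample to your streaming pipeline
            await asyncio.sleep(interval_s)

if __name__ == "__main__":
    asyncio.run(poll_vibration())
```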

Ingesting data closer to the source also reduces latency and is generally cheaper. For instance, connecting directly to the OPC server bypasses licensing fees tied to data volumes stored in SCADA systems or historians.

It’s also cheaper to store the data because it is continuously aggregated in real time rather than being stored in its most fine-grained form. In the IT world, this approach is called “shifting left”: processing data as early in the pipeline as possible.
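Here is a minimal sketch of that shift-left pattern in plain Python: high-frequency samples are rolled up into one-second averages before anything is written to storage. The simulated sensor and window length are hypothetical; in practice a streaming framework (Quix Streams, for example) would handle the windowing for you.

```python
# Sketch: aggregate high-frequency samples into 1-second averages before storage,
# so only the downsampled stream is persisted. The sensor is simulated.
import random
import time

def sample_stream():
    """Simulate a ~100 Hz sensor yielding (timestamp, value) pairs."""
    while True:
        yield time.time(), 20.0 + random.random()
        time.sleep(0.01)

def aggregate(stream, window_s: float = 1.0):
    """Emit (window_start, mean) tuples from a stream of (timestamp, value)."""
    window_start, values = None, []
    for ts, value in stream:
        if window_start is None:
            window_start = ts
        if ts - window_start >= window_s:
            yield window_start, sum(values) / len(values)
            window_start, values = ts, []
        values.append(value)

for window_start, mean_value in aggregate(sample_stream()):
    # In a real pipeline this aggregated record is what gets stored downstream.
    print(f"{window_start:.0f}s window: mean={mean_value:.2f}")
```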

Many Paths

There are many paths to Industry 4.0, and sometimes data historians get in the way, but you can go around them. Historians are great, but their limitations can prevent companies from getting real-time insights and high-resolution analytics.

Data historians have improved, but they haven’t eliminated issues like granularity loss, proprietary complexity, and high costs. In many cases, the solution lies in rethinking how data is processed and analyzed—sometimes bypassing traditional pathways in favor of real-time capabilities.
 

Need Help Identifying the Right IoT Solution?

Our team of experts will help you find the perfect solution for your needs!

Get Help