A Guide to IoT Rules Engines: Decision Trees
Waylay
A popular way of capturing the complexity of conditional rules is by using decision trees, which are graphs that use a branching method to illustrate every possible outcome of a decision.
Drools, best known for its forward-chaining rules engine, has an extension that integrates with decision tables, using an Excel spreadsheet in combination with snippets of embedded code to accommodate any additional logic or required thresholds.
Decision trees are useful when the number of states per variable is limited (such as binary YES/NO states) but can become overwhelming when the number of states increases. This is because the depth of the tree grows only linearly with the number of variables, while the number of leaves in a full tree is the number of states raised to the power of the number of variables.
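A minimal Python sketch of this growth, assuming a full tree that tests every variable exactly once on each path and gives every variable the same number of states (the function name `tree_size` is illustrative, not from any library):

```python
def tree_size(num_variables: int, states_per_variable: int) -> tuple[int, int]:
    """Return (depth, leaves) for a full tree testing each variable once per path."""
    depth = num_variables                              # one level per variable
    leaves = states_per_variable ** num_variables      # one leaf per combination
    return depth, leaves


# Six variables, with a growing number of states per variable.
for states in (2, 3, 5, 10):
    depth, leaves = tree_size(6, states)
    print(f"{states} states per variable: depth={depth}, leaves={leaves}")
```

The depth stays at 6 throughout, while the leaf count climbs from 64 to 1,000,000.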
With 6 Boolean variables (True or False), a truth table has 2^6 = 64 rows, so there are 2^(2^6) = 2^64 = 18,446,744,073,709,551,616 distinct functions that a decision tree may have to represent (in the literature, this is often referred to as the “hypothesis space for decision trees” problem).
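The figure can be checked with a few lines of Python. This is only a sketch of the counting argument: each of the 2^n rows of a truth table can independently be labeled True or False. The function names are illustrative.

```python
from itertools import product


def count_boolean_functions(num_variables: int) -> int:
    """Number of distinct truth tables over `num_variables` Boolean inputs."""
    rows = 2 ** num_variables      # one row per input combination
    return 2 ** rows               # each row can map to True or False


def enumerate_truth_tables(num_variables: int) -> list[dict]:
    """Explicitly list every truth table; only practical for tiny inputs."""
    inputs = list(product([False, True], repeat=num_variables))
    return [
        dict(zip(inputs, outputs))
        for outputs in product([False, True], repeat=len(inputs))
    ]


print(count_boolean_functions(6))        # 18446744073709551616
print(len(enumerate_truth_tables(2)))    # 16
```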
Majority voting isn't possible unless we branch even further, so that the multiple distinct outcomes also become part of the tree structure. Conditional execution comes out of the box: as the name suggests, decision trees are all about conditional execution.
Decision trees are never implemented as such in an IoT context. In expert systems, where decisions are outcomes of Q&A scenarios, logic would follow conditional execution, as new data (questions) are served to the decision tree engine. In an IoT context, we feed rules engines with data and expect decisions to come back as a result. In that case, we talk about decision tables, which means we feed data into the decision tables and results (decisions) come back at once.
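To make the "data in, decisions out at once" idea concrete, here is a minimal Python sketch of a decision table. The rule names, sensor fields, thresholds, and actions are all hypothetical and chosen only for illustration; they do not come from any particular rules engine.

```python
from dataclasses import dataclass
from typing import Callable

Reading = dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[Reading], bool]   # the "condition" columns of the row
    action: str                            # the "action" column of the row


# Each entry is one row of the (hypothetical) decision table.
TABLE = [
    Rule("overheat", lambda r: r["temperature"] > 80, "shut_down"),
    Rule("humid", lambda r: r["humidity"] > 90, "start_fan"),
    Rule("low_battery", lambda r: r["battery"] < 0.2, "send_alert"),
]


def evaluate(table: list[Rule], reading: Reading) -> list[str]:
    """Check every row against one reading and return all matching actions."""
    return [rule.action for rule in table if rule.condition(reading)]


decisions = evaluate(TABLE, {"temperature": 85.0, "humidity": 40.0, "battery": 0.1})
print(decisions)  # ['shut_down', 'send_alert']
```

Unlike a Q&A-style tree, nothing is asked step by step: the whole reading is supplied up front and every row is checked in one pass.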
Decision trees are easily interpretable, which makes them attractive for applications where this capability is essential (such as healthcare, among others).
Decision trees use a white box model. Important insights can be generated based on domain experts describing a situation and their preferences for outcomes. But decision trees are unstable, meaning that a small change in the data can lead to a big change in the structure of the optimal decision tree.
They are also often relatively inaccurate. Calculations can get very complex, particularly if many values are uncertain and/or if many outcomes are linked.
Decision trees cannot model uncertainty and utility functions, unless—just as with time information—we add these within the tree as decision nodes, which complicates decision tables even further.
Decision trees are easy to understand and interpret. People are able to understand decision tree models after just a brief explanation. Still, decisions cannot be seen or inspected once the rule is instantiated and are only represented as labeled “arrows” in the graph during the design phase.
When implemented as decision tables, the explicability drops further as each row in the table is a rule with each column in that row being either a condition or an action for that rule. This results in the total sequence being unclear—no overall picture is given by decision tables.
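The loss of the overall picture can be seen by flattening a small tree into table rows. The sketch below assumes a toy tree encoded as nested dictionaries; the `motion` and `dark` tests and the lighting actions are hypothetical examples.

```python
# A toy tree: internal nodes are dicts, leaves are action strings.
tree = {
    "test": "motion",
    "branches": {
        True: {"test": "dark", "branches": {True: "light_on", False: "no_action"}},
        False: "light_off",
    },
}


def flatten(node, path=()):
    """Turn every root-to-leaf path into one (conditions, action) table row."""
    if not isinstance(node, dict):          # reached a leaf: emit one row
        return [(dict(path), node)]
    rows = []
    for value, child in node["branches"].items():
        rows.extend(flatten(child, path + ((node["test"], value),)))
    return rows


rows = flatten(tree)
for conditions, action in rows:
    print(conditions, "->", action)
```

Each resulting row is a self-contained rule, but the order in which the tree asked its questions, and the fact that `dark` is only ever checked when `motion` is true, is no longer visible from the table.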
Decision trees are a popular way of reining in logical complexity. Are they useful for IoT systems? Are they scalable (no)? Let's learn more.
Decision trees are mostly used for graphical knowledge representation. It's extremely hard to build a rules engine with decision trees and even harder to build applications on top of it. They're hard to extend with any third-party systems. Also, any small change in the training data can lead to a big change in the structure of the optimal decision tree.
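The instability is easy to reproduce on a toy data set. The sketch below is an assumption-laden illustration, not a real learner: it picks only the root test, and it scores each candidate by its misclassification count, whereas practical algorithms typically use criteria such as Gini impurity or information gain. The features `a` and `b` and the data are made up.

```python
from collections import Counter


def split_errors(rows, feature):
    """Misclassifications if we split on `feature` and predict each side's majority."""
    errors = 0
    for value in (0, 1):
        labels = [row["label"] for row in rows if row[feature] == value]
        if labels:
            errors += len(labels) - Counter(labels).most_common(1)[0][1]
    return errors


def best_root(rows, features=("a", "b")):
    """Choose the feature whose split makes the fewest mistakes."""
    return min(features, key=lambda f: split_errors(rows, f))


data = [
    {"a": 0, "b": 0, "label": 0},
    {"a": 0, "b": 1, "label": 0},
    {"a": 1, "b": 0, "label": 1},
    {"a": 1, "b": 1, "label": 1},
    {"a": 0, "b": 1, "label": 1},
]

# Flip the label of a single training example.
changed = [dict(row) for row in data]
changed[2]["label"] = 0

print(best_root(data))     # a
print(best_root(changed))  # b
```

Changing one label out of five swaps the variable tested at the root, so every branch below it has to be rebuilt.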
Applying the same decision tree rule across multiple devices in the IoT domain is close to impossible, as most decision tree implementations mix logic residing in decision tables with actions defined separately in code, making it extremely difficult to manage the complete process.
Short answer: no. Decision tree rules are stateless, which means that, in theory, it should be easy to run multiple rules in parallel. However, you cannot, within one instance of a rule, distribute the load to different processes while executing that one particular rule. The fact that the depth of the tree grows linearly with the number of variables while the number of leaves in a full tree equals the number of states raised to the power of the number of variables makes decision trees hard, if not impossible, to scale. Calculations can get very complex, particularly if many values are uncertain and/or if many outcomes are linked.
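The "easy in theory" half of that answer can be sketched with Python's standard `concurrent.futures` module. The rule, device names, and threshold below are hypothetical; the point is only that a stateless rule can be applied to many readings side by side, while each individual evaluation still runs as a single unit of work.

```python
from concurrent.futures import ThreadPoolExecutor


def overheat_rule(reading: dict) -> str:
    """Stateless rule: the decision depends only on the reading passed in."""
    return "shut_down" if reading["temperature"] > 80 else "no_action"


readings = [
    {"device": "pump-1", "temperature": 85.0},
    {"device": "pump-2", "temperature": 42.0},
    {"device": "pump-3", "temperature": 91.5},
]

# Each reading is evaluated independently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    decisions = list(pool.map(overheat_rule, readings))

print(decisions)  # ['shut_down', 'no_action', 'shut_down']
```

Parallelism here comes from having many readings, not from splitting one rule evaluation across workers, which is the limitation described above.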
This article was originally published on Waylay's blog.