Data Mining Work Flow Ontology

eProPlan page moved to:

A major challenge for third-generation data mining and knowledge discovery systems is the integration of the different data mining tools and services for data understanding, data integration, data preprocessing, data mining, evaluation and deployment, which are distributed across networks of computer systems. In e-Lico WP6 we are building an intelligent discovery assistant (IDA) that is intended to support end users in the difficult and time-consuming task of designing KDD workflows out of these distributed services. The assistant will support the user in checking the correctness of workflows, understanding the goals behind given workflows, enumerating workflow completions generated by an AI planner, and storing, retrieving, adapting and repairing previous workflows. It should also be an open, easily extendable system. This is achieved by basing the system on a data mining ontology (DMO) in which all services (operators) are described together with their inputs/outputs, conditions and effects.

This approach is described in:

The DMO for planning is divided into several parts:

These ontologies are developed using Protégé 4.0 (build 111 or higher).

To use Protégé 4.0 (build 111 or higher) for planning, we are developing the eProPlan plug-ins. (To use a Protégé 4.0 plug-in, first install Protégé and then place the jar file obtained via the links below into the folder named “Plugin” inside the Protégé directory.)

eProPlan movies:

Comments, enhancement proposals and bug reports can be submitted in our bug-tracker.


Updated at 22.March.2010
eProPlan-I: A new reasoner plug-in for Protégé 4.0 that combines FaCT++ for DL reasoning with the XSB-based F-logic system Flora2 for instance reasoning (including SWRL rules). To use this reasoner (by selecting it from the Reasoners menu in Protégé):

  1. You must install XSB version 3.1 (sources and binary for Windows) or 3.2 (sources and binary for Mac OS X).
  2. XSB (version 3.1 as well as 3.2) has a problem converting negative reals to strings. If you want to use negative real values in data properties, you have to store this patched string.P in $XSB/syslib/ and run make inside this directory (or run makexsb from $XSB/build after replacing string.P with our version).
  3. You must install Flora2 version 0.95 (Androcymbium). Make sure that the shell script runflora is working, since we use it.
  4. Finally, go to Protégé Preferences -> eProPlan-I and set the path to the Flora2 directory and the path to a temporary directory with r/w/x rights.
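Before filling in these preferences, it may help to check the two directories from a script. The following Python sketch is not part of eProPlan; the function name check_eproplan_paths is hypothetical, and it assumes that the runflora script sits directly inside the Flora2 directory (adjust the path if your installation differs). It only checks that runflora exists and is executable, not that it actually starts Flora2.

```python
import os
import tempfile
from pathlib import Path


def check_eproplan_paths(flora_dir: str, tmp_dir: str) -> list[str]:
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    flora = Path(flora_dir)
    # Assumption: runflora lives directly in the Flora2 installation directory.
    runflora = flora / "runflora"
    if not flora.is_dir():
        problems.append(f"Flora2 directory not found: {flora}")
    elif not runflora.is_file():
        problems.append(f"runflora script not found in {flora}")
    elif not os.access(runflora, os.X_OK):
        problems.append(f"runflora is not executable: {runflora}")

    # The temporary directory needs read, write and execute (r/w/x) rights.
    tmp = Path(tmp_dir)
    if not tmp.is_dir():
        problems.append(f"temporary directory not found: {tmp}")
    elif not os.access(tmp, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"temporary directory lacks r/w/x rights: {tmp}")
    return problems


if __name__ == "__main__":
    # Placeholder Flora2 path; replace it with your own installation directory.
    for problem in check_eproplan_paths("/opt/flora2", tempfile.gettempdir()):
        print(problem)
```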

The plug-in uses FaCT++ as the reasoner for TBox inferences, i.e. to reason about concept subsumption in the ontology. Flora-2 does the instance reasoning. It can infer concept membership based on:

  • sub-/super-concept relations,
  • concept definitions,
  • domain and range restrictions of properties, and
  • SWRL rules that conclude on concepts.

It can infer properties based on:

  • sub-properties,
  • property characteristics, e.g. transitivity, and
  • SWRL rules that conclude on properties.

It does not (not even in the simplest form):

  • propagate constraints along properties, e.g. from C :< P only R, C(a) and P(a,b) it does not infer R(b);
  • reason by cases.

It always treats differently named individuals as distinct. Thanks to XSB/Flora's tabling, it can evaluate many rules on which a "normal" Prolog interpreter would get lost in infinite recursion. In principle it supports both negation as membership in the complement of a concept or property and negation as failure; however, the Protégé 4.0 SWRL editor does not allow you to enter such rules.
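The kinds of instance inference listed above can be illustrated with a naive bottom-up fixpoint computation. The Python sketch below is neither Flora-2 nor XSB's tabling; it only shows, on a hypothetical toy ABox, sub-concept, domain/range, sub-property and transitivity inference, and why evaluating until no new fact appears terminates even on a cyclic transitive property, where a naive top-down Prolog rule such as hasPart(X,Z) :- hasPart(X,Y), hasPart(Y,Z) may loop. Concept definitions and SWRL rules are omitted, and all concept and property names are made up.

```python
def saturate(concepts, props, sub_concept, sub_prop, domain, range_, transitive):
    """Derive facts until a fixpoint is reached.

    concepts: set of (Concept, individual)
    props:    set of (property, subject, object)
    sub_concept / sub_prop: sets of (sub, super) pairs
    domain / range_: dicts mapping a property to a concept
    transitive: set of transitive property names
    """
    concepts, props = set(concepts), set(props)
    while True:
        new_c, new_p = set(), set()
        for p, a, b in props:
            # Sub-properties: P(a,b) and P sub-property of Q give Q(a,b).
            new_p.update((sup, a, b) for sub, sup in sub_prop if sub == p)
            # Domain and range restrictions give concept membership.
            if p in domain:
                new_c.add((domain[p], a))
            if p in range_:
                new_c.add((range_[p], b))
            # Transitivity: P(a,b) and P(b,d) give P(a,d).
            if p in transitive:
                new_p.update((p, a, d) for q, c, d in props if q == p and c == b)
        for c, a in concepts:
            # Sub-concepts: C(a) and C sub-concept of D give D(a).
            new_c.update((sup, a) for sub, sup in sub_concept if sub == c)
        if new_c <= concepts and new_p <= props:
            return concepts, props  # nothing new: fixpoint reached
        concepts |= new_c
        props |= new_p


concepts, props = saturate(
    concepts=set(),
    props={("hasPart", "t1", "t2"), ("hasPart", "t2", "t3"), ("hasPart", "t3", "t1")},
    sub_concept={("Composite", "Thing")},
    sub_prop={("hasPart", "relatedTo")},
    domain={"hasPart": "Composite"},
    range_={"hasPart": "Component"},
    transitive={"hasPart"},
)
```

Because the set of possible facts over a finite set of individuals is finite, each round either adds something or stops, so the cycle t1 -> t2 -> t3 -> t1 cannot cause non-termination.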


We provide a view with a console that displays all Flora2 commands and their results. The user can also type commands directly into the console.

Figure 1 : The eProPlanI Console Window to inspect and test-call the reasoner/planner


The user can choose to install XSB and Flora2 either locally or on a server. Before using the eProPlan-I reasoner, she should set the paths to both Flora2 and XSB in the eProPlan-I Preferences tab. She can also change the planner and compiler settings there.

Figure 2 : eProPlan Preferences for the local version

Figure 3 : eProPlan Preferences for the server version


Updated at 22.March.2010
eProPlan-O: The basic plug-in to edit operator conditions and effects (short description).

This Protégé 4.0 (build 111+) plug-in provides a special Class View named "eProPlan: Operator Conditions & Effects", which allows the user to edit the "condition" and "effect" annotations of subclasses of the Operator class. These annotations are the basis of the STRIPS-like planning done by eProPlan. Unlike in STRIPS, operators do not change the world; they only extend it with new objects. This is not really a restriction, since one can create a new possible world for every operator application.
Operator Conditions and Effects are like the normal Protégé 4.0 SWRL-rules with some syntax extensions and some restrictions resulting from the purpose of these rules.

Figure 1 : The eProPlanO Tab to Model Operators

Use of concept expressions: In the premises of conditions and effects you can use not only concept names as one-place predicates but also concept expressions enclosed in "[]", i.e. an atom using a concept expression
looks like [concept-expression](?Var), e.g.
[DataTable and
(targetAttribute exactly 1 Attribute) and
(inputAttribute min 1 Attribute) and
(targetColumn only (DataColumn and columnHasType only (Scalar or Categorial))) and
(inputColumn only (DataColumn and columnHasType only (Scalar or Categorial)))](?D)

Use of negation-as-failure not(atom-conjunction): In the premises of conditions and effects you can use negation as failure, e.g. not(MissingValueFreeDataTable(?D), ScaledTable(?D)), i.e. the negated atoms must be enclosed in "not(...)". (A not used inside a concept expression denotes normal DL complement reasoning, not negation as failure.)

In the conclusion of a condition, new(?this), OperatorName(?this) and all required inputs of this operator (i.e. uses and its sub-properties, parameter and its sub-properties, and simpleParameter and its sub-properties) must appear. The first argument of these inputs must be ?this; the second argument must be a variable bound by the premises (i.e. it must occur in a premise atom, and not only inside a negation as failure), e.g. uses(?this,?D).
new(?New) is a built-in that generates a new unique individual and returns it in ?New.
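The rules for the conclusion of a condition lend themselves to a mechanical check. The following Python sketch is hypothetical and is not the eProPlan rule checker: atoms are tuples such as ("uses", "?this", "?D"), a negation as failure is written ("not", atom, ...), and the caller supplies the names of the input properties (uses, parameter, simpleParameter and their sub-properties) and of the inputs the operator requires. The operator name ReplaceMissingValues in the example is made up.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def positively_bound(premises):
    """Variables occurring in a premise atom outside any not(...)."""
    bound = set()
    for atom in premises:
        if atom[0] == "not":
            continue  # occurring only inside negation as failure does not bind
        bound.update(t for t in atom[1:] if is_var(t))
    return bound


def check_condition(op_name, premises, conclusion, input_props, required):
    """Return a list of violations of the conclusion rules for a condition."""
    errors = []
    if ("new", "?this") not in conclusion:
        errors.append("missing new(?this)")
    if (op_name, "?this") not in conclusion:
        errors.append(f"missing {op_name}(?this)")
    bound = positively_bound(premises)
    present = set()
    for pred, *args in conclusion:
        if pred not in input_props:
            continue
        present.add(pred)
        if len(args) != 2 or args[0] != "?this":
            errors.append(f"{pred}: first argument must be ?this")
        elif args[1] not in bound:
            errors.append(f"{pred}: {args[1]} is not bound by the premises")
    errors.extend(f"missing required input {p}" for p in sorted(required - present))
    return errors


INPUTS = {"uses", "parameter", "simpleParameter"}
premises = [("DataTable", "?D"), ("not", ("MissingValueFreeDataTable", "?D"))]
conclusion = [("new", "?this"), ("ReplaceMissingValues", "?this"), ("uses", "?this", "?D")]
print(check_condition("ReplaceMissingValues", premises, conclusion, INPUTS, {"uses"}))
```

Here ?D is bound by DataTable(?D), so the check passes; if ?D occurred only inside the not(...), uses(?this,?D) would be reported as unbound.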

In the premises of an effect, OperatorName(?this) must occur. In the conclusion of an effect, not only new(?New) but also newFor(?New,?Old), copy(?New,?Old,atom-conjunction) and copyComplex(?New,?Old,<?V1,...,?Vn>) may occur. All variables in a conclusion atom must be bound by the premises or appear as the ?New variable of a preceding new/copy built-in. An atom that contains neither ?this nor any ?New variable from a preceding new/copy built-in is NOT allowed, i.e. we do not allow changing the previous world, but you can re-use (metadata) parts of the previous world.

newFor(?New,?Old) is a built-in that generates a new instance ?New for EACH different binding of ?Old (within ?this operator).

copy(?New,?Old,atom-conjunction) is a built-in that, for EACH different binding of ?Old (within ?this operator), generates a new instance ?New and copies everything that is STORED (not inferred) for ?Old and does NOT match any atom in the conjunction (?Old should be one argument in each atom; the other argument can be ?_ or something bound before).
copyComplex(?New,?Old,<?I1,...,?In>) is a built-in that, for EACH different binding of ?Old (within ?this operator), generates a new instance ?New and, for each object ?I related to ?Old via a sub-property of complexObjectPart(?Old,?I), a new ?NI as well, except for those ?I that are members of <?I1,...,?In> (for every binding of <?I1,...,?In> for ?Old in ?this). Everything STORED (not inferred) for ?Old or a copied ?I is copied to ?New or the corresponding new ?NI. Everything that is stored for any of the <?I1,...,?In> is not copied (whether the other argument of the property is ?Old, something different, or an ?I not in <?I1,...,?In>).
?_ is the anonymous variable that is always different (known from Prolog as _); therefore it can only be used where unbound variables are allowed, i.e. inside premises and as the second argument of a property in copy.
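To make the copy semantics concrete, here is a minimal Python sketch of copy(?New,?Old,atom-conjunction) over a set of stored facts. It is an illustration under stated assumptions, not eProPlan's compiled Flora2 code: the function names, the "new_N" naming of fresh individuals and the property names hasColumn and numberOfRows are hypothetical; newFor and copyComplex are not covered. Note that facts are only ever added, in line with the rule that the previous world is not changed.

```python
import itertools


def matches(fact, pattern, old):
    """True if a stored fact matches an exclusion atom of copy.

    In patterns, "?Old" stands for the individual being copied and "?_" is
    the anonymous variable; any other term must match literally.
    """
    if len(fact) != len(pattern) or fact[0] != pattern[0]:
        return False
    for term, pat in zip(fact[1:], pattern[1:]):
        if pat == "?_":
            continue
        if term != (old if pat == "?Old" else pat):
            return False
    return True


def copy_builtin(stored, old_bindings, exclude, ids=None):
    """Sketch of copy(?New, ?Old, exclude): one new individual per distinct ?Old.

    Only stored (asserted) facts are considered; facts are added, never removed.
    """
    ids = ids if ids is not None else itertools.count(1)
    mapping, added = {}, set()
    for old in dict.fromkeys(old_bindings):  # distinct bindings, in order
        new = f"new_{next(ids)}"
        mapping[old] = new
        for fact in stored:
            if old not in fact[1:] or any(matches(fact, p, old) for p in exclude):
                continue
            added.add((fact[0], *(new if t == old else t for t in fact[1:])))
    return mapping, added


stored = {
    ("DataTable", "d1"),
    ("ScaledTable", "d1"),
    ("hasColumn", "d1", "c1"),
    ("hasColumn", "d1", "c2"),
    ("numberOfRows", "d1", "100"),
}
mapping, added = copy_builtin(
    stored, ["d1", "d1"], [("ScaledTable", "?Old"), ("numberOfRows", "?Old", "?_")]
)
```

Although ?Old is bound to d1 twice, only one new individual is created, and the excluded ScaledTable and numberOfRows facts are not carried over to it.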
We extended the SWRL rule checkers and implemented two new ones: one for the conditions and one for the effects. The editors also support autocompletion, suggesting to the user what to type next.

Added at 24.Feb.2010

Figure 2: The eProPlanO-Edit-Dialog for Operator Conditions & Effects

eProPlan-O tab layout for download here.


Updated at 22.March.2010
eProPlan-M: A plug-in to edit the task/method decompositions used by our HTN planning approach.

To plan with HTNs, we have to model a set of tasks. The end user chooses one of them for planning (indirectly, by choosing a goal) when he presses the plan button in eProPlan-P.
Each task has a set of methods that can solve it. In the ontology this is modeled via the object property solvedBy. This plug-in offers the Task/Method decomposition view to allow easy editing.
Each method has a condition (editable via the condition/contribution view; same syntax as operator condition premises, no conclusion) that has to be satisfied for the method to be chosen. Applying a method means decomposing the task it solves into a sequence of subtasks or operator applications. This is modelled with the sub-properties step1, ..., stepn of the object property decomposedTo, which can also be edited in the Task/Method decomposition view. This is most easily understood as a very powerful grammar (we have first-order logic conditions on grammar rules and parameter passing, editable in the Method Bindings view, so we have a Turing-equivalent grammar formalism):

  • tasks are non-terminal symbols,
  • operators are terminal symbols,
  • methods are the grammar rules,
  • the plans (workflows) are the words in the language specified by this grammar,
  • planning is enumerating all words in the language
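The grammar analogy can be made concrete with a small Python sketch. It is a simplification, not the eProPlan HTN planner: method conditions are plain Python predicates over a fixed set of facts (operator effects and parameter passing are ignored), and the tasks, methods and operators in the example are hypothetical. A depth bound keeps recursive methods from expanding forever.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterator, List


@dataclass
class Method:
    name: str
    task: str                                  # the task this method solves (solvedBy)
    condition: Callable[[FrozenSet[str]], bool]
    steps: List[str]                           # step1 ... stepn: subtasks or operators


def plans(task, methods, operators, facts, depth=10) -> Iterator[List[str]]:
    """Enumerate the operator sequences (the "words") derivable from a task."""
    if depth == 0:
        return
    for m in methods:
        if m.task == task and m.condition(facts):
            yield from expand(m.steps, methods, operators, facts, depth - 1)


def expand(steps, methods, operators, facts, depth) -> Iterator[List[str]]:
    """Expand a sequence of steps left to right into operator sequences."""
    if not steps:
        yield []
        return
    head, rest = steps[0], steps[1:]
    if head in operators:                      # terminal symbol
        heads = [[head]]
    else:                                      # non-terminal: expand via its methods
        heads = plans(head, methods, operators, facts, depth)
    for prefix in heads:
        for suffix in expand(rest, methods, operators, facts, depth):
            yield prefix + suffix


OPERATORS = {"ReplaceMissingValues", "DecisionTree", "NaiveBayes"}
METHODS = [
    Method("PrepareAndLearn", "Predict", lambda f: True, ["Preprocess", "Learn"]),
    Method("Clean", "Preprocess", lambda f: "hasMissingValues" in f, ["ReplaceMissingValues"]),
    Method("NoPrep", "Preprocess", lambda f: "hasMissingValues" not in f, []),
    Method("Tree", "Learn", lambda f: True, ["DecisionTree"]),
    Method("Bayes", "Learn", lambda f: True, ["NaiveBayes"]),
]

for plan in plans("Predict", METHODS, OPERATORS, frozenset({"hasMissingValues"})):
    print(plan)
```

With missing values present, this enumerates two workflows, ReplaceMissingValues followed by either DecisionTree or NaiveBayes; without them, the NoPrep method applies and the plans consist of the learner alone.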

Added at 24.Feb.2010

Figure 1: The eProPlanM Tab to Model Tasks and Methods for HTN planning

eProPlan-M tab layout for download here.


eProPlan-P: This plug-in consists of two Protégé 4.0 Individual Views, one showing all applicable operators per IO-Object (Data, Model, Report), the other showing the current DM workflow. It also allows the user to select one of the applicable operators and apply it. This plug-in depends on the eProPlanI plug-in, since it makes several calls to the Flora2 knowledge base (getting the applicable operators, applying an operator, planning).

Applicable operators
An operator has certain conditions and effects, both of which can be inherited from superclasses. Each operator also has a type. The conditions together with the type determine whether an operator is applicable: an operator is applicable if its type is basic (not further decomposable) and its conditions can be fulfilled. In the background, the applicable operators view uses a compiler that compiles the operators' conditions and effects into Flora2, producing the file op-defs.flr. This compilation is redone whenever the conditions, the effects or the operator's type have changed and the user calls the reasoner or presses the "Infer" or "Plan" button. The file is loaded into Flora2 and then the applicableOp predicate is called. The view displays a tree consisting of three levels:

  • The first level of the tree contains all the individuals whose types are subclasses of IOObject, more precisely all the individuals which can be used by an Operator.
  • The second level of the tree consists of all applicable operators, more precisely only basic operators whose conditions are satisfied (there is a set of individuals which satisfy all the conditions).
  • The third level of the tree represents the parameter lists of the operator: the object properties uses, produces and parameter (each with its sub-properties) and the data property simpleParameter (with its sub-properties). Which parameters exactly an operator has is inferred from the operator's condition: each solution of the condition's premises produces one parameter list, corresponding to the conclusion of the condition.
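The three-level tree can be pictured as a nested mapping. The Python sketch below is only an illustration, not the compiled op-defs.flr or the applicableOp predicate: each condition is a stand-in function that yields one parameter list per solution of its premises, the type labels "basic" and "composite" and the concept HasMissingValues are placeholders, and parameter lists are grouped under the IO-object they use.

```python
from collections import defaultdict


def applicable_tree(io_objects, operators, facts):
    """Build IO-object -> applicable operator -> parameter lists.

    operators maps a name to (op_type, condition); condition(facts) yields one
    parameter list (a dict such as {"uses": "d1"}) per solution of its premises.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for name, (op_type, condition) in operators.items():
        if op_type != "basic":                 # only non-decomposable operators
            continue
        for params in condition(facts):
            tree[params["uses"]][name].append(params)
    # First level: every IO-object, even those without applicable operators.
    return {obj: dict(tree.get(obj, {})) for obj in io_objects}


def members(concept):
    """Condition stand-in: one parameter list per individual of a concept."""
    return lambda fs: ({"uses": ind} for c, ind in sorted(fs) if c == concept)


facts = {("DataTable", "d1"), ("DataTable", "d2"), ("HasMissingValues", "d2")}
operators = {
    "ReplaceMissingValues": ("basic", members("HasMissingValues")),
    "DecisionTree": ("basic", members("DataTable")),
    "Preprocessing": ("composite", members("DataTable")),  # not basic: never shown
}
tree = applicable_tree(["d1", "d2"], operators, facts)
```

In this toy setting d1 offers only DecisionTree, while d2 offers both DecisionTree and ReplaceMissingValues; the non-basic Preprocessing operator never appears, even though its condition is satisfied.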

At the top of the view there is a toolbar with three buttons that fill the view with information as follows:

  • The "Infer" button compiles the operator information or the ontology into Flora2 if anything has changed since the last compilation, and refreshes the information in the tree. The button is always enabled, since a changed ontology is recompiled whenever "Infer" is called. This only works if eProPlan-I is selected as the current reasoner.
  • The "Apply" button applies an operator with a given parameter list. To apply an operator, the user has to select one of its parameter lists (on the third level of the tree); otherwise the button is disabled. When the button is pressed, the operator is applied: the newly produced individuals are added to the ontology and shown in the tree together with their applicable operators.
  • The "Plan" button is enabled when the eProPlan-I reasoner is selected and the ontology is classified. Clicking it displays a dialog containing a tree with the available task instances. The tree contains only those individuals that are connected to a goal individual through the object property useTask. Once the user selects one of the individuals on the second level of the tree, the "Ok" button is enabled. Pressing "Ok" calls the HTN AI planner, and the resulting plan is displayed in the Plan Graph view.

Added at 24.Feb.2010

Plan Graph View
This view displays the plan as a workflow graph. Its nodes (with labels and icons) are Operator or IO-Object individuals, and its edges are the properties that connect them. At the top of the view there is a toolbar with buttons to zoom in and out and to delete a node or edge from the workflow. Figure 7 displays the eProPlan-P tab, which contains both of the views described above.
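One possible in-memory model of such a workflow graph, and of what deleting a node implies, is sketched below in Python. The class and method names are hypothetical and do not correspond to the eProPlan-P code; the point of the design is that deleting a node also deletes every edge that touches it, so the graph never holds dangling edges.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class WorkflowGraph:
    # individual -> kind of node ("Operator" or "IOObject")
    nodes: Dict[str, str] = field(default_factory=dict)
    # (property, subject, object), e.g. ("uses", "op1", "d1")
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, individual, kind):
        if kind not in ("Operator", "IOObject"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[individual] = kind

    def add_edge(self, prop, subject, obj):
        if subject not in self.nodes or obj not in self.nodes:
            raise KeyError("both ends of an edge must be nodes")
        self.edges.add((prop, subject, obj))

    def delete_node(self, individual):
        """Remove a node together with every edge that touches it."""
        del self.nodes[individual]
        self.edges = {e for e in self.edges if individual not in e[1:]}

    def delete_edge(self, prop, subject, obj):
        self.edges.discard((prop, subject, obj))


g = WorkflowGraph()
g.add_node("d1", "IOObject")
g.add_node("op1", "Operator")
g.add_node("m1", "IOObject")
g.add_edge("uses", "op1", "d1")
g.add_edge("produces", "op1", "m1")
g.delete_node("op1")
print(sorted(g.nodes), g.edges)  # ['d1', 'm1'] set()
```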

eProPlan-P tab layout for download here.


Updated at 22.March.2010
eProPlan-G: A plug-in that allows the user to specify the goal of the DM workflow and the data tables to be used. At the moment, goals have to be asserted with the normal Protégé methods for asserting individuals, e.g. by creating an individual my_goal, asserting that it is of type PredictiveModeling, and stating that the planner should use the task my_task of type Demo, i.e. the following facts:

PredictiveModeling(my_goal), useTask(my_goal, my_task), Demo(my_task).
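Written as tuples, facts like the three above are what the Plan dialog of eProPlan-P works from: assuming its first level lists goal individuals, the second level holds the task individuals linked to them via useTask. A minimal Python sketch of that lookup, with a hypothetical function name and fact representation:

```python
def task_tree(facts):
    """Map each goal individual to the task individuals it reaches via useTask."""
    tree = {}
    for fact in facts:
        if len(fact) == 3 and fact[0] == "useTask":
            _, goal, task = fact
            tree.setdefault(goal, []).append(task)
    return {goal: sorted(tasks) for goal, tasks in tree.items()}


facts = [
    ("PredictiveModeling", "my_goal"),
    ("useTask", "my_goal", "my_task"),
    ("Demo", "my_task"),
]
print(task_tree(facts))  # {'my_goal': ['my_task']}
```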


The plug-in consists of two Protégé 4.0 Individual Views: the Data Table View and the Select Table View.

Data Table View

The Data Table view consists of two parts: the top part, which displays the available data sets from the RapidI repository and calls the Data Table service, and the bottom part, which displays the resulting data table. The user should first choose the goal individual to which the new data set will belong: from the first drop-down list he chooses an object property that relates goals to data sets, and from the second drop-down list he must select one of the available individuals. There are three ways to acquire the data set descriptors (metadata): choosing a data set from the repository, specifying a link to a data set, or applying an operator to the initial data set. The last one is not yet available in our implementation. When the user presses the URL button, he is asked for a username and password (at the moment, the RapidI service for repository browsing needs authentication). If the authentication succeeds, a new dialog displays the structure of the repository.


Added at 24.Feb.2010

Figure 1: The authentication window & the repository structure

If the user does not press the URL button but simply types a valid URL into the text field, the data at this URL will be analyzed. When the Analyze button is pressed, a table is displayed with all the information necessary to describe the data table. The data table obtained from calling the web service is displayed at the bottom of the view. The data table is in fact an individual with several properties. The user can edit some of the columns, such as the type of the columns and the role of the attributes, and can also replace the current attribute with others belonging to the same data table format. When the user presses "Store", the obtained set of individuals with their characteristics (including the modifications made while editing) is added to the currently active ontology. Axioms asserting the format of the table and the attributes of this format are added as well.

Added at 24.Feb.2010

Figure 2: The eProPlan Data Table Analyser Interface

Select Table View
This view displays the contents of each table. It consists of a Protégé Individual View that is available whenever the user selects an individual of type Data Table, and thus helps the user explore and compare the contents of several tables. Unlike the Data Table view, it is read-only: the user can view the table but not edit it.


eProPlan-G tab layout for download here.


InstanceGraph Protégé plug-in: This started as a small experiment in selecting a graph library for eProPlan-P's workflow view, but it turned out to be quite useful for inspecting individuals. It uses jGraph, a Java open-source graph drawing component, and also jGraph LayoutPro, which is not open source (commercial for commercial use, free for research). Therefore this plug-in only contains a dummy layout (random placement), but you automatically get jGraph LayoutPro's hierarchical layout if you replace the jgraphlayout.jar inside our ch.uzh.ifi.ddis.instgraph_1.0.0.jar with the jgraphlayout.jar demo version or a licensed version acquired from jGraph.