RapidMiner Extensions

RapidMiner is the open-source data mining solution used within e-LICO for executing data mining operators and workflows. Within e-LICO, we have developed various extensions for RapidMiner.

Using the RapidMiner Community Extension, the user can share data mining workflows on the myexperiment.org portal.

The Image Mining Extension uses the image mining Web service provided by NHRF to execute image mining methods within RapidMiner.

The Market Basket Analysis Extension provides RapidMiner operators that build upon the association rule mining framework and add analytic capabilities beyond simple associations.

IDA Wizard

RapidMiner IDA Extension

This RapidMiner extension uses the Intelligent Discovery Assistant (IDA) developed at UZh to create data mining workflow plans directly in RapidMiner. Input objects can be loaded from the repository, and the generated plans can be opened in RapidMiner immediately. The extension is still in the beta stage.

Installation

To install, make sure you have a running and patched Flora and XSB (as described on the eProPlan page), download the jar file attached to this page, and copy it to your rapidminer/lib/plugins folder. In RapidMiner, open the preferences, navigate to the e-LICO tab, and specify the path to your Flora extension as well as an (arbitrary) temporary directory.

 

Usage

To use the extension, choose "Start IDA Wizard" from the Tools menu. You can then specify a Task and Goal and drag data sets onto the specific data requirements. When planning completes, you can open the generated process in RapidMiner and continue working with it as with any other process.

This video shows how you can use the IDA Wizard:

 

Attachment: rapidminer-IDA-5.0.000.jar (8.34 MB)

Community Workflow Sharing Tool

Using the RapidMiner Community Extension, you can share your RapidMiner workflows with data miners all over the world through the myexperiment.org portal. On the myExperiment web site you can discuss data mining processes, exchange workflows, and meet data miners working on similar problems. Based on workflows shared on myexperiment.org, we will develop tools to assist you in designing new processes.

Using the Community Extension

To show the myExperiment browser, go to the View menu in RapidMiner and enable the MyExperiment Browser tab. We recommend minimizing the tab and opening it whenever you need it. You can start browsing the workflow repository immediately, but in order to upload workflows, you need to create an account on myexperiment.org.

Browsing and Opening Workflows

Within RapidMiner, you can open all public RapidMiner workflows. You can recognize RapidMiner workflows by the "Workflow Type" entry in the upper right corner. To view a workflow on the Web, click the "Browse" button to open a Web browser showing the respective page.

 

Browsing and downloading workflows on myexperiment.org

Sharing and Uploading your Workflows

To upload your current RapidMiner process to myexperiment.org, click on upload and fill in the following dialog:

Uploading a workflow to myexperiment.org

The description is automatically extracted from the process comment if it exists. Make sure to set the sharing permissions such that people can actually see your workflow.

Installation

To install the Community Extension, download the jar file and place it in the lib/plugins directory in your RapidMiner installation directory. Alternatively, and much easier, use the RapidMiner update mechanism by choosing "Update RapidMiner" in the Help menu.

Attachment: rapidminer-Community-5.0.001.jar (61.77 KB)

R Package

The RapidMiner R extension allows integration of RapidMiner with the widely used open source statistics package R.

Eight R modelling methods are directly available as RapidMiner operators. RapidMiner ExampleSets can be directly used as input to R operators, and are internally converted to the table representation of R. Furthermore, the user can execute arbitrary R scripts as RapidMiner operators. To save the user from defining frequently used scripts over and over again for each process, they can define such re-usable scripts as custom RapidMiner operators.

For the analysis of bio-data, the R bioconductor packages are particularly relevant. These also operate with this extension, and an example process has been posted to myExperiment.

Intro Video

Download

The extension is available from the RapidMiner update server. Furthermore, more up-to-date development builds are posted on this site.

Attachment: rapidminer-R Extension-5.1.000.jar (2.27 MB)

Image Mining Operators

With this RapidMiner extension, you can use image mining Web services provided by NHRF within RapidMiner.

Using the Image Mining Operators

The image mining plugin contains two classes of operators plus three helper operators:

List Images

Use this operator to specify a list of directories which will be scanned for images. The operator then creates an example set containing one example (row) per image and three attributes (columns): "directory" (the directory from which the image was loaded), "file" (the filename of the image) and "id" (a generated id). After that, you can modify this information or filter examples before you proceed with uploading. A typical step would be to use the "directory" attribute as the label.
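The table produced by this operator can be pictured with a small Python sketch. This is only an illustration of the described output, not the operator's implementation: the function names, the set of recognized file extensions, and the sequential id scheme are all assumptions.

```python
import os

# Assumed set of image file extensions; the real operator may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def list_images(directories):
    """Return one row (dict) per image file found directly in the given directories."""
    rows = []
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            extension = os.path.splitext(name)[1].lower()
            if os.path.isfile(path) and extension in IMAGE_EXTENSIONS:
                # "id" is simply a running number here (hypothetical id scheme).
                rows.append({"directory": directory, "file": name, "id": len(rows) + 1})
    return rows


def use_directory_as_label(rows):
    """Copy each row and add a "label" column holding the last directory component."""
    return [
        dict(row, label=os.path.basename(os.path.normpath(row["directory"])))
        for row in rows
    ]
```

With one directory per class (for example a folder of healthy and a folder of diseased samples), the second function mirrors the typical step of using the directory as the label.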

Upload Images

This operator takes an example set as produced by the "List Images" operator and uploads the listed images to the server. An "image_reference" column is then appended to the example set. This reference can be used in all subsequent operators to reference or download the image.

Image Transformation

This group of operators performs image transformation methods on the server. The image is not downloaded. These operators require an "image_reference" column to exist. Each operator generates another "image_reference" column that references the result.

Feature Extraction

These operators extract features from images, transforming them into example sets (tables) that can then be further processed by RapidMiner. In principle, a feature extraction method turns each image into a table, but most of the time, this table will have only a single row. Still, these operators generate a collection of tables, which you can merge into one using the regular "Append" operator.
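The merging step can be sketched in Python as follows. The function `append_tables`, the row-as-dict representation, and the `mean_intensity` feature are hypothetical and only illustrate the idea of concatenating per-image tables with identical attributes; they are not part of RapidMiner.

```python
def append_tables(tables):
    """Concatenate tables (lists of row dicts) that all share the same attributes."""
    merged = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError("all tables must have the same attributes")
            merged.append(dict(row))
    return merged


# Hypothetical result of a feature extraction step: one single-row table per image.
per_image = [
    [{"id": 1, "mean_intensity": 0.42}],
    [{"id": 2, "mean_intensity": 0.77}],
]
example_set = append_tables(per_image)
```

After the merge, `example_set` holds one row per image and can be treated like any other table.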

Visualize Images

If you insert this operator, an image inspection dialog is shown in the result view whenever you double-click an example in a plot. This dialog shows the different (intermediate) versions of the image generated so far. More precisely, all "image_reference" attributes in the example set can be displayed.

An image mining process in RapidMiner

The image above shows a typical RapidMiner image mining process using the above-mentioned operators.

Installation

To install the Image Mining Extension, download the jar file and place it in the lib/plugins directory in your RapidMiner installation directory. Since it is still in the beta stage, it is not yet available from the RapidMiner update server.

Attachment: rapidminer-ImageMining-1.0.jar (2.75 MB)

Market Basket Analysis Operators

This extension consists of operators provided by PUT that implement three pattern mining algorithms for extended market basket analysis. These models build upon the association rule mining framework, but provide additional analytic capabilities beyond simple associations. The first model mines a transactional database for negative patterns, represented as dissociation itemsets and dissociation rules. The second model, substitutive itemsets, selects items and itemsets that can be used interchangeably as substitutes, i.e., itemsets that appear in the transactional database in very similar contexts. Finally, the third model, recommendation rules, uses an additional itemset interestingness measure, namely coverage, to construct a set of recommended items using a greedy search procedure. All operators accept the collection of discovered frequent patterns as input and produce itemsets and rules as output. The figure below shows an example of using the proposed operators within a data mining workflow inside RapidMiner.
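To give a feel for a coverage-driven greedy search, here is a minimal Python sketch. It is not the PUT implementation: it works directly on raw transactions rather than on frequent patterns, and it assumes that "coverage" means the fraction of transactions containing at least one of the chosen items. The function name and parameters are hypothetical.

```python
def greedy_recommendation(transactions, candidates, max_items):
    """Greedily pick items so that many transactions contain at least one picked item.

    Returns the chosen items (in pick order) and the coverage they reach, where
    coverage is ASSUMED to be the share of transactions hit by any chosen item.
    """
    uncovered = [set(t) for t in transactions]
    remaining = set(candidates)
    chosen = []
    while remaining and uncovered and len(chosen) < max_items:
        # Pick the item found in the most uncovered transactions;
        # sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda item: sum(item in t for t in uncovered))
        if not any(best in t for t in uncovered):
            break  # no remaining item adds coverage
        chosen.append(best)
        remaining.discard(best)
        uncovered = [t for t in uncovered if best not in t]
    covered = len(transactions) - len(uncovered)
    coverage = covered / len(transactions) if transactions else 0.0
    return chosen, coverage
```

Each round adds the item with the largest marginal gain, which is the usual greedy strategy for coverage-style objectives; the actual operators may define both the measure and the search differently.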

Installation

To install the Market Basket Analysis Extension, download the jar file and place it in the lib/plugins directory in your RapidMiner installation directory.

 

Attachment: rapidminer-MarketBasketAnalysisOperators-1.0.000.jar (45.79 KB)

Subgroup Discovery Operator

Two subgroup discovery algorithms are available in this RapidMiner extension: the SD algorithm and the CN2-SD algorithm. Both implementations are fully compatible with other RapidMiner operators. This was achieved by taking the existing RapidMiner Subgroup Discovery operator as a guideline and using the white paper provided by RapidMiner support. Some further details are available here.
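In the literature, CN2-SD is usually described as scoring candidate rules by weighted relative accuracy (WRAcc), which trades off how many examples a rule covers against how much the class distribution inside the covered set differs from the overall one. The sketch below computes that measure from four counts; whether this extension uses exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
def wracc(n_total, n_class, n_cond, n_cond_class):
    """Weighted relative accuracy of a rule 'Cond -> Class'.

    n_total:      number of examples
    n_class:      examples belonging to the target class
    n_cond:       examples covered by the rule condition
    n_cond_class: covered examples that belong to the target class
    """
    if n_total == 0 or n_cond == 0:
        return 0.0
    coverage = n_cond / n_total
    # p(Cond) * (p(Class | Cond) - p(Class))
    return coverage * (n_cond_class / n_cond - n_class / n_total)
```

A rule that covers examples with the same class distribution as the whole data set scores zero, so only subgroups that are both reasonably large and unusually distributed rank highly.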

Installation: Put the jar files in "lib/plugins".

Attachments:
rapidminer-CN2-SD-1.0.0.jar (23.06 KB)
rapidminer-SD-1.0.0.jar (23.1 KB)

Covering Feature Selection Operator

The operator selects a small set of features that is sufficient to construct a complete and consistent classifier for all examples.

The operator also supports the construction of redundant feature sets, so that more than one complete and consistent hypothesis may be generated. In addition, outliers, i.e. examples whose removal reduces the size of the minimal feature set, may be detected and eliminated from the set of examples. The included help file describes the theoretical basis of the implemented algorithm and specifies the available parameters.
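The basic covering idea can be illustrated with a short Python sketch. It is not the operator's algorithm (see the included help file for that): it assumes discrete feature values, uses a plain greedy set cover over pairs of differently labelled examples, and leaves out redundant feature sets and outlier detection. All names are hypothetical.

```python
from itertools import combinations


def covering_feature_selection(examples, labels):
    """Greedily select feature indices that separate every pair of examples
    with different labels (a simple set-cover heuristic, assumed for illustration)."""
    n_features = len(examples[0]) if examples else 0
    # Pairs of differently labelled examples that still need a distinguishing feature.
    pairs = {
        (i, j)
        for i, j in combinations(range(len(examples)), 2)
        if labels[i] != labels[j]
    }

    def separated(feature):
        """Pairs among the still-open ones that this feature tells apart."""
        return {(i, j) for (i, j) in pairs if examples[i][feature] != examples[j][feature]}

    selected = []
    while pairs:
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: len(separated(f)), default=None)
        if best is None or not separated(best):
            raise ValueError("identical examples carry different labels; no consistent set exists")
        pairs -= separated(best)
        selected.append(best)
    return selected
```

If the function returns, every pair of examples from different classes differs in at least one selected feature, which is the condition needed for a consistent classifier over those features.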

Installation: Put the jar file in "lib/plugins".

Attachments:
rapidminer-CFSwOD.jar (28.19 KB)
CFSwOD-description.doc (35.5 KB)