RapidMiner is the open source data mining solution used within e-Lico for executing data mining operators and workflows. Within e-Lico, we have developed various extensions for RapidMiner.
Using the RapidMiner Community Extension, the user can share data mining workflows on the myexperiment.org portal.
The Image Mining Extension uses the image mining Web service provided by NHRF to execute image mining methods within RapidMiner.
The Market Basket Analysis Extension provides RapidMiner operators that build upon the association rule mining framework but offer additional analytic capabilities beyond simple associations.
This RapidMiner extension uses the Intelligent Discovery Assistant (IDA) developed at UZh to create data mining workflow plans directly in RapidMiner. Input objects can be loaded from the repository, and the plans can be immediately loaded in RapidMiner. The extension is still in beta stage.
To install, make sure you have a running and patched Flora and XSB (as described on the eProPlan page), download the jar attached to this page, and copy it to your rapidminer/lib/plugins folder. In RapidMiner, open the preferences, navigate to the e-LICO tab, and specify the path to your Flora extension as well as an (arbitrary) temporary directory.
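The copy step is the same for every extension on this page, so it can be scripted. The following is a minimal sketch in Python, not part of any e-LICO tooling; the function name `install_plugin` is hypothetical, and the only thing taken from the text above is the lib/plugins target folder.

```python
import shutil
from pathlib import Path


def install_plugin(jar_path, rapidminer_home):
    """Copy an extension jar into <rapidminer_home>/lib/plugins.

    Returns the path of the copied file. Both arguments are supplied by
    the caller; nothing here locates RapidMiner automatically.
    """
    jar = Path(jar_path)
    if jar.suffix != ".jar" or not jar.is_file():
        raise FileNotFoundError(f"not a jar file: {jar}")

    plugins_dir = Path(rapidminer_home) / "lib" / "plugins"
    if not plugins_dir.is_dir():
        raise NotADirectoryError(f"plugins folder not found: {plugins_dir}")

    # copy2 keeps the file's timestamps and returns the destination path
    return Path(shutil.copy2(jar, plugins_dir))
```

For example, `install_plugin("rapidminer-IDA-5.0.000.jar", "/path/to/rapidminer")` would copy the downloaded jar, where `/path/to/rapidminer` stands for your own installation directory. The Flora path and temporary directory still have to be set in the RapidMiner preferences.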
To use the extension, choose "Start IDA Wizard" from the Tools menu. You can then specify a Task and Goal and drag data sets onto the specific data requirements. When planning completes, you can open the generated process in RapidMiner and continue working with it as with any other process.
This video shows how you can use the IDA Wizard:
Attachment | Size |
---|---|
rapidminer-IDA-5.0.000.jar | 8.34 MB |
Using the RapidMiner Community Extension, you can share your RapidMiner workflows with data miners all over the world using the myexperiment.org portal. On the myExperiment web site you can discuss data mining processes, exchange workflows, and meet data miners working on similar problems. Based on workflows shared on myexperiment.org, we will develop tools to assist you in designing new processes.
To show the myExperiment browser, go to the View menu in RapidMiner and enable the MyExperiment Browser tab. I recommend minimizing the tab and opening it whenever you need it. You can immediately start browsing the workflow repository, but in order to upload workflows, you need to create an account on myexperiment.org.
Within RapidMiner, you can open all public RapidMiner workflows. You can recognize the RapidMiner workflows by looking at the "Workflow Type" entry in the upper right corner. To browse the workflow on the Web, click the "Browse" button to open a Web browser showing the respective site.
To upload your current RapidMiner process to myexperiment.org, click on upload and fill in the following dialog:
The description is automatically extracted from the process comment if it exists. Make sure to set the sharing permissions such that people can actually see your workflow.
To install the Community Extension, download the jar file and place it in the lib/plugins directory in your RapidMiner installation directory. Alternatively, and much easier, use the RapidMiner update mechanism by choosing "Update RapidMiner" in the Help menu.
Attachment | Size |
---|---|
rapidminer-Community-5.0.001.jar | 61.77 KB |
The RapidMiner R extension allows integration of RapidMiner with the widely used open source statistics package R.
Eight R modelling methods are directly available as RapidMiner operators. RapidMiner ExampleSets can be directly used as input to R operators, and are internally converted to the table representation of R. Furthermore, the user can execute arbitrary R scripts as RapidMiner operators. To save the user from defining frequently used scripts over and over again for each process, they can define such re-usable scripts as custom RapidMiner operators.
For the analysis of bio-data, the R Bioconductor packages are particularly relevant. These also work with this extension, and an example process has been posted to myExperiment.
The extension is available from the RapidMiner update server. Furthermore, more up-to-date development builds are posted on this site.
Attachment | Size |
---|---|
rapidminer-R Extension-5.1.000.jar | 2.27 MB |
With this RapidMiner extension, you can use image mining Web services provided by NHRF within RapidMiner.
The image mining plugin contains two classes of operators plus three helper operators:
Use this operator to specify a list of directories which will be scanned for images. The operator will then create an example set containing one example (row) per image and three attributes (columns): "directory" (the directory from which the image was loaded), "file" (the filename of the image) and "id" (a generated id). After that, you can modify this information or filter examples before you proceed with uploading. A typical step would be to use the "directory" attribute as the label.
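To make the shape of that example set concrete, here is a small Python sketch that builds the same three columns outside RapidMiner. It is an illustration only, not the operator's implementation: the extension list, the non-recursive scan, and the sequential ids are all assumptions.

```python
import os

# Assumed set of image extensions; the operator's actual filter is not documented here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def list_images(directories):
    """Return one row (dict) per image with 'directory', 'file' and 'id'."""
    rows = []
    for directory in directories:
        # Non-recursive scan, sorted so that the generated ids are reproducible
        for name in sorted(os.listdir(directory)):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                rows.append({"directory": directory, "file": name, "id": len(rows) + 1})
    return rows


def use_directory_as_label(rows):
    """Copy the 'directory' value into a 'label' column, as suggested above."""
    return [dict(row, label=row["directory"]) for row in rows]
```

With one directory per class, `use_directory_as_label(list_images([...]))` yields rows that already carry a class label before any image is uploaded.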
This operator takes an example set as produced by the "List Images" operator and uploads them to the server. An "image_reference" column is then appended to the example set. This reference can be used in all subsequent operators to reference or download the image.
This group of operators performs image transformation methods on the server. The image is not downloaded. These operators require an "image_reference" column to exist. Each operator generates another "image_reference" column to reference the result.
These operators extract features from images, transforming them into example sets (tables) that can then be further processed by RapidMiner. In principle, a feature extraction method turns each image into a table, but most of the time, this table will have only a single row. Still, this operator generates a collection of tables, which you can merge into one using the regular "Append" operator.
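The effect of merging such a collection can be pictured with a few lines of Python. This sketch only mimics the idea of appending tables that share the same attributes; it does not call RapidMiner, and the function name and any feature names used with it are hypothetical.

```python
def append_tables(tables):
    """Concatenate tables (lists of row dicts) that share the same attributes."""
    merged = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)  # the first row fixes the expected attributes
            elif set(row) != columns:
                raise ValueError("all tables must have the same attributes")
            merged.append(row)
    return merged
```

Given one single-row table per image, for instance `[[{"image_reference": "img1", "mean": 0.5}], [{"image_reference": "img2", "mean": 0.7}]]`, the result is one table with one row per image.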
If you insert this operator, you will be presented with an image inspection dialog in the result view whenever double-clicking an example in a plot. This dialog will show the different (intermediate) versions of the image generated so far. More precisely, all "image_reference" attributes in the example set will be displayable.
The image above shows a typical RapidMiner image mining process using the above-mentioned operators.
To install the Image Mining Extension, download the jar file and place it in the lib/plugins directory in your RapidMiner installation directory. Since it is still in the beta stage, it is not yet available from the RapidMiner update server.
Attachment | Size |
---|---|
rapidminer-ImageMining-1.0.jar | 2.75 MB |
This extension consists of operators provided by PUT that implement three pattern mining algorithms for extended market basket analysis. These models build upon the association rule mining framework, but provide additional analytic capabilities beyond simple associations. The first model mines a transactional database for negative patterns, represented as dissociation itemsets and dissociation rules. The second model, substitutive itemsets, filters for items and itemsets that can be used interchangeably as substitutes, i.e., itemsets that appear in the transactional database in very similar contexts. Finally, the third model, recommendation rules, uses an additional itemset interestingness measure, namely coverage, to construct a set of recommended items using a greedy search procedure. All operators accept the collection of discovered frequent patterns as input data, and produce itemsets and rules as their outputs. The figure below shows an example of using the proposed operators within a data mining workflow inside RapidMiner.
To install the Market Basket Analysis Extension, download the jar file and place it in the lib/plugins directory in your RapidMiner installation directory.
Attachment | Size |
---|---|
rapidminer-MarketBasketAnalysisOperators-1.0.000.jar | 45.79 KB |
Two subgroup discovery algorithms are available in this RapidMiner extension: the SD algorithm and the CN2-SD algorithm. Both implementations are fully compatible with other RapidMiner operators. This was achieved by taking the existing RapidMiner Subgroup Discovery operator as a guideline and using the white paper provided by RapidMiner support. Some further details are available here.
Installation: Put the jar files in "lib/plugins".
Attachment | Size |
---|---|
rapidminer-CN2-SD-1.0.0.jar | 23.06 KB |
rapidminer-SD-1.0.0.jar | 23.1 KB |
The operator selects a small set of features from which a complete and consistent classifier for all examples can be constructed.
The operator also supports the construction of redundant feature sets, so that more than one complete and consistent hypothesis can be generated. In addition, outliers, that is, examples whose removal reduces the size of the minimal feature set, can be detected and eliminated from the set of examples. The included help file describes the theoretical basis of the implemented algorithm and specifies the available parameters.
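As a rough illustration of what "consistent" means here, the Python sketch below treats a feature set as consistent when no two examples with identical values on those features carry different class labels, and finds a smallest such set by brute force. This reading of the term and the exhaustive search are assumptions made for illustration; the operator's actual algorithm is the one described in the included help file.

```python
from itertools import combinations


def is_consistent(examples, labels, features):
    """True if examples that agree on `features` never disagree on the label."""
    seen = {}
    for example, label in zip(examples, labels):
        key = tuple(example[f] for f in features)
        # setdefault stores the first label seen for this value combination
        if seen.setdefault(key, label) != label:
            return False
    return True


def smallest_consistent_subset(examples, labels, all_features):
    """Exhaustively search for a smallest consistent feature subset.

    Exponential in the number of features, so only suitable for tiny examples.
    Returns None if even the full feature set is inconsistent.
    """
    for size in range(len(all_features) + 1):
        for subset in combinations(all_features, size):
            if is_consistent(examples, labels, subset):
                return list(subset)
    return None
```

If the search returns None, at least two examples have identical feature values but different classes; removing such examples is one way in which outlier elimination can make a consistent feature set possible.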
Installation: Put the jar file in "lib/plugins".
Attachment | Size |
---|---|
rapidminer-CFSwOD.jar | 28.19 KB |
CFSwOD-description.doc | 35.5 KB |