Insightful Miner 3
Scalable data mining and analysis

What is data mining--and, more important, why should you (as a quality professional) care?

Data mining was born in marketing, where it was used to ferret out unsuspected linkages between variables in huge data sets generated by computerized cash registers in retail shopping. The classic example is the discovery that people who buy diapers also like to buy ham--so put them close together and you can increase sales of both. In science, it has revealed new insights into the ways in which industrial activity in Eurasia affects the spread of Nile Fever in the United States. It can bring to light patterns of influence on production quality that you would never have dreamed possible.

Insightful Miner 3 is Insightful Corp.’s dedicated data mining product that works either alone or hand-in-glove with its high-end analysis product S-Plus. IM3 uses the drag-and-drop visual programming approach familiar throughout this market sector and offers packaged export of code as either S-Plus scripts or fully portable ANSI C routines for use elsewhere.

The expression language is similar to that in S-Plus, and where S-Plus 6.1 is also available, there’s a library extending certain native IM3 functions to their full S-Plus versions. This library also adds S-Plus graph nodes.

IM3 features an instantly usable explorer model, with each page holding a library of components (the S-Plus library, if present, being one of these). Additional libraries can be created and managed by the user. One point to beware of, though: Although S-Plus will run on any version of Windows from 98 to NT 4.0 and upward, IM3 insists on XP Professional (I found it just as usable under XP Home Edition) or NT 4.0 SP6.

IM3 is omnivorous in its acceptance of data input. If your data set can be imported into almost any mainstream spreadsheet, database or analytical package, or any other program for which ODBC drivers are installed, then it’s accessible to Miner. The import filters have both intelligent defaults and extensive tuning controls, allowing easy navigation to specific worksheets or tables within the source. The more common sources are handled by native drivers, and even archaic records can be imported via text files. For S-Plus users, there are also dedicated programming nodes for directly reading or writing data in S-Plus chapters or transport files.

Once the data are in, there’s a good set of tools for preparatory manipulation, cleaning and evaluation. These include dual-input comparison nodes, which are usually used to compare outputs such as predicted and actual results but are also effective in trapping transcription or other input errors. Data sets can be transposed--a useful trick if used with care and forethought.
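To make the comparison and transposition ideas concrete, here is a minimal Python sketch. It is not IM3 code or its API; the function names `compare_columns` and `transpose`, and the `tolerance` parameter, are my own assumptions chosen purely for illustration.

```python
def compare_columns(first, second, tolerance=0.0):
    """Return (row, first_value, second_value) for each row where the inputs differ.

    Hypothetical stand-in for a dual-input comparison: the two inputs might be
    predicted and actual results, or two transcriptions of the same records.
    """
    if len(first) != len(second):
        raise ValueError("both inputs must have the same number of rows")
    mismatches = []
    for row, (a, b) in enumerate(zip(first, second)):
        if abs(a - b) > tolerance:
            mismatches.append((row, a, b))
    return mismatches


def transpose(rows):
    """Swap rows and columns of a rectangular data set held as a list of lists."""
    return [list(column) for column in zip(*rows)]
```

Feeding the same field from two independently keyed copies of a data set into `compare_columns` flags the rows where a transcription slip has crept in. The caution about transposition applies here too: `zip` silently truncates ragged rows, so the data should be checked for a rectangular shape first.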

In functional terms, nodes come in several classes. In addition to standard links that pass Cartesian data sets from one node to another, there is a model transfer type. Prediction, C-generation and markup language export (hypertext or the XML predictive model dialect) all sport these new ports on their input side; principal components, regressions (Cox, linear and logistic), K-means, naive Bayes, neural nets and classification or regression trees all have model ports on the output side.

IM3’s prediction node is the de facto centerpiece of the whole show. The node has twin input ports--a standard port for the data and a model port to provide the basis on which predictions will be made. Models can be copied to storage inside the prediction node or left dynamic. Results on my two industrial test bed contracts, testing predictions against known past outcomes, were impressive.
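The twin-input idea, with a model arriving separately from the data it is applied to, can be sketched in a few lines of Python. This is an illustration under my own assumptions, not IM3's implementation: a simple least-squares line stands in for the model, and `fit_line`, `predict` and `mean_abs_error` are hypothetical names.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; return the model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    """Two inputs, as on the node: the model on one side, the data on the other."""
    return [model["intercept"] + model["slope"] * x for x in xs]


def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and known outcomes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Testing against known past outcomes then amounts to fitting on one stretch of history, calling `predict` on a later stretch, and passing the result with the recorded outcomes to `mean_abs_error`.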

The worksheet offers a number of useful features, including user controls on data block memory usage. Components can be swept up together and represented as a black-box “collection node,” saving space and improving visual comprehension.

For applications development, there are several optimization and convenience features. You have an option to add a parameters table to S-Plus script properties dialogs. There is also validity checking, along with a radio button specifying where and how names, types, etc., are to be provided. There’s more, but suffice it to say that the script node is a well-implemented facility that extends the reach of IM3 models.

All in all, IM3 offers highly flexible exploration of data in a very approachable way, letting beginners achieve valuable plug-and-go results while experts move rapidly toward optimized solutions in a larger environment.

About the author

Felix Grant is a lecturer and research consultant in the United Kingdom.

Insightful Miner 3
by Insightful Corp.

Requirements: Desktop or server editions available; 256 MB RAM; 300 MB disk; Windows 2000, 2003 (desktop and server versions), NT 4.0 or XP. Also supports Microsoft Terminal Services and Sun Solaris 2.6, 7, 8 and 9.

Price: Insightful Miner Server starts at $27,000 for four users.

Insightful Corp.
1700 Westlake Ave. N., Ste. 500
Seattle, WA 98109
Phone: (206) 283-8802
Fax: (206) 283-6310
E-mail: info@insightful.com
Web: www.insightful.com