The five free data mining tools listed below are as capable as many products with high price tags attached. They are in no way inferior, and most are open source with a large community of knowledgeable developers.
Knime is a widely used open source graphical workbench for data mining, visualisation and reporting, used by over 3000 organisations. Knime Desktop is the entry-level open source version of Knime (the paid-for versions are for organisations that need support and additional features). It is based on the well-regarded and widely used Eclipse IDE platform, making it as much a development platform (for bespoke extensions) as a data mining platform.
Orange is a very capable open source visualisation and analysis tool with an easy-to-use interface. Most analysis can be achieved through its visual programming interface (drag and drop of widgets), and most visual tools are supported, including scatterplots, bar charts, trees, dendrograms and heatmaps.
A large number of widgets (over 100) are supported. These cover data transformation, classification, regression, association, visualisation and unsupervised learning methods. There are also specialised add-ons covering bioinformatics, text mining and other specialist requirements. The environment is extensible through Python scripting, and this includes creating new widgets if needed.
The documentation is good too and includes first steps, detailed widget descriptions and scripting. It runs on Windows, Mac OS X and Linux.
Strictly speaking, R is a programming language, but thousands of libraries can be incorporated into the R environment, making it a powerful data mining environment. In practice R is probably the most flexible and powerful data mining environment available, but it does require a high level of skill.
From a career perspective, learning R is a good investment. Many enterprise tools support R (SAP Predictive Analysis and Tibco Spotfire, for example) and it addresses much more than data mining. Revolution Analytics has based its products on R and has added a graphical front-end. It also offers a free version of R that is claimed to be faster than the general distribution.
RapidMiner is perhaps the most widely used open source data mining platform (with over 3 million downloads). It incorporates analytical ETL (Extract, Transform and Load), data mining and predictive reporting. The graphical user interface and visualisation tools are excellent, with considerable intelligence built into the workflow construction process, which provides on-the-fly error recognition and suggested quick fixes. Its metadata transformation capability is unique among tools of this nature, allowing results to be inspected at design time.
It incorporates over 500 operators and includes the WEKA machine learning library. Many extensions are available for analysis of time series and text and other specialised processes.
Most data sources are supported, including Excel, Access, Oracle, IBM DB2, Microsoft SQL Server, Sybase, Ingres, MySQL, text files and others.
Rapid-i provides support and training services for organisations that want a supported product.
WEKA is a set of data mining tools that is incorporated into many other products (Knime and RapidMiner, for example), but it is also a stand-alone platform for many data mining tasks, including preprocessing, clustering, regression, classification and visualisation. Support for data sources is extended through Java Database Connectivity (JDBC), but the default format for data is the flat file.
WEKA comes from the highly respected machine learning group at the University of Waikato, New Zealand (the same origin as the 11AntsAnalytics Excel data mining tool).
Models can be built using a graphical user interface or a command line input.
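To make the flat-file default mentioned above more concrete, here is a minimal Python sketch that writes a tiny dataset in ARFF, the plain-text format WEKA reads natively. The helper name `to_arff`, the relation name `demo` and the three-column dataset are all hypothetical, chosen only for illustration. The sketch handles only numeric and simple nominal attributes and does no quoting or escaping.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text (hypothetical helper, illustration only).

    attributes: list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Numeric (or other built-in) attribute type
            lines.append("@attribute " + name + " " + kind)
        else:
            # Nominal attribute: allowed values listed in braces
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines += ["", "@data"]
    for row in rows:
        # One comma-separated line per instance
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Made-up example data: two numeric columns and a nominal label
attributes = [
    ("length", "numeric"),
    ("width", "numeric"),
    ("label", ["yes", "no"]),
]
rows = [(5.1, 3.5, "yes"), (6.2, 2.9, "no")]

arff_text = to_arff("demo", attributes, rows)
print(arff_text)
```

Saving the resulting text with an `.arff` extension should give a file that can be opened from the graphical interface or passed to a command-line run. Real datasets with spaces, missing values or string attributes need the quoting rules from the ARFF documentation.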