Last Updated: February 5, 2008
For statistical computing resources and other software for accurate computing, such as high-precision libraries, optimizers, and random number generators, see our statistical computing page. For software written by me for data distribution, accuracy, and replication, see my software page. For sources of research data, see my Data Resources page.
| Where to Start |
||
| The R Statistical Language | The open source statistical
language of choice for most tasks. Based on the 'S' language. Thousands
of contributed packages |
GPL |
| Other General Statistics Packages |
||
| ADE | A modular multivariate analysis program which includes modules for spatial data analysis. Plays well with R. | GPL |
| Adamsoft | A general purpose package that specializes in client-server based data management and large-data/low-memory computations. Good for large datasets. | GPL
|
| DataPlot | A powerful but somewhat byzantine package from the National Institute of Standards and Technology | OSS |
| Gretl | An open source econometrics package that plays nicely with R | GPL |
| Macanova | Reasonably powerful &
programmable, if not easy to use. |
GPL |
| PSPP | Aspires to replace SPSS. Reads SPSS files and provides the data manipulation functions, but is missing most of the analytical features. | GPL |
| WinIADAMS | A free Windows package for exploratory analysis, time series, and linear models. Nice interactive multi-dimensional table browser and interactive plots. | No
Source |
| Accurate
Statistics |
||
| (The following modules for R are very useful for highly accurate statistical computing on hard problems. For more resources and computing libraries, see my Resources for Accurate Computing page.) | ||
| accuracy | Sensitivity analysis and true random number generation | GPL |
| gmp | Multiple precision arithmetic |
GPL |
| rgenoud | Optimizer using genetic algorithms and derivatives | GPL |
| rstream | Parallelizable random number generators | GPL |
| trust | Trust region based optimization | GPL |
| UNF | Universal Numeric Fingerprints -- format independent data validation. | GPL |
| Data-Interactive Graphics, and Data
Visualization |
||
| Gauguin | Grouping, glyphs, tableplots, oh my.
|
GPL |
| GGobi | Supports data interactive
visualization, exploration, comp, and analysis. Includes automated
projection pursuit in high-dimensions. |
GPL |
| KLIMT | Interactive analysis of classification and regression trees
|
GPL |
| LabPlot |
Data analysis and visualization |
GPL |
| Mondrian | Mondrian is especially useful
for interactive visualization of categorical data, and very large
datasets. |
No
Source |
| OPEN DX | Generates visualizations and
animations for very large scale scientific data |
OSS |
| ParaView |
Parallel visualization of large
datasets. |
GPL |
| VISIT |
Parallel large data
visualization software |
|
| VISTA |
Dynamic, interactive, multi-view
graphics. Plus a very interesting visual user-interface, akin to
data-desk, but more advanced statistically. |
GPL |
| Data Visualization |
||
| Almost
all of the tools listed on this page have some sort of graphing
capabilities. These packages specialize in it. |
||
| Gnuplot | Command-line driven plots in 2D
and 3D. |
GPL |
| GUPPI | Extensible plotting tool for
Gnome. |
GPL |
| Jas3 | A visualization and curve fitting package in java. | GPL |
| SciGraphica | High performance plotting package similar to Microcal Origin. | GPL |
| Image and Plot Analysis |
||
| These
packages can be used to manipulate images and extract quantitative
information from them, including recovering data from published plots
and graphs. |
||
| DataScan | Extracts information from
topographic images, microscopic images, and others. |
OSS |
| g3data |
Specifically for extracting data
from published graphs. |
GPL |
| Image/J | Can extract data from scanned maps, charts, graphs and even photos. | OSS |
| Scion
Image |
Programmable image program with
data capture capabilities. |
No
source |
| Data Mining |
||
| Also see the next category on related text mining software | ||
| Databionic | Clustering, visualization, and classification using emergent self-organizing maps. | GPL |
| Knime | Supports data pipelines for data processing, clustering, supervised learning, etc. GUI, CLI and API based. | OSS |
| Orange | Predictive modeling, ensemble methods, clustering and validation, using C components and GUI widgets, and Python integration. | GPL |
| Rattle | A Gnome based interface that glues together a large number of (clustering, association, machine learning, evaluation) modules in R for data mining |
GPL |
| Tanagra | Supports data processing streams including clustering, supervised learning, meta-spv, and cross-validation. Provides a GUI. |
OSS |
| Text Manipulation, Management, Mining and
Analysis |
||
| A list
of commercial and non-commercial tools for qualitative analysis is part
of the open
directory project and a well-subscribed discussion list about
software can be found as part of jisc,
and a comparison of QDAS packages is here.
The ML-Interfaces package on BioConductor provides a
uniform interface to a large set of machine learning packages in R. |
||
| AnSWR | From the CDC, for mixed
qualitative/quantitative analysis. |
No
Source |
| EZ-Text |
From the CDC, for textual data
analysis. |
No
Source |
| Judge |
Performs automatic classification and clustering of documents. | GPL |
| Kea | Performs automated key phrase extraction. | GPL |
| Language Archiving Technology |
A hosted service for text
management and analysis. |
Hosted |
| Perl | The programming language for supreme text mangling. | OSS |
| SIL tools |
If you have a lot of text on-line, the concordance, indexing, and database from the Summer Institute of Linguistics may be what you need. | No
Source |
| Tabari | Uses special purpose rules for categorizing news events from news text. | GPL |
| Tams | Textual analysis and markup. |
GPL |
| TextStat | Another indexing/concordance package. | GPL |
| Weft |
For qualitative data management
and coding. |
OSS |
| Weka | Weka is a collection of machine learning algorithms for data mining, including text mining. (R-Weka connects Weka and R, and is available on CRAN). | GPL |
| YALE (now RapidMiner) | A flexible standalone package that contains many data mining algorithms. | GPL |
| Spatial Statistics and GIS |
||
| In
addition to the individual packages below, the Free GIS Site and OpenSourceGis sites maintain
lists of many open-source GIS packages. The CISSS Tools
Clearinghouse maintains links to many spatial analysis programs.
Kelly Pace gives a list of links to software
for advanced spatiotemporal econometrics. The AI-geostats software page has
links to geo-spatial statistics programs and code. And Rgeo lists lots of
contributed packages for doing geospatial statistics with R, including 'fields', 'geoR',
'graper', 'grass', and 'spatstat'. |
||
| Choroware | Choropleth maps with genetic algorithm generated class intervals. | GPL |
| CrimeStat | Network, spatial and statistical analysis for crime data. Created for the National Institute of Justice. | No Source |
| Fragstats | Designed to compute a wide variety of landscape metrics for categorical map patterns | GPL |
| Geoda | Unusual in its combination of GIS and spatial econometrics. | No
Source |
| Geovista Studio | General GIS toolkit and exploratory data analysis system | GPL |
| Grass | One of the most powerful free geographic information systems for the display of spatial data. | GPL |
| SatScan | Space-time scan statistics -- for analysis of disease and other clusters distributed in space and time | No
Source |
| SAGA | Combines GIS with kriging and terrain analysis | GPL |
| Spatial Econometrics Lib. |
A library of Matlab functions
for advanced spatial, and spatiotemporal econometric analysis |
OSS |
| STARS |
Space time analysis of regional
systems. Designed for the dynamic exploratory analysis of data measured
for areal units at multiple points in time. If you have
spatial time-series data, check this. |
GPL |
| Survey
Data Collection and Analysis |
||
| The
general software packages above have some facilities for survey
analysis. The programs below specialize in data collection and/or the
analysis of complex surveys. Also see the Epidemiology
section. |
||
| AM | Handles analysis of complex survey samples, such as NAEP and TIMSS | No
Source |
| dopoxtools | Free research web survey hosting | Hosted |
| Mod_survey | A very mature open source survey
system. It is implemented as a drop-in apache module. It supports
creation of survey templates using XML, and export of the resulting
data in a number of interchange formats. Mod_survey can be configured
in a decentralized way, so that all users on a particular web server
can administer their own surveys independently. (Also see YaaCs, below) |
GPL |
| OpenSurveyPilot | Server based web survey system |
GPL |
| PHPEsp | PHP based web survey system | GPL |
| Lime Survey | PHP based web survey system |
GPL |
| protogenie | Free research web survey hosting | Hosted |
| PsychExps | A repository of experimental design scripts to be run under the Macromedia Authorware environment. | Mixed |
| SurveyWiz | Simple JavaScript based web
survey system |
GPL |
| TESS | Time-Sharing Experiments for the Social Sciences. An NSF-funded infrastructure to provide both web and phone surveys. | Hosted |
| WebExp2 | A Java-based system for on-line psych experiments. | No
Source |
| YaaCs | A CATI system that uses Mod_survey for the data collection, and offers additional management of other phases of the survey work flow -- questionnaire building, interviewer management, etc. | GPL |
| Agent-Based Simulation |
||
| The International Society for Artificial Life maintains a list of links to many agent-based simulation frameworks. | ||
| Ascape | Agent based simulation package |
GPL |
| EVO |
A simulation environment for
co-evolution, based on SWARM |
OSS |
| MASON |
A Java-based agent-based modeling system popular in political science | OSS |
| NetLogo |
An updated dialect of the Logo
language for multi-agent simulation |
No Source |
| REPAST | A multi agent simulation
toolkit, with multiple implementations and built in adaptive features |
OSS |
| Sesam | Simulation system with cool visual model building interface. | GPL |
| Swarm | A mature, full-featured
framework for agent-based modeling, built in Objective C |
GPL |
| Dynamic Event Simulation |
||
| This overlaps with Agent-Based Simulation above. I have listed only packages below, but several programming libraries are also available, including DSOL (Java), SimPy (Python), Adevs (C++), and DeX (Python, C++, Scripting). | ||
| Desmo-J | Discrete event simulation framework | GPL |
| OMNeT++ | OMNeT++ is a component-based, modular and open-architecture simulation environment with strong GUI support and an embeddable simulation kernel, focussing on communication networks, but general enough to be used for network, systems, and business process simulation. | Academic Source License (not open source) |
| Monte Carlo and Markov-Chain Monte Carlo
Simulation |
||
| R and
many of the other general packages above can be used for MC simulation. R also has a number of modules to
perform Bayesian MCMC analysis, both directly and by communicating with
BUGS and JAGS. |
||
| JAGS |
Just Another Gibbs Sampler. A
program for Bayesian hierarchical models. ("Not unlike BUGS") |
GPL |
| MCMCpack |
An R module to perform MCMC based
analysis. Very easy to use, since it contains a large variety of
pre-configured models |
GPL |
| McSim | A specially tailored Monte Carlo
simulation package. Goes well beyond general packages. |
GPL |
| OpenBugs | Open source rewrite of BUGS for Bayesian simulation | GPL |
| WinBUGS |
Still the best BUGS for Windows,
but not OSS. |
No
Source |
| Specialized Statistical Packages | ||
| Fityk |
Nonlinear peak fitting. |
GPL |
| Gambit | Game theory made simple(r) | OSS |
| gSwing |
Election result tracking and
display |
GPL |
| M.D. Anderson Cancer Center | Has useful biostat software from
the biostats department. |
Mixed. |
| MDSX | Multidimensional Scaling Routines for Windows | No Source |
| MPCA |
Discrete and independent
component analysis. |
GPL |
| MX | Structural Equation Modeling
(like LISREL) |
No
Source |
| PAST | PAlaeontological STatistics. Not strictly social science, of course, but the correspondence analysis, geometric analysis and cladistics could be applied fruitfully. | No Source |
| TETRAD | A LISREL-like structural equation modeling program | GPL |
| TDA | Transition Data Analysis. A system for analyzing event data; supports lots of options and models | GPL |
| Voteview |
Voteview and NOMINATE are for
viewing and analyzing roll-call voting. |
GPL |
| Epidemiology | ||
| The CDC Software Page also offers a set of special packages for sampling design factors, meta-analysis, and spatial analysis. Also see the WWW Virtual Epidemiology Library. | ||
| MIX | Guided interactive meta-analysis. |
GPL |
| Epidata | Provides for programmed data
entry and simple analysis. |
No source. |
| Epigrass |
Epigrass is software for visualizing, analyzing, and simulating epidemic processes on geo-referenced networks. |
GPL
|
| Epi-info | Epidemiological statistics,
maps, reports. |
No
Source |
| Openepi |
Javascript-based (on or off-line) simple epidemiological statistics. |
OSS |
| Netepi |
Web based secure data entry and
analysis for epidemiology. |
GPL |
| WinPepi |
Over 75 modules for common epidemiological methods. |
No Source |
| Data Cleaning, and Management |
||
| For managing qualitative data, see the Text Tools section. For other database options see the Free SQL List and The ACM's Sigmod List | ||
| Berkeley DB |
A fast key-value based DB. Very
lightweight (much more lightweight than SQL, and does not require a
separate running server). Very fast for key-based retrievals. Also see
the filehash and R.huge packages
for using key-value DB's in R. |
OSS |
| CCOUNT | Does data cleaning, advanced cross-tabulation, and other market research functions. Also reads many mainframe-style data formats (e.g. EBCDIC, Column Binary). Modeled after SPSS Quantum. | GPL |
| CSPRO | Does form-based data entry,
crosstabulation, and mapping. From the U.S. Census. |
|
| HDF |
Hierarchical Data Format -- a
portable format for representing and manipulating large scientific
datasets. The latest version is compatible with netcdf. Also see the
netcdf packages for R. |
|
| MySql | One of the most mature and stable open source SQL databases. | GPL |
| netCDF |
A portable format for
representing and manipulating large scientific datasets. Also see the
netcdf package in R. |
GPL |
| PostGRES | One of the most mature and
stable open source SQL databases. |
GPL |
| R DBI |
Connects R and SQL databases. |
GPL |
| Matrix Algebra, Symbolic Algebra, and
Computational Algebra Systems |
||
| These are standalone systems. For related programmer's libraries see my Resources for Numerical Accuracy listing. The following feature comparison contrasts these and a dozen other more specialized packages. | ||
| Axiom |
Computer algebra. Lots of functions. Good documentation | GPL |
| Ginac | A computer algebra system. (C++
Library) |
GPL |
| FreeMat | Matrix algebra system. Matlab
compatibility and built-in parallelization. |
GPL |
| GAP |
Computer algebra system for
group theory. Computational discrete algebra. |
OSS |
| JACAL | A computer algebra system. | GPL |
| Magnus |
Computer algebra system for group theory. | GPL |
| Mathomatic |
Yet another computer algebra
system |
GPL |
| Maxima | A computer algebra system. | GPL |
| OCTAVE | A matrix manipulation/mathematics environment like Matlab. Mature. | GPL |
| PARI/GP | A computer algebra system with arbitrary precision arithmetic, like Maple or Mathematica. | GPL |
| RLAB | A matrix manipulation
environment. |
GPL |
| SAGE | General purpose mathematical
computing environment |
GPL |
| SciLab | A matrix manipulation/mathematics environment like Matlab. Mature. | GPL |
| Tela | Tensor computing |
GPL |
| YACAS | Yet another computer algebra
system. (Eponymous) Comes with Euler, for numerical programming. |
GPL |
| Yorick |
An older matrix language. |
OSS |
| Social Network Analysis |
||
| Also see
the Spatial category above for software with
complementary and overlapping spatial network and display features. |
||
| Egonet |
Collection and analysis of egocentric network data. |
No Source |
| GraphViz |
Mathematical graph visualization |
OSS |
| Nettvis |
Analyze and visualize social
networks. Includes an on-line service. |
GPL |
| Pajek | Graph clustering, partitioning, citation analysis, network comparison (differences, unions), metrics. | No Source |
| SocNetV |
Provides core graph measures for
social network analysis |
GPL |
| STOCNET | Analysis of some interesting
models, including evolution of social networks, blockmodeling, dyadic variable and actor analysis, maximum likelihood analysis of longitudinal (evolution of) networks (through SIENA), core network analysis. |
GPL |
| Tulip | Visualization for extremely large graphs. Plugins are available for clustering and core graph metrics. | GPL |
| VISONE | Provides core graph measures for social network analysis | No
source |
| WinMine | Bayesian and dependency (decision-tree) network builder | No
source |
| Differential Equations and Dynamic
Simulation |
||
| A good list of dynamic simulation packages is maintained by the SIAM activity group on dynamic systems. | ||
| PETC | Scientific toolkit for differential equations | No Source |
| scirun | A scientific environment for simulation and PDEs. | No
source |
| SUNDIALS | Nonlinear and differential/algebraic equation Solver | OSS |
There are some web-based statistics tutorials out there, but none that I like. I recommend some readings:
"Entia non sunt multiplicanda sine necessitate" - William of Ockham's rule
"Ad indicia spectate." - Micah's corollary
"Doing econometrics is like trying to learn the laws of electricity by playing the radio." - Orcutt's observation
"One problem with political science is that its laboratories are unsecured, allowing real people to roam around inside them, spitting in test tubes and fiddling with computers" - Walter Kirn
"You can see a lot, just by looking." - Yogi Berra
| Copyright © 1995-2008 | Micah Altman |