Resources


Sometimes finding the right tools and documentation is half the battle; these are some resources that I have found useful. Most of the software listed here is free and open source.

Scientific Computing


Python and SciPy
SciPy is an open-source library for scientific computing in the Python programming language. In combination with NumPy, Matplotlib, and the IPython shell, it provides a complete solution for scientific and numeric computing comparable with MATLAB. SciPy can be used interactively with IPython, or used directly in Python programs and scripts.
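
As a rough illustration of SciPy used directly in a script, the sketch below numerically integrates a Gaussian and compares the result with the known closed form. The choice of integrand and the variable names are assumptions made for this example; it requires NumPy and SciPy to be installed.

```python
import math

import numpy as np
from scipy import integrate


def gaussian(x):
    """Unnormalized Gaussian, exp(-x^2)."""
    return np.exp(-x ** 2)


# quad returns the integral estimate and an estimate of the absolute error.
value, abs_error = integrate.quad(gaussian, -np.inf, np.inf)

# The exact value of this integral is sqrt(pi).
exact = math.sqrt(math.pi)
print(f"numerical: {value:.10f}")
print(f"exact:     {exact:.10f}")
```

The same lines can be typed one at a time into an IPython session to explore the result interactively.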

R
R is a language and environment for statistical computing and graphics. In combination with the many available packages, it is an ideal environment for data exploration, summarization, statistical hypothesis testing, data modeling, classification, and clustering tasks. It is capable of producing publication-quality plots and figures. The ggplot2 and lattice packages provide powerful multivariate graphing and plotting.

The Learning R blog contains excellent illustrations and sample code demonstrating the kinds of graphics that can be generated in R with the lattice and ggplot2 packages. Quick-R is another valuable website for getting up to speed with R.

Octave
GNU Octave is a high-level language and interactive shell for numerical computations. The language is mostly compatible with MATLAB.

Scilab
Scilab is another high-level language and environment for numerical and scientific computation, similar to MATLAB.

Maxima
Maxima is a general-purpose computer algebra system suitable for manipulating symbolic and numerical expressions. It has several graphical front ends, including wxMaxima (cross-platform) and Cantor (KDE-based).

Sage
Sage is another free open-source computer algebra and general mathematical software system. It is Python-based and its facilities can be used from general Python programs.

Plotting and Visualization


Gnuplot
Gnuplot is a portable tool for 2D and 3D plotting and visualization that supports a wide variety of output formats.

Matplotlib
Matplotlib is a 2D plotting library for Python capable of producing publication-quality figures in a variety of formats.

Lattice and ggplot2

The lattice and ggplot2 packages allow you to create sophisticated statistical trellis plots in R. These packages are perfect for creating histograms, scatterplots, boxplots, and plotting density estimates. They are particularly good at handling multivariate and categorical data.

Processing
Sometimes trellis plots are insufficient for visualizing complex multi-dimensional datasets. In these situations, a more effective visualization can often be had by animating the data, or by plotting in three dimensions. Processing is a Java-based open-source programming language for programmatically creating images and animations. It allows you to quickly plot and visualize data in two or three dimensions, and animate this data over time.

NodeBox
NodeBox is a Mac OS X application that provides capabilities similar to Processing, except that it uses Python code instead of Java.

Graphviz
Graphviz is a suite of tools for graph visualization. It allows you to specify the nodes and edges of a graph using a simple language called DOT, and will automatically lay out the graph for you. It can produce graphical output in a variety of bitmap and vector formats. The popular OmniGraffle diagram editor can import DOT files, allowing you to tweak the output.
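
To give a feel for DOT, here is a minimal sketch that builds the DOT source for a small directed graph from a list of edges. The helper name to_dot and the pipeline stage names are hypothetical, chosen only for illustration; the script uses only the Python standard library and does not call Graphviz itself.

```python
def to_dot(name, edges):
    """Return DOT source for a directed graph given (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        # Quote node names so that names containing spaces remain valid DOT.
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pipeline stages, used only for illustration.
edges = [("parse", "analyze"), ("analyze", "plot"), ("analyze", "report")]
dot_source = to_dot("pipeline", edges)
print(dot_source)
```

If the printed text is saved to a file such as pipeline.dot, it can be rendered with the Graphviz dot command, for example: dot -Tpdf pipeline.dot -o pipeline.pdf.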

The dot2tex program can be used to generate LaTeX output from Graphviz. Using it, you can embed LaTeX math into graphs. It uses TikZ/PGF to render graphs, and so generally produces very high quality output.

TikZ/PGF
The TikZ and PGF packages allow you to programmatically create fantastic-looking technical graphics in LaTeX. The syntax can be a little tricky at first, but the graphics produced are great. The texample.net site has a wealth of illustrative examples, including a Kalman filter system diagram, a Computer Science mind map, an illustration of bootstrap resampling, a merge sort recursion tree, and a neural net.

Typesetting and Document Preparation


LaTeX
LaTeX is the standard for preparing and typesetting scientific documents. Some useful LaTeX packages include:
  • amsmath: an extension package for LaTeX that provides various features to facilitate writing math formulas and to improve the typographical quality of their output.
  • booktabs: a package for creating publication-quality tables in LaTeX.
  • hyperref: a LaTeX package for adding hyperlinks to PDF and HTML outputs. It also allows you to set the PDF metadata and create PDF tables of contents automatically.
  • microtype: enables micro-typographic extensions in pdfTeX. Often simply including this package will improve the appearance of your document.
  • TikZ and PGF: packages for programmatically creating graphics in TeX.
The LaTeX Wikibook is a free online reference for LaTeX.

Writing Tools
DeTeX is a small program to remove TeX and LaTeX constructs from a text file. It is useful for stripping out TeX formatting constructs if you need to process or compute statistics on the plain text. For example, to count the words in a LaTeX document you could run: detex paper.tex | wc -w.

GNU Aspell is a free and open source spell checker. It can handle LaTeX documents by specifying --mode=tex on the command line.

GNU style and diction are two standard UNIX commands for analyzing English and German language text documents. Diction uses heuristics to identify poor style and common language errors. Style analyses the surface characteristics of a document, reporting statistics and readability indices.

Reference Management
BibTeX simplifies formatting and managing references in LaTeX documents. Managing BibTeX databases can, however, often be a chore. BibDesk is a worthy tool for managing BibTeX databases on Mac OS X. JabRef is a decent Java-based bibliography manager.