The default plot method for dataframes in R is to show each numeric variable in a pair-wise scatter plot. I find this to be a really useful first look at a dataset, both to see correlations and joint distributions between variables, but also to quickly diagnose potential strangeness like bands of repeating values or outliers.

From what I can tell, there are no builtins in the python data ecosystem (numpy, pandas, matplotlib) for this so I coded up a function to emulate the R behaviour. You can get it in this gist (feedback welcomed).

Here’s an example of it in action showing derived time-series features (12 hour rates of change) for some clinical variables.

plot_correlogram(df)

### Like this:

Like Loading...

*Related*

Unless I’m misunderstanding, this already exists as

pandas.tools.plotting.scatter_matrix(df, …)

See http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html#plotting-tools

Well, would you look at that. Apparently I didn’t look around enough! Thanks for the heads up.

Pingback: Distilled News | Data Analytics & R

Heres a similar implementation using dataframes with the attractive Seaborn package. Good work!

http://stanford.edu/~mwaskom/software/seaborn/examples/scatterplot_matrix.html