The default plot method for dataframes in R is to show each numeric variable in a pair-wise scatter plot. I find this to be a really useful first look at a dataset, both to see correlations and joint distributions between variables, but also to quickly diagnose potential strangeness like bands of repeating values or outliers.
From what I can tell, there are no builtins in the python data ecosystem (numpy, pandas, matplotlib) for this so I coded up a function to emulate the R behaviour. You can get it in this gist (feedback welcomed).
Here’s an example of it in action showing derived time-series features (12 hour rates of change) for some clinical variables.
plot_correlogram(df)
Unless I’m misunderstanding, this already exists as
pandas.tools.plotting.scatter_matrix(df, …)
See http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html#plotting-tools
Well, would you look at that. Apparently I didn’t look around enough! Thanks for the heads up.
Pingback: Distilled News | Data Analytics & R
Heres a similar implementation using dataframes with the attractive Seaborn package. Good work!
http://stanford.edu/~mwaskom/software/seaborn/examples/scatterplot_matrix.html