March 28, 2015

R style default plot for Pandas DataFrame

By Corey Chivers ¶ Posted in Rstats ¶ Tagged datascience, healthcare, rstats, visualization ¶ 4 Comments

The default plot method for data frames in R shows each numeric variable against every other in a pairwise scatter plot. I find this a really useful first look at a dataset, both to see correlations and joint distributions between variables and to quickly diagnose potential strangeness like bands of repeating values or outliers.

From what I can tell, there are no built-ins in the Python data ecosystem (NumPy, pandas, matplotlib) for this, so I coded up a function to emulate the R behaviour. You can get it in this gist (feedback welcome).
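The gist itself isn't reproduced here. As a rough illustration of the idea, the following is a minimal sketch of an R `pairs()`-style grid built directly on matplotlib: numeric columns only, one scatter panel per ordered pair of variables, and the variable name on the diagonal. The function name `pairs_plot` and the toy `demo` data are hypothetical, and the layout only approximates R's (tick labels are simply hidden).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def pairs_plot(df, figsize=None, **scatter_kwargs):
    """Hypothetical helper: an R pairs()-style grid of every numeric column."""
    numeric = df.select_dtypes(include="number")  # non-numeric columns are skipped
    cols = list(numeric.columns)
    n = len(cols)
    if n == 0:
        raise ValueError("no numeric columns to plot")
    scatter_kwargs.setdefault("s", 5)  # small markers, overridable by the caller
    fig, axes = plt.subplots(n, n, figsize=figsize or (2 * n, 2 * n), squeeze=False)
    for i, y in enumerate(cols):
        for j, x in enumerate(cols):
            ax = axes[i, j]
            if i == j:
                # Like R's pairs(), the diagonal just names the variable
                ax.text(0.5, 0.5, str(x), ha="center", va="center",
                        transform=ax.transAxes)
            else:
                # Row variable on the y-axis, column variable on the x-axis
                ax.scatter(numeric[x], numeric[y], **scatter_kwargs)
            ax.set_xticks([])
            ax.set_yticks([])
    fig.tight_layout()
    return fig, axes


# Toy data: "b" is correlated with "a"; the string column is ignored
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=100)})
demo["b"] = 2 * demo["a"] + rng.normal(scale=0.5, size=100)
demo["label"] = "x"
fig, axes = pairs_plot(demo)
```

Filtering with `select_dtypes` mirrors the "each numeric variable" behaviour described above; keyword arguments such as `alpha` are passed straight through to `Axes.scatter`.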

Here’s an example of it in action, showing derived time-series features (12-hour rates of change) for some clinical variables.

plot_correlogram(df)
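The post doesn't say exactly how the 12-hour rates of change were derived. One plausible reading, assumed here, is the change in each variable since the observation 12 hours earlier, divided by 12 to give a per-hour rate (drop the division for the raw 12-hour difference). The helper `rate_of_change` and the `vitals` columns below are hypothetical toy examples, and the sketch assumes a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd


def rate_of_change(df, hours=12):
    """Hypothetical helper: change per hour over the preceding `hours` hours.

    Assumes `df` has a DatetimeIndex; rows with no observation exactly
    `hours` earlier come out as NaN.
    """
    # Shift the index forward by `hours`, then line it back up with the
    # original timestamps so each row sees the value from `hours` earlier.
    earlier = df.shift(freq=pd.Timedelta(hours=hours)).reindex(df.index)
    return (df - earlier) / hours


# Toy hourly data in which each variable rises at a constant rate
idx = pd.date_range("2015-03-28", periods=24, freq=pd.Timedelta(hours=1))
vitals = pd.DataFrame({"heart_rate": 60 + 2.0 * np.arange(24),
                       "temp": 36.5 + 0.05 * np.arange(24)}, index=idx)
rates = rate_of_change(vitals, hours=12)
```

Aligning on timestamps rather than row positions keeps the result correct when some hours are missing, at the cost of NaNs wherever no exact 12-hour-earlier reading exists.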

4 thoughts on “R style default plot for Pandas DataFrame”

Bernard says:

March 28, 2015 at 1:53 pm

Unless I’m misunderstanding, this already exists as

pandas.tools.plotting.scatter_matrix(df, …)

See http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html#plotting-tools
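For reference, current pandas releases expose the same function as `pandas.plotting.scatter_matrix`. A minimal usage sketch on made-up data (the column names and the correlation between `a` and `c` are purely illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
df["c"] += df["a"]  # give one pair a visible correlation

# Off-diagonal panels are pairwise scatter plots; diagonal="hist" (the
# default) draws a histogram of each variable on the diagonal.
axes = scatter_matrix(df, alpha=0.5, figsize=(6, 6), diagonal="hist")
```

The return value is an n-by-n NumPy array of matplotlib axes, so individual panels can be restyled afterwards.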

- Corey Chivers says:
  
  March 28, 2015 at 4:12 pm
  
  Well, would you look at that. Apparently I didn’t look around enough! Thanks for the heads up.
  
Pingback: Distilled News | Data Analytics & R
asdfdsf says:

March 30, 2015 at 9:46 am

Here’s a similar implementation using dataframes with the attractive Seaborn package. Good work!

http://stanford.edu/~mwaskom/software/seaborn/examples/scatterplot_matrix.html


bayesianbiologist

Corey Chivers on P(A|B) ∝P(B|A)P(A)
