R's need for speed: plotting millions of points in seconds!
Have you ever had to generate a scatterplot with one million points or more? As a bioinformatician working in academia, and specifically on large datasets, this happens to me almost daily.
My tool of choice for plotting is always R, and more specifically the grammar-of-graphics package ggplot2. But when handling such large amounts of data, I always hit quite the bottleneck: plotting can take forever.
Until now, I usually plotted just a few randomly selected points while fixing the figure style. Only then would I generate the final plot using all data points, sometimes waiting more than 5 min for it to be generated and exported to a PNG file.
Today, I finally got tired of it and went down a rabbit hole of DDG searches (yes, I use DuckDuckGo, and you should too!). Here is what I unearthed.
How fast is plotting with R and ggplot2?
Let’s start by generating a dataset of 1 million X and Y coordinates, normally distributed:
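The original code for this step is not shown here, so the following is only a minimal sketch of how such a dataset could be simulated. The seed, the variable name `d`, and the use of a standard normal distribution (mean 0, sd 1) are assumptions.

```r
# Simulate 1 million (x, y) points, each coordinate drawn from N(0, 1).
set.seed(42)   # assumed seed, only so the example is reproducible
n <- 1e6
d <- data.frame(x = rnorm(n), y = rnorm(n))

# Quick sanity check of the result: 1e6 rows, 2 columns.
dim(d)
```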
How long would the default R plot() and ggplot methods take to plot this?
And here is our starting point: R would take around 11.5 s and ggplot even longer, with ~13.6 s.
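A comparison like this one could be timed roughly as follows. This is a sketch, not the exact benchmark behind the numbers above: the helper name `time_png`, the 800x800 PNG size, and the seed are assumptions, and timings will differ from machine to machine.

```r
set.seed(42)
d <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

# Hypothetical helper: time how long it takes to draw a plot into a PNG file.
time_png <- function(draw) {
  out <- tempfile(fileext = ".png")
  on.exit(unlink(out))
  system.time({
    png(out, width = 800, height = 800)
    draw()
    dev.off()
  })[["elapsed"]]
}

# Base R scatterplot with default point symbols.
t_base <- time_png(function() plot(d$x, d$y))

# ggplot2 scatterplot, timed only if the package is installed.
t_gg <- NA_real_
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d, ggplot2::aes(x, y)) + ggplot2::geom_point()
  t_gg <- time_png(function() print(p))
}

c(base = t_base, ggplot2 = t_gg)
```

Note that a ggplot object is only rendered when it is printed, which is why the `print(p)` call sits inside the timed section.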
Using pch='.' is fast (!!!)
One of the tips I found on the web comes from a Stack Overflow answer recommending the pch='.' option, which plots data points as non-aliased single pixels.
This provides a ~5x speed up, from 13.6 s to less than 3 s!
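As a rough illustration of the pch='.' tip, here is a minimal sketch using base R graphics. The seed, the PNG size, and the variable names are assumptions.

```r
set.seed(42)
d <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

out <- tempfile(fileext = ".png")
t_dot <- system.time({
  png(out, width = 800, height = 800)
  plot(d$x, d$y, pch = ".")   # each point is drawn as a single pixel
  dev.off()
})[["elapsed"]]
unlink(out)

t_dot
```

In ggplot2, the closest equivalent I am aware of is `geom_point(shape = ".")`, which should behave similarly, though I have not benchmarked it here.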
scattermore is faster
Then, I found another Stack Overflow answer, with a user recommending his new R package scattermore (last commit to the package was on Jan 31st, 2021, at the time of writing this post), which uses compiled C code to rasterize the dots into a bitmap and then plots that bitmap with R.
The overall speedup is now ~13x: from 13.55 s to ~1 s!
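For reference, a minimal sketch of how scattermore might be used and timed. It assumes the package is installed (the code skips the plot otherwise) and uses its `scattermoreplot()` function, a base-graphics-style entry point. The seed and the PNG size are assumptions.

```r
set.seed(42)
d <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

out <- tempfile(fileext = ".png")
t_sm <- NA_real_
if (requireNamespace("scattermore", quietly = TRUE)) {
  t_sm <- system.time({
    png(out, width = 800, height = 800)
    # Points are rasterized into a bitmap first, then drawn as one image.
    scattermore::scattermoreplot(d$x, d$y)
    dev.off()
  })[["elapsed"]]
  unlink(out)
}

t_sm
```

For ggplot2 users, the package also provides `geom_scattermore()`, which can stand in for `geom_point()` in an existing plot, e.g. `ggplot(d, aes(x, y)) + geom_scattermore()`.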
So, if you have to plot a huge number of points in a scatterplot, as I often do, I would highly recommend using scattermore. And a huge shoutout to exaexa for implementing and sharing this amazingly fast package!
If you had already heard about this package, good for you. Otherwise, I hope this piece helped you somehow ☺️ Peace out ✌️