Since last year, Revolution Analytics has been publishing beta versions of Revolution R Open (RRO), and in April this year they finally released RRO 8.0.3. The current release is RRO 3.2.2 (the naming was changed to match the R version it is built upon). This post gives an introduction to my favorite new features, shows how to install RRO, and looks at performance benchmarks.
RRO 3.2.2 is based on the base R that we all know and which is published by the R Foundation. My favorite features of this new RRO version are:
- 100% compatible with R and therefore your existing R code and all packages
- Reproducible R toolkit – every version of RRO comes with a snapshot of CRAN, which means that installing packages in RRO always gives you the same version of a package as long as you have the same RRO version.
- Daily CRAN snapshots if you need a different version of a package: checkpoint("YYYY-MM-DD")
- No more problems when moving code to another machine as long as you are using the same version of RRO
- Nice package explorer, with packages also sorted by task
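To make the snapshot idea concrete, here is a minimal sketch of how a script could pin itself to one CRAN snapshot. It assumes the checkpoint package is installed; the date is only an example I picked, and the `interactive()` guard is my own addition so that sourcing the script in a batch job does not start installing packages.

```r
# Example snapshot date (an assumption for illustration, not a recommendation).
snapshot_date <- "2015-10-01"

# Fail early if the string is not a valid YYYY-MM-DD date.
stopifnot(!is.na(as.Date(snapshot_date, format = "%Y-%m-%d")))

# Only call checkpoint() when the package is available and we are in an
# interactive session, because it scans the project and installs packages.
if (requireNamespace("checkpoint", quietly = TRUE) && interactive()) {
  checkpoint::checkpoint(snapshot_date)
}
```

After the call, packages loaded with library() in that project should come from the snapshot of that day rather than from whatever happens to be installed on the machine.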
One feature that sounded very interesting, but that I could not get to work as expected, is the Intel Math Kernel Library (MKL), which is supposed to improve performance. Intel MKL contains highly vectorized and threaded functions for the fast Fourier transform, linear algebra, statistics and vector math. You just have to install it and RRO detects it automatically. As I saw when preparing this post, the library speeds up basic matrix computations, but when I tested it on real data with real packages that use matrix functions at their core, performance did not improve.
The installation of Revolution R Open is very easy:
- Download RRO
- Download MKL from the above link if you want to make use of the multithreaded performance (not required on Mac OS X)
- If you’ve never used R before and don’t have an IDE, I recommend using RStudio
The startup message in RStudio now looks a bit different (different version information and so on). The interesting part, if you have installed MKL as well, is the following line (by default, all cores of your machine are used):
Multithreaded BLAS/LAPACK libraries detected. Using 4 cores for math algorithms.
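If you want to check or change the number of threads from within R, something like the following sketch should work. `parallel::detectCores()` ships with base R. The `RevoUtilsMath` calls are, as far as I know, how RRO exposes MKL thread control, but treat them as an assumption; they are guarded so the snippet also runs on base R, where the package does not exist.

```r
# Number of logical cores R can see (may be NA on some platforms).
n_cores <- parallel::detectCores()
cat("Logical cores detected:", n_cores, "\n")

# Assumption: RRO with MKL provides the RevoUtilsMath package.
if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
  cat("MKL threads in use:", RevoUtilsMath::getMKLthreads(), "\n")
  # Restrict MKL to a single thread, e.g. for an "RRO with 1 core" run.
  RevoUtilsMath::setMKLthreads(1)
} else {
  cat("RevoUtilsMath not found; probably running base R without MKL.\n")
}
```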
As many benchmarks already report, RRO really is an improvement over base R with regard to performance, and using 4 cores decreases the run time a little bit more. These improvements only appear for certain operations, though. In general, performance gains are reported for matrix calculations and matrix functions, Cholesky factorization, singular value decomposition, principal component analysis and linear discriminant analysis. These are some benchmarks I found:
- Revolution RevoR Enterprise Benchmark Details
- How the MKL speeds up Revolution R Open
- 40-percent faster R without any code changes
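For a quick impression on your own machine, a tiny benchmark along these lines can be run once in base R and once in RRO. This is only a sketch, not one of the benchmarks linked above: the matrix size, the seed and the helper `time_op` are my own choices, and a single run per operation is noisy.

```r
set.seed(42)
n <- 500
A <- matrix(rnorm(n * n), n, n)
S <- crossprod(A) + diag(n)  # symmetric positive definite, needed for chol()

# Hypothetical helper: elapsed seconds for one call of a zero-argument function.
time_op <- function(f) unname(system.time(f())["elapsed"])

timings <- c(
  crossprod = time_op(function() crossprod(A)),  # matrix multiplication t(A) %*% A
  cholesky  = time_op(function() chol(S)),       # Cholesky factorization
  svd       = time_op(function() svd(A)),        # singular value decomposition
  pca       = time_op(function() prcomp(A))      # principal component analysis
)
print(timings)
```

Increase `n` if all timings come out close to zero; the differences between base R and RRO become more visible with larger matrices.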
I also ran all of these tasks on both my Surface 3 running Windows 10 (4 GB RAM, 4 cores) and my Mac Pro running Windows 7 (16 GB RAM, 4 cores) and observed similar improvements for the tasks mentioned above.
Motivated to try these improvements on real data and with methods one would expect to perform matrix computations, I downloaded the packages fabia (Factor Analysis for Bicluster Acquisition) and irlba (Fast Truncated SVD, PCA and Symmetric Eigendecomposition for Large Dense and Sparse Matrices). To test fabia I used the datasets that come with the package. To give irlba a real task, I used the Netflix Prize dataset, which contains movie ratings of 480,000 users for 18,000 movies. The results were disappointing: the performance of the algorithms was the same, no matter whether I used base R, RRO with 1 core or RRO with 4 cores.
I looked into the code of the fabia package and it does use matrix calculations and also the function svd (singular value decomposition), both of which should be faster in RRO (as shown above). But there is so much other code and computation around them that this is probably not the slowest part of the method anyway, so performance improvements there do not stand out at all. The irlba package uses a loop and some matrix calculations, but also lots of other code. Additionally, here we did not use an ordinary dense matrix but a sparse matrix (from the package Matrix), so I believe the Math Kernel Library cannot really improve the computations. A friend suggested the package pracma. With its function itersolve you can solve systems of linear equations. Unfortunately this function also works iteratively, and the run times are the same in all three settings.
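One way to make this reasoning concrete is Amdahl's law (my framing, not something taken from the packages): if only a fraction p of the run time is spent in code that MKL accelerates by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The fractions below are made-up illustrations, not measurements of fabia or irlba.

```r
# Overall speedup when a fraction p of the run time is sped up by a factor s.
amdahl_speedup <- function(p, s) {
  stopifnot(p >= 0, p <= 1, s > 0)
  1 / ((1 - p) + p / s)
}

# Illustrative fractions of run time spent in accelerated linear algebra.
fractions <- c(0.1, 0.5, 0.9)
round(amdahl_speedup(fractions, s = 4), 2)
# [1] 1.08 1.60 3.08
```

So if the matrix operations make up only about 10% of the run time, even a fourfold speedup of those operations yields an overall gain of roughly 8%, which is easily lost in measurement noise.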
My conclusion about the Revolution R Open performance improvements is that they only help in certain cases (mainly linear algebra), so you will not always see a gain. Since RRO does not slow other computations down or have any other disadvantages, using it will not hurt, and sometimes you will benefit from it. Furthermore, the Reproducible R Toolkit (daily snapshots of CRAN) will make your workflow a lot more comprehensible and less prone to error.
Further links:
- Introducing Revolution R Open and Revolution R Plus
- R benchmarks (R25 is often used)