====== Michel Jan Marinus Bieleveld ======
{{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:ddc74ee1fe4bb8d915e7b781a6efe5f61d11bd2a-standard.jpg?nolink& |}}
Doutorado no Laboratório de Automação Agrícola da Escola Politécnica da USP.
O título de minha tese é: ".?.", orientado pelo prof. Antonio Mauro Saraiva.
===== Meus Exercícios =====
Linque para a página com os meus exercícios resolvidos;
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:excercise1.r|Aula 1 - Introdução ao R}}
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:excercise2.r|Aula 2 - Funções matemáticas}}
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:exercise3.r|Aula 3 - Leitura e manipulação de dados}}
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:exercicio4.r|Aula 4 - Análise exploratória}}
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:exercicio5.r|Aula 5 - Criação e edição de gráficos}}
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:exercicio6v2.r|Aula 6 - Testes de significância}}
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:exercicio7.r|Aula 7 - Regressão linear simples}}
* [[exercicios8|Aula 8 - Regressão linear múltipla]]
* {{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:exercicio9v2.r|Aula 9 - Construção de funções simples}}
===== Debugging =====
[[debug|A small debugging tutorial with debug()]]
===== Final work =====
I have created a package and have considered the commentaries of Leo and Diago. Therefore I have let go the requirement of improving the speed of the matrix multiplication. Still I have learned considerable amount of working with packages, compiling c++ code on windows for R and documenting the functions. I agree that the C++ version has perhaps not to do so much with R itself and have therefore added a performance measurement function completely in R. Without further ado here are the links to the package and documentation. In addition since the supplied binary will only work on Windows 64bit with R version 2.15 I have added a screenshot that shows the installation, loading, help file and running a build in example.
{{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:screenshot.png?direct&300|}}
{{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:bie5782_1.0.zip|}}
{{:bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:bie5782-manual.pdf|}}
==== Installation and loading of the package ====
Download the package and set your R working directory to the same directory as where you saved the package. Because we will not be using a repository we will have to supply an additional argument to the install.packages command as follows;
> install.packages(repos=NULL,"bie5782_1.0.zip")
Installing package(s) into ‘C:/Program Files/R/R-2.15.0/library’
(as ‘lib’ is unspecified)
package ‘bie5782’ successfully unpacked and MD5 sums checked
The library can from now on be loaded like any other library with;
> library("bie5782")
Loading required package: Rcpp
==== Help and build in examples====
The package and function documentation is provided as a normal help page as for any other R function. Furthermore, the example as described in the documentation can be run as for any other R function. The following help pages and examples are defined;
?bie5782
?mmm
?performanceMultiplicationTest
example(mmm)
example(performanceMultiplicationTest)
The documentation looks as follows for the mmm function;
Performs matrix multiplication through C++ interface and code.
mmm(a,b)
a |
an n x m matrix |
b |
an m x p matrix |
mmm.nrows = 5 mmm.ncols = 5 mmm.a <- matrix(nrow=mmm.nrows, ncol=mmm.ncols,runif(mmm.nrows*mmm.ncols)) mmm.b <- matrix(nrow=mmm.nrows, ncol=mmm.ncols,runif(mmm.nrows*mmm.ncols)) (mmm.a %*% mmm.b) (mmm(mmm.a,mmm.b))The documentation looks as follows for the mm function;
Plots a performance graph of a function compared to the basic matrix multiplication function.
performanceMultiplicationTest(func,nsize,nrep)
func |
a function that accepts two matrices as arguments, func(x,y) |
nsize |
the range of matrix sizes under test, defaults to seq(10, 1000, 100) |
nrep |
the number of repetition for each measurement, defaults to 5 |
performanceMultiplicationTest(mmm,nsize=seq(10, 1000, 100),nrep=5)==== Measured results ==== Luckily enough for everyone that is using R my function performs worse than the default matrix multiplication function already present in R. To measure this performance I have written a function that measure the amount of time that it takes to do such a multiplication. The function is named performanceMultiplicationTest and accept as arguments the function under test, the size of the nxn matrix that need to be tested and amount of repetitions for each measurement. The last to make sure we have accurate measurements and that there is not on rogue measurement due to increased CPU usage by other applications. As a result the function produced the following graphs which clearly shows the performance of my C++ function in red and the default matrix multiplication function in blue. {{ :bie5782:01_curso_atual:alunos:trabalho_final:michel.bieleveld:rplot01.png?nolink& |}} ==== Functions ==== The help describes the following two functions; mmm() in C++ and performanceMultiplicationTest() in R.
performanceMultiplicationTest <- function(func,nsize=seq(10, 1000, 100),nrep=5)
{
# initialize vectors to empty vectors
time.base = c()
time.func = c()
time.base.mean = c()
time.func.mean = c()
# repeat measurements for each size in steps given by argument nsize
for (i in 1:length(nsize)){
# create two random nxn matrices where n is the size for this iteration
a <- matrix(nrow=nsize[i], ncol=nsize[i],runif(nsize[i]^2))
b <- matrix(nrow=nsize[i], ncol=nsize[i],runif(nsize[i]^2))
# clear the mean time for this matrix size
temp.base.mean = c()
temp.func.mean = c()
# repeat the measurements for nrep times for this matrix size
for (j in 1:nrep){
# do a measurement against the default matrix multiplier
t1 = system.time(a%*%b)[1]
# do a measurement with a new function
t2 = system.time(func(a,b))[1]
# add the measurement to a vector of all measurements
time.base = c(time.base,t1)
# add the measurement to a vector of only the measurement for this matrix size
temp.base.mean = c(temp.base.mean,t1)
# add the measurement to a vector of all measurements
time.func = c(time.func,t2)
# add the measurement to a vector of only the measurement for this matrix size
temp.func.mean = c(temp.func.mean,t2)
}
# now average the results for this matrix size so we can plot a line of avg. measurement
time.base.mean = c(time.base.mean,mean(temp.base.mean))
time.func.mean = c(time.func.mean,mean(temp.func.mean))
}
# create a plot where the ceiling is the max of the measured values
plot(rep(nsize,each=nrep),time.base,col="blue",
ylim=c(0,ceiling(max(time.base,time.func))),
main="Performance graph",
xlab="Matrix size",
ylab="Time (s)",
ann=FALSE,
axes=FALSE)
# create an x-axis where the labels are set to the matrix size in nxn notation
axis(1, at=rep(nsize,each=nrep), lab=paste(rep(nsize,each=nrep),"x",rep(nsize,each=nrep)),pos=0,las=1)
# create an y-axis where the labels are set to the measure time in seconds
axis(2, at=0:ceiling(max(time.base,time.func)),pos=0)
# plot the axis lines themselves
abline(v=0, h=0)
# plot the average measurement time measured for the default function
lines(nsize,time.base.mean,type="l",col="blue")
# plot the points as measured for the new function
points(rep(nsize,each=nrep),time.func,col="red",pch=22)
# plot the line with the average measurements as measured for the new function
lines(nsize,time.func.mean,type="l",col="red")
# add a legend at the top left corner
legend(1, max(time.base,time.func), c("base","function"), cex=0.8,
col=c("blue","red"), pch=21:22, lty=1:2);
# add further information to the graph
title("Performance", xlab="Matrix size",ylab="Time (s)")
TRUE
}
#include
#include
extern "C" {
SEXP rcpp_matrix_multiplication_f(SEXP a, SEXP b){
int arows,acols,bcols;
double *xa,*xb,*rans,temp;
SEXP ans;
PROTECT(a = AS_NUMERIC(a));
PROTECT(b = AS_NUMERIC(b));
arows = INTEGER(GET_DIM(a))[0];
acols = INTEGER(GET_DIM(a))[1];
bcols = INTEGER(GET_DIM(b))[1];
PROTECT(ans = allocMatrix(REALSXP, arows, bcols));
xa = NUMERIC_POINTER(a);
xb = NUMERIC_POINTER(b);
rans = NUMERIC_POINTER(ans);
for ( int i = 0; i < arows; i++ )
{
for ( int j = 0; j < bcols; j++ )
{
temp = 0;
for ( int k = 0; k < acols; k++ )
{
temp = temp + (xa[i+k*bcols] * xb[k+acols*j]);
}
rans[i+j*bcols] = temp;
}
}
UNPROTECT(3) ;
return ans;
}
}
===== Proposal final work =====
__**Proposal A:**__
//The speedup of matrix computation//
The proposal is to create a package with a function to possibly speed up simple matrix multiplications by utilizing the rcpp package1.
1 http://dirk.eddelbuettel.com/code/rcpp.html
**__Proposta B:__**
//The speedup of matrix computation in combination with a benchmarking function//
Same as A but in addition a function will be written that calls the cpp function of A with different matrix sizes and compares the execution time with the standard library version.
==== Comentários da proposta (Leo) ====
Apparently you are very familiar with such matrix analysis, but be careful not to make things too hard for you. Perhaps it is better to stick with a function, and not with a whole package. :-)
I missed more information about the input, possible arguments (such as what kind of matrix operation the user wishes to perform), and outputs (of course the output is not just a faster analysis, but the resut of the matrix operation in the apropriate format).
==== Comentários da proposta (Diogo) ====
This is a nice thing to learn, especially if you mess with complicated (or slow) analysis. But improving on the basic matrix operations can be very challenging. They already are implemented in C or FORTRAN. If you plan to use threading inside C++, even getting a linear n core n speed up on matrix multiplication is rather challenging (I know it took me a while...).
If you're good with pointers and subtle allocation problems, go ahead! But this will be much more a C++ project than an R project.