Thursday, December 24, 2015

Calculating AUC using cubic spline interpolation or trapezoid rule

/************************************************************************
    PURPOSE:
    This program uses PROC EXPAND to approximate the area under the
    curve (AUC) for sample data consisting of (x,y) pairs.

    DETAILS:
    For this example, the sample data are generated from a high-degree
    polynomial. PROC EXPAND is then used to approximate the area under
    the curve by each of the following methods:

       a.  Cubic spline interpolation.
       b.  Trapezoid rule.

    The exact area, given by the definite integral of the polynomial,
    is the reference against which the accuracy of the approximations
    is assessed.

************************************************************************/

%let lower=-2;
%let upper=1;
%let interval=0.2;

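* generate some data according to a high-order polynomial;
* an integer counter avoids drift from repeatedly adding a fractional step;
data kvm;
 do i=0 to round((&upper-(&lower))/&interval);
  x=&lower+i*&interval;
  y=15+(x-2)*(x-1.5)*(x-1)*(x-.5)*x*(x+.5)*(x+1)*(x+1.5)*(x+2);
  output;
 end;
 drop i;
run;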

proc sort data=kvm;
 by x;
run;
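
/* A sketch of the exact-area calculation that the header describes.  It
   is not part of the original listing, and the names EXACT, EXACT_AREA,
   F and SGN are illustrative.  Expanding the product gives
      y = 15 + x**9 - 7.5*x**7 + 17.0625*x**5 - 12.8125*x**3 + 2.25*x
   so one antiderivative is
      F = 15*x + x**10/10 - 7.5*x**8/8 + 17.0625*x**6/6
          - 12.8125*x**4/4 + 2.25*x**2/2
   and the exact area is F(upper) - F(lower).  For lower=-2 and upper=1
   this works out to 47.278125.
*/
data exact;
 exact_area=0;
 sgn=-1;                        /* F at the lower limit is subtracted */
 do x=&lower, &upper;
  F=15*x + x**10/10 - 7.5*x**8/8 + 17.0625*x**6/6
    - 12.8125*x**4/4 + 2.25*x**2/2;
  exact_area=exact_area+sgn*F;
  sgn=1;                        /* F at the upper limit is added */
 end;
 keep exact_area;
run;

proc print data=exact;
 title 'Exact area from the definite integral';
run;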

/* PROC EXPAND will include a contribution for the last interval.  For
   an accurate approximation to the integral, this last contribution
   must be negligible.  So we append one more observation whose x value
   is extremely close to the last x value and whose y value is
   identical.  The last interval is then extremely short, and its
   contribution to the integral approximation is negligible.
*/

data one;
 set kvm end=eof;
 output;
 if eof then do;
   x=x+(1e-10);
   output;
 end;
run;

proc print data=one(obs=50);
 title 'First few observations of the original data';
run;

proc gplot data=one;
 title 'original series';
 plot y*x;
run;

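/* Cubic spline approximation.  OBSERVED=(BEGINNING,TOTAL) treats the
   input y values as beginning-of-interval values and outputs interval
   totals; TRANSFORMOUT=(SUM) accumulates those totals into a running
   sum stored in TOTAL.
*/
proc expand data=one out=three method=spline;
 convert y=total/observed=(beginning,total) transformout=(sum);
 id x;
run;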
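
/* A sketch of the trapezoid-rule variant named in the header.  It is not
   part of the original listing.  It assumes METHOD=JOIN, which connects
   successive points with straight line segments, is the intended
   piecewise-linear counterpart of the spline call.  The data set names
   FOUR, AUC_SPLINE, AUC_TRAP and AUC are illustrative.
*/
proc expand data=one out=four method=join;
 convert y=total/observed=(beginning,total) transformout=(sum);
 id x;
run;

/* TOTAL is a running sum, so the last observation of each output data
   set should hold that method's approximation to the area.
*/
data auc_spline;
 set three end=eof;
 if eof;
 keep total;
run;

data auc_trap;
 set four end=eof;
 if eof;
 keep total;
run;

data auc;
 length method $ 9;
 set auc_spline(in=in_spline) auc_trap;
 if in_spline then method='spline';
 else method='trapezoid';
run;

proc print data=auc;
 title 'Approximate area under the curve by method';
run;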
