.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" ======================================================================== .\" .IX Title "GLM 3pm" .TH GLM 3pm "2022-12-17" "perl v5.36.0" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" PDL::Stats::GLM \-\- general and generalized linear modeling methods such as ANOVA, linear regression, PCA, and logistic regression. .SH "DESCRIPTION" .IX Header "DESCRIPTION" The terms \s-1FUNCTIONS\s0 and \s-1METHODS\s0 are arbitrarily used to refer to methods that are threadable and methods that are \s-1NOT\s0 threadable, respectively. \s-1FUNCTIONS\s0 except \fBols_t\fR support bad value. \fBPDL::Slatec\fR strongly recommended for most \s-1METHODS,\s0 and it is required for \fBlogistic\fR. .PP P\-values, where appropriate, are provided if \s-1PDL::GSL::CDF\s0 is installed. .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 3 \& use PDL::LiteF; \& use PDL::NiceSlice; \& use PDL::Stats::GLM; \& \& # do a multiple linear regression and plot the residuals \& \& my $y = pdl( 8, 7, 7, 0, 2, 5, 0 ); \& \& my $x = pdl( [ 0, 1, 2, 3, 4, 5, 6 ], # linear component \& [ 0, 1, 4, 9, 16, 25, 36 ] ); # quadratic component \& \& my %m = $y\->ols( $x, {plot=>1} ); \& \& print "$_\et$m{$_}\en" for (sort keys %m); .Ve .SH "FUNCTIONS" .IX Header "FUNCTIONS" .SS "fill_m" .IX Subsection "fill_m" .Vb 1 \& Signature: (a(n); float+ [o]b(n)) .Ve .PP Replaces bad values with sample mean. Mean is set to 0 if all obs are bad. Can be done inplace. .PP .Vb 5 \& perldl> p $data \& [ \& [ 5 BAD 2 BAD] \& [ 7 3 7 BAD] \& ] \& \& perldl> p $data\->fill_m \& [ \& [ 5 3.5 2 3.5] \& [ 7 3 7 5.66667] \& ] .Ve .PP The output pdl badflag is cleared. 
.SS "fill_rand" .IX Subsection "fill_rand" .Vb 1 \& Signature: (a(n); [o]b(n)) .Ve .PP Replaces bad values with random sample (with replacement) of good observations from the same variable. Can be done inplace. .PP .Vb 5 \& perldl> p $data \& [ \& [ 5 BAD 2 BAD] \& [ 7 3 7 BAD] \& ] \& \& perldl> p $data\->fill_rand \& \& [ \& [5 2 2 5] \& [7 3 7 7] \& ] .Ve .PP The output pdl badflag is cleared. .SS "dev_m" .IX Subsection "dev_m" .Vb 1 \& Signature: (a(n); float+ [o]b(n)) .Ve .PP Replaces values with deviations from the mean. Can be done inplace. .PP dev_m processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "stddz" .IX Subsection "stddz" .Vb 1 \& Signature: (a(n); float+ [o]b(n)) .Ve .PP Standardize ie replace values with z_scores based on sample standard deviation from the mean (replace with 0s if stdv==0). Can be done inplace. .PP stddz processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "sse" .IX Subsection "sse" .Vb 1 \& Signature: (a(n); b(n); float+ [o]c()) .Ve .PP Sum of squared errors between actual and predicted values. .PP sse processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "mse" .IX Subsection "mse" .Vb 1 \& Signature: (a(n); b(n); float+ [o]c()) .Ve .PP Mean of squared errors between actual and predicted values, ie variance around predicted value. .PP mse processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "rmse" .IX Subsection "rmse" .Vb 1 \& Signature: (a(n); b(n); float+ [o]c()) .Ve .PP Root mean squared error, ie stdv around predicted value. .PP rmse processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "pred_logistic" .IX Subsection "pred_logistic" .Vb 1 \& Signature: (a(n,m); b(m); float+ [o]c(n)) .Ve .PP Calculates predicted prob value for logistic regression. .PP .Vb 1 \& # glue constant then apply coeff returned by the logistic method \& \& $pred = $x\->glue(1,ones($x\->dim(0)))\->pred_logistic( $m{b} ); .Ve .PP pred_logistic processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "d0" .IX Subsection "d0" .Vb 1 \& Signature: (a(n); float+ [o]c()) .Ve .PP .Vb 1 \& my $d0 = $y\->d0(); .Ve .PP Null deviance for logistic regression. .PP d0 processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "dm" .IX Subsection "dm" .Vb 1 \& Signature: (a(n); b(n); float+ [o]c()) .Ve .PP .Vb 1 \& my $dm = $y\->dm( $y_pred ); \& \& # null deviance \& my $d0 = $y\->dm( ones($y\->nelem) * $y\->avg ); .Ve .PP Model deviance for logistic regression. .PP dm processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "dvrs" .IX Subsection "dvrs" .Vb 1 \& Signature: (a(); b(); float+ [o]c()) .Ve .PP Deviance residual for logistic regression. .PP dvrs processes bad values. It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays. .SS "ols_t" .IX Subsection "ols_t" Threaded version of ordinary least squares regression (\fBols\fR). The price of threading was losing significance tests for coefficients (but see \fBr2_change\fR). 
The fitting function was shamelessly copied then modified from PDL::Fit::Linfit. Uses PDL::Slatec when possible but otherwise uses PDL::MatrixOps. Intercept is \s-1LAST\s0 of coeff if \s-1CONST\s0 => 1. .PP ols_t does not handle bad values. consider \fBfill_m\fR or \fBfill_rand\fR if there are bad values. .PP Default options (case insensitive): .PP .Vb 1 \& CONST => 1, .Ve .PP Usage: .PP .Vb 2 \& # DV, 2 person\*(Aqs ratings for top\-10 box office movies \& # ascending sorted by box office numbers \& \& perldl> p $y = qsort ceil( random(10, 2)*5 ) \& [ \& [1 1 2 4 4 4 4 5 5 5] \& [1 2 2 2 3 3 3 3 5 5] \& ] \& \& # model with 2 IVs, a linear and a quadratic trend component \& \& perldl> $x = cat sequence(10), sequence(10)**2 \& \& # suppose our novice modeler thinks this creates 3 different models \& # for predicting movie ratings \& \& perldl> p $x = cat $x, $x * 2, $x * 3 \& [ \& [ \& [ 0 1 2 3 4 5 6 7 8 9] \& [ 0 1 4 9 16 25 36 49 64 81] \& ] \& [ \& [ 0 2 4 6 8 10 12 14 16 18] \& [ 0 2 8 18 32 50 72 98 128 162] \& ] \& [ \& [ 0 3 6 9 12 15 18 21 24 27] \& [ 0 3 12 27 48 75 108 147 192 243] \& ] \& ] \& \& perldl> p $x\->info \& PDL: Double D [10,2,3] \& \& # insert a dummy dim between IV and the dim (model) to be threaded \& \& perldl> %m = $y\->ols_t( $x\->dummy(2) ) \& \& perldl> p "$_\et$m{$_}\en" for (sort keys %m) \& \& # 2 persons\*(Aq ratings, eached fitted with 3 "different" models \& \& F \& [ \& [ 38.314159 25.087209] \& [ 38.314159 25.087209] \& [ 38.314159 25.087209] \& ] \& \& # df is the same across dv and iv models \& \& F_df [2 7] \& F_p \& [ \& [0.00016967051 0.00064215074] \& [0.00016967051 0.00064215074] \& [0.00016967051 0.00064215074] \& ] \& \& R2 \& [ \& [ 0.9162963 0.87756762] \& [ 0.9162963 0.87756762] \& [ 0.9162963 0.87756762] \& ] \& \& b \& [ # linear quadratic constant \& [ \& [ 0.99015152 \-0.056818182 0.66363636] # person 1 \& [ 0.18939394 0.022727273 1.4] # person 2 \& ] \& [ \& [ 0.49507576 \-0.028409091 0.66363636] \& [ 0.09469697 0.011363636 1.4] \& ] \& [ \& [ 0.33005051 \-0.018939394 0.66363636] \& [ 0.063131313 0.0075757576 1.4] \& ] \& ] \& \& # our novice modeler realizes at this point that \& # the 3 models only differ in the scaling of the IV coefficients \& \& ss_model \& [ \& [ 20.616667 13.075758] \& [ 20.616667 13.075758] \& [ 20.616667 13.075758] \& ] \& \& ss_residual \& [ \& [ 1.8833333 1.8242424] \& [ 1.8833333 1.8242424] \& [ 1.8833333 1.8242424] \& ] \& \& ss_total [22.5 14.9] \& y_pred \& [ \& [ \& [0.66363636 1.5969697 2.4166667 3.1227273 ... 4.9727273] \& ... .Ve .SS "r2_change" .IX Subsection "r2_change" Significance test for the incremental change in R2 when new variable(s) are added to an ols regression model. Returns the change stats as well as stats for both models. Based on \fBols_t\fR. (One way to make up for the lack of significance tests for coeffs in ols_t). 
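.PP
The change statistic is the usual hierarchical-regression F test on the increment in R2. For orientation (this is the standard textbook formula, stated here rather than taken from the module's source), with model 0 nested in model 1, k0 and k1 predictors respectively, and n observations:
.PP
.Vb 4
\& F_change = ( (R2_1 \- R2_0) / (k1 \- k0) )
\&            / ( (1 \- R2_1) / (n \- k1 \- 1) )
\&
\& F_df     = [ k1 \- k0,  n \- k1 \- 1 ]
.Ve
.PP
In the Usage example below, k0=1, k1=2 and n=10, which gives the reported \f(CW\*(C`F_df\*(C'\fR of [1 7].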
.PP Default options (case insensitive): .PP .Vb 1 \& CONST => 1, .Ve .PP Usage: .PP .Vb 2 \& # suppose these are two persons\*(Aq ratings for top 10 box office movies \& # ascending sorted by box office \& \& perldl> p $y = qsort ceil(random(10, 2) * 5) \& [ \& [1 1 2 2 2 3 4 4 4 4] \& [1 2 2 3 3 3 4 4 5 5] \& ] \& \& # first IV is a simple linear trend \& \& perldl> p $x1 = sequence 10 \& [0 1 2 3 4 5 6 7 8 9] \& \& # the modeler wonders if adding a quadratic trend improves the fit \& \& perldl> p $x2 = sequence(10) ** 2 \& [0 1 4 9 16 25 36 49 64 81] \& \& # two difference models are given in two pdls \& # each as would be pass on to ols_t \& # the 1st model includes only linear trend \& # the 2nd model includes linear and quadratic trends \& # when necessary use dummy dim so both models have the same ndims \& \& perldl> %c = $y\->r2_change( $x1\->dummy(1), cat($x1, $x2) ) \& \& perldl> p "$_\et$c{$_}\en" for (sort keys %c) \& # person 1 person 2 \& F_change [0.72164948 0.071283096] \& # df same for both persons \& F_df [1 7] \& F_p [0.42370145 0.79717232] \& R2_change [0.0085966043 0.00048562549] \& model0 HASH(0x8c10828) \& model1 HASH(0x8c135c8) \& \& # the answer here is no. .Ve .SH "METHODS" .IX Header "METHODS" .SS "anova" .IX Subsection "anova" Analysis of variance. Uses type \s-1III\s0 sum of squares for unbalanced data. .PP Dependent variable should be a 1D pdl. Independent variables can be passed as 1D perl array ref or 1D pdl. .PP Will only calculate p\-value (\f(CW\*(C`F_p\*(C'\fR) if there are more samples than the product of categories of all the IVs. .PP Supports bad value (by ignoring missing or \s-1BAD\s0 values in dependent and independent variables list-wise). .PP Default options (case insensitive): .PP .Vb 5 \& V => 1, # carps if bad value in variables \& IVNM => [], # auto filled as [\*(AqIV_0\*(Aq, \*(AqIV_1\*(Aq, ... ] \& PLOT => 0, # plots highest order effect \& # can set plot_means options here \& WIN => undef, # for plotting .Ve .PP Usage: .PP .Vb 1 \& # suppose this is ratings for 12 apples \& \& perldl> p $y = qsort ceil( random(12)*5 ) \& [1 1 2 2 2 3 3 4 4 4 5 5] \& \& # IV for types of apple \& \& perldl> p $a = sequence(12) % 3 + 1 \& [1 2 3 1 2 3 1 2 3 1 2 3] \& \& # IV for whether we baked the apple \& \& perldl> @b = qw( y y y y y y n n n n n n ) \& \& perldl> %m = $y\->anova( $a, \e@b, { IVNM=>[\*(Aqapple\*(Aq, \*(Aqbake\*(Aq] } ) \& \& perldl> p "$_\et$m{$_}\en" for (sort keys %m) \& # apple # m \& [ \& [2.5 3 3.5] \& ] \& \& # apple # se \& [ \& [0.64549722 0.91287093 0.64549722] \& ] \& \& # apple ~ bake # m \& [ \& [1.5 1.5 2.5] \& [3.5 4.5 4.5] \& ] \& \& # apple ~ bake # se \& [ \& [0.5 0.5 0.5] \& [0.5 0.5 0.5] \& ] \& \& # bake # m \& [ \& [ 1.8333333 4.1666667] \& ] \& \& # bake # se \& [ \& [0.30731815 0.30731815] \& ] \& \& F 7.6 \& F_df [5 6] \& F_p 0.0141586545851857 \& ms_model 3.8 \& ms_residual 0.5 \& ss_model 19 \& ss_residual 3 \& ss_total 22 \& | apple | F 2 \& | apple | F_df [2 6] \& | apple | F_p 0.216 \& | apple | ms 1 \& | apple | ss 2 \& | apple ~ bake | F 0.666666666666667 \& | apple ~ bake | F_df [2 6] \& | apple ~ bake | F_p 0.54770848985725 \& | apple ~ bake | ms 0.333333333333334 \& | apple ~ bake | ss 0.666666666666667 \& | bake | F 32.6666666666667 \& | bake | F_df [1 6] \& | bake | F_p 0.00124263849516693 \& | bake | ms 16.3333333333333 \& | bake | ss 16.3333333333333 .Ve .SS "anova_rptd" .IX Subsection "anova_rptd" Repeated measures and mixed model anova. Uses type \s-1III\s0 sum of squares. 
The standard error (se) for the means are based on the relevant mean squared error from the anova, ie it is pooled across levels of the effect. .PP Will only calculate p\-value (\f(CW\*(C`F_p\*(C'\fR) if there are more samples than the product of categories of all the IVs. .PP anova_rptd supports bad value in the dependent and independent variables. It automatically removes bad data listwise, ie remove a subject's data if there is any cell missing for the subject. .PP Default options (case insensitive): .PP .Vb 6 \& V => 1, # carps if bad value in dv \& IVNM => [], # auto filled as [\*(AqIV_0\*(Aq, \*(AqIV_1\*(Aq, ... ] \& BTWN => [], # indices of between\-subject IVs (matches IVNM indices) \& PLOT => 0, # plots highest order effect \& # see plot_means() for more options \& WIN => undef, # for plotting .Ve .PP Usage: .PP .Vb 1 \& Some fictional data: recall_w_beer_and_wings.txt \& \& Subject Beer Wings Recall \& Alex 1 1 8 \& Alex 1 2 9 \& Alex 1 3 12 \& Alex 2 1 7 \& Alex 2 2 9 \& Alex 2 3 12 \& Brian 1 1 12 \& Brian 1 2 13 \& Brian 1 3 14 \& Brian 2 1 9 \& Brian 2 2 8 \& Brian 2 3 14 \& ... \& \& # rtable allows text only in 1st row and col \& my ($data, $idv, $subj) = rtable \*(Aqrecall_w_beer_and_wings.txt\*(Aq; \& \& my ($b, $w, $dv) = $data\->dog; \& # subj and IVs can be 1d pdl or @ ref \& # subj must be the first argument \& my %m = $dv\->anova_rptd( $subj, $b, $w, {ivnm=>[\*(AqBeer\*(Aq, \*(AqWings\*(Aq]} ); \& \& print "$_\et$m{$_}\en" for (sort keys %m); \& \& # Beer # m \& [ \& [ 10.916667 8.9166667] \& ] \& \& # Beer # se \& [ \& [ 0.4614791 0.4614791] \& ] \& \& # Beer ~ Wings # m \& [ \& [ 10 7] \& [ 10.5 9.25] \& [12.25 10.5] \& ] \& \& # Beer ~ Wings # se \& [ \& [0.89170561 0.89170561] \& [0.89170561 0.89170561] \& [0.89170561 0.89170561] \& ] \& \& # Wings # m \& [ \& [ 8.5 9.875 11.375] \& ] \& \& # Wings # se \& [ \& [0.67571978 0.67571978 0.67571978] \& ] \& \& ss_residual 19.0833333333333 \& ss_subject 24.8333333333333 \& ss_total 133.833333333333 \& | Beer | F 9.39130434782609 \& | Beer | F_p 0.0547977008378944 \& | Beer | df 1 \& | Beer | ms 24 \& | Beer | ss 24 \& | Beer || err df 3 \& | Beer || err ms 2.55555555555556 \& | Beer || err ss 7.66666666666667 \& | Beer ~ Wings | F 0.510917030567687 \& | Beer ~ Wings | F_p 0.623881438624431 \& | Beer ~ Wings | df 2 \& | Beer ~ Wings | ms 1.625 \& | Beer ~ Wings | ss 3.25000000000001 \& | Beer ~ Wings || err df 6 \& | Beer ~ Wings || err ms 3.18055555555555 \& | Beer ~ Wings || err ss 19.0833333333333 \& | Wings | F 4.52851711026616 \& | Wings | F_p 0.0632754786153548 \& | Wings | df 2 \& | Wings | ms 16.5416666666667 \& | Wings | ss 33.0833333333333 \& | Wings || err df 6 \& | Wings || err ms 3.65277777777778 \& | Wings || err ss 21.9166666666667 .Ve .PP For mixed model anova, ie when there are between-subject IVs involved, feed the IVs as above, but specify in \s-1BTWN\s0 which IVs are between-subject. For example, if we had added age as a between-subject \s-1IV\s0 in the above example, we would do .PP .Vb 2 \& my %m = $dv\->anova_rptd( $subj, $age, $b, $w, \& { ivnm=>[\*(AqAge\*(Aq, \*(AqBeer\*(Aq, \*(AqWings\*(Aq], btwn=>[0] }); .Ve .SS "dummy_code" .IX Subsection "dummy_code" Dummy coding of nominal variable (perl @ ref or 1d pdl) for use in regression. .PP Supports \s-1BAD\s0 value (missing or '\s-1BAD\s0' values result in the corresponding pdl elements being marked as \s-1BAD\s0). 
.PP .Vb 6 \& perldl> @a = qw(a a a b b b c c c) \& perldl> p $a = dummy_code(\e@a) \& [ \& [1 1 1 0 0 0 0 0 0] \& [0 0 0 1 1 1 0 0 0] \& ] .Ve .SS "effect_code" .IX Subsection "effect_code" Unweighted effect coding of nominal variable (perl @ ref or 1d pdl) for use in regression. returns in @ context coded pdl and % ref to level \- pdl\->\fBdim\fR\|(1) index. .PP Supports \s-1BAD\s0 value (missing or '\s-1BAD\s0' values result in the corresponding pdl elements being marked as \s-1BAD\s0). .PP .Vb 2 \& my @var = qw( a a a b b b c c c ); \& my ($var_e, $map) = effect_code( \e@var ); \& \& print $var_e . $var_e\->info . "\en"; \& \& [ \& [ 1 1 1 0 0 0 \-1 \-1 \-1] \& [ 0 0 0 1 1 1 \-1 \-1 \-1] \& ] \& PDL: Double D [9,2] \& \& print "$_\et$map\->{$_}\en" for (sort keys %$map) \& a 0 \& b 1 \& c 2 .Ve .SS "effect_code_w" .IX Subsection "effect_code_w" Weighted effect code for nominal variable. returns in @ context coded pdl and % ref to level \- pdl\->\fBdim\fR\|(1) index. .PP Supports \s-1BAD\s0 value (missing or '\s-1BAD\s0' values result in the corresponding pdl elements being marked as \s-1BAD\s0). .PP .Vb 6 \& perldl> @a = qw( a a b b b c c ) \& perldl> p $a = effect_code_w(\e@a) \& [ \& [ 1 1 0 0 0 \-1 \-1] \& [ 0 0 1 1 1 \-1.5 \-1.5] \& ] .Ve .SS "interaction_code" .IX Subsection "interaction_code" Returns the coded interaction term for effect-coded variables. .PP Supports \s-1BAD\s0 value (missing or '\s-1BAD\s0' values result in the corresponding pdl elements being marked as \s-1BAD\s0). .PP .Vb 5 \& perldl> $a = sequence(6) > 2 \& perldl> p $a = $a\->effect_code \& [ \& [ 1 1 1 \-1 \-1 \-1] \& ] \& \& perldl> $b = pdl( qw( 0 1 2 0 1 2 ) ) \& perldl> p $b = $b\->effect_code \& [ \& [ 1 0 \-1 1 0 \-1] \& [ 0 1 \-1 0 1 \-1] \& ] \& \& perldl> p $ab = interaction_code( $a, $b ) \& [ \& [ 1 0 \-1 \-1 \-0 1] \& [ 0 1 \-1 \-0 \-1 1] \& ] .Ve .SS "ols" .IX Subsection "ols" Ordinary least squares regression, aka linear regression. Unlike \fBols_t\fR, ols is not threadable, but it can handle bad value (by ignoring observations with bad value in dependent or independent variables list-wise) and returns the full model in list context with various stats. .PP IVs ($x) should be pdl dims \f(CW$y\fR\->nelem or \f(CW$y\fR\->nelem x n_iv. Do not supply the constant vector in \f(CW$x\fR. Intercept is automatically added and returned as \s-1LAST\s0 of the coeffs if CONST=>1. Returns full model in list context and coeff in scalar context. .PP Default options (case insensitive): .PP .Vb 3 \& CONST => 1, \& PLOT => 0, # see plot_residuals() for plot options \& WIN => undef, # for plotting .Ve .PP Usage: .PP .Vb 2 \& # suppose this is a person\*(Aqs ratings for top 10 box office movies \& # ascending sorted by box office \& \& perldl> p $y = qsort ceil( random(10) * 5 ) \& [1 1 2 2 2 2 4 4 5 5] \& \& # construct IV with linear and quadratic component \& \& perldl> p $x = cat sequence(10), sequence(10)**2 \& [ \& [ 0 1 2 3 4 5 6 7 8 9] \& [ 0 1 4 9 16 25 36 49 64 81] \& ] \& \& perldl> %m = $y\->ols( $x ) \& \& perldl> p "$_\et$m{$_}\en" for (sort keys %m) \& \& F 40.4225352112676 \& F_df [2 7] \& F_p 0.000142834216344756 \& R2 0.920314253647587 \& \& # coeff linear quadratic constant \& \& b [0.21212121 0.03030303 0.98181818] \& b_p [0.32800118 0.20303404 0.039910509] \& b_se [0.20174693 0.021579989 0.38987581] \& b_t [ 1.0514223 1.404219 2.5182844] \& ss_model 19.8787878787879 \& ss_residual 1.72121212121212 \& ss_total 21.6 \& y_pred [0.98181818 1.2242424 1.5272727 ... 
4.6181818 5.3454545] .Ve .SS "ols_rptd" .IX Subsection "ols_rptd" Repeated measures linear regression (Lorch & Myers, 1990; Van den Noortgate & Onghena, 2006). Handles purely within-subject design for now. See t/stats_ols_rptd.t for an example using the Lorch and Myers data. .PP Usage: .PP .Vb 6 \& # This is the example from Lorch and Myers (1990), \& # a study on how characteristics of sentences affected reading time \& # Three within\-subject IVs: \& # SP \-\- serial position of sentence \& # WORDS \-\- number of words in sentence \& # NEW \-\- number of new arguments in sentence \& \& # $subj can be 1D pdl or @ ref and must be the first argument \& # IV can be 1D @ ref or pdl \& # 1D @ ref is effect coded internally into pdl \& # pdl is left as is \& \& my %r = $rt\->ols_rptd( $subj, $sp, $words, $new ); \& \& print "$_\et$r{$_}\en" for (sort keys %r); \& \& (ss_residual) 58.3754646504336 \& (ss_subject) 51.8590337714286 \& (ss_total) 405.188241771429 \& # SP WORDS NEW \& F [ 7.208473 61.354153 1.0243311] \& F_p [0.025006181 2.619081e\-05 0.33792837] \& coeff [0.33337285 0.45858933 0.15162986] \& df [1 1 1] \& df_err [9 9 9] \& ms [ 18.450705 73.813294 0.57026483] \& ms_err [ 2.5595857 1.2030692 0.55671923] \& ss [ 18.450705 73.813294 0.57026483] \& ss_err [ 23.036272 10.827623 5.0104731] .Ve .SS "logistic" .IX Subsection "logistic" Logistic regression with maximum likelihood estimation using PDL::Fit::LM (requires PDL::Slatec. Hence loaded with \*(L"require\*(R" in the sub instead of \*(L"use\*(R" at the beginning). .PP IVs ($x) should be pdl dims \f(CW$y\fR\->nelem or \f(CW$y\fR\->nelem x n_iv. Do not supply the constant vector in \f(CW$x\fR. It is included in the model and returned as \s-1LAST\s0 of coeff. Returns full model in list context and coeff in scalar context. .PP The significance tests are likelihood ratio tests (\-2LL deviance) tests. \s-1IV\s0 significance is tested by comparing deviances between the reduced model (ie with the \s-1IV\s0 in question removed) and the full model. .PP ***NOTE: the results here are qualitatively similar to but not identical with results from R, because different algorithms are used for the nonlinear parameter fit. Use with discretion*** .PP Default options (case insensitive): .PP .Vb 3 \& INITP => zeroes( $x\->dim(1) + 1 ), # n_iv + 1 \& MAXIT => 1000, \& EPS => 1e\-7, .Ve .PP Usage: .PP .Vb 1 \& # suppose this is whether a person had rented 10 movies \& \& perldl> p $y = ushort( random(10)*2 ) \& [0 0 0 1 1 0 0 1 1 1] \& \& # IV 1 is box office ranking \& \& perldl> p $x1 = sequence(10) \& [0 1 2 3 4 5 6 7 8 9] \& \& # IV 2 is whether the movie is action\- or chick\-flick \& \& perldl> p $x2 = sequence(10) % 2 \& [0 1 0 1 0 1 0 1 0 1] \& \& # concatenate the IVs together \& \& perldl> p $x = cat $x1, $x2 \& [ \& [0 1 2 3 4 5 6 7 8 9] \& [0 1 0 1 0 1 0 1 0 1] \& ] \& \& perldl> %m = $y\->logistic( $x ) \& \& perldl> p "$_\et$m{$_}\en" for (sort keys %m) \& \& D0 13.8629436111989 \& Dm 9.8627829791575 \& Dm_chisq 4.00016063204141 \& Dm_df 2 \& Dm_p 0.135324414081692 \& # ranking genre constant \& b [0.41127706 0.53876358 \-2.1201285] \& b_chisq [ 3.5974504 0.16835559 2.8577151] \& b_p [0.057868258 0.6815774 0.090936587] \& iter 12 \& y_pred [0.10715577 0.23683909 ... 0.76316091 0.89284423] \& \& # to get the covariance out, supply a true value for the COV option: \& perldl> %m = $y\->logistic( $x, {COV=>1} ) \& perldl> p $m{cov}; .Ve .SS "pca" .IX Subsection "pca" Principal component analysis. 
Based on corr instead of cov (bad values are ignored pair-wise. \s-1OK\s0 when bad values are few but otherwise probably should fill_m etc before pca). Use \fBPDL::Slatec::eigsys()\fR if installed, otherwise use \fBPDL::MatrixOps::eigens_sym()\fR. .PP Default options (case insensitive): .PP .Vb 4 \& CORR => 1, # boolean. use correlation or covariance \& PLOT => 0, # calls plot_screes by default \& # can set plot_screes options here \& WIN => undef, # for plotting .Ve .PP Usage: .PP .Vb 3 \& my $d = qsort random 10, 5; # 10 obs on 5 variables \& my %r = $d\->pca( \e%opt ); \& print "$_\et$r{$_}\en" for (keys %r); \& \& eigenvalue # variance accounted for by each component \& [4.70192 0.199604 0.0471421 0.0372981 0.0140346] \& \& eigenvector # dim var x comp. weights for mapping variables to component \& [ \& [ \-0.451251 \-0.440696 \-0.457628 \-0.451491 \-0.434618] \& [ \-0.274551 0.582455 0.131494 0.255261 \-0.709168] \& [ 0.43282 0.500662 \-0.139209 \-0.735144 \-0.0467834] \& [ 0.693634 \-0.428171 0.125114 0.128145 \-0.550879] \& [ 0.229202 0.180393 \-0.859217 0.4173 0.0503155] \& ] \& \& loadings # dim var x comp. correlation between variable and component \& [ \& [ \-0.978489 \-0.955601 \-0.992316 \-0.97901 \-0.942421] \& [ \-0.122661 0.260224 0.0587476 0.114043 \-0.316836] \& [ 0.0939749 0.108705 \-0.0302253 \-0.159616 \-0.0101577] \& [ 0.13396 \-0.0826915 0.0241629 0.0247483 \-0.10639] \& [ 0.027153 0.0213708 \-0.101789 0.0494365 0.00596076] \& ] \& \& pct_var # percent variance accounted for by each component \& [0.940384 0.0399209 0.00942842 0.00745963 0.00280691] .Ve .PP Plot scores along the first two components, .PP .Vb 1 \& $d\->plot_scores( $r{eigenvector} ); .Ve .SS "pca_sorti" .IX Subsection "pca_sorti" Determine by which vars a component is best represented. Descending sort vars by size of association with that component. Returns sorted var and relevant component indices. .PP Default options (case insensitive): .PP .Vb 1 \& NCOMP => 10, # maximum number of components to consider .Ve .PP Usage: .PP .Vb 2 \& # let\*(Aqs see if we replicated the Osgood et al. (1957) study \& perldl> ($data, $idv, $ido) = rtable \*(Aqosgood_exp.csv\*(Aq, {v=>0} \& \& # select a subset of var to do pca \& perldl> $ind = which_id $idv, [qw( ACTIVE BASS BRIGHT CALM FAST GOOD HAPPY HARD LARGE HEAVY )] \& perldl> $data = $data( ,$ind)\->sever \& perldl> @$idv = @$idv[list $ind] \& \& perldl> %m = $data\->pca \& \& perldl> ($iv, $ic) = $m{loadings}\->pca_sorti() \& \& perldl> p "$idv\->[$_]\et" . $m{loadings}\->($_,$ic)\->flat . "\en" for (list $iv) \& \& # COMP0 COMP1 COMP2 COMP3 \& HAPPY [0.860191 0.364911 0.174372 \-0.10484] \& GOOD [0.848694 0.303652 0.198378 \-0.115177] \& CALM [0.821177 \-0.130542 0.396215 \-0.125368] \& BRIGHT [0.78303 0.232808 \-0.0534081 \-0.0528796] \& HEAVY [\-0.623036 0.454826 0.50447 0.073007] \& HARD [\-0.679179 0.0505568 0.384467 0.165608] \& ACTIVE [\-0.161098 0.760778 \-0.44893 \-0.0888592] \& FAST [\-0.196042 0.71479 \-0.471355 0.00460276] \& LARGE [\-0.241994 0.594644 0.634703 \-0.00618055] \& BASS [\-0.621213 \-0.124918 0.0605367 \-0.765184] .Ve .SS "plot_means" .IX Subsection "plot_means" Plots means anova style. Can handle up to 4\-way interactions (ie 4D pdl). 
.PP Default options (case insensitive): .PP .Vb 10 \& IVNM => [\*(AqIV_0\*(Aq, \*(AqIV_1\*(Aq, \*(AqIV_2\*(Aq, \*(AqIV_3\*(Aq], \& DVNM => \*(AqDV\*(Aq, \& AUTO => 1, # auto set dims to be on x\-axis, line, panel \& # if set 0, dim 0 goes on x\-axis, dim 1 as lines \& # dim 2+ as panels \& # see PDL::Graphics::PGPLOT::Window for next options \& WIN => undef, # pgwin object. not closed here if passed \& # allows comparing multiple lines in same plot \& # set env before passing WIN \& DEV => \*(Aq/xs\*(Aq, # open and close dev for plotting if no WIN \& # defaults to \*(Aq/png\*(Aq in Windows \& SIZE => 640, # individual square panel size in pixels \& SYMBL => [0, 4, 7, 11], .Ve .PP Usage: .PP .Vb 2 \& # see anova for mean / se pdl structure \& $mean\->plot_means( $se, {IVNM=>[\*(Aqapple\*(Aq, \*(Aqbake\*(Aq]} ); .Ve .PP Or like this: .PP .Vb 1 \& $m{\*(Aq# apple ~ bake # m\*(Aq}\->plot_means; .Ve .SS "plot_residuals" .IX Subsection "plot_residuals" Plots residuals against predicted values. .PP Usage: .PP .Vb 1 \& $y\->plot_residuals( $y_pred, { dev=>\*(Aq/png\*(Aq } ); .Ve .PP Default options (case insensitive): .PP .Vb 8 \& # see PDL::Graphics::PGPLOT::Window for more info \& WIN => undef, # pgwin object. not closed here if passed \& # allows comparing multiple lines in same plot \& # set env before passing WIN \& DEV => \*(Aq/xs\*(Aq, # open and close dev for plotting if no WIN \& # defaults to \*(Aq/png\*(Aq in Windows \& SIZE => 640, # plot size in pixels \& COLOR => 1, .Ve .SS "plot_scores" .IX Subsection "plot_scores" Plots standardized original and \s-1PCA\s0 transformed scores against two components. (Thank you, Bob MacCallum, for the documentation suggestion that led to this function.) .PP Default options (case insensitive): .PP .Vb 10 \& CORR => 1, # boolean. PCA was based on correlation or covariance \& COMP => [0,1], # indices to components to plot \& # see PDL::Graphics::PGPLOT::Window for next options \& WIN => undef, # pgwin object. not closed here if passed \& # allows comparing multiple lines in same plot \& # set env before passing WIN \& DEV => \*(Aq/xs\*(Aq, # open and close dev for plotting if no WIN \& # defaults to \*(Aq/png\*(Aq in Windows \& SIZE => 640, # plot size in pixels \& COLOR => [1,2], # color for original and rotated scores .Ve .PP Usage: .PP .Vb 2 \& my %p = $data\->pca(); \& $data\->plot_scores( $p{eigenvector}, \e%opt ); .Ve .SS "plot_screes" .IX Subsection "plot_screes" Scree plot. Plots proportion of variance accounted for by \s-1PCA\s0 components. .PP Default options (case insensitive): .PP .Vb 11 \& NCOMP => 20, # max number of components to plot \& CUT => 0, # set to plot cutoff line after this many components \& # undef to plot suggested cutoff line for NCOMP comps \& # see PDL::Graphics::PGPLOT::Window for next options \& WIN => undef, # pgwin object. not closed here if passed \& # allows comparing multiple lines in same plot \& # set env before passing WIN \& DEV => \*(Aq/xs\*(Aq, # open and close dev for plotting if no WIN \& # defaults to \*(Aq/png\*(Aq in Windows \& SIZE => 640, # plot size in pixels \& COLOR => 1, .Ve .PP Usage: .PP .Vb 1 \& # variance should be in descending order \& \& $pca{var}\->plot_screes( {ncomp=>16} ); .Ve .PP Or, because \s-1NCOMP\s0 is used so often, it is allowed a shortcut, .PP .Vb 1 \& $pca{var}\->plot_screes( 16 ); .Ve .SH "SEE ALSO" .IX Header "SEE ALSO" PDL::Fit::Linfit .PP PDL::Fit::LM .SH "REFERENCES" .IX Header "REFERENCES" Cohen, J., Cohen, P., West, S.G., & Aiken, L.S. (2003). 
Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Mahwah, \s-1NJ:\s0 Lawrence Erlbaum Associates Publishers.
.PP
Hosmer, D.W., & Lemeshow, S. (2000). Applied Logistic Regression (2nd ed.). New York, \s-1NY:\s0 Wiley-Interscience.
.PP
Lorch, R.F., & Myers, J.L. (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, & Cognition, 16, 149\-157.
.PP
Osgood, C.E., Suci, G.J., & Tannenbaum, P.H. (1957). The Measurement of Meaning. Champaign, \s-1IL:\s0 University of Illinois Press.
.PP
Rutherford, A. (2001). Introducing Anova and Ancova: A \s-1GLM\s0 Approach (1st ed.). Thousand Oaks, \s-1CA:\s0 Sage Publications.
.PP
Shlens, J. (2009). A Tutorial on Principal Component Analysis. Retrieved April 10, 2011 from http://citeseerx.ist.psu.edu/
.PP
The \s-1GLM\s0 procedure: unbalanced \s-1ANOVA\s0 for two-way design with interaction. (2008). \s-1SAS/STAT\s0(R) 9.2 User's Guide. Retrieved June 18, 2009 from http://support.sas.com/
.PP
Van den Noortgate, W., & Onghena, P. (2006). Analysing repeated measures data in cognitive research: A comment on regression coefficient analyses. European Journal of Cognitive Psychology, 18, 937\-952.
.SH "AUTHOR"
.IX Header "AUTHOR"
Copyright (C) 2009 Maggie J. Xiong
.PP
All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation as described in the file \s-1COPYING\s0 in the \s-1PDL\s0 distribution.