
simriscparams(7) simrisc configuration file organization simriscparams(7)

NAME

simriscparams - The description of the configuration files

DESCRIPTION

This page describes the organization of the simrisc configuration files. These files are formatted like standard unix configuration files. Lines are interpreted after removing initial white-space (blanks and tabs) and after removing all characters starting at the first # character, which starts a comment that is ignored. If a line (not containing a # character) ends in a backslash (\), then the next line (initial white-space removed) is appended to the current line.
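The rules above can be sketched in Python as a small preprocessing step. This is a minimal illustration of the stated rules, not simrisc's own parser; the function name preprocess is hypothetical, and keeping a pending continuation when the very last line ends in a backslash is an assumption of this sketch.

```python
def preprocess(lines):
    """Remove initial white space and comments, and join continuation lines.

    Hypothetical helper illustrating the rules described above.
    """
    result = []
    pending = ""  # text collected from preceding lines ending in a backslash
    for raw in lines:
        line = raw.lstrip(" \t")          # remove initial blanks and tabs
        has_comment = "#" in line
        line = line.split("#", 1)[0]      # drop everything from the first #
        if not has_comment and line.endswith("\\"):
            pending += line[:-1]          # append the next line to this one
            continue
        result.append(pending + line)
        pending = ""
    if pending:                           # ASSUMPTION: keep a dangling continuation
        result.append(pending)
    return result
```

For example, the lines `spread: \`, `   false  # comment` and `  seed: 1` are reduced to `spread: false` (plus trailing blanks) and `seed: 1`.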

Note that all parameter identifiers are interpreted case sensitively. E.g., Costs: is a different parameter than costs:. The numeric values used in this man-page are for illustration purposes only. Some restrictions apply though: standard deviations cannot be negative; proportions and probabilities must lie in the range 0..1; multiple probabilities (like the ones used for breast densities) must add up to 1; etc. If restrictions apply then they are mentioned at the various parameter descriptions below.

DEFAULT CONFIGURATION FILE

The configuration file provided in the simrisc distribution is
/usr/share/doc/simrisc/simrisc.gz.

Usually this file is unzipped by the user to the user’s ~/.config directory:


gunzip < /usr/share/doc/simrisc/simrisc.gz > ~/.config/simrisc
after which ~/.config/simrisc can be edited to contain local modifications.

Various parameters specify probability distributions. Usually the Normal distribution is specified. The program also recognizes the LogNormal and Uniform distributions, and uses the Beta distribution when handling parameter variations of the beta parameter used for lung cancer simulations (note that the similarity of the names beta (the parameter) and Beta (the distribution) is purely coincidental).

Parameter specifications start with keywords, followed by a colon. All keywords are covered below. The format of the specifications is fixed, but empty lines and white space may be used to improve their readability.

Parameter specifications starting with uppercase letters (like Scenario:) specify (sub)sections and contain no additional specifications. Specifications starting with lowercase letters (like ageGroup:) are followed by actual parameter values.

The configuration file must define all parameters of all configuration sections, but configuration parameters can be modified using a separate analysis file or they can be modified by command-line parameters.

Several section names are optional, e.g., `Scenario:’. In this manual page the label `(opt.)’ is appended to the names of those sections. In actual configuration specifications these names can still be used, but they can also be omitted entirely.

Section `Scenario:’ (opt.)

This section may start with a line containing Scenario: and specifies some general parameters of the simulation process. The default configuration file contains the following specifications:

spread: false
when specified as true parameter spreading is used;
iterations: 1
the (positive) number of iterations used in a simulation loop;
generator: fixed
in addition to fixed the modes random and increasing are available.
This parameter specifies the way simrisc’s random number generators are initialized. When mode fixed is used the random number generators are initialized with the value of seed; mode random results in the random number generators being initialized by randomly selected seeds, and seed (below) is not used; mode increasing results in incrementing the seeds of the random number generators by a fixed increment at each iteration;
seed: 1
the (positive) value to seed the random number generator with. This parameter is ignored when generator: random was specified;
cases: 1000
the (positive) number of cases being simulated;
death: ...
the death: parameter must be followed either by the path to a file (if its initial character is a tilde (~) it is replaced by the user’s home directory; if it’s a plus (+) it is replaced by the base directory, specified with the --base option (see the simrisc(1) man-page)), or by 101 cumulative death proportions, distributed over lines that each start with <nr>:, where <nr> is the order number of the next value to read. The default configuration file specifies:

death:
1: .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
11: .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000 .00000
21: .00000 .00014 .00028 .00042 .00056 .00070 .00106 .00142 .00178 .00214
31: .00250 .00382 .00514 .00646 .00778 .00910 .01118 .01326 .01534 .01742
41: .01950 .02312 .02674 .03036 .03398 .03760 .04414 .05068 .05722 .06376
51: .07030 .07776 .08522 .09268 .10014 .10760 .11564 .12368 .13172 .13976
61: .14780 .15718 .16656 .17594 .18532 .19470 .20658 .21846 .23034 .24222
71: .25410 .27560 .29710 .31860 .34010 .36160 .39368 .42576 .45784 .48992
81: .52200 .56240 .60280 .64320 .68360 .72400 .75826 .79252 .82678 .86104
91: .89530 .90962 .92394 .93826 .95258 .96690 .97104 .97518 .97932 .98346
101: .98760
which are the 101 cumulative death proportions used for breast cancer simulations.
If the 101 cumulative death proportions are available in, e.g., the user’s .config/ directory as the file cumdeath then the specification could have been:

death: ~/.config/cumdeath
which might be convenient when using different values in Analysis: specifications (see section ANALYSES in the simrisc(1) man-page).
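As an illustration of the inline death: format, the following hypothetical Python function parses lines such as 1: .00000 .00000 ... into a list of 101 proportions. It assumes that the lines following death: have already been preprocessed (comments and initial white space removed), and checks that each <nr> is the next order number to read. The range check (0..1) follows the general restriction on proportions; the check that the values never decrease is an extra check of this sketch.

```python
def parse_death(lines):
    """Parse '<nr>: value value ...' lines into 101 cumulative proportions.

    Hypothetical helper; lines are the preprocessed lines following death:.
    """
    values = []
    for line in lines:
        if not line.strip():
            continue                      # skip empty (e.g., comment-only) lines
        label, colon, rest = line.partition(":")
        if not colon:
            raise ValueError(f"missing '<nr>:' in {line!r}")
        nr = int(label)
        # <nr> must be the (1-based) order number of the next value to read
        if nr != len(values) + 1:
            raise ValueError(f"expected {len(values) + 1}:, got {nr}:")
        values.extend(float(word) for word in rest.split())

    if len(values) != 101:
        raise ValueError(f"expected 101 proportions, got {len(values)}")
    if any(not 0 <= value <= 1 for value in values):
        raise ValueError("proportions must lie in the range 0..1")
    # cumulative proportions should never decrease (extra check of this sketch)
    if any(later < earlier for earlier, later in zip(values, values[1:])):
        raise ValueError("cumulative proportions must not decrease")
    return values
```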

Section `Costs:’ (opt.)

This section may start with a line containing Costs: and specifies several parameters used for cost-calculations. Modality-specific cost parameters are specified in Section Modalities: (see below). The default configuration file specifies:

biop: 176
the (positive) cost of performing a biopsy;
diameters: 0: 6438 20: 7128 50: 7701
pairs of diameter: cost values specifying the treatment cost from the specified tumor diameter up to (but not including) the next pair’s diameter or, for the last pair, for all diameters starting at its diameter. The first diameter must be 0. The second value of each pair specifies the (non-negative) treatment cost for that diameter range.
Discount: (opt.)
the costs discount proportion starting at some age. The Discount: line itself is optional, but the following two lines are required: the line proportion: contains two values specifying, respectively, the discount proportions for breast- and lung-cancer simulations (when only one proportion is specified it represents the proportion of the actually used simulation type, i.e., breast- or lung-cancer); the line age: specifies the discount’s starting age (for both simulation types):

Discount: # breast lung
proportion: 0 .04
age: 50
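The diameters: lookup can be sketched as follows. This hypothetical Python helper takes the (diameter, cost) pairs in the order given and returns the cost of the last pair whose diameter does not exceed the tumor’s diameter, so a 20 mm tumor uses the 20: 7128 entry. The Discount: computation is not shown, as its formula is not described in this man-page.

```python
import bisect

# Default Costs: diameters: specification as (start diameter in mm, cost)
DIAMETER_COSTS = [(0, 6438), (20, 7128), (50, 7701)]


def treatment_cost(diameter, pairs=DIAMETER_COSTS):
    """Return the treatment cost for a tumor of the given diameter (mm).

    Each cost applies from its own diameter up to (not including) the next
    pair's diameter; the last cost applies to all larger diameters.
    """
    starts = [start for start, _ in pairs]
    if not starts or starts[0] != 0:
        raise ValueError("the first diameter must be 0")
    if diameter < 0:
        raise ValueError("diameters cannot be negative")
    # index of the last start diameter that is <= diameter
    index = bisect.bisect_right(starts, diameter) - 1
    return pairs[index][1]
```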

Section `BreastDensities:’

This section starts with a line containing BreastDensities: and is used with breast-cancer simulations. It defines breast density values for various age groups, covering ages 0 through the maximum age for simulated cases. The default configuration file contains the following specifications:


# bi-rad: a b c d
ageGroup: 0 - 40 0.05 0.30 0.48 0.17
40 - 50 0.06 0.34 0.47 0.13
50 - 60 0.08 0.50 0.37 0.05
60 - 70 0.15 0.53 0.29 0.03
70 - * 0.18 0.54 0.26 0.02

Optionally, each line may start with an ageGroup: specification, which is required for the first specification line.
Age groups are half-open ranges: starting at their first ages, and ending at (but not including) their second ages. The first ages of subsequent age groups must be equal to the second ages of their previous age groups. For the last age group the specification * can be used, indicating that all ages at or above the last age group’s begin age are handled by that group.
For each age group the probabilities of the four bi-rad classifications must sum to 1.0.
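The following Python sketch shows one way to select a case’s bi-rad category from this table: find the half-open age group containing the case’s age, then compare a uniform random value with the cumulative probabilities of categories a through d. Selecting by cumulative probabilities is an assumption of this sketch; the names DENSITIES, check_groups, and birad are hypothetical, and * is represented by math.inf.

```python
import math
import random

# Default BreastDensities: (begin age, end age, probabilities of a, b, c, d)
DENSITIES = [
    (0, 40, (0.05, 0.30, 0.48, 0.17)),
    (40, 50, (0.06, 0.34, 0.47, 0.13)),
    (50, 60, (0.08, 0.50, 0.37, 0.05)),
    (60, 70, (0.15, 0.53, 0.29, 0.03)),
    (70, math.inf, (0.18, 0.54, 0.26, 0.02)),  # '70 - *'
]


def check_groups(groups):
    """Verify that the groups start at 0, connect, and sum to 1."""
    if groups[0][0] != 0:
        raise ValueError("the first age group must start at 0")
    for (_, end, _), (begin, _, _) in zip(groups, groups[1:]):
        if begin != end:
            raise ValueError(f"age group starting at {begin} does not connect")
    for begin, _, probs in groups:
        if not math.isclose(sum(probs), 1.0):
            raise ValueError(f"probabilities of group {begin} must sum to 1")


def birad(age, groups=DENSITIES, rng=random):
    """Return 'a', 'b', 'c' or 'd' for a case of the given age."""
    for begin, end, probs in groups:
        if begin <= age < end:            # half-open range [begin, end)
            break
    else:
        raise ValueError(f"no age group covers age {age}")
    u = rng.random()
    cumulative = 0.0
    for category, prob in zip("abcd", probs):
        cumulative += prob
        if u < cumulative:
            return category
    return "d"                            # guards against rounding when u is close to 1
```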

Section `Modalities:’ (opt.)

This section may start with a line containing Modalities: and specifies cancer-scanning modalities. Currently three modalities are supported for breast-cancer simulations (Mammo, Tomo, and MRI), and one modality is supported for lung-cancer simulations (CT).

Some modalities specify age groups, which are (like the age ranges used for BreastDensities:) half-open ranges: they start at their first ages and end at (but not including) their second ages, while subsequent age ranges must connect. Also, the last age group may use the end-age specification *.

The default configuration file contains (below the line Modalities:) the following specifications (if modalities aren’t used their specifications are optional):

CT:
The CT modality is used when performing lung cancer simulations. The screening cost (the first costs: value) is used instead of the value configured at the Costs: biop: specification. The diagnosis cost (the second value) specifies the cost of performing a CT-scan. The M0 and M1 values specify the costs when, respectively, no metastasis or a metastasis has been detected.
The radiation dose of CT scans is configured at the CT: dose: specification. The sensitivity depends on the tumor diameter. For tumor diameters between 3 and 5 mm the sensitivity is computed using the formula (.5 * diameter - 1.5) * 100 (e.g., 50% for tumors of 4 mm).
The default configuration file contains the following CT specifications:

CT:
# screening diagnosis M0 M1 (M0, M1: Table S3)
costs: 176 1908 37909 56556
dose: 1
# diam. value (must be integral 0..100 or -1)
sensitivity: 0 - 3: 0
3 - 5: -1 # formula: (.5 * diam - 1.5) * 100
5 - *: 100
# mean stddev dist
specificity: .992 .076 Normal
Optionally, each sensitivity specification line may start with a sensitivity: specification, which is required for the first specification line.

Mammo:
For the Mammo modality the costs, radiation doses and m: parameter specifications per bi-rad category, specificity probabilities for age groups, the parameters of the beta-function, and the systematic error probability must be specified.
The Mammo sensitivity is computed using the beta-function published by Isheden and Humphreys (2017, Statistical Methods in Medical Research, 28(3), 681-702). From a randomly generated probability and the case’s age the case’s bi-rad category is determined and that category is then used to select the m-parameter that is used in the beta-function.
The default configuration file contains the following Mammo specifications:

Mammo:
costs: 64
# bi-rad: a b c d
dose: 3 3 3 3
m: .136 .136 .136 .136
# ageGroups:
specificity: 0 - 40: .961 40 - *: .965
# 1 2 3 4
beta: -4.38 .49 -1.34 -7.18
systematicError: 0.1

Tomo:
For the Tomo modality the costs, radiation doses per bi-rad category, sensitivity probabilities per bi-rad category, and specificity probabilities for age groups must be specified.
The default configuration file contains the following Tomo specifications:

Tomo:
costs: 64
# bi-rad: a b c d
dose: 3 3 3 3
sensitivity: .87 .84 .73 .65
# ageGroups:
specificity: 0 - 40: .961 40 - *: .965

MRI:
For the MRI modality the costs and the sensitivity and specificity probabilities must be specified.
The default configuration file contains (below the line MRI:) the following specifications:

costs: 280
# proportion:
sensitivity: .94
specificity: .95
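The CT sensitivity: table can be evaluated as follows. In this hypothetical Python sketch each row is a half-open diameter range with a percentage, where -1 selects the formula (.5 * diameter - 1.5) * 100 and * is represented by math.inf. That simrisc uses exactly these half-open range boundaries is an assumption based on the age-group convention described above.

```python
import math

# Default CT sensitivity: (begin diameter, end diameter, percentage or -1)
CT_SENSITIVITY = [(0, 3, 0), (3, 5, -1), (5, math.inf, 100)]


def ct_sensitivity(diameter, table=CT_SENSITIVITY):
    """Return the CT sensitivity (percentage) for a tumor diameter in mm."""
    for begin, end, value in table:
        if begin <= diameter < end:       # half-open range [begin, end)
            if value == -1:               # -1: use the formula, not a fixed value
                return (0.5 * diameter - 1.5) * 100
            return float(value)
    raise ValueError(f"no sensitivity range covers diameter {diameter}")
```

The formula connects the fixed values: it yields 0 at 3 mm and approaches 100 at 5 mm.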

Section `Screening:’ (opt.)

This section may start with a line containing Screening: and it specifies the ages at which screenings are performed, the used screening modality/modalities for each of the used screening ages, and the probabilities that screening rounds are attended. If no screening rounds should be used then specify a single round-specification line:


round: none

Otherwise, the first screening round must start with the keyword round: followed by an age, which in turn is followed by one or more space-delimited modality names. Subsequent screening round specifications may optionally start with the keyword round:. Currently Mammo, Tomo, MRI and CT are available: Mammo, Tomo, and MRI can be specified when performing breast-cancer simulations, CT can be specified when performing lung-cancer simulations. The default configuration file contains the following round specifications:


# round: 50 CT
# 52 CT
# 54 CT
# 56 CT
# 58 CT
# 60 CT
# 62 CT
# 64 CT
# 66 CT
# 68 CT
# 70 CT
# 72 CT
# 74 CT
round: 50 Mammo
52 Mammo
54 Mammo
56 Mammo
58 Mammo
60 Mammo
62 Mammo
64 Mammo
66 Mammo
68 Mammo
70 Mammo
72 Mammo
74 Mammo

The probability that a case will attend a screening round is specified by the attendanceRate: parameter:


# probability:
attendanceRate: .8
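A minimal Python sketch of reading the round specifications and applying attendanceRate: follows. It assumes preprocessed lines (comments and initial white space removed) and checks that the modalities match the simulation type. Treating the attendance of each round as an independent draw with probability attendanceRate is an assumption; parse_rounds and attended_rounds are hypothetical names.

```python
import random

BREAST_MODALITIES = {"Mammo", "Tomo", "MRI"}
LUNG_MODALITIES = {"CT"}


def parse_rounds(lines, lung=False):
    """Return a list of (age, [modalities]) tuples; [] for 'round: none'."""
    lines = [line for line in lines if line.strip()]  # skip empty lines
    allowed = LUNG_MODALITIES if lung else BREAST_MODALITIES
    rounds = []
    for index, line in enumerate(lines):
        words = line.split()
        if words[0] == "round:":
            words = words[1:]             # optional after the first round
        elif index == 0:
            raise ValueError("the first round must start with round:")
        if words == ["none"] and len(lines) == 1:
            return []
        if len(words) < 2:
            raise ValueError(f"round needs an age and a modality: {line!r}")
        age, modalities = float(words[0]), words[1:]
        unknown = set(modalities) - allowed
        if unknown:
            raise ValueError(f"modalities not available here: {sorted(unknown)}")
        rounds.append((age, modalities))
    return rounds


def attended_rounds(rounds, attendance_rate, rng=random):
    """Keep each round with probability attendance_rate (an assumption)."""
    return [entry for entry in rounds if rng.random() < attendance_rate]
```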

Section `Tumor:’ (opt.)

This section may start with a line containing Tumor: and it specifies the characteristics of tumors. Several of the parameters in this section can be varied by specifying spread: true in the section Scenario:, in which case statistical variations are applied to these parameters.

Supported distributions are Normal, Uniform, LogNormal, and (for the lung-cancer Beir7 parameters) the Beta distribution. If value is the specified parameter value and spread the specified spread, then the values actually used during the simulations are:

when using the Normal distribution N(mean, stddev):

N(value, spread)
when using the Uniform distribution U(begin, end):

U(value - spread / 2, value + spread / 2)
when using the LogNormal distribution L(mean, stddev):

L(value, spread)
the Beta distribution is used when requesting lung cancer simulations. For male cases the 95% confidence interval of the beta parameter ranges from .15 to .70, for female cases from .94 to 2.10, and values drawn from these distributions are used when spread: true has been specified (see also section BETA DISTRIBUTIONS).

The spread parameters may not be negative. If spread values are configured then their distributions must also be specified. If spread is not specified, then the value parameter won’t vary even if spread: true is specified in the Scenario section. The same holds true for the Beta distribution: if no spreading should be applied even though spread: true was specified, then the Beta distribution’s specifications should be omitted.
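The rules above can be summarized in a short Python sketch. vary is a hypothetical helper, and its spreading argument corresponds to spread: true in the Scenario: section. Python’s random.lognormvariate interprets its arguments as the mean and standard deviation of the underlying normal distribution; whether simrisc’s L(value, spread) uses the same convention is an assumption. The Beta case is covered in section BETA DISTRIBUTIONS.

```python
import random


def vary(value, spread=None, dist=None, spreading=True, rng=random):
    """Return the parameter value actually used in a simulation.

    Without a spread, or when spreading is off, the value does not vary.
    """
    if not spreading or spread is None:
        return value
    if spread < 0:
        raise ValueError("spread may not be negative")
    if dist == "Normal":                  # N(value, spread)
        return rng.gauss(value, spread)
    if dist == "Uniform":                 # U(value - spread/2, value + spread/2)
        return rng.uniform(value - spread / 2, value + spread / 2)
    if dist == "LogNormal":
        # ASSUMPTION: Python's convention (parameters of the underlying normal)
        return rng.lognormvariate(value, spread)
    raise ValueError(f"unsupported distribution: {dist!r}")
```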

The Tumor: section has five subsections: Beir7:, Growth:, Incidence:, Survival:, and S3:. They contain the following parameter specifications:

Beir7:

BEIR (tumor induction) parameters: only tumor induction type 7 (i.e., beir7) is used. The default configuration file contains specifications for breast cancer simulations and for male and female lung cancer simulations:


# eta beta spread dist.
breast: -2.0 0.51 0.32 Normal
# Beta-distribution parameters:
# LC: eta beta dist constant factor aParam bParam
male: -1.4 .32 Beta .234091 1.72727 2.664237 5.184883
female: -1.4 1.40 Beta .744828 .818966 3.366115 4.813548
If spread: true is specified then the actually used beta parameters are drawn from their respective distributions.

See also National Research Council. 2006. Health Risks from Exposure to Low Levels of Ionizing Radiation: BEIR VII Phase 2. Washington, DC: The National Academies Press (https://doi.org/10.17226/11340).

Growth:

Tumor growth specifications consist of three elements: start diameters, self-detect parameters and doubling time specifications.

The start parameters define the start diameters (in millimeters) of emerging tumors used with, respectively, breast and lung cancer simulations. The default configuration file specifies:



# breast lung
start: 5 3

The default configuration file contains the following specifications of the self-detect parameters for breast and lung cancer simulations:


#selfDetect: # stdev mean spread dist
breast: 2.01375 18.5413 2.31637 Normal
lung: 1.0141 20.8426 1.84043 Normal

Four parameters are used when determining the diameter at which self-detection is possible. These parameters are:

the standard deviation (stdev) used by the lognormal distribution to compute the diameter at which self-detection occurs. This parameter is required and cannot be negative;
the mean (see below) used by the lognormal distribution. This parameter is required and cannot be negative. Its value will vary using the following two parameters if spread: true was specified;
the spread (standard deviation) used by the distribution that is used to vary the mean if spread: true was specified. It can be omitted in which case the mean won’t vary;
the distribution used to vary the mean. If the previous parameter is omitted then this parameter must also be omitted.

The actually used self-detect diameter is computed using:


diameter = L(mean, stdev)

Finally, the Growth: subsection also defines tumor doubling times for various age groups when using breast cancer simulations and for all ages when using lung cancer simulations.

Doubling times are computed like the self-detect diameters, i.e., using millimeters and lognormal distributions. Thus, for each age group and for the lung cancer simulation four parameters are specified (of which the final two are optional): the standard deviation of the lognormal distribution, the mean value of the lognormal distribution, and the spread and name of the distribution that is used when spread: true was specified.

The age groups (used with breast cancer simulations) must cover ages 0 through the maximum age for simulated cases, and are specified as described at section BreastDensities:. The default configuration file contains the following specifications:


DoublingTime:
# stdev mean spread dist.
ageGroup: 1 - 50: 1.84043 79.838 1.53726 Normal
50 - 70: 1.29693 157.591 1.1853 Normal
70 - * : 1.56831 188.67 1.2586 Normal
# all ages stdev mean spread dist.
lung: 1.23368 98.4944 2.09594 Normal
Optionally, each ageGroup line may start with an ageGroup: specification, which is required for the first specification line.

Incidence: (opt.)

For breast cancer simulations three carrier types are supported: Normal, BRCA1 and BRCA2, each having a probability of occurrence. The probabilities of these carriers must add up to 1. In the default configuration file BRCA1 and BRCA2 are specified, but their probabilities are set to 0, in which case their specifications can also be removed from configuration files.

Each carrier is identified by name (i.e., when performing breast cancer simulations Breast:, BRCA1:, and BRCA2:; when performing lung cancer simulations Male: and Female:), followed by its parameter specifications:

for breast cancer simulations: the probability that the carrier is observed;
the lifetime risk: three parameters specifying a probability;
the mean age: three parameters specifying the mean age;
the standard deviation used when computing the risk of getting a tumor. As this standard deviation is used in the denominator of expressions it must be larger than zero.

The lifetime risk, mean age and standard deviation parameters may optionally be followed by the standard deviation (spread) and the distribution used to vary their values when spread: true is specified.

The default configuration file contains these specifications:


Male:
# value spread distr.
lifetimeRisk: .22 .005 Normal
meanAge: 72.48 1.08 Normal
stdDev: 9.28 1.62 Normal
Female:
# value spread distr.
lifetimeRisk: .20 .004 Normal
meanAge: 69.62 1.49 Normal
stdDev: 9.73 1.83 Normal
Breast:
probability: 1
# value spread distr.
lifetimeRisk: .226 .0053 Normal
meanAge: 72.9 .552 Normal
stdDev: 21.1
BRCA1:
probability: 0
# value spread distr.
lifetimeRisk: .96
meanAge: 53.9
stdDev: 16.51
BRCA2:
probability: 0
# value spread distr.
lifetimeRisk: .96
meanAge: 53.9
stdDev: 16.51
Instead of specifying the parameters of a distribution, a riskTable can also be used. If a riskTable is specified at a category (i.e., Male: .. BRCA2:) then the lifetimeRisk, meanAge, and stdDev parameters are ignored. A riskTable specification contains pairs of values. The first value of a pair specifies an age, the second value the cumulative probability of a tumor developing until that age. Ages must be increasing, probabilities must be cumulative (non-decreasing), and at least two pairs must be specified.

Unless they are specified in the riskTable specification itself, simrisc adds a pair 0, .00 at the beginning and an age specification 100 at the end. The cumulative tumor probability at age 100 is computed by linear extrapolation from the last two pairs before age 100, using a maximum value of 1.00. For ages between specified ages linear interpolation between the surrounding pairs is used. Here is a (fictitious) example of a riskTable specification in the Male: category:


Male:
riskTable:
40 .01 50 .1 55 .15 60 .22
65 .55 70 .62 75 .67
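The completion and interpolation rules can be sketched in Python as follows (complete_table and cumulative_risk are hypothetical names). For the fictitious Male: table above, the last two pairs (70, .62) and (75, .67) rise by .01 per year, so the extrapolated cumulative probability at age 100 is .67 + 25 * .01 = .92.

```python
import bisect


def complete_table(pairs):
    """Return (ages, probabilities), adding the implicit 0 and 100 entries."""
    if len(pairs) < 2:
        raise ValueError("at least two pairs must be specified")
    ages = [age for age, _ in pairs]
    probs = [prob for _, prob in pairs]
    if any(later <= earlier for earlier, later in zip(ages, ages[1:])):
        raise ValueError("ages must increase")
    if ages[0] != 0:
        ages.insert(0, 0)
        probs.insert(0, 0.0)
    if ages[-1] != 100:
        # linear extrapolation from the last two pairs, capped at 1.00
        slope = (probs[-1] - probs[-2]) / (ages[-1] - ages[-2])
        probs.append(min(1.0, probs[-1] + slope * (100 - ages[-1])))
        ages.append(100)
    return ages, probs


def cumulative_risk(age, ages, probs):
    """Linearly interpolate the cumulative tumor probability at age."""
    if age <= ages[0]:
        return probs[0]
    if age >= ages[-1]:
        return probs[-1]
    i = bisect.bisect_right(ages, age)    # ages[i - 1] <= age < ages[i]
    fraction = (age - ages[i - 1]) / (ages[i] - ages[i - 1])
    return probs[i - 1] + fraction * (probs[i] - probs[i - 1])
```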

Survival: (opt.)

For breast cancer simulations five types of survival parameters (a..e) are specified. Each type specifies a mean, and (optionally) a spread and distribution (which are used when spread: true has been specified). The default configuration file specifies:


# value spread dist:
type: a .00004475 .000004392 Normal
b 1.85867 .0420 Normal
c -.271 .0101 Normal
d 2.0167 .0366 Normal
e 1.00 .01 Normal
Optionally, each line may start with a type: specification, which is required for the first specification line.

The e parameters can be used to estimate relative survival if not enough data are available (and can reflect the quality of care). The value 1 indicates that enough data are available to estimate survival using the a to d parameters. If not enough data are available the e parameter can be used to adjust the survival estimate obtained from the a to d parameters to the quality level of the medical care, relative to a country for which the a to d parameters are available. E.g., in Dutch breast cancer research studying survival in Indonesia the value 0.9 was used, since combining that factor with the provided a to d parameter values produced the correct survival probabilities for Indonesia.

When performing breast cancer simulations TNM indices (cf. the description of the S3 table below) are also determined. With breast cancer simulations the second TNM value is always 0, and the first TNM value is, as with table S3, determined by the tumor’s diameter. The default configuration file contains the following bc: specification (see also option --tnm):


# BC TNM categories thru (<=) diameters (mm):
bc: 20 50 *
# TNM: T1 T2 T3
(cf. https://www.cancerresearchuk.org/about-cancer/breast-cancer/stages-types-grades/tnm-staging).

For lung cancer simulations table S4 is used to determine the a..e parameters. Table S4 contains four categories (lung0..lung3) defining these a..e parameters, where (for a known cancer’s diameter) the category is randomly determined using table S3 (see below).

Table S4 is appended to the breast cancer specifications. The default configuration file contains the following specifications of table S4:


# table S4: 4 columns per a..d parameter
# lungX: X is table S4’s column index
lung0: a .00143 .00095 Normal
b 1.84559 .33748 Normal
c -.22794 .07823 Normal
d 1.06799 .16226 Normal
e 1.00 .01 Normal
lung1: a .01530 .00381 Normal
b 1.69434 .10979 Normal
c -.19358 .02105 Normal
d .66690 .03869 Normal
e 1.00 .01 Normal
lung2: a .78600 .29815 Normal
b .69791 .05425 Normal
c .0 .0 Normal
d .0 .0 Normal
e 1.00 .01 Normal
lung3: a 1.25148 .32305 Normal
b .77852 .34149 Normal
c .0 .0 Normal
d .0 .0 Normal
e 1.00 .01 Normal
Optionally, each line following the first line of a lung0: .. lung3: category may repeat that category’s label.

S3: (opt.)

With lung cancer simulations tables S3 and S4 are used to determine the survival parameters. The tumor’s diameter determines the row of table S3, and then its column is randomly determined using the probabilities listed in S3’s rows. For each row the probabilities must sum to 1. Once the S3 column has been determined, the column index determines which of the lungX: specifications is used. The row and column indices are 0-based. E.g., if a tumor’s diameter is 24 mm, then row 2 (diameter <= 30) is selected. Then, if the random value is .630, column 1 (N1-3,M0) is used. Whenever a tumor is present these pairs of indices are reported in the comma-separated data file in the column marked as TNM, using an entry like 2,1.

The default configuration file contains the following table S3:


S3:
# diameter (mm)
# T-row <= N0,M0 N1-3,M0 N1-3,M1a-b N0-3M1c
prob: 10: .756 .157 .048 .039 # T1a,b
20: .703 .197 .055 .045 # T1b
30: .559 .267 .095 .078 # T1c
50: .345 .351 .167 .137 # T2a,b
70: .196 .408 .218 .178 # T3
*: .187 .347 .256 .210 # T4
Optionally, each line may start with a prob: specification, which is required for the first specification line.
(cf. https://www.sciencedirect.com/science/article/pii/S2667005421000491 and Table 1 of its appendix: https://ars.els-cdn.com/content/image/1-s2.0-S2667005421000491-mmc1.pdf)
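The row and column selection can be sketched in Python as follows. s3_indices is a hypothetical helper that receives the uniform random value explicitly, so that the worked example (diameter 24, random value .630, indices 2,1) can be reproduced. Selecting the column by comparing the random value with the cumulative row probabilities is an assumption consistent with that example; * is represented by math.inf.

```python
import math

# Default table S3: (upper diameter in mm, column probabilities)
S3 = [
    (10, (.756, .157, .048, .039)),
    (20, (.703, .197, .055, .045)),
    (30, (.559, .267, .095, .078)),
    (50, (.345, .351, .167, .137)),
    (70, (.196, .408, .218, .178)),
    (math.inf, (.187, .347, .256, .210)),  # '*'
]


def s3_indices(diameter, u, table=S3):
    """Return the 0-based (row, column) for a tumor diameter and a
    uniform random value u in [0, 1)."""
    # the first row whose upper diameter is >= the tumor's diameter
    row = next(i for i, (limit, _) in enumerate(table) if diameter <= limit)
    probs = table[row][1]
    cumulative = 0.0
    for column, prob in enumerate(probs):
        cumulative += prob
        if u < cumulative:
            return row, column
    return row, len(probs) - 1            # guards against rounding near 1
```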

BETA DISTRIBUTIONS

Values generated from Beta distributions range between 0 and 1 (cf. https://en.wikipedia.org/wiki/Beta_distribution). The Beta distribution is computed using two Gamma distributions (cf. https://www.fmrib.ox.ac.uk/datasets/techrep/tr03tb1/tr03tb1/node24.html, https://stats.stackexchange.com/questions/502146/how-does-numpy-generate-samples-from-a-beta-distribution):


gamma1 = Gamma(aParam, 1)
Beta(aParam, bParam) = gamma1 / (gamma1 + Gamma(bParam, 1))

When using lung cancer simulations the 95% confidence interval (CI) for male cases ranges from .15 to .70, with a mean value of .32, and for female cases from .94 to 2.10, with a mean value of 1.40.

The male and female CI ranges are transformed to .025 to .975 ranges using linear transformations. To transform values x from the male range to the .025 to .975 range the transformation y = 1.72727 * x - .234091 is used, and to transform back the transformation (y + .234091) / 1.72727 is used. For the female CI the transformations are y = .818966 * x - .744828 and (y + .744828) / .818966.

The aParam and bParam values were determined by first generating 1000 values whose CI spans the range .025 to .975, with mean values of 0.318635 (for male cases) and 0.401724 (for female cases). Next the parameters of the corresponding beta distributions were estimated using maximum likelihood fitting, resulting in aParam = 2.664237 and bParam = 5.184883 for the distribution used with male lung cancer simulations, and aParam = 3.366115 and bParam = 4.813548 for the distribution used with female lung cancer simulations.

The default configuration file shows these values at the Beir7 beta parameters.
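The Gamma-based construction and the transformation back to the beta parameter’s range can be sketched in Python using the standard library’s random.gammavariate. The helper names are hypothetical; the constants are the Beir7: male: and female: values (constant, factor, aParam, bParam). Since Beta values lie between 0 and 1, male beta values fall between (0 + .234091) / 1.72727 and (1 + .234091) / 1.72727, i.e., roughly between .136 and .714.

```python
import random

# Beir7: lung-cancer values: (constant, factor, aParam, bParam)
MALE = (.234091, 1.72727, 2.664237, 5.184883)
FEMALE = (.744828, .818966, 3.366115, 4.813548)


def beta_sample(a_param, b_param, rng=random):
    """Beta(aParam, bParam) = gamma1 / (gamma1 + Gamma(bParam, 1))."""
    gamma1 = rng.gammavariate(a_param, 1.0)
    gamma2 = rng.gammavariate(b_param, 1.0)
    return gamma1 / (gamma1 + gamma2)


def to_beta_range(y, constant, factor):
    """Invert y = factor * x - constant, i.e., x = (y + constant) / factor."""
    return (y + constant) / factor


def beir7_beta(params, rng=random):
    """Draw a beta parameter value for lung-cancer simulations."""
    constant, factor, a_param, b_param = params
    return to_beta_range(beta_sample(a_param, b_param, rng), constant, factor)
```

The back transformation maps the .025 and .975 quantiles onto the published CI bounds (.15 to .70 for male, .94 to 2.10 for female cases).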

PARAMETER RESPECIFICATION

Parameters can be respecified by defining a separate parameter configuration file, by providing alternative parameter specifications in Analysis: sections of the program’s input file, or by providing alternative parameter specifications as command-line arguments (cf. the simrisc(1) man-page).

FILES

Configuration files

~/.config/simrisc: the default location of the program’s configuration file;
the simrisc distribution archive contains the default configuration file as simrisc-VERSION/stdconfig/simrisc, where VERSION is replaced by simrisc’s actual release version;
when installing simrisc using Linux distribution archives (e.g., .deb files) the default configuration file is commonly available as /usr/share/doc/simrisc/simrisc.gz.

SEE ALSO

simrisc(1)

BUGS

Versions before version 15.03.00 should not be used for lung cancer simulations.

COPYRIGHT

This is free software, distributed under the terms of the GNU General Public License (GPL).

AUTHOR

Frank B. Brokken (f.b.brokken@rug.nl),

2020-2024 simrisc.16.02.00