mcxquery - compute simple graph statistics
mcxq is not in actual fact a program. This manual page documents the behaviour
and options of the mcx program when invoked in mode q. The options -h,
--apropos, --version, -set, --nop, -progress <num> are accessible in all mcx
modes. They are described in the mcx manual page.
mcxquery [-abc <fname> (specify label input)]
[-imx <fname> (specify matrix input)]
[-o <fname> (output file name)]
[-tab <fname> (use tab file)]
[--node-attr (output node degree and weight attributes)]
[-vary-threshold <start,end,step,scale> (analyze graph at similarity cutoffs)]
[-vary-knn <start,end,step,scale> (analyze graph for varying k-NN)]
[-vary-ceil <start,end,step,scale> (analyze graph for varying ceil reductions)]
[-report-scale <num> (edge weight/threshold scaling)]
[--no-legend (do not output explanatory legend)]
[--reduce (use reduced matrix)]
[--test-metric (test whether graph distance is metric)]
[--test-cycle (test whether graph contains cycles)]
[-test-cycle <num> (test cycles, report cyclees)]
[--vary-correlation (analyze graph at correlation cutoffs)]
[--clcf (include clustering coefficient analysis)]
[--eff (include efficiency criterion)]
[-div <num> (cluster size separating value)]
[--dim (report native format and dimensions)]
[--edges (output all arc weights, unsorted)]
[--edges-sorted (output all arc weights, sorted)]
[-edges-hist <start,end,step,scale> (output arc weight histogram)]
[--output-table (output logical tab separated table without key)]
[-t <num> (number of threads to use)]
[-icl <fname> (input clustering)]
[-tf <tf-spec> (apply tf-spec to input matrix)]
[-h (print synopsis, exit)]
[--apropos (print synopsis, exit)]
[--version (print version, exit)]
The main use of mcxquery is to analyze a graph at different similarity
cutoffs. Typically this is done on a graph constructed using a very permissive
threshold. For example, one can create a graph from array expression data
using mcxarray with a very low Pearson correlation cutoff such as 0.2 or 0.3.
Then mcxquery can be used to analyze the graph at increasingly stringent
thresholds of 0.25, 0.30, 0.35 .. 0.95. Attributes supplied across different
thresholds are the number of connected components, the number of singletons,
and statistics (median, average, iqr) on node degrees and edge weights.
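
The kind of per-threshold summary described above can be illustrated with a small sketch. This is not how mcxquery is implemented; it is a minimal Python illustration that assumes an edge is kept when its weight is greater than or equal to the cutoff and that singletons are counted among the connected components. The function names and the five-node edge list are hypothetical.

```python
from statistics import median


def component_sizes(n_nodes, edges, threshold):
    """Sizes of connected components using only edges with weight >= threshold."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= threshold:  # assumption: the cutoff is inclusive
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def summarize(n_nodes, edges, threshold):
    """Component count, singleton count and median degree at one cutoff."""
    degree = [0] * n_nodes
    for u, v, w in edges:
        if w >= threshold:
            degree[u] += 1
            degree[v] += 1
    sizes = component_sizes(n_nodes, edges, threshold)
    return {
        "threshold": threshold,
        "components": len(sizes),  # singletons are included in this count
        "singletons": sum(1 for s in sizes if s == 1),
        "median_degree": median(degree),
    }


# Hypothetical weighted graph on nodes 0..4, given as (node, node, weight).
EDGES = [(0, 1, 0.9), (1, 2, 0.6), (2, 3, 0.3), (3, 4, 0.8)]

for cutoff in (0.25, 0.5, 0.7):
    print(summarize(5, EDGES, cutoff))
```

As the cutoff rises, the single path splits into more components and the first singleton appears, which is the trend the threshold analysis is meant to expose.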
-abc <fname> (label input)
The file name for input that is in label format.
-imx <fname> (input matrix)
The file name for input that is in mcl native matrix format.
-o <fname> (output file name)
Set the name of the file where output should be written to.
-tab <fname> (use tab file)
This option causes the output to be printed with the labels found in the tab
file.
--dim (report native format and dimensions)
This will report the matrix format (either interchange or binary) and the matrix
dimensions. For a graph the two reported dimensions should be equal.
--edges (output all arc weights, unsorted)
--edges-sorted (output all arc weights, sorted)
-edges-hist <start,end,step,scale> (output arc weight histogram)
These options are fairly self-documenting. The result of -edges-hist is a
tab separated table of bin offsets and bin counts.
--output-table (output logical tab separated table without key)
This option causes table output such as provided by --vary-correlation to
be output in a logical tab-separated format rather than pretty-printed.
-vary-threshold <start,end,step,scale> (analyze graphs at similarity cutoffs)
All of start, end, step and scale must be integer numbers. From these a list
of thresholds is constructed, starting from start / scale,
(start + step) / scale, (start + 2 step) / scale, and so on until a value
larger than or equal to end / scale is reached.
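
The construction of the threshold list can be sketched in a few lines of Python. The function name threshold_list is hypothetical, and the sketch assumes that the terminating value (the first one larger than or equal to end / scale) is not itself part of the list; the text above does not state this explicitly.

```python
def threshold_list(start, end, step, scale):
    """Thresholds start/scale, (start+step)/scale, ... below end/scale.

    Assumption: the list stops before the first value >= end/scale.
    """
    if step <= 0 or scale <= 0:
        raise ValueError("step and scale must be positive integers")
    # Iterate over the integer numerators so that no floating-point
    # error accumulates from repeated addition of step / scale.
    return [k / scale for k in range(start, end, step)]


print(threshold_list(20, 100, 5, 100))
```

Under that assumption the arguments 20,100,5,100 give cutoffs running from 0.2 up to 0.95 in increments of 0.05.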
--vary-correlation (analyze graphs at correlation cutoffs)
This instructs mcxquery to use a threshold list suitable for use with
graphs in which the edge weight similarities are correlations. The list starts
at 0.2 and ends at 0.95 using increments of 0.05. If a different start or
increment is required it can be achieved by using the -vary-threshold
option. For example, a start of 0.10 and an increment of 0.02 are
obtained by issuing -vary-threshold 10,100,2,100.
--no-legend (do not output explanatory legend)
For a fully parseable output format use --output-table.
--clcf (include clustering coefficient analysis)
--eff (include efficiency criterion)
These options can be used to compute additional characteristics in the analysis
of thresholded graphs with --vary-correlation and -vary-threshold. For large
graphs these are relatively time-consuming to compute. More information and a
reference for the efficiency criterion can be found in clminfo(1).
-vary-knn <start,end,step,scale> (analyze graphs for varying k-NN)
-vary-ceil <start,end,step,scale> (analyze graphs for varying ceil reductions)
--reduce (use reduced matrix)
These options cause analysis of a graph as it is subjected to reductions across
a range of parameters. Refer to mcxio(5) for a description of these
reductions. The analysis starts at the end argument, and progresses
towards the start argument using decrements of size step. By
default the reduction is always computed relative to the start matrix, i.e.
the input matrix after -tf transformations have optionally been
applied. Specifying --reduce causes this to change so that each new
reduction is calculated relative to the reduction just computed.
For graphs with ties among edge weights it may be useful to use
-tf '#tug()'. This will add small perturbations to the edge
weights and have the effect of breaking ties. By default perturbations are
computed using the cosine between the vectors of neighbours of the two nodes
incident to an edge. This can be changed to a random perturbation with
-tf '#rug()'.
--test-cycle (test whether graph contains cycles)
-test-cycle <num> (test cycles, report cyclees)
Test whether the input graph contains cycles. With the second option nodes that
are part of a cycle are output, up to a maximum of <num> nodes.
Use <num>=-1 to output all such nodes.
--test-metric (test whether graph distance is metric)
This tests all possible triangle relationships.
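
Testing all triangle relationships amounts to checking the triangle inequality for every ordered triple of nodes. The sketch below illustrates that check on a plain distance matrix; it is not the mcx implementation, the name is_metric and the tolerance parameter are hypothetical, and how mcx derives distances from the input graph is not specified in this page.

```python
from itertools import permutations


def is_metric(dist, tol=1e-12):
    """Return True if dist[i][k] <= dist[i][j] + dist[j][k] for all triples.

    dist is a square list of lists of pairwise distances. The small
    tolerance guards against floating-point rounding.
    """
    n = len(dist)
    for i, j, k in permutations(range(n), 3):
        if dist[i][k] > dist[i][j] + dist[j][k] + tol:
            return False
    return True


# Two hypothetical 3-node distance matrices.
good = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
print(is_metric(good), is_metric(bad))
```

The number of triples grows with the cube of the number of nodes, so an exhaustive check of this kind is only practical for moderately sized inputs.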
-report-scale <num> (edge weight / threshold scaling)
The edge weight median, average, and inter-quartile range, as well as the
different threshold steps, are all rescaled in the reported output to avoid
printing fractional parts. If -vary-threshold was supplied then the
scaling factor specified in its argument is used. With
--vary-correlation a scaling factor of 100 is used. Either can be
overridden by using the present option.
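
As a small illustration of the rescaling, a cutoff of 0.35 with a scaling factor of 100 would be reported as 35. The helper below is hypothetical, and rounding to the nearest integer is an assumption; this page does not say how mcxquery handles any remaining fraction.

```python
def rescale(value, scale):
    """Multiply by the scaling factor and round to an integer for reporting.

    Assumption: rounding to nearest; the actual behaviour is not documented here.
    """
    return round(value * scale)


print(rescale(0.35, 100))
```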
-div <num> (cluster size separating value)
When analyzing graphs at different thresholds with one of the options above,
mcxquery reports the percentage of nodes contained in clusters not
exceeding a specified size, by default 3. This number can be changed
using the -div option.
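
The reported percentage can be sketched as follows, reading "not exceeding" as "less than or equal to". The function name and the example cluster sizes are hypothetical; this illustrates the quantity only and is not the code mcxquery uses.

```python
def pct_nodes_in_small_clusters(cluster_sizes, div=3):
    """Percentage of nodes that sit in clusters of size <= div."""
    total = sum(cluster_sizes)
    if total == 0:
        return 0.0
    small = sum(size for size in cluster_sizes if size <= div)
    return 100.0 * small / total


# Hypothetical cluster sizes for a 20-node graph.
sizes = [10, 3, 3, 2, 1, 1]
print(pct_nodes_in_small_clusters(sizes))         # default separating value 3
print(pct_nodes_in_small_clusters(sizes, div=1))  # singletons only
```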
-tf <tf-spec> (transform input matrix values)
Transform the input matrix values according to the syntax described in
mcxio(5).
-t <num> (number of threads to use)
This has an effect only when using the -vary-knn option, and is only
useful on multi-CPU machines.
--node-attr (output node degree and weight attributes)
Output is in the form of a tab separated file. The option -icl can be
used in conjunction.
-icl <fname> (input clustering)
Output for each node the size of the cluster it is in. This option can be used
in conjunction with --node-attr.
mcxio(5), and mclfamily(7) for an overview of all the
documentation and the utilities in the mcl family.