GDAL-VECTOR-PARTITION(1)    GDAL    GDAL-VECTOR-PARTITION(1)
NAME
gdal-vector-partition - Partition a vector dataset into multiple files
Added in version 3.12.
SYNOPSIS
Usage: gdal vector partition [OPTIONS] <INPUT> <OUTPUT>

Partition a vector dataset into multiple files.

Positional arguments:
  -i, --input <INPUT>                                  Input vector datasets [required] [not available in pipelines]
  -o, --output <OUTPUT>                                Output directory [required]

Common Options:
  -h, --help                                           Display help message and exit
  --json-usage                                         Display usage as JSON document and exit
  --config <KEY>=<VALUE>                               Configuration option [may be repeated]
  -q, --quiet                                          Quiet mode (no progress bar or warning message) [not available in pipelines]

Options:
  --overwrite                                          Whether overwriting existing output dataset is allowed
                                                       Mutually exclusive with --append
  --append                                             Whether appending to existing layer is allowed
                                                       Mutually exclusive with --overwrite
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
  --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
  --field <FIELD>                                      Attribute or geometry field(s) on which to partition [may be repeated]
  --scheme <SCHEME>                                    Partitioning scheme. SCHEME=hive|flat (default: hive)
  --pattern <PATTERN>                                  Filename pattern ('part_%010d' for scheme=hive, '{LAYER_NAME}_{FIELD_VALUE}_%010d' for scheme=flat)
  --feature-limit <FEATURE-LIMIT>                      Maximum number of features per file
  --max-file-size <MAX-FILE-SIZE>                      Maximum file size (MB or GB suffix can be used)
  --omit-partitioned-field                             Whether to omit partitioned fields from target layer definition
  --skip-errors                                        Skip errors when writing features

Advanced Options:
  --if, --input-format <INPUT-FORMAT>                  Input formats [may be repeated] [not available in pipelines]
  --oo, --open-option <KEY>=<VALUE>                    Open options [may be repeated] [not available in pipelines]
DESCRIPTION
gdal vector partition dispatches features into different files, depending on the values that the features take for a subset of attribute or geometry fields specified by the user, and/or by limiting each output layer to a maximum number of features and/or a maximum file size.
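The dispatching logic can be pictured with the following Python sketch. It is an illustration under stated assumptions, not GDAL's implementation: features are plain dictionaries, the partition key is the tuple of values of the partitioning fields, and `approx_size()` is a hypothetical stand-in for GDAL's internal size heuristic. A key gets a new part whenever its current part already holds `feature_limit` features, or when adding the next feature would push its estimated size past `max_bytes`.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence, Tuple

Feature = Dict[str, Any]
Key = Tuple[Any, ...]


@dataclass
class Part:
    index: int  # part number, later substituted into the filename pattern
    features: List[Feature] = field(default_factory=list)
    approx_bytes: int = 0  # running size estimate, not the real file size


def approx_size(feature: Feature) -> int:
    # Hypothetical heuristic: length of each value's text form, no compression assumed.
    return sum(len(str(value)) for value in feature.values())


def partition(
    features: Sequence[Feature],
    fields: Sequence[str],
    feature_limit: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> Dict[Key, List[Part]]:
    parts: Dict[Key, List[Part]] = defaultdict(list)
    for feature in features:
        # With no partitioning fields, every feature shares the empty key ().
        key = tuple(feature.get(name) for name in fields)
        size = approx_size(feature)
        chunks = parts[key]
        current = chunks[-1] if chunks else None
        needs_new_part = current is None or (
            (feature_limit is not None and len(current.features) >= feature_limit)
            or (max_bytes is not None
                and len(current.features) > 0
                and current.approx_bytes + size > max_bytes)
        )
        if needs_new_part:
            current = Part(index=len(chunks))
            chunks.append(current)
        current.features.append(feature)
        current.approx_bytes += size
    return dict(parts)


cities = [
    {"country": "FR", "name": "Paris"},
    {"country": "FR", "name": "Lyon"},
    {"country": "FR", "name": "Nice"},
    {"country": "DE", "name": "Berlin"},
]
by_country = partition(cities, ["country"], feature_limit=2)
print({key: [len(p.features) for p in chunks] for key, chunks in by_country.items()})
# {('FR',): [2, 1], ('DE',): [1]}
```

Leaving `fields` empty mirrors the case where --field is omitted and only --feature-limit and/or --max-file-size are given. In this sketch, a single feature larger than `max_bytes` still gets a part of its own, so the size limit behaves as a hint here too.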
Two partitioning schemes are available:
- hive, corresponding to Apache Hive partitioning, is the default one.
Each partitioning field corresponds to a nested directory. Let's consider a layer with fields "continent" and "country", chosen as partitioning fields. All features where "continent" evaluates to "Europe" and "country" evaluates to "France" will be written in the "continent=Europe/country=France/" subdirectory of the output directory.
NULL values for partitioning fields are encoded as __HIVE_DEFAULT_PARTITION__ in the directory name. Non-ASCII characters, space, equal sign, or characters not compatible with directory name constraints are percent-encoded (e.g. %20 for space).
- flat, where files are written directly under the output directory, using a default filename pattern of {LAYER_NAME}_{FIELD_VALUE}_%010d.
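As an illustration of the Hive naming rules above, the Python sketch below builds the subdirectory for one combination of partitioning-field values. The exact set of characters GDAL percent-encodes is not listed here, so the sketch assumes a conservative rule (encode everything except ASCII letters, digits and `_.-~`, which is what `urllib.parse.quote` does with `safe=""`). Encoding the field names as well as the values is also an assumption.

```python
from typing import Optional, Sequence
from urllib.parse import quote

HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def encode_component(text: str) -> str:
    # Assumed rule: percent-encode the UTF-8 bytes of anything outside A-Z, a-z, 0-9
    # and "_.-~", which covers non-ASCII characters, space, "=" and "/".
    return quote(text, safe="")


def hive_subdirectory(fields: Sequence[str], values: Sequence[Optional[object]]) -> str:
    """Return the relative directory, e.g. 'continent=Europe/country=France/'."""
    levels = []
    for name, value in zip(fields, values):  # field order gives the nesting order
        encoded = HIVE_NULL if value is None else encode_component(str(value))
        levels.append(f"{encode_component(name)}={encoded}")
    return "/".join(levels) + "/"


print(hive_subdirectory(["continent", "country"], ["Europe", "France"]))
# continent=Europe/country=France/
print(hive_subdirectory(["continent", "country"], ["South America", None]))
# continent=South%20America/country=__HIVE_DEFAULT_PARTITION__/
```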
By default, the format of the input dataset will be used for the output, if it can be determined and the input driver supports writing. Otherwise, --format must be used.
gdal vector partition can be used as the last step of a pipeline.
The following options are available:
PROGRAM-SPECIFIC OPTIONS
- --feature-limit <FEATURE-LIMIT>
- Maximum number of features per file. By default, unlimited. If the limit is exceeded, several parts are created.
- --field <FIELD-NAME>
- Field(s) on which to partition.
Only attribute fields of type String, Integer and Integer64 are allowed. The order in which fields are specified determines the directory hierarchy.
Starting with GDAL 3.13, geometry field names can be specified (OGR_GEOMETRY being the generic name for the first geometry field). Partitioning on geometry fields is done on the geometry type. This can be useful for file formats where a single geometry type per layer is allowed.
Starting with GDAL 3.13, --field is no longer required, but when it is not specified, --feature-limit and/or --max-file-size must be specified.
- --max-file-size <MAX-FILE-SIZE>
- Maximum file size (MB or GB suffix can be used). By default, unlimited. If the limit is exceeded, several parts are created.
Note that the maximum file size is used as a hint and might not be strictly respected: the size contributed by each feature is estimated with a heuristic, because the size of a file cannot be reliably measured while it is being written. In particular, the heuristic does not assume any compression, so for compressed formats, the actual size of a part can be significantly smaller than the specified limit.
- --omit-partitioned-field
- Whether to omit partitioned fields from the target layer definition. Automatically set for Parquet output format and Hive partitioning.
- --output <OUTPUT-DIRECTORY>
- Root of the output directory. [required]
- --pattern <PATTERN>
- Filename pattern. User-chosen string, with substitutions for:
- {LAYER_NAME}, when found, is substituted with the layer name (percent encoded where needed).
- {FIELD_VALUE}, when found, is substituted with the partitioning field value (percent encoded where needed). If several partitioning fields are used, each value is separated by underscore (_). Empty strings are substituted with __EMPTY__ and null fields with __NULL__.
- %[0?][0-9]?[0]?d: C-style integer formatter for the part number. For example, %d and %05d are valid. Exactly one part number specifier must be present in the pattern.
Default values for the pattern are part_%010d for the hive scheme, and {LAYER_NAME}_{FIELD_VALUE}_%010d for the flat scheme.
- --scheme hive|flat
- Partitioning scheme. Defaults to hive.
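The --pattern substitution rules can be sketched in Python as follows. This is a hypothetical helper, not part of GDAL: it assumes the part-number specifier has the shape "%", an optional "0", up to two digits, then "d" (enough for %d, %05d and %010d), reuses the conservative percent-encoding assumed for Hive directory names, and does not append any file extension.

```python
import re
from typing import Optional, Sequence
from urllib.parse import quote

# Assumed shape of the part-number specifier: "%", optional "0", up to two digits, "d".
PART_SPEC = re.compile(r"%0?[0-9]{0,2}d")


def encode(text: str) -> str:
    # Same conservative percent-encoding assumption as for Hive directory names.
    return quote(text, safe="")


def field_token(value: Optional[object]) -> str:
    if value is None:
        return "__NULL__"
    if value == "":
        return "__EMPTY__"
    return encode(str(value))


def expand_pattern(pattern: str, layer_name: str,
                   values: Sequence[Optional[object]], part: int) -> str:
    if len(PART_SPEC.findall(pattern)) != 1:
        raise ValueError("pattern needs exactly one part number specifier")
    # Format the part number first, so that "%" signs introduced later by
    # percent-encoding of layer names or values cannot be mistaken for a specifier.
    name = PART_SPEC.sub(lambda m: m.group(0) % part, pattern)
    # Several partitioning field values are joined with an underscore.
    field_value = "_".join(field_token(v) for v in values)
    return name.replace("{LAYER_NAME}", encode(layer_name)).replace("{FIELD_VALUE}", field_value)


print(expand_pattern("{LAYER_NAME}_{FIELD_VALUE}_%010d", "cities", ["Europe", "France"], 1))
# cities_Europe_France_0000000001
print(expand_pattern("part_%010d", "cities", ["Europe", "France"], 1))
# part_0000000001
```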
STANDARD OPTIONS
- --append
- Whether the output directory must be opened in append mode. Implies that it already exists and that the output format supports appending.
This mode is useful when adding new features to an already existing partitioned dataset.
- --co, --creation-option <NAME>=<VALUE>
- Many formats have one or more optional dataset creation options that can be used to control particulars about the file created. For instance, the GeoPackage driver supports creation options to control the version.
May be repeated.
The dataset creation options available vary by format driver, and some simple formats have no creation options at all. A list of options supported for a format can be listed with the --formats command line option but the documentation for the format is the definitive source of information on driver creation options. See Vector drivers format specific documentation for legal creation options for each format.
Note that dataset creation options are different from layer creation options.
- --if, --input-format <format>
- Format/driver name(s) to try when opening the input file(s). It is generally not necessary to specify it, but it can be used to skip automatic driver detection when it fails to select the appropriate driver. This option can be repeated several times to specify several candidate drivers. Note that it does not force those drivers to open the dataset. In particular, some drivers have requirements on file extensions.
May be repeated.
- --lco, --layer-creation-option <NAME>=<VALUE>
- Many formats have one or more optional layer creation options that can be used to control particulars about the layer created. For instance, the GeoPackage driver supports layer creation options to control the feature identifier or geometry column name, setting the identifier or description, etc.
May be repeated.
The layer creation options available vary by format driver, and some simple formats have no layer creation options at all. A list of options supported for a format can be listed with the --formats command line option but the documentation for the format is the definitive source of information on driver layer creation options. See Vector drivers format specific documentation for legal creation options for each format.
Note that layer creation options are different from dataset creation options.
- --oo, --open-option <NAME>=<VALUE>
- Dataset open option (format specific).
May be repeated.
- -f, --of, --format, --output-format <OUTPUT-FORMAT>
- Which output vector format to use. Allowed values may be given by gdal --formats | grep vector | grep rw | sort
- --overwrite
- Allow the program to overwrite an existing target file or dataset. Otherwise, by default, gdal errors out if the target file or dataset already exists.
- --skip-errors
- Added in version 3.12.
Whether failures to write feature(s) should be ignored. Note that this option sets the size of the transaction unit to one feature at a time, which may cause severe slowdown when inserting into databases.
RETURN STATUS CODE
The program returns status code 0 in case of success, and non-zero in case of error (non-blocking errors emitted as warnings are considered as a successful execution).
EXAMPLES
Example 1: Create a partition based on the "continent" and "country" fields
$ gdal vector partition world_cities.gpkg out_directory --field continent,country --format Parquet
Example 2: Create a partition based on the "country" field, keeping only cities with a population greater than 1 million, with a flat partitioning scheme
$ gdal pipeline ! read world_cities.gpkg ! filter --where "pop > 1e6" ! partition out_directory --field country --format GPKG --scheme flat
Example 3: Create multiple Shapefiles, one for each geometry type, from a GeoPackage file, grouping POLYGON with MULTIPOLYGON and LINESTRING with MULTILINESTRING.
$ gdal vector pipeline ! read input.gpkg ! set-geom-type --multi ! partition out_directory --scheme flat --field OGR_GEOMETRY --format "ESRI Shapefile"
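Why the set-geom-type --multi step matters in this example can be shown with a short Python sketch that only counts the resulting geometry-type partitions. It does not call GDAL: it assumes WKT-style type names and models the promotion to multi-part types with a hypothetical lookup table.

```python
from collections import Counter
from typing import Iterable

# Hypothetical table mimicking "set-geom-type --multi": single-part types are
# promoted to their multi-part counterparts; other types are left unchanged.
PROMOTE_TO_MULTI = {
    "POINT": "MULTIPOINT",
    "LINESTRING": "MULTILINESTRING",
    "POLYGON": "MULTIPOLYGON",
}


def partition_keys(geometry_types: Iterable[str], promote: bool) -> Counter:
    """Count how many features would land in each geometry-type partition."""
    keys = (PROMOTE_TO_MULTI.get(t, t) if promote else t for t in geometry_types)
    return Counter(keys)


types = ["POLYGON", "MULTIPOLYGON", "LINESTRING", "MULTILINESTRING", "POLYGON"]
print(len(partition_keys(types, promote=False)))  # 4 partitions without promotion
print(len(partition_keys(types, promote=True)))   # 2 partitions with promotion
```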
Example 4: Split a file into files with at most 100,000 features.
$ gdal vector partition world_cities.gpkg out_directory --feature-limit 100000 --scheme flat --format Parquet
Example 5: Sort a file spatially and split it into files with at most 100,000 features.
$ gdal vector pipeline read world_cities.gpkg ! sort ! partition out_directory --feature-limit 100000 --scheme flat --format Parquet
AUTHOR
Even Rouault <even.rouault@spatialys.com>
COPYRIGHT
1998-2026
May 8, 2026