
Catmandu::Importer(3pm) User Contributed Perl Documentation Catmandu::Importer(3pm)

NAME

Catmandu::Importer - Namespace for packages that can import

SYNOPSIS

    # From the command line
    # JSON is an importer and YAML an exporter
    $ catmandu convert JSON to YAML < data.json
    # OAI is an importer and JSON an exporter
    $ catmandu convert OAI --url http://biblio.ugent.be/oai to JSON 
    # Fetch remote content
    $ catmandu convert JSON --file http://example.com/data.json to YAML
    
    # From Perl
    
    use Catmandu;
    use Data::Dumper;
    my $importer = Catmandu->importer('JSON', file => 'data.json');
    $importer->each(sub {
        my $item = shift;
        print Dumper($item);
    });
    my $num = $importer->count;
    my $first_item = $importer->first;
    # Convert OAI to JSON in Perl
    my $oai_importer = Catmandu->importer('OAI', url => 'http://biblio.ugent.be/oai');
    my $exporter = Catmandu->exporter('JSON');
    $exporter->add_many($oai_importer);
    $exporter->commit; # flush the output

DESCRIPTION

A Catmandu::Importer is a Perl package that can generate structured data from file formats such as JSON, YAML, XML and RDF, from network protocols such as Atom, OAI-PMH and SRU, and even from DBI databases. Given a Catmandu::Importer, a programmer can read data from it using any of the many Catmandu::Iterable methods:

    $importer->to_array;
    $importer->count;
    $importer->each(\&callback);
    $importer->first;
    $importer->rest;
    ...etc...

Every Catmandu::Importer is also a Catmandu::Fixable and thus inherits a 'fix' parameter that can be set in the constructor. When a 'fix' parameter is given, each item returned by the generator is automatically fixed using one or more Catmandu::Fix-es. E.g.

    my $importer = Catmandu->importer('JSON', fix => ['upcase(title)']);
    $importer->each(sub {
        my $item = shift; # every $item->{title} is now upcased
    });
    # or via a Fix file
    $importer = Catmandu->importer('JSON', fix => ['/my/fixes.txt']);
    $importer->each(sub {
        my $item = shift; # every $item has been processed by the fixes in /my/fixes.txt
    });
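
Conceptually, a 'fix' sits between the generator and the caller: every item is passed through the transformations before it is handed out. The following plain-Perl sketch illustrates only that idea. It does not use Catmandu or the Fix language, and the helper "with_fixes" and the code reference standing in for 'upcase(title)' are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a generator so that every item passes through
# a list of transformations, in order, before the caller sees it.
sub with_fixes {
    my ($generator, @fixes) = @_;
    return sub {
        my $item = $generator->();
        return undef unless defined $item;    # end of stream
        $item = $_->($item) for @fixes;
        return $item;
    };
}

# Stand-in for an importer's generator: yields two records, then undef.
my @records = ({title => 'dune'}, {title => 'emma'});
my $source  = sub { shift @records };

# Stand-in for the fix 'upcase(title)'.
my $upcase_title = sub {
    my $item = shift;
    $item->{title} = uc $item->{title};
    return $item;
};

my $fixed = with_fixes($source, $upcase_title);
my @titles;
while (defined(my $item = $fixed->())) {
    push @titles, $item->{title};
}
print join(',', @titles), "\n";
```

Because the wrapper is itself a generator, the caller iterates over fixed items exactly as it would over unfixed ones.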

CONFIGURATION

file
    Read input from a local file given by its path. If the path looks like a URL, the content is fetched first and then passed to the importer. Alternatively, a scalar reference can be passed to read from a string.

fh
    Read input from an IO::Handle. If not specified, Catmandu::Util::io is used to create the input stream from the "file" argument or from STDIN.

encoding
    Binmode of the input stream "fh". Set to ":utf8" by default.

fix
    An ARRAY of one or more Fix-es or Fix scripts to be applied to imported items.

data_path
    The data at "data_path" is imported instead of the original data.

   # given this imported item:
   {abc => [{a=>1},{b=>2},{c=>3}]}
   # with data_path 'abc', this item gets imported instead:
   [{a=>1},{b=>2},{c=>3}]
   # with data_path 'abc.*', 3 items get imported:
   {a=>1}
   {b=>2}
   {c=>3}
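
The selection rules shown above can be reproduced in a few lines of plain Perl. This is a simplified sketch, not Catmandu's implementation: the helper "select_data_path" is a hypothetical name, and it handles only hash keys and the '*' wildcard.

```perl
use strict;
use warnings;

# Hypothetical helper, not Catmandu's implementation: resolve a dotted
# path against one record and return the list of items to import.
# Only hash keys and '*' (every element of an array) are handled.
sub select_data_path {
    my ($record, $path) = @_;
    my @nodes = ($record);
    for my $step (split /\./, $path) {
        if ($step eq '*') {
            # '*' fans out over the elements of every array reached so far
            @nodes = map { ref($_) eq 'ARRAY' ? @$_ : () } @nodes;
        }
        else {
            # a plain key descends into every hash that contains it
            @nodes = map { ref($_) eq 'HASH' && exists $_->{$step} ? $_->{$step} : () } @nodes;
        }
    }
    return @nodes;
}

my $record = {abc => [{a => 1}, {b => 2}, {c => 3}]};

my @whole = select_data_path($record, 'abc');      # one item: the array itself
my @each  = select_data_path($record, 'abc.*');    # three items: the hashes

printf "abc   -> %d item(s)\n", scalar @whole;
printf "abc.* -> %d item(s)\n", scalar @each;
```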
    
variables
    Variables given here are interpolated into the "file" and "http_body" options. The syntax is the same as URI::Template.

    # named arguments
    my $importer = Catmandu->importer('JSON',
        file => 'http://{server}/{path}',
        variables => {server => 'biblio.ugent.be', path => 'file.json'},
    );
    # positional arguments
    my $importer = Catmandu->importer('JSON',
        file => 'http://{server}/{path}',
        variables => 'biblio.ugent.be,file.json',
    );
    # or as an array reference
    my $importer = Catmandu->importer('JSON',
        file => 'http://{server}/{path}',
        variables => ['biblio.ugent.be','file.json'],
    );
    # or via the command line
    $ catmandu convert JSON --file 'http://{server}/{path}' --variables 'biblio.ugent.be,file.json'
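
To make the three ways of passing variables concrete, here is a plain-Perl sketch of simple '{name}' interpolation. It is a simplification: URI::Template supports more operators than plain placeholders, the helper "interpolate" is a hypothetical name, and the sketch assumes that positional values are assigned to placeholders in order of first appearance.

```perl
use strict;
use warnings;

# Hypothetical helper: fill '{name}' placeholders in a template.
# $vars may be a hash ref (named), an array ref (positional) or a
# comma-separated string (positional). Assumption: positional values are
# assigned to placeholders in order of first appearance.
sub interpolate {
    my ($template, $vars) = @_;
    $vars = [split /,/, $vars] unless ref $vars;
    if (ref($vars) eq 'ARRAY') {
        my (%seen, %named);
        my @names = grep { !$seen{$_}++ } $template =~ /\{(\w+)\}/g;
        for my $i (0 .. $#names) {
            last if $i > $#$vars;
            $named{ $names[$i] } = $vars->[$i];
        }
        $vars = \%named;
    }
    # placeholders without a value are replaced by an empty string
    $template =~ s/\{(\w+)\}/defined $vars->{$1} ? $vars->{$1} : ''/ge;
    return $template;
}

my $template   = 'http://{server}/{path}';
my $named      = interpolate($template, {server => 'biblio.ugent.be', path => 'file.json'});
my $positional = interpolate($template, ['biblio.ugent.be', 'file.json']);
my $from_cli   = interpolate($template, 'biblio.ugent.be,file.json');

print "$_\n" for $named, $positional, $from_cli;
```

All three calls produce the same URL, which is why the named and positional forms can be used interchangeably.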
    

HTTP CONFIGURATION

These options are only relevant if "file" is a URL. See LWP::UserAgent for details about these options.

http_body
    Set the GET/POST message body.

http_method
    Set the type of HTTP request: 'GET', 'POST', ...

http_headers
    A reference to an HTTP::Headers object.

Set your own HTTP client

user_agent
    Set your own HTTP client, for example an LWP::UserAgent object.

Alternatively, set the parameters of the default client

http_agent
    A string containing the name of the HTTP client.

http_max_redirect
    Maximum number of HTTP redirects allowed.

http_timeout
    Maximum execution time.

http_verify_hostname
    Verify the SSL certificate.

http_retry
    Maximum number of times to retry the HTTP request if it temporarily fails. The default is not to retry. See LWP::UserAgent::Determined for the HTTP status codes that initiate a retry.

http_timing_tries
    Maximum number of times, and the timeouts, to retry the HTTP request if it temporarily fails. The default is not to retry. See LWP::UserAgent::Determined for the HTTP status codes that initiate a retry and for the format of the timing value.

METHODS

first, each, rest, ...

See Catmandu::Iterable for all inherited methods.

CODING

Create your own importer by creating a Perl package in the Catmandu::Importer namespace that consumes the "Catmandu::Importer" role. Basically, you need to implement a method 'generator' which returns a callback that produces one Perl hash on each call and undef at the end of the stream:

    my $importer = Catmandu::Importer::Hello->new;
    my $next = $importer->generator;
    $next->(); # record
    $next->(); # next record
    $next->(); # undef = end of stream
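
The callback contract described above can be tried out without Catmandu installed. The following plain-Perl sketch uses a hypothetical function "make_generator" as a stand-in for an importer's generator method and reads from an in-memory file handle.

```perl
use strict;
use warnings;

# Stand-in for an importer's generator method (plain Perl, no Catmandu):
# returns a closure that yields one hash per line of $fh, then undef.
sub make_generator {
    my ($fh) = @_;
    return sub {
        my $line = readline($fh);
        return undef unless defined $line;    # end of stream
        chomp $line;
        return +{ hello => $line };
    };
}

# Read from an in-memory file handle so the sketch is self-contained.
my $input = "Alice\nBob\n";
open(my $fh, '<', \$input) or die "cannot open in-memory handle: $!";

my $next = make_generator($fh);
my @seen;
while (defined(my $record = $next->())) {
    push @seen, $record->{hello};
}
my $after_end = $next->();    # still undef once the stream is exhausted
print join(',', @seen), "\n";
```

Iteration methods such as each or first only need to call the closure repeatedly until it returns undef, which is why a single method is enough to define an importer.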

Here is an example of a simple "Hello" importer:

    package Catmandu::Importer::Hello;
    use Catmandu::Sane;
    use Moo;
    with 'Catmandu::Importer';
    sub generator {
        my ($self) = @_;
        my $fh = $self->fh; # resolve the input stream once per generator
        my $n = 0;
        return sub {
            $self->log->debug("generating record " . ++$n);
            my $name = $fh->readline;
            return defined $name ? { "hello" => $name } : undef;
        };
    }
    1;

This importer can be called via the command line as:

    $ catmandu convert Hello to JSON < /tmp/names.txt
    $ catmandu convert Hello to YAML < /tmp/names.txt
    $ catmandu import Hello to MongoDB --database_name test < /tmp/names.txt

Or via Perl:

    use Catmandu;
    my $importer = Catmandu->importer('Hello', file => '/tmp/names.txt');
    $importer->each(sub {
        my $item = shift; # e.g. { hello => "..." }
    });

SEE ALSO

Catmandu::Iterable, Catmandu::Fix, Catmandu::Importer::CSV, Catmandu::Importer::JSON, Catmandu::Importer::YAML

2022-03-22 perl v5.34.0