.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.42) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . 
ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Catmandu::MARC::Tutorial 3pm"
.TH Catmandu::MARC::Tutorial 3pm "2022-09-27" "perl v5.34.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Catmandu::MARC::Tutorial \- A documentation\-only module for new users of Catmandu::MARC
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& perldoc Catmandu::MARC::Tutorial
.Ve
.SH "UTF\-8"
.IX Header "UTF-8"
.SS "\s-1MARC8\s0 and \s-1UTF\-8\s0"
.IX Subsection "MARC8 and UTF-8"
The current Catmandu \s-1MARC\s0 tools are targeted at processing \s-1UTF\-8\s0 encoded
files. When you have \s-1MARC8\s0 encoded data, tools like MarcEdit or
\&\f(CW\*(C`yaz\-marcdump\*(C'\fR can be used to create a \s-1UTF\-8\s0 encoded file:
.PP
.Vb 1
\& $ yaz\-marcdump \-f MARC\-8 \-t UTF\-8 \-o marc \-l 9=97 marc21.raw > marc21.utf8.raw
.Ve
.SS "Unicode errors"
.IX Subsection "Unicode errors"
If you process \s-1UTF\-8\s0 encoded files which contain faulty characters, you
will get a fatal error message like:
.PP
.Vb 1
\& utf8 "\exD8" does not map to Unicode at ...
.Ve
.PP
Use the iconv tool (from the libc6\-dev Linux package) to preprocess the data
and discard faulty characters:
.PP
.Vb 1
\& $ iconv \-c \-f UTF\-8 \-t UTF\-8 marc21.utf8.raw | catmandu convert MARC \-\-type RAW to JSON
.Ve
.SS "Convert a decomposed \s-1UTF\-8\s0 file to a combined \s-1UTF\-8\s0 file and vice versa"
.IX Subsection "Convert a decomposed UTF-8 file to a combined UTF-8 file and vice versa"
For example, the character a\*: can be represented as
.PP
\&\*(L"a\*:\*(R", that is the single codepoint U+00E4 (two bytes c3 a4 in \s-1UTF\-8\s0 encoding), or as
\&\*(L"a\*(R" followed by a combining diaeresis, that is the two codepoints U+0061 U+0308 (three bytes 61 cc 88 in \s-1UTF\-8\s0).
.PP
The uconv tool (from the libicu-dev Linux package) can be used to convert
these types of files:
.PP
.Vb 2
\& $ uconv \-x any\-nfc < decomposed.txt > combined.txt
\& $ uconv \-x any\-nfd < combined.txt > decomposed.txt
.Ve
.SH "READING"
.IX Header "READING"
.SS "Convert \s-1MARC21\s0 records into \s-1JSON\s0"
.IX Subsection "Convert MARC21 records into JSON"
The command below converts the file data.mrc into \s-1JSON:\s0
.PP
.Vb 1
\& $ catmandu convert MARC to JSON < data.mrc
.Ve
.SS "Convert \s-1MARC21\s0 records into MARC-XML"
.IX Subsection "Convert MARC21 records into MARC-XML"
.Vb 1
\& $ catmandu convert MARC to MARC \-\-type XML < data.mrc
.Ve
.SS "Convert \s-1UNIMARC\s0 records into \s-1JSON, XML, ...\s0"
.IX Subsection "Convert UNIMARC records into JSON, XML, ..."
To read \s-1UNIMARC\s0 records, use the \s-1RAW\s0 parser to get the correct character
encoding.
.PP
.Vb 2
\& $ catmandu convert MARC \-\-type RAW to JSON < data.mrc
\& $ catmandu convert MARC \-\-type RAW to MARC \-\-type XML < data.mrc
.Ve
.SS "Create a \s-1CSV\s0 file containing all the titles"
.IX Subsection "Create a CSV file containing all the titles"
To extract data from a \s-1MARC\s0 record one needs a Fix routine. This is a small
language to manipulate data. In the example below we extract all 245 fields
from \s-1MARC:\s0
.PP
.Vb 1
\& $ catmandu convert MARC to CSV \-\-fix \*(Aqmarc_map(245,title); retain(title)\*(Aq < data.mrc
.Ve
.PP
The Fix \f(CW\*(C`marc_map\*(C'\fR puts the \s-1MARC 245\s0 field in the \f(CW\*(C`title\*(C'\fR field.
The Fix \f(CW\*(C`retain\*(C'\fR makes sure only the title field ends up in the
\&\s-1CSV\s0 file.
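.PP
If you want to inspect what a Fix produces before exporting to \s-1CSV,\s0 one
option (not part of the original recipe, just a debugging aid) is to convert
a few records to \s-1YAML\s0 with the standard \s-1YAML\s0 exporter that ships with
Catmandu:
.PP
.Vb 1
\& $ catmandu convert MARC to YAML \-\-fix \*(Aqmarc_map(245,title); retain(title)\*(Aq < data.mrc | head
.Ve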
.SS "Create a \s-1CSV\s0 file containing only the 245$a and 245$c subfields" .IX Subsection "Create a CSV file containing only the 245$a and 245$c subfields" The \f(CW\*(C`marc_map\*(C'\fR Fix can get one or more subfields to extract from \s-1MARC:\s0 .PP .Vb 1 \& $ catmandu convert MARC to CSV \-\-fix \*(Aqmarc_map(245ac,title); retain(title)\*(Aq < data.mrc .Ve .SS "Create a \s-1CSV\s0 file which contains a repeated field" .IX Subsection "Create a CSV file which contains a repeated field" In the example below the 650a field can be repeated in some \s-1MARC\s0 records. We will join all the repetitions in a comma delimited list for each record. .PP .Vb 1 \& $ catmandu convert MARC to CSV \-\-fix \*(Aqmarc_map(650a,subject,join:","); retain(subject)\*(Aq < data.mrc .Ve .SS "Create a list of all \s-1ISBN\s0 numbers in the data" .IX Subsection "Create a list of all ISBN numbers in the data" In the previous example we saw how all subjects can be printed using a few Fix commands. When a subject is repeated in a record, it will be written on one line joined by a comma: .PP .Vb 3 \& subject1 \& subject2,subject3 \& subject4 .Ve .PP In this example, record 1 contained 'subject1', record 2 'subject2' and 'subject3' and record 3 'subject4'. What should we use when we want a list of all values in a single long list? .PP In the example below we'll print all \s-1ISBN\s0 numbers in a batch of \s-1MARC\s0 records in one long list using the Text exporter: .PP .Vb 1 \& $ catmandu convert MARC to Text \-\-field_sep "\en" \-\-fix \*(Aqmarc_map(020a,isbn.$append); retain(isbn)\*(Aq < data.mrc .Ve .PP The first new thing is \f(CW$append\fR in the marc_map. This will create in \f(CW\*(C`isbn\*(C'\fR a list of all \s-1ISBN\s0 numbers found in the \f(CW\*(C`020a\*(C'\fR field. The \f(CW\*(C`Text\*(C'\fR exporter with the \f(CW\*(C`field_sep\*(C'\fR option will use all list values in the \f(CW\*(C`isbn\*(C'\fR field and writ them using new line as separator. .SS "Create a list of all unique \s-1ISBN\s0 numbers in the data" .IX Subsection "Create a list of all unique ISBN numbers in the data" Given the result of the previous command, it is now easy to create a unique list of \s-1ISBN\s0 numbers with the \s-1UNIX\s0 \f(CW\*(C`uniq\*(C'\fR command: .PP .Vb 1 \& $ catmandu convert MARC to Text \-\-field_sep "\en" \-\-fix \*(Aqmarc_map(020a,isbn.$append); retain(isbn)\*(Aq < data.mrc | sort | uniq .Ve .SS "Create a list of the number of subjects per record" .IX Subsection "Create a list of the number of subjects per record" We will create a list of subjects (650a) and count the number of items in this list for each record. The \s-1CSV\s0 file will contain the \f(CW\*(C`_id\*(C'\fR (record identifier) and \f(CW\*(C`subject\*(C'\fR the number of 650a fields. .PP Writing all Fixes on the command line can become tedious. In Catmandu it is possible to create a Fix script that contains all the Fix commands. .PP Open a text editor and create the \f(CW\*(C`myfix.fix\*(C'\fR file with content: .PP .Vb 3 \& marc_map(650a,subject.$append) \& count(subject) \& retain(_id, subject) .Ve .PP And execute the command: .PP .Vb 1 \& $ catmandu convert MARC to CSV \-\-fix myfix.fix < data.mrc .Ve .SS "Create a list of all \s-1ISBN\s0 numbers for records with type 920a == book" .IX Subsection "Create a list of all ISBN numbers for records with type 920a == book" In the example we need an extra condition for match the content of the 920a field against the string \f(CW\*(C`book\*(C'\fR. 
.PP
Open a text editor and create the \f(CW\*(C`myfix.fix\*(C'\fR file with content:
.PP
.Vb 2
\& marc_map(020a,isbn.$append)
\& marc_map(920a,type)
\&
\& select all_match(type,"book")  # select only the books
\& select exists(isbn)            # select only the records with ISBN numbers
\&
\& retain(isbn)                   # only keep this field
.Ve
.PP
Text after the \f(CW\*(C`#\*(C'\fR sign is an inline code comment.
.PP
And run the command:
.PP
.Vb 1
\& $ catmandu convert MARC to Text \-\-field_sep "\en" \-\-fix myfix.fix < data.mrc
.Ve
.SS "Show which \s-1MARC\s0 records don't contain a 900a field matching some list of values"
.IX Subsection "Show which MARC records don't contain a 900a field matching some list of values"
First we need to create a list of keys that need to be matched against our
\&\s-1MARC\s0 records. In the example below we create a \s-1CSV\s0 file with a
\&\f(CW\*(C`key\*(C'\fR,\f(CW\*(C`value\*(C'\fR header and all the keys that are \s-1OK:\s0
.PP
.Vb 5
\& $ cat mylist.txt
\& key,value
\& book,OK
\& article,OK
\& journal,OK
.Ve
.PP
Next we create a Fix script that maps the \s-1MARC\s0 900a field to a field called
\&\f(CW\*(C`type\*(C'\fR. We look up this \f(CW\*(C`type\*(C'\fR field in the \f(CW\*(C`mylist.txt\*(C'\fR file. If a
match is found, the \f(CW\*(C`type\*(C'\fR field will contain the value from the list
(\s-1OK\s0). When no match is found, the \f(CW\*(C`type\*(C'\fR field keeps its original value.
We reject all records that have \s-1OK\s0 as \f(CW\*(C`type\*(C'\fR and keep only the ones that
weren't matched in the file.
.PP
Open a text editor and create the \f(CW\*(C`myfix.fix\*(C'\fR file with content:
.PP
.Vb 1
\& marc_map(900a,type)
\&
\& lookup(type,\*(Aqmylist.txt\*(Aq)
\&
\& reject all_match(type,OK)
\&
\& retain(_id,type)
.Ve
.PP
And now run the command:
.PP
.Vb 1
\& $ catmandu convert MARC to CSV \-\-fix myfix.fix < data.mrc
.Ve
.SS "Create a \s-1CSV\s0 file of all \s-1ISSN\s0 numbers found at any \s-1MARC\s0 field"
.IX Subsection "Create a CSV file of all ISSN numbers found at any MARC field"
To extract every \s-1ISSN\s0 found anywhere in a record we need a Fix script like
the one below (line numbers are added here to explain the working of this
script but should not be included in the script):
.PP
.Vb 10
\& 01: marc_map(\*(Aq***\*(Aq,text.$append)
\& 02:
\& 03: filter(text,\*(Aq(\eb\ed{4}\-?\ed{3}[\edxX]\eb)\*(Aq)
\& 04: replace_all(text.*,\*(Aq.*(\eb\ed{4}\-?\ed{3}[\edxX]\eb).*\*(Aq,$1)
\& 05:
\& 06: do list(path:text)
\& 07:   unless is_valid_issn(.)
\& 08:     reject()
\& 09:   end
\& 10: end
\& 11:
\& 12: vacuum()
\& 13:
\& 14: select exists(text)
\& 15:
\& 16: join_field(text,\*(Aq ; \*(Aq)
\& 17:
\& 18: retain(_id,text)
.Ve
.PP
On line 01 all the text in the \s-1MARC\s0 record is mapped into a \f(CW\*(C`text\*(C'\fR array.
On line 03 we filter this array, keeping only the entries that contain an
\&\s-1ISSN\s0 string, matched with a regular expression.
On line 04 the \f(CW\*(C`replace_all\*(C'\fR is used to delete everything in the
\&\f(CW\*(C`text\*(C'\fR array that isn't an \s-1ISSN\s0 number.
On lines 06\-10 we go over every \s-1ISSN\s0 string, check whether it has a valid
checksum and erase it when it doesn't.
On line 12 we use the \f(CW\*(C`vacuum\*(C'\fR function to remove any remaining empty fields.
On line 14 we select only the records that contain a valid \s-1ISSN\s0 number.
On line 16 the \s-1ISSN\s0 numbers are joined by a semicolon ';' into one long string.
On line 18 we keep only the record id and the ISSNs for the report.
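.PP
Note that \f(CW\*(C`is_valid_issn\*(C'\fR is not provided by Catmandu::MARC itself; it most
likely comes from the Catmandu::Identifier add\-on that is also used in the
deduplication example at the end of this tutorial. If the condition is
missing on your system, installing that distribution should provide it:
.PP
.Vb 1
\& $ cpanm Catmandu::Identifier
.Ve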
.PP
Run this Fix script (without the line numbers) using this command:
.PP
.Vb 1
\& $ catmandu convert MARC to CSV \-\-fix myfix.fix < data.mrc
.Ve
.SS "Create a \s-1MARC\s0 validator"
.IX Subsection "Create a MARC validator"
For this example we need a Fix script that contains the validation rules we
want to check. For instance, we require a 245 field and a 008 control field
with a date filled in. This can be coded as:
.PP
.Vb 4
\& # Check if a 245 field is present
\& unless marc_has(\*(Aq245\*(Aq)
\&   log("no 245 field",level:ERROR)
\& end
\&
\& # Check if there is more than one 245 field
\& if marc_has_many(\*(Aq245\*(Aq)
\&   log("more than one 245 field?",level:ERROR)
\& end
\&
\& # Check if 008 position 7 to 10 contains a 4\-digit number (\*(Aq\ed\*(Aq means digit)
\& unless marc_match(\*(Aq008/07\-10\*(Aq,\*(Aq\ed{4}\*(Aq)
\&   log("no 4\-digit year in 008 position 7 \-> 10",level:ERROR)
\& end
.Ve
.PP
Put this Fix script in a file \f(CW\*(C`myfix.fix\*(C'\fR and execute the Catmandu command
with the \*(L"\-D\*(R" option for logging and the Null exporter to discard the
normal output:
.PP
.Vb 1
\& $ catmandu \-D convert MARC to Null \-\-fix myfix.fix < data.mrc
.Ve
.SH "TRANSFORMING"
.IX Header "TRANSFORMING"
.SS "Add a new \s-1MARC\s0 field"
.IX Subsection "Add a new MARC field"
In the example below we add a new 856 field to the record with a \f(CW$u\fR
subfield containing the Google homepage:
.PP
.Vb 1
\& marc_add(856,u,"http://www.google.com")
.Ve
.PP
A control field can be added by using the '_' subfield:
.PP
.Vb 1
\& marc_add(009,_,0123456789)
.Ve
.PP
Maybe you want to copy data from one field to another. Use marc_map to store
the data in a temporary field first and add it to the new field later:
.PP
.Vb 2
\& # copy a subfield
\& marc_map(001,tmp)
\&
\& # maybe process the data a bit
\& append(tmp,"\-mytest")
\&
\& # add the contents of the tmp field to the new 009 field
\& marc_add(009,_,$.tmp)
.Ve
.SS "Set a \s-1MARC\s0 subfield"
.IX Subsection "Set a MARC subfield"
Set the 100 \f(CW$h\fR subfield to a new value (or create it when it doesn't exist yet):
.PP
.Vb 1
\& marc_set(100h, test123)
.Ve
.PP
Only set the 100 \f(CW$h\fR subfield if the first indicator is 3:
.PP
.Vb 1
\& marc_set(100[3]h, test123)
.Ve
.SS "Remove a \s-1MARC\s0 (sub)field"
.IX Subsection "Remove a MARC (sub)field"
Remove all fields 500, 501, ..., 5**:
.PP
.Vb 1
\& marc_remove(5**)
.Ve
.PP
Remove all 245h fields:
.PP
.Vb 1
\& marc_remove(245h)
.Ve
.SS "Append text to a \s-1MARC\s0 field"
.IX Subsection "Append text to a MARC field"
Append a period to the 500 field if there isn't one there already:
.PP
.Vb 5
\& do marc_each()
\&   unless marc_match(500, "\e.$")  # Only if the current 500 field doesn\*(Aqt end with a period
\&     marc_append(500,".")         # Append a period to the current 500 field
\&   end
\& end
.Ve
.PP
Use the Catmandu::Fix::Bind::marc_each Bind to loop over all \s-1MARC\s0 fields.
In the context of the \f(CW\*(C`do ... end\*(C'\fR block only one \s-1MARC\s0 field at a time is
visible to the \f(CW\*(C`marc_*\*(C'\fR fixes.
.SS "The marc_each binder"
.IX Subsection "The marc_each binder"
All \f(CW\*(C`marc_*\*(C'\fR fixes will operate on all \s-1MARC\s0 fields matching a \s-1MARC\s0 path.
For example,
.PP
.Vb 1
\& marc_remove(856)
.Ve
.PP
will remove all 856 \s-1MARC\s0 fields. In some cases you may want to change only
some of the fields in a record.
You could write:
.PP
.Vb 3
\& if marc_match(856u,"google")
\&   marc_remove(856)
\& end
.Ve
.PP
in the hope it would remove the 856 fields that contain the text \*(L"google\*(R"
in the \f(CW$u\fR subfield. Alas, this is not what will happen. The \f(CW\*(C`if\*(C'\fR
condition will match when the record contains one or more 856u fields
containing \*(L"google\*(R". The \f(CW\*(C`marc_remove\*(C'\fR Fix will then delete \fBall\fR 856
fields. To remove only the 856 fields in the context of the \f(CW\*(C`if\*(C'\fR
statement, the \f(CW\*(C`marc_each\*(C'\fR binder is required:
.PP
.Vb 5
\& do marc_each()
\&   if marc_match(856u,"google")
\&     marc_remove(856)
\&   end
\& end
.Ve
.PP
The \f(CW\*(C`marc_each\*(C'\fR binder will loop over all \s-1MARC\s0 fields one at a time. The
if statement will only match when the current \s-1MARC\s0 field is 856 and its
\&\f(CW$u\fR subfield contains \*(L"google\*(R". The \f(CW\*(C`marc_remove(856)\*(C'\fR will only delete
the current 856 field.
.PP
Inside the \f(CW\*(C`marc_each\*(C'\fR binder, all Fixes behave as if only one field at a
time is visible in the record. This Fix will therefore not work:
.PP
.Vb 5
\& do marc_each()
\&   if marc_match(856u,"google")
\&     marc_remove(900) # <\-\- only an 856 field is visible in the current context
\&   end
\& end
.Ve
.SS "marc_copy, marc_cut and marc_paste"
.IX Subsection "marc_copy, marc_cut and marc_paste"
The Catmandu::Fix::marc_copy, Catmandu::Fix::marc_cut and
Catmandu::Fix::marc_paste Fixes are needed when complicated edits have to be
made to a \s-1MARC\s0 record.
.PP
The \f(CW\*(C`marc_copy\*(C'\fR Fix will copy the parts of a \s-1MARC\s0 record matching a
\&\s-1MARC_PATH\s0 to a temporary variable. This temporary variable will contain an
\&\s-1ARRAY\s0 of HASHes containing the content of the \s-1MARC\s0 field.
.PP
For instance,
.PP
.Vb 1
\& marc_copy(650, tmp)
.Ve
.PP
The \f(CW\*(C`tmp\*(C'\fR field will then contain something like:
.PP
.Vb 10
\& tmp:[
\&    {
\&      "subfields" : [
\&         {
\&            "a" : "Perl (Computer program language)"
\&         }
\&      ],
\&      "ind1" : " ",
\&      "ind2" : "0",
\&      "tag" : "650"
\&    },
\&    {
\&      "ind1" : " ",
\&      "subfields" : [
\&         {
\&            "a" : "Web servers."
\&         }
\&      ],
\&      "tag" : "650",
\&      "ind2" : "0"
\&    }
\& ]
.Ve
.PP
This structure can be edited with all the Catmandu fixes. For instance, you
can set the first indicator to '1':
.PP
.Vb 1
\& set_field(tmp.*.ind1, 1)
.Ve
.PP
The \s-1JSON\s0 path \f(CW\*(C`tmp.*.ind1\*(C'\fR will match all the first indicators. The \s-1JSON\s0
path \f(CW\*(C`tmp.*.tag\*(C'\fR will match all the \s-1MARC\s0 tags. The \s-1JSON\s0 path
\&\f(CW\*(C`tmp.*.subfields.*.a\*(C'\fR will match all the \f(CW$a\fR subfields. For instance, to
change all 'Perl' into 'Python' in the \f(CW$a\fR subfield, use this Fix:
.PP
.Vb 1
\& replace_all(tmp.*.subfields.*.a,"Perl","Python")
.Ve
.PP
When the fields need to be placed back into the record, the \f(CW\*(C`marc_paste\*(C'\fR
command can be used:
.PP
.Vb 1
\& marc_paste(tmp)
.Ve
.PP
This will add all 650 fields in the \f(CW\*(C`tmp\*(C'\fR temporary variable at the \fBend\fR
of the record. You can change the \s-1MARC\s0 fields in place using the
\&\f(CW\*(C`marc_each\*(C'\fR binder:
.PP
.Vb 5
\& do marc_each()
\&   # Select only the 650 fields
\&   if marc_has(650)
\&     # Create a working copy
\&     marc_copy(650,tmp)
\&
\&     # Change some fields
\&     set_field(tmp.*.ind1, 1)
\&
\&     # Paste the result back
\&     marc_paste(tmp)
\&   end
\& end
.Ve
.PP
The \f(CW\*(C`marc_cut\*(C'\fR Fix works like \f(CW\*(C`marc_copy\*(C'\fR but will also delete the
matching \s-1MARC\s0 field from the record.
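.PP
As a small illustration of \f(CW\*(C`marc_cut\*(C'\fR, the same cut/edit/paste pattern can
be used to move fields to another tag. The 590 target tag below is only an
assumption made for this sketch (moving general 500 notes to a local 590
field); it combines fixes already shown above:
.PP
.Vb 11
\& do marc_each()
\&   # Only act on 500 fields
\&   if marc_has(500)
\&     # Cut the current 500 field into tmp (this also removes it from the record)
\&     marc_cut(500,tmp)
\&     # Change the tag in the cut structure (590 is just an example target)
\&     set_field(tmp.*.tag,"590")
\&     # Paste the edited field back at the end of the record
\&     marc_paste(tmp)
\&   end
\& end
.Ve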
.SS "Rename \s-1MARC\s0 subfields" .IX Subsection "Rename MARC subfields" In the example below we rename each \f(CW$1\fR subfield in the \s-1MARC\s0 record to \f(CW$0\fR using the Catmandu::Fix::marc_cut, Catmandu::Fix::marc_paste and Catmandu::Fix::rename fixes: .PP .Vb 4 \& # For each marc field... \& do marc_each() \& # Cut the field into tmp.. \& marc_cut(***,tmp) \& \& # Rename every 1 subfield to 0 \& rename(tmp.*.subfields.*,1,0) \& \& # And paste it back \& marc_paste(tmp) \& end .Ve .PP The \f(CW\*(C`marc_each\*(C'\fR bind will loop over all the \s-1MARC\s0 fields. With \f(CW\*(C`marc_cut\*(C'\fR we store any field (\f(CW\*(C`***\*(C'\fR matches every field) into a \f(CW\*(C`tmp\*(C'\fR field. The \f(CW\*(C`marc_cut\*(C'\fR creates an array structure in \f(CW\*(C`tmp\*(C'\fR which is easy to process using the Fix language. Using the \f(CW\*(C`rename\*(C'\fR function we search for all the subfields, and replace the field matching the regular expression \f(CW1\fR with \f(CW0\fR. At the end, we paste back the \f(CW\*(C`tmp\*(C'\fR field into the record. .SS "Setting and remove \s-1MARC\s0 indicators" .IX Subsection "Setting and remove MARC indicators" In the example below we set every indicator1 of the 500 field to the value \*(L"0\*(R". We will use the Catmandu::Fix::Bind::marc_each bind with a loop variable: .PP .Vb 11 \& # For each marc field... \& do marc_each(var:this) \& # If the marc field is a 500 field \& if marc_has(500) \& # Set the indicator1 to value "0" \& set_field(this.ind1,0) \& # Store the result back into the MARC record \& marc_remove(500) \& marc_paste(this) \& end \& end .Ve .PP Using the same method indicators can also be deleted by setting their value to a space \*(L" \*(R". .SS "Adding a new \s-1MARC\s0 subfield" .IX Subsection "Adding a new MARC subfield" In the example below we append a new \s-1MARC\s0 subfield \f(CW$z\fR to the 500 field with value test. We will use the Catmandu::Fix::Bind::marc_each bind with a loop variable: .PP .Vb 11 \& # For each marc field... \& do marc_each(var:this) \& # If the marc field is a 500 field \& if marc_has(500) \& # add a new subfield z \& add_field(this.subfields.$append.z,Test) \& # Store the result back into the MARC record \& marc_remove(500) \& marc_paste(this) \& end \& end .Ve .SS "Remove all non-numeric fields from the \s-1MARC\s0 record" .IX Subsection "Remove all non-numeric fields from the MARC record" .Vb 8 \& # For each marc field... 
\& do marc_each(var:this) \& # If we have a non\-numeric fields \& unless all_match(this.tag,"\ed{3}") \& # Remove this tag \& marc_remove(***) \& end \& end .Ve .SH "WRITING" .IX Header "WRITING" .SS "Convert a \s-1MARC\s0 record into a \s-1MARC\s0 record (do nothing)" .IX Subsection "Convert a MARC record into a MARC record (do nothing)" .Vb 1 \& $ catmandu convert MARC to MARC < data.mrc > output.mrc .Ve .SS "Add a 920a field with value 'checked' to all records" .IX Subsection "Add a 920a field with value 'checked' to all records" .Vb 1 \& $ catmandu convert MARC to MARC \-\-fix \*(Aqmarc_add("900",a,"checked")\*(Aq < data.mrc > output.mrc .Ve .SS "Delete the 024 fields from all \s-1MARC\s0 records" .IX Subsection "Delete the 024 fields from all MARC records" .Vb 1 \& $ catmandu convert MARC to MARC \-\-fix \*(Aqmarc_remove("024")\*(Aq < data.mrc > output.mrc .Ve .SS "Set the 650p field to 'test' for all records" .IX Subsection "Set the 650p field to 'test' for all records" .Vb 1 \& $ catmandu convert MARC to MARC \-\-fix \*(Aqmarc_add("650p","test")\*(Aq < data.mrc > output.mrc .Ve .SS "Select only the records with 900a == book" .IX Subsection "Select only the records with 900a == book" .Vb 1 \& $ catmandu convert MARC to MARC \-\-fix \*(Aqmarc_map(900a,type); select all_match(type,book)\*(Aq < data.mrc > output.mrc .Ve .PP The \f(CW\*(C`all_match\*(C'\fR also allows a regular expressions: .PP .Vb 1 \& $ catmandu convert MARC to MARC \-\-fix \*(Aqmarc_map(900a,type); select all_match(type,"[Bb]ook")\*(Aq < data.mrc > output.mrc .Ve .SS "Select only the records with 900a values in a given \s-1CSV\s0 file" .IX Subsection "Select only the records with 900a values in a given CSV file" Create a \s-1CSV\s0 file with name,value pairs (need two columns): .PP .Vb 5 \& $ cat values.csv \& name,values \& book,1 \& journal,1 \& movie,1 \& \& $ catmandu convert MARC to MARC \-\-fix myfixes.txt < data.mrc > output.mrc \& \& with myfixes.txt like: \& \& do marc_each() \& marc_map(900a,test) \& lookup(test,values.csv,default:0) \& select all_match(test,1) \& remove_field(test) \& end .Ve .PP We use a \*(L"do \fBmarc_each()\fR ... end\*(R" loop because 900a fields can be repeated. If a \&\s-1MARC\s0 tag isn't repeatable this loop not isn't needed. With marc_map we copy first the value of a marc subfield to a 'test' field. This test we lookup against the \s-1CSV\s0 file. Then, we select only the records that are found in the \s-1CSV\s0 file (and return the correct value). .SH "DEDUPLICATION" .IX Header "DEDUPLICATION" .SS "Check for duplicate \s-1ISBN\s0 numbers in a \s-1MARC\s0 file" .IX Subsection "Check for duplicate ISBN numbers in a MARC file" In this example we extract from a \s-1MARC\s0 file all the \s-1ISBN\s0 numbers from the 020 and do a little bit of data cleaning using the Catmandu::Identifier project. To install this package, we run this command: .PP .Vb 1 \& $ cpanm Catmandu::Identifier .Ve .PP To extract all the \s-1ISBN\s0 numbers we use this Fix script 'dedup.fix': .PP .Vb 9 \& marc_map(020a, identifier.$append) \& replace_all(identifier.*,"\es+.*","") \& do list(path:identifier) \& isbn13(.) \& end \& do hashmap(exporter:YAML) \& copy_field(identifier,key) \& copy_field(_id,value) \& end .Ve .PP The first \f(CW\*(C`marc_map\*(C'\fR fix maps every 020 field to an identifier array. The \f(CW\*(C`replace_all\*(C'\fR cleans the data a bit and deletes some unwanted text. 
The \f(CW\*(C`do list\*(C'\fR will transform all the \s-1ISBN\s0 numbers to \s-1ISBN13.\s0 The
\&\f(CW\*(C`do hashmap\*(C'\fR will create an internal mapping table of identifier,_id
key/value pairs. For every identifier, one or more _id values can be stored.
At the end of all \s-1MARC\s0 processing this mapping table is dumped from memory
as a \s-1YAML\s0 document.
.PP
Run this fix as:
.PP
.Vb 1
\& $ catmandu convert MARC to Null \-\-fix dedup.fix < marc.mrc > output.yml
.Ve
.PP
The output \s-1YAML\s0 file will contain the \s-1ISBN\s0 to document \s-1ID\s0 mapping. We only
need the \s-1ISBN\s0 numbers with more than one hit, so a little bit of cleanup on
this \s-1YAML\s0 file is needed to reach our final result. Use the following
\&'cleanup.fix' script:
.PP
.Vb 2
\& select exists(value.1)
\& join_field(value,",")
.Ve
.PP
The first \f(CW\*(C`select\*(C'\fR fix selects only the records with more than one hit.
The \f(CW\*(C`join_field\*(C'\fR will turn the array of results into a string. Execute
this Fix like:
.PP
.Vb 1
\& $ catmandu convert YAML to TSV \-\-fix cleanup.fix < output.yml > result.csv
.Ve
.PP
This will produce a tab\-delimited file of the duplicate \s-1ISBN\s0 numbers in the
\&\s-1MARC\s0 input file.
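.PP
If you only want to know how many \s-1ISBN\s0 numbers occur in more than one
record, you can also pipe the result through the standard \s-1UNIX\s0 \f(CW\*(C`wc\*(C'\fR
command (subtract one from the count if your \s-1TSV\s0 exporter writes a header
line):
.PP
.Vb 1
\& $ catmandu convert YAML to TSV \-\-fix cleanup.fix < output.yml | wc \-l
.Ve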