podget(7) | podget(7) |
NAME¶
Podget - Simple tool to automate downloading of podcasts.
SYNOPSIS¶
podget <options>
DESCRIPTION¶
Podget is a simple podcast aggregator/downloader optimized for scheduled background jobs (e.g. cron).
It features support for:
- Downloading podcasts from RSS and ATOM XML feeds.
- Sorting the files into folders and categories.
- Importing URLs from iTunes PCAST files and OPML lists.
- Automatic M3U & ASX playlist creation.
- Cleanup of old files.
- Automatic UTF-16 conversion for feeds hosted on MS Windows Servers.
Podget works by extracting the <enclosure> tags from the feed and then downloading the URL that each one specifies. The one exception is <enclosure> tags within <podcast:liveItem> tags, which Podget ignores: it is an aggregator, not a player, and has not been optimized for live content.
OPTIONS¶
- -c <FILE> | --config <FILE>
- Name of configuration file.
- --create-config <FILE>
- Create configuration file and exit.
- -C | --cleanup
- Skip downloading and only run cleanup loop.
- --cleanup_days <NUMBER>
- Cleanup files older than <NUMBER> days.
- --cleanup_simulate
- Simulate cleanup loop to see what files would be deleted.
- -d <DIRECTORY> | --dir_config <DIRECTORY>
- Directory that configuration files are stored in.
- --dir_session <DIRECTORY>
- Directory that session files are stored in.
- -f | --force
- Force download of items from each feed even if they've already been downloaded.
- -h | --help
- Display condensed help dialog.
- -l <DIRECTORY> | --library <DIRECTORY>
- Directory to store downloaded files in.
- -n | --no-playlist
- Do not create M3U playlist of new items.
- -p | --playlist-asx
- In addition to M3U playlists, create ASX playlists.
- --playlist-per-podcast
- Create a playlist of new items for each podcast feed.
- -r <COUNT> | --recent <COUNT>
- Download only the <COUNT> newest items from each feed.
- --serverlist <FILE>
- Use <FILE> as serverlist instead of default.
- -s | --silent
- Run silently (for cron jobs).
- -v
- Set verbosity to level 1.
- -vv
- Set verbosity to level 2.
- -vvv
- Set verbosity to level 3.
- -vvvv
- Set verbosity to level 4.
- --verbosity <LEVEL>
- Set verbosity level (0-4).
- -V | --version
- Display version.
- OPML List Options:
- --import_opml <FILE or URL>
- Import servers from OPML file or HTTP/FTP URL.
- --export_opml <FILE>
- Export serverlist to OPML file.
- --import_pcast <FILE or URL>
- Import server from iTunes PCAST file or HTTP/FTP URL.
CONFIGURATION FILES¶
By default, Podget relies on two configuration files.
- podgetrc
- This is a file with most options for how Podget should run.
If it is required to run podget with different options for certain feeds, then additional configuration files can be created and used with the --config or -c option. When this option is run with a new filename that does not exist yet, the file is created with default options that can then be customized as necessary.
- serverlist
- This is a file of all the feeds that Podget should monitor and download from.
If you need to separate your feeds into multiple lists, then additional files can be created with the --serverlist option. When this option is run with a new filename that does not exist yet, the file is created with a default list of a single feed. Whenever a new list is created, Podget will download a single item from the single feed included by default to verify that everything is working.
For a description of the options available for this file, please refer to the SERVER LIST CONFIGURATION section of this document.
USER CONFIGURATION DIRECTORY¶
The first time a user runs podget, it will create a configuration directory. In this directory, it will install the default configuration files.
Where this configuration directory is automatically placed is dependent upon the version of Podget that you used when you first ran it.
For version 0.8.10 and before:
$HOME/.podget
For later versions:
$XDG_CONFIG_HOME/podget
or
$HOME/.config/podget
If a user wants to clean up their $HOME directory by moving their existing configuration directory to either of the new locations, it can be done but it is necessary to remember to remove the leading period so it is no longer a hidden directory.
These locations can be overridden by the use of the --dir_config or -d option when you run podget.
WHICH CONFIGURATION DIRECTORY IS USED¶
Since there are at least three possible locations for the configuration directory, it is necessary to know which one podget will use. To keep things simple, Podget tests the locations in the following order and uses the first one it finds:
1. $HOME/.podget
2. $XDG_CONFIG_HOME/podget
3. $HOME/.config/podget
This location testing is skipped by the use of the --dir_config or -d option.
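The lookup order above can be sketched as a small shell function. This is only an illustration and not Podget's own code: the function name find_podget_config_dir is hypothetical, and skipping the XDG location when XDG_CONFIG_HOME is unset is an assumption.

```shell
# Hypothetical helper: print the first configuration directory that exists,
# testing the candidates in the documented order.
find_podget_config_dir() {
    # Assumption: the XDG candidate is skipped when XDG_CONFIG_HOME is unset
    # or empty (the :+ expansion then yields an empty string).
    for dir in \
        "${HOME}/.podget" \
        "${XDG_CONFIG_HOME:+${XDG_CONFIG_HOME}/podget}" \
        "${HOME}/.config/podget"
    do
        if [ -n "${dir}" ] && [ -d "${dir}" ]; then
            printf '%s\n' "${dir}"
            return 0
        fi
    done
    # No candidate exists yet.
    return 1
}
```

Running find_podget_config_dir before and after moving an old configuration directory is one way to check which location would be picked up.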
AUTOMATIC CLEANUP¶
You can enable automatic cleanup with every run by configuring it in your podgetrc file. Simply set the following options:
# Autocleanup.
# 0 == disabled
# 1 == delete any old content
cleanup=1
# Number of days to keep files. Cleanup will remove anything
# older than this.
cleanup_days=7
However, some people prefer to run cleanup as a separate cron session. To do that, set the options in podgetrc to:
# Autocleanup.
# 0 == disabled
# 1 == delete any old content
cleanup=0
# Number of days to keep files. Cleanup will remove anything
# older than this.
cleanup_days=7
Then add something similar to this example to your crontab:
# Once a week on Sunday at 04:07AM
07 04 * * Sun /usr/bin/podget -C
MULTIPLE CONCURRENT SESSIONS¶
When it starts, Podget checks for sessions that are already running with the same core configuration file and exits if any are found. This ensures that long-running sessions are not interrupted by new ones.
If you have feeds that require distinct configurations, then you can enable them to run simultaneously by using separate configuration files for each. Then if you have sufficient bandwidth, you can call them all at the same time.
Example Crontab configuration:
00 02 * * * /usr/bin/podget -c podgetrc-group1
00 02 * * * /usr/bin/podget -c podgetrc-group2
SEQUENTIAL SESSIONS¶
Sometimes, you have feed lists that use the same configuration but you wish to keep separate. There are two ways to handle this.
First, run them separately from crontab with sufficient time in between so they don't interfere with each other.
00 02 * * * /usr/bin/podget --serverlist RSS-Feeds
00 03 * * * /usr/bin/podget --serverlist ATOM-Feeds
The second option is to place them into a shell script so they are called sequentially and do not interfere with each other and then add it to your crontab.
#!/usr/bin/env bash
/usr/bin/podget --serverlist RSS-Feeds
/usr/bin/podget --serverlist ATOM-Feeds
ENABLING DEBUG OUTPUT¶
Debug output can be enabled in two ways.
The first way is to uncomment the DEBUG option in your podgetrc and set it to '1'. However, DEBUG then only takes effect once podgetrc is read, after just over 1400 lines of the script have already run. This is sufficient for most issues.
The second way is from the command-line and enables debug as early as possible.
Simply execute podget like so:
$ DEBUG=1 podget -vvvv
You can enable other options as well if you need to, but for debugging purposes it is highly recommended that you enable as much verbosity as possible.
SERVER LIST CONFIGURATION¶
By default, Podget uses serverlist for the default list of servers to contact. However you can configure the name with the config_serverlist variable in your podgetrc file.
Feeds are listed one per line in the serverlist file.
Default format with category and name:
<url> <category> <name>
Alternate Formats:
1. With a category but no name.
<url> <category>
2. With a name but no category (2 ways).
<url> No_Category <name>
<url> . <name>
1. URL Rules:
2. Category Rules:
B. You may use underscores and dashes.
C. You can insert date substitutions.
%YY% == Year
%MM% == Month
%DD% == Day
D. Category disabling:
- With a name, the category must either be a single period (.) or 'No_Category'.
- If the name is blank, the category can also be blank.
3. Name Rules:
A. If you are creating ASX playlists, make sure the feed name does not have any spaces in it and the filename cannot be blank.
B. You can leave the feed name blank, and files will be saved in the category directory.
C. Names with spaces are only compatible with filesystems that allow for spaces in filenames. For example, spaces in feed names are OK for feeds saved to Linux ext partitions but are not OK for those saved to Microsoft FAT partitions.
D. Feed names can be disabled by leaving them blank.
4. Disable the downloading of any feed by commenting it out with a leading #.
Example:
http://www.lugradio.org/episodes.rss Linux LUG Radio
Example with date substitution in the category and a blank feed name:
http://downloads.bbc.co.uk/rmhttp/downloadtrial/worldservice/summary/rss.xml News-%YY%-%MM%-%DD%
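To make the date substitution concrete, here is a minimal sketch of how a category such as News-%YY%-%MM%-%DD% could be expanded. It is not Podget's own code: the function name expand_date_tokens is hypothetical, and the date values are passed in explicitly because this document does not state the exact format Podget uses for each field (for example, a two-digit or four-digit year).

```shell
# Hypothetical helper: replace %YY%, %MM% and %DD% in a category template.
# Arguments: template, year, month, day.
# Assumption: the date values contain digits only, so they are safe to use
# directly in a sed replacement.
expand_date_tokens() {
    printf '%s\n' "$1" | sed \
        -e "s/%YY%/$2/g" \
        -e "s/%MM%/$3/g" \
        -e "s/%DD%/$4/g"
}

# Possible usage with today's date (four-digit year chosen for illustration):
#   expand_date_tokens 'News-%YY%-%MM%-%DD%' "$(date +%Y)" "$(date +%m)" "$(date +%d)"
```

Passing the values as arguments keeps the sketch independent of the current date, which makes it easy to try with fixed values.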
Example of two ways to do a feed with authentication:
http://somesite.com/feed.rss CATEGORY Feed Name USER:username PASS:password
http://username:password@somesite.com/feed.rss CATEGORY Feed Name
NOTE: The second method will fail if a colon (:) is part of the username or password. Both methods will fail if a space is part of the username or password.
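As an illustration of the one-feed-per-line layout described in this section, the following sketch splits a serverlist line into URL, category and name. It is a simplification and not Podget's parser: the function name parse_feed_line is hypothetical, and it treats everything after the category as the name, so USER:, PASS: and per-feed options are not separated out.

```shell
# Hypothetical helper: split one serverlist line into its three fields.
# Returns 1 for blank lines and for feeds disabled with a leading #.
parse_feed_line() {
    case $1 in
        ''|'#'*) return 1 ;;
    esac
    # First word is the URL, second the category, the rest is the name.
    read -r url category name <<EOF
$1
EOF
    # A single period or No_Category means "no category".
    case ${category} in
        .|No_Category) category='' ;;
    esac
    printf 'url=%s category=%s name=%s\n' "${url}" "${category}" "${name}"
}

# Possible usage:
#   parse_feed_line 'http://www.lugradio.org/episodes.rss Linux LUG Radio'
```

Because the name is simply "the rest of the line", feed names with spaces come through intact in this sketch.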
- Common Options:
- OPT_CONTENT_DISPOSITION
- Attempt to get the filename from the Content-Disposition header that is part of the output of 'wget --server-response'.
- OPT_DISPOSITION_FAIL
- This option works in conjunction with OPT_CONTENT_DISPOSITION by removing any URLs that fail to receive a filename from the COMPLETED log. This allows them to be automatically retried the next time a session runs. If this option is added to a feed that has already been downloaded then the user will need to remove the URLs for the problematic files from the COMPLETED log manually. On one feed this allowed for the improvement of the number of filename problems from approximately 15% to under 2% over the course of 6 sessions. Those sessions can occur sequentially on one day or as part of your established cron rotation.
- OPT_FEED_ORDER_ASCENDING
- By default, Podget assumes that items in a feed will be listed from newest to oldest (descending order). This option modifies Podget's handling of feeds that are listed from oldest to newest. It has no noticeable effect for feeds where you want to download every item, but it does have an effect for new feeds when combined with the --recent <COUNT> option.
- OPT_FEED_PLAYLIST_NEWFIRST
- Most playlist options create lists of just the new items that are downloaded in the current session. This option creates or updates a full playlist for all items available for a feed sorted from newest to oldest based on the modification date/time of the file.
- OPT_FEED_PLAYLIST_OLDFIRST
- Same as OPT_FEED_PLAYLIST_NEWFIRST except playlist is ordered from oldest to newest.
- OPT_FILENAME_LOCATION
- Some feeds do not have the detailed filename listed in the feed but rather rename the file on redirection. This option addresses that issue by attempting to grab the filename from the last 'Location:' header in the output of 'wget --server-response'.
- OPT_FILENAME_RENAME_MDATE
- For feeds that use the same filename for every item, with each item identified only by a long, somewhat incomprehensible string in the URL. These feeds were previously fixed with FILENAME_FORMATFIX4, which appended that string to the common filename to produce a unique filename for each item. However, the resulting filenames were not very easy to understand. This option provides another method for dealing with these common filenames: it adds the date of the file's last change (modification date) as a prefix, in the format YYYYMMDD_HHhMMm_<common-part>. This makes the filenames sortable and gives the user something that makes a moderate amount of sense. It does not work for all feeds; for some feeds, the last modification time of each file is the time of download. That may be acceptable in some situations but can cause confusion when downloading more than one item at a time from a feed.
- OPT_WGET_DEFUSERAGENT
- Configure Wget to use its default user-agent (normally formatted similar to "Wget/1.21.2") and to not use either Podget's default user-agent ("Podget") or a custom agent set in WGET_BASEOPTS in podgetrc.
- OPT_NO_CERT_CHECK
- Disable wget SSL certificate verification. This is commonly used for feeds that are using self-signed certificates.
- OPT_PREFER_IPv4 or OPT_PREFER_IPv6
- Configure wget so that, when a DNS lookup returns a choice of several addresses, it connects to addresses of the specified family first.
Examples:
http://somesite.com/feed.rss CATEGORY Feed Name OPT_PREFER_IPv4
http://somesite.com/feed.rss CATEGORY Feed Name OPT_PREFER_IPv6
http://somesite.com/feed.rss CATEGORY Feed Name OPT_WGET_DEFUSERAGENT
http://somesite.com/feed.rss CATEGORY Feed Name OPT_NO_CERT_CHECK
http://somesite.com/feed.rss CATEGORY Feed Name OPT_CONTENT_DISPOSITION
http://somesite.com/feed.rss CATEGORY Feed Name OPT_CONTENT_DISPOSITION OPT_DISPOSITION_FAIL
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_LOCATION
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_MDATE
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_LOCATION OPT_FILENAME_RENAME_MDATE
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FEED_ORDER_ASCENDING
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FEED_PLAYLIST_NEWFIRST
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FEED_PLAYLIST_OLDFIRST
- RSS Feed Options:
- There are three options for RSS feeds that are not supported for ATOM feeds.
The first two rename the downloaded files using the contents of the <TITLE> tag in the feed, and the third expands the set of tags that Podget gets content from.
- OPT_FILENAME_RENAME_TITLETAG
- This first version is for handling feeds that place the <TITLE> tag before the <ENCLOSURE> tag. The majority of tested feeds that use <TITLE> tags follow this order.
- OPT_FILENAME_RENAME_REVTITLETAG
- The second version is for handling feeds that have the <ENCLOSURE> tag first followed by the <TITLE> tag.
- OPT_RSS_MEDIACONTENT
- This third option will enable Podget to download content from <MEDIA:CONTENT> tags in addition to <ENCLOSURE> tags.
Examples:
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_TITLETAG
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_TITLETAG OPT_FILENAME_RENAME_MDATE
http://somesite.com/feed.rss CATEGORY Feed Name OPT_FILENAME_RENAME_REVTITLETAG
http://somesite.com/feed.rss CATEGORY Feed Name OPT_RSS_MEDIACONTENT
To determine if the feed uses <TITLE> tags and in which order, run the following with the URL for the feed:
wget -O - http://somesite.com/feed.rss | sed -n -e :a -e 's/.*<enclosure.*url\s*=\s*"\([^"]\+\)".*/URL \1/Ip' -e t -e "s/.*<enclosure.*url\s*=\s*'\([^']\+\)'.*/URL \1/Ip" -e t -e 's/.*<title>\(.*\)<[/]title>.*$/TITLE \1/Ip' -e t -e '/\(<enclosure\|<title>\).*/I{N;s/ *\n/ /;T;ba}'
This will produce a list of lines that start with either TITLE or URL. The URL is from the <ENCLOSURE> tag and the TITLE is from the <TITLE> tag. On many feeds, the first thing you will notice is a few uses of the <TITLE> tag before the first URL is specified. In that case, Podget uses the last TITLE found, so the earlier ones are discarded. The important part starts at the first URL: from there, determine whether the title for each item comes before or after its URL. If the title comes first, use OPT_FILENAME_RENAME_TITLETAG. If the title comes second, use OPT_FILENAME_RENAME_REVTITLETAG.
On some feeds, the downloaded filename will not have anything identifiable to determine which TITLE goes with it. In those cases it may be necessary to download a few items and listen to them to determine which order they use.
On some feeds, it will be discovered that the downloaded filename and the TITLE are very similar. In those cases, it is left to the user to determine which they prefer.
On some feeds, the TITLE will have very little to specify when it was recorded and it may be useful to use the OPT_FILENAME_RENAME_MDATE option to add a date tag to each filename as it is converted.
And on some feeds, there will be a complete absence of TITLE lines. Those feeds do not use the tag so using either option will not produce any changes.
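For feeds where every item has exactly one TITLE and one URL, the order can also be guessed mechanically from the TITLE/URL listing. The sketch below is only a rough heuristic and not part of Podget: the function name suggest_title_option is hypothetical, and it assumes that a listing ending in a URL means the title comes first, while a listing ending in a TITLE means the title comes second. Check its suggestion against a few downloaded items before relying on it.

```shell
# Hypothetical helper: read TITLE/URL lines on standard input and suggest
# a rename option based on which kind of line comes last.
suggest_title_option() {
    # Remember the kind (TITLE or URL) of the last matching line.
    last=$(awk '/^(TITLE|URL) / { kind = $1 } END { print kind }')
    case ${last} in
        URL)   echo OPT_FILENAME_RENAME_TITLETAG ;;
        TITLE) echo OPT_FILENAME_RENAME_REVTITLETAG ;;
        *)     echo "no TITLE or URL lines found" >&2
               return 1 ;;
    esac
}
```

The function is meant to be fed the output of the wget and sed pipeline described in this section, for example by piping that output straight into suggest_title_option.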
- Atom Feed Options:
- The following options are available for advanced handling of Atom feeds.
- ATOM_FILTER_SIMPLE
- This option will enable filtering for just audio or video files from a feed.
- ATOM_FILTER_TYPE="type"
- This option allows more detailed filtering of the variety of types available. This can limit the files downloaded to one type (example: "audio/mpeg") or to a few types (example: "(audio|video)/.*" for all audio and video types, OR "audio/.*" for all audio types).
- ATOM_FILTER_LANG="language"
- If an Atom feed supports multiple languages for enclosures, then you can use this option to filter to only those you desire. You can limit to one language (example: "en" for just English) or combine several supported languages to get them all (example: "(en|es|fr)" to download files in English, Spanish and French). How the languages are defined may vary from feed to feed.
Note: If you do not enable any of the ATOM_FILTER options on a feed with multiple enclosures per item, when you run podget it will tell you the count per type or language to help you decide if you should enable the filters to reduce the number of files to be downloaded.
Examples:
http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_SIMPLE
http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_TYPE="audio/mpeg"
http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_TYPE="(audio|video)/.*"
http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_LANG="en"
http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_LANG="(en|es|fr)"
http://somesite.com/feed CATEGORY Feed Name ATOM_FILTER_TYPE="audio/mpeg" ATOM_FILTER_LANG="en"
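Before adding a filter to the serverlist, it can help to try the pattern against a few sample values. The sketch below assumes the filter values behave like extended regular expressions matched against the whole type or language string; that is an assumption about Podget's matching, not something this document states, and the function name matches_filter is hypothetical.

```shell
# Hypothetical helper: succeed if VALUE matches PATTERN as a whole-string
# extended regular expression.
# Arguments: pattern, value.
matches_filter() {
    printf '%s\n' "$2" | grep -E -x -q -- "$1"
}

# Possible usage:
#   matches_filter '(audio|video)/.*' 'video/mp4' && echo "would be kept"
#   matches_filter 'audio/mpeg' 'audio/ogg' || echo "would be skipped"
```

The -x flag makes grep match the entire value, so a pattern such as "en" does not accidentally match a longer string that merely contains those letters.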
HANDLING UTF-16 FEEDS¶
Some servers provide their feeds in UTF-16 format rather than the more common UTF-8.
To automatically convert these files, create a secondary serverlist in your configuration directory:
serverlist.utf16
Remember to change the name of the serverlist to match what you set it to with config_serverlist if you changed it.
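For readers curious about what such a conversion involves, the following sketch converts a downloaded feed file to UTF-8 when it starts with a UTF-16 byte order mark. It illustrates the general technique with iconv and is not a description of how Podget performs the conversion; the function name feed_to_utf8 is hypothetical.

```shell
# Hypothetical helper: print a feed file as UTF-8.
# Files that start with a UTF-16 byte order mark (FF FE or FE FF) are
# converted with iconv; anything else is passed through unchanged.
feed_to_utf8() {
    # Read the first two bytes as hex, e.g. "fffe" for little-endian UTF-16.
    bom=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case ${bom} in
        fffe|feff) iconv -f UTF-16 -t UTF-8 "$1" ;;
        *)         cat "$1" ;;
    esac
}

# Possible usage:
#   feed_to_utf8 feed.xml > feed.utf8.xml
```

Checking for the byte order mark first means feeds that are already UTF-8 are left untouched.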
EXAMPLE CRON JOB¶
Once podget is running correctly, it's most useful if you run it from a cron job so that the new episodes are available to play or load onto a portable player and you don't have to wait for them to download.
To edit your crontab, do:
$ crontab -e
Then add one line similar to this example:
15 04 * * * /usr/bin/podget -s
This will run podget at 4:15 AM every day.
In some cases, you might need to add a few directories to your PATH variable so that Podget can find everything it needs.
Then the job might look like:
15 04 * * * PATH=/opt/local/bin:/usr/local/bin:$PATH /usr/bin/podget -s
AUTHORS¶
Dave Vehrs
10 February 2023 |