Quick Python Script to Get Bulletins

The Seven Seas Cruising Association (SSCA) puts out a great monthly bulletin that's a must-read for anyone planning or dreaming about the cruising life. Based on a recommendation in Jim Trefethen's book, "The Cruising Life: A Commonsense Guide for the Would-Be Voyager", I signed up for a membership and started reading the bulletins. (You can read a sample bulletin on the site.)

The bulletins are posted on the site as PDFs, and members can download all of them. I manually downloaded the last few months' issues, but soon realized I wanted the complete set for my personal electronic research library. So I broke out my programming skills and wrote a quick Python script to download all the files.

The script is available on GitHub, but you will need to edit it and enter the correct URLs. After all, the bulletins are a perk of membership, so you need to join first.

Here are a couple of fun tricks I learned in the process.

Formatting strings in Python 3 is a little different, but easy enough to grok. The SSCA bulletins are all named sscaYYMM.pdf, where YY is the two-digit year and MM is the two-digit month.

A pair of nested for loops over ranges generates the year and month values; then it's just a matter of building the file name.

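import os


def main():

    # Years 3 (2003) through 15 (2015); range() excludes the end point
    for year in range(3, 16):
        for month in range(1, 13):
            # Break point: ssca1502 isn't published yet, so stop here.
            # return (not break) exits both loops at once.
            if year == 15 and month == 2:
                print('We are done')
                return
            # Zero-pad year and month to two digits, e.g. ssca0301.pdf
            filename = 'ssca{:0>2}{:0>2}.pdf'.format(year, month)
            # Skip bulletins we already have; download the rest
            if os.path.isfile(filename):
                print('File {} exists, skipping'.format(filename))
            else:
                print('Getting file {}'.format(filename))
                getTheFile(filename)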

As you can see in the main function, the year loop runs from 3 (2003) through 15 (2015). Remember, Python's range() excludes its end point, so range(3, 16) stops at 15. The month loop, as expected, yields 1 through 12. I put in a quick check to stop at February 2015, since that bulletin isn't out yet at the time of this writing.
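If you'd rather keep the loop logic separate from the downloading, here's a minimal sketch that packages it as a generator. The name bulletin_filenames and its parameters are my own illustration, not part of the script on GitHub. The return inside the generator ends both loops at once; a plain break would only exit the inner month loop.

```python
def bulletin_filenames(first_year=3, last_year=15, stop=(15, 2)):
    """Yield sscaYYMM.pdf names in order, stopping before `stop` (year, month).

    bulletin_filenames is a hypothetical helper, not the original script's API.
    """
    # range() excludes its end point, so last_year + 1 makes last_year inclusive
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            if (year, month) == stop:
                return  # ends the generator, leaving both loops at once
            yield 'ssca{:0>2}{:0>2}.pdf'.format(year, month)


names = list(bulletin_filenames())
print(len(names), names[0], names[-1])
```

With the defaults, that's 12 full years (2003 through 2014) plus January 2015: 145 file names, from ssca0301.pdf to ssca1501.pdf.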

The trick with the file name is the {:0>2} format spec: 0 is the fill character, > right-aligns the value, and 2 is the minimum width. So 3 becomes 03, and {:0>3} would give 003.
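A few quick comparisons of the padding spec, assuming Python 3.6 or later for the f-string line. The variable names are just for illustration.

```python
year, month = 3, 7

# '0' = fill character, '>' = right-align, '2' = minimum width
padded = 'ssca{:0>2}{:0>2}.pdf'.format(year, month)

# The same spec with a width of 3
three_wide = '{:0>3}'.format(3)

# For integers, the '02d' shorthand zero-pads the same way
shorthand = 'ssca{:02d}{:02d}.pdf'.format(year, month)

# f-strings (Python 3.6+) accept the same format specs inline
fstring = f'ssca{year:0>2}{month:0>2}.pdf'

# Width is a minimum, so longer values are never truncated
too_long = '{:0>2}'.format(123)

print(padded, three_wide, shorthand, fstring, too_long)
```

One difference worth knowing: {:0>2} also pads strings ('3' becomes '03'), while {:02d} raises a ValueError when handed a string.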

The rest of the script checks whether the file is already there. If it is, the script skips it; otherwise it uses pycurl to pull down the file.
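The skip check is easy to pull out into its own function. Here's a sketch using pathlib; files_to_fetch is a hypothetical name. I've also added one assumption of my own: a zero-byte file counts as missing. A download that fails after the output file is opened can leave an empty file behind, and a plain existence check would skip it forever.

```python
from pathlib import Path


def files_to_fetch(filenames, directory='.'):
    """Return the file names that still need downloading.

    A file counts as present only if it exists and is non-empty
    (an assumption: empty files are treated as failed downloads).
    """
    folder = Path(directory)
    missing = []
    for name in filenames:
        path = folder / name
        if not (path.is_file() and path.stat().st_size > 0):
            missing.append(name)
    return missing
```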

Getting the file is pretty straightforward, but it's important to think about what kind of file we're getting. These are PDFs, which are binary data, so open the output file in binary write mode ('wb').

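import pycurl


def getTheFile(filename):

    # setVariables() returns the members-only bulletin base URL
    tempurl = setVariables()
    url = '{}{}'.format(tempurl, filename)
    print('url: {}'.format(url))
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    # PDFs are binary, so open the output file in 'wb' mode;
    # the with block closes it even if the download fails
    with open(filename, 'wb') as fp:
        c.setopt(c.WRITEDATA, fp)
        try:
            c.perform()
        except pycurl.error as error:
            print('curl error: {}'.format(error))
        finally:
            c.close()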
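If you'd rather not install pycurl, the standard library can do the same job. This is a swapped-in technique, not what my script uses. urllib.request.urlopen streams the response, and shutil.copyfileobj copies it into a file opened in 'wb' mode. download is a hypothetical helper name. I'm also assuming the bulletin URL works without a login session; if the site requires cookies, this sketch won't handle that as written.

```python
import shutil
import urllib.request


def download(url, filename):
    """Stream url into filename in binary mode; return the bytes written.

    download is a hypothetical helper; it assumes no login cookies are needed.
    """
    # urlopen runs first and raises urllib.error.HTTPError on 4xx/5xx,
    # so the output file is only created once the request has succeeded
    with urllib.request.urlopen(url) as response, open(filename, 'wb') as fp:
        shutil.copyfileobj(response, fp)
        return fp.tell()
```

The output file is only opened after urlopen succeeds. As a result, a 404 raises an exception instead of leaving a bogus .pdf (an HTML error page or an empty file) on disk.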

tempurl holds the return value of setVariables(), a function at the top of the script that supplies the base URL of the bulletin folder. To use the script, log in to the SSCA site as a member, find that URL, and enter it in that function.
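Instead of pasting the URL into the script, you could have the function read it from an environment variable. This is a sketch of one way to write it. SSCA_BULLETIN_URL is a made-up variable name; set it in your shell to the members-only folder URL before running the script.

```python
import os


def setVariables():
    """Return the base bulletin URL, always ending in '/'.

    SSCA_BULLETIN_URL is a hypothetical environment variable name.
    """
    base = os.environ.get('SSCA_BULLETIN_URL')
    if not base:
        raise SystemExit('Set SSCA_BULLETIN_URL to the bulletin folder URL')
    # getTheFile() appends the file name directly, so make sure the
    # base URL ends with a slash
    if not base.endswith('/'):
        base += '/'
    return base
```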

This was a quick script to solve a simple problem, and I enjoyed creating it.

Time to get back to reading the old bulletins.

