Hands-free Jekyll: Publish Every Day with Cron and Ansible


I absolutely LOVE Jekyll for web site publishing. 💕 Jekyll is the static website engine which publishes this very blog for your enjoyment!

I write content for this site as plain text files (Markdown), and commit them to a Git repository.

Static publishing helps you to concentrate on your content, first and foremost.

With static publishing, I also get all of the benefits of Git, like version control, and my content is stored safely in a repo on GitHub.

I rely on Netlify to build and host Tutorial Works.

But if you’re running your own server, how do you set up automation to republish your static site?

In this article I’ll talk through how I do the publishing for my personal blog, with Jekyll, Ansible and cron.

Automate the publishing of a static site: Push or Pull?

There are a couple of ways that I can think of to automate the publishing of a static site.

You can either use a push approach, or a pull approach:

1. PUSH it real good

In this model, every time you push to GitHub, it can invoke a webhook, somewhere.

Me → Push to GitHub → GitHub invokes webhook → Webhook runs script to build site

To use this approach, I would need to expose (or “host”) a webhook on my web server, which would trigger a script to rebuild the website.

(A webhook is essentially just an HTTP endpoint that triggers an action.)

You could write your own tool to do this, or you could try the webhook project on GitHub which describes itself like this:

webhook is a lightweight configurable tool written in Go, that allows you to easily create HTTP endpoints (hooks) on your server, which you can use to execute configured commands.

Sounds ideal! But, there’s a downside: I could be opening up my little web server to a new attack surface.

As I see it, the risk is:

risk of the webhook utility having an undiscovered security flaw
PLUS
risk of my “configured command” (the site build script) having a security flaw

(Plus other risks, which I can’t think of for the minute!)

Of course, if I choose this option, I would need to keep the webhook tool regularly updated, too.

I’m sure webhook is stable and reliable. But since I’m managing my web server on my own, I’d rather avoid the need for a webhook, unless I really need to.

Which brings me onto the other approach, which is to pull.

2. PULL up to the bumper

In the pull approach, I can get the server to pull the latest changes, on a regular basis, chronologically… (can you tell where I’m going with this?)

Me → Push to GitHub

Timer → Fetch code from GitHub → Build site.

The first step is to push my code to the remote repository (GitHub). Then separately, on a timer or schedule, I pull the code from GitHub onto the server, and run a rebuild.

The pull approach might work for you if you’re happy to publish on a fixed schedule.

This approach decouples the code push from the site build. So, my site is no longer rebuilt every time I do a git push. This means my posts won’t get published immediately.

But that’s OK for me, because I don’t write more than one article per day, and I’m happy with publishing on a fixed schedule.

In my case, I would choose a schedule of once per day; probably at night, when there’s less website traffic.

And to set up the schedule, I can use the cron scheduler on Linux.

I decided to go with the pull approach, so in the next section I’ll show you how I set it up.

Setting up a Jekyll site to publish once per day

Step 1. Write the build script

The first step is the build script. I want to write a script that will run my build repeatedly, in a predictable fashion.

Because I want the build to run predictably, I use a container. The container will have all of the dependencies I need (Jekyll requires Ruby, for example) and it will run the same way each time.

But, I need to trigger this container, so I write a bash script to do that. The script will:

  1. Take an argument to say whether we are building for development or production. (The production build adds things like analytics.)

  2. Start a ruby container. The ruby container also includes Bundler. Since I’m using CentOS, I use Podman as my container engine, but you could also use Docker for this.

  3. Share a directory with the Podman container, so that the build can write the HTML files to my file system.

  4. Share a location for dependencies, which the script takes in a variable named BUNDLE_APP_CONFIG. This is specific to Ruby’s Bundler. The directory is mounted at /usr/local/bundle, which is where Bundler in the ruby container keeps installed dependencies, so that I don’t download them every time the build runs.

And, voila, the script looks like this:

#!/bin/bash -l
#
# Publishes this Jekyll static site
# Usage: ./publish.sh [env] [bundleCacheLocation]

# Stop at the first failing command, so a failed build is not reported as complete
set -e

JEKYLL_ENV="${1:-production}"
BUNDLE_APP_CONFIG="${2:-/usr/local/bundle}"

echo "Building site..."

# Run the bundle install and jekyll build in a container
# The ruby container uses /usr/local/bundle as a local artifact location
podman run --rm \
    -v "$PWD":/usr/src/site:Z \
    -v "${BUNDLE_APP_CONFIG}":/usr/local/bundle:Z \
    -w /usr/src/site \
    -e JEKYLL_ENV="${JEKYLL_ENV}" \
    docker.io/library/ruby:2.7 /bin/bash -c "bundle install && bundle exec jekyll build"

echo "Site build complete."

Notice that I’m pinning to a specific tag of the Ruby image (2.7) rather than just latest. It gives me a level of predictability!

I commit this into my site’s Git repository, as ./publish.sh.
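The two `${N:-default}` lines near the top of the script are what make both arguments optional. Here is a minimal sketch of that behaviour on its own. The function name `resolve_build_args` is hypothetical and is not part of publish.sh:

```shell
# Hypothetical helper, not part of publish.sh: shows how ${N:-default}
# falls back when an argument is missing or empty.
resolve_build_args() {
    build_env="${1:-production}"
    bundle_cache="${2:-/usr/local/bundle}"
    printf 'env=%s cache=%s\n' "$build_env" "$bundle_cache"
}

resolve_build_args                                 # both defaults apply
resolve_build_args development /tmp/bundle-cache   # both overridden
```

So running `./publish.sh development` would build with JEKYLL_ENV set to development, while still using the default cache location.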

I test this script by running it in a temp directory somewhere. And it all seems good. 🆗
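If you want something a little firmer than eyeballing the output, a tiny smoke check can help. This is only a sketch: `check_site_output` is a hypothetical helper, and it assumes your site has an index.html at its root.

```shell
# Hypothetical smoke check: succeed only if the build produced a
# non-empty index.html. Assumes the site has a page at its root.
check_site_output() {
    site_dir="${1:-./_site}"
    [ -s "$site_dir/index.html" ]
}

# Example: check a scratch directory before and after a fake "build".
scratch="$(mktemp -d)"
check_site_output "$scratch" || echo "no build output yet"
echo "<h1>hello</h1>" > "$scratch/index.html"
check_site_output "$scratch" && echo "build output looks OK"
rm -rf "$scratch"
```

After a real run of ./publish.sh, calling `check_site_output ./_site` from the repository root gives a quick yes/no answer.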

So next step: I need to set up a cron job to run the publish script.

Step 2. Clone the repo and set up the cron job

The first thing to do in this stage is to clone the repository onto your remote server, with a git clone ... into the directory you want.

Next, I want to run this script automatically, on a schedule. (So that I can be sipping a Jungle Bird somewhere while it runs.)

The commands to build the site are going to be:

cd /var/www/mysite.com/mysite.git
git pull
./publish.sh >/dev/null 2>&1

This is basically:

  1. Change to the Git repository on the server

  2. Pull the latest code from GitHub

  3. Run the publish script, redirecting any output to /dev/null (a black hole, basically – I really don’t care about any logs.)

  4. Jekyll publishes the website to the ./_site directory

I’m going to use cron to publish the static site on a schedule. It’s going to publish every day, at 4:05am, local time. The cron pattern for this is:

5 4 * * *
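Reading left to right, the five fields of a cron pattern are minute, hour, day of month, month and day of week, and an asterisk means "every". As a small sketch to make that concrete (`describe_cron` is a hypothetical helper, not a standard tool):

```shell
# Label the five fields of a cron schedule expression.
describe_cron() {
    set -f          # keep "*" literal while the string is split
    set -- $1       # split on whitespace into $1..$5
    set +f
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

describe_cron "5 4 * * *"
# prints: minute=5 hour=4 day-of-month=* month=* day-of-week=*
```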

So I add it to my crontab using crontab -e:

5 4 * * * cd /var/www/mysite.com/mysite.git && git pull && ./publish.sh >/dev/null 2>&1
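The `&&` between the commands matters here: each step only runs if the one before it succeeded, so a failed `git pull` will not trigger a build. Here is a sketch of that behaviour, using stand-in functions with hypothetical names rather than the real commands:

```shell
# Stand-ins for the real steps (hypothetical names, for illustration).
pull_ok()    { echo "pulled"; }
pull_fails() { echo "pull failed"; return 1; }
publish()    { echo "published"; }

# Prints "pulled" then "published".
pull_ok && publish

# Prints "pull failed" then "build skipped"; publish never runs.
pull_fails && publish || echo "build skipped"
```

One thing to be aware of: in the crontab line, the redirect to /dev/null only applies to ./publish.sh. Anything that `git pull` prints still goes to cron, which typically mails it to the crontab’s owner if mail is set up.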

But, because I love automation, I’m going to add this to an Ansible playbook instead.

Setting up the cron job with Ansible

I already use Ansible to configure my web server, so I have an existing playbook and I just want to add some new tasks to it.

If you don’t already have a playbook, you can look at the ansible-examples repo and create one of your own.

So I add these tasks to my role’s tasks/main.yml:

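This first task uses the git module to clone the repository if it doesn’t exist already. Setting update: no means Ansible won’t fetch new commits on later runs, which leaves the pulling to the cron job: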

- name: mysite.com - Ensure git repo exists
  git:
    repo: "https://github.com/myuser/mysite.git"
    dest: /var/www/mysite.com/mysite.git
    update: no

The second task uses the cron module. This will create an entry in the root user’s crontab, to publish the site every day at 4:05am:

- name: mysite.com - Ensure cron job exists to publish site
  cron:
    name: "Publish mysite.com from source"
    user: root
    minute: "5"
    hour: "4"
    job: "cd /var/www/mysite.com/mysite.git && git pull && ./publish.sh >/dev/null 2>&1"

And that’s it! Now the cron job will run on schedule and publish the site every day.
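To confirm the entry landed, you can look in root’s crontab. As far as I know, the cron module writes a marker comment of the form `#Ansible: <name>` on the line above each job it manages. The sketch below relies on that, and `ansible_cron_job` is a hypothetical helper name:

```shell
# Hypothetical helper: print the crontab line that follows the
# "#Ansible: <name>" marker comment. Crontab text is read from stdin.
ansible_cron_job() {
    awk -v marker="#Ansible: $1" '
        found          { print; exit }
        $0 == marker   { found = 1 }
    '
}

# Example with sample crontab text.
printf '%s\n' \
    '#Ansible: Publish mysite.com from source' \
    '5 4 * * * cd /var/www/mysite.com/mysite.git && git pull && ./publish.sh >/dev/null 2>&1' \
  | ansible_cron_job "Publish mysite.com from source"
```

On the server itself, you would pipe in the real thing instead: `sudo crontab -u root -l | ansible_cron_job "Publish mysite.com from source"`.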

Write anywhere, publish on schedule

That’s almost it for this post! I’ve shown you how I set up my Jekyll site to publish automatically on a schedule.

You can create the crontab entry manually, or, if you already have an Ansible playbook, add a couple of tasks to set up the Git repo and cron job, and you’re all set.

When this is all set up, your static site will be published into the $GIT_REPO/_site directory. You can configure your web server (e.g. Apache or Nginx) to serve from this directory.

With publishing taken care of, you can write content from anywhere

Here’s a neat bonus you get from this setup. You don’t need to write and edit files locally anymore.

With this setup, I can use the GitHub website, or even a Git app on my phone (I recommend Working Copy), to create and edit my Markdown files.

Simply write and publish on the move, and the cron job will take care of pulling the latest changes, and publishing every day, like clockwork.

For an example of a static site that I publish with this method, see this Git repo for my Knowledge Base:

See the Git repo

Keep static publishing, y’all!