Samuel Tardieu @ rfc1149.net

Jekyll and live feeds update


Before I switched to Jekyll, WordPress was running my blog. One thing I noticed while using WordPress was that Google and other blog search engines were fetching my new posts a few seconds after I published them.

To achieve this, WordPress uses two different systems:

  1. It sends a ping to some services which in turn fetch your feeds. Some concentrators such as ping-o-matic allow you to ping them, and they in turn ping various search engines for you so that you don’t have to. Then each search engine decides whether or not it will crawl your blog again.

  2. WordPress also uses the recent pubsubhubbub protocol (what a lovely name!). In your feed, you declare the address of a hub where interested parties can send subscription requests. Then, when a new article is published on your blog, WordPress sends a ping to the hub, and the hub retrieves your feed. If the feed has changed, it is sent to the subscribers using a callback address they registered when they subscribed. This way, interested services such as Google do not have to retrieve the feed themselves, as it will get pushed to them when it contains new items.
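The publish/subscribe flow of the second system can be illustrated with a toy, in-memory model. This is only a sketch of the idea, not the real protocol: the `ToyHub` class and its methods are hypothetical, callbacks are plain Ruby blocks rather than HTTP callback addresses, and the feed content is passed in directly instead of being fetched by the hub.

```ruby
# Toy model of a hub: it remembers subscribers per feed and pushes new content to them.
class ToyHub
  def initialize
    @subscribers = Hash.new { |hash, feed| hash[feed] = [] }
    @last_seen = {}
  end

  # A subscriber registers a callback for a feed (a block stands in for a callback URL).
  def subscribe(feed_url, &callback)
    @subscribers[feed_url] << callback
  end

  # Called when the publisher pings; `content` stands in for the feed the hub would fetch.
  # Returns true if the content changed and was pushed, false otherwise.
  def ping(feed_url, content)
    return false if @last_seen[feed_url] == content

    @last_seen[feed_url] = content
    @subscribers[feed_url].each { |callback| callback.call(content) }
    true
  end
end
```

The point of the model is that subscribers never poll: they only receive something when a ping arrives and the feed actually differs from what the hub saw last.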

It is easy to enhance a Jekyll blog with the pubsubhubbub system, because:

  • there exist public open pubsubhubbub hubs, such as the well known https://pubsubhubbub.appspot.com;
  • you may send the ping message from anywhere, not necessarily from the server.

The first thing to do is to add the hub information to your Atom or RSS feeds. For an Atom feed, you may add the following to the feed element

<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="hub" href="https://pubsubhubbub.appspot.com"/>
  ...
</feed>

while an RSS feed would contain

<rss xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com"/>
    ...
  </channel>
</rss>

Then you may want to ensure that you can tell the hub that your feed has some fresh interesting content by pinging it. If you don’t, your feed will be retrieved at regular intervals, but you will lose the benefit of using pubsubhubbub. If you are using rake for your development, you may want to create a :ping task which will send the ping when you run it:

desc 'Ping pubsubhubbub server.'
task :ping do
  require 'cgi'
  require 'net/http'
  puts 'Pinging pubsubhubbub server'
  # Form body: the publish mode plus the URL-escaped address of the feed.
  data = 'hub.mode=publish&hub.url=' + CGI.escape('http://address.of.your/feed/')
  http = Net::HTTP.new('pubsubhubbub.appspot.com', 80)
  # Net::HTTP#post takes a path and returns the response object.
  resp = http.post('/publish',
                   data,
                   { 'Content-Type' => 'application/x-www-form-urlencoded' })

  # Anything other than 204 is reported as an error.
  puts "Ping error: #{resp.code}, #{resp.body}" unless resp.code == '204'
end

If you prefer to use make, then a similar target using wget or curl would do the job. The only thing you need to do is send a POST request to http://pubsubhubbub.appspot.com/publish with a URL-encoded form containing the following two fields:

  • hub.mode: the literal string publish.
  • hub.url: the URL of your updated feed. This can be repeated multiple times if several feeds have been updated at once.
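As a minimal sketch of how such a form body can be built when several feeds changed at once, the helper below repeats the hub.url field once per feed. The `publish_body` name is hypothetical; `URI.encode_www_form` is part of Ruby's standard library.

```ruby
require 'uri'

# Build the URL-encoded body of a publish ping for one or more feed URLs.
# hub.mode appears once, hub.url is repeated for every updated feed.
def publish_body(feed_urls)
  pairs = [['hub.mode', 'publish']] + feed_urls.map { |url| ['hub.url', url] }
  URI.encode_www_form(pairs)
end
```

The resulting string can be used as the data of the POST request, whatever tool ends up sending it.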

Note that in real life, my rake rule is much more complex: since I have separate feeds for the two languages I use on this blog, as well as one feed per tag, my Rakefile contains code to check whether posts have been updated in the last 24 hours, and all the feeds that might have changed (and only these) will be signalled to the hub.
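The selection logic could look roughly like the sketch below. It is not the actual Rakefile: the `Post` structure, the `feeds_to_ping` helper, and the feed URL layout (one feed per language, one per tag, under a placeholder base address) are all assumptions made for illustration.

```ruby
# Hypothetical post record: source path, language, tags, and last modification time.
Post = Struct.new(:path, :lang, :tags, :mtime)

# Return the sorted, de-duplicated feed URLs affected by posts modified
# within `window` seconds before `now`. The URL layout is an assumption.
def feeds_to_ping(posts, now: Time.now, window: 24 * 60 * 60, base: 'http://example.org')
  recent = posts.select { |post| now - post.mtime <= window }
  feeds = recent.flat_map do |post|
    ["#{base}/#{post.lang}/atom.xml"] +
      post.tags.map { |tag| "#{base}/tags/#{tag}/atom.xml" }
  end
  feeds.uniq.sort
end
```

Each URL in the returned list would then become one hub.url field of the ping, so feeds that did not change are never signalled.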

What can you do with those realtime updates? You can start using services such as twitterfeed to post Twitter notices of your blog posts right after they appear on your site, or you can use PuSH Bot to get live updates in your XMPP stream (in Google Talk for example). This is really as easy as pie: there is no reason your blog should not be using it right now.

How will I publish this very post? I will just do

rake install ping

and be done with it.
