There must be a better way
Since I now use Jekyll to generate this web site, I had to find a way to convert tag names into clean, lowercase, ASCII-only slugs. For example, Free Software would become free-software and Éducation would become education.
One solution I came up with is a slugify filter which uses the unicode Ruby gem. After converting the string to lower case and decomposing æ and œ to ae and oe respectively, it applies Unicode normalization form KD (NFKD), which separates base characters from their accent marks as shown in this figure. Then only plain ASCII letters, digits and underscores are kept, and runs of spaces and hyphens are collapsed into a single hyphen.
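To make the decomposition step concrete, here is a small sketch of what NFKD does to a single accented letter. It is only an illustration: it assumes Ruby 2.2 or later and uses the built-in String#unicode_normalize instead of the unicode gem, and the helper name decompose is made up for this example.

```ruby
# Illustration only: decompose is a hypothetical helper, and it relies on
# Ruby's built-in String#unicode_normalize (Ruby >= 2.2), not the unicode gem.
def decompose(text)
  # NFKD splits each precomposed character into a base character
  # followed by its combining marks.
  text.unicode_normalize(:nfkd).codepoints.map { |cp| format('U+%04X', cp) }
end

# "\u00C9" is the precomposed letter É.
p decompose("\u00C9")  # => ["U+0045", "U+0301"]
```

The E (U+0045) survives the ASCII-only filter, while the combining acute accent (U+0301) is dropped. The filter below applies the same idea using the unicode gem.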
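# -*- coding: utf-8 -*-
require 'unicode'

module Slugify
  def slugify(input)
    # Lowercase, expand the ligatures, then split letters from their accents.
    t = Unicode::nfkd(input.downcase.gsub('æ', 'ae').gsub('œ', 'oe'))
    # Drop everything but word characters, whitespace and hyphens,
    # then collapse whitespace/hyphen runs into a single hyphen.
    t.gsub(/[^\w\s-]/, '').gsub(/[\s-]+/, '-').downcase
  end
end

# Register the module so that Liquid templates can use the slugify filter.
Liquid::Template.register_filter(Slugify)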
and, in the template:
{% assign tn = tag | slugify %}{% assign t = tag %}
This way, I can link to the tag page using <a href="/blog/tag/{{ tn }}">{{ t }}</a> without fearing that some software chokes on the URL. It works well and I am now satisfied with this function, so I removed the questions that were there in previous versions of this post. The only thing I dislike is the double downcase call, which is needed because some characters cannot be downcased without knowing more about the language in use.
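On a recent Ruby, the second downcase can probably be avoided. The sketch below is not the filter used on this site: it assumes Ruby 2.4 or later, where String#downcase handles non-ASCII letters, and it swaps the unicode gem for the built-in String#unicode_normalize. The name slugify_stdlib is hypothetical. Normalizing first and downcasing afterwards means a single downcase sees every letter, including those produced by the decomposition.

```ruby
# Hypothetical stdlib-only variant; not the filter used on this site.
# Assumes Ruby >= 2.4 (Unicode-aware String#downcase).
def slugify_stdlib(input)
  input.unicode_normalize(:nfkd)  # split letters from their accent marks
       .downcase                  # single downcase, applied after decomposition
       .gsub("\u00E6", 'ae')      # æ is not decomposed by NFKD
       .gsub("\u0153", 'oe')      # œ is not decomposed by NFKD
       .gsub(/[^\w\s-]/, '')      # \w is ASCII-only in Ruby, so accents are dropped
       .gsub(/[\s-]+/, '-')       # collapse whitespace/hyphen runs
end

puts slugify_stdlib('Free Software')     # prints free-software
puts slugify_stdlib("\u00C9ducation")    # prints education
```

Like the original filter, this does not trim leading or trailing whitespace, so a tag with surrounding spaces would get a leading or trailing hyphen.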
Edit: updated to match the name and behaviour of Django’s slugify, as per Ricardo Buring’s comment, with additional “æ” to “ae” and “œ” to “oe” translations.