13 Feb 2014

Blog Refurbished

Submitted by blizzz

Finally, finally, I found the time (spread over several months) to refurbish my blog. Not that it took so long, but spare time is rare these days.

I decided to stick with Drupal and created a fresh and clean installation of 7 to replace the old Drupal 6. Now, Drupal is somewhat overkill for a simple blog, but the alternatives did not convince me for various reasons.

I do not want to dive into the whys and why-nots here, but rather point out some things worth noting about Drupal. Setting up Drupal is straightforward, of course, and everything works fine and smoothly, but the real work comes with adjusting it to become what you want. Its flexibility lets you realize almost anything; on the other hand, it is also the reason why some things are laborious to accomplish.

Layout and Modules

I chose Bamboo as the base theme and sub-themed it. The blog now looks less stale, long code parts are presented properly and it has a mode for mobile devices. It required diving into Drupal theming a bit, but not too much. The documentation provided with Bamboo was already very helpful. The big plus of a sub-theme is that you can easily update the main theme without patching around endlessly. Theoretically, an update can still break your layout if major changes are applied. In the end, this task was pleasant to complete.

Afterwards it was about selecting modules and doing the configuration. Mostly it was searching, installing, configuring, done. There are just three things I want to point out here:

  1. There's no easy solution for a guest preview as offered by Wordpress. You can achieve this by doing something complicated (I did not pursue this) or by using the view_unpublished module. It does not offer the same convenience, but it is good enough.
  2. Standard elements like captions and buttons are finally localized into the most common languages, e.g. English, Spanish, Chinese, Russian, Arabic and German. Drupal does not ship translations by default.
  3. Avoiding blog spam. On the old version I used reCaptcha. I believe the only type of commenters it kept away were authentic people; instead I had the doubtful pleasure of moderating tons of SEO spam. Now I use a honeypot approach, and so far (in testing) it works incredibly well and does not get in the way of real people. I am very fond of this.

Upgrade and Maintenance

I wish I could get the latest Drupal from the repositories, either the official Ubuntu ones or a PPA. Web software evolves fast, releases often, and frequently closes security issues. Unfortunately, neither provides it (there are only older packages in the 12.04 repositories).

So I need to keep Drupal up to date by hand. Anyone who has ever read the update instructions knows that you don't want to do it by hand. There is a lot to do. A perfect situation for the lazy CS guy and a good opportunity to refresh my shell scripting. I could automate a lot of the ugly and boring stuff. What is left for me is to kick off the script and to switch the maintenance mode on and off. Even this can be achieved without human interaction, but so far I prefer to keep control. In the end, I need to ensure everything works as expected anyway.
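The core-replacement step of such an update could be sketched roughly as follows. This is not the script used for this blog, only a minimal illustration: the function name `replace_core`, the directory layout and the list of files to keep (`sites`, `.htaccess`, `robots.txt`) are assumptions, and it presumes that a backup exists and the new release has already been unpacked somewhere.

```shell
#!/bin/sh
# Sketch only: swap Drupal core files for a freshly unpacked release while
# keeping site-specific files. Paths and the keep-list are assumptions.

# Usage: replace_core OLD_ROOT NEW_RELEASE_DIR
replace_core() {
  old_root=$1
  new_release=$2

  # Refuse to run on directories that do not look as expected.
  [ -d "$old_root/sites" ] || { echo "no sites/ in $old_root" >&2; return 1; }
  [ -d "$new_release" ] || { echo "missing $new_release" >&2; return 1; }

  # Remove old core entries, but keep sites/ and locally modified files.
  for entry in "$old_root"/* "$old_root"/.[!.]*; do
    [ -e "$entry" ] || continue
    case $(basename "$entry") in
      sites|.htaccess|robots.txt) ;;   # keep
      *) rm -rf "$entry" ;;
    esac
  done

  # Copy the new core in, skipping whatever was kept above.
  for entry in "$new_release"/* "$new_release"/.[!.]*; do
    [ -e "$entry" ] || continue
    case $(basename "$entry") in
      sites|.htaccess|robots.txt) ;;   # keep the existing versions
      *) cp -R "$entry" "$old_root"/ ;;
    esac
  done
}
```

Called as `replace_core /path/to/site /path/to/unpacked-release` (both paths are placeholders). Switching maintenance mode and running update.php afterwards are deliberately left out, since those are the steps I prefer to trigger myself.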

Migration

The fun part. First, why did I not upgrade from Drupal 6 to 7, but build everything from scratch? Because I had made some decisions in the old configuration that were not so useful. Then, there were some modules that had been discontinued or replaced without an upgrade path. And somewhere in my head it was stuck that an upgrade was problematic or not recommended, though this is probably a goof of my own memory. Well, in the end almost everything was ready and was just waiting for the content.

Migrating the content (blog posts, static pages, comments, tags) from Drupal 6 to 7 was easy in the end, once I had found the way and fixed what was missing.

There is a module that provides exactly this transfer from an old Drupal 6 installation to a new Drupal 7 one, including a GUI. I really did not want to write an upgrade script, because I would have needed to get into those details again, while all the content types were standard ones. So the GUI was a plus. At that time there was no stable release including the GUI, though, so I took the development version. Took it, ran it, was delighted.

Only a little later I found out that the tags were not assigned and that the node IDs and term IDs (tags) were shuffled.

Reassigning the tags worked with some SQL select and insert.

INSERT INTO field_data_field_tags (entity_type, bundle, deleted, entity_id, revision_id, language, delta, field_tags_tid) 
SELECT 'node' AS 'entity_type', 'blog' AS 'bundle', 0 AS 'deleted', node.nid AS 'entity_id', node.nid AS 'revision_id', 'und' AS 'language', (@jDelta := @jDelta +1) AS 'delta', taxonomy_term_data.tid AS 'field_tags_tid' 
FROM taxonomy_term_data, node, oldDatabase.term_data, oldDatabase.node, oldDatabase.term_node, (SELECT @jDelta := 0) AS jDelta 
WHERE oldDatabase.term_node.nid = oldDatabase.node.nid AND oldDatabase.term_node.tid = oldDatabase.term_data.tid AND taxonomy_term_data.name = oldDatabase.term_data.name AND node.title = oldDatabase.node.title ORDER BY entity_id;

That left the node IDs and term IDs. This is a problem, because they are contained in the URLs. From an SEO point of view, changed IDs will confuse search engines. They would likely get it right after a while, but as a former SEO consultant you want to do it the right way. Changing the IDs back would work, but they are used everywhere and there are a lot of tables. Before I settled on the migrate module, I had considered migrating the content by simply copying it from the old to the new database, but too much had changed: without digging really deep into it, many new tables and columns remained unclear to me.

The lazy approach was to redirect the old node IDs to the new ones.

SELECT CONCAT('redirect 301 /node/', oldDatabase.node.nid, ' http://www.arthur-schiwon.de/', alias)  
FROM node, url_alias, oldDatabase.node 
WHERE node.title = oldDatabase.node.title AND source = CONCAT('node/', node.nid);

It redirects the old URLs containing the old node IDs to the clean URLs. For some reason, something happened to the canonical tag in Drupal 6, so that not the old clean URLs were used, but the ugly ones. I do not want to have them in the search engines. Now, this is fixed as well. The result somehow contained duplicate lines, but they could easily be dropped or the correct alias chosen. In a few cases I needed to update the alias, because commas led to some problems. I pasted the result at the beginning of the .htaccess file. The same needed to be done for the term IDs.
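Dropping the duplicate lines can also be scripted. The following is only a sketch, assuming the query result was saved one redirect per line in the form `redirect 301 /node/ID TARGET` and that neither path nor target contains spaces. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: clean up generated redirect lines read from stdin.
# Assumed line format: redirect 301 /node/ID http://host/alias

# Keep only the first redirect for each old path (third field).
dedupe_redirects() {
  awk '!seen[$3]++'
}

# List old paths that point to more than one distinct target, so the
# correct alias can be picked by hand.
list_conflicts() {
  sort -u | awk '{ count[$3]++ } END { for (p in count) if (count[p] > 1) print p }'
}
```

Running `list_conflicts` first shows which old paths need a manual decision; `dedupe_redirects` then simply keeps whichever line came first, which may or may not be the alias you want.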

It is not the best approach, but given the limited time I could and wanted to spend, this is OK. In the end, it's a private blog for fun and fame, but not for profit.

It is essential to check that all important old URLs are still reachable, to avoid broken links. Broken links are bad for visitors as well as search engines. I used linkchecker, available in the Ubuntu repositories, to collect all the URLs from my old site.

linkchecker -Fcsv/urlstate.csv --stdin -t1 -r0

A lot of stuff is gathered. I took all the URLs pointing to my domain, replaced the domain with my test domain, saved them in a text file and ran curl against them. I wrote a small script for this.
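The domain replacement itself can be done in the shell as well. This is a minimal sketch, assuming the collected URLs are already available as a plain list, one per line, and use plain http. `TEST_HOST` is a placeholder, not my real test domain, and the function name is made up.

```shell
#!/bin/sh
# Sketch only: rewrite URLs on stdin from the live host to a test host.
LIVE_HOST=www.arthur-schiwon.de
TEST_HOST=test.example.org   # placeholder, replace with the real test domain

to_test_domain() {
  # Keep only URLs on the live host, swap the host, drop duplicates.
  awk -v live="$LIVE_HOST" -v test="$TEST_HOST" '
    {
      prefix = "http://" live
      if (index($0, prefix "/") == 1 || $0 == prefix) {
        print "http://" test substr($0, length(prefix) + 1)
      }
    }' | sort -u
}
```

Something like `to_test_domain < urls-old > urls-new-ws` would then produce the list for the status check, where `urls-old` is a hypothetical file holding the extracted URLs.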

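#!/bin/sh
# Fetch only the headers of each URL and record the HTTP status line.
OUTPUTFILE=new-url-stats.csv
while read -r url; do
  # -s silences the progress meter; tr strips the trailing carriage return
  status=$(curl -sI "$url" | grep "HTTP/1.1" | tr -d '\r')
  echo "$url,$status" >> "$OUTPUTFILE"
done < urls-new-ws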

In the resulting CSV file I had the URL and the status, good enough for me. In LibreOffice, I auto-filtered it and sorted out the faulty or suspicious URLs, i.e. those throwing 4xx errors. If things needed to be fixed, I fixed them and reran the script until I was satisfied.
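The filtering does not strictly need a spreadsheet. As a rough sketch, assuming each line has the form `URL,HTTP/1.1 CODE TEXT` as written by the script above, the suspicious entries can be pulled out with awk. The function name is made up, and unlike my manual filter it also reports 5xx responses and lines with no status at all.

```shell
#!/bin/sh
# Sketch only: print CSV lines whose status is missing or 400 and above.
suspicious_urls() {
  awk -F, '
    {
      # The status line is the last field, e.g. "HTTP/1.1 404 Not Found".
      # Using the last field keeps this working for URLs containing commas.
      n = split($NF, parts, " ")
      code = (n >= 2) ? parts[2] + 0 : 0
      if (code == 0 || code >= 400) print
    }'
}
```

For example, `suspicious_urls < new-url-stats.csv` lists only the lines that need a closer look.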

Future

I wondered whether I should switch away from Drupal but decided to stay with it. The goal was to perform the migration as well as possible while spending as little time as possible. In the end, it took quite some time to investigate and find the right strategy. Maybe it would have been faster with a direct upgrade. Probably it is easier and more straightforward to use software that is dedicated to running blogs. This question will reappear when the next iteration of the blog is due in some years. And I cannot promise to stay with Drupal, since I really only use a little bit of the whole feature set. But I am not a fan of either Wordpress or Ghost, so let us see which options will be out in the wild then.

I am satisfied with the result, though there are a few rough edges that can be taken care of later. It really is a huge relief to deliver "Comment" buttons and the like in common languages instead of only German, and to be able to read the blog properly on mobile devices.

Now I only need to find time to blog more often ;)

Comments

Nice work. You are the AD guy at ownCloud, right? So are you authenticating your Drupal against an AD server? ;-)

Thanks! Yes, I do the LDAP/AD backend. Personally, I keep my fingers away from Microsoft. Atm I do not run an LDAP server on my machine, though this will likely change after I move it to new hardware.

Stay on Drupal, D8 will be much improved, and if you have time, help us or just post some issues you want to see improved. It's not perfect yet, but it's getting better thanks to its awesome community. Complain directly here https://groups.drupal.org/drupal-initiatives or here http://www.drupal-am-main.de

D8 will definitely be an option! Maybe there will be competitors, but as of today Drupal is the strongest candidate for me. What is most important for me is a reliable and easy way of upgrading and/or migrating, having 3rd party modules in mind. Again, a simple blog is not the main purpose of Drupal and I am OK to live with the consequences as long as it is not getting overly complex – then it is just the wrong tool.

For the preview issues check out these modules & pages:

  - https://drupal.org/project/pagepreview
  - https://drupal.org/project/responsive_preview
  - https://drupal.org/project/sps -> Site Preview System
  - https://drupal.org/project/anonymous_publishing
  - https://groups.drupal.org/large-scale-drupal-lsd-projects-and-plans/content-staging

Thank you for the overview. I have already evaluated most of those modules, but they did not meet my requirements. E.g. with Page Preview I cannot let other people (without an account) see a draft, at least as I understand it. This is not clear with Responsive Preview either. OTOH, the nice thing about both is that you get a rendered preview. SPS is overkill and requires patching the core. Anonymous Publishing grants too many permissions; I just want others to look at my drafts. View Unpublished (https://drupal.org/project/view_unpublished) does it for me. Not perfect, but closer than other modules I have seen.
