Slide Creation Checklist

Sun, April 25, 2010, 05:56 PM under AboutPresenting

PowerPoint is a great tool for conference (large audience) presentations, which is the context for the advice below.

The #1 thing to keep in mind when you create slides (at least for conference sessions) is that they are there to help you remember what you were going to say (the flow and key messages) and to give the audience a visual reminder of the key points. Slides are not there for the audience to read what you are going to say anyway. If they were, what would be the point of you being there? Slides are not holders for complete sentences (unless you are quoting); use Microsoft Word for that purpose, either as a physical handout or as a URL link that you share with the audience. When you dry run your presentation, if you find yourself reading the bullets on your slide, you have missed the point. You have a message to deliver, and you can deliver it regardless of your slides; remember that. The focus of your audience should be on you, not the screen.

Based on that premise, I have created a checklist that I go over before I start a new deck and also once I think my slides are ready.

  1. Turn AutoFit OFF. I cannot stress this enough.
  2. For each slide, explicitly pick a slide layout. In my presentations, I use a single Title Slide, a Section Header for each demo, and for the rest of my slides one of three layouts: Title and Content, Title Only, Blank. Most people who are new to PowerPoint accept whatever default layout New Slide gives them and then start deleting and adding placeholders. You can do better than that (and you'll be glad you did if you also follow item #11 below).
  3. Every slide must have an image.
  4. Remove all punctuation (e.g. periods, commas) other than exclamation points and question marks (! ?).
  5. Don't use color or other formatting (e.g. italics, bold) for text on the slide.
  6. Check your animations. Avoid animations that hide elements that were on the slide (instead use a new slide and a transition). Ensure that animations that bring new elements in bring them into white space instead of over other existing elements. A good test is to print the slide and see that it still makes sense even without the animation.
  7. Print the deck in black and white choosing the "6 slides per page" option. Can I still read each slide without losing any information? If the answer is "no", go back and fix the slides so the answer becomes "yes".
  8. Don't have more than 3 bullet levels/indents. In other words: you type some text on the slide, hit 'Enter', hit 'Tab', type some more text, and repeat that sequence at most once more. Ideally your outer bullets have only one level of sub-bullets (i.e. one level of indentation beneath them).
  9. Don't have more than 3-5 outer bullets per slide. Space them evenly down the slide, e.g. with blank lines in between.
  10. Don't wrap. For each bullet on all slides check: does the text for that bullet wrap to a second line? If it does, change the wording so it doesn't. Or create a terser bullet and make the original long text a sub-bullet of that one (thus decreasing the font size, but still being consistent) and have no wrapping.
  11. Use the same consistent fonts (i.e. Font Face, Font Size etc.) throughout the deck for each level of bullet. In other words, don't deviate from the PowerPoint template you chose (or that was chosen for you). Go to each slide and hit 'Reset'. 'Reset' is a button on the 'Home' tab of the ribbon, or you can find the 'Reset Slide' menu when you right-click on a slide in the 'Slides' list on the left. If your slides can survive that without you "fixing" things after the Reset action, you are golden!
  12. For each slide ask yourself: if I had to replace this slide with a single sentence that conveys the key message, what would that sentence be? This exercise leads you to merge slides (where the key message is split) or split a slide into many, if there were too many key messages on the slide in the first place. It can also lead you to redesign a slide so the text on it really is just explanation or evidence for the key message you are trying to convey.
  13. Get the length right. Is the length of this deck suitable for the time you have been given to present? If not, cut content! It is far better to deliver less in a relaxed, polished, engaging, memorable way than to rush through more content. As a rule of thumb, multiply the number of slides you have by 2 minutes, add the time you need for each demo, and check whether that adds up to more than the time you have been allotted. If it does, start cutting content; we've all been there and it has to be done.
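The rule of thumb in item 13 is simple arithmetic, so here is a minimal C# sketch of it. The class and method names are hypothetical, and the 2-minutes-per-slide figure is the same rough rule of thumb, not a measurement.

```csharp
using System;
using System.Linq;

// Hypothetical helper for the item 13 rule of thumb:
// about 2 minutes per slide, plus the time each demo needs.
public static class TalkLength
{
    public const int MinutesPerSlide = 2;

    // Estimated length of the talk in minutes.
    public static int EstimateMinutes(int slideCount, params int[] demoMinutes)
    {
        if (slideCount < 0) throw new ArgumentOutOfRangeException(nameof(slideCount));
        return slideCount * MinutesPerSlide + demoMinutes.Sum();
    }

    // Positive result: minutes you need to cut. Zero or negative: the deck fits.
    public static int MinutesOver(int allottedMinutes, int slideCount, params int[] demoMinutes)
    {
        return EstimateMinutes(slideCount, demoMinutes) - allottedMinutes;
    }
}
```

For example, 25 slides plus demos of 8 and 7 minutes come to 25 × 2 + 15 = 65 minutes, which is 5 minutes over a 60-minute slot, so something has to go.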

As always, rules and guidelines are there to be bent and even broken sometimes. Start with the above and decide on a slide-by-slide basis which rules you want to bend. That is smarter than throwing all the rules out from the start, right?

AutoFit in PowerPoint: Turn it OFF

Sun, April 25, 2010, 05:54 PM under AboutPresenting

Once a feature has shipped, it is very hard to eliminate it from the next release. If I was in charge of the PowerPoint product, I would not hesitate for a second to remove the dreadful AutoFit feature.


Fortunately, AutoFit can be turned off on a slide-by-slide basis and, even better, globally: go to the PowerPoint "Options" and under "Proofing" find the "AutoCorrect Options…" button which brings up the dialog where you need to uncheck the last two checkboxes (see the screenshot to the right).

AutoFit is the ability for the user to keep hitting the Enter key as they type more and more text into a slide and it magically still fits, by shrinking the space between the lines and then the text font size. It is the root of all slide evil. It encourages people to think of a slide as a Word document (which may be your goal, if you are presenting to execs in Microsoft, but that is a different story). AutoFit is the reason you fall asleep in presentations.

AutoFit causes too much text to appear on a slide which by extension causes the following:

  1. When the slide appears, the text is so small that not everyone in the audience can read it. They dismiss the presenter as someone who does not care about them, and then they stop paying attention.
  2. If the text is readable but there is too much of it (hence the AutoFit feature kicked in when the slide was authored), the audience is busy reading the slide and not paying attention to the presenter. Humans cannot listen well and read well at the same time, so when they are done reading they feel that they missed whatever the speaker was saying. So they "switch off" for the rest of the slide until the next slide appears, which is the natural point for them to start paying attention again.
  3. Every slide ends up with different sized text. The less visual consistency between slides, the more unprofessional your presentation feels. Do not dismiss the (subconscious) negative effect that a deck with inconsistent slides has on an audience.

In contrast, the absence of AutoFit:

  1. Leads to consistency among all slides in a deck with regard to the amount of text and the size of that text.
  2. Ensures the text is readable by everyone in the audience (presuming the PowerPoint template is designed for the room where the presentation is delivered).
  3. Encourages the presenter to create slides with the minimum necessary text to help the audience understand the basic structure, flow, and key points of the presentation. The "meat" of the presentation is delivered verbally by the presenter themselves, which is why they are in the room in the first place.
  4. Following on from the previous point, the audience can at a quick glance consume the text on the slide when it appears and then concentrate entirely on the presenter and what they have to say.

You could argue that everything above has nothing to do with the AutoFit feature and everything to do with the advice to keep slide content short. You would be right, but the on-by-default AutoFit feature is what stops most people from seeing and embracing that truth.

In other words, the slides should be the tool that aids the presenter in delivering their message, instead of the presenter being the tool that advances the slides which hold the message. To get there, embrace terse slides: the first step is to turn off this horrible feature (which was probably introduced due to the misuse of this tool within Microsoft). The next steps are described in my next post.

Outlook 2010 – My Top 9 features

Sun, April 25, 2010, 02:18 PM under Random

Office 2010 has reached RTM.

Here are my favorite Outlook features.

  1. Speed. It is faster than previous versions and hangs much less…
  2. Ignore Conversation (Ctrl+Del). Not interested in a conversation? Click this button on the new ribbon and you'll never receive another message on that thread (they all go to your Deleted folder).
  3. Calendar Preview. When receiving a Meeting Request, before deciding whether to accept you get to see a preview of your calendar for that day and where the new meeting would fit in. See the full description on the Outlook team blog.
  4. Quick Steps. See the full description on the Outlook team blog. I have created my own quick steps for filing conversations to folders, various pre-populated reply templates, creating calendar invites and creating TODOs from received emails.
  5. Search Interface. Many of us knew the magic keywords for making smart searches (e.g. from:Name), but it is great to learn many more through the Search Tools contextual ribbon tab.
  6. Next 7 days. Out of the many enhancements to the Calendar view, my favorite is being able to view the next 7 days with a single click; that is now my default view.
  7. MailTips. See the full description on the Outlook team blog. The ones I particularly like are:
    • when composing a mail to someone that has their Out Of Office reply set, you get to read it before sending the mail (and hence can decide to postpone sending).
    • when composing a mail to a distribution list, a message informs you of the number of recipients. Hopefully, senders will use that as a clue for narrowing down the recipient list or at least verifying that their mail should indeed be sent to so many people.
  8. "You are not responding to the latest message in this conversation. Click here to open it." When you compose a reply to a conversation but have not picked the latest message to reply to (don't you hate it when people split threads like that?), this is the inline message you see (under the MailTips area); clicking it opens the latest mail in the conversation so you can reply to that instead.
  9. Rich "Conversation Settings" and in particular "Show Messages from Other Folders". For example, you can see in your inbox not only the message you received but also the reply you sent (it gets pulled in from the Sent folder). Another example: a conversation has been taking place on a distribution list (so your rules filed it to a folder) and then someone adds you on the TO or CC line, so it appears in a different folder; regardless of which folder you open, you are able to see the entire conversation. Note that messages from folders other than the one you are browsing appear in grey text so you can easily spot them. Reading them in one folder obviously marks them as read in the other folder…

If you haven't yet, when are you making the move to Outlook 2010?

Word 2010 Navigation Pane and more

Sun, April 25, 2010, 02:12 PM under Random

I have been using Office 2010 since Beta1 and have not looked back since. I am currently on an internal RC, but will upgrade tomorrow to the RTM version.

Word 2010 Navigation Pane

There is a plethora of new productivity features, and for Word 2010 the one that overshadows everything else, IMO, is the Navigation Pane. I could spend time describing it here, but I'll never be able to cover it more thoroughly than the product team has in their blog post.

You enable it via the "Navigation Pane" checkbox in the "Show" group of the "View" tab on the Word ribbon.

Even if you have already come across this new Word 2010 feature, trust me: you will learn something more about it, and you will thank me later. Go learn how to make the most of the new Navigation Pane.

As an aside, there are many new features in PowerPoint 2010 too, my favorite being support for sections. Not to leave Excel 2010 out, you should check out Excel's integration with HPC Server.

Visual Studio 2010 released!

Mon, April 12, 2010, 08:07 AM under ParallelComputing | VisualStudio

Visual Studio 2010 releases to the world today. Get the full story from Soma's blog post (inc. links for buy, try etc).

Our team is very proud of what we have contributed to this release and you can learn more about it through our content on the Parallel Computing MSDN home.

Microsoft Windows HPC Server R2 Beta2

Mon, April 12, 2010, 08:05 AM under HPC | ParallelComputing

Internally and unofficially we refer to this as "HPC Server v3" and its Beta2 became available last week. Read the full story on this blog post from Ryan and this one from Don.

There has been a lot of excitement on the web for this release with coverage from last Wednesday here, here, here, here, here and here.

Don't forget that Visual Studio 2010 makes it easy to develop for HPC Server including the MPI Cluster Debugger integration that I explained here and here.

Tool to convert content to dasBlog

Fri, April 9, 2010, 07:33 AM under Blogging

Due to Blogger dropping FTP support, I've had to move my blog. If you are in a similar situation, this post will help you by showing you the necessary steps to take.


No blog posts or comments are lost, AND all existing permalinks continue to work (they redirect to the correct place).


  1. Download the XML files corresponding to your content and store them in a folder.
  2. Install and configure dasBlog on your local machine.
  3. Configure your web.config file (will need updating once you run step 4).
  4. Use the tool I describe further down to generate the content and put it in the right place.
  5. Test your site locally. Once you are happy, repeat step 2 on your hosting provider of choice. Remember to copy up your dasBlog theme folder if you created one.
  6. Copy up the local web.config file and the XML dasBlog content files generated by the tool of step 4.
  7. Test your site on the server. Once you are happy, go live (following instructions from your hoster). In my case, I gave the nameservers from my new hoster to my existing domain registrar and they made the switch.

Tool (code)

At step 4 above I referred to a tool. That is an overstatement: it is simply one 450-line C# code file that you can download here: BloggerToDasBlog.cs. I used it from a .NET 2.0 console app (and I ran it under the Visual Studio debugger, i.e. F5) like this: Program.cs. The console app referenced the dasBlog 2.3 ASP.NET Blogging Engine, i.e. the newtelligence.DasBlog.Runtime.dll assembly.

Let me describe what the code does. It takes the following inputs:

  • A path to a folder where the XML files from the old blog reside. It can deal with both types of XML file.
  • A full file path to a file where it creates XML redirect input (as required by the rewriteMap mentioned here).
  • The blog URL. The author's email. The blog author name.
  • A path to an empty folder where the new XML dasBlog content files will get created.
  • The subfolder name used after the domain name in the URL.
  • The 3 reg ex patterns to use. You can use the same as mine, but will need to tweak the monthly_archive rule.

Again, to see what values I passed for all the above, see my Program.cs file.

It produces two outputs:

  • It creates dasBlog XML files in the output folder specified, by parsing the old XML files that reside in the input folder. Once they are generated, copy them to the "Content" folder under your dasBlog installation.
  • It creates an XML file with a single ignorable root element and a bunch of inner XML elements. You can copy paste these in the web.config file as discussed in this post.
Other notes:
  • For each blog post, it detects outgoing links to itself (i.e. to the same blog), and rewrites those to point to the new URLs. So internal links do not rely on the web.config redirects.
  • It deals with duplicate post titles; it does not deal with triplicates and higher.
  • It removes all references to the old blog host (e.g. the injected hidden footer for statistics that each blog post has, and others; see the code).
  • It creates a lot of diagnostic output (in the Output window) and indeed the documentation for the code is in the Debug.WriteLine statements ;)
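To illustrate the first note above, here is a minimal sketch of how links from one post to another could be rewritten to the new URLs. It is not the code from BloggerToDasBlog.cs: the InternalLinks class, the prefix parameters and the mapPath callback are hypothetical, and it only handles quoted href attributes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: rewrite href attributes that point at the old blog so
// internal links go straight to the new URLs instead of relying on redirects.
public static class InternalLinks
{
    // Matches href="..." or href='...'; group 1 is the quote, "url" is the target.
    static readonly Regex Href = new Regex(
        "href\\s*=\\s*([\"'])(?<url>[^\"']*)\\1",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // mapPath converts the path after oldBlogPrefix into the new path,
    // or returns an empty string when it cannot convert it.
    public static string Rewrite(string html, string oldBlogPrefix, string newBlogPrefix,
                                 Func<string, string> mapPath)
    {
        return Href.Replace(html, m =>
        {
            string url = m.Groups["url"].Value;
            if (!url.StartsWith(oldBlogPrefix, StringComparison.OrdinalIgnoreCase))
                return m.Value; // external link: leave it untouched

            string mapped = mapPath(url.Substring(oldBlogPrefix.Length));
            if (mapped.Length == 0)
                return m.Value; // unknown pattern: leave it to the web.config redirects

            string quote = m.Groups[1].Value;
            return "href=" + quote + newBlogPrefix + mapped + quote;
        });
    }
}
```

The mapPath callback could be something along the lines of the Redirect method in my Preserving Permalinks post; links it cannot map are left alone so the web.config redirects still catch them.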

This is not code I will maintain or support – it was a throwaway one-use project that I am sharing here as a starting point for anyone finding themselves in the same boat that I was. Enjoy "as is".

Preserving Permalinks

Fri, April 9, 2010, 07:22 AM under Blogging

One of the things that gets me on a rant is websites that break permalinks. If you have posted something somewhere and there is a public URL pointing to it, that URL should never ever return a 404. You are breaking all websites that ever linked to you and you are breaking all search engine links to your content (that others will try and follow). It is a pet peeve of mine.

So when I had to move my blog, obviously I would preserve the root URL, but I also wanted to preserve every URL my blog has generated over the years. To be clear, our focus here is on the URL formatting, not the content migration, which I'll talk about in my next post. In this post, I'll describe my solution first and then what it solves.

1. The IIS7 Rewrite Module and web.config

There are a few ways you can map an old URL to a new one (so when requests to the old URL come in, they get redirected to the new one). The new blog engine I use (dasBlog) has built-in functionality to do that (Scott refers to it here). Instead, the way I chose to address the issue was to use the IIS7 rewrite module.

The IIS7 rewrite module allows redirecting URLs based on pattern matching, regular expressions and, of course, hardcoded full URLs for things that don't fall into any pattern. You can configure it visually from IIS Manager using a handy dialog that allows testing patterns against input URLs. Here is what mine looked like after configuring a few rules:

URL Rewrite

To learn more about this technology check out this video, the reference page and this overview blog post; all 3 pages have a collection of related resources at the bottom worth checking out too.

All the visual configuration ends up in a web.config file at the root folder of your website. If you are on a shared hosting service, probably the only way you can use the Rewrite Module is by directly editing the web.config file. Next, I'll describe the URLs I had to map and how that manifested itself in the web.config file. What I did was create the rules locally using the GUI, and then take the generated web.config file and upload it to my live site. You can view my web.config here.

2. Monthly Archives

Observe the difference between the way the two blog engines generate this type of URL:

  • Blogger: /Blog/2004_07_01_mothblog_archive.html
  • dasBlog: /Blog/default,month,2004-07.aspx

In my web.config file, the rule that deals with this is the one named "monthlyarchive_redirect".

3. Categories

Observe the difference between the way the two blog engines generate this type of URL:

  • Blogger: /Blog/labels/Personal.html
  • dasBlog: /Blog/CategoryView,category,Personal.aspx

In my web.config file the rule that deals with this is the one named "category_redirect".

4. Posts

Observe the difference between the way the two blog engines generate this type of URL:

  • Blogger: /Blog/2004/07/hello-world.html
  • dasBlog: /Blog/Hello-World.aspx

In my web.config file the rule that deals with this is the one named "post_redirect".

Note: I decided to use dasBlog URLs that do not include the date info (see the description of my Appearance settings). If we included the date info, the URL would have to include the day part, which Blogger did not generate. That would make it impossible to redirect correctly and to have a single permalink for blog posts moving forward. An implication of this decision is that no two blog posts can have the same title. The tool I will describe in my next post (inelegantly) deals with duplicates, but not with triplicates or higher.
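Because titles now double as permalinks, it is worth finding title collisions before converting anything. Here is a minimal sketch, assuming the permalink is simply the trimmed title with spaces replaced by hyphens (matching my Appearance settings) and compared case-insensitively; dasBlog may normalize other characters too, so treat this as an approximation. Unlike my tool, it also reports triplicates and higher.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-migration check: list every permalink that more than one
// post title would map to, together with how many posts share it.
public static class TitleCollisions
{
    public static Dictionary<string, int> Find(IEnumerable<string> titles)
    {
        return titles
            // Assumed permalink rule: trim, then replace spaces with hyphens.
            .GroupBy(t => t.Trim().Replace(' ', '-'), StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Any entry with a count of 3 or more is exactly the case my tool does not handle, so those posts need manual attention (e.g. a small title tweak) before migrating.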

5. Unhandled by a generic rule

Unfortunately, the two blog engines use different rules for generating URLs for blog posts. Most of the time the conversion is as simple as the example in the previous section, where a post titled "Hello World" generates a URL with the words separated by a hyphen. Sometimes that is not the case, for example:

  • /Blog/2006/05/medc-wrap-up.html
  • /Blog/MEDC-Wrapup.aspx


  • /Blog/2005/01/best-of-moth-2004.html
  • /Blog/Best-Of-The-Moth-2004.aspx


  • /Blog/2004/11/more-windows-mobile-2005-details.html
  • /Blog/More-Windows-Mobile-2005-Details-Emerge.aspx

In short, Blogger stops adding words to the URL after about 39 characters, it drops some words (e.g. a, an, on, the) when generating the URL, and it preserves hyphens that appear in the title. For this reason, we need to detect these posts and explicitly list them for redirects (no regular expression can help here because the full set of rules is not listed anywhere).

In my web.config file the rule that deals with this is the one named "Redirect rule1 for FullRedirects" combined with the rewriteMap named "StaticRedirects".

Note: The tool I describe in my next post will detect all the URLs that need to be explicitly redirected and will list them in a file ready for you to copy them to your web.config rewriteMap.
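To give an idea of what that detection involves, here is a minimal sketch (not the tool's actual code). It assumes you already have each post's old Blogger path and its new dasBlog path, treats a pair as needing an explicit redirect when the file names differ (ignoring case, since that is all the generic post_redirect rule changes), and wraps the resulting `<add key="..." value="..." />` entries in a single ignorable root element. Using the full path as the key is an assumption; it has to match whatever your rule looks up in the rewriteMap.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

// Hypothetical sketch: keep only the (old path, new path) pairs that the generic
// post rule would get wrong, as <add key value /> entries for a rewriteMap.
public static class StaticRedirects
{
    // The generic rule keeps the Blogger slug and swaps ".html" for ".aspx".
    public static bool NeedsExplicitRedirect(string oldPath, string newPath)
    {
        string oldSlug = Path.GetFileNameWithoutExtension(oldPath);
        string newSlug = Path.GetFileNameWithoutExtension(newPath);
        return !string.Equals(oldSlug, newSlug, StringComparison.OrdinalIgnoreCase);
    }

    // Returns a throwaway root element; copy its children into the rewriteMap.
    public static XElement BuildEntries(IEnumerable<KeyValuePair<string, string>> pairs)
    {
        var root = new XElement("entries");
        foreach (var pair in pairs)
        {
            if (NeedsExplicitRedirect(pair.Key, pair.Value))
                root.Add(new XElement("add",
                    new XAttribute("key", pair.Key),
                    new XAttribute("value", pair.Value)));
        }
        return root;
    }
}
```

With the examples above, "/Blog/2004/07/hello-world.html" to "/Blog/Hello-World.aspx" is left to the generic rule, while the MEDC, Best Of and Windows Mobile posts each get an explicit entry.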

6. C# code doing the same as the web.config

I wrote some naive code that does the same thing as the web.config: given a string it will return a new string converted according to the 3 rules above. It does not take into account the 4th case where an explicit hard-coded conversion is needed (the tool I present in the next post does take that into account).

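  using System.Text.RegularExpressions;

  static class PermalinkRedirects
  {
    // One pattern per generic rule; dots are escaped so they only match a literal "."
    const string REGEX_post_redirect           = @"[0-9]{4}/[0-9]{2}/([0-9a-z-]+)\.html";
    const string REGEX_category_redirect       = @"labels/([_0-9a-z% -]+)\.html";
    const string REGEX_monthlyarchive_redirect = @"([0-9]{4})_([0-9]{2})_[0-9]{2}_mothblog_archive\.html";

    // Returns the new dasBlog URL (relative to /Blog/), or an empty string if no generic rule applies
    public static string Redirect(string oldUrl)
    {
      GroupCollection g;
      // 2004/07/hello-world.html -> hello-world.aspx
      if (RunRegExOnIt(oldUrl, REGEX_post_redirect, 2, out g))
        return string.Concat(g[1].Value, ".aspx");

      // labels/Personal.html -> CategoryView,category,Personal.aspx
      if (RunRegExOnIt(oldUrl, REGEX_category_redirect, 2, out g))
        return string.Concat("CategoryView,category,", g[1].Value, ".aspx");

      // 2004_07_01_mothblog_archive.html -> default,month,2004-07.aspx
      if (RunRegExOnIt(oldUrl, REGEX_monthlyarchive_redirect, 3, out g))
        return string.Concat("default,month,", g[1].Value, "-", g[2].Value, ".aspx");

      return string.Empty;
    }

    // True only if the case-insensitive pattern matched and produced the expected
    // number of groups (group 0 is the whole match)
    static bool RunRegExOnIt(string toRegEx, string pattern, int groupCount, out GroupCollection g)
    {
      g = null;
      if (string.IsNullOrEmpty(pattern))
        return false;
      Match m = Regex.Match(toRegEx, pattern, RegexOptions.IgnoreCase);
      if (!m.Success)
        return false;
      g = m.Groups;
      return g.Count == groupCount;
    }
  }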


Fri, April 9, 2010, 07:08 AM under Blogging

Some people like blogging on a site that is completely managed by someone else, and others, like me, prefer hosting their own blog at their own domain. In the latter case you need to decide what blog engine to install on your web space to power your blog. There are many free blog engines to choose from. If, like me, you want to use a blog engine that is based on the .NET platform, you have many choices, including BlogEngine.NET, Subtext and the one I picked: dasBlog.

In this post I'll describe the steps I took to get going with the open source dasBlog (home page, source page).

A. Installing

First I installed dasBlog on my local Windows 7 machine where I have IIS7 installed. To install dasBlog, I started by clicking the "Install" button on its web gallery page. After that I went through configuration, theming and adding content as described below.

Once I was happy that everything was working correctly on the local machine, I set this up on a hosting service. I went for a Windows IIS7 shared hosting 3 month Economy plan from GoDaddy. The dasBlog site lists a bunch of other hosts. You can read the installation instructions for dasBlog, and with GoDaddy I just had to click one button since it is available as part of their quick-install apps. With GoDaddy I had a previewdns option that allowed me to play around and preview my site before going live.

B. Configuring

After it was installed (on local machine and/or hosting provider), I followed the obvious steps to create an admin user and logged in. This displays an admin navigation bar with the following options:

1. Navigator Links: I decided I was not going to use this feature. I manage links on the side of my blog manually elsewhere as part of the theme. So, I deleted every entry on this page and ignored it thereafter.

2. Blogroll: Ditto - same comment as for Navigator Links.

3. Content Filters: I did not delete (or add) these, but I did ensure both checkboxes are not checked. I.e. I am not using this feature now, but I may return to it in the future.

4. Activity: This is a read-only view of various statistics. So nothing to configure here, but useful to come back to for complementary statistics to whatever other statistical package you use (e.g. free stats as part of the hosting and I also use feedburner for syndication stats).

5. Cross-posting: I did not need that, so I turned it off via the Configuration Settings discussed next.

6. Configuration Settings: This is where the bulk of the configuration for the blog takes place, and it is stored in a single XML file: the Site.Config file. There are truly self-explanatory options to pick for Basic Settings, Services Settings and Services to Ping, Syndication Settings (this is where you link to your feedburner name if you have one) and Mail to Weblog Settings (I keep this turned off). There are also "Xml Storage System Settings" (I keep this turned off), "OpenId Settings" (I allow OpenID commenters), "Spammer Settings" (Enable captcha, never show email addresses) and "Comment settings" (Enable comments, don't allow on older posts, don't allow html). There are also Appearance Settings (I checked "Use Post Title for Permalink", replaced spaces with hyphens and unchecked "Use Unique Title"). Finally, there are also Notification Settings, but they are a bit hit and miss in my case, in that I don’t always get the emails (still investigating this).

C. Adding Content

You can add content via the "Add Entry" link on the admin navigation bar, by configuring the "Mail to Weblog" settings and sending email, or by doing what I've started doing: using Live Writer (the team also has a blog).

Another way to add content is programmatically if, for example, you are migrating content from another blog (I'll cover that in a separate post sharing the code). What you should know is that all blog content (posts and comments) lives in XML files in a folder called "content" under your dasBlog installation.

D. Theming

There is a very good guide about themes for dasBlog, there is also a similar guide with screenshots (scroll down to "So how do I create a theme") and the dasBlog macro reference.

When you install dasBlog, there are many themes available; each theme is in its own folder (the folder name being the theme name) under the themes folder. You may have noticed that you can switch between them via the "Appearance Settings" described above (look for the combobox after the Default Theme label).

I created my own theme by copy-pasting an existing theme folder, renaming it and then switching to it as the default. I then opened the folder in Visual Studio and hacked around the HTML in the 3 files (itemTemplate, homeTemplate and dayTemplate). These files have a blogtemplate file extension, which I temporarily renamed to HTML while I was editing them. There is no more advice I can offer here, as this is a matter of taste and the aforementioned links are all I used. Personally, I had salvaged the CSS (and structure) from my previous blog and wanted to make this one match it as closely as possible; I think I have succeeded.

E. If you run into any issue with dasBlog...

...use your favorite search engine to find answers. Many bloggers have been using this engine for a while and have documented issues and workarounds over time. One such example is ScottHa's dasBlog category; another example is therightstuff where I "borrowed" the idea/macro for the outlook-style on-page navigation. If you don't find what you want through searching, try posting a question to the forums.

Get your content off

Fri, April 9, 2010, 06:58 AM under Blogging

Because Blogger is dropping support for FTP users, I've decided to move my blog.

When I think of the content of a blog, 4 items come to mind: blog posts, comments, binary files that the blog posts linked to (e.g. images, ZIP files) and the CSS+structure of the blog.

1. Binaries

The binary files you used in your blog posts are sitting on your own web space, so Blogger really is not involved with them. There is nothing for you to do at this stage; I'll come back to these in another post.

2. CSS and structure

In the best case this exists as a separate CSS file on your web space (so no action for now), or in the worst case, like mine, your CSS is embedded with the HTML. In the latter case, simply navigate from your dashboard to "Template" then "Edit HTML" and copy-paste the contents of the box. Save that locally in a txt file and we'll come back to it in another post.

3. Blog posts and Comments

The blog posts and comments exist in all the HTML files on your own web space. Parsing HTML files to extract them can be painful, so it is easier to download the XML files from Blogger's servers that contain all your blog posts and comments.

3.1 Single XML file, but incomplete

The obvious thing to do is go into your dashboard "Settings" and under the "Basic" tab look at the top next to "Blog Tools". There is a link there to "Export blog" which downloads an XML file with both comments and posts. The problem is that it only contains 200 comments; if you have more than that, you will lose the surplus. Also, this XML file has a lot of noise compared to the better solution described next. (Note that a tool I will refer to in a future post deals with either kind of XML file.)

3.2 Multiple XML files

First you need to find your blog ID. In case you don't know what that is, navigate to the "Template" as described in section 2 above. You will find references to the blog ID in the HTML there, but you can also see it as part of the URL in your browser. Mine is 7 digits.

You can now navigate to these URLs to download the XML for your posts and comments respectively:

Note that you can only get 500 posts at a time and only 200 comments at a time. To get more than that you have to change the URL and download the next batch. To get you started, to get the XML for the next 500 posts and next 200 comments respectively you’d have to use these URLs:

...and so on and so forth. Keep all the XML files in the same folder on your local machine (with nothing else in there).
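Fetching everything in batches is just arithmetic on the start index. Here is a minimal sketch, assuming 1-based start indices (so the second batch of posts starts at 501 and the second batch of comments at 201). The URL template is a placeholder for whichever post or comment URL you use, with {0} standing for the start index and {1} for the batch size; the FeedBatches name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: list every batch URL needed to download all items when
// each request is capped (500 posts or 200 comments at a time).
public static class FeedBatches
{
    // urlTemplate: {0} = 1-based start index, {1} = batch size (placeholder format).
    public static List<string> Urls(string urlTemplate, int totalItems, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var urls = new List<string>();
        for (int start = 1; start <= totalItems; start += batchSize)
            urls.Add(string.Format(urlTemplate, start, batchSize));
        return urls;
    }
}
```

For 1,200 posts in batches of 500 this yields three URLs, starting at 1, 501 and 1001; 450 comments in batches of 200 also need three, starting at 1, 201 and 401.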

4. Validating the XML aka editing older blog posts

The XML files you just downloaded really contain HTML fragments inside for all your blog posts. If you are like me, your blog posts did not conform to XHTML so passing them to an XML parser (which is what we will want to do) will result in the XML parser choking. So the next step is to fix that. This can be no work at all for you, or a huge time sink or just a couple hours of pain (which was my case).

The process I followed was to attempt to load the XML files using XmlDocument.Load and wait for the exception to be thrown from my code. The exception would point to the exact offending line and column which would help me fix the issue. Rather than fix it in the XML itself, I would go back and edit the offending blog post and fix it there - recommended! Then I'd repeat the cycle until the XML could be loaded in the XmlDocument.

To give you an idea, some of the issues I encountered are: extra or missing quotes in img and href elements, direct usage of chevrons instead of encoding them as &lt;, missing closing tags, mismatched nested pairs of elements and capitalization of HTML elements. For a full list of things that may go wrong see this.
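The load, fix, reload cycle described above can be sketched like this. It uses XmlDocument.Load (or LoadXml for a string) and reads LineNumber and LinePosition from the XmlException; the XmlCheck class and its method names are hypothetical.

```csharp
using System;
using System.Xml;

// Minimal sketch of the validation loop: try to parse, and if the XML is not
// well-formed, report where the parser choked so you can fix the blog post.
public static class XmlCheck
{
    public static bool TryLoadFile(string path, out int line, out int position, out string message)
        => TryParse(doc => doc.Load(path), out line, out position, out message);

    public static bool TryLoadString(string xml, out int line, out int position, out string message)
        => TryParse(doc => doc.LoadXml(xml), out line, out position, out message);

    static bool TryParse(Action<XmlDocument> load, out int line, out int position, out string message)
    {
        try
        {
            load(new XmlDocument());
            line = 0; position = 0; message = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            line = ex.LineNumber;       // 1-based line of the offending markup
            position = ex.LinePosition; // column on that line
            message = ex.Message;
            return false;
        }
    }
}
```

Run TryLoadFile over every file in your folder; each failure points you at the spot in the XML, which tells you which blog post to go back and edit before trying again.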

5. Opportunity for other changes

I also found a few posts that did not have a category assigned, so I fixed those too. I took the further opportunity to create new categories and tag some of my blog posts with them. Note that I did not remove or change the categories of existing posts; I only added new ones.


In another post we'll see how to use the XML files you stored in the local folder…

Blogger kills FTP

Fri, April 9, 2010, 06:41 AM under Blogging

History (you can safely ignore)

Back in 2002 I came across some (almost) free Linux/Apache space and set up my first manually-created HTML-based home page, which still exists. In 2004 I wanted to have a blog that would be hosted on a sub-folder of my domain, and at the same time I did not want to mess with setting up a blog engine myself. I found the perfect solution in Blogger, which offered a web interface for creating blog posts (and managing the pages' template) and would then use FTP to upload HTML pages to my space (no server-side programming/installation required at all)!

FTP feature dropped by Blogger

Unfortunately, along the way Google purchased Blogger, and a couple of months ago they announced that they had decided to kill the FTP feature; they are forcing customers using that feature to have their content hosted (in an opaque way) on Google's servers.

Even though I prefer having my content on my own space, I would have considered moving it to Google's servers if I could host my blog in a sub-folder and preserve my full blog URL (including my home pages being hosted at the root of the domain). Sadly, that is not possible.

What now

So I decided to move my blog somewhere else. I'll document in the next few posts how I did that (inc. a tool I wrote) in case it helps someone else in the same situation, and also as a reminder to myself if I need to do something like this again in the future.