While setting up Novedge Pulse I had to go through several hundred RSS feeds from blogs and news sources. More interestingly, I had to build a software system able to understand all of them. My first approach to the problem was very naive. I assumed that RSS was a simple, well-defined standard based on XML, and that there would therefore be no room for syntax errors or personal interpretation.
I was wrong. While most of the RSS feeds are aligned with the standard definition (there are several versions), I found a large number of feeds that are almost unusable. Here are some of the most severe errors:
Not having any RSS feed at all. This is by far the most common problem. It is incredible how many companies, from small to large, don't understand the importance of publishing Press Releases, Case Studies, etc. via RSS.
Missing dates. This is a very bad mistake and it will make all your blog posts or news almost impossible to use.
Everything has the same date. This is a common error. Every time you download the feed, every item is stamped with the current day's date for no reason.
Non-standard date formats. This was the most painful part. Most of the code of Novedge Pulse is dedicated to recovering poorly formatted dates. Pulse tries to reconstruct the original date but still misses some of them. The RSS specifications are clear about the date format, and it only takes a couple of minutes for a junior programmer to generate a date that complies with the standard.
Missing or wrong permalinks. It seems difficult to believe, but several RSS feeds are missing a key piece of information: a link to the original web page with the blog post or news. An interesting variant is the permalink provided by the Intergraph RSS Feed for Press Releases, which omits the website's own part of the URL entirely.
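To make the date problem concrete, here is a minimal Python sketch of both sides: producing a date in the RFC 822 style that RSS 2.0 asks for, and leniently recovering dates that arrive in some other shape. This is not the code Pulse uses. The function names and the short list of fallback patterns are assumptions chosen for illustration, and a real aggregator would need many more patterns.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical fallback patterns for non-standard feeds (illustrative only).
FALLBACK_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",  # ISO-like timestamp with a UTC offset
    "%Y-%m-%d %H:%M:%S",    # date and time, no timezone
    "%Y-%m-%d",             # date only
)


def to_rss_date(moment):
    """Format a timezone-aware datetime in the RFC 822 style used by RSS 2.0."""
    return format_datetime(moment)


def parse_feed_date(text):
    """Return a datetime for a feed date string, or None if it cannot be recovered."""
    text = text.strip()
    try:
        # The standard, compliant case.
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        pass
    # Non-compliant case: try each fallback pattern in turn.
    for pattern in FALLBACK_FORMATS:
        try:
            return datetime.strptime(text, pattern)
        except ValueError:
            continue
    return None


published = datetime(2009, 3, 5, 14, 30, tzinfo=timezone.utc)
print(to_rss_date(published))         # Thu, 05 Mar 2009 14:30:00 +0000
print(parse_feed_date("2009-03-05"))  # 2009-03-05 00:00:00
print(parse_feed_date("yesterday"))   # None
```

The first function is the "couple of minutes" fix on the publishing side. The second shows why the consuming side is so much more work: every new malformed feed tends to need one more pattern.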
Less serious mistakes that can still make your feeds harder to use:
Feeds with only a title and no description. This error makes your feeds less attractive and doesn’t encourage readers to follow the link to get more information.
Extremely long titles. These create problems for several news readers. A typical example of this poor marketing style can be seen in the OKI Press Releases.
No author or misleading authors. The majority of multi-author blogs are set up so that the author listed by the feed is always the same. A typical example is the blog BIM & BEAM, written by Nicolas Mangon and Wai Chu but always published under the name of Nicolas.
Linking to a registered-users-only website. There are few situations more irritating than following an interesting link just to find out that in order to access the article you have to register on a totally irrelevant website. Nigel Davies does exactly that with his blog "Eat your CAD"!
News published in blocks of several items at the same time. It is more effective to spread the release of the posts over a few days or hours. This is for example the problem with the PTC "Customer Successes" feeds, which are silent for ages before finally publishing everything at once.
Many of these problems can be fixed in no time by a programmer or by spending a few minutes setting the parameters of your RSS generator. A free, easy test for the syntactic correctness of a feed is available on several websites. I recommend FeedsValidator. Having your feeds hosted by FeedBurner can also take care of several problems.
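A syntax validator will not catch everything on the list above, so a small content check of your own can help. The following Python sketch looks for a few of the errors described in this post: missing dates, missing links, links without a host, missing descriptions, and identical dates on every item. It assumes an RSS 2.0 layout (`channel`/`item`), and the function name `audit_feed` and the sample feed are made up for illustration. The "same date" test is only a rough heuristic.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def audit_feed(xml_text):
    """Return a list of human-readable problems found in an RSS 2.0 document."""
    items = ET.fromstring(xml_text).findall("./channel/item")
    if not items:
        return ["feed contains no items"]

    problems = []
    dates = []
    for index, item in enumerate(items, start=1):
        date = (item.findtext("pubDate") or "").strip()
        link = (item.findtext("link") or "").strip()
        description = (item.findtext("description") or "").strip()

        if date:
            dates.append(date)
        else:
            problems.append(f"item {index}: missing pubDate")

        if not link:
            problems.append(f"item {index}: missing link")
        elif not urlparse(link).netloc:
            # A link such as "/news/a.html" lacks the website part of the URL.
            problems.append(f"item {index}: link has no host: {link}")

        if not description:
            problems.append(f"item {index}: no description")

    # Heuristic: several items that all share one date are suspicious.
    if len(dates) > 1 and len(set(dates)) == 1:
        problems.append("every item carries the same date")
    return problems


# Made-up sample feed with several of the problems described above.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><link>/news/a.html</link>
<pubDate>Thu, 05 Mar 2009 14:30:00 +0000</pubDate></item>
<item><title>B</title><description>Text</description>
<pubDate>Thu, 05 Mar 2009 14:30:00 +0000</pubDate></item>
</channel></rss>"""

for problem in audit_feed(SAMPLE):
    print(problem)
```

Running a check like this against your own feed before publishing takes seconds and catches the errors that hurt readers and aggregators the most.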