<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Ruby web spider Part 0: concept</title>
	<atom:link href="http://warrenseen.com/blog/2006/03/03/ruby-web-spider-part-0-concept/feed/" rel="self" type="application/rss+xml" />
	<link>http://warrenseen.com/blog/2006/03/03/ruby-web-spider-part-0-concept/</link>
	<description>freelance software developer</description>
	<pubDate>Tue, 02 Dec 2008 22:15:31 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Phill Midwinter</title>
		<link>http://warrenseen.com/blog/2006/03/03/ruby-web-spider-part-0-concept/#comment-858</link>
		<dc:creator>Phill Midwinter</dc:creator>
		<pubDate>Fri, 23 Feb 2007 12:17:15 +0000</pubDate>
		<guid isPermaLink="false">http://warrenseen.com/blog/2006/03/03/ruby-web-spider-part-0-concept/#comment-858</guid>
		<description>Not a bad start. You may find it worth putting the cache into a DB if you want to search it effectively. You should also try breaking down the page into what are known as 'barrels': basically a fat list of all the keywords on the page and their associated attributes (density, colour, font size, whatever floats your boat). This makes it a lot easier to search later on.</description>
		<content:encoded><![CDATA[<p>Not a bad start. You may find it worth putting the cache into a DB if you want to search it effectively. You should also try breaking down the page into what are known as &#8216;barrels&#8217;: basically a fat list of all the keywords on the page and their associated attributes (density, colour, font size, whatever floats your boat). This makes it a lot easier to search later on.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
