Posts Tagged ‘web development’

Screen Scraping and Creating a Feed with YQL and Yahoo! Pipes

Thursday, June 25th, 2009

Dav Glass recently asked if anyone could build him an RSS feed of the YUI download page using YQL and Pipes. Being fairly confident in my Pipes and YQL skills, I decided to take a crack at it. The first thing I did was scrape the page’s contents using YQL.

Scraping with YQL

The YQL Console gives you access to several Data Tables, including a whole set under the “data” category that aren’t really tables at all. They’re more like APIs that connect you to some pretty powerful fetchers and parsers. I selected the one named “html”, which lets you use any HTML document on the web as a data source and parse it with XPath.

Here’s the query I fed it:

select * from html where url="http://yuilibrary.com/downloads/?show=yui2" and xpath='//table[thead/tr/th/h2/@id="yui2"]/tbody//tr[@class="even" or @class="odd"]'

which returns the following fragment:

<tr class="even">
    <td><a title="Version 2.7.0b" href="yui2/yui_2.7.0b.zip">Version 2.7.0b</a></td>
    <td>02/19/2009</td>
    <td><em class="md5">90778a161ce9108a23a590e5198b8116</em></td>
</tr>
<tr class="odd">
    <td><a title="Version 2.6.0" href="yui2/yui_2.6.0.zip">Version 2.6.0</a></td>
    <td>10/01/2008</td>
    <td><em class="md5">41bed4b882c9148cebff5dd1a0dd8727</em></td>
</tr>
<tr class="even">
    <td><a title="Version 2.5.2" href="yui2/yui_2.5.2.zip">Version 2.5.2</a></td>
    <td>05/28/2008</td>
    <td><em class="md5">eaadfcbcb651c50092bb679266aa3c20</em></td>
</tr>
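For anyone curious about what that XPath expression actually matches, here’s a rough local equivalent in Python using only the standard library’s ElementTree. It’s a sketch, not how YQL works internally: the surrounding table/thead/h2 markup is my assumption, inferred from the XPath itself, and because ElementTree only supports a subset of XPath, the “or” test and the header-id check are done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down page markup. The <table>/<thead>/<h2 id="yui2">
# wrapper is an assumption inferred from the XPath; the two rows are copied
# from the YQL result above.
PAGE = """
<html><body>
<table>
  <thead><tr><th><h2 id="yui2">YUI 2</h2></th></tr></thead>
  <tbody>
    <tr class="even">
      <td><a title="Version 2.7.0b" href="yui2/yui_2.7.0b.zip">Version 2.7.0b</a></td>
      <td>02/19/2009</td>
      <td><em class="md5">90778a161ce9108a23a590e5198b8116</em></td>
    </tr>
    <tr class="odd">
      <td><a title="Version 2.6.0" href="yui2/yui_2.6.0.zip">Version 2.6.0</a></td>
      <td>10/01/2008</td>
      <td><em class="md5">41bed4b882c9148cebff5dd1a0dd8727</em></td>
    </tr>
  </tbody>
</table>
</body></html>
"""


def select_rows(root, table_id):
    """Rough stand-in for
    //table[thead/tr/th/h2/@id=TABLE_ID]/tbody//tr[@class="even" or @class="odd"]
    ElementTree has no 'or' and no path predicates, so those tests are plain Python."""
    rows = []
    for table in root.iter("table"):
        # Like XPath, match if ANY h2 in the table header has the wanted id.
        if not any(h2.get("id") == table_id
                   for h2 in table.iterfind("thead/tr/th/h2")):
            continue
        # Any <tr> below <tbody> whose class is "even" or "odd".
        for tr in table.iterfind("tbody//tr"):
            if tr.get("class") in ("even", "odd"):
                rows.append(tr)
    return rows


root = ET.fromstring(PAGE.strip())
for tr in select_rows(root, "yui2"):
    print(tr.find("td/a").text)
```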

Creating a Feed with Pipes

You may have noticed that I already had a problem: the link text for each item wasn’t descriptive enough. If I used it as-is, people subscribing to the feed would see things like “1.0.0b1” and “3.0.0 Beta 1”, which would just be confusing. Instead, I wanted them to read “YUI Builder – 1.0.0b1” and “YUI 3 – 3.0.0 Beta 1”. That’s why my query targets only one table at a time, by its id: isolating each table lets me prefix its link text with the correct product name.

Here’s what that looked like:

Screen capture of the Yahoo! Pipes pipe.

You’ll note that the first thing I did after fetching the rows was to prepend the product name to the link text. Here’s where things got tricky. On my first attempt, back when Dav first asked for it, I got stuck at this step: I couldn’t target the first of the three td elements in each row. Thanks to Nagesh Susarla of the YQL/Pipes team, however, I was able to target the td I wanted by including its index number in the field path, like so: item.td.0.a.content. The 0 in there says that I want the first td in the set.
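In Python terms, item.td.0.a.content is just zero-based indexing into the list of td elements in a row. The sketch below does the same prefixing on one row from the result above; the function name prepend_product is hypothetical, and so is the “YUI 2” label, since the post only names “YUI Builder” and “YUI 3”.

```python
import xml.etree.ElementTree as ET

# One row copied from the YQL result above.
ROW = """<tr class="even">
  <td><a title="Version 2.7.0b" href="yui2/yui_2.7.0b.zip">Version 2.7.0b</a></td>
  <td>02/19/2009</td>
  <td><em class="md5">90778a161ce9108a23a590e5198b8116</em></td>
</tr>"""


def prepend_product(tr, product):
    """Prefix the link text in the row's FIRST cell (the td.0 in the pipe)."""
    cells = tr.findall("td")   # all three <td> elements, in document order
    link = cells[0].find("a")  # index 0 = first cell, just like td.0
    link.text = f"{product} – {link.text}"
    return tr


tr = prepend_product(ET.fromstring(ROW), "YUI 2")  # "YUI 2" is a made-up label
print(tr.find("td/a").text)  # YUI 2 – Version 2.7.0b
```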

After prepending the product name to the link text, I use a union operator to combine the results of all the queries, since I no longer need them kept apart. Next, I put the yuilibrary.com domain name in front of the href attribute values, since the links on the page are relative. Then I clean everything up by looping over the rows and building a clean item out of each one with an Item Builder. (I could have just renamed or copied the fields I wanted and kept the rows as-is, but then I’d be delivering a lot of unnecessary junk in my feed.) Finally, I run the whole thing through a sort operator on the pubDate field (not in the screen capture) and output the result as a feed.
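Outside of Pipes, the remaining steps (union, fixing up the hrefs, building clean items, sorting by date, and writing a feed) could be sketched roughly like this. It assumes the rows have already been reduced to title/href/date dictionaries, resolves the relative hrefs against the page URL from the query, and makes up the channel title and description; it illustrates the flow and is not a copy of the actual pipe.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urljoin

# Page URL from the YQL query, used to resolve the relative hrefs.
PAGE_URL = "http://yuilibrary.com/downloads/?show=yui2"

# Rows as they might look after the per-table "prepend product name" step.
# Values come from the YQL result above; the "YUI 2" label is hypothetical.
yui2_rows = [
    {"title": "YUI 2 – Version 2.6.0", "href": "yui2/yui_2.6.0.zip", "date": "10/01/2008"},
    {"title": "YUI 2 – Version 2.7.0b", "href": "yui2/yui_2.7.0b.zip", "date": "02/19/2009"},
]
other_rows = []  # e.g. the YUI 3 or YUI Builder tables, processed the same way


def build_feed(*tables):
    # Union: concatenate the per-table results into one list.
    rows = [row for table in tables for row in table]

    # Item Builder: keep only the fields the feed needs.
    items = []
    for row in rows:
        date = datetime.strptime(row["date"], "%m/%d/%Y").replace(tzinfo=timezone.utc)
        items.append({
            "title": row["title"],
            "link": urljoin(PAGE_URL, row["href"]),  # add the missing domain
            "pubDate": date,
        })

    # Sort: newest release first.
    items.sort(key=lambda item: item["pubDate"], reverse=True)

    # Output: a bare-bones RSS 2.0 document (channel text is made up).
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "YUI Downloads"
    ET.SubElement(channel, "link").text = PAGE_URL
    ET.SubElement(channel, "description").text = "New YUI releases"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "pubDate").text = format_datetime(item["pubDate"])
    return ET.tostring(rss, encoding="unicode")


print(build_feed(yui2_rows, other_rows))
```

Building fresh item dictionaries, rather than passing the scraped rows through, mirrors the Item Builder choice above: only title, link and pubDate end up in the feed.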

In the end, a YQL/Pipes team member gave Dav what he needed a lot faster than I could, but hey, I learned something in the process. This technique comes in pretty handy when a page doesn’t have a feed and you want to track changes to it. So now that you know how to do it, go out there and rip the web apart!

Update: I’m proud to say that the feed I created has been copied and is now the feed for the YUI Downloads page.

New Site Design

Monday, April 13th, 2009

If you read this blog in a feed reader you won’t have noticed, but I’ve completely redesigned my site. Two things motivated the redesign: I was tired of the green look, and I really wanted to simplify the markup.

The result is what you see. Suffice it to say, it’s a departure from the previous look. What do you think?

The Future Of Social Media Is Geolocation

Wednesday, April 8th, 2009

Why is Facebook so popular? Is it because people love Ajax or the colour blue? No, Facebook is popular because it serves a real-world social need: it connects people. Twitter is popular for the same reason: it facilitates communication. These are the pillars of the social web, connection and communication. People are doing online what they have been doing in the real world since the dawn of humankind: interacting with each other.

So now that people can connect with each other, broadcast their thoughts in 140 characters or less, and share video of themselves doing the Funky Chicken, what’s next?

Geolocation. If you think of computers, and by extension the web, as a means to an end rather than the end itself, then the next step in the evolution of the social media phenomenon is to translate the connections in the digital world into connections in the physical world. This is where geolocation comes into play.

For example, I’m certain that soon all cameras will tag images with GPS coordinates, so that once the images are shared, people will know where they were taken. This will help people find pictures of known locations more easily and better relate to the experience the photographer is sharing. Another example of connecting people in the real world is Nine Inch Nails’ upcoming iPhone app, which uses the iPhone’s GPS to connect fans in many different ways (watch Kevin Rose explain it about halfway through this Wired.com video).

The point, as I mentioned earlier, is that social media is a means to an end and not the end itself. If you want to be ahead of the curve, focus on developing tools that bring people closer together. Build tools that help people discover and connect in new and different ways. Build tools that help them translate those connections into the real world we all live in. Because in the end, that’s what social media is really all about.

A Real-World Example Of Experiential Perspective

Wednesday, April 8th, 2009

I was watching my four-and-a-half-year-old daughter play Recipe Rhumba! on the Kids’ CBC web site with a banana in one hand and her other hand on the laptop’s trackpad. The game requires you to drag and drop recipe items (see the image below), which I figured she wouldn’t be able to do with just one hand. Yet as I watched, she clicked an item once, which picked it up, then slid her finger over to the tray, and as soon as she was over it, the item dropped automatically. I had simply assumed that drag and drop requires holding the mouse button down for the entire drag. But that was me taking things for granted. Kids may not be able to keep the button pressed through a whole drag, so designing for them calls for a different interaction, which is exactly what Kids’ CBC built.
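The click-to-carry interaction described above can be modeled as a tiny state machine. This is a hypothetical sketch of the behaviour as I observed it, not Kids’ CBC’s actual code; the class name, method names and the “egg” item are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClickToCarry:
    # Illustrative model only; names are hypothetical.
    held: Optional[str] = None                     # item currently stuck to the pointer
    tray: List[str] = field(default_factory=list)  # items dropped on the tray

    def click(self, item: str) -> None:
        """A single click picks the item up; no button needs to stay pressed."""
        if self.held is None:
            self.held = item

    def move(self, over_tray: bool) -> None:
        """Pointer movement; reaching the tray drops the held item automatically."""
        if self.held is not None and over_tray:
            self.tray.append(self.held)
            self.held = None


game = ClickToCarry()
game.click("egg")            # picked up with one click
game.move(over_tray=False)   # still carrying it
game.move(over_tray=True)    # dropped without a release or a second click
print(game.tray)  # ['egg']
```

The key design difference from classic drag and drop is that “carrying” is a state the game remembers, rather than something the player has to maintain by holding the button down.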

So why am I telling you all this? Because without actually watching a kid play the game, I would never have considered how something as crucial as drag and drop applies to children. I would simply have taken it for granted and delivered a flawed product. This is exactly the sort of thing I was talking about in my post on Experiential Perspective: until you actually get into what you’re building and interact with it like a real user, you’re going to miss key nuances, and what you deliver will end up sub-par.

Screenshot of Recipe Rhumba, a game on the Kids' CBC web site.

Making Your Tweets Retweet Friendly

Monday, April 6th, 2009

I can’t count how often I’ve wanted to retweet someone’s tweet, only to find that after adding “RT @username” it no longer fits in 140 characters. If I’m adamant about retweeting it, I might try to pare it down a little, but now I’m wasting time and energy. I’m sure that if anyone collected stats on retweet drop-off, this is where people would be dropping off en masse.

The way to make your tweets “retweet friendly”, of course, is to subtract the number of characters in “RT @yourusername” from the 140-character limit and keep your tweets within what’s left. It’s simple math, but if you’re lazy like me, you can use Retweetulator, a quick script I whipped up just for lazy folk like myself. Enjoy!
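Retweetulator itself isn’t reproduced here, but the arithmetic it saves you is simple enough to sketch. The separator space after the username is an assumption (the post only counts “RT @yourusername”), and “example” is a made-up username.

```python
TWEET_LIMIT = 140  # Twitter's character limit at the time of writing


def retweet_friendly_length(username: str, separator: str = " ") -> int:
    """Longest tweet that still fits after someone prepends 'RT @username'.

    The separator between the prefix and the original text is an assumption.
    """
    prefix = f"RT @{username}{separator}"
    return TWEET_LIMIT - len(prefix)


print(retweet_friendly_length("example"))  # 128
```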