Posts Tagged ‘yahoo’

Twitter FTW: or how after 10,543 tweets Twitter actually came in handy

Tuesday, February 16th, 2010

[Image: Twitter bird by Frank Chimero]

For those who may not know, I work for Yahoo! remotely. As such, my development box is 4,795 km (2,980 miles) to the west of me. This isn’t normally a problem, but today my box decided to stop responding. And in case you don’t believe in Murphy’s Law, today was also Presidents Day in the U.S., which meant there was nobody in the office to hit the reset switch for me. Enter Twitter. After keeping myself busy with non-dev-box related work, I finally decided to send out a tweet asking for help. I wasn’t holding my breath. But then David Calhoun came to my rescue with a tweet of his own. Let it be known that I’ve never even met him face to face, but this selfless, shining example of a human being went to my box and turned it on (turns out someone had turned it off).

Now, if that isn’t an awesome use of Twitter, then I don’t know what is. Thanks David, thanks Twitter, you made my day.

YUICONF 2009, Day 1

Thursday, October 29th, 2009

What a week! As a new yahoo, I attended orientation on Monday and Tuesday, and now I’m enjoying the first annual YUICONF. Late last night, Christian Heilmann asked if I could write up my impressions of the conference’s first day; the post just went up on the YDN blog. Happy reading!

What I didn’t mention in the blog post is that so far, this week has been close to surreal. Forget that I’ve been on the Yahoo! HQ campus as a fully carded yahoo for three days. That in itself is pinch-me worthy. But it’s been fanboy heaven. I’ve met Tom Croucher, Isaac Schlueter, Adam Moore, Matt Sweeney, Lucas Smith, Stoyan Stefanov, Satyen Desai and Philip Tellis, and I shook Doug Crockford’s hand! Yes, I’m a shameless fanboy, and I don’t care! :-)

Screen Scraping and Creating a Feed with YQL and Yahoo! Pipes

Thursday, June 25th, 2009

Dav Glass recently asked if anyone could build him an RSS feed of the YUI download page using YQL and Pipes. I was somewhat confident in my Pipes and YQL skills, so I decided to take a crack at it. The first thing I did was scrape the page’s contents using YQL.

Scraping with YQL

The YQL Console allows access to several Data Tables, among which is a whole set of tables under the category “data” that aren’t tables at all. They’re more like APIs allowing you to connect to some pretty powerful fetchers and parsers. I selected the one named “html”, which lets you treat any HTML document on the web as a data source and parse it using XPath.

Here’s the query I fed it:

select * from html where url="http://yuilibrary.com/downloads/?show=yui2" and xpath='//table[thead/tr/th/h2/@id="yui2"]/tbody//tr[@class="even" or @class="odd"]'

which returns the following fragment:

<tr class="even">
    <td><a title="Version 2.7.0b" href="yui2/yui_2.7.0b.zip">Version 2.7.0b</a></td>
    <td>02/19/2009</td>
    <td><em class="md5">90778a161ce9108a23a590e5198b8116</em></td>
</tr>
<tr class="odd">
    <td><a title="Version 2.6.0" href="yui2/yui_2.6.0.zip">Version 2.6.0</a></td>
    <td>10/01/2008</td>
    <td><em class="md5">41bed4b882c9148cebff5dd1a0dd8727</em></td>
</tr>
<tr class="even">
    <td><a title="Version 2.5.2" href="yui2/yui_2.5.2.zip">Version 2.5.2</a></td>
    <td>05/28/2008</td>
    <td><em class="md5">eaadfcbcb651c50092bb679266aa3c20</em></td>
</tr>
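
To make the shape of that fragment concrete, here is a minimal Python sketch that pulls the link text, href, date and MD5 out of each row. It is not part of the original pipe: the function name parse_rows and the dictionary keys are my own, and it only works because this particular fragment happens to be well-formed XML once wrapped in a root element.

```python
import xml.etree.ElementTree as ET

# Two of the rows returned by the YQL query, copied from the fragment above.
FRAGMENT = """
<tr class="even">
    <td><a title="Version 2.7.0b" href="yui2/yui_2.7.0b.zip">Version 2.7.0b</a></td>
    <td>02/19/2009</td>
    <td><em class="md5">90778a161ce9108a23a590e5198b8116</em></td>
</tr>
<tr class="odd">
    <td><a title="Version 2.6.0" href="yui2/yui_2.6.0.zip">Version 2.6.0</a></td>
    <td>10/01/2008</td>
    <td><em class="md5">41bed4b882c9148cebff5dd1a0dd8727</em></td>
</tr>
"""


def parse_rows(fragment):
    """Turn the <tr> rows into plain dictionaries (hypothetical helper)."""
    # Wrap the rows in a single root element so the parser accepts them.
    root = ET.fromstring("<tbody>" + fragment + "</tbody>")
    rows = []
    for tr in root.findall("tr"):
        # Mirror the XPath filter: keep only rows classed "even" or "odd".
        if tr.get("class") not in ("even", "odd"):
            continue
        cells = tr.findall("td")
        link = cells[0].find("a")
        rows.append({
            "title": link.text,
            "href": link.get("href"),
            "date": cells[1].text,
            "md5": cells[2].find("em").text,
        })
    return rows


rows = parse_rows(FRAGMENT)
for row in rows:
    print(row["title"], row["href"], row["date"])
```

Note that the href values are relative and the link text carries no product name, which is exactly what the Pipes steps below have to fix.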

Creating a Feed with Pipes

You may have noticed that I already had a problem. The link text for each of the items was not descriptive enough. If I used them as-is, people subscribing to the feed would see stuff like “1.0.0b1” and “3.0.0 Beta 1”, which would just be confusing. Instead I wanted them to look like this: “YUI Builder – 1.0.0b1” and “YUI 3 – 3.0.0 Beta 1”. That’s why my query only targets one table at a time by its id. That way I can isolate each table and prefix each link text with the correct product’s name.
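
The one-query-per-table idea can be sketched in Python as a small function that fills the table id into the query. Only the "yui2" id comes from the query shown earlier; the "yui3" and "builder" ids, and the assumption that the show parameter always matches the table id, are guesses for illustration.

```python
PAGE = "http://yuilibrary.com/downloads/"

# Table id -> prefix for the link text.
# Only "yui2" appears in the original query; the other ids are hypothetical.
PRODUCTS = {
    "yui2": "YUI 2",
    "yui3": "YUI 3",
    "builder": "YUI Builder",
}


def build_query(table_id):
    """Build the YQL statement that selects the rows of one download table."""
    url = f"{PAGE}?show={table_id}"
    xpath = (
        f'//table[thead/tr/th/h2/@id="{table_id}"]'
        '/tbody//tr[@class="even" or @class="odd"]'
    )
    return f"select * from html where url=\"{url}\" and xpath='{xpath}'"


# One query per product, each paired with the prefix its rows should get.
for table_id, prefix in PRODUCTS.items():
    print(prefix, "->", build_query(table_id))
```

Keeping the queries separate is what makes the prefixing possible: once the rows are merged, nothing in a row says which product it belongs to.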

Here’s what doing that looked like:

[Screen capture: the pipe in the Yahoo! Pipes editor]

You’ll note that the first thing I did after fetching the rows was to prepend the link text. Here’s where things got tricky. On my first attempt to do this, back when Dav first asked for it, I got stuck here. I wasn’t able to target the first td of the three contained in each row. Thanks to Nagesh Susarla of the YQL/Pipes team, however, I was able to target the td I wanted by including its index number in the field targeting string, like so: item.td.0.a.content. By including the 0 in there, I’m telling YQL that I want the first td in the set.
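
For anyone who finds the dotted path opaque, here is a rough Python sketch of how a string like item.td.0.a.content can be resolved against a row. This is not how Pipes implements it. The resolve function is hypothetical, and the shape of the row dictionary (including the "p" key for the date) is my assumption about what the parsed row looks like.

```python
def resolve(data, path):
    """Walk a dotted path; numeric parts index into lists (hypothetical helper)."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            # A part such as "0" selects an element of a repeated field.
            current = current[int(part)]
        else:
            current = current[part]
    return current


# Assumed shape of one row: the three <td> cells become a list under "td".
row = {
    "class": "even",
    "td": [
        {"a": {"title": "Version 2.7.0b",
               "href": "yui2/yui_2.7.0b.zip",
               "content": "Version 2.7.0b"}},
        {"p": "02/19/2009"},
        {"em": {"class": "md5",
                "content": "90778a161ce9108a23a590e5198b8116"}},
    ],
}

link_text = resolve({"item": row}, "item.td.0.a.content")
print(link_text)
```

Without the 0, the path stops at a list of three cells, and there is no single "a" field to descend into.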

After prepending the link text with the product name, I use a union operator to put the contents of all the queries together, since now I don’t need them apart anymore. Then it’s on to putting the yuilibrary.com domain name in front of the href attribute values since they weren’t there. Then I clean everything up by looping over my rows and creating clean items out of them using an Item Builder. (I could have just renamed or copied the fields I wanted and stayed with the rows as-is, but then I’d be delivering a lot of unnecessary junk in my feed.) Finally, I run the whole thing through a sort operator on the pubDate field (not in the screen capture) and output the result in a feed.
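
The same sequence of steps (prefix, union, absolute links, clean items, sort) can be approximated in a few lines of Python. This is only a sketch of the idea, not the pipe itself: the BASE URL, the function name build_items, the " - " separator and the newest-first sort order are all assumptions on my part.

```python
from datetime import datetime
from urllib.parse import urljoin

# Assumed base for the relative hrefs; the pipe prepends the yuilibrary.com domain.
BASE = "http://yuilibrary.com/downloads/"


def build_items(rows_by_product):
    """Prefix titles, merge all products, absolutize links, sort by date."""
    items = []
    for product, rows in rows_by_product.items():  # the "union" step
        for row in rows:
            # Build a clean item with only the fields the feed needs.
            items.append({
                "title": f"{product} - {row['title']}",
                "link": urljoin(BASE, row["href"]),
                "pubDate": datetime.strptime(row["date"], "%m/%d/%Y"),
            })
    # Assumption: newest release first.
    items.sort(key=lambda item: item["pubDate"], reverse=True)
    return items


items = build_items({
    "YUI 2": [
        {"title": "Version 2.6.0", "href": "yui2/yui_2.6.0.zip", "date": "10/01/2008"},
        {"title": "Version 2.7.0b", "href": "yui2/yui_2.7.0b.zip", "date": "02/19/2009"},
    ],
})
for item in items:
    print(item["pubDate"].date(), item["title"], item["link"])
```

Building fresh items, rather than renaming fields on the scraped rows, plays the same role as the Item Builder: nothing from the original markup leaks into the feed.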

In the end, a YQL/Pipes team member gave Dav what he needed a lot faster than I could, but hey, I learned something in the process. This technique comes in pretty handy when a page doesn’t have a feed and you want to track changes on it. So now that you know how to do it, go out there and rip the web apart!

Update: I’m proud to say that the feed I created has been copied and is now the feed for the YUI Downloads page.