A little Groovy script to parse your LinkedIn contacts

I intend to improve this a bit and turn it into a web crawler at some point. For now, I took my LinkedIn contacts page and analysed the HTML markup using Firefox and the Firebug plug-in.

LinkedIn Contacts analysed in Firefox Firebug

If you look at the UL node with the class “conx-list” (the third line from the bottom of the image above), this is the node where all the good stuff resides.
In the next screenshot you will see a “letter divider” LI node, followed by a series of LI nodes, all with numeric ids. Those are the nodes for the individual contacts.

First contact. It's the LI node with a numeric id attribute after the letter divider
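
The structure described above can be tried out on a small hand-written fragment. This is only a sketch: the sample markup, the `letter-divider` class name, the `span` holding the name, and the ids are all made up for illustration, and the fragment is well-formed so that plain `XmlParser` can read it without any extra JARs.

```groovy
// On Groovy 2.x and earlier XmlParser lives in groovy.util and needs no import.
import groovy.xml.XmlParser

// Hypothetical, simplified stand-in for the saved contacts markup.
def sample = '''
<ul class="conx-list">
  <li class="letter-divider">A</li>
  <li id="1001"><span>Alice Adams</span></li>
  <li id="1002"><span>Arun Anand</span></li>
  <li class="letter-divider">B</li>
  <li id="2001"><span>Bob Brown</span></li>
</ul>
'''

// The root node returned here is the UL itself.
def conxList = new XmlParser().parseText(sample)

// Keep only the LI children whose id attribute is entirely numeric.
// The letter dividers carry no id, so they drop out.
def contacts = conxList.li.findAll { (it.@id ?: '') ==~ /\d+/ }

contacts.each { println "${it.@id}: ${it.span.text()}" }
```

The numeric-id test is what separates real contacts from the letter dividers, so nothing here depends on how the divider nodes are named.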

I added the NekoHTML and latest XercesImpl JARs to the classpath in the GroovyConsole. You can do this from the Script Menu.
I saved the page to my desktop as a complete web page so that I didn’t have to contend with automating a login form, then used Groovy’s excellent XmlParser to produce an XML extract of the pertinent data.
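
As a rough illustration of the XML extract step, here is a minimal sketch that writes a few contact records out as XML. It is not the script from this post: it uses `MarkupBuilder` for the output, and the record fields and the `contacts`, `contact` and `name` element names are assumptions chosen for the example.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical records, standing in for whatever was pulled out of the LI nodes.
def records = [
    [id: '1001', name: 'Alice Adams'],
    [id: '2001', name: 'Bob Brown'],
]

def writer = new StringWriter()
def xml = new MarkupBuilder(writer)

// One <contact> element per record, with the id as an attribute
// and the name as a child element.
xml.contacts {
    records.each { r ->
        contact(id: r.id) {
            name(r.name)
        }
    }
}

def extract = writer.toString()
println extract
```

For the real saved page, which is unlikely to be well-formed XML, the parser would be built on top of NekoHTML with `new XmlParser(new org.cyberneko.html.parsers.SAXParser())`. NekoHTML reports element names in upper case by default, so lookups need `UL` and `LI` rather than `ul` and `li`.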

Groovy LinkedIn contact parsing Script

Some other links that helped me were:

Groovy Recipes

Groovy in Action
