Groovy script showing LinkedIn Groups of my peers ranked by number of members then name

Before I continue with the Mashup and tweak the crawler some more to get latitudes and longitudes for locations, I thought it would be fun to see which LinkedIn groups are most popular among my peers (level one connections). They are predominantly Grails folks… So I came up with the following script (xmlgroups.groovy):

package jgf
// - LinkedIn UK 1st Million Members --- This group doesn't show up
def filesInError = []
def groups = processXML(filesInError)
showResultsInConsole(groups, filesInError)
return null  
def processXML(filesInError) {
  def XMLdir = new File("/Users/JGF/Desktop/LinkedInXML/")
  def groups = []
  def groupNamesKeyList = []
  XMLdir.eachFile{file ->
    if (file.isFile()) {
      def fn = file.getName()
      if (fn != '.DS_Store') {
        def idpos = fn.lastIndexOf('-')
        def id = fn[idpos + 1..-5]
        //if (id == '3161310') {
        def contactXML = file.text
        try {
          def contact = new XmlParser().parseText(contactXML)
            def currGroup = [name:, imgUrl: it.'img-url'.text(), url:, shared: it.shared.text(), count: 1]
            currGroup = updateGroupCount(groups, groupNamesKeyList, currGroup) 
        } catch (Exception e) {
          filesInError << fn
        }          // }
      }
    }
  }
  // Sort groups by count descending, then name alphabetically
  groups.sort{a, b -> if (a.count == b.count) a.name <=> b.name else b.count <=> a.count}
  return groups
}
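def updateGroupCount(groups, groupNamesKeyList, currGroup) {
  // Look the group up by name: add it on first sight, otherwise bump its count
  def i = groupNamesKeyList.indexOf(currGroup.name)
  if (i == -1) {
    groupNamesKeyList << currGroup.name
    groups << currGroup
  } else {
    currGroup = groups[i]
    currGroup.count += 1
  }
  return currGroup
}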
def showResultsInConsole(groups, filesInError) {
  println '--- Files in Error ---'
  filesInError.each{file -> println file}
  println '--- Groups ---'
  groups.eachWithIndex{group, i ->
    println "$group.count $group.name $group.url $group.shared"
    // Wait for a keystroke after every 75 rows
    if (i > 0 && i.mod(75) == 0) {
      println new jline.UnixTerminal().readCharacter(System.in)
    }
  }
  println '--- total Groups ---'
  println groups.size()
  return null
}
def processTemplateAndCreateHTMLPage(groups) {
  // HTML template (single-quoted so ${...} is left for the template engine)
  def groupHTMLTemplate = '''
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=${encoding}" />
  <style type="text/css">
  .centred {
    text-align: center;
  }
  .shared {
    color: red;
  }
  .unshared {
    color: black;
  }
  </style>
  </head>
  <body>
     <h1 class='centred'>${title}</h1>
     <table width="100%" cellpadding="2" cellspacing="2">
       <tr>
       <% groups.eachWithIndex{group, i -> 
         def shared = (group.shared == 'true')
         def gsh
         def sharedAt
         if  (shared) {
            gsh = ' (Shared)'
            sharedAt = 'shared'
         } else {
            gsh = ''
            sharedAt = 'unshared'
         }
         def name = "${group.name}${gsh}"
         def url = "${group.url}"
         // Start a new table row after every 25 groups
         def newRow = (i.mod(25) == 0 && i != 0)
       %>
     <% if (newRow) { %>
       </tr>
       <tr>
     <% } %>
         <td>
          <a href='${url}' target='_blank'>
            <img src='${group.imgUrl}' width='60' height='30' alt='${name}' title='${name}'/>
          </a>
          <p class='${sharedAt}'>$group.count</p>
         </td>
     <% } %>
       </tr>
     </table>
  </body>
</html>
'''
  def engine   = new groovy.text.SimpleTemplateEngine()
  def template = engine.createTemplate(groupHTMLTemplate)
  def encoding = 'UTF-8'
  def title    = 'LinkedIn Groups of my peers ranked by number of members then name'
  def binding  = [encoding: encoding, title: title, groups: groups]
  def html     = template.make(binding).toString()
  def outfile  = new File("/Users/JGF/Desktop/groups.html")
  outfile.write(html, encoding)
  return null
}
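
The ranking itself (count descending, then name alphabetically) can also be sketched on its own. This is only an illustration, not what the script above does: it tallies with a Map via `countBy` instead of the two parallel lists, and the `sightings` list is made-up sample data.

```groovy
// Hypothetical sample data: one entry per group name seen on a contact
def sightings = ['Grails', 'Groovy Users', 'Grails', 'Java Posse',
                 'Groovy Users', 'Grails', 'Ant']

// Tally occurrences per group name (Map of name -> count)
def counts = sightings.countBy { it }

// Turn the Map into a list of [name, count] entries
def entries = counts.collect { name, count -> [name: name, count: count] }

// Rank by count descending; on a tie (comparison yields 0, which is falsy)
// fall back to the name in ascending order
def ranked = entries.sort(false) { a, b -> b.count <=> a.count ?: a.name <=> b.name }

ranked.each { println "${it.count} ${it.name}" }
```

The Map avoids the `indexOf` lookup on every group, which may matter once there are a couple of thousand distinct groups.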

showResultsInConsole routine explained:
When I wrote this I wanted to dump the results out 75 rows at a time, so I could see all of them. The Groovy Console output only remembers so many lines, so I wanted to control advancing through the results with any keystroke.
To begin with I tried following this thread on Nabble. But I couldn’t see how to keep the reader open and avoid getting an IOException all the time if I used {}. Even if I wrapped the whole script in this, the pesky Exception kept getting thrown.
So I headed back to here again and made use of the jline JAR.
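
Here is a minimal sketch of the paging idea, with the keystroke wait pulled out into a closure so it runs without the jline JAR. The names `printPaged` and `pause` are mine, not from the script, and unlike the script it pauses before printing the first row of each new page rather than after it.

```groovy
// Print items one per line, calling pause() between pages of pageSize rows.
// pause is a hypothetical hook: in a real terminal it could block on a
// keystroke (for example via jline); by default it does nothing.
def printPaged(List items, int pageSize, Closure pause = {}) {
  int pauses = 0
  items.eachWithIndex { item, i ->
    // Pause before the first row of every page except the first
    if (i > 0 && i % pageSize == 0) {
      pause()
      pauses++
    }
    println item
  }
  return pauses   // number of times we stopped to wait
}

// Ten rows in pages of four; the "wait" here just prints a marker
def pauses = printPaged((1..10).toList(), 4) { println '-- press any key --' }
```

Swapping the marker closure for a blocking read is all it takes to make the output wait for real.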

Results of running xmlgroups.groovy interactively, paging 75 results at a time

Here’s the final tally of Groups amongst my peers analysed.

total groups = 2076 after running showResultsInConsole

processTemplateAndCreateHTMLPage explained:
Groovy in Action (GinA) was a godsend here again; page 311 gave me the inspiration.
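
For anyone who hasn’t used it, this is roughly how `SimpleTemplateEngine` works, as a cut-down sketch rather than the page template above. The `<ul>` markup and the two sample groups are made up for illustration.

```groovy
import groovy.text.SimpleTemplateEngine

// Single-quoted string, so Groovy leaves ${...} alone for the template engine.
// <% ... %> holds plain Groovy code; ${...} is replaced by the expression value.
def templateText = '''<ul>
<% groups.each { g -> %><li>${g.name} (${g.count})</li>
<% } %></ul>'''

// Hypothetical sample data standing in for the real groups list
def groups = [
  [name: 'Grails', count: 3],
  [name: 'Ant',    count: 1]
]

def engine   = new SimpleTemplateEngine()
def template = engine.createTemplate(templateText)

// The binding map supplies the variables the template refers to
def html = template.make([groups: groups]).toString()

println html
```

The binding keys are the only names the template can see, which is why the script passes `encoding`, `title` and `groups` explicitly.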

If you want to get a copy of this script, you can click on the two part image of the Groovy script and copy/paste from a PDF version.
Finally, here are the top 275 Groups from my peers, so you can see what everyone was most interested in.

LinkedIn Groups of my peers ranked by number of members then name

If you would like the HTML page itself, send me an email; to generate it yourself you would need the XML files already extracted in my earlier post about crawl.groovy.
Unfortunately, WordPress doesn’t allow you to upload HTML pages as attachments. I tried going via Word -> PDF, but Word couldn’t handle it. I then tried saving from Firefox and loading that into Word, but not all of the images were displayed correctly. I have since tweaked the anchor to open in a new page with target blank and re-assigned some of my Groups.

