Groovy Script to parse Yahoo contacts in CSV format and export to XML

Not all the contacts I’ve discovered whilst listening to the Grails podcast have become LinkedIn contacts, so I’ve also maintained a list of them in Yahoo’s web-based contacts.
As a precursor to consolidating all my contacts into a Grails web application domain model, it’s time to export them from Yahoo. I initially wrote a script to export CSV into an XML format, to demonstrate how to process CSV files.

Here’s the first option you take:

Followed by taking the Yahoo CSV option:

Unfortunately, Yahoo didn’t have the foresight to write the file using UTF-8 encoding, so I’ve got to do a bit of manual tweaking to the data later. I’ve put in a suggestion to Yahoo.
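
Some of that tweaking could be avoided by decoding the file with an explicit charset and re-encoding it as UTF-8. The sketch below assumes the export is ISO-8859-1, which is only a guess, since the actual encoding isn't stated here. The temporary files and sample values are made up for illustration.

```groovy
// Assumption: the Yahoo export is ISO-8859-1; the real encoding is not known.
def sourceCharset = 'ISO-8859-1'

// Stand-in for the downloaded file, written in the assumed source encoding.
def exported = File.createTempFile('yahoo_ab', '.csv')
exported.deleteOnExit()
exported.setText('"Jos\u00e9","","N\u00fa\u00f1ez"\n', sourceCharset)

// Decode each line with the source charset instead of the platform default.
def lines = []
exported.eachLine(sourceCharset) { line -> lines << line }

// Re-encode as UTF-8 so later steps can rely on a single encoding.
def converted = File.createTempFile('yahoo_ab_utf8', '.csv')
converted.deleteOnExit()
converted.setText(lines.join('\n'), 'UTF-8')
```

If the guess at the source charset is wrong, accented characters will still come out garbled, so it is worth checking a few known names after conversion.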

When Yahoo exports the CSV file, you get a header row as the first record, which you obviously wouldn’t include in your results, but it’s a good sanity check to make sure the mapping process goes smoothly.
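
One way to use the header row as a sanity check and then leave it out of the results is sketched below. The records and header names are hypothetical and cut down to three columns, whereas the script later in this post expects 55 fields per record.

```groovy
// Hypothetical parsed records; the first one is the header row.
def records = [
  ['First', 'Middle', 'Last'],   // made-up header names
  ['Jane', '', 'Doe'],
  ['John', 'Q', 'Smith'],
]
def expectedColumns = 3  // illustrative only; the real export has 55 columns

// Sanity check: the header must have the number of columns we intend to map.
def header = records.head()
if (header.size() != expectedColumns) {
  throw new IllegalStateException(
    "Expected ${expectedColumns} columns but found ${header.size()}")
}

// Everything after the header is real contact data.
def contacts = records.tail()
println "Mapped ${contacts.size()} contacts"
```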

Here is the raw CSV file opened in TextMate:

Here’s a sample record in its converted XML format:
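
As a rough illustration of the shape of one converted record, this small MarkupBuilder sketch writes a single entry to a string. The element names follow the script in this post, but the values are invented and only a handful of the 55 fields are shown.

```groovy
import groovy.xml.MarkupBuilder

// Build the XML in memory rather than writing to a file.
def out = new StringWriter()
def xml = new MarkupBuilder(out)

xml.addressBook {
  entry(id: 1) {
    first('Jane')                      // made-up sample values
    last('Doe')
    email('jane@example.com')
    comments('Met at a conference')
  }
}

def result = out.toString()
println result
```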

Here’s the script (YahooContacts3.groovy) to do the processing of the CSV file:

def fileIn = new File('/Users/JGF/Downloads/yahoo_ab.csv')
def pw  = new PrintWriter('/Users/JGF/Desktop/YahooContacts.xml', 'UTF-8')
def xml = new groovy.xml.MarkupBuilder(pw)
def id = 0
def token = ''
def fd
def fd2
def fields = []
def nl = System.getProperty("line.separator") // Newline character
xml.addressBook{
  fileIn.eachLine{ line ->
    use(SmartCsvParser) {
      fd = line.smartSplit(token, nl)
    }
    token = fd.token
    fd2 = fd.list
    fd2.each{ fields << it}
    if (fields.size() == 55) {
      //println "${fields[0]} ${fields[2]} ${fields[54]}"
      id += 1
      entry(id:id ) {
        first(fields[0])
        middle(fields[1])
        last(fields[2])
        nickname(fields[3])
        email(fields[4])
        category(fields[5])
        distlist(fields[6])
        yim(fields[7])
        hometel(fields[8])
        worktel(fields[9])
        pager(fields[10])
        fax(fields[11])
        mobile(fields[12])
        othertel(fields[13])
        ytel(fields[14])
        primaryem(fields[15])
        altemail1(fields[16])
        altemail2(fields[17])
        personalweb(fields[18])
        businessweb(fields[19])
        title(fields[20])
        company(fields[21])
        workad(fields[22])
        workcity(fields[23])
        workcounty(fields[24])
        workzip(fields[25])
        workcountry(fields[26])
        homead(fields[27])
        homecity(fields[28])
        homecounty(fields[29])
        homezip(fields[30])
        homecountry(fields[31])
        birthday(fields[32])
        anniv(fields[33])
        cust1(fields[34])
        cust2(fields[35])
        cust3(fields[36])
        cust4(fields[37])
        comments(fields[38])
        msid1(fields[39])
        msid2(fields[40])
        msid3(fields[41])
        msid4(fields[42])
        msid5(fields[43])
        msid6(fields[44])
        msid7(fields[45])
        msid8(fields[46])
        msid9(fields[47])
        skype(fields[48])
        ircid(fields[49])
        irqid(fields[50])
        googleid(fields[51])
        msnid(fields[52])
        aimid(fields[53])
        qqid(fields[54])
      }
      fields = []
    }
  }
}
pw.close() // close the writer so all output is flushed and the file handle is released
return null

class SmartCsvParser{
  static def smartSplit(String line, String midToken, String nl){
    def list = []
    def thisToken
    def st = new StringTokenizer(line, ",")
    while (st.hasMoreTokens()) {
      if (midToken) {
        thisToken = midToken
        midToken = ''
      } else {
        thisToken = st.nextToken()
      }
      while ( thisToken.startsWith("\"") && !thisToken.endsWith("\"") && st.hasMoreTokens() ) {
        thisToken += "," + st.nextToken()
      }
      if (thisToken.startsWith("\"") && !thisToken.endsWith("\""))
        midToken = thisToken.replaceAll(',', nl)
      else
        list << thisToken.noQuote()
    }
    def res = [:]
    res.list = list
    res.token = midToken
    return res
  }

  static String noQuote(String token) {
    if(token.startsWith("\"")) {
      if (token.size() == 2)
        return ''
      else
        return token[1..-2]
    } else
      return token
  }
}

Here’s a sample script showing how to parse the resulting XML output with XmlParser:
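
A minimal sketch of that kind of script follows. It assumes the element names written by the export script (addressBook, entry, first, last, email) and parses a made-up inline XML string rather than the real output file.

```groovy
// Groovy 3 and later; on older versions drop this import,
// because groovy.util.XmlParser is available by default.
import groovy.xml.XmlParser

// Hypothetical sample in the same shape as the exported file.
def sample = '''<addressBook>
  <entry id='1'>
    <first>Jane</first>
    <last>Doe</last>
    <email>jane@example.com</email>
  </entry>
  <entry id='2'>
    <first>John</first>
    <last>Smith</last>
    <email></email>
  </entry>
</addressBook>'''

def addressBook = new XmlParser().parseText(sample)

// Walk every entry node and pull out its id attribute and a few child elements.
def summaries = addressBook.entry.collect { e ->
  "${e.'@id'}: ${e.first.text()} ${e.last.text()} <${e.email.text()}>".toString()
}
summaries.each { println it }
```

To read the real file instead, `new XmlParser().parse(new File(...))` can take the place of `parseText`.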

The script is based loosely on a recipe in Scott Davis’s Groovy Recipes (around page 141), but as you will see, I had to make the CSV parser a bit more robust.
The raw CSV export places newline characters partway through a quoted field, so a single field (or node, in XML terminology) can span several lines of the file. To cope with this, when the closing quote isn’t found on a line I preserve the unfinished text so that the next line can be appended to it. Under those circumstances I basically replace commas with newline characters. You can see this in the comments node of the example given.
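
A different way to handle quoted fields that span lines is to scan the text one character at a time and track whether the scanner is inside quotes, which keeps embedded commas and newlines exactly as they were. This is only a sketch of that alternative, not what the script above does. The `parseCsv` name and the sample data are hypothetical.

```groovy
// Splits CSV text into records. Quoted fields may contain commas,
// newlines and doubled quotes (""), all of which are preserved.
def parseCsv(String text) {
  def records = []
  def record = []
  def field = new StringBuilder()
  def inQuotes = false
  for (int i = 0; i < text.length(); i++) {
    String c = text[i]
    if (inQuotes) {
      if (c == '"') {
        if (i + 1 < text.length() && text[i + 1] == '"') {
          field << '"'        // doubled quote means a literal quote
          i++
        } else {
          inQuotes = false    // closing quote
        }
      } else {
        field << c            // commas and newlines are kept verbatim
      }
    } else if (c == '"') {
      inQuotes = true         // opening quote
    } else if (c == ',') {
      record << field.toString()
      field.setLength(0)
    } else if (c == '\n') {
      record << field.toString()   // end of record
      field.setLength(0)
      records << record
      record = []
    } else if (c != '\r') {
      field << c
    }
  }
  // Flush a final record that has no trailing newline.
  if (field.length() > 0 || record) {
    record << field.toString()
    records << record
  }
  return records
}

def sample = '"Jane","Doe","Met at a conference,\nfollow up in May"\n"John","Smith",""\n'
def rows = parseCsv(sample)
rows.each { println it }
```

The trade-off is that the whole file has to be read as one string rather than line by line, which is fine for an address book but less so for very large files.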
