Using StringBuilder and Regex to do an inline search & replace

One problem with doing string replacements ‘inline’ is that as soon as you replace one piece of text with another of a different length, the matcher gets out of sync with the changed string.

I was basically looking for a way to replace area codes in a string with placeholders.
I adapted this example from Jeffrey Friedl’s Mastering Regular Expressions book (P383).

def t = new StringBuilder('742 : Reserved as a future area code in the 289/905 region (to be overlain by 365 in 2013). [7]')
def p = /\b\d{3}\b/
def m = java.util.regex.Pattern.compile(p).matcher(t)
//def m = (t =~p)
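m.each{println it}  // list every match before replacing anything
def mp = 0  // position to resume searching from
def i  = 0  // placeholder counter
while(m.find(mp)) {  // find(int) resets the matcher, so it sees the edited text
  def iSt = i.toString()
  def ms = m.start()
  def me = m.end()
  t.replace(ms, me, iSt)
  mp = ms + iSt.size()  // resume just after the inserted placeholder
  i += 1
}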
println t
742
289
905
365
0 : Reserved as a future area code in the 1/2 region (to be overlain by 3 in 2013). [7]

I couldn’t get the commented-out Groovier declaration of m to work correctly, so I resorted to the more Java-centric approach to get this working.

I’m guessing Groovy doesn’t re-establish the pattern for the updated StringBuilder.
This is the result if I flip the commenting of m around (lines 3 & 4)!

0 : Reserved as a future area code in the 28105 2ion (to be overlain by 365 in313). [7]
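
One likely explanation, offered as an assumption about Groovy's internals rather than a confirmed diagnosis: the =~ operator appears to convert its left-hand operand to a String before building the matcher, so that matcher searches a snapshot taken before any edits. Pattern.compile(p).matcher(t), by contrast, keeps a reference to the StringBuilder itself, and find(int) resets the matcher on each call. The garbled output above is consistent with offsets from the original text being applied to the edited text. A minimal sketch of the difference (the names sb, snapshot and live are illustrative only):

```groovy
import java.util.regex.Pattern

def sb = new StringBuilder('abc 123 def')
def snapshot = (sb =~ /\d+/)                   // Groovy operator: seems to match against a String copy
def live = Pattern.compile(/\d+/).matcher(sb)  // Java API: holds the StringBuilder itself

sb.replace(0, 3, 'x')                          // sb is now 'x 123 def'

assert snapshot.find()
println snapshot.start()  // 4, the offset in the original text
assert live.find(0)       // find(int) resets the matcher before searching
println live.start()      // 2, the offset in the edited text
```

If that reading is right, the behaviour is not a bug so much as a consequence of the operator working on an immutable copy.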

I’m wondering: is this a Groovy bug? Or is there a Groovier way of doing the compile? Comments welcome.

This also works:

def t = '123 xxx 444 fgg 654'
def p = ~/\b\d{3}\b/
def m = (t =~ p)
def ms, me, ml, p1, p2, r, d, s, f
d = 0
m.eachWithIndex{it, i ->
  ms = m.start()
  me = m.end()
  ml = me - ms // Match length
  s = ms + d  // Adjusted Start
  f = me + d  // Adjusted End (Finish)
  p1 = (s) ? t[0..s-1] : '' // Part 1
  p2 = (f == t.size()) ? '' : t[f..-1] // Part 2
  r = '\$' + (i+1)  // Placeholder Replacement
  d = d - ml + r.size() // Update cumulative adjustment with diff in length
  t = p1 + r + p2 // recombine t with placeholder
}
println t
$1 xxx $2 fgg $3
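
A possibly Groovier alternative, sketched here on the assumption that the GDK's replaceAll(regex, closure) overload is available in your Groovy version, is to let replaceAll drive the matching and supply each replacement from a closure. This is a different technique from the two loops above: no offsets need tracking because the original string is never edited in place. The names text, counter and result are illustrative only.

```groovy
def text = '742 : Reserved as a future area code in the 289/905 region (to be overlain by 365 in 2013). [7]'
def counter = 0

// The pattern has no capture groups, so the closure receives the whole match.
// Each call returns the next numeric placeholder as a String.
def result = text.replaceAll(/\b\d{3}\b/) { match -> (counter++).toString() }

println result
// 0 : Reserved as a future area code in the 1/2 region (to be overlain by 3 in 2013). [7]
```

The trade-off is that this returns a new String rather than modifying a StringBuilder, which only matters if the in-place edit was a requirement.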
