Groovy Regular Expressions to abbreviate compass directions with look ahead and look behind.
I wrote this handy routine to abbreviate compass directions (plus central to C).
The thing was I didn’t want to abbreviate US state names like North Dakota or South Dakota.
I also didn’t want to corrupt place names like Northridge, and I had to deal with all sorts of permutations of compass points.
As well as not matching when a name follows a compass point, I also had the opposite case, where the name comes first: there is a place called George West. So I had my work cut out for me. :-)
Anyway, here’s the routine:
def points = [[k: ~/(N|n)ortheast(ern)?/ , v:'NE'],
[k: ~/(?>(N|n)orth(W|w)est(ern|:)?)(?! Territories)/ , v:'NW'],
[k: ~/(S|s)outheast(ern)?/ , v:'SE'],
[k: ~/(?>(S|s)outh(\s)?west(ern)?)(?! Hill| Bend)/ , v:'SW'],
[k: ~/(?>(N|n)orth(ern)?|Upstate)(?! Carolina| Dakota| Platte| Neck| Mariana Islands| Bay|ridge)/ , v:'N' ],
[k: ~/(E|e)ast(ern)?/ , v:'E' ],
[k: ~/(?>(S|s)outh(ern|side)?)(?! Carolina| Dakota)/ , v:'S' ],
[k: ~/(?!(?<=George ))(?>(W|w)est(ern| of the)?)(?! Virginia| Palm Beach)/ , v:'W' ],
[k: ~/(?>(C|c)entral|Center|Middle|the middle section of the)(?!town| Peninsula| Tennessee|ia)/ , v:'C' ]
]
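def text = 'West Virginia'
// Apply each pattern in turn, feeding the result of one replacement into the next.
points.each { p ->
    def matcher = (text =~ p.k)
    text = matcher.replaceAll(p.v)
    println "p.v: ${p.v} text: $text" // trace the text after each pattern
}
return null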
If I break apart line 8 of the routine (the ‘W’ entry), which is the most sophisticated of the lines in the example, it’s saying:
- Don’t match ‘West’ (or a variation) if it’s prefixed by ‘George ‘.
- It’s using a ‘look behind’, i.e. ?<=, for ‘George ‘. It is wrapped in a negative ‘look ahead’, (?!(?<=George )), so a ‘West’ that comes straight after ‘George ‘ is discounted.
- The exclamation mark is the ‘not’ symbol.
- Then we have the variations: ‘West’, ‘Western’, or ‘West of the’ (upper/lower case permutations).
- The question mark represents an ‘optional (zero or one occurrence)’, and the vertical bar an ‘or’ condition.
- The parentheses break the regex into ‘groups’ to which you can then apply qualifiers or cardinality rules.
- The ?> symbol makes the West grouping ‘atomic’: once it has matched one of the variations, the regex engine won’t backtrack into the group to try a shorter one. The ‘look ahead’ is the trailing (?! ... ) group, which excludes ‘!’ a suffix of ‘ Virginia’ or ‘|’ ‘ Palm Beach’.
- So, finally ‘West Virginia’ or ‘West Palm Beach’ will not be matched, but ‘West Los Angeles’ would match.
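The breakdown above can be checked on the ‘W’ pattern in isolation. This is only a sketch: abbreviateWest, nonAtomic and westAlt are names I’ve made up for illustration and are not part of the routine. It also shows what the atomic group buys you, and that (?!(?<=George )) appears to behave the same as the more direct negative look behind (?<!George ).

```groovy
// The 'W' pattern copied from the routine.
def west = ~/(?!(?<=George ))(?>(W|w)est(ern| of the)?)(?! Virginia| Palm Beach)/

// Hypothetical helper (not in the routine): apply only the 'W' pattern.
def abbreviateWest = { String s -> (s =~ west).replaceAll('W') }

assert abbreviateWest('West Virginia')     == 'West Virginia'    // stopped by the look ahead
assert abbreviateWest('West Palm Beach')   == 'West Palm Beach'  // stopped by the look ahead
assert abbreviateWest('George West')       == 'George West'      // stopped by the look behind
assert abbreviateWest('West Los Angeles')  == 'W Los Angeles'
assert abbreviateWest('Western Australia') == 'W Australia'

// Without the atomic group the engine backtracks from 'Western' to 'West',
// the look ahead then sees 'ern Virginia' and lets the match through.
def nonAtomic = ~/(W|w)est(ern| of the)?(?! Virginia| Palm Beach)/
def backtracked = ('Western Virginia' =~ nonAtomic).replaceAll('W')
assert backtracked == 'Wern Virginia'
assert abbreviateWest('Western Virginia') == 'Western Virginia'

// Illustrative alternative: a plain negative look behind.
def westAlt = ~/(?<!George )(?>(W|w)est(ern| of the)?)(?! Virginia| Palm Beach)/
['George West', 'West Virginia', 'West Los Angeles'].each { s ->
    def viaAlt = (s =~ westAlt).replaceAll('W')
    assert viaAlt == abbreviateWest(s)
}
```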
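Try setting text to ‘Northridge’, ‘George West’, ‘Middletown’, ‘South Dakota’ etc. and you’ll see it doesn’t abbreviate the text. But something like ‘East Los Angeles’ would become ‘E Los Angeles’. (‘South Bend’, on the other hand, does become ‘S Bend’: the ‘ Bend’ exclusion sits on the southwest pattern, and the south pattern only excludes Carolina and Dakota.)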
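If you’d rather not edit text by hand each time, the routine can be wrapped up so that several inputs are tried in one go. A minimal sketch: the patterns are copied from the routine (with ‘Territories’ spelled out correctly in the NW entry), and the abbreviate closure is a hypothetical wrapper of my own, not something from the original code.

```groovy
def points = [
    [k: ~/(N|n)ortheast(ern)?/, v: 'NE'],
    [k: ~/(?>(N|n)orth(W|w)est(ern|:)?)(?! Territories)/, v: 'NW'],
    [k: ~/(S|s)outheast(ern)?/, v: 'SE'],
    [k: ~/(?>(S|s)outh(\s)?west(ern)?)(?! Hill| Bend)/, v: 'SW'],
    [k: ~/(?>(N|n)orth(ern)?|Upstate)(?! Carolina| Dakota| Platte| Neck| Mariana Islands| Bay|ridge)/, v: 'N'],
    [k: ~/(E|e)ast(ern)?/, v: 'E'],
    [k: ~/(?>(S|s)outh(ern|side)?)(?! Carolina| Dakota)/, v: 'S'],
    [k: ~/(?!(?<=George ))(?>(W|w)est(ern| of the)?)(?! Virginia| Palm Beach)/, v: 'W'],
    [k: ~/(?>(C|c)entral|Center|Middle|the middle section of the)(?!town| Peninsula| Tennessee|ia)/, v: 'C']
]

// Hypothetical wrapper: run every pattern over the text, in list order,
// passing the output of one replacement on to the next pattern.
def abbreviate = { String text ->
    points.inject(text) { acc, p -> (acc =~ p.k).replaceAll(p.v) }
}

// Left alone
assert abbreviate('Northridge')   == 'Northridge'
assert abbreviate('George West')  == 'George West'
assert abbreviate('Middletown')   == 'Middletown'
assert abbreviate('South Dakota') == 'South Dakota'

// Abbreviated
assert abbreviate('East Los Angeles') == 'E Los Angeles'
assert abbreviate('West Los Angeles') == 'W Los Angeles'
assert abbreviate('Central Texas')    == 'C Texas'
assert abbreviate('South Bend')       == 'S Bend' // the south pattern has no ' Bend' exclusion
```

The order of the list matters here, because each pattern sees the text as the earlier patterns left it.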
Footnote:
After reading Mastering Regular Expressions, I found you can also set the regular expression to ‘ignore case’ mode with (?i). So the (N|n) can be simplified away by prefixing the RegEx pattern like so:
def points = [[k: ~/(?i)northeast(ern)?/ , v:'NE'],
//... rest of code as before
]
Obviously the RegEx would now match ‘NORTHEASTERN’, whereas it didn’t before, so it’s a broader matcher.
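One thing to watch with the flag syntax: (?i) switches on ignore case for whatever follows it, whereas (?i:...) only applies inside its own parentheses, so an empty (?i:) group affects nothing. A small sketch to show the difference (the variable names and the ‘Ohio’ sample text are just for illustration):

```groovy
// Flag applies to the rest of the pattern.
def flagged = ~/(?i)northeast(ern)?/
// Flag applies only inside the group, so the pattern has to sit inside it.
def scoped = ~/(?i:northeast(ern)?)/
// Empty group: the flag is switched on and straight off again.
def emptyGroup = ~/(?i:)northeast(ern)?/

def upperFlagged = ('NORTHEASTERN Ohio' =~ flagged).replaceAll('NE')
def upperScoped  = ('NORTHEASTERN Ohio' =~ scoped).replaceAll('NE')
def mixedFlagged = ('Northeast Ohio' =~ flagged).replaceAll('NE')
assert upperFlagged == 'NE Ohio'
assert upperScoped  == 'NE Ohio'
assert mixedFlagged == 'NE Ohio'

// With the empty group only an exact lower-case 'northeast' matches.
def mixedEmpty = ('Northeast Ohio' =~ emptyGroup).replaceAll('NE')
def lowerEmpty = ('northeast Ohio' =~ emptyGroup).replaceAll('NE')
assert mixedEmpty == 'Northeast Ohio'
assert lowerEmpty == 'NE Ohio'
```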
Here are some associated useful links:
- Regex on Snipplr
- DZone Refcard on Groovy (has Regex symbols & examples)
- Javadoc for java.util.regex package
- The Java Tutorials: Regular Expressions
- Daniel S. Meyer’s Positive examples of positive and negative lookahead example
- Groovy Regex text manipulation example
- Groovy script demonstrating Regular Expressions to manipulate Dates
- Ascertaining how subtract works with Strings in Groovy
About this entry
You’re currently reading “Groovy Regular Expressions to abbreviate compass directions with look ahead and look behind.,” an entry on All things Grails and RIA
- Published: Tuesday, May 11, 2010 / 9:56 pm
- Category: Groovy
- Tags: Groovy, look ahead, look behind, RegEx