Well, it's a bit embarrassing to start off with a hack, but what the hey... it made my life instantly easier.
So here was my challenge... to dump to file 40,000 BSON (JSON-like) objects from my mongodb so that I could slurp them up with Groovy 1.8 and its new JsonSlurper. Then I'd be able to use dot notation to get the values I want.
Here's the sequence of what I did. BTW, in case you didn't know, there's no native method within a mongo session to just say db.hats.dumpAll()... at least I haven't found it yet. So you need to get outside the normal mongodb command line session and use these utilities that come with the mongodb distribution.
/var/lib/mongodb-linux-i686-1.8.2/bin/mongodump --db mydb --port 27017 --collection hats --query '{ }' --out dumper
/var/lib/mongodb-linux-i686-1.8.2/bin/bsondump dumper/mydb/hats.bson
First, mongodump dumps the whole collection I called "hats". But the result is binary BSON, not human-readable. You need bsondump for that. By default, bsondump converts every document in the mongodump output to a JSON-like string, one per line.
So, I thought I could just write a few lines of Groovy to convert each JSON string into an object and use dot notation to dereference the values I wanted. But I forgot that what bsondump prints is still BSON-flavored, not strict JSON. My problem was this:
{ "_id" : ObjectId( "4e56d1c780acbde57e951402" ), "size" : "8", "color...
As you can see, the ObjectId string is not itself in quotes and therefore messes up JsonSlurper. So I submit the following solution that worked for me.
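import groovy.json.JsonSlurper

def f = new File(fname)   // fname: path to the saved bsondump output
def lineCount = 0
def slurper = new JsonSlurper()
f.eachLine { line ->
    // Strip the ObjectId( ... ) wrapper so the line becomes plain JSON
    def line2 = line.replaceFirst('ObjectId\\(', '').replaceFirst('\\),', ',')
    try {
        def res = slurper.parseText(line2)
        println res.size + "|" + res.color
        lineCount++
    } catch (Exception e) {
        // Lines that are not JSON (bsondump's stats output) end up here
        e.printStackTrace()
    }
}
println lineCount + " lines encountered."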
I used replaceFirst() instead of replaceAll() to avoid overkill and generally screwing up other innocent content that might happen to contain a closing parenthesis followed by a comma. Unlikely, but safe(r).
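To make that difference concrete, here's a small sketch on a single line. The line itself is hypothetical: the "note" field and its value are made up purely so that a closing parenthesis followed by a comma also shows up inside ordinary content. Only the _id is taken from the sample above.

```groovy
import groovy.json.JsonSlurper

// Hypothetical line in the shape bsondump printed; the "note" field is invented
// so that "),' also occurs inside a normal string value.
def line = '{ "_id" : ObjectId( "4e56d1c780acbde57e951402" ), "size" : "8", "note" : "wide (brim), tall" }'

// replaceFirst only touches the first match of each pattern
def first = line.replaceFirst('ObjectId\\(', '').replaceFirst('\\),', ',')

// replaceAll rewrites every match, including the one inside "note"
def all = line.replaceAll('ObjectId\\(', '').replaceAll('\\),', ',')

def slurper = new JsonSlurper()
def res = slurper.parseText(first)
def resAll = slurper.parseText(all)

println res.note      // wide (brim), tall
println resAll.note   // wide (brim, tall
```

Both versions still parse, which is the sneaky part: replaceAll wouldn't blow up, it would just quietly change the data.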
By the way, the reason for the exception handling is that bsondump outputs some stats, not in BSON format, every now and then. Odd. Obviously, I could have handled that more gracefully, but the exception handling did the job and the lines encountered gave me the target number I was looking for.
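For what it's worth, here is one way the more graceful version might look: skip anything that doesn't start with an opening brace instead of letting JsonSlurper throw. This is only a sketch under assumptions. The sample lines are made up (the second _id, the colors, and the wording of the stats line are all invented), and it assumes every real document line begins with "{".

```groovy
import groovy.json.JsonSlurper

// Made-up stand-in for bsondump output: two documents and one stats-style line.
// The wording of the stats line is an assumption, not copied from real output.
def lines = [
    '{ "_id" : ObjectId( "4e56d1c780acbde57e951402" ), "size" : "8", "color" : "red" }',
    '2 objects found',
    '{ "_id" : ObjectId( "4e56d1c780acbde57e951403" ), "size" : "7", "color" : "blue" }'
]

def slurper = new JsonSlurper()
def parsed = 0
def skipped = 0

lines.each { line ->
    // Treat anything that does not start with "{" as a stats line and skip it
    if (!line.trim().startsWith('{')) {
        skipped++
        return   // inside a closure, return just moves on to the next line
    }
    def line2 = line.replaceFirst('ObjectId\\(', '').replaceFirst('\\),', ',')
    def res = slurper.parseText(line2)
    println res.size + "|" + res.color
    parsed++
}
println parsed + " documents parsed, " + skipped + " other lines skipped."
```

With a real dump you'd swap the list for new File(fname).eachLine { ... } as in the original script. The upside is that a genuine parse error now stands out instead of hiding among the expected stack traces.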
That's it. Got the job done. Enjoy.
David