Saturday, August 17, 2013

Chipping away at manipulating and querying MongoDB sub-documents (with Groovy/Java)

I'm building a little library of commonly needed MongoDB Groovy scripts using the MongoDB Java driver. Sub-documents are my favorite aspect of MongoDB schema design. They're how I like to illustrate a core difference between document DBs and the "joining tables" world of SQL. So I figured I'd tackle storing and fetching sub-documents.

There are more programmer/java-friendly strategies such as Morphia. My preference is to start with a reasonably low level API to understand some foundation perspective before jumping up a level or two of very helpful abstraction. It's a control thing, I'm sure... ;}

It's easy to use native MongoDB javascript to create some test data, so I started with that. Below I'm using a collection called "diary". This document identifies the activities and activity dates performed by John.


mongo
> use blog
> db.diary.insert({name: 'john', activities: []}); // setup activities array for subsequent content updates

> db.diary.update({name:'john'},{"$push" : {"activities" : { "date" : "20130812", "name" :"Go to school"}}});

> db.diary.update({name:'john'},{"$push" : {"activities" : { "date" : "20130817", "name" :"Bird watching on Rio Grande"}}});

> db.diary.findOne({}, {_id:0}); // fetch the first matching doc; don't display the _id
{"activities" : [
       {
               "date" : "20130812",
               "name" : "Go to school"
       },
       {
               "date" : "20130817",
               "name" : "Bird watching on Rio Grande"
       }
       ],
       "name" : "john"
}


I now have a key called "activities" that collects sub-documents in array form. Each item in the array is a sub-document representing a date-stamped activity, such as "Go to school", which I do every day in one form or other.

The code below loops through the matching top-level documents, looking for the presence of the "activities" key. Since we're talking MongoDB, there's no requirement that such a key exists in every document. Whatever policy you decide on for missing keys, it's your code's responsibility to implement it!

Here's some tested code that performs the query I've been looking for.

// main...
    myApp.dumpActivities(fetchActivities('john'))
// ...end of main

def fetchActivities(name) {
  def activities = [:]
  def q = new BasicDBObject().append("name",name)
  def cursor = diary.find(q)
  while (cursor.hasNext()) {

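    // "activities" is optional, so get() may return null for some documents
    def activityDocs = (BasicDBList) cursor.next().get("activities")
    if (activityDocs == null) continue

    // DANGER -- This loop assumes one event per day...
    for (BasicDBObject activity: activityDocs) {
      activities.put(activity.date, activity.name)
    }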
  }
  return activities
}

def dumpActivities(activities) {
  activities.each { dateStamp, activity ->
    println "Activity: "+ activity + " on " + dateStamp
  }
}

So we're looping through each matching document, looking for the key "activities". When found, we cast the value associated with that key to a BasicDBList object. Basically, we're treating the value of the activities key as the array of sub-documents that it is.

The innermost loop processes each sub-document as the hash (represented by the BasicDBObject object) that it is, collecting the values of its "date" and "name" keys as a date-to-activity pair.
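About that DANGER comment: if John does two things on the same day, the second activity overwrites the first. Here's a rough sketch of one way around that, in plain Java, with ordinary Maps standing in for the BasicDBObject sub-documents (BasicDBObject is itself a Map). The class name ActivityGrouping, the groupByDate helper and the third sample activity are my own inventions for illustration; none of them come from the MongoDB driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the MongoDB driver: groups activity
// sub-documents by date so two activities on the same day are both kept.
final class ActivityGrouping {

    // Each sub-document is modeled as a plain Map with "date" and "name" keys,
    // standing in for the BasicDBObject entries of the "activities" array.
    static Map<String, List<String>> groupByDate(List<Map<String, String>> activityDocs) {
        Map<String, List<String>> byDate = new LinkedHashMap<>();
        if (activityDocs == null) {
            return byDate; // the document had no "activities" key
        }
        for (Map<String, String> activity : activityDocs) {
            // Create the list for this date on first sight, then append the name.
            byDate.computeIfAbsent(activity.get("date"), d -> new ArrayList<>())
                  .add(activity.get("name"));
        }
        return byDate;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("date", "20130812", "name", "Go to school"),
            Map.of("date", "20130817", "name", "Bird watching on Rio Grande"),
            Map.of("date", "20130817", "name", "Write blog post")); // made-up entry
        groupByDate(docs).forEach((date, names) ->
            System.out.println(date + " -> " + names));
    }
}
```

The only real change from my Groovy loop is that each date maps to a list of names instead of a single name, so nothing gets silently replaced.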

So! Now I have the basis of my standard query pattern. Next post should be about inserting new activity sub-documents with Groovy.

Saturday, April 13, 2013

Magic numbers at it again: approaching critical mass of knowledge as video

When I was running Lutris Technologies in the mid-90s, everybody knew I had a fascination with two things: serendipity (as it influences business) and what I called "magic numbers."  I'll write about the serendipity stuff another time... but I used the concept of magic numbers as associated with the hiring of any new employee and the eventual impact on our Lutris culture.

  • The 7th employee and, all of a sudden, the need to call meetings.
  • The 10th employee and, all of a sudden, the need to hire an office manager.
  • The 12th employee and, all of a sudden, the puzzling interruption in the perfect flow and distribution of knowledge (amongst all of us).
It was somewhere around 20 when I realized, "geez, we need a real CEO." That's another story in itself.

But, to get back to the real reason for this posting... it's about something I've observed recently: video has become a true knowledge base.

My favorite example is Charlie Rose at http://www.charlierose.com.  When I just feel like learning something new or wonder if he ever interviewed somebody I'm curious about, I'll go to his website and search.

It used to be that you searched for text, such as wikipedia.  I still do that.  But if I'm in a real learning mode, I go video first.

The impact?  I used to think of Youtube as a resource for music and kitten videos.  Instead, I watch videos on Quantum Mechanics or a new salesforce.com feature.  Or, as I just did, I search Youtube for videos on "defining mongodb schemas."

It's a wonderful phenomenon.  The charlierose.com site is particularly interesting to me because I have always suspected his politics and social views were similar to mine.  So I know I'm going to like the questions he asks those he interviews.  So it's more than a site of pure knowledge.  It's one that supports an angle that I relate to.

So, somewhere along the line, in the past 3 or so years, and maybe I'm just late to the knowledge party, one of those magic #'s was reached.  I guess it's the # that represents a sufficient # of topics (relative to my direction of personal growth and interest) supported by a critical mass of videos.  It's a curious kind of transformation because you don't realize it until it's been there for a while.  Fun stuff.

So back to my mongo video...

David

Sunday, March 24, 2013

Tackling programming in chunks

Sometimes, out of nowhere, you discover you've acquired some wisdom over the years. Wisdom, in this case, was probably inherited from my Unix/Linux background... namely, break a problem into chunks, and attack it left to right.

For example, take a list of names, sort them and get rid of the redundant ones.

cat myNameListFile.txt | sort | uniq > myUniqListOfNames.txt

Unix has a nice way of pipe'ing the output from one app or tool to the next.  As you become familiar with the available tools, you start thinking in terms of how you can sub-divide tasks.
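The same left-to-right thinking carries over to code. As a rough sketch, here's how I'd mirror the sort | uniq pipeline with Java streams, where each stage hands its output to the next. The SortUniq class, the sortUniq method and the sample names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: each stream stage plays the role of one tool in the pipe.
final class SortUniq {

    static List<String> sortUniq(List<String> names) {
        return names.stream()
                    .sorted()    // like: sort
                    .distinct()  // like: uniq (duplicates are adjacent after sorting)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input standing in for the contents of the name list file.
        List<String> names = List.of("maria", "john", "maria", "alice", "john");
        sortUniq(names).forEach(System.out::println);
    }
}
```

One difference worth knowing: uniq only drops adjacent duplicates, which is why the sort has to come first on the command line, while distinct() drops duplicates wherever they appear.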

For example, you need to generate some customer numbers from your database.  This is a one off task, though it has the possibility of being useful further down the road.

You're not sure about the SQL.  You can do it, but it's going to take a while to figure out that lengthy thing. And feeling comfortable with Unions and Joins can be quite a challenge. So, why not do it in chunks?  Why not do it in a series of SQL calls, feeding the results of the first query into the second?

But what if you can't do even that with confidence?

This is where a bit of bash scripting or Groovy comes in.  There are more tricks to bash shell managing SQL queries and results than you may know.  I will address that in a future post.  It's how I survived before I discovered Groovy.

Here's the strategy:
1. Create a SQL query that gets all your customers' IDs.
2. Execute the query from Groovy so that you can capture the results in a list. 

Maybe there were lots of conditions applied to that customer list.  Perhaps they're the customers who are not suspended and they reside in Ohio and they're new accounts as of 3 years ago.  Simple considerations, but nonetheless, considerations that lengthen your thought process and your SQL query.

Now you can move to the next phase or chunk.  All of a sudden that grand design of a SQL query has gotten a little bit simpler.  
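To make the hand-off between chunks concrete, here's a minimal sketch in Java of feeding the captured ID list into a second query. Everything specific in it is an assumption on my part: the ChunkedQuery class, the ordersQueryFor helper, the orders table and its column names are all hypothetical, and the IDs are hard-coded where the results of the first query would normally go.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating the "chunks" idea: the IDs captured by
// the first query become the parameters of the second one.
final class ChunkedQuery {

    // Builds a parameterized query with one "?" placeholder per customer ID.
    // The table and column names are invented for this sketch.
    static String ordersQueryFor(List<Integer> customerIds) {
        if (customerIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one customer ID");
        }
        String placeholders = String.join(", ",
                Collections.nCopies(customerIds.size(), "?"));
        return "SELECT order_id, customer_id FROM orders"
             + " WHERE customer_id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Pretend these came back from chunk 1 (the filtered customer query).
        List<Integer> ids = List.of(101, 205, 330);
        System.out.println(ordersQueryFor(ids));
    }
}
```

With a real connection, the string would go to a PreparedStatement and each ID would be bound to its placeholder. The point is simply that the second query no longer has to know anything about suspended accounts, Ohio or account age.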

Sunday, October 07, 2012

Career bootstrapping for young people: an alternative strategy

At a Satellite coffee shop here in ABQ yesterday, I heard a guy I knew to be a successful local businessman giving a 24-year-old fella advice on breaking into business after college.  He focused on how he should look, dress and act.  I kept waiting for him to get past the superficial stuff but, alas, he kept going on about how to shape shift into something acceptable to the corporate stereotype.

So, in a fit of my occasional righteousness, I stood up, looked at the advice-giving guy and said, "Don't forget to tell him about deed" and walked to the other end of the coffee shop to continue my work.  To his credit, the business guy understood my point and immediately translated it for the young man.

So, the point of all this is, yes, it's a good idea to start combing your hair, but it's also important to think beyond just "getting" that first job out of college.
  1. Think in terms of "infiltrating" a business.  By that I mean, be determined to learn how the company works, from its ultimate business plan to how it implements that plan in manufacturing, sales and elsewhere.  Position yourself physically -- where you start in the organization.  Believe it or not, the mail room is a great way to achieve that, assuming there are still mail rooms in businesses.  As far as your "insertion point" goes, your ideal vision of that first job may not serve you best.  Ivory tower positions have their consequences.  "Big picture" ignorance is one of them.
  2. Start on the manufacturing side of your ultimate goal.  From that perspective, you'll see the reality of things.  As a wanna-be software developer, I saw how the company hired hordes of young developers from colleges around the Bay Area and those kids never had any idea what happened to their software after it was handed off.
  3. Then, the hard part, which goes hand-in-hand with big pay-offs, is to figure out the weaknesses in the system and find something you think you could improve.  Find a "coach" within the programming group, or general management, and tell them what you're seeing.  See if they'll advise you on a strategy.  For me, it was a fella named Mark Wong who said, "We're drowning in our current workload.  Here's a programming book.  Teach yourself how to program in our language and build a program for that missing process."
So, that's what worked for my career.  By shaving a couple of days off of how things went from design to manufacturing, I got noticed, promoted and even sent to school by the company to become a full engineer.  In today's world, if you're doing the software thing, I highly recommend you pick a company that has truly adopted the Agile methodology.  Its fundamental basis in transparency amongst contributing teams undermines the ivory tower thing.  Just make sure it's a true Agile culture.

So there it is.  I know times have changed.  But my gut tells me that this kind of career strategy is still a valid option.  Just be sure to comb your hair, for goodness sake!

Saturday, July 07, 2012

Key to understanding Higgs Boson... field versus particle

It's a bit of a stretch going from javascript to physics, but why not?  On the web today, "answers" are all around us.  It's fun to find the ones that work for me and, perhaps, you...

I watched the video by John Ellis referenced below.  It was the first explanation by anybody that made the Higgs Boson thing click for me, conceptually speaking.

First of all, composite and elementary particles (electrons, protons, neutrons, photons, neutrinos, quarks, gluons, klingons (just kidding) and leptons) have no mass intrinsically.  From a parental perspective, they're like pre-teens.  Full of nothing but potential... It's through their daily travels that they acquire mass (as you would pick up lint when you put on black clothing and your dog wants some attention).  But some particles, like photons and gluons, never wear lint-attracting clothing...

So here are my takeaways from Ellis' video:
  • The key is understanding that Higgs Boson introduces both a field and the particles (Higgs Bosons) that reside within that field.
  • It's the particle that CERN detected, not the field.  The field is implied by the prediction and detection of the particle.
  • It's the journey through the Higgs Field, which is everywhere in our universe, that gives particles, such as electrons, their mass.
  • Certain particles that have no mass, like photons, can traverse the Higgs Field much like the bottom of a skier's ski glides over snow... No Higgs Boson particles "stick" to the slick surface of that photon and therefore the photon acquires no mass.
  • What did CERN discover?  They discovered the particles, or snow flakes, that reside throughout the Higgs field... namely the Higgs Bosons. So they did not discover the field itself.   They discovered the particles that reside there... the things that latch onto certain particles, giving those particles mass.
  • So, another way of looking at it is that particles with no mass, like the photons that deliver light, are smooth and slick and therefore the Higgs Field that they plow through has little consequence in terms of build up of Higgs Boson particles.  And particles that acquire mass have a rough surface and therefore accumulate Higgs Boson particles along their journey.  The rough and smooth attributes are not necessarily attributes of particles.  I'm just using them as conceptual elements.
So, if you need to see moving pictures, watch the video below.  It's a good one.

If you want to know more about where the sub-particle name Boson came from (since Higgs gets all the press), be sure to read the web page at ibtimes.com.  Just another story from the amazing culture of India... with a great story about the role that Einstein played.

https://www.youtube.com/watch?feature=player_embedded&v=QG8g5JW64BA

David