Discrete Passions

Thursday, September 29, 2011

CNBC's "Explain This" makes KhanAcademy a Pioneer of the "HyperSchool"

Anybody think that CNBC has delivered the latest unanticipated Internet service that's a (revolutionary) turn for the better?

If you peruse my blog, you know I'm a big fan of khanacademy.org and Sal Khan's library of over 2,000 classes, each 9 to 15 minutes long. Perfect for the Web, for our fast-paced lifestyle and increasingly limited attention spans.

Why do I think this new feature turns khanacademy into a new class of school I'm going to call "hyperschools"? Because it is the beginning of the destruction of our view of education as a silo, where school is school and learning happens only during school. With "CNBC Explains", CNBC and Khan are showing that learning can be more spontaneous, more targeted, more convenient and more "in the moment."

The parallel is with hyperlinks themselves. They give the Web its ability to cross-wire information. Following hyperlinks, you can follow all kinds of trails of content (knowledge). Hyperschools, of which there is now only one (khanacademy.org), provide the same flexibility. Let me educate myself by attempting to read this interesting article about LIBOR (inter-bank lending rates), and I'll click on these "Explain This" icons as I see fit in order to more deeply understand what I'm reading, by following a quick 10-minute Khan class focused on a specific topic relevant to the article.

Another metaphor is the Star Trek-like scenario of "I want to be expert enough to follow this article on Treasury Bonds, so I'll take a pill on everything I need to know to understand the article."

Hopefully, you get the idea. It feels like a subtle new B2B enhancement of a company's web site (CNBC to Khan)... but the more I thought about it, the more I liked the feeling I got of yet more old barriers to education being broken. Love it!

Monday, September 19, 2011

A more reasonable way to comment blocks of shell code...

I love using shell scripts for launching Java and Groovy apps because they're so good at setting the table in a way that keeps the application much simpler... especially when the shell script can handle common needs, which is often the case for operations-style functionality. Example shell script functionality includes:

determining if there's sufficient space on the file system,
collecting files to process,
configuring the environment (as in defining traps that clean up temp files and remove lock files when an app closes either naturally or via control-C interrupt, or re-directing I/O so that a control-C won't kill the process you're running).

Because operations-style applications suffer from out-of-sight, out-of-mind syndrome, having a solid strategy with shell scripts that can manage and measure the operating environment, and scream bloody murder when things aren't right, makes them a worthy design component.
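
To make that list a bit more concrete, here's a rough sketch of the kind of table-setting I mean: a free-space check, a lock file, and traps that clean up on a normal exit or a control-C. Every name in it (LOCK_FILE, WORK_FILE, MIN_FREE_KB, have_space) is made up for illustration, and the 1 MB threshold is arbitrary, so treat it as a starting point rather than a finished wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout: LOCK_FILE, WORK_FILE, MIN_FREE_KB, have_space.
LOCK_FILE="${TMPDIR:-/tmp}/myjob.lock"
WORK_FILE=$(mktemp)
MIN_FREE_KB=1024
have_lock=no

# Remove the temp file, and the lock file if we own it, however the script ends.
cleanup() {
    rm -f "$WORK_FILE"
    if [ "$have_lock" = yes ]; then rm -f "$LOCK_FILE"; fi
}
trap cleanup EXIT
trap 'exit 130' INT    # control-C exits, which in turn fires the EXIT trap

# Succeed if the file system holding directory $1 has at least $2 KB free.
have_space() {
    local avail_kb
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

if ! have_space "${TMPDIR:-/tmp}" "$MIN_FREE_KB"; then
    echo "not enough free space, refusing to start" >&2
    exit 1
fi

# noclobber makes the redirect fail if the lock file already exists.
if ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null; then
    have_lock=yes
else
    echo "another instance appears to be running" >&2
    exit 1
fi

echo "environment ready; the Java or Groovy app would be launched here"
```

The cleanup function only removes the lock when this run actually took it, so a second copy that bails out early won't delete the first copy's lock.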

I've never liked "the fact" that to comment out a block of code in unix shell programming I had to insert # symbols in front of every line.

Over the weekend I found a better way to do it: a here document fed to the ":" builtin, which is a no-op. First, the old way:

# all this code inside this section
# is now invisible to the shell interpreter
# Had to use a # on every line.

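And here's how to do the equivalent with a here document. Quoting the delimiter keeps the shell from expanding any $variables or running any $(commands) that appear inside the block.

: <<'STUFF_TO_PASS_TO_COLON'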
all this code inside this here document
is now invisible to the shell interpreter
Didn't have to use a # anywhere.
STUFF_TO_PASS_TO_COLON
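
One caveat worth checking for yourself: with an unquoted delimiter the shell still expands the body of the here document before handing it to ":", so a stray $(...) inside "commented out" code will run. Here's a small sketch that shows the difference; the marker file and the variable names are just scaffolding I made up for the demo.

```shell
#!/usr/bin/env bash
marker=$(mktemp)    # scratch file used to detect side effects
rm -f "$marker"

# Unquoted delimiter: the body is still expanded, so $(...) runs.
: <<UNQUOTED
This looks commented out, but $(touch "$marker") still executes.
UNQUOTED
if [ -e "$marker" ]; then unquoted_ran=yes; else unquoted_ran=no; fi
rm -f "$marker"

# Quoted delimiter: the body is taken literally and nothing runs.
: <<'QUOTED'
Here $(touch "$marker") is never executed.
QUOTED
if [ -e "$marker" ]; then quoted_ran=yes; else quoted_ran=no; fi
rm -f "$marker"

echo "unquoted body ran a command: $unquoted_ran"
echo "quoted body ran a command: $quoted_ran"
```

On bash this reports yes for the unquoted block and no for the quoted one, which is why quoting the delimiter is the safer habit when the block holds real code.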

Saturday, August 27, 2011

Mongodb Tip #1 : dumping bson into legit json objects

Well, it's a bit embarrassing to start off with a hack, but what the hey... it made my life instantly easier.
So here was my challenge... to dump to file 40,000 BSON (JSON-like) objects from my mongodb so that I could slurp them up with Groovy 1.8 and its new JsonSlurper. Then I'd be able to use dot notation to get the values I want.

Here's the sequence of what I did. BTW, in case you didn't know, there's no native method within a mongo session to just say db.hats.dumpAll()... at least I haven't found it yet. So you need to get outside the normal mongodb command line session and use these utilities that come with the mongodb distribution.

/var/lib/mongodb-linux-i686-1.8.2/bin/mongodump --db mydb --port 27017 --collection hats --query '{ }' --out dumper
/var/lib/mongodb-linux-i686-1.8.2/bin/bsondump dumper/mydb/hats.bson

First, mongodump dumps the whole collection I called "hats". But it's not in human-readable form. You need bsondump for that. bsondump, by default, converts every document in the mongodump output to a JSON-like string, one per line.

So, I thought I could just write a few lines of Groovy to convert each JSON string into an object and use dot notation to dereference the values I wanted. But I forgot that the output is only JSON-like, not strict JSON. My problem was this:

{ "_id" : ObjectId( "4e56d1c780acbde57e951402" ), "size" : "8", "color...

As you can see, the ObjectId( ... ) wrapper is not itself in quotes, so the line is not valid JSON and JsonSlurper chokes on it. So I submit the following solution that worked for me.

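import groovy.json.JsonSlurper

// fname is assumed to hold the path of a file containing the bsondump output
def f = new File(fname)
def slurper = new JsonSlurper()
def lineCount = 0
f.eachLine { line ->
    // Unwrap ObjectId( "..." ) so the line becomes plain JSON
    def line2 = line.replaceFirst(/ObjectId\(/, '').replaceFirst(/\),/, ',')
    try {
        def res = slurper.parseText(line2)
        println res.size + "|" + res.color
        lineCount++
    } catch (Exception e) {
        // bsondump's non-JSON status lines end up here
        e.printStackTrace()
    }
}
println lineCount + " lines encountered."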

I used replaceFirst instead of replaceAll() to avoid overkill and generally screwing up other innocent content that might contain the right matching parentheses with comma. Unlikely, but safe(r).

By the way, the reason for the exception handling is that bsondump outputs some stats, not in BSON format, every now and then. Odd. Obviously, I could have handled that more gracefully, but the exception handling did the job and the lines encountered gave me the target number I was looking for.
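
If you'd rather clean the lines up before they ever reach Groovy, the same unwrapping can be sketched as a sed filter in the shell. This is an alternative to the replaceFirst() calls, not what I actually ran; strip_object_id is a made-up helper name, and the sample line is invented (the "red" value in particular) just to have something complete to feed it.

```shell
#!/usr/bin/env bash
# Rewrite ObjectId( "..." ) to just "..." so each line is plain JSON.
# No g flag: only the first match per line changes, like replaceFirst().
strip_object_id() {
    sed -E 's/ObjectId\( *("[^"]*") *\)/\1/'
}

# Invented sample document in the shape bsondump printed for me.
sample='{ "_id" : ObjectId( "4e56d1c780acbde57e951402" ), "size" : "8", "color" : "red" }'
cleaned=$(printf '%s\n' "$sample" | strip_object_id)
echo "$cleaned"
```

Because the pattern is anchored on ObjectId( and a quoted string, other parentheses elsewhere in a document are left alone.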

That's it. Got the job done. Enjoy.

David

Sunday, August 21, 2011

mongodb goodness, so far...

Lately, I've been working pretty extensively with mongodb. I classify it as a "JIT DB", as in Just-In-Time Schema Database. It's perfect for lazy moments when you're writing some code and it dawns on you that you need an additional field or even an additional table (called a "collection" in Mongo).

"Lazy" is the wrong word. mongoDB is in a class of technologies and strategies that foster inspired notions and reduce barriers (like time and patience) to assert your ideas. SQL doesn't do that for me. The level of required schema pre-work and retrofitting has nipped some cool ideas in the bud... mongoDB encourages me to do it right now because I don't see any impedance! Throw together shell scripting, Groovy and mongoDB and let's just do it!

Here's a quick example that will hopefully illustrate for you the low impedance of mongoDB (and lots of other NoSQL databases)...

Let's sort a collection called mydata by a timestamp field.

db.mydata.find().sort({timeStamp:-1})

This is equivalent to

select * from mydata order by timeStamp desc;

mongodb comes back and says something to the effect of "can't do a big sort like this without an index." Well there you go... so you type

db.mydata.ensureIndex({timeStamp:1})

You try the sort again and it works. You've just experienced something like a conversation with your database! "I can't do this... you know what to do..."

In full disclosure, I actually use Groovy for all my Java-style development now. I've completely lost interest in Java because 1) Groovy is way more satisfying and productive and 2) I currently have no need to use Java for squeezing max performance out of code. I mostly use Groovy for batch-style work, updating Salesforce.com via the Web Services API and such.

With mongoDB there's a nasty little conceptual hurdle to jump over, especially when switching back and forth between native javascript commands and Java driver programming.
With the Java driver, there are at least two or three ways to construct a db operation:

// myData is a DBCollection; the Java driver's query method is find()
def doc = new BasicDBObject().append("lastName","Smith").append("firstName","Jack")
myData.find(doc)

...versus...

def person = [:] // Groovy syntax for an empty map
person.lastName = "Smith"
person.firstName = "Jack"
def doc = new BasicDBObject(person)
myData.find(doc)

Straightforward enough. In the native mongo language, the query looks something like this...

db.myData.find({lastName:"Smith",firstName:"Jack"})

which is equivalent to

select * from myData where lastName = 'Smith' and firstName = 'Jack';

Now, because I'm a Linux guy and I love the power of intermingling shell scripts to glue Java/Groovy together as needed, here's one way you might integrate that mongoDB (javascript) script language in a bash shell script using a here document.

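function findUser {
    local lastName=$1
    local firstName=$2
    # The unquoted EOF lets the shell fill in the two names before the
    # script reaches mongo. The body and the closing EOF stay at column 0
    # because a plain << here document does not strip indentation.
    mongo <<EOF
use employeeDB

var criteria = {lastName:"${lastName}",firstName:"${firstName}"}
var answer = db.leads.find(criteria)
answer.count()

EOF
}

findUser Smith Joe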
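
When I'm unsure what the shell will actually hand to mongo, it helps to print the generated script first. Here's a sketch of that idea: the same here document, fed to cat instead of mongo. build_find_script is a hypothetical helper I'm introducing for illustration, not part of the function above.

```shell
#!/usr/bin/env bash
# Print the script that would be sent to the mongo shell, without running it.
# build_find_script is a hypothetical helper name.
build_find_script() {
    local lastName=$1 firstName=$2
    cat <<EOF
use employeeDB
var criteria = {lastName:"${lastName}",firstName:"${firstName}"}
db.leads.find(criteria).count()
EOF
}

script=$(build_find_script Smith Joe)
echo "$script"

# To run it for real, assuming the mongo shell is on the PATH:
#   build_find_script Smith Joe | mongo
```

Keep in mind that the names are pasted into the javascript as-is, so a value containing a double quote would break the generated script (or worse). For anything beyond trusted input, validate the arguments first.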

Enough for now. For the next few weeks, I'll post some of the mongodb commands and concepts I found most useful.

Thursday, April 28, 2011

Theory on Time as a symptom, not the cause, coming together...

In a recent Facebook discussion, I referenced the development at http://www.physorg.com/news/2011-04-scientists-spacetime-dimension.html re: "...space-time has no time dimension..." and how it reinforces my gut-level feeling about the concept of time. Thanks to a response by my former Lutris colleague Daryl, a few thoughts came together.

The folks behind this article didn't go far enough to put some visual teeth into it (for us lay people), so once again, Julian Barbour and his work come to my rescue to explain what could be going on...

Barbour, his book "The End of Time" and his site http://platonia.com mean a lot to me. In particular, he has proposed a view of existence he's dubbed Platonia. Platonia looks like a typical landscape rendering you might see in a landscape architect's office, except that it represents the likelihood of events (i.e., probability). The illusion that is created as physical space changes in sequence is the perception of time.

While looking at one of Barbour's very recent papers, I saw that he appears to be driving to a quantum theory of the universe. Or at least he's describing why the universe could be described in quantum terms. As I understand things, time prevents folks from getting to a quantum explanation... but only if you see time as a building block and not an outcome of the dynamics of things. If you remove time from the space-time equation and replace it with the sequencing of change, then his Platonia view of things as movements powered by probability makes a quantum theory of space, and its coming into existence, more attainable.

So that's as far as I can run with the original article I referenced at the beginning of this post. There's more to come now that I have a new reason for exploring Barbour's work again...

Life is so cool...