Monday, July 2, 2012

Logging and Debugging

I'm finding one of the biggest challenges working with Scala is debugging, and secondarily logging.  The former seems to be a tooling issue as much as anything, and to be honest, the latter is a matter of my putting time in to figuring it out.

With debugging, break points in the middle of highly condensed list comprehensions are very very hard to make.  I end up mangling the code with assignments and code blocks that I then have to re-condense later.

I've attached a debugger using the usual jdwp method, but it slows everything down so badly, and it's just not that much better than print statements.  I've been going through the Koans with a new employee at work, and it's been helping both of us greatly.  There's one koan that describes a way to sort of "monkey patch" objects, and as much as I dislike that approach in general, it sure as heck beats Aspects which are hard to control and often fickle unless they are part of your daily routine.

I came up with a small monkey patch for the List class that lets me use inline log function calls to get basic information about the state of a list in the middle of a comprehension chain, so I include it here in the hopes that somebody will find it useful, or have some better ideas!

class ListLoggingWrapper[+T](val original: List[T]) {
  def log(msg: String): List[T] = {
    println(msg + " " + original.size)
    original
  }
  def logSelf(msg: String, truncateTo: Int = 4096): List[T] = {
    println(original.toString().take(truncateTo))
    original
  }
}

implicit def monkeyPatchIt[T](value: List[T]) = new ListLoggingWrapper[T](value)

This helpful snippet allows you to call a new method 'log' on a List object that prints out the List size, and similar with 'logSelf' which allows you to print out the result of toString, truncated (working with large lists means you always end up with pages of hard to pick through output if you don't truncate I've found).

A list comprehension chain ends up looking something like this:

Util.getJsonFilePaths(args(0)).map {
      x: String =>
        new File(x).listFiles().toList.log("File List Size").filter(file => {
          Character.isDigit(file.getName.charAt(0))
        }).map(_.getPath).filter(_.endsWith(".json")).log("Json File Count").flatMap {
          jsonFile =>
            io.Source.fromFile(jsonFile).getLines().toList.log("Line Count for " + jsonFile).map(line => Json.parse[List[Map[String, Object]]](line)).flatMap(x => x).log("Elements in file").logSelf("Elements are", 255).filter(jsonEntry => {
              jsonEntry.get("properties").get.asInstanceOf[java.util.Map[String, Object]].asScala.get("filterPropertyHere") match {
                case None => false
                case Some(value) if (value.toString == "0") => false
                case Some(value) if (value.toString == "1") => true
                case _ => false
              }
            }
            )
        }
    }

Which is a piece of code to aggregate data across multiple JSON files filtering by a given property using Jerkson (which I still feel like I'm missing something with as it seems harder than it should be).

No comments:

Post a Comment