Thursday, January 3, 2013

Database Record Updates with Slick in Scala (and Play)

This is a simple operation for which I found absolutely zero reference in the documentation, the tutorials, or the slides.  Eventually, after digging through old mailing lists, I came across the solution:

        
(for { m <- MessageQ if m.id === oldMessage.id } yield m)
  .mutate(r => (r.row = newMessage))(session)

This is for a simple message class and a function that takes two arguments: oldMessage and newMessage.  The frustrating thing is that this is inconsistent with the simple formula for a single column update:
MessageQ.filter(_.id === 1234L).map(_.subject)
  .update("A new subject")(session)

When you try to apply that thinking to a whole-row update, you end up at a dead end.  The mutate operator is also used for deletion:
MessageQ.filter(_.id === 1234L)
  .mutate(_.delete())(session)

Note that you can typically leave out the session argument, as it's declared implicit within the appropriate scope. I'm also switching between the two syntax styles because, for some reason, either my IDE or the compiler gets grumpy when I use the filter() style rather than the for-comprehension style in certain contexts. I still have to figure that out.
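For completeness, here's a minimal sketch of what that implicit scope looks like, assuming Slick 1.0-style imports; the H2 connection details are just placeholders:

import scala.slick.session.{Database, Session}

// Placeholder connection; MessageQ is the table query from the examples above.
val db = Database.forURL("jdbc:h2:mem:test", driver = "org.h2.Driver")

db.withSession { implicit session: Session =>
  // With an implicit session in scope, the trailing (session) argument can be dropped.
  MessageQ.filter(_.id === 1234L).map(_.subject).update("A new subject")
}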

I'd like to write a longer post later at some point, but this at least covers the highlights.

Thursday, June 21, 2012

Parallel Processing of File Data, Iterator groups and Sequences FTW!

I have occasion to process very large files here and there, and it seems that Scala is very good at this in general.  There is a nice feature in the BufferedSource class that allows you to break file parsing or processing into chunks so that the work can be parallelized.

If you've tried the obvious solution, simply adding .par, you'll find the method isn't present.  So you might convert to a List with toList first.  When you convert like this, Scala will collect all the lines into a List in memory before passing it on.  If you have a large file, you'll quickly run out of memory and your process will crash with an OutOfMemoryError.

BufferedSource offers us another way to do this with the grouped() method.  You pass a group size into the call to break the stream up into chunks.  So, instead of one String iterator made up of millions of entries, one for each line, you get an Iterator of Sequences with 10,000 lines in each.  The lines come back from a BufferedSource as an Iterator, and any kind of Iterator can be grouped this way, Sequences and Lists included.  Each chunk is then a Sequence with a finite element count, so you can parallelize the processing of it to increase throughput, and flatMap the results back together at the end.
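To see what grouped() does on its own, here's a tiny throwaway example with the numbers shrunk down for readability:

// Ten numbers in chunks of four: grouped() hands back an Iterator of Seqs,
// here with sizes 4, 4 and 2 (the last chunk just comes up short).
val chunks = (1 to 10).iterator.grouped(4).toList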

The code looks something like this:

// Chunk stdin into groups of 10,000 lines and parse each chunk in parallel;
// LogParser.parseItem is assumed to return an Option[LogRecord].
io.Source.stdin.getLines().grouped(10000)
  .flatMap { chunk =>
    chunk.par.map { line: String =>
      LogParser.parseItem(line)
    }
  }
  .flatMap(x => x) // flatten away the empty results
  .foreach { record: LogRecord =>
    println(record.toString)
  }

So with this, we can read lines from stdin as a buffered source, and also parallelize without the need to hold the entire dataset in memory!

At the moment, there's no easy way that I could get to work of forcing Scala to raise the parallelization level beyond your CPU core count.  This kind of I/O splitting isn't really what the parallel collection operations had in mind as far as I know; it's more a job for Akka or similar.  Fortunately, in Scala 2.10 we'll get Promises and Futures, which will make this kind of thing much more powerful and give us easier knobs and dials to turn on the concurrency configuration.  Hopefully I'll post on that when it happens!
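As a teaser, here's a rough sketch of what the same pipeline might look like with the 2.10 Futures API; the chunking and LogParser are carried over from above, and a custom ExecutionContext could push the thread count past the number of cores for I/O-bound work like this:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Launch eight chunk-parsing Futures at a time, then wait for the batch and print it.
io.Source.stdin.getLines().grouped(10000).grouped(8).foreach { batch =>
  val futures = batch.map(chunk => Future(chunk.flatMap(line => LogParser.parseItem(line))))
  Await.result(Future.sequence(futures), Duration.Inf)
    .flatten
    .foreach(record => println(record.toString))
}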

Saturday, May 12, 2012

Scala is very nice - very very nice

Today I am gushing over Scala's par method and XML literals. I am fetching about 30,000 entries over REST calls. The server isn't super fast on this one, so each call takes a bit of time. Enter list.par stage left.

list.par creates a parallel collection which, given an operation, will perform it in parallel across multiple CPUs.  It spawns threads, performs the operation, then joins all the results together at the end.  Very handy.

This little three letter method is turning what would be a very very long arduous process into a much less long one. Much much less.

// toList first: the Iterator from getLines has no .par of its own
val myList = io.Source.fromFile("list.txt").getLines.toList.par.map { x =>
  callService("FooService", "{id=\"" + x + "\"}")
}

It gets better. In Scala, XML can be declared as a literal. Not only that, but it runs inline like a normal literal, with a few special rules. Here I'm combining a bunch of JSON from the service into XML output.

val myOutput = io.Source.fromFile("list.txt").getLines.toList.par.map { x =>
  callService("FooService", "{id=\"" + x + "\"}")
}.map { x =>
  Json.parse[Map[String, Object]](x)("url").toString
}.map { x =>
  <entry>
    <url>{ x }</url>
  </entry>
}.mkString("\n") // mkString, so we get the XML itself rather than the collection's toString


Which I can now happily write to wherever I need: a file, or a web service response. Nifty in the extreme.

In 2012, we live in a world of JSON and XML. Perl had its day when text processing was king. Today, a language is needed that can cope with JSON, XML and parallelization and still yield sane-looking code. I'm not a big Ruby fan, as anyone who knows me will tell you, but I'm willing to keep an open mind. I'd like to see if Ruby can do this kind of thing as elegantly and easily and demonstrate it's a language for the web in 2012.  I should also mention Akka, though I don't yet know enough about it, other than that it can allegedly take parallelization inter-computer with similar simplicity.

Wednesday, May 9, 2012

Simple Scala scripts : Scan a directory recursively

I'm using Scala increasingly as a scripting language at the moment. As my confidence with it increases, I'm finding it more and more useful for those throw-away scripting situations, especially when they end up being not so throw-away after all.

import java.io.File

// Recursively list path and everything beneath it, keeping only the entries
// the partial function is defined at (the default keeps everything).
def findFiles(path: File, fileFilter: PartialFunction[File, Boolean] = { case _ => true }): List[File] = {
  (path :: path.listFiles.toList.flatMap { f =>
    if (f.isDirectory) findFiles(f) else List(f)
  }).filter(fileFilter.isDefinedAt(_))
}

(replace {} with (), ditch the newlines, and it fits on one line well enough; it just doesn't fit in a Blogger template that way)
We might be duplicating a shell find:

find | grep 'foo'
or
find ./ -name "foo"

And whilst the Scala is more complex, the Scala function can do operations on a File object, which gives you a lot of the rest of the power of the find command thrown into the bargain. Plus, as it accepts a partial function, you can chain filters together (there's a sketch of that further down). If you truly just wanted an analog for find:

def findFiles(path: File): List[File] =
  path :: path.listFiles.toList.flatMap { f =>
    if (f.isDirectory) findFiles(f) else List(f)
  }

Which is less complex than the first. This is still more work than find, but the list you get back is an actual List. If you added anything useful to your find, say an md5 for each file, the shell version gets less happy:
find ./ | awk '{print "\""$0"\""}' | xargs -n1 md5sum
Maybe there's a better way, but that's what I've always ended up doing. The Scala is starting to compete now. Bump the complexity up one more notch, and I think Scala actually starts becoming less code and less obscure; something like the sketch below.
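For instance, a rough sketch of that next notch, building on the findFiles above. The particular filters are invented for illustration, and the md5 helper is plain java.security.MessageDigest:

import java.io.File
import java.security.MessageDigest

// Two example filters, chained with orElse; findFiles only checks isDefinedAt,
// so a file is kept if either filter is defined for it.
val logsOnly: PartialFunction[File, Boolean] = {
  case f if f.getName.endsWith(".log") => true
}
val bigFiles: PartialFunction[File, Boolean] = {
  case f if f.length > 1024 * 1024 => true
}

// Hash a file's contents (fine for scripting; streaming would be kinder to huge files).
def md5(f: File): String = {
  val bytes = java.nio.file.Files.readAllBytes(f.toPath)
  MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString
}

findFiles(new File("."), logsOnly orElse bigFiles).filter(_.isFile).foreach { f =>
  println(md5(f) + "  " + f.getPath)
}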

You might also notice that the example above fits nicely within the Map/Reduce paradigm. Scripting that is not only relatively easy, but can also be thrown at Hadoop for extra pizzazz and NoSQL buzz-worthiness.

Thursday, March 29, 2012

Play and Heroku

I've been messing around with Play, and decided that I'd push it up to Heroku based on the tutorial and things I've heard about Heroku.

I'm going to expand on this later, but if you forget the Procfile when deploying your Play application, it may cause your app to get totally jammed and never be able to start.  I spent an hour or two trying to figure out why my app wouldn't start, even after I'd put the Procfile in.
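For reference, the Procfile for a Play 2.0 app on Heroku was a one-liner along these lines at the time, assuming the standard layout where the staged start script ends up at target/start:

web: target/start -Dhttp.port=${PORT} ${JAVA_OPTS}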

I solved the problem by deleting my app on Heroku and creating a new one.  Then it started fine.

The docs on pushing a Play 2.0 app to Heroku all disagree with one another too, so I hope I can find the time to post a tutorial based on how I got it working!

Saturday, March 24, 2012

First day with Play

I started looking at Play 2.0 for the first time today.  I got a few hours with it at least, and I've been impressed with a few things.

The first and biggest thing is perhaps the simplest: compile on the fly.  Grails does this, but very badly, and if I'm observing right, I think I can see why.  It seems that Play compile-checks at request time, not at file-save time.  As one who grew up computing, Ctrl-S or its equivalent has become a nervous tic.  Grails recompiling at save time almost always just ends up trashing the environment, as I save about ten times a minute, and I end up having to restart it, which is very slow.

With FSC, Play compiles changes very quickly and I barely notice a lag at all.  It hasn't got stuck anywhere that I've noticed yet either, the way Grails can.

I feel like within a couple of hours I got pretty far into having a functional, albeit basic, web app going.  Working with Anorm is interesting too.  I'm not sure if I like how they've done it yet, but after years living with JPA and Hibernate's arcane query structures and unexpected query stupidity (there was always a good reason, but it was still annoying), I find this way of integrating SQL and code better than most.  It has some similarity with O/R Broker, which is what I've been using with Jersey so far, but Anorm is more sophisticated and, I think, easier to work with.
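To give a flavour of what I mean, here's a minimal Anorm-style sketch; the table and case class are invented for illustration:

import anorm._
import play.api.db.DB
import play.api.Play.current

case class Message(id: Long, subject: String)

// Plain SQL with a named placeholder, mapped by hand onto a case class.
def findMessage(id: Long): Option[Message] = DB.withConnection { implicit c =>
  SQL("select id, subject from message where id = {id}")
    .on("id" -> id)()
    .map(row => Message(row[Long]("id"), row[String]("subject")))
    .headOption
}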

The error reporting in Play is also excellent.  You can see quickly and precisely what went wrong with your code; there's no guesswork or decrypting of enigmatic error messages, it just tells you, with your source code right there: this is where it's broken!

Saturday, December 3, 2011

OMG <3 Scala

Today was a good day. I wrote a somewhat complex piece of logic for a jump planner for EVE Online, and, because of the awesomeness of Scala, it worked as conceived, first time.

This is one of the many reasons I love functional programming. As you take your problem and reduce it down to its most basic components, strip out logic, simplify it to a mathematical expression, then simplify that expression, errors suddenly become clear in the very writing of it.

Scala feels so much closer to that expression than Java does. All that verbosity is inherently prone to error; writing error-free code in Scala seems dramatically more possible than in Java.

SBT and Gradle

Listening to a not-quite-recent episode of the Java Posse podcast, I heard them talk about two new build tools that are in the running to succeed Maven and Apache Ivy: SBT and Gradle.

As with the rest of the Java community that's moving on, one is Scala-based and the other is Groovy-based.

I'm not going to get into a Scala vs Groovy thing here, as I can't say I know either of them to a high enough level to really make a fair comparison.

I've had a poke at both of them, and so far I'm really digging the simplicity of SBT and its interactive features for working with Scala, which are super nice.

I've only got a small example going with my EVE project, and it's so small it's barely worth posting. Having said that, it did clue me in to working with ScalaTest, which is much more elegant than JUnit. I'm curious to use it with JUnit and Java code too; I've heard it can work well with both Scala and Java sources. The interactive test suite features seem really awesome: continuous integration built right in.
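For the curious, the kind of thing I mean looks roughly like this; the suite and the planRoute function are made up for illustration, and sbt's ~test will keep re-running it on every save:

import org.scalatest.FunSuite

// A made-up suite for the EVE jump planner; planRoute is a placeholder for the
// planner's entry point. Run with sbt "test", or "~test" for the continuous mode.
class JumpPlannerSuite extends FunSuite {

  test("a route from a system to itself needs no jumps") {
    assert(planRoute("Jita", "Jita") === Nil)
  }

  test("adjacent systems are one jump apart") {
    assert(planRoute("Jita", "Perimeter").length === 1)
  }
}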