Showing posts with label development. Show all posts

Saturday, May 12, 2012

Scala is very nice - very very nice

Today I am gushing over Scala's par method and XML literals. I am fetching about 30,000 entries over REST calls. The server isn't super fast on this one, so each call takes a bit of time. Enter list.par stage left.

list.par creates a parallel version of the list which, given an operation, will perform it in parallel across multiple CPUs. It spawns threads, performs the operation, then joins all the results together at the end. Very handy.

This little three letter method is turning what would be a very very long arduous process into a much less long one. Much much less.

// getLines returns an Iterator, so convert to a List before calling par
val myList = io.Source.fromFile("list.txt").getLines.toList.par.map { x =>
  callService("FooService", "{id=\"" + x + "\"}")
}
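For anyone curious what par is doing on your behalf, here is a rough hand-rolled equivalent using Futures instead of parallel collections. It is only a sketch of the fan-out-then-join idea: slowCall is a made-up stand-in for the real service call, the ids are invented, and the real parallel collections schedule their work differently.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for the slow REST call; not the real callService.
def slowCall(id: String): String = {
  Thread.sleep(50) // simulate network latency
  "{\"id\":\"" + id + "\"}"
}

// Invented ids standing in for the lines of list.txt.
val ids = List("a", "b", "c", "d")

// Fan out: one Future per element, run on the global thread pool.
val pending: Future[List[String]] =
  Future.traverse(ids)(id => Future(slowCall(id)))

// Join: block until every call has finished. Results keep the input order.
val results: List[String] = Await.result(pending, 10.seconds)
```

The appeal of par is that all of this bookkeeping collapses into three letters.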

It gets better. In Scala, XML can be declared as a literal. Not only that, but it can be used inline like any other literal, with a few special rules. This service is combining a bunch of JSON into an XML output.

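// getLines returns an Iterator, so convert to a List before calling par
val myOutput = io.Source.fromFile("list.txt").getLines.toList.par.map { x =>
  callService("FooService", "{id=\"" + x + "\"}")
}.map { x =>
  Json.parse[Map[String, Object]](x)("url").toString
}.map { x =>
  <entry>
    <url>{ x }</url>
  </entry>
}.mkString("\n") // toString would wrap the entries in the collection's own name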


Which I can now happily write to wherever I need to, a file, or a web service response. Nifty in the extreme.
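To show the XML literal side on its own, here is a small self-contained sketch. The URLs and the entries wrapper element are invented for illustration, and it assumes the scala-xml library is on the classpath (it came with the standard distribution at the time of writing; newer Scala versions need it added as a separate module).

```scala
import scala.xml.Elem

// Hypothetical URLs standing in for the values parsed out of the JSON.
val urls = List("http://example.com/a", "http://example.com/b")

// Anything inside { } is ordinary Scala, evaluated and spliced into the literal.
def entry(url: String): Elem =
  <entry>
    <url>{ url }</url>
  </entry>

// A whole sequence of elements can be spliced into a parent element too.
val feed: Elem =
  <entries>
    { urls.map(entry) }
  </entries>

// A single root element serializes cleanly with toString.
val output: String = feed.toString
```

Wrapping the entries in one root element is a design choice for the sketch: it gives a single well-formed document to write out, rather than a loose run of entry elements.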

In 2012, we live in a world of JSON and XML. Perl had its day when text processing was king. Today, a language is needed that can cope with JSON, XML and parallelization and still yield sane-looking code. I'm not a big Ruby fan, as anyone who knows me will tell you, but I'm willing to keep an open mind. I'd like to see if Ruby can do this kind of thing as elegantly and easily, and demonstrate it's a language for the web in 2012. Also, I should mention Akka as well, though I don't yet know enough about it, other than that it can allegedly take parallelization inter-computer with similar simplicity.

Wednesday, May 9, 2012

Simple Scala scripts : Scan a directory recursively

I'm using Scala increasingly as a scripting language at the moment. As my confidence with it is increasing, I'm finding it's becoming more and more useful for those throw-away scripting situations. Especially when they end up being not so throw-away after all.

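import java.io.File

// Only isDefinedAt is consulted, so the partial function's return value is irrelevant.
def findFiles(path: File, fileFilter: PartialFunction[File, Boolean] = {case _ => false}): List[File] = {
  // listFiles returns null for plain files, hence the Option wrapper
  (path :: Option(path.listFiles).map(_.toList).getOrElse(Nil).flatMap {
    findFiles(_)
  }).filter(fileFilter.isDefinedAt(_))
}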

(replace {} with (), ditch newlines and it goes on one line well-enough, just doesn't fit in a Blogger template that way)
We might be duplicating a shell find:

find | grep 'foo'
or
find ./ -name "foo"

And whilst the Scala is more complex, the Scala function can do operations on a File object, which gives you a lot of the rest of the power of the find command thrown into the bargain. Plus, as it accepts a partial function, you can chain together filters. If you truly just wanted an analog for find:

def findFiles(path: File): List[File] =
  // listFiles returns null for plain files, hence the Option wrapper
  path :: Option(path.listFiles).map(_.toList).getOrElse(Nil).flatMap {
    findFiles(_)
  }

Which is less complex than the first. This is still more work than find, but the list you get back is an actual list. If you added anything useful to your find, say an md5 for each file, it gets less happy:
find ./ | awk '{print "\""$0"\""}' | xargs -n1 md5sum
Maybe there's a better way, but that's what I've always ended up doing. The Scala is starting to compete now. Bump up the complexity one more notch, and I think Scala actually starts becoming less code and less obscure.
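As a rough illustration of that next notch, here is how chained filters and a per-file md5 might look together. This is a sketch, not a polished tool: findFiles is restated so the snippet runs on its own, the filter names and file extensions are made up, and reading whole files into memory is only sensible for small files.

```scala
import java.io.File
import java.nio.file.Files
import java.security.MessageDigest

// Restated so the snippet stands alone: walk everything under path and
// keep the entries the partial function is defined at.
def findFiles(path: File, fileFilter: PartialFunction[File, Boolean] = { case _ => true }): List[File] = {
  val children = Option(path.listFiles).map(_.toList).getOrElse(Nil)
  (path :: children.flatMap(findFiles(_))).filter(fileFilter.isDefinedAt(_))
}

// Two small filters; the names and extensions are illustrative only.
val scalaSource: PartialFunction[File, Boolean] = {
  case f if f.isFile && f.getName.endsWith(".scala") => true
}
val javaSource: PartialFunction[File, Boolean] = {
  case f if f.isFile && f.getName.endsWith(".java") => true
}

// orElse chains them: the result is defined wherever either filter is.
val anySource = scalaSource orElse javaSource

// Hex-encoded MD5 of a file's contents (loads the whole file into memory).
def md5Hex(file: File): String = {
  val bytes = Files.readAllBytes(file.toPath)
  MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString
}

// Pair every matching file with its checksum.
def checksums(root: File): List[(String, File)] =
  findFiles(root, anySource).map(f => (md5Hex(f), f))

// Example use, left commented out so nothing is scanned by accident:
// checksums(new File(".")).foreach { case (sum, f) => println(sum + "  " + f.getPath) }
```

The result is a plain list of pairs, so sorting, grouping or spotting duplicates is one more method call rather than another stage in a shell pipeline.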

You might also notice that the example above fits nicely within the Map/Reduce paradigm. Scripting that is not only relatively easy, but can also be thrown at Hadoop for extra pizzazz, and NoSQL buzz-worthiness.
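To make the Map/Reduce shape concrete, here is a tiny in-memory sketch using ordinary collection methods, with no Hadoop involved. The extension helper and the choice of counting files per extension are my own illustration, not part of the script above.

```scala
import java.io.File

// Hypothetical helper: the text after the last dot, or "" if there is none.
def extension(f: File): String = {
  val name = f.getName
  val dot = name.lastIndexOf('.')
  if (dot > 0) name.substring(dot + 1) else ""
}

// Map: every file becomes an (extension, 1) pair.
// Reduce: pairs sharing a key are grouped and their counts summed.
def countByExtension(files: List[File]): Map[String, Int] =
  files
    .map(f => (extension(f), 1))
    .groupBy(_._1)
    .map { case (ext, pairs) => ext -> pairs.map(_._2).sum }
```

In practice the input list would come from findFiles. The map step needs nothing but the single file it is looking at, which is what makes this kind of job easy to spread over many machines.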

On things that "save time"

Over the years, I've often heard claims about things that "save time" in development. For many years, I was gun-shy of IDEs. Too often they broke down, and the entire thing had to be reset and reconfigured from nothing. It made the cost outweigh the benefits. After a few more years passed, IDEs got better, and when somebody introduced me to IntelliJ, I was finally convinced that IDEs could actually save me time overall, rather than cost me time, normally at the most inopportune moment.

So now we have IDEs that don't suck. Take a simple thing like method lookup. What's the time difference between hitting Command-B in IntelliJ, or having to do Ctrl-H, change the tab, and type it in in Eclipse (I'm sure there's a better way in Eclipse, there always is, but it's normally non-obvious)? It amounts to a few seconds at most. So, there's not really a significant difference, right? In time, this is perhaps true, and some might argue that a few seconds here and there can add up, and I might go into that later. For me, the real issue is not time at all, it's space. Space in your brain.

A brain is like a CPU in some ways, and like a CPU it has a cache (at least this seems like a good analogy to me), you have multiple levels, at least L1 and L2, maybe L3.  L1 caches are small and very fast.  They handle what's the immediate focus of attention in your brain right now.  Jumping through the code, tracing back a problem, going up the code path.  When needing to search, instead of jumping directly to the caller, you have to go through a set of operations.  This results in only a small time difference, but, it's like having to put three or four operations in your L1 cache instead of none.  Hitting Ctrl-B is a zero effort operation.  It's just like an op-code - Ctrl-B does this, that's what I need.  Opening the search dialog is a zero effort operation.  Remembering to switch the tab, not a zero effort operation, copy/pasting the right string in, not a zero effort operation, checking to make sure it's including the right files, similar, and if it's a big project, watching the search run, and then popping up an error dialog, not zero effort.

Another four things are now put into focused attention, significantly depleting what's there.  Two seconds of time has busted through maybe 20% or more of a brain's L1 cache (I think I read somewhere that the average human can only concentrate on no more than four to six things at once).  That two seconds can turn into two hours as the most important thing that was being held on to at the top of the stack in your brain which was in "L1" gets lost down into L2 or worse.  We fix the immediate problem, but forget why.  The local manifestation is gone perhaps, but the bigger issue is forgotten, and still very present.

In two seconds, concentration was diminished, which caused two hours of lost time. This is one way every little operation in a development environment can be critical. Is this an exaggeration? I'm not sure it is. Even if it is for this one thing, imagine this problem multiplied by two, or four. Not just one missing zero-cost operation, but two, or three. Suddenly with a more fluid environment, with just a few things made drastically better, development becomes less stilted and happens better.

Saturday, December 3, 2011

OMG <3 Scala

Today was a good day. I wrote a somewhat complex piece of logic for a jump planner for EVE online, and, because of the awesomeness of Scala, it worked as conceived, first time.

This is one of the many reasons I love functional programming. As you take your problem and reduce it down to its most basic components, strip out logic, simplify to a mathematical expression, simplify that expression, suddenly errors become clear in the very writing of it.

Scala feels so much closer to that expression than Java. So much verbosity is inherently prone to error; writing error-free code in Scala seems dramatically more possible than in Java.

SBT and Gradle

Listening to a not quite recent episode of the Java Posse podcast, I heard them talk about two new build tools that are in the running to succeed Maven and Apache Ivy. They are SBT and Gradle respectively.

As with the rest of the Java community who are moving on, one is Scala based and the other is Groovy based.

I'm not going to get into a Scala vs Groovy thing here, as I can't say I know either of them to a high enough level to really make a fair comparison.

I've had a poke at both of them, and so far, I'm really digging the simplicity of SBT and its interactive features for working with Scala, which are super nice.

I've only got a small example going with my EVE project, and it's so small it's barely worth posting. Having said that, it did clue me in to working with ScalaTest, which is much more elegant than JUnit. I'm curious to use it also with JUnit and Java code; I've heard that it can work well with both Scala sources and Java. The interactive test suite features seem really awesome: continuous integration built right in.
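For a sense of how little SBT needs, a build definition along these lines is enough to pull in ScalaTest. Treat it as a sketch rather than my actual build file: the project name is a placeholder, and the version numbers are present-day ones picked for illustration.

```scala
// build.sbt (sketch; project name and versions are placeholders)
name := "eve-jump-planner"

scalaVersion := "2.13.12"

// ScalaTest, available to test code only
libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.17" % Test
```

From the SBT prompt, prefixing a command with a tilde, as in ~test, re-runs it every time a source file changes, which is the continuous testing behaviour mentioned above.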

Tuesday, October 18, 2011

Where is the internet I grew up with?

Following a short post on g+, and a short rant on Facebook, I decided to fill out the idea a bit more.

Where is the internet we grew up with?

For those of us in our 30s, the internet was fresh and shiny in the late nineties (for some, earlier). AltaVista was a search engine of choice, eBay and Amazon were in their infancy. Hotmail was just getting serious and Facebook had probably not really even been thought of. Like many others, I had a GeoCities page, and life seemed good. It was the new digital frontier. Rough and ready, available to be shaped by anyone who could come along and take hold of the new emerging technologies.

Before long, eBay and Amazon became the power-houses they are today. Many other internet ideas came and went, most of them because they just weren't all that compelling, some because of bad business management.

Here we stand, looking into 2012. What is the internet today? Who's hot, and who's not?

Facebook, Twitter, Flickr and Tumblr are today's engines of "Social Media". eBay and Amazon feel like two of the last vanguard of the old ways. Amazon always had a great business case: sell stuff cheap through economies of scale, great shipping and a superior customer experience. People can buy anything they want in a few clicks (1-click maybe), and have it at their door in hours. Gone are the days of 28 days for shipping. Instant gratification for just about any kind of thing you can want (more or less).

eBay, once a giant auction site, is pretty far down the slope of decline. They are now charging such high fees, and their site has been encumbered with so much commercial content, that regular Joe auctions are buried amongst the dross. Weirdly, they feel like they are trying to be Amazon, except that Amazon is already Amazon, and does a darn good job of it. In my opinion, unless eBay wises up fast, they are going to end up holding a bag of stuff with nowhere left to go. People using eBay are complaining more and more, and shifting to other venues for selling things, like Craigslist. At this point unfortunately, there doesn't seem to be much else in public view that presents as an alternative to eBay. It is hard to create a vacuum in such a large market, but eBay is pulling the air out of what used to be their core business model. If that vacuum gets to a critical level, a whole slew of new blood will enter the market, and eBay's dominance will shatter under the pressure their vacuum has created, leaving them with what exactly? A not-quite-as-good-as-Amazon for-sale site?

Flickr seems to be doing okay, but its photo presence has been significantly eroded by Facebook itself, by other sites like Picasa and, for serious photographers, by pay sites like SmugMug, which do a much much better job than Flickr with content management, presentation and semi-pro features.

For many, Facebook has become the internet. Twitter is an add-on, and Tumblr a fascination.

I don't know how things are at Twitter financially, I'm not up on that so much, but I hope things are holding up for them. It's a neat medium, but I'm not sure many people understand why. I think it might be that 140 character limit. It's a nice short tweet. Not intrusive, something you can glance at without being completely distracted from whatever else you are doing. It's the essence of tl;dr. Perfection for a world with ADD. I hope they don't cook what I believe to be their golden goose.

Facebook is in a precarious position. Sitting atop the chaos of the modern age. The hate for Facebook is pretty big, and whilst Google Plus impressed some people, there hasn't been the mass exodus that many had hoped for, or predicted. Google screwed up the name situation. I feel they had good reasons, but ultimately, they shot themselves in the foot. I know that many of the technorati, the people who could make or break it, are the same people who are idealists, and who care about those kinds of details. These are the people who still remember the way Facebook has betrayed them year over year. The name situation made them feel that Google was incompetent and didn't "get it". With them deserting in droves, Google Plus may never be more than a sideshow.

I have to say that so far, I haven't really embraced Tumblr. My daughter has a Tumblr, and so do a few other people I know. I forget if I have created one or not.

What does the next decade of the internet look like? What kinds of things will shape it? Some say the API revolution is a big deal. I'm not so sure. APIs require skill to use, and the number of students taking computer science courses at universities has been dropping off. The software industry was one place the 99% were promised a bright future, and honestly, it's still one place they might be able to get it. The API trend, I worry, is something being driven by the big players. The corporately dominated players. Those that can afford to do API things without compromising their core business model.

What is going on to combat this movement towards utter corporate dominance of the internet of the next decade? I think there is still room for a few garage-built systems to make a hit. People aren't taking a leaf out of Apple's book. Make new technologies, create new revenue streams, future-proof your company by always living in the future. Until they do, the bright and the young can always be a step ahead. The question is if their one-step ahead is really one-step ahead, or one step sideways.

There are some innovations in the software space that are helping with this, things like Ruby on Rails and its software cousins of a similar ilk. These systems are making it possible for the garage bands to make their dreams happen quicker and easier. I'm concerned that some of this 'innovation' is more like a step sideways than a step forward. People are exchanging tried and tested methodologies for bleeding edge systems relying on "the cloud" to build their ideas into reality. This wouldn't be so bad, but the big promise of the cloud, on-demand scalability, is an empty promise for DIY programmers. To achieve real scalability takes knowledge and experience, especially within that environment. For Joe average, this is not present.

The young and the bright can be a step ahead, or a step sideways, but with current technology, hitting a glass ceiling happens too fast. With so many internet users, it takes only one tsunami to hit the shores of your fledgling system to bring it down, where too often it will stay down with no path to recovery. What used to be a quick incremental increase in traffic is today a massive exponential rush. A site can go "viral" in 24 hours or less, and go from a few thousand users to a million people crushing the system. Without experience, it's pretty hard for Joe average to deal with that. The bar to enter the Cyclone that is today's internet has become very high indeed.

How do we solve this problem? I think for some that is the $64,000 question. How do you scale with such rapidity? Systems like MongoDB are being pushed as potential answers. They work in the cloud (more or less), and offer what seems like a good proposition. Unfortunately, they can't do 80% of what a traditional SQL database can do, and once you start growing, that 80% becomes really important. Things like reporting and data warehousing that can help you understand your website users who are now your customers. Other technologies like Map-Reduce are impressive, but most folks don't understand the real value proposition for Map-Reduce, and worse, most Joe average programmers aren't fluent in the ways of functional programming. Heck, I know very few CompSci grads who can put together programs in a functional style after a decade in the field working on imperative systems in Java and others. Lisp and SML and similar are thought of with a level of slight revulsion, a bad college memory of what seems like an arcane way to do things.

So what now? I don't yet know, and that's half the fun! It's undiscovered territory, and no-one really knows what will happen next.