import scala.util.parsing.combinator.RegexParsers

object CSVParser extends RegexParsers {
  // Whitespace is significant inside CSV fields, so do not skip it.
  override def skipWhitespace = false

  // Parse a file lazily, one line per record.
  def apply(f: java.io.File): Iterator[List[String]] =
    scala.io.Source.fromFile(f).getLines().map(line => apply(line))

  // Parse a single line into its fields.
  def apply(s: String): List[String] = parseAll(fromCsv, s) match {
    case Success(result, _) => result
    case failure: NoSuccess => throw new Exception("Parse failed: " + failure.msg)
  }

  def fromCsv: Parser[List[String]] = rep1(mainToken)

  // One field, followed by an optional separating comma.
  def mainToken: Parser[String] =
    (doubleQuotedTerm | singleQuotedTerm | unquotedTerm) <~ ",?".r

  def doubleQuotedTerm: Parser[String] = "\"" ~> "[^\"]+".r <~ "\""
  def singleQuotedTerm: Parser[String] = "'" ~> "[^']+".r <~ "'"
  def unquotedTerm: Parser[String] = "[^,]+".r
}
Tuesday, June 12, 2012
Parsing CSVs in Scala
I did a quick Google search on parsing CSVs in Scala, and one of the top hits was a Stack Overflow question where the answer was wrong. Very wrong. So I threw together a quick parser in Scala to get the job done. I'm not saying it's good, but it passes the spec tests I have, which cover quotes and quoted commas, with both single and double quotes. I hope this is useful, and perhaps somebody can improve upon it.
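The cases mentioned above (plain fields, quoted fields, and commas inside single or double quotes) can be sketched as a handful of assertions. This is only a rough sketch, not my actual spec tests: the names `CsvSketch`, `CsvChecks` and `parseLine` are made up for illustration, the grammar is restated in condensed form only so the snippet runs on its own, and on Scala 2.11 or later it assumes the separate scala-parser-combinators module is on the classpath.

```scala
import scala.util.Try
import scala.util.parsing.combinator.RegexParsers

// Condensed restatement of the grammar from this post (hypothetical name).
object CsvSketch extends RegexParsers {
  override def skipWhitespace = false

  def dq: Parser[String] = "\"" ~> "[^\"]+".r <~ "\""   // "double quoted"
  def sq: Parser[String] = "'" ~> "[^']+".r <~ "'"       // 'single quoted'
  def bare: Parser[String] = "[^,]+".r                   // unquoted
  def field: Parser[String] = (dq | sq | bare) <~ ",?".r
  def record: Parser[List[String]] = rep1(field)

  // Hypothetical helper: parse one line or throw on failure.
  def parseLine(s: String): List[String] = parseAll(record, s) match {
    case Success(fields, _) => fields
    case failure: NoSuccess => throw new Exception("Parse failed: " + failure.msg)
  }
}

object CsvChecks {
  def main(args: Array[String]): Unit = {
    // Plain, unquoted fields.
    assert(CsvSketch.parseLine("a,b,c") == List("a", "b", "c"))
    // A comma inside double quotes stays part of the field.
    assert(CsvSketch.parseLine("\"a,b\",c") == List("a,b", "c"))
    // The same holds for single quotes.
    assert(CsvSketch.parseLine("'x,y',z") == List("x,y", "z"))
    // The grammar has no rule for empty fields, so this input is rejected.
    assert(Try(CsvSketch.parseLine("a,,b")).isFailure)
    println("all checks passed")
  }
}
```

The last check points at one obvious place to improve the parser: every term requires at least one character, so a line with an empty field fails to parse instead of yielding an empty string.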