# Archive for category Clojure

### Lumberjack – lumberjack.nginx (version 0.1.0)

As I posted last time, lumberjack is my start of a log line analyzer/visualizer project in Clojure. This write up will cover the version 0.1.0 lumberjack.nginx namespace.

As this is a version 0.1.0, and to get it out, I am parsing Nginx log lines that take the following format, as all the log lines that I have been needing to parse match it.

173.252.110.27 - - [18/Mar/2013:15:20:10 -0500] "PUT /logon" 404 1178 "http://shop.github.com/products/octopint-set-of-2" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"



The function nginx-logs takes a sequence of Nginx log filenames to convert to a hash representing a log line by calling process-logfile on each one.

(defn nginx-logs [filenames]
(mapcat process-logfile filenames))


The function process-logfile takes a single filename and gets the lines from the file using slurp, and then maps over each of the lines using the function parse-line.

(defn- logfile-lines [filename]
(string/split-lines (slurp filename)))

(defn process-logfile [filename]
(map parse-line (logfile-lines filename)))


At this point, this is sufficient for what I am needing, but have created an issue on the Github project to address large log files, and the ability to lazily read in the lines so the whole log file does not have to reside in memory.

The function parse-line, holds a regex, and does a match of each line against the pattern. It takes each part of the match and associates to a hash using the different parts of the log entry as a vector of the keywords that represent each part of the regex. This is done by reducing against an empty hash and taking the index of the part into match, the result of re-find.

(def parts [:original
:ip
:timestamp
:request-method
:request-uri
:status-code
:response-size
:referrer])

(defn parse-line [line]
(let [parsed-line {}
pattern #"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})? - - $(.*)$ \"(\w+) ([^\"]*)\" (\d{3}) (\d+) \"([^\"]*)\".*"
match (re-find pattern line)]
(reduce (fn [memo [idx part]]
(assoc memo part (nth match idx)))
parsed-line (map-indexed vector parts))))


Looking at this again a few days later, I went and created and issue to pull out the definition of pattern into a different definition, outside of the let, and even the parse-line function. I also want to go back and clean up the parsed-line from the let statement as it does not need to be declared inside the let, but can just pass the empty hash to the reduce. This was setup there before I refactored to a reduce, and was just associating keys one at a time to the index of matched as I was adding parts of the log entry.

Any comments on this are welcome, and I will be posting details on the other files soon as well.

Thanks,
–Proctor

### Lumberjack – Log file parsing and analysis for Clojure

I have just pushed a 0.1.0 version of a new project called Lumberjack. The goal is to be a library of functions to help parse and analyze log files in Clojure.

At work I have to occasionally pull down log files and do some visualization of log files from our Nginx webservers. I decided that this could be a useful project to play with to help me on my journey with Clojure and Open Source Software.

This library will read in a set of Nginx log files from a sequence, and parse them to a structure to be able to analyze them. It currently also provides functionality to be able to visualize the data as a set of time series graphs using Incanter, as that is currently the only graphing library I have seen so far.

A short future list of things I would like to be able to support that come to mind very quickly, and not at all comprehensive:

• Update to support use of BufferedReader for very long log files so the whole file does not have to reside in memory before parsing, and take advantage of lazyness.
• The ability to only construct records with a subset of the parsed data, such as request type, and timestamp.
• The ability to parse log lines of different types, e.g. Apache, IIS or other formats
• Additional graphs other than time series, e.g. bar graphs to show number of hits based off of IP Address.
• Possibility of using futures, or another concurrency mechanism, to do some of the parsing and transformation of log lines into the data structures when working on large log files.

The above are just some of my thoughts on things that might fit well as updates to this as I start to use this more and flush out more use cases.

I would love comments on my code, and any other feedback that you may have. This is still early but I wanted to put something out there that might be of some use to others as well.

You can find Lumberjack on my Github account at https://github.com/stevenproctor/lumberjack.

–Proctor

### Log File parsing with Futures in Clojure

As the follow up to my post Running Clojure shell scripts in *nix enviornments, here is how I implemented an example using futures to parse lines read in from standard in as if the input was piped from a tail and writing out the result of parsing the line to standard out.

First due to wanting to run this a script from the command line I add this a the first line of the script:


!/usr/bin/env lein exec


As well, I will also be wanting to use the join function from the clojure.string namespace.


(use '[clojure.string  nly (join)])


When dealing with futures I knew I would need an agent to adapt standard out.

(def out (agent *out*))


I also wanted to separate each line by a new line so I created a function writeln. The function takes a Java Writer and calls write and flush on each line passed in to the function:

(defn writeln [^java.io.Writer w line]
(doto w
(.write (str line "\n"))
.flush))


Next I have my function to analyze the line, as well as sending the result of that function to the agent via the send-off function.

(defn analyze-line [line]
(str line "   " (join "  " (map #(join ":" %) (sort-by val > (frequencies line))))))

(defn process-line [line]
(send-off out writeln (analyze-line line)))


The analyze-line function is just some sample code to return a string of the line and the frequencies of each character in the line passed in. The process-line function takes a line and calls send-off to the agent out for the function writeln with the results of calling the function analyze-line.

With all of these functions defined I now need to just loop continuously and process lines that are not empty, and call process-line for each line as a future.

(loop []
(when line
(future (process-line line)))
(recur)))


### Running Clojure shell scripts in *nix environments

I was recently trying to create a basic piece of Clojure code to play with “real-time” log file parsing by playing with futures. The longer term goal of the experiment is to be able to tail -f a log file pipe that into my Clojure log parser as input.

As I wasn’t sure exactly what I would need to be doing, I wanted an easy way to run some code quickly without having to rebuild the jars through Leiningen every time I wanted to try something, in a manner similar to the way I am thinking I will be using it if the experiment succeeds.

I created a file test_input with the following lines:

1 hello
2 test
3 abacus
4 qwerty
5 what
6 dvorak


With this in place, my goal was to be able to run something like cat test_file | parser_concept. After a bit of searching I found the lein-exec plugin for Leiningen, and after very minor setup I was able to start iterating with input piped in from elsewhere.

The first step was to open my profiles.clj file in my ~/.lein directory. I made sure lein-exec was specified in my user plugins as so:

{:user {:plugins [[lein-exec "0.2.1"]
;other plugins for lein
]}}


With this in place I just put the following line at the top of my script.clj file:

#!/usr/bin/env lein exec


I then changed the permissions of script.clj file to make it executable, I was able to run the following and have my code run against the input.

cat test_input | ./script.clj


I will be posting a follow up entry outlining my next step of experimenting with “processing” each line read in as a future.

### Looking for a new job

The company I have been working for had to just go through a round of layoffs, due to not getting one of their main contracts renewed do to lack of funding by the state, and I have been included in that round.

My primary experience has been in C# on the Microsoft .Net framework for the past 10 years. Of those 10 years, 9 of them were working against the same product, and product suite allowing me to learn how some inconsequential decisions are not so inconsequential long term, and the value of care and commitment to a code base can be.

I am thinking that this would be a good opportunity to open myself up in trying to venture into a new language and toolset such as Ruby or Clojure. This is my open announcement for anybody who would like to poach a .Net developer if you are having problems finding developers fluent in your programming language.

–Proctor

### DFW Clojure User Group

This is for those of you not on the Clojure Google Group mailing list.

There is now a Clojure User Group for the Dallas/Fort Worth metroplex. Alex Robbins has setup a meetup group which can be found here http://www.meetup.com/DFW-Clojure/.

The kickoff meeting is set for Wednesday, September 19, 2012.

Thanks to Alex and the sponsors of this user group. Let’s make it a good one.

–Proctor

### Clojure function has-factors-in?

Just another quick post this evening to share a new function I created as part of cleaning up my solution to Problem 1 of Project Euler.

Was just responding to a comment on Google+ on my update sharing the post Project Euler in Clojure – Problem 16, and I saw the commenter had his own solution to problem 1. In sharing my solution I realized that I could clean up my results even further, and added a function has-factors-in?. These updates have also been pushed to my Project Euler in Clojure Github repository for those interested.

(defn has-factors-in? [n coll]
(some #(factor-of? % n) coll))


(defn problem1
([] (problem1 1000))
([n] (sum (filter #(or (factor-of? 3 %) (factor-of? 5 %))) (range n))))


It now becomes:

(defn problem1
([] (problem1 1000))
([n] (sum (filter #(has-factors-in? % [3 5]) (range n)))))


This change makes my solution read even more like the problem statement given.

–Proctor

### Project Euler in Clojure – Problem 16

Here is my solution to Problem 16 of Project Euler. As always, my progress you can tracked on GitHub at https://github.com/stevenproctor/project-euler-clojure.

Problem 16 of Project Euler is:

What is the sum of the digits of the number 2^1000

This problem was straight forward since I already had the function digits-of defined from problem8. I was able to be very declarative in my problem, so much so, that it reads as the problem statement you are asked to solve.

(defn problem16
([] (problem16 1000))
([n] (sum (digits-of (expt 2 n)))))


As always, any feedback you have for me is greatly appreciated.

–Proctor

### Project Euler in Clojure – Problem 15

Here is my solution to Problem 15 of Project Euler. As always, my progress you can tracked on GitHub at https://github.com/stevenproctor/project-euler-clojure.

Problem 15 of Project Euler is to find the starting number with the longest Collatz sequence, summarized from the problem page as:

Starting in the top left corner of a 22 grid,
there are 6 routes (without backtracking) to the bottom right corner.

How many routes are there through a 2020 grid?

I started this problem, by trying to tracing and counting the routes through grids of 2×2, 3×3, and 4×4, and even setled in and did a 5×5 square. Having these numbers, and knowing I had two choices for ever position I was in, except for when the path got to the far edge and bottom, I had a hint at the growth rate of the problem. I tried some powers of 2 with the relationship of the numbers, and some factorials with the numbers. After seeing some possible relationships with the factorials that might be leading me in the right direction, I tried a number of permutation calculations, and the combination calculations. Having seen the numbers show up in different combination results, I then spent time back-calculating from those numbers into my ns, and found that the pattern seemed to match 2n Choose n.

The source code to this was the factorial function:

(defn factorial [n]
(multiply (range 1M (inc (bigdec n)))))


And, I could have done it recursively, but I figured I would just operate against the sequence of numbers, especially now that the reducers are available in the Clojure 1.5-Alpha 3 release (at the time of this writing) of Clojure. After I get through a few more problems (of which I am working ahead of these posts), I am thinking it would be interesting to run the same Project Euler Problems against 1.4 and 1.5 using the reducers library, just substituting map/reduce for the reduce/combine functionality, and seeing how much effort it takes to move them over, as well as the differences in the timings of the different problems.

The other base function I needed was a combination function:

(defn combination [n k]
(cond (zero? n) 0
(zero? k) 1
:else (/ (factorial n) (* (factorial (- n k)) (factorial k)))))


This function just does the basic calculation for combinations, from the formula:

$\frac{n!}{\big((n-k)!\ *\ k!\big)}$

With that, and my stumbling upon the matching of the fact that ${2n}\choose{n}$ is the solution to ne number of paths through the square the function problem15 is defined as:

(defn problem15
([] (problem15 20))
([n] (combination (+ n n) n)))


As always, any feedback you have for me is greatly appreciated.

–Proctor

### Aspect Oriented Timing in C# with Castle Windsor

I was making some refurbishments on some reporting code in our application that used EF and was suffering from the Select N+1 problem. If truth, it was much worse, as it was an Select N+1 problem up to 6 levels deep depending on where the report was run from.

I was changing the code to use a denormalized view from the database, and then run a SQL Query using Entity Framework. When doing this I was asked to get the timings of the report, both against the new way, and the existing way.

As this is incidental to what I was really trying to do, I did not want to litter timing code, and logging mechanisms into classes that already existed. This smelled of Aspect Oriented Programming (AOP). While I had not done anything using AOP before, I knew that it was great for cross-cutting concerns like logging, timings, etc. Having been digging into Clojure and LISP recently, this also seemed like cases of the :before, :after and :around methods in Common LISP, or the similar behavior in Eiffel as pointed out in Bertrand Meyer’s Object Oriented Software Construction, not to mention the time function in Clojure which is a function whose single concern is simply the to manage capturing the timing a function passed into it. My hope was to simplify, or un-complect, the code, and keep those concerns separate.

In our project, we have Castle Windsor setup as the IOC container, and Windsor supports a type of Aspect Oriented Programming using something called Interceptors. I found documentation on setting it up on a post by Ayende, and one Andre Loker. The issue was some of the places I wanted to setup the capturing of the timings were in different areas than where the handlers were registered for Windsor.

After some hunting around, I managed to come up with being able to add an interceptor to an already registered component by using the following line of code, where the IReportingService is the class I want to time the method calls around, and the LogTimingInterceptor is the class that captures the timing of the method call and sends it to the logger:

container.Kernel.GetHandler(typeof(IReportingService)).ComponentModel.Interceptors.Add(new InterceptorReference(typeof(LogTimingInterceptor)));


Hope someone else can find this useful,
–Proctor