Milinda Pathirage

“If you can dream it, you can do it.” - Walt Disney

Copying Additional Files With Gradle Application Plugin

When building application distributions for Java apps, you often need to bundle default configuration files, other resources, etc. into the distribution. If you are using Gradle with the Gradle Application Plugin to create the application distribution for your project, you can use the following code fragment in your Gradle build script to copy additional files.

applicationDistribution.from("src/main/resources/conf") {
    into "conf"
}

The code fragment above copies the contents of the conf directory into a conf directory located at the root of your application distribution, next to the bin and lib directories.
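
For context, a minimal build.gradle using this fragment might look like the following sketch; the main class name is an assumption:

apply plugin: 'application'

// Hypothetical main class; replace with your application's entry point.
mainClassName = 'com.example.Main'

// applicationDistribution is provided by the application plugin and
// describes the contents of the generated distribution.
applicationDistribution.from("src/main/resources/conf") {
    into "conf"
}

Running gradle distZip (or gradle distTar) then produces an archive with bin, lib and conf at its root.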

View →


Publishing Play Web Application Metrics to InfluxDB

metrics-play, a fork of the original metrics-play project, can be used to publish Play web application metrics, including JVM metrics, to InfluxDB via the Graphite protocol.

To add support for publishing Play application metrics to InfluxDB, first add metrics-play with the Graphite reporter to your Play app dependencies, like below:

libraryDependencies ++= Seq(
  "com.kenshoo" %% "metrics-play" % "2.3.0_0.2.1-graphite",
  javaJdbc,
  javaEbean,
  cache,
  javaWs
)

Then you have to register the Play plugin com.kenshoo.play.metrics.MetricsPlugin in your play.plugins file.
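
A minimal conf/play.plugins entry might look like the following sketch; the priority number 500 is an assumption, any value that does not clash with your other plugins should work:

500=com.kenshoo.play.metrics.MetricsPlugin

With the plugin registered, you can enable and configure the Graphite reporter in the Play application configuration like below: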

metrics {
  graphite {
    enabled = true
    period = 1
    unit = MINUTES
    host = localhost
    port = 2003
  }
}
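
With this configuration the reporter writes plain-text Graphite lines (metric path, value, Unix timestamp) to localhost:2003 once a minute. A sample line for an assumed JVM heap gauge might look like:

jvm.memory.heap.used 123456789 1431000000

The exact metric paths depend on what your application registers.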

The steps above assume that you have configured the InfluxDB Graphite input plugin like below.

...

Continue reading →


Server Monitoring Solution Using Grafana, InfluxDB and collectd

A couple of days ago, I wanted to add a couple of new nodes to the Ganglia deployment I maintain to monitor HTRC services and cluster nodes. Even though everything looked okay after installing and configuring the Ganglia monitor daemons on the new machines, I couldn’t get them to publish monitoring data to Ganglia’s gmetad. Worse, I couldn’t find any errors (I am not sure whether I looked in the correct location, but I couldn’t find anything). I first tried to install Performance Co-Pilot with Netflix Vector, but couldn’t figure out how to set up a central metric collection server. Even though the PCP and Vector combination looked really great, having to type a node’s host name every time I wanted to monitor a server was not what I wanted.

So I decided to give the Grafana, InfluxDB and collectd combination a try; the collectd side of the wiring is sketched below. I was able to get this setup working within a couple of hours, with several dashboards for...
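
The key piece is pointing collectd’s network plugin at the machine where InfluxDB listens for collectd traffic. A minimal sketch of the relevant /etc/collectd/collectd.conf section; the host name is an assumption and 25826 is collectd’s conventional network port:

LoadPlugin network
<Plugin network>
  Server "monitoring.example.org" "25826"
</Plugin>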

Continue reading →


Interesting Resources on Writing

Starting to participate in The100DayProject by writing every day for 100 days got me to research more about writing. Writing is a major part of life as a grad student. But I was far behind on my writing, and I wanted to improve by writing frequently. It is well known that writing is hard, and it will always be hard for a lot of us. But writing more and more will make you better at it. The following two articles contain some interesting ideas and tips on how to get better at writing.

  • Make Writing a Part of Your Identity
  • How to write 1000 words a day

‘Make Writing a Part of Your Identity’ has some interesting ideas about how to make writing a habit and why doing so is important. I strongly recommend that article to anyone who wants to get better at writing. It contains a couple of major ideas.

  • You have to track your writing if you want to improve it.
  • You have to make writing a...

Continue reading →


Versioning REST APIs

Yesterday, a discussion about versioning REST APIs resulted in an interesting sequence of events where one person even started to attack me and one other person via personal e-mails. So I wanted to explore versioning REST APIs further, to understand the problem and the solutions better. Let’s start by looking at why versioning is needed.

As Troy Hunt discussed in his popular post on API versioning, the main reason is the evolution of software. It’s hard (maybe even impossible) to get software right in the first release. As the world moves on, new requirements come up. So introducing a new version is unavoidable.

When your API gets used by various clients, you may have to maintain multiple versions. It’s not realistic to expect that everyone will migrate to the new version within a short period.

There are multiple popular ways to version a REST API (two common styles are sketched below), and there are proponents and...
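
As an illustration, here is the same request in the two styles most often debated, versioning in the URI path versus versioning through the Accept header; the paths and media type are assumptions:

GET /api/v2/customers/42 HTTP/1.1
Host: api.example.org

GET /api/customers/42 HTTP/1.1
Host: api.example.org
Accept: application/vnd.example.v2+json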

Continue reading →


CQL - Continuous Query Language

In today’s data-driven economy, organizations depend heavily on data analytics to stay competitive. Advances in Big Data related technologies have transformed how organizations interact with data, and as a result more and more data is generated at ever increasing rates. Most of this data is available as continuous streams, and organizations utilize stream processing technologies to extract insights in real time (or as data arrives). As a result of this change in how we collect and process data, stream processing platforms like Apache Storm, Spark Streaming and Apache Samza were created, building on about a decade of experience with Big Data processing technologies such as Hadoop.

But these modern platforms lack support for SQL-like declarative query languages (see the sketch below) and require sound knowledge of imperative-style programming and distributed systems to use them effectively. But for broader...
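
For a taste of the declarative style CQL offers, here is a sketch of a windowed continuous query; the PageViews stream and its schema are assumptions:

-- Average page-load time per page over the last five minutes,
-- recomputed continuously as new events arrive.
SELECT page, AVG(loadTime)
FROM PageViews [RANGE 5 MINUTES]
GROUP BY page;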

Continue reading →


Freshet - CQL based Clojure DSL for Streaming Queries (Draft)

This blog post is still a draft.

Interest in continuous queries over streams of data has increased over the last couple of years, due to the need to derive actionable information as soon as possible to stay competitive in a fast moving world. Existing Big Data technologies designed for batch processing couldn’t handle today’s near real-time requirements, and distributed stream processing systems like Yahoo’s S4, Twitter’s Storm, Spark Streaming and LinkedIn’s Samza were introduced into the fast growing Big Data eco-system to tackle them. These systems are robust, fault tolerant and scalable enough to handle massive volumes of streaming data, but lack first class support for SQL-like querying capabilities. All of these frameworks provide high-level programming APIs in JVM compatible languages.

In the golden era of stream processing research, a lot of work has been done...

Continue reading →


Good Reads: October 9th 2014

  • Mining of Massive Datasets - This is a very interesting book which covers many topics in Big Data, including Map-Reduce, Recommendation Systems, Mining Social-Network Graphs, Dimensionality Reduction and Large-Scale Machine Learning. I am currently in the 2nd chapter, but have already found a lot of interesting things related to Map-Reduce, such as modeling relational algebra using Map-Reduce (a toy sketch follows this list). Those who are interested in large scale data mining can also follow the free online course from the authors of this book.
  • Visualizing MNIST: An Exploration of Dimensionality Reduction - This is also a really interesting post about dimensionality reduction in the context of machine learning. The post is well written; even someone not familiar with machine learning, deep learning or dimensionality reduction can read it and understand the underlying concepts with the help of awesome...
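
As a taste of the book’s Map-Reduce treatment of relational algebra, selection (filtering a relation R with a condition C) becomes a map that emits only the tuples satisfying C, plus an identity reduce. A toy sketch in Scala, where the relation’s schema is an assumption:

// Toy relation R(a, b); schema assumed for illustration.
case class R(a: Int, b: String)

// Map: emit (t, t) only when the tuple satisfies the predicate a > 10.
def mapSelect(t: R): Seq[(R, R)] =
  if (t.a > 10) Seq((t, t)) else Seq.empty

// Reduce: identity; each surviving tuple passes through unchanged.
def reduceSelect(key: R, values: Seq[R]): Seq[R] = Seq(key)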

Continue reading →


Good Reads: September 30th 2014

  • Linearizability versus Serializability - Clarifies the differences between linearizability and serializability, two important properties about interleavings of operations in databases and distributed systems.
  • Paper Summary: High-availability distributed logging with BookKeeper - A summary of a paper on distributed logging with high availability, where many distributed readers are interested in reading the logs.
  • Turning the database inside out with Apache Samza - A different way of thinking about databases and how we develop database-backed applications. Proposes the idea of applying the stream abstraction everywhere, from the database to backend web services to the UI.

Continue reading →


Academic Writing With Markdown, Pandoc and Emacs

LaTeX is the de-facto standard for academic writing, and there are several editors available for it. But the problem with these editors, and generally with editing LaTeX in any editor like Emacs, is that LaTeX is not writer friendly. LaTeX commands dominate your document and can be distracting most of the time.

What we want is a writer-friendly LaTeX editor as described here [1]. But we don’t have such an editor with the features described in [1]. One alternative I found [2] is to use a combination of Markdown, Pandoc and LaTeX.

The basic process is as follows:

  1. Create LaTeX header and footer files, where the header includes everything up to the abstract and the footer includes the bibliography and the document end tag. Any package imports or new command definitions can go in the header.
  2. Write the main content in Pandoc Markdown.
  3. Convert the Markdown file to LaTeX using Pandoc (see the sketch after this list).
  4. Append the generated LaTeX file and the footer LaTeX file...
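
Steps 3 and 4 can be scripted; a minimal sketch, assuming the files are named header.tex, body.md and footer.tex:

# Convert the Markdown body to a LaTeX fragment.
pandoc body.md -t latex -o body.tex
# Stitch the header, generated body and footer into the final document.
cat header.tex body.tex footer.tex > paper.tex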

Continue reading →