GridLogZ: In-Memory Log Analysis

History

I wanted to create a proof of concept that demonstrated how to use JBoss Data Grid (JDG) with JBoss EAP, and in particular the map / reduce framework in JDG. However, I needed a data source: a big data source.

I decided to use log data because analyzing log files is an established use case for big data / analytics. Analyzing log files with Apache Hadoop is one example.

However, analyzing log files with Apache Hadoop is an offline, batch-oriented process. First, the log file is imported into HDFS. Then, it is analyzed by running a MapReduce job.

Inspiration

A log file contains log messages.

What if the log messages were persisted to a distributed data store in real time in addition to or instead of a log file?

What if they were persisted to an in-memory data grid?

Inspired by Splunk:

  • Do without access to log files.
    No more requests for log files or access to log files.
  • Aggregate log data from multiple servers / applications.
    No more opening of multiple log files in a single text editor.

Inspired by NoSQL / Big Data:

  • The log data could be distributed.
  • The log data could be analyzed with map / reduce tasks.

Project

GridLogZ is a set of components that enable the persistence and analysis of log messages from JBoss EAP in JBoss Data Grid (JDG).
https://github.com/shane-k-j/gridlogz

  • Common – The model. It is based on the java.util.logging.LogRecord class.
  • Services – REST services for persisting and analyzing log messages.
  • Log Handler – A log handler that persists log messages via the services.
  • Web – The front end. HTML5 + D3.
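
To make the Common and Log Handler components more concrete, here is a minimal sketch of how a java.util.logging handler could copy a LogRecord into a flat model object and pass it on instead of writing to a file. This is not the GridLogZ source: the names SinkLogHandler and LogEntry, and the use of a Consumer as the sink, are assumptions made for illustration. In the real project the sink would presumably be a call to the REST services.

```java
import java.util.function.Consumer;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

/** Hypothetical handler: forwards each log record to a sink instead of a file. */
class SinkLogHandler extends Handler {

    /** Hypothetical flat model holding a few fields copied from LogRecord. */
    static final class LogEntry {
        public final long millis;
        public final String level;
        public final String loggerName;
        public final String sourceClassName;
        public final String message;

        LogEntry(LogRecord record) {
            this.millis = record.getMillis();
            this.level = record.getLevel().getName();
            this.loggerName = record.getLoggerName();
            this.sourceClassName = record.getSourceClassName();
            this.message = record.getMessage();
        }
    }

    // Assumption: the sink stands in for whatever persists the entry
    // (for example, a client that posts to a REST service).
    private final Consumer<LogEntry> sink;

    SinkLogHandler(Consumer<LogEntry> sink) {
        this.sink = sink;
    }

    @Override
    public void publish(LogRecord record) {
        // Respect the handler's level and filter, as the Handler contract expects.
        if (record == null || !isLoggable(record)) {
            return;
        }
        sink.accept(new LogEntry(record));
    }

    @Override
    public void flush() {
        // Nothing is buffered in this sketch.
    }

    @Override
    public void close() {
        // No resources to release in this sketch.
    }
}
```

A handler like this could be attached with `Logger.getLogger("").addHandler(new SinkLogHandler(entry -> ...))`, so that every record logged through java.util.logging also reaches the sink.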

Screenshot

This is a tree map chart that shows the number of log records per logger (package) and per class. A class is displayed as a rectangle. The size of the rectangle is based on the number of log messages. Further, the class rectangles are grouped by package to form a larger rectangle.

[Screenshot: gridlogz-treemap]
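
The counts behind a chart like this can be sketched as a two-level grouping: first by logger (package), then by class. The example below runs in a single JVM over an in-memory list and does not use the JDG map / reduce API. The class TreeMapCounts, the Entry type, and the sample logger and class names are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TreeMapCounts {

    /** Hypothetical minimal record: logger (package) name plus source class name. */
    static final class Entry {
        final String loggerName;
        final String className;

        Entry(String loggerName, String className) {
            this.loggerName = loggerName;
            this.className = className;
        }
    }

    /** Counts log entries per logger, and within each logger per class. */
    static Map<String, Map<String, Long>> countByLoggerAndClass(List<Entry> entries) {
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (Entry e : entries) {
            counts.computeIfAbsent(e.loggerName, k -> new TreeMap<>())
                  .merge(e.className, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample data only; the names are made up for this sketch.
        List<Entry> entries = Arrays.asList(
            new Entry("com.example.web", "com.example.web.LoginServlet"),
            new Entry("com.example.web", "com.example.web.LoginServlet"),
            new Entry("com.example.web", "com.example.web.CartServlet"),
            new Entry("com.example.data", "com.example.data.OrderDao"));
        System.out.println(countByLoggerAndClass(entries));
    }
}
```

Each inner count would determine the size of a class rectangle, and each outer key would correspond to one package group in the tree map.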

Screencast

This is the screencast that was shown at Red Hat Summit last week. It is one part presentation and one part demonstration. It does not include an audio soundtrack.


What’s Next?

I’m planning to publish a post tomorrow morning with a screencast that demonstrates how to build and install GridLogZ with JBoss EAP and JDG. In addition, I’m planning to update the README on GitHub. While the UI is limited to a few charts with the initial commit, I’m writing distributed tasks to retrieve log messages (e.g. by time) and return them as JSON. However, I could use some help with the front end.

Update: As promised: build, install, and configure GridLogZ (link).

About Shane K Johnson

Technical Marketing Manager, Red Hat Inc.

2 Comments on “GridLogZ: In-Memory Log Analysis”

  1. Alexandro Podkopaev Says:

    Hi!
    Comparing GridLogZ to logstash+elasticsearch, GridLogZ looks like a tool for running analytics prebuilt by a developer, not ad hoc analytics. Am I getting it right?

    • Shane K Johnson Says:

      Hi Alexandro,

      You are more or less right. My original intention was to create a set of flexible map / reduce tasks that support a variety of arguments. On one hand, these m / r tasks are not quite ad hoc. On the other hand, they could and should be written in such a way as to support a variety of arguments.

      Shane
