Hamlog, Prince of Denmark

I’m running a session on analysing log files next week. For an example file, I turned the script of Hamlet from Project Gutenberg into a log file by carefully adjusting the text and adding some timestamps.

Hamlog imitation log file
Text from Project Gutenberg, http://www.gutenberg.org/ebooks/1524
2016-09-05 15:19:14<SHAKESPEARE.>SCENE. Elsinore.
Scene I. Elsinore. A platform before the Castle.
[Francisco at his post. Enter to him Bernardo.]
2016-09-05 15:20:23<Ber.>Who's there?
2016-09-05 15:20:45<Fran.>Nay, answer me: stand, and unfold yourself.
2016-09-05 15:21:00<Ber.>Long live the king!
2016-09-05 15:21:12<Fran.>Bernardo?
2016-09-05 15:21:22<Ber.>He.
2016-09-05 15:21:43<Fran.>You come most carefully upon your hour.
2016-09-05 15:22:07<Ber.>'Tis now struck twelve. Get thee to bed, Francisco.
2016-09-05 15:22:30<Fran.>For this relief much thanks: 'tis bitter cold,
And I am sick at heart.
2016-09-05 15:22:55<Ber.>Have you had quiet guard?
2016-09-05 15:23:11<Fran.>Not a mouse stirring.
2016-09-05 15:23:25<Ber.>Well, good night.
If you do meet Horatio and Marcellus,
The rivals of my watch, bid them make haste.
2016-09-05 15:24:15<Fran.>I think I hear them.—Stand, ho! Who is there?
2016-09-05 15:24:30<SHAKESPEARE.>[Enter Horatio and Marcellus.]
2016-09-05 15:24:46<Hor.>Friends to this ground.
2016-09-05 15:25:03<Mar.>And liegemen to the Dane.
2016-09-05 15:25:18<Fran.>Give you good-night.
2016-09-05 15:25:35<Mar.>O, farewell, honest soldier;
Who hath reliev'd you?
2016-09-05 15:25:59<Fran.>Bernardo has my place.
Give you good-night.
2016-09-05 15:26:15<SHAKESPEARE.>[Exit.]
2016-09-05 15:26:29<Mar.>Holla! Bernardo!
2016-09-05 15:26:39<Ber.>Say.
What, is Horatio there?
2016-09-05 15:27:01<Hor.>A piece of him.
2016-09-05 15:27:22<Ber.>Welcome, Horatio:—Welcome, good Marcellus.
2016-09-05 15:27:45<Mar.>What, has this thing appear'd again to-night?
2016-09-05 15:28:00<Ber.>I have seen nothing.
2016-09-05 15:28:19<Mar.>Horatio says 'tis but our fantasy,
And will not let belief take hold of him
Touching this dreaded sight, twice seen of us:
Therefore I have entreated him along
With us to watch the minutes of this night;
That, if again this apparition come
He may approve our eyes and speak to it.
2016-09-05 15:29:58<Hor.>Tush, tush, 'twill not appear.
2016-09-05 15:30:12<Ber.>Sit down awhile,
And let us once again assail your ears,
That are so fortified against our story,
What we two nights have seen. 

And so on.

These are some of the exercise outlines, in case people want to play around with the data:

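#helper library for working with dates
library(lubridate)
#helper library for manipulating and summarising data
library(dplyr)
#helper library for cleaning up messy data
library(tidyr)
#helper library for graphmaking (ggplot2 is assumed here; the outline does not name one)
library(ggplot2)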

#don't forget to set the working directory to this file's location with the RStudio Session menu - Set Working Directory - To Source File Location command

#read in the data
hlines <- readLines("../datafiles/hamlog.txt")

#check if lines need removing at the start or end using the head and tail functions
head(????, n=30)
tail(????, n=30)

#remove any lines that need removing
hlines <- hlines[????:length(hlines)]
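One possible way to fill in the blanks above, as a minimal sketch rather than the session's official answer. A temporary file stands in for ../datafiles/hamlog.txt, and the idea of finding the first entry with a timestamp pattern (the `first_entry` name and its regular expression) is my own assumption; counting the header lines by eye from the head() output works just as well.

```r
# A few lines from the Hamlog sample, written to a temporary file
# that stands in for ../datafiles/hamlog.txt
sample_lines <- c(
  "Hamlog imitation log file",
  "Text from Project Gutenberg, http://www.gutenberg.org/ebooks/1524",
  "2016-09-05 15:19:14<SHAKESPEARE.>SCENE. Elsinore.",
  "2016-09-05 15:20:23<Ber.>Who's there?",
  "2016-09-05 15:20:45<Fran.>Nay, answer me: stand, and unfold yourself."
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# read in the data and look at the start of it
hlines <- readLines(tmp)
head(hlines, n = 30)

# assumption: the first line that starts with a date is the first real entry
first_entry <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2} ", hlines)[1]

# drop everything before that line
hlines <- hlines[first_entry:length(hlines)]
```

The same approach with tail() would show whether anything needs trimming from the end of the file.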

#noting that it seems to use <>, we check whether either symbol appears twice on a line, which might confuse the splitting
#we use grep() looking for "<.*<" or ">.*>"
#your code goes here

#there are no double instances, but before we act we should confirm that < and > always appear on the same lines
#let's use grep with "<" to make a set
setlessthan <- grep(????)
#and use grep with ">" to make a set 
setgreaterthan <- grep(????)
#and a setdiff set function to see all the members of one set that are not in the other set
setdiff(setlessthan, setgreaterthan)
setdiff(setgreaterthan, setlessthan)
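The delimiter checks above could be sketched like this, using three lines from the sample instead of the full file. The names `double_open` and `double_close` are mine, and `fixed = TRUE` is an optional extra that makes grep() treat the symbol as literal text rather than a pattern.

```r
# three lines from the Hamlog sample, including one follow-on line
hlines <- c(
  "2016-09-05 15:22:30<Fran.>For this relief much thanks: 'tis bitter cold,",
  "And I am sick at heart.",
  "2016-09-05 15:22:55<Ber.>Have you had quiet guard?"
)

# lines where a delimiter appears more than once (empty means none found)
double_open <- grep("<.*<", hlines)
double_close <- grep(">.*>", hlines)

# line numbers containing each delimiter
setlessthan <- grep("<", hlines, fixed = TRUE)
setgreaterthan <- grep(">", hlines, fixed = TRUE)

# both differences should be empty if < and > always travel together
setdiff(setlessthan, setgreaterthan)
setdiff(setgreaterthan, setlessthan)
```

grep() returns line numbers by default, which is what makes the set comparison with setdiff() possible.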

#and there are no differences, which makes it safe to split on < and >

#put the data into a data frame so we can use dplyr and tidyr
hlog <- data.frame(ln = hlines, stringsAsFactors = FALSE)

#so we write a test split
hlog %>%
 separate(ln, into=c("time", "process", "message"), sep="[<>]", extra="merge") %>%
 View()
#but we want the follow-on lines in the message column
#we can use a mutate that tests whether there is anything in the process column
#if there is, use the existing message value; if there is not, use the value currently in time
#within a dplyr chain, the test for a blank process is is.na(process)

hlog %>%
 separate(ln, into=c("time", "process", "message"), sep="[<>]", extra="merge") %>%
 mutate(message = ifelse(????, ????, ????)) %>%
 View()
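Here is one way the split and the ifelse() might be filled in, shown on three sample lines. Two things go beyond the outline and are my own additions: `fill = "right"`, which tells separate() to pad short rows with NA quietly instead of warning, and the second line of the mutate, which blanks the time column on follow-on rows so that it no longer holds message text. The result is stored in `parsed` (a name I made up) rather than sent to View() so the sketch runs anywhere.

```r
library(dplyr)
library(tidyr)

# three lines from the Hamlog sample, including one follow-on line
hlog <- data.frame(
  ln = c(
    "2016-09-05 15:22:30<Fran.>For this relief much thanks: 'tis bitter cold,",
    "And I am sick at heart.",
    "2016-09-05 15:22:55<Ber.>Have you had quiet guard?"
  ),
  stringsAsFactors = FALSE
)

parsed <- hlog %>%
  # split on < or >; anything after the second delimiter stays in message
  separate(ln, into = c("time", "process", "message"),
           sep = "[<>]", extra = "merge", fill = "right") %>%
  mutate(
    # follow-on lines have no process, so their text landed in time
    message = ifelse(is.na(process), time, message),
    # my addition: clear the misplaced text out of time on those rows
    time = ifelse(is.na(process), NA_character_, time)
  )

parsed
```

The order inside mutate() matters: message has to copy the text out of time before time is blanked.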

# if we are happy with what we are seeing, we can build on the result
# for example, in the mutate (or another mutate), by converting the time column from text to a date-time
# modify the above code using lubridate's ymd_hms() function to convert the time column
# you can do multiple mutates by using commas to separate the changes in one mutate, or pipe to a second
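A small sketch of the conversion step, assuming the split has already produced a text time column with NA on follow-on rows. The `parsed` and `timed` data frames are hypothetical stand-ins built by hand from the sample lines.

```r
library(dplyr)
library(lubridate)

# hand-built stand-in for the split data, with time still stored as text
parsed <- data.frame(
  time = c("2016-09-05 15:22:30", NA, "2016-09-05 15:22:55"),
  process = c("Fran.", NA, "Ber."),
  message = c("For this relief much thanks: 'tis bitter cold,",
              "And I am sick at heart.",
              "Have you had quiet guard?"),
  stringsAsFactors = FALSE
)

# ymd_hms() parses year-month-day hour:minute:second text into date-times
timed <- parsed %>%
  mutate(time = ymd_hms(time))

class(timed$time)
```

ymd_hms() uses UTC unless a `tz` argument is given, which is worth keeping in mind if the real log was written in local time.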

#next, we can build on the results by adding some fill steps for time and process that repeat the entries to fill in the spaces
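The fill step might look like the sketch below. It assumes the follow-on rows already hold NA in both time and process; tidyr's fill() then copies the last non-missing value downwards. The `parsed` and `filled` names are hypothetical.

```r
library(dplyr)
library(tidyr)

# hand-built stand-in for the split data, with gaps on the follow-on row
parsed <- data.frame(
  time = c("2016-09-05 15:22:30", NA, "2016-09-05 15:22:55"),
  process = c("Fran.", NA, "Ber."),
  message = c("For this relief much thanks: 'tis bitter cold,",
              "And I am sick at heart.",
              "Have you had quiet guard?"),
  stringsAsFactors = FALSE
)

# fill() carries each value down into the NA rows beneath it
filled <- parsed %>%
  fill(time, process)

filled
```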

#as we are interested in log entries rather than lines, we can group by time stamp and process, with group_by(time, process)
#then combine the grouped entries with paste(message, collapse="\n")
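Grouping and combining could be sketched as follows, on a hand-built `filled` data frame (a hypothetical name) where every row already has its time and process. The `.groups = "drop"` argument is an extra of mine that needs dplyr 1.0.0 or later; it just returns an ungrouped result.

```r
library(dplyr)

# hand-built stand-in: one row per line, time and process already filled in
filled <- data.frame(
  time = c("2016-09-05 15:22:30", "2016-09-05 15:22:30", "2016-09-05 15:22:55"),
  process = c("Fran.", "Fran.", "Ber."),
  message = c("For this relief much thanks: 'tis bitter cold,",
              "And I am sick at heart.",
              "Have you had quiet guard?"),
  stringsAsFactors = FALSE
)

# one row per log entry, with the lines of each entry joined by newlines
entries <- filled %>%
  group_by(time, process) %>%
  summarise(message = paste(message, collapse = "\n"), .groups = "drop")

entries
```

Assigning the result to `entries` instead of piping it to View() keeps it available for later steps.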

#instead of View()ing the result, we might want to save it as an object to work with

#now that you have a complete data set of log entries, you can make a graph of when each process was active.
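One possible graph, as a sketch only: the outline does not say which graphing library or chart type to use, so ggplot2 and a simple point-per-entry timeline are my assumptions. The `entries` data frame is a hand-built stand-in using four timestamps from the sample.

```r
library(ggplot2)

# hand-built stand-in: one row per log entry, time already a date-time
entries <- data.frame(
  time = as.POSIXct(c("2016-09-05 15:20:23", "2016-09-05 15:20:45",
                      "2016-09-05 15:21:00", "2016-09-05 15:21:12"),
                    tz = "UTC"),
  process = c("Ber.", "Fran.", "Ber.", "Fran."),
  stringsAsFactors = FALSE
)

# one point per entry: time along the x axis, one row per process
activity_plot <- ggplot(entries, aes(x = time, y = process)) +
  geom_point() +
  labs(x = "Time", y = "Process", title = "When each process was active")

# print(activity_plot) draws the graph
```

With the full play, the same plot shows at a glance which characters are talking in which stretch of the log.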
