I’m running a session on analysing log files next week, and for an example file I turned the script of Hamlet from Project Gutenberg into a log file by carefully adjusting the text and adding some timestamps.
Hamlog imitation log file
Text from Project Gutenburg, http://www.gutenberg.org/ebooks/1524
2016-09-05 15:19:14<SHAKESPEARE.>SCENE. Elsinore. ACT I. Scene I. Elsinore. A platform before the Castle. [Francisco at his post. Enter to him Bernardo.]
2016-09-05 15:20:23<Ber.>Who's there?
2016-09-05 15:20:45<Fran.>Nay, answer me: stand, and unfold yourself.
2016-09-05 15:21:00<Ber.>Long live the king!
2016-09-05 15:21:12<Fran.>Bernardo?
2016-09-05 15:21:22<Ber.>He.
2016-09-05 15:21:43<Fran.>You come most carefully upon your hour.
2016-09-05 15:22:07<Ber.>'Tis now struck twelve. Get thee to bed, Francisco.
2016-09-05 15:22:30<Fran.>For this relief much thanks: 'tis bitter cold, And I am sick at heart.
2016-09-05 15:22:55<Ber.>Have you had quiet guard?
2016-09-05 15:23:11<Fran.>Not a mouse stirring.
2016-09-05 15:23:25<Ber.>Well, good night. If you do meet Horatio and Marcellus, The rivals of my watch, bid them make haste.
2016-09-05 15:24:15<Fran.>I think I hear them.—Stand, ho! Who is there?
2016-09-05 15:24:30<SHAKESPEARE.>[Enter Horatio and Marcellus.]
2016-09-05 15:24:46<Hor.>Friends to this ground.
2016-09-05 15:25:03<Mar.>And liegemen to the Dane.
2016-09-05 15:25:18<Fran.>Give you good-night.
2016-09-05 15:25:35<Mar.>O, farewell, honest soldier; Who hath reliev'd you?
2016-09-05 15:25:59<Fran.>Bernardo has my place. Give you good-night.
2016-09-05 15:26:15<SHAKESPEARE.>[Exit.]
2016-09-05 15:26:29<Mar.>Holla! Bernardo!
2016-09-05 15:26:39<Ber.>Say. What, is Horatio there?
2016-09-05 15:27:01<Hor.>A piece of him.
2016-09-05 15:27:22<Ber.>Welcome, Horatio:—Welcome, good Marcellus.
2016-09-05 15:27:45<Mar.>What, has this thing appear'd again to-night?
2016-09-05 15:28:00<Ber.>I have seen nothing.
2016-09-05 15:28:19<Mar.>Horatio says 'tis but our fantasy, And will not let belief take hold of him Touching this dreaded sight, twice seen of us: Therefore I have entreated him along With us to watch the minutes of this night; That, if again this apparition come He may approve our eyes and speak to it.
2016-09-05 15:29:58<Hor.>Tush, tush, 'twill not appear.
2016-09-05 15:30:12<Ber.>Sit down awhile, And let us once again assail your ears, That are so fortified against our story, What we two nights have seen.
and so on
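To make the layout concrete: each entry is a timestamp, then a speaker in angle brackets (standing in for a process name), then the message. The snippet below is a small base R sketch of pulling one entry apart. It is not part of the session material; the regular expression, the variable names, and the UTC time zone are my own assumptions for illustration.

```r
# One entry copied from the sample above
entry <- "2016-09-05 15:20:23<Ber.>Who's there?"

# Capture groups: everything before "<", the text between "<" and ">",
# and whatever follows the ">"
pattern <- "^([^<]*)<([^>]*)>(.*)$"
parts <- regmatches(entry, regexec(pattern, entry))[[1]]

# parts[1] is the full match; the capture groups start at parts[2]
# (the time zone is an assumption, the file does not state one)
timestamp <- as.POSIXct(parts[2], format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
speaker <- parts[3]
text_part <- parts[4]

print(timestamp)
print(speaker)
print(text_part)
```

This only handles a single well-formed entry; lines without a timestamp need the extra handling that the exercises work towards.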
And here is the outline of some of the exercises, if people want to play around with the data:
#helper library for working with dates
library(lubridate)
#helper library for manipulating and summarising data
library(dplyr)
#helper library for cleaning up messy data
library(tidyr)
#helper library for graphmaking
library(ggplot2)

#don't forget to set the working directory to this file with the Session menu - Set Working Directory - To Source File Location command
#read in the data
hlines <- readLines("../datafiles/hamlog.txt")

#check if lines need removing at the start or end using the head and tail functions
head(????, n=30)
tail(????, n=30)

#remove any lines that need removing
hlines <- hlines[????:length(hlines)]

#noting that it seems to use <>, we check if there are any double uses of either symbol that might confuse splitting
#we use grep() looking for "<.*<" or ">.*>"
#your code goes here

#there are no double instances but before we act, we should confirm the < and > are all in the same lines
#let's use grep with "<" to make a set
setlessthan <- grep(????)
#and use grep with ">" to make a set
setgreaterthan <- grep(????)
#and a setdiff set function to see all the members of one set that are not in the other set
setdiff(setlessthan, setgreaterthan)
setdiff(setgreaterthan, setlessthan)
#and there are no differences, which makes it safe to split on < and >

#put the data into a data frame so we can use dplyr and tidyr
hlog <- data.frame(ln = hlines, stringsAsFactors = FALSE)

#so we write a test split
hlog %>%
  separate(ln, into=c("time", "process", "message"), sep="[<>]", extra="merge") %>%
  View()

#but we want the follow-on lines in the message column
#we can use a mutate with a test of if there is anything in the process column
#if there is, use the existing message value, if there is not use the value currently in time
#so testing if process is blank within a dplyr chain is is.na(process)
hlog %>%
  separate(ln, into=c("time", "process", "message"), sep="[<>]", extra="merge") %>%
  mutate(message = ifelse(????, ????, ????)) %>%
  View()

#if we are happy with what we are seeing, we can build on the result
#for example in the mutate (or another mutate) converting the time column to be time rather than text
#modify the above code using lubridate's ymd_hms() function to convert the time column
#you can do multiple mutates by using commas to separate the changes in one mutate, or pipe to a second

#next, we can build on the results by adding some fill steps for time and process that repeat the entries to fill in the spaces
#as we are interested in log entries rather than lines, we can group by time stamp and process, with group_by(time, process)
#then combine the grouped entries with paste(message, collapse="\n")
#instead of View()ing the result, we might want to save it as an object to work with

#now that you have a complete data set of log entries, you can make a graph of when each process was active.
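For anyone who wants to see the whole chain run without the real file, here is one possible sketch on a made-up four-line vector. It is an illustration rather than the model answer for the session: the `toy` vector and its line break, the `entries` and `p` names, the explicit step that blanks `time` on follow-on lines, and the choice of `geom_point()` for the graph are all my assumptions. The `.groups` argument needs dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for hlines: three entries, the first with a follow-on line
toy <- c(
  "2016-09-05 15:22:30<Fran.>For this relief much thanks: 'tis bitter cold,",
  "And I am sick at heart.",
  "2016-09-05 15:22:55<Ber.>Have you had quiet guard?",
  "2016-09-05 15:23:11<Fran.>Not a mouse stirring."
)

# Sanity checks: no doubled brackets, and "<" and ">" occur on the same lines
doubled <- c(grep("<.*<", toy), grep(">.*>", toy))
mismatched <- c(setdiff(grep("<", toy), grep(">", toy)),
                setdiff(grep(">", toy), grep("<", toy)))

entries <- data.frame(ln = toy, stringsAsFactors = FALSE) %>%
  # fill = "right" pads follow-on lines with NA instead of warning
  separate(ln, into = c("time", "process", "message"),
           sep = "[<>]", extra = "merge", fill = "right") %>%
  mutate(
    # a follow-on line has no process, and its text landed in the time column
    message = ifelse(is.na(process), time, message),
    time = ifelse(is.na(process), NA_character_, time),
    time = ymd_hms(time)
  ) %>%
  # carry the last seen time stamp and process down onto follow-on lines
  fill(time, process) %>%
  group_by(time, process) %>%
  summarise(message = paste(message, collapse = "\n"), .groups = "drop")

# One simple view of when each "process" was active
p <- ggplot(entries, aes(x = time, y = process)) + geom_point()
```

Grouping by time stamp and process treats two entries from the same speaker in the same second as one entry, which is harmless for this file but worth keeping in mind for real logs.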