\section{Log Varieties}
A computer network contains a variety of different infrastructure
devices, each of which may be instrumented to produce multiple audit
logs. Although computer and network audit logs are broad enough to be a
topic of their own, we feel a brief survey of some of the different
types of logs is an important starting point in understanding the
issues of sharing heterogeneous logs for network measurement and
security research.

Note that we deliberately emphasize {\em heterogeneous}
audit logs. The fact that the audit logs are different is significant
because it promotes multiple views for discovery, robustness against
attack, interoperability, extensibility and flexibility. However,
heterogeneity also provides new avenues of attack against the
anonymization system. While one type of log may not be enough to
break the anonymization system, information may be inferred from
multiple logs that can be used in a successful attack against
anonymized data. Thus we seek to create anonymization schemes for
many types of logs.

What follows is a description of commonly implemented network and system logs
summarized from \cite{Yurcik03}. These logs provide situational
awareness of what is happening where and when on networks/systems by
auditing system activities, transactions performed and network
signaling. They are useful for detecting network problems and
malicious activity, and for recovering from accidental or intentional
failures.

\subsection{TCPdump}
One of the most common ways of collecting network data into
logs is through the use of the TCPdump utility. This utility captures
headers from packets on a network interface set in promiscuous mode
and displays the binary data in a human-readable format.

While TCPdump is a valuable tool, it focuses only on the TCP/IP suite
of protocols. There are a variety of other utilities for sniffing raw
packets of any protocol from any point on a network. Referred to as
{\it sniffers}, the most common examples are the open source tool
Ethereal and Sniffer from Network General Corporation (until
recently, Sniffer was owned by McAfee). As networks
increasingly employ switching technology, sniffers that rely on a shared
medium network (e.g. traditional Ethernet) are being moved from end
systems to servers and routers (and now wireless networks).
Alternatively, commercial switches often provide special ports
created specifically for sniffers to tap into. Sniffer logs are
valuable in discerning low-level attacks such as abnormal traffic
attacks (e.g. 802.11 ARP poisoning); however, their scope is limited
by their monitoring position within a network, and their size makes them
cumbersome to analyze.

\subsection{NetFlows}
NetFlow logs contain records of unidirectional flows between
computer/port pairs across an instrumentation point (e.g. router) on
a network. Ideally, there is an entry per socket. These records can
be exported from routers or from software such as ARGUS or NTOP. NetFlows
are a rich source of information for traffic analysis, consisting of
some or all of the following fields depending on version and configuration:
IP address pairs (source/destination), port pairs
(source/destination), protocol (TCP/UDP), packets per second,
time-stamps (start/end and/or duration) and byte counts.
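
As a rough illustration of the record layout described above, the following Python sketch models a single flow record and groups unidirectional flows that belong to the same pair of endpoints. The class, field and function names are hypothetical and do not correspond to any particular NetFlow export version; a packet count is stored here instead of a packet rate, which is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One unidirectional flow; all field names are illustrative assumptions."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str    # e.g. "TCP" or "UDP"
    start: float     # start time-stamp in seconds
    end: float       # end time-stamp in seconds
    packets: int
    byte_count: int

    def conversation_key(self):
        """Direction-independent key shared by both halves of a connection."""
        a = (self.src_ip, self.src_port)
        b = (self.dst_ip, self.dst_port)
        return (self.protocol,) + tuple(sorted((a, b)))


def pair_flows(records):
    """Group unidirectional flows by the endpoint pair they connect."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.conversation_key()].append(rec)
    return dict(groups)
```

Grouping by a sorted endpoint pair is only one possible design; it ignores time-stamps, so two separate connections that reuse the same ports would end up in the same group.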

\subsection{Syslog}
Syslog is a UNIX standard for capturing information about networked
devices, daemon processes and even kernel messages. Messages are
encoded by level (e.g. warning, error, emergency) and by facility
(e.g. service areas such as printing, e-mail and network). Syslog can also
serve as a distributed error manager by forwarding log entries to
centralized syslog servers for processing. Syslogs can be
pattern-matched for known attack signatures. They can also be
searched for potentially suspicious activities such as critical events
(e.g. modules being loaded and core dumps), unsuccessful login
attempts, new account creation (especially accounts with special
privileges), suspicious connections to unused ports, or simply the
cessation of logging messages from a host (which may indicate that the logging
process has been suspended or the logs wiped).
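
To make the level/facility encoding concrete, the sketch below decodes the priority value that prefixes a message in the BSD syslog wire format, where the priority equals the facility code times eight plus the severity code. The function name is hypothetical, the facility table is deliberately partial, and the sketch assumes the raw message still carries its leading priority field.

```python
import re

# Severity codes 0-7 as used by the BSD syslog protocol.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

# Only a few facility codes are listed here for brevity.
FACILITIES = {0: "kernel", 1: "user", 2: "mail", 3: "daemon",
              4: "auth", 6: "printer"}

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_priority(message):
    """Split a raw syslog message into (facility, severity, remainder)."""
    match = PRI_RE.match(message)
    if match is None:
        raise ValueError("no <PRI> field found")
    facility_code, severity_code = divmod(int(match.group(1)), 8)
    if facility_code > 23:
        raise ValueError("priority value out of range")
    # Facilities missing from the partial table get a generic label.
    facility = FACILITIES.get(facility_code, "facility%d" % facility_code)
    return facility, SEVERITIES[severity_code], match.group(2)
```

For example, a message beginning with a priority of 34 decodes to the auth facility at critical severity, since 34 = 4 * 8 + 2.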

\subsection{Workstation Logs}
Workstation logs are produced by standard utilities that keep login/logout entries
on a workstation's local hard disk (e.g. the Windows Event Viewer). Some
application software also maintains access logs. For example, virus
scanners maintain logs of previous scans. Virus scanners and mail
agents themselves may log all outgoing mail messages as
well. Workstation logs are enabled by default on most operating
systems, but it is almost always possible for an adversary with
escalated privileges to disable them.

\subsection{ARP Cache}
Routers and switches contain cached tables of recent resolutions of
IP addresses to MAC addresses called ARP (Address Resolution Protocol)
caches. The entries are of two types: {\em dynamic} entries that are
added/removed automatically over time and {\em static} entries that
remain in the cache until the computer is restarted. Each dynamic
ARP cache entry has a potential lifetime of between 2 and 10 minutes
(depending on operating system settings, traffic levels and cache
size), and a log of all entries can be created over a specified time
period.

The ARP cache is useful for determining static IP addresses,
identifying unregistered/unknown (including maliciously spoofed) and
misconfigured devices attached to a network, detecting certain layer 2
attacks (e.g. ARP poisoning), identifying what IP address(es) a
particular hardware address is using, debugging connectivity problems
a device may be experiencing and tracking unsuccessful connection attempts to
devices that either are not currently on the network or do not exist
(e.g. port scans to non-existent hosts). These logs are
becoming more important with the recent growth of wireless networks
and the ease with which it is possible to perpetrate ARP poisoning and
man-in-the-middle attacks against them.
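
Two of the uses listed above, finding the IP addresses a hardware address is using and spotting possible ARP poisoning, can be sketched as follows. The input is assumed to be a list of (time-stamp, IP address, MAC address) tuples collected from periodic dumps of the cache; that layout and the function name are hypothetical.

```python
from collections import defaultdict


def summarize_arp_log(entries):
    """Summarize (timestamp, ip, mac) tuples from periodic ARP cache dumps.

    Returns (conflicts, ips_by_mac): IP addresses seen with more than one
    MAC address, and the IP addresses each MAC address has used.
    """
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)
    for _timestamp, ip, mac in entries:
        mac = mac.lower()  # normalize case so duplicates compare equal
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)
    # An IP address resolved to several MACs may indicate ARP poisoning,
    # or simply replaced hardware, so treat it as a lead rather than proof.
    conflicts = {ip: sorted(macs) for ip, macs in macs_by_ip.items()
                 if len(macs) > 1}
    return conflicts, {mac: sorted(ips) for mac, ips in ips_by_mac.items()}
```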

\subsection{DNS Cache}
DNS (Domain Name System) caches contain mappings between
fully-qualified hostnames and their corresponding IP addresses based
on recent requests to other name servers. The amount of time a name
server retains cached data is controlled by the time-to-live (TTL)
field for the data. These logs can be created via periodic snapshots
of the cache, taken with a period shorter than the minimum TTL so that
each entry is captured at least once before it expires. Host tables
(e.g. /etc/hosts), which also map hostnames to IP addresses, provide static
mapping information. DNS cache records provide useful evidence
of attacks perpetrated against DNS services and sometimes of DDoS
worms that create high volumes of DNS queries for a target.
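
The timing constraint on cache snapshots can be stated in a few lines. In this sketch, which uses a hypothetical function name and an assumed safety factor, the polling period is chosen as a fraction of the smallest TTL. A record cached for $T$ seconds is then present during at least one snapshot whenever the period is shorter than $T$.

```python
def snapshot_period(ttls_seconds, safety_factor=0.5):
    """Return a polling period strictly shorter than the smallest TTL.

    safety_factor is an assumed tuning knob in the open interval (0, 1).
    """
    if not 0 < safety_factor < 1:
        raise ValueError("safety_factor must lie strictly between 0 and 1")
    positive = [ttl for ttl in ttls_seconds if ttl > 0]
    if not positive:
        raise ValueError("need at least one positive TTL")
    return min(positive) * safety_factor
```

In practice the smallest TTL may be very short, so a floor on the polling period (and the resulting loss of short-lived entries) is a trade-off this sketch does not address.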

\subsection{Dial-up Servers}
Dial-up server logs maintain system accounting records of who makes
incoming network connections. They are a very reliable source of
information to investigators because the log is difficult to poison
with false information. Even if an attacker steals another user's
credentials to log in, the telephone records are very difficult to fake,
thus giving away the attacker's location if nothing else. However, as
VoIP becomes more prevalent, the reliability of telephone records may
diminish somewhat. Also, dial-up connections are much slower than
many attackers can tolerate, and with the prevalence of broadband, attacks
through dial-up servers have become less common. We expect VPN logs
to take over the role of dial-up server logs in the near future.

\subsection{Kerberos}
Kerberos logs contain records of all ticket requests and
uses. This information can be used to generate login graphs and
determine who was logged into a particular workstation at a particular
time. This may help in detecting tickets that have been compromised,
perhaps by a brute-force password attack, and used by automated tools
and scripts.

\subsection{SNMP}
SNMP (Simple Network Management Protocol) logs, referred to as
Management Information Bases (MIBs), are databases of managed objects
that store information about a wide variety of network devices.
The SNMP operator application monitors network devices via
polls to network device agents for specified MIB information or traps
from network device agents notifying the operator of an event.

\subsection{Routing Tables}
Routing table logs (e.g. inter-domain BGP, intra-domain OSPF or RIP)
provide information about routing-based attacks and errors ranging
from individual misbehaving routers that drop/misroute packets or
inject disruptively large routing tables to the systemic
network-wide advertisement of false routing information or instability
caused by the propagation of worms. Global, local or peer routing
tables all provide different vantage points for analysis.

\subsection{Firewalls}
A firewall is a computer or network device that interfaces between an internal
network/computer and external networks that are trusted to a lesser
extent (e.g. the Internet) to enforce an organizational access control
policy by processing packets/connections based on a rule set. Note that
this definition is being expanded, as there are now personal firewalls
that are installed on workstations. These differ in that they
are positioned at the endpoints and hence can be more application-aware.
Firewall logs are important in a recursive way: they help maintain the
effectiveness of the firewall's internal rule set. A rule set specifies exactly
what traffic to permit/block, and in a commercial deployment the number of
rules typically grows beyond human comprehension.

Firewall logs can be used to monitor both normal activity (types of
services requested and used, common external IP addresses accessing
internal services, common access time patterns) and suspicious
activity (probes to ports with no authorized services,
external-to-internal flows with spoofed internal IP addresses,
out-bound connections from uncharacteristic internal machines and
modification/disabling of the firewall rule set).
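
One of the suspicious patterns listed above, an external-to-internal flow that carries an internal source address, lends itself to a short check. The sketch below uses Python's standard \texttt{ipaddress} module; the function name, its parameters and the idea that a log entry records the arrival interface are assumptions made for illustration.

```python
import ipaddress


def is_spoofed_inbound(src_ip, arrived_on_external, internal_networks):
    """Flag a packet that arrived from outside but claims an internal source.

    internal_networks is a list of CIDR strings such as "10.0.0.0/8".
    """
    if not arrived_on_external:
        return False
    addr = ipaddress.ip_address(src_ip)
    # Membership test against each internal prefix.
    return any(addr in ipaddress.ip_network(net) for net in internal_networks)
```

A real deployment would also have to account for address translation and for legitimately routed internal traffic, which this sketch ignores.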

\subsection{Intrusion Detection Systems}
Logs from Intrusion Detection Systems (IDS) contain alerts indicating
specific attacks that have occurred. Generally, while a firewall has a
proactive, preventative focus, an IDS has a passive, reactive focus.
The assumption is that eventually someone will break through the
perimeter firewalls, and then one needs to be able to detect
intruders. IDSs can be categorized by sensor
placement (network versus host) and by technique (signature versus
anomaly), with all types producing both alerts and detailed logs.
Real-time IDSs have been plagued by large log sizes and high
false-positive rates, especially in anomaly-based systems, but incremental
improvements are increasing their effectiveness for post-mortem forensics.

\subsection{Mail Servers}
Mail logs maintain a record of completed transactions (as well as a queue
of pending transactions) including the sender and recipient addresses,
subject lines, time-stamps and file sizes. Common log reports
include: total length of time spent receiving and sending
e-mail, the number of e-mails by an entity (organization, group, or
individual) over a specific period of time (day/week/month),
stratification of e-mail by time (work hours/off-hours) and common
addresses, stratification by size and type of file attachments and
identification of dormant accounts.
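
Two of the reports mentioned above, the number of e-mails per sender per day and the split between work hours and off-hours, could be computed along the following lines. The transaction tuple layout, the function name and the 9-to-17 weekday definition of work hours are all assumptions made for this sketch.

```python
from collections import Counter
from datetime import datetime


def mail_report(transactions, work_start=9, work_end=17):
    """Summarize (iso_timestamp, sender, recipient, size_bytes) tuples.

    Returns (per_sender_day, by_period): message counts keyed by
    (sender, date) and by "work hours" / "off-hours".
    """
    per_sender_day = Counter()
    by_period = Counter()
    for stamp, sender, _recipient, _size in transactions:
        when = datetime.fromisoformat(stamp)
        per_sender_day[(sender, when.date().isoformat())] += 1
        # Monday-Friday between work_start (inclusive) and work_end (exclusive).
        working = when.weekday() < 5 and work_start <= when.hour < work_end
        by_period["work hours" if working else "off-hours"] += 1
    return per_sender_day, by_period
```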

\subsection{Web Servers}
Web server logs have traditionally been used to provide feedback on
performance and misconfigurations (e.g. link errors). Web logs provide
detailed records of requests to the web server and statistical
information about network traffic. Web log record attributes can
include: the source IP address from which a request was generated,
whether the request was satisfied, a userid determined by HTTP
authentication, a status code and the size of the object returned
with each satisfied request.
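
The attributes listed above match the fields of the widely used Common Log Format, so a parser for that format gives a feel for what a web log record contains. That a given server writes this exact format is an assumption, and the function and group names are hypothetical.

```python
import re

# host ident user [time] "request" status size
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was returned; store it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```

A line such as \texttt{127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache\_pb.gif HTTP/1.0" 200 2326} yields the source address, the authenticated userid, the status code and the object size as separate fields.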

While traditionally used for performance information, web logs are
being used more frequently for security analysis. They can be used
to detect illegitimate requests (e.g. asking to run a script in a
directory that should not be accessible) that exploit
misconfigurations, buffer overflow attacks on the web server used to
run arbitrary processes with the privileges of the web server daemon
and attacks targeting specific web applications and scripts that are
not secure.

\subsection{DHCP}
Dynamic Host Configuration Protocol (DHCP) server logs can be used to
track IP address assignments to devices as they join/leave a network.
DHCP servers manage two databases: (1) an Address Pool database for
holding IP addresses and other network configurations and (2) a
Binding database for mappings between hardware MAC addresses and
entries in the Address Pool. Though most frequently used to assign
dynamic IP addresses, DHCP can also assign a dedicated static address
to a device that re-joins. On a network that uses DHCP with dynamic
addresses, maintaining a log is absolutely necessary to be able to
forensically associate dynamically changing IP addresses with specific devices.
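
The forensic association described above amounts to asking which hardware address held a given IP address at a given time. A minimal sketch, assuming the DHCP log has already been reduced to (start, end, IP address, MAC address) lease tuples that do not overlap for the same IP address, might look like this; the tuple layout and function names are hypothetical.

```python
from bisect import bisect_right


def build_index(leases):
    """Index (start, end, ip, mac) lease tuples by IP address.

    Times are seconds; leases for one IP are assumed not to overlap.
    """
    index = {}
    for start, end, ip, mac in leases:
        index.setdefault(ip, []).append((start, end, mac))
    for spans in index.values():
        spans.sort()
    return index


def device_at(index, ip, when):
    """Return the MAC address bound to ip at time `when`, or None."""
    spans = index.get(ip, [])
    starts = [span[0] for span in spans]
    # Last lease that started at or before `when`.
    pos = bisect_right(starts, when) - 1
    if pos >= 0 and when < spans[pos][1]:
        return spans[pos][2]
    return None
```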

\subsection{Scanners}
Scanners for defensive purposes are used to perform risk management by
detecting vulnerabilities and notifying system administrators of the
vulnerabilities and of the patches that need to be installed. They typically
generate reports rather than logs, though. Scanners range from simple
port scanners such as NMAP, which report open ports and the operating
systems detected, to very advanced scanners such as NESSUS, which runs NMAP to
detect open ports and then determines the exact versions of the services running and
whether these systems are vulnerable to known exploits. Along with
the rise in managed security, there are companies such as Qualys that
provide proprietary scanners through a web interface and scan for
you. Qualys, for example, will scan from the outside (or from the inside if
you purchase a special network device) and produce a complete,
customizable report in multiple formats that indicates what
vulnerabilities your systems have. With these managed services you
typically get easier-to-read reports and the most up-to-date
vulnerability databases.

While this list of sixteen log types may seem exhaustive, it is not.
Potential log sources are more numerous than the
services and daemon processes in deployment. There are many
proprietary logs we did not mention, including reference monitor
alerts, router traps and a myriad of application software
logs (though many of the latter log through syslog on UNIX-based
platforms). The challenge to those working on the problem of log
anonymization for security is that they must consider as many log
sources as possible and try to generate a basis of logs that contain
as much information as possible with minimal overlap between logs.

264: as much information as possible with minimal over lap between logs.