\section{Log Varieties}
A computer network contains a variety of infrastructure devices, each
of which may be instrumented to produce multiple audit logs. Although
computer and network audit logs are a broad topic in their own right,
we feel a brief survey of the different types of logs is an important
starting point for understanding the issues of sharing heterogeneous
logs for network measurement and security research.

We deliberately emphasize {\em heterogeneous} audit logs. Differences
among audit logs are significant because they provide multiple views
for discovery, robustness against attack, interoperability,
extensibility and flexibility. However, heterogeneity also opens new
avenues of attack against the anonymization system. While a single
type of log may not be enough to break the anonymization system,
information inferred by correlating multiple logs may enable a
successful attack against anonymized data. Thus we seek to create
anonymization schemes for many types of logs.

What follows is a description of commonly implemented network and
system logs, summarized from \cite{Yurcik03}. These logs provide
situational awareness of what is happening where and when on networks
and systems by auditing system activities, transactions performed and
network signaling. They are useful for detecting network problems and
malicious activity, and for recovering from accidental or intentional
failures.

\subsection{TCPdump}
One of the most common ways of collecting network data into logs is
the TCPdump utility. TCPdump captures headers from packets on a
network interface set in promiscuous mode and displays the binary data
in a human-readable format.
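
The Python sketch below makes the translation from binary to human-readable concrete. It decodes the fixed 20-byte portion of an IPv4 header into named fields and prints a one-line summary in the spirit of TCPdump's output. It is an illustration under simplifying assumptions, not TCPdump's implementation: IP options, fragmentation and checksum verification are ignored. The helper names \texttt{decode\_ipv4\_header} and \texttt{summarize} are hypothetical.

\begin{verbatim}
```python
import socket
import struct

# IANA protocol numbers for a few common IP payloads.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}

# Fixed 20-byte IPv4 header: version/IHL, TOS, total length, ID,
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def decode_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed part of an IPv4 header (hypothetical helper).

    IP options, fragmentation and checksum verification are ignored.
    """
    if len(raw) < IPV4_HEADER.size:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = IPV4_HEADER.unpack(raw[:IPV4_HEADER.size])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def summarize(raw: bytes) -> str:
    """Render a one-line, tcpdump-style summary of the header."""
    h = decode_ipv4_header(raw)
    return (f"IP {h['src']} > {h['dst']}: {h['protocol']} "
            f"ttl {h['ttl']} length {h['total_len']}")


if __name__ == "__main__":
    # Hand-built header: 10.0.0.1 -> 192.168.1.5, TCP, TTL 64, 60 bytes.
    sample = IPV4_HEADER.pack(0x45, 0, 60, 1, 0, 64, 6, 0,
                              socket.inet_aton("10.0.0.1"),
                              socket.inet_aton("192.168.1.5"))
    print(summarize(sample))
```
\end{verbatim}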

While TCPdump is a valuable tool, it focuses only on the TCP/IP suite
of protocols. A variety of other utilities, referred to as
{\it sniffers}, capture raw packets of any protocol from any point on
a network. The most common examples are the open source tool Ethereal
and Sniffer from Network General Corporation (until recently, Sniffer
was owned by McAfee). As networks increasingly employ switches,
sniffers that rely on a shared-medium network (e.g. traditional
Ethernet) are being moved from end systems to servers and routers, and
now to wireless networks. Alternatively, commercial switches often
provide special ports created specifically for sniffers to tap into.
Sniffer logs are valuable in discerning low-level attacks such as
abnormal traffic attacks (e.g. ARP poisoning on 802.11 networks).
However, their scope is limited by their monitoring position within a
network, and their size makes them cumbersome to analyze.

\subsection{NetFlows}
NetFlow logs contain records of unidirectional flows between
computer/port pairs across an instrumentation point (e.g. a router) on
a network. Ideally, there is one entry per socket. These records can be
exported from routers or from software such as ARGUS or NTOP. NetFlows
are a rich source of information for traffic analysis. Depending on
version and configuration, they contain some or all of the following
fields: IP address pairs (source/destination), port pairs
(source/destination), protocol (e.g. TCP/UDP), packet counts,
time-stamps (start/end and/or duration) and byte counts.
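
The following Python sketch shows how packets observed at a single instrumentation point might be grouped into unidirectional flow records. Records are keyed on the (source IP, destination IP, source port, destination port, protocol) tuple and accumulate packet counts, byte counts and start/end time-stamps. The \texttt{Packet} and \texttt{FlowRecord} types are hypothetical simplifications. Real exporters also expire flows on active and inactive timeouts, which this sketch omits.

\begin{verbatim}
```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet observation."""
    ts: float        # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str       # e.g. "TCP" or "UDP"
    size: int        # bytes on the wire


@dataclass
class FlowRecord:
    """Hypothetical unidirectional flow record."""
    start: float
    end: float
    packet_count: int = 0
    byte_count: int = 0

    @property
    def duration(self) -> float:
        return self.end - self.start


FlowKey = Tuple[str, str, int, int, str]


def build_flows(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Group packets into unidirectional flows keyed on the 5-tuple.

    A->B and B->A traffic land in separate records; flow timeouts,
    which real exporters apply, are ignored here.
    """
    flows: Dict[FlowKey, FlowRecord] = {}
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        rec = flows.get(key)
        if rec is None:
            rec = flows[key] = FlowRecord(start=p.ts, end=p.ts)
        rec.start = min(rec.start, p.ts)
        rec.end = max(rec.end, p.ts)
        rec.packet_count += 1
        rec.byte_count += p.size
    return flows
```
\end{verbatim}

Because the key is ordered, the two directions of a TCP connection produce two separate records, matching the unidirectional definition above.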

\subsection{Syslog}
Syslog is a UNIX standard for capturing information about networked
devices, daemon processes and even kernel messages. Messages are
encoded by severity level (e.g. warning, error, emergency) and by
facility (e.g. service areas such as printing, e-mail and network).
Syslog can also serve as a distributed error manager by forwarding log
entries to centralized syslog servers for processing. Syslogs can be
pattern-matched for known attack signatures. They can also be searched
for potentially suspicious activities, such as:
critical events (e.g. modules being loaded and core dumps);
unsuccessful login attempts;
new account creation (especially accounts with special privileges);
suspicious connections to unused ports;
or simply the cessation of logging messages from a host (which may
indicate that the logging process has been suspended or the logs
wiped).
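
The Python sketch below shows how level and facility are carried in a raw syslog line. It decodes the \texttt{<PRI>} prefix, where $\mathrm{PRI} = 8 \cdot \mathrm{facility} + \mathrm{severity}$ (the encoding used by RFC 3164 and RFC 5424), and then pattern-matches the message text. The two signatures are hypothetical examples, loosely modeled on typical OpenSSH and account-creation messages. A real deployment would use a curated rule set.

\begin{verbatim}
```python
import re
from typing import List, Tuple

# PRI = facility * 8 + severity (RFC 3164 / RFC 5424 encoding).
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]
FACILITIES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth",
              5: "syslog", 6: "lpr", 7: "news", 8: "uucp", 9: "cron",
              10: "authpriv", 11: "ftp",
              **{16 + i: f"local{i}" for i in range(8)}}

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$")

# Hypothetical signatures, loosely modeled on common sshd and
# account-creation messages; real deployments use curated rule sets.
SIGNATURES = [
    ("failed-login",
     re.compile(r"Failed password for (?:invalid user )?(\S+)")),
    ("new-account", re.compile(r"new user: name=([^,\s]+)")),
]


def decode_pri(line: str) -> Tuple[str, str, str]:
    """Split a raw syslog line into (facility, severity, message)."""
    m = PRI_RE.match(line)
    if m is None:
        raise ValueError("missing <PRI> prefix")
    pri = int(m.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return (FACILITIES.get(facility, f"facility{facility}"),
            SEVERITIES[severity], m.group(2))


def match_signatures(message: str) -> List[Tuple[str, str]]:
    """Return (signature, captured account) for every signature that fires."""
    hits = []
    for name, pattern in SIGNATURES:
        m = pattern.search(message)
        if m:
            hits.append((name, m.group(1)))
    return hits
```
\end{verbatim}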

\subsection{Workstation Logs}
Workstation logs are kept by standard utilities that record
login/logout entries on a workstation's local hard disk (e.g. the logs
shown in the Windows Event Viewer). Some application software also
maintains access logs. For example, virus scanners maintain logs of
previous scans, and virus scanners and mail agents may also log all
outgoing mail messages. Workstation logs are enabled by default on
most operating systems, but an adversary with escalated privileges can
almost always disable them.

\subsection{ARP Cache}
Hosts, routers and layer-3 switches maintain cached tables of recent
resolutions of IP addresses to MAC addresses, called ARP (Address
Resolution Protocol) caches. The entries are of two types:
{\em dynamic} entries that are added and removed automatically over
time, and {\em static} entries that remain in the cache until the
computer is restarted. Each dynamic ARP cache entry has a potential
lifetime of between 2 and 10 minutes (depending on operating system
settings, traffic levels and cache size). A log of all entries can be
created over a specified time period.
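
A toy Python model may make the two entry types concrete. Dynamic entries carry an expiry time and disappear once it passes, static entries persist, and a log is built by recording periodic snapshots. The class names and the 120-second default lifetime (the low end of the range above) are assumptions for illustration. They are not any operating system's interface.

\begin{verbatim}
```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ArpEntry:
    mac: str
    static: bool
    expires: Optional[float]  # None for static entries


@dataclass
class ArpCache:
    """Toy ARP cache mapping IP address -> MAC address (hypothetical)."""
    lifetime: float = 120.0  # assumed dynamic-entry lifetime in seconds
    entries: Dict[str, ArpEntry] = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Add or refresh a dynamic entry; static entries are never overridden."""
        current = self.entries.get(ip)
        if current is not None and current.static:
            return
        self.entries[ip] = ArpEntry(mac, False, now + self.lifetime)

    def add_static(self, ip: str, mac: str) -> None:
        self.entries[ip] = ArpEntry(mac, True, None)

    def expire(self, now: float) -> None:
        """Drop dynamic entries whose lifetime has elapsed."""
        self.entries = {ip: e for ip, e in self.entries.items()
                        if e.static or e.expires > now}

    def snapshot(self, now: float) -> List[Tuple[float, str, str, str]]:
        """Log rows (time, ip, mac, type) for the entries live at `now`."""
        self.expire(now)
        return [(now, ip, e.mac, "static" if e.static else "dynamic")
                for ip, e in sorted(self.entries.items(), key=lambda kv: kv[0])]
```
\end{verbatim}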

The ARP cache is useful for:
determining static IP addresses;
identifying unregistered or unknown (including maliciously spoofed)
and misconfigured devices attached to a network;
detecting certain layer-2 attacks (e.g. ARP poisoning);
identifying which IP address(es) a particular hardware address is
using;
debugging connectivity problems a device may be experiencing;
and tracking unsuccessful connection attempts to devices that either
are not currently on the network or do not exist (e.g. port scans to
non-existent hosts).
These logs are becoming more important with the recent growth of
wireless networks and the ease with which ARP poisoning and
man-in-the-middle attacks can be perpetrated against them.
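
Two of these uses can be sketched directly over such a log. The hypothetical Python function below takes (time, IP, MAC) rows and reports every IP address seen with more than one MAC address. This is a common symptom of ARP poisoning, though hardware replacement or address reuse can produce it too. The function also reports the IP addresses each hardware address has used.

\begin{verbatim}
```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def analyze_arp_log(rows: Iterable[Tuple[float, str, str]]
                    ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Summarize an ARP cache log of (time, ip, mac) rows (hypothetical format).

    Returns (ip_conflicts, mac_to_ips):
      ip_conflicts: IPs seen with more than one MAC, a common symptom of
                    ARP poisoning (hardware swaps can also cause it)
      mac_to_ips:   every IP address each hardware address has used
    """
    ip_to_macs: Dict[str, Set[str]] = defaultdict(set)
    mac_to_ips: Dict[str, Set[str]] = defaultdict(set)
    for _ts, ip, mac in rows:
        mac = mac.lower()  # normalize notation before comparing
        ip_to_macs[ip].add(mac)
        mac_to_ips[mac].add(ip)
    ip_conflicts = {ip: macs for ip, macs in ip_to_macs.items()
                    if len(macs) > 1}
    return ip_conflicts, dict(mac_to_ips)
```
\end{verbatim}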

\subsection{DNS Cache}
DNS (Domain Name System) caches contain mappings between
fully-qualified hostnames and their corresponding IP addresses, based
on recent requests to other name servers. The amount of time a name
server retains cached data is controlled by the time-to-live (TTL)
field of the data. Logs can be created from periodic snapshots of the
cache, taken with a period shorter than the minimum TTL so that every
record is captured at least once before it expires. Static host
tables (e.g. \texttt{/etc/hosts}) also map hostnames to IP addresses
and provide static mapping information. DNS cache records provide
useful evidence of attacks perpetrated against DNS services. They
sometimes also reveal DDoS worms that generate high volumes of DNS
queries for a target.
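
A small Python sketch can check the sampling constraint. \texttt{snapshot\_period} picks a period below the minimum TTL using an assumed safety factor. \texttt{captured} tests whether any snapshot at times $t_0 + k p$ ($k \ge 0$) falls inside a record's lifetime $[c, c + \mathit{TTL})$. Both helpers are hypothetical.

\begin{verbatim}
```python
import math
from typing import Sequence


def snapshot_period(ttls: Sequence[int], safety: float = 0.5) -> float:
    """Choose a snapshot period strictly shorter than the minimum TTL.

    `safety` is an assumed tuning factor in (0, 1) that leaves slack for
    jitter in the snapshot schedule.
    """
    if not ttls or min(ttls) <= 0:
        raise ValueError("need at least one positive TTL")
    if not 0 < safety < 1:
        raise ValueError("safety must lie strictly between 0 and 1")
    return min(ttls) * safety


def captured(cached_at: float, ttl: float, period: float,
             first_snapshot: float = 0.0) -> bool:
    """True if a snapshot at first_snapshot + k*period (k >= 0) falls
    within the record's lifetime [cached_at, cached_at + ttl)."""
    k = max(0, math.ceil((cached_at - first_snapshot) / period))
    return first_snapshot + k * period < cached_at + ttl
```
\end{verbatim}

Consider a period of 120 seconds and a record cached at second 1 with a 60-second TTL. The record expires before the next snapshot, which is exactly the loss that the shorter-than-minimum-TTL rule prevents.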

\subsection{Dial-up Servers}
Dial-up server logs maintain system accounting records of who makes
incoming network connections. They are a very reliable source of
information for investigators because the log is difficult to poison
with false information. Even if an attacker steals another user's
credentials to log in, the telephone records are very difficult to
fake, thus giving away at least the attacker's location. However, as
VoIP becomes more prevalent, the reliability of telephone records may
diminish somewhat. Also, dial-up connections are much slower than many
attackers can tolerate, and with the prevalence of broadband, attacks
through dial-up servers have become less common. We expect VPN logs to
replace the role of dial-up server logs in the near future.

\subsection{Kerberos}
Kerberos logs contain records of all ticket requests and uses. This
information can be used to generate login graphs and determine who was
logged into a particular workstation at a particular time. This may
help in detecting compromised tickets (perhaps obtained through a
brute-force password attack) that are being used by automated tools
and scripts.
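
The following Python sketch works over a hypothetical simplified ticket record (client principal, workstation, start and end time). It builds a login graph and answers who held a valid ticket for a workstation at a given time. It also flags users who obtain tickets for many distinct hosts within a short window, one plausible sign of automated reuse of a compromised credential. The record layout and the fan-out heuristic are assumptions. They do not reflect the log format of any particular Kerberos implementation.

\begin{verbatim}
```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class TicketEvent:
    """Hypothetical, simplified view of one issued service ticket."""
    user: str        # client principal
    host: str        # workstation named in the service principal
    issued: float    # ticket start time (epoch seconds)
    expires: float   # ticket end time (epoch seconds)


def login_graph(events: Sequence[TicketEvent]) -> Dict[str, Set[str]]:
    """Map each user to the set of hosts they obtained tickets for."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for e in events:
        graph[e.user].add(e.host)
    return dict(graph)


def users_on(host: str, t: float, events: Sequence[TicketEvent]) -> List[str]:
    """Users holding a valid ticket for `host` at time `t`."""
    return sorted({e.user for e in events
                   if e.host == host and e.issued <= t < e.expires})


def high_fanout(events: Sequence[TicketEvent], window: float,
                threshold: int) -> List[str]:
    """Users who obtained tickets for more than `threshold` distinct hosts
    within some `window`-second span (an assumed heuristic for scripted use)."""
    by_user: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for e in events:
        by_user[e.user].append((e.issued, e.host))
    flagged = []
    for user, items in by_user.items():
        items.sort()
        start = 0
        for end in range(len(items)):
            # Shrink the window from the left until it spans <= `window` s.
            while items[end][0] - items[start][0] > window:
                start += 1
            if len({h for _, h in items[start:end + 1]}) > threshold:
                flagged.append(user)
                break
    return sorted(flagged)
```
\end{verbatim}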

\subsection{SNMP}
SNMP (Simple Network Management Protocol) logs draw on Management
Information Bases (MIBs), databases of managed objects that store
information about a wide variety of network devices. The SNMP
management application monitors network devices in two ways. It polls
device agents for specified MIB information, and it receives traps
that the agents send to notify it of an event.
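
Polled MIB objects are often cumulative counters, so rates must be derived from successive samples. The following minimal Python sketch assumes two polls of a 32-bit octet counter such as IF-MIB's \texttt{ifInOctets}. The modular subtraction tolerates a single counter wrap between polls. It cannot distinguish multiple wraps or an agent restart.

\begin{verbatim}
```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 values wrap to 0 after 2**32 - 1


def counter_delta(old: int, new: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase of a wrapping counter between two polls (assumes at most one wrap)."""
    return (new - old) % modulus


def bits_per_second(old_octets: int, new_octets: int, interval_s: float) -> float:
    """Average rate between two polls of an octet counter taken interval_s apart."""
    if interval_s <= 0:
        raise ValueError("poll interval must be positive")
    return counter_delta(old_octets, new_octets) * 8 / interval_s


if __name__ == "__main__":
    # Two hypothetical polls 300 s apart; the counter wrapped in between.
    print(bits_per_second(4_294_000_000, 1_000_000, 300.0))
```
\end{verbatim}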

\subsection{Routing Tables}
Routing table logs (e.g. inter-domain BGP, intra-domain OSPF or RIP)
provide information about routing-based attacks and errors. These
range from individual misbehaving routers that drop or misroute
packets or inject disruptively large routing tables, to systemic
network-wide advertisement of false routing information or instability
caused by the propagation of worms. Global, local and peer routing
tables all provide different vantage points for analysis.
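
The Python sketch below illustrates analysis across routing table snapshots. It compares two snapshots, each a hypothetical prefix-to-next-hop mapping. It reports withdrawn prefixes, next-hop changes, and newly appearing prefixes that are more specific than an existing route. Longest-prefix matching makes such more-specific routes attract traffic, which is why they are a classic pattern of false advertisements. Legitimate traffic engineering produces them as well.

\begin{verbatim}
```python
import ipaddress
from typing import Dict, List

RoutingTable = Dict[str, str]  # hypothetical snapshot: CIDR prefix -> next hop


def diff_routes(before: RoutingTable, after: RoutingTable) -> Dict[str, list]:
    """Compare two routing-table snapshots.

    Reports withdrawn prefixes, next-hop changes, and new prefixes that are
    more specific than a route already present in `before`.
    """
    old_nets = [ipaddress.ip_network(p) for p in before]
    report: Dict[str, list] = {"withdrawn": [], "next_hop_changed": [],
                               "new_more_specific": []}
    for prefix, hop in before.items():
        if prefix not in after:
            report["withdrawn"].append(prefix)
        elif after[prefix] != hop:
            report["next_hop_changed"].append((prefix, hop, after[prefix]))
    for prefix in after:
        if prefix in before:
            continue
        net = ipaddress.ip_network(prefix)
        # subnet_of() requires matching IP versions, so filter on version first.
        covering: List[str] = [str(o) for o in old_nets
                               if o.version == net.version and o != net
                               and net.subnet_of(o)]
        if covering:
            report["new_more_specific"].append((prefix, covering))
    return report
```
\end{verbatim}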

\subsection{Firewalls}
A firewall is a computer or network device that sits between an
internal network or computer and external networks that are trusted to
a lesser extent (e.g. the Internet). It enforces an organizational
access control policy by processing packets or connections according
to a rule set. This definition is expanding, as personal firewalls are
now installed on workstations. These differ in that they are
positioned at the endpoints and hence can be more application-aware.
Firewall logs are also important in a recursive way: they help
maintain the effectiveness of the firewall's own rule set. A rule set
specifies exactly what traffic to permit or block. In commercial
deployments it typically grows beyond the number of rules a human can
readily comprehend.
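
The recursive use of firewall logs can be sketched in Python. Rules are evaluated first-match with an implicit default deny, and each decision is logged with the index of the deciding rule. Rules that never appear in the log are then reported as candidates for review. The \texttt{Rule} fields and the log tuple layout are hypothetical, and production firewalls match on many more attributes.

\begin{verbatim}
```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple

LogEntry = Tuple[str, str, int, str, int]  # (dst_ip, proto, dport, action, rule index)


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule matching on destination network, protocol and port."""
    action: str                   # "permit" or "block"
    dst: str                      # CIDR, e.g. "192.0.2.0/24"
    proto: str = "any"            # "tcp", "udp" or "any"
    dport: Optional[int] = None   # None matches any port

    def matches(self, dst_ip: str, proto: str, dport: int) -> bool:
        return (ipaddress.ip_address(dst_ip) in ipaddress.ip_network(self.dst)
                and self.proto in ("any", proto)
                and self.dport in (None, dport))


def evaluate(rules: List[Rule], dst_ip: str, proto: str, dport: int,
             log: List[LogEntry]) -> str:
    """First-match evaluation with an implicit default deny.

    Each decision is appended to `log` with the index of the deciding
    rule, or -1 when the default applied.
    """
    for i, rule in enumerate(rules):
        if rule.matches(dst_ip, proto, dport):
            log.append((dst_ip, proto, dport, rule.action, i))
            return rule.action
    log.append((dst_ip, proto, dport, "block", -1))
    return "block"


def unused_rules(rules: List[Rule], log: List[LogEntry]) -> List[int]:
    """Indices of rules that never decided a logged packet."""
    fired = {entry[4] for entry in log}
    return [i for i in range(len(rules)) if i not in fired]
```
\end{verbatim}

A rule that never fires may be shadowed by an earlier, broader rule. This is one way a large rule set drifts from its intended policy.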

Firewall logs can be used to monitor both normal activity and
suspicious activity. Normal activity includes the types of services
requested and used, common external IP addresses accessing internal
services, and common access time patterns. Suspicious activity
includes probes to ports with no authorized services,
external-to-internal flows with spoofed internal IP addresses,
outbound connections from uncharacteristic internal machines, and
modification or disabling of the firewall rule set.
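
Two of the suspicious-activity checks listed above can be expressed over firewall log rows, as in this Python sketch. Inbound traffic claiming an internal source address suggests spoofing. Inbound traffic to a port with no authorized service suggests probing. \texttt{INTERNAL\_NETS}, \texttt{AUTHORIZED\_PORTS} and the row layout are hypothetical site configuration.

\begin{verbatim}
```python
import ipaddress
from typing import Iterable, List, Set, Tuple

# Hypothetical site configuration.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
AUTHORIZED_PORTS: Set[int] = {25, 80, 443}

Row = Tuple[str, str, str, int]  # (direction, src_ip, dst_ip, dst_port)


def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def suspicious(rows: Iterable[Row]) -> List[Tuple[str, str]]:
    """Flag inbound ("in") rows that spoof an internal source or probe a
    port with no authorized service; other directions are skipped."""
    alerts = []
    for direction, src, dst, dport in rows:
        if direction != "in":
            continue
        if is_internal(src):
            alerts.append(("spoofed-internal-source", src))
        elif dport not in AUTHORIZED_PORTS:
            alerts.append(("probe-unauthorized-port", f"{src}->{dst}:{dport}"))
    return alerts
```
\end{verbatim}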

\subsection{Intrusion Detection Systems}
Logs from Intrusion Detection Systems (IDS) contain alerts indicating
specific attacks that have occurred. Generally, while a firewall has a
proactive, preventive focus, an IDS has a passive, reactive focus. The
assumption is that someone will eventually break through the perimeter
firewalls, and one then needs to be able to detect the intruders. IDSs
can be categorized by sensor placement (network versus host) and by
technique (signature versus anomaly), with all types producing both
alerts and detailed logs. Real-time IDSs have been plagued by large log
sizes and high false positive rates, especially anomaly-based systems.
However, incremental improvements are increasing their effectiveness
for post-mortem forensics.
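
A small Python sketch can contrast the two detection techniques. The signature side matches events against known attack patterns. The anomaly side flags sources whose event count exceeds the mean by more than $k$ standard deviations. Both the signatures and the threshold rule are illustrative assumptions. The crude statistic hints at why anomaly-based systems suffer from false positives.

\begin{verbatim}
```python
import re
import statistics
from typing import Dict, List

# Hypothetical signature set: alert name -> pattern over event text.
SIGNATURES = {
    "dir-traversal": re.compile(r"\.\./\.\./"),
    "cmd-exe": re.compile(r"cmd\.exe", re.IGNORECASE),
}


def signature_alerts(events: List[str]) -> List[str]:
    """Signature-based: one alert per (event, matching signature) pair."""
    return [name for ev in events
            for name, pat in SIGNATURES.items() if pat.search(ev)]


def anomaly_alerts(counts: Dict[str, int], k: float = 3.0) -> List[str]:
    """Anomaly-based: flag sources whose event count exceeds the mean by
    more than k population standard deviations."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return sorted(src for src, c in counts.items() if c > mean + k * stdev)
```
\end{verbatim}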

\subsection{Mail Servers}
Mail servers maintain a log of completed transactions (as well as a
queue of pending ones), recording sender and recipient addresses,
subject lines, time-stamps and message sizes. Common reports generated
from these logs include:
total time spent receiving and sending e-mail;
the number of e-mails sent by an entity (organization, group or
individual) over a specific period (day/week/month);
stratification of e-mail by time (work hours/off-hours) and by common
addresses;
stratification by size and type of file attachments;
and identification of dormant accounts.
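
Reports like these reduce to grouping parsed log records. The Python sketch assumes a hypothetical \texttt{MailRecord} already parsed from the server's log and a 09:00 to 17:00, Monday-to-Friday working day. \texttt{stratify} splits per-sender message counts into work hours and off-hours. \texttt{daily\_counts} counts messages per sender per day.

\begin{verbatim}
```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class MailRecord:
    """Hypothetical entry already parsed from a mail server log."""
    sender: str
    recipient: str
    when: datetime
    size: int  # bytes


WORK_START, WORK_END = 9, 17  # assumed working day: 09:00-17:00, Mon-Fri


def is_work_hours(t: datetime) -> bool:
    return t.weekday() < 5 and WORK_START <= t.hour < WORK_END


def stratify(records: Iterable[MailRecord]) -> Dict[str, Counter]:
    """Per-sender message counts, split into work hours and off-hours."""
    out = {"work": Counter(), "off": Counter()}
    for r in records:
        out["work" if is_work_hours(r.when) else "off"][r.sender] += 1
    return out


def daily_counts(records: Iterable[MailRecord]) -> Counter:
    """Messages per (sender, calendar day)."""
    return Counter((r.sender, r.when.date()) for r in records)
```
\end{verbatim}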

\subsection{Web Servers}
Web server logs have traditionally been used to provide feedback on
performance and misconfigurations (e.g. link errors). Web logs provide
detailed records of requests to the web server and statistical
information about network traffic. Web log record attributes can
include:
the source IP address from which a request was generated;
whether the request was satisfied;
a userid determined by HTTP authentication;
a status code;
and the size of the object returned with each satisfied request.
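
The attributes above correspond to fields of the widely used Common Log Format (host, ident, authuser, date, request, status, bytes). A Python sketch of a parser for one such line follows. Treating 2xx and 3xx status codes as ``satisfied'' is an assumption made for illustration.

\begin{verbatim}
```python
import re
from datetime import datetime
from typing import Any, Dict, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line: str) -> Optional[Dict[str, Any]]:
    """Parse one Common Log Format line; return None if it does not match."""
    m = CLF_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    rec: Dict[str, Any] = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    rec["user"] = None if rec["user"] == "-" else rec["user"]
    # Assumption: 2xx and 3xx responses count as "satisfied" requests.
    rec["satisfied"] = 200 <= rec["status"] < 400
    return rec
```
\end{verbatim}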

While traditionally used for performance information, web logs are
increasingly used for security analysis. They can be used to detect:
illegitimate requests that exploit misconfigurations (e.g. asking to
run a script in a directory that should not be accessible);
buffer overflow attacks that let an attacker run arbitrary processes
with the privileges of the web server daemon;
and attacks targeting specific insecure web applications and scripts.
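
The hypothetical Python classifier below shows how such checks might be applied to the request field of a web log entry. It labels requests for restricted paths, path traversal after URL decoding, and unusually long URIs, which often accompany buffer overflow attempts. The path prefixes and the length limit are assumed policy values, not recommendations.

\begin{verbatim}
```python
from typing import List
from urllib.parse import unquote

# Assumed policy values, for illustration only.
RESTRICTED_PREFIXES = ("/cgi-bin/", "/admin/")
MAX_URI_LEN = 1024  # very long URIs often accompany overflow attempts


def classify_request(request: str) -> List[str]:
    """Heuristic labels for a request field such as 'GET /path HTTP/1.1'."""
    parts = request.split()
    if len(parts) < 2:
        return ["malformed"]
    raw_uri = parts[1]
    uri = unquote(raw_uri)  # decode %2e%2e etc. before matching
    labels = []
    if uri.startswith(RESTRICTED_PREFIXES):
        labels.append("restricted-path")
    if "../" in uri:
        labels.append("path-traversal")
    if len(raw_uri) > MAX_URI_LEN:
        labels.append("oversized-uri")
    return labels
```
\end{verbatim}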

\subsection{DHCP}
Dynamic Host Configuration Protocol (DHCP) server logs can be used to
track IP address assignments to devices as they join and leave a
network. DHCP servers manage two databases:
(1) an Address Pool database holding IP addresses and other network
configuration parameters;
and (2) a Binding database mapping hardware MAC addresses to entries
in the Address Pool.
Though most frequently used to assign dynamic IP addresses, DHCP can
also assign a dedicated static address to a device, so that it
receives the same address whenever it rejoins. On a network that uses
DHCP with dynamic addresses, maintaining a log is absolutely necessary
to forensically associate dynamically changing IP addresses with
specific devices.
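
Once lease assignments are logged, the forensic question ``which device held this address at that time?'' becomes a simple interval lookup. The following Python sketch works over a hypothetical lease log of (IP, MAC, start, end) entries.

\begin{verbatim}
```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical DHCP log entry: `ip` bound to `mac` during [start, end)."""
    ip: str
    mac: str
    start: float  # lease granted (epoch seconds)
    end: float    # lease released or expired (epoch seconds)


def who_had(ip: str, t: float, leases: Iterable[Lease]) -> Optional[str]:
    """MAC address that held `ip` at time `t`, or None if no lease covers it."""
    for lease in leases:
        if lease.ip == ip and lease.start <= t < lease.end:
            return lease.mac
    return None
```
\end{verbatim}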

\subsection{Scanners}
Scanners for defensive purposes are used to perform risk management by
detecting vulnerabilities and notifying system administrators of the
vulnerabilities and patches that need to be installed. They typically
generate reports rather than logs, though. Scanners range from simple
port scanners such as NMAP, which report open ports and detected
operating systems, to very advanced scanners such as NESSUS. NESSUS
runs NMAP to detect open ports and then determines the exact versions
of running services and whether these systems are vulnerable to known
exploits. Along with the rise of managed security, companies such as
Qualys provide proprietary scanners through a web interface and
perform the scans for you. Qualys, for example, will scan from the
outside (or from the inside if you purchase a special network device)
and produce a complete, customizable report in multiple formats that
indicates what vulnerabilities your systems have. With these managed
services you typically get easier-to-read reports and the most
up-to-date vulnerability databases.
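
The core of a simple port scanner fits in a few lines. The Python sketch below performs a TCP connect scan, reporting the ports on which a full handshake succeeds. This is only the most basic technique and falls far short of NMAP's SYN scanning and operating system and service detection. The function name is hypothetical.

\begin{verbatim}
```python
import socket
from typing import Iterable, List


def connect_scan(host: str, ports: Iterable[int],
                 timeout: float = 0.5) -> List[int]:
    """Return the ports on `host` that accept a TCP connection.

    A plain connect() scan; dedicated scanners add SYN scans, service
    version probing and OS fingerprinting.
    """
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                # connect_ex returns 0 when the TCP handshake completes.
                if s.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                pass  # timeouts and unreachable hosts count as not open
    return open_ports
```
\end{verbatim}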

While this list of sixteen log types may seem exhaustive, it is not.
Potential log sources are more numerous than the services and daemon
processes in deployment. There are many proprietary logs we did not
mention, including reference monitor alerts, router traps and a myriad
of application software logs, though many of the latter log through
syslog on UNIX-based platforms. The challenge for those working on log
anonymization for security is that they must consider as many log
sources as possible. They must also try to generate a basis of logs
that contains as much information as possible with minimal overlap
between logs.