Labs

Tools useful for hunting

Log Aggregation

Much of the work you’ll do hunting involves performing data transformation on log data. You must have a tool that allows you to centrally aggregate those logs and interact with them. This is a basic necessity for hunting.

Data Manipulation Tools

Many of the best data manipulation tools are free command-line utilities or applications you might traditionally use for other purposes. Scripting languages and specific libraries are also useful for this purpose.

  • Python

  • Pandas (Python Library)

  • R

  • CyberChef

  • Regex Buddy

  • Linux command-line tools

    • sed

    • awk

    • sort

    • uniq

    • cut

    • jq

    • grep

Specialized Analysis Tools

A significant portion of the hunter’s tool kit is dedicated to analysis tools that are unique to specific data types. I’ve highlighted some of my favorites for many of my preferred data types here.

Network

  • Wireshark

  • tshark

  • molo.ch

  • NetworkMiner

  • nfdump

File

  • YARA

OS

  • Osquery

  • GRR

Memory

  • Volatility

Threat Intel

  • Domain Tools

  • Passive Total

  • Alienvault OTX

  • Hybrid Analysis

  • Virus Total

  • any.run

  • Cuckoo

  • Maltego

  • urlscan.io

  • shodan.io

  • censys.io

Have BITS Jobs been used for malicious purposes on my network?

Your goal is to use the attack-based hunting method to find anomalous BITS-related network communication using Bro/Zeek HTTP communication logs.

References:

You may not be able to 100% confirm that the anomaly you've found is evil. The goal here is simply to find the anomaly that is most likely to lead to an incident.

Accessing the Lab Data

  • All data for this exercise is already loaded on to the student VM in the lab1 index.

  • To query the data, open Kibana using the desktop icon. Click the Discover tab and search for _index:lab1. This will return all the lab data.

  • Any search or data transformation you perform must include the string _index:lab1 to limit your result set exclusively to this lab.

  • To ensure you see all the data, make sure the time range selection at the top right of the screen is always set to Last 5 Years.

Initial Research

First, start out by doing some basic research on how BITS functions and how attackers have used it to transmit files, evade detection, and facilitate command and control. You can use the links provided in the lab introduction. You should focus on answering the following:

  • What is the designed purpose of BITS?

  • How do attackers use BITS for malicious purposes?

  • What does BITS look like in logs and/or network data?

What am I looking for?

You're looking for evidence of the BITS mechanism used to transfer files to support malicious activities.

Where am I likely to find it?

HTTP transaction logs are all you have available in this exercise, so you'll be focusing on BITS-related network communication rather than execution logs of the bitsadmin tool.

Apply your knowledge of BITS communication to the available fields in the Zeek HTTP data. What fields are most useful for anomaly hunting?

Here's a short list of considerations:

  1. Method

  2. Mime Type

  3. Domain

  4. URI

How can I manipulate data to see it?

For each of the fields identified, you're looking for anomalies that are likely to occur infrequently. So, perform an aggregation on each field to examine the distribution of values, focusing on outliers (least frequent occurrence, or LFO).

To create aggregations:

  1. Go to the Visualize Tab

  2. Create a new Data Table

  3. Select the lab1 index

  4. Under Buckets, click Split Rows.

  5. Under Aggregation, select Terms.

  6. Choose the field you wish to aggregate the unique values for.

  7. Change the size to 50000 so it includes all the logs from this lab.

  8. Click the play button (if you don't have any data, make sure your date range is set to Last 5 Years)

Review the distribution of results. What looks unique and weird? Research it by looking at the full event (search the string in the Discover tab) or using Google to perform research.
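
If you'd rather work outside Kibana, the same least-frequent-occurrence view can be sketched in a few lines of Python. This is a minimal sketch, assuming you've exported events as a list of dictionaries; the `domain` key and the sample values are made up for illustration and may not match the field names in the lab index.

```python
from collections import Counter

def least_frequent(events, field, limit=10):
    """Count each distinct value of `field` and return the rarest first."""
    counts = Counter(e[field] for e in events if e.get(field))
    # Sort ascending by count, then by value so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:limit]

# Hypothetical sample events (not taken from the lab data).
events = [
    {"domain": "download.windowsupdate.com"},
    {"domain": "download.windowsupdate.com"},
    {"domain": "download.windowsupdate.com"},
    {"domain": "cdn.example.net"},
    {"domain": "cdn.example.net"},
    {"domain": "files.example.org"},
]

for value, count in least_frequent(events, "domain"):
    print(count, value)
```

The rarest values come out first, which is the same ordering you get by sorting the Kibana data table by ascending count.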

As you find things likely to be benign, exclude them from your aggregations. You can do this by adding an exclusion to the search at the top of that screen. You should keep adding these until you get down to a reasonable set of results you can manually examine:

  • NOT domain:(*microsoft* OR *windows*)

  • NOT uri:(*jpg* OR *png* OR *edge*)

The two fields that should become most interesting to you are the domain and URI fields because of their variability.
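
The reduction step can also be sketched in Python. The wildcard patterns below are the ones from the Kibana exclusions above; everything else (the `domain` and `uri` keys, the sample events) is an assumption made for illustration.

```python
from fnmatch import fnmatch

def excluded(value, patterns):
    """True if value matches any wildcard pattern (case-insensitive)."""
    value = value.lower()
    return any(fnmatch(value, p.lower()) for p in patterns)

def reduce_events(events, exclusions):
    """Drop events where any listed field matches one of its patterns."""
    return [
        e for e in events
        if not any(
            excluded(e.get(field, ""), patterns)
            for field, patterns in exclusions.items()
        )
    ]

# Patterns mirror the Kibana exclusions; add more as you rule things out.
exclusions = {
    "domain": ["*microsoft*", "*windows*"],
    "uri": ["*jpg*", "*png*", "*edge*"],
}

# Hypothetical sample events (not taken from the lab data).
events = [
    {"domain": "download.windowsupdate.com", "uri": "/d/update.cab"},
    {"domain": "cdn.example.net", "uri": "/img/logo.png"},
    {"domain": "files.example.org", "uri": "/a/b.dat"},
]

remaining = reduce_events(events, exclusions)
print(remaining)
```

Keeping the exclusions in one dictionary makes it easy to see, and later justify, exactly what you decided was benign.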

Does anything in Windows Program Execution logs look malicious?

References:

I also have some friendly intelligence for you to consider in this exercise.

There are three network segments of traffic included here. You can identify member workstations by their hostname values.

  • Finance: Accounting users

    • Leonard

    • Deanna

    • Pavel

  • Dev: Development users

    • Geordi

    • Scotty

    • Sonya

  • Admin: IT administrators

    • Jean-luc

    • Jim

    • Kathryn

Accessing the Lab Data

  • All data for this exercise is already loaded on to the student VM in the lab2 index.

  • To query the data, open Kibana using the desktop icon. Click the Discover tab and search for _index:lab2. This will return all the lab data.

  • Any search or data transformation you perform must include the string _index:lab2 to limit your result set exclusively to this lab.

  • To ensure you see all the data, make sure the time range selection at the top right of the screen is always set to Last 5 Years.

  • The analysis or comparison of timestamps is not relevant for this exercise.

Initial Research

First, start out by doing some basic research on what Event ID (EID) 4688 represents. Also consider the role of program execution in a typical attack, either directly (execution of malware) or indirectly (execution of a file that will launch malware). These attacks span multiple categories including:

  • Any malware

  • Malicious documents (office/PDF)

  • Malicious use of legitimate applications

  • Initial compromise

  • Lateral movement

  • Exfiltration

  • Sensitive file access

This is a very broad realm of possibilities, but there aren't too many relevant fields. You can use the links provided in the introduction for input.

What fields are most likely to contain evidence of attacks?

The primary field we're concerned with is NewProcessName since it provides the most context about what was executed.

In addition, the following fields also provide useful context about the nature of the execution:

  1. Timestamp (not relevant in this exercise)

  2. HostName

  3. User

What would be anomalous in these fields?

Here are some ideas for things to look for in relation to the identified fields.

  • Mirroring legitimacy

  • Oddly high frequency of occurrence

  • Generic non-descriptives

  • Use of legitimate applications in an abnormal context (odd command line arguments)

  • Weird content formatting

  • Unexpected entity relationships (by user/host)

  • Improper timing

  • Baseline deviations

How can I manipulate data to see it?

The technique used to manipulate data will vary based on the field and anomaly you're looking at.

Search/Aggregate by unique NewProcessName + apply reductions to exclude normal

  • Mirroring legitimacy

  • Generic non-descriptives

  • Weird content formatting

  • Use of legitimate applications in an abnormal context (odd command line arguments)

Search/Aggregate by unique NewProcessName + count occurrence per host and compare observed vs. expected

  • Oddly high frequency of occurrence

Search/Aggregate by unique NewProcessName by Host(s) or User(s) + examine context vs. expected

  • Unexpected entity relationships (by user/host)

  • Baseline deviations

Search/Aggregate by unique NewProcessName + examine timing vs. expected for the user/app

  • Improper timing
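
One way to look for mirroring legitimacy is to check whether the same executable name runs from more than one directory. The following is a minimal sketch under that assumption; it only relies on the NewProcessName field, and the sample paths are made up. A hit is not proof of anything, since legitimate software can also live in several locations.

```python
import ntpath
from collections import defaultdict

def names_with_multiple_paths(events):
    """Map each executable name to the directories it ran from,
    keeping only names seen in more than one directory."""
    dirs_by_name = defaultdict(set)
    for event in events:
        # ntpath splits Windows-style paths regardless of the OS you run on.
        directory, name = ntpath.split(event["NewProcessName"].lower())
        dirs_by_name[name].add(directory)
    return {n: d for n, d in dirs_by_name.items() if len(d) > 1}

# Hypothetical sample events (not taken from the lab data).
events = [
    {"NewProcessName": r"C:\Windows\System32\svchost.exe"},
    {"NewProcessName": r"C:\Windows\System32\svchost.exe"},
    {"NewProcessName": r"C:\Users\Public\svchost.exe"},
    {"NewProcessName": r"C:\Windows\System32\cmd.exe"},
]

suspects = names_with_multiple_paths(events)
for name, directories in suspects.items():
    print(name, sorted(directories))
```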

Has Emotet executed successfully on my network?

Your goal is to use the attack-based hunting method to find evidence of potential Emotet activity using only Bro/Zeek connection logs.

References:

I also have some friendly intelligence for you to consider in this exercise.

Network IP Range: 192.168.100.0/24

  • Domain Controllers: 192.168.100.2-5

  • Internal Web Servers: 192.168.100.10-15

  • Workstations: 192.168.100.100-245

Accessing the Lab Data

  • All data for this exercise is already loaded on to the student VM in the lab3 index.

  • To query the data, open Kibana using the desktop icon. Click the Discover tab and search for _index:lab3. This will return all the lab data.

  • Any search or data transformation you perform must include the string _index:lab3 to limit your result set exclusively to this lab.

  • To ensure you see all the data, make sure the time range selection at the top right of the screen is always set to Last 5 Years. If you'd like to get more specific, all the data for this lab is timestamped during the month of January 2019.

Initial Research

First, start out by doing some basic research on how Emotet functions and how it has evolved over time. While it started as a simple banking trojan, its functionality has expanded and it is also used in the delivery of other malware. You can use the links provided in the lab introduction to study this information.

You should focus on answering the following:

  • How is Emotet generally delivered to a host for an initial infection?

  • Once successfully running on a system, how does Emotet spread to others?

  • What are Emotet's goals? How does it achieve them?

What am I looking for?

You're looking for evidence of Emotet on your network.

This exercise is unique because it forces you to really pare down the threat intel you're provided and consider which possible anomalies could manifest given the limited data you have to work with.

The only thing you have to work with here are network connections. So, you can only find evidence of Emotet's network communication. That will probably be the initial infection or the lateral movement.

Where am I likely to find it?

Bro/Zeek connection logs are all you have available in this exercise, so you won't be able to look for some of the more obvious indicators. Instead, you'll have to focus on network behaviors.

Apply your knowledge of Emotet network communication to the available fields in the Zeek connection data. What fields are most useful for anomaly hunting?

There aren't many available for us:

  1. Originating (Source) IP

  2. Originating (Source) Port

  3. Responding (Dest) IP

  4. Responding (Dest) Port

  5. Duration

  6. Originating Bytes

  7. Responding Bytes

So, we have to consider the anomaly types that are most likely to manifest here. On the initial infection side:

  • Delivery from a phishing link

  • Externally facing SMB service exploited via EternalBlue vulnerability

On the lateral movement/spread side:

  • SMB exploitation via EternalBlue

In many of these cases you are limited to only a few types of anomalies: frequency of occurrence, baseline deviations, unexpected one-to-many relationships, and abnormal relationships. Most of these require some application of the friendly intelligence information.

How can I manipulate data to see it?

Delivery from a phishing link

  • Search for inbound port 25 traffic. Aggregate port 25 by source host. Sort by LFO.

Externally facing SMB service exploited via EternalBlue vulnerability

  • Search for inbound port 445 traffic. Aggregate port 445 by dest host. Compare to friendly intelligence. Should these hosts (if they exist) be receiving SMB data from the internet? Consider the number of connections and the total bytes transferred; you can do this by running more aggregations.
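
Here is a rough sketch of the inbound SMB check with the friendly intelligence applied. The IP ranges come from the lab description; the `orig_h`, `resp_h`, and `resp_p` keys are assumptions loosely modeled on Zeek's conn.log naming, and the external addresses are documentation-range placeholders.

```python
import ipaddress
from collections import Counter

INTERNAL = ipaddress.ip_network("192.168.100.0/24")

def is_internal(ip):
    return ipaddress.ip_address(ip) in INTERNAL

def role(ip):
    """Label an internal address using the friendly intelligence ranges."""
    last_octet = int(ip.rsplit(".", 1)[1])
    if 2 <= last_octet <= 5:
        return "domain controller"
    if 10 <= last_octet <= 15:
        return "internal web server"
    if 100 <= last_octet <= 245:
        return "workstation"
    return "unlisted"

def inbound_smb_by_dest(conns):
    """Count external-to-internal port 445 connections per destination."""
    counts = Counter(
        c["resp_h"] for c in conns
        if c["resp_p"] == 445
        and not is_internal(c["orig_h"])
        and is_internal(c["resp_h"])
    )
    return [(ip, role(ip), n) for ip, n in counts.most_common()]

# Hypothetical sample connections (not taken from the lab data).
conns = [
    {"orig_h": "203.0.113.7", "resp_h": "192.168.100.12", "resp_p": 445},
    {"orig_h": "203.0.113.7", "resp_h": "192.168.100.12", "resp_p": 445},
    {"orig_h": "203.0.113.9", "resp_h": "192.168.100.101", "resp_p": 445},
    {"orig_h": "192.168.100.101", "resp_h": "192.168.100.2", "resp_p": 445},
    {"orig_h": "203.0.113.9", "resp_h": "192.168.100.12", "resp_p": 80},
]

for ip, label, count in inbound_smb_by_dest(conns):
    print(ip, label, count)
```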

SMB exploitation via EternalBlue

  • Search for internal-to-internal port 445 traffic. Aggregate unique count of dest hosts by source host. Sort by most frequent occurrence (MFO). Compare to friendly intelligence. Should these hosts (if they exist) be communicating with so many other hosts over SMB? Consider the number of connections and the total bytes transferred; you can do this by running more aggregations.
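
The one-to-many SMB relationship can be sketched the same way: count how many distinct internal hosts each internal source talks to on port 445. As before, the `orig_h`, `resp_h`, and `resp_p` keys are assumed names and the sample connections are made up.

```python
import ipaddress
from collections import defaultdict

INTERNAL = ipaddress.ip_network("192.168.100.0/24")

def smb_fan_out(conns):
    """Return (source, unique destination count) pairs, largest first."""
    dests = defaultdict(set)
    for c in conns:
        src = ipaddress.ip_address(c["orig_h"])
        dst = ipaddress.ip_address(c["resp_h"])
        if c["resp_p"] == 445 and src in INTERNAL and dst in INTERNAL:
            dests[c["orig_h"]].add(c["resp_h"])
    return sorted(
        ((src, len(d)) for src, d in dests.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )

# Hypothetical sample connections (not taken from the lab data).
conns = [
    {"orig_h": "192.168.100.150", "resp_h": "192.168.100.101", "resp_p": 445},
    {"orig_h": "192.168.100.150", "resp_h": "192.168.100.102", "resp_p": 445},
    {"orig_h": "192.168.100.150", "resp_h": "192.168.100.103", "resp_p": 445},
    {"orig_h": "192.168.100.150", "resp_h": "192.168.100.101", "resp_p": 445},
    {"orig_h": "192.168.100.120", "resp_h": "192.168.100.2", "resp_p": 445},
]

for source, unique_dests in smb_fan_out(conns):
    print(source, unique_dests)
```

A workstation near the top of this list deserves a closer look than a domain controller or file server in the same position.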

Has a User Account Control (UAC) bypass been used for malicious purposes on my network?

Your goal is to use the attack-based hunting method to find evidence of likely UAC bypass using the provided Windows logs.

References:

Accessing the Lab Data

  • All data for this exercise is already loaded on to the student VM in the lab4 index.

  • To query the data, open Kibana using the desktop icon. Click the Discover tab and search for _index:lab4. This will return all the lab data.

  • Any search or data transformation you perform must include the string _index:lab4 to limit your result set exclusively to this lab.

  • To ensure you see all the data, make sure the time range selection at the top right of the screen is always set to Last 5 Years. If you'd like to get more specific, all the data for this lab is timestamped during the month of January 2019.

Initial Research

First, start out by doing some basic research on how UAC functions and how attackers have been able to bypass it. You might even want to simulate this yourself in a lab environment. You can use the links provided in the lab introduction to guide your research, but you probably won't want to start there. You should focus on answering the following:

  • What does the execution of a process that relies on a UAC prompt look like?

  • What might attacks on, or suppression of, UAC look like in log data?

What am I looking for?

You're looking for evidence of UAC being bypassed to launch another malicious process.

Where am I likely to find it?

You're working with Windows logs here, and if you look closely you only have Windows process execution (4688) and termination (4689) logs.

Apply your knowledge of UAC execution to the available fields in the Windows log data. What fields are most useful for anomaly hunting?

Here's a short list of considerations:

  1. EventTime

  2. NewProcessName / ProcessName

  3. EventTimeDelta

  4. Command Line

In this case you really have to consider the relationship between the UAC consent.exe process (the prompt the user has to click to allow an elevated process to run) and the execution of that elevated process. Those events will occur in a specific order, but more importantly, consider the human factor involved in acknowledging the prompt.

A few ideas for things to look for:

  • Rarely executed process

  • Odd process start and end timing of consent.exe

  • Unexpected user ownership for the elevated process

How can I manipulate data to see it?

Rarely executed process

  • Aggregate the NewProcessName field and sort by LFO.

Odd process start and end timing of consent.exe

  • Search for low consent.exe EventTimeDelta values and look for the elevated process that follows.

Unexpected user ownership for the elevated process

  • Search for elevated process and review user ownership. Compare to other executions of that same process.
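
The consent.exe timing idea can be sketched as follows. This leans on several assumptions that you should check against the lab data: that EventTimeDelta on a consent.exe termination (4689) event holds the seconds between the start and end of that process, that an `EventID` key exists, that EventTime sorts correctly as a string, and that two seconds is a sensible cutoff for "faster than a person could click". The executable names in the sample are made up.

```python
def fast_consent_events(events, max_seconds=2.0):
    """Pair each suspiciously short consent.exe run with the next
    process creation event that follows it."""
    ordered = sorted(events, key=lambda e: e["EventTime"])
    hits = []
    for i, event in enumerate(ordered):
        is_consent_exit = (
            event["EventID"] == 4689
            and event["ProcessName"].lower().endswith("consent.exe")
        )
        if not is_consent_exit or event["EventTimeDelta"] > max_seconds:
            continue
        # The next 4688 event is a candidate for the elevated process.
        following = next(
            (e for e in ordered[i + 1:] if e["EventID"] == 4688), None
        )
        hits.append((event, following))
    return hits

# Hypothetical sample events (not taken from the lab data).
consent = r"C:\Windows\System32\consent.exe"
events = [
    {"EventID": 4688, "EventTime": "2019-01-10T09:00:00", "NewProcessName": consent},
    {"EventID": 4689, "EventTime": "2019-01-10T09:00:04", "ProcessName": consent,
     "EventTimeDelta": 4.0},
    {"EventID": 4688, "EventTime": "2019-01-10T09:00:05",
     "NewProcessName": r"C:\Windows\System32\mmc.exe"},
    {"EventID": 4688, "EventTime": "2019-01-11T14:30:00", "NewProcessName": consent},
    {"EventID": 4689, "EventTime": "2019-01-11T14:30:01", "ProcessName": consent,
     "EventTimeDelta": 1.0},
    {"EventID": 4688, "EventTime": "2019-01-11T14:30:01",
     "NewProcessName": r"C:\Users\Public\unknown_tool.exe"},
]

hits = fast_consent_events(events)
for consent_event, following in hits:
    print(consent_event["EventTime"], following["NewProcessName"] if following else None)
```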

Does anything in HTTP traffic on our guest network look malicious?

Your goal is to use the data-based hunting method to find anomalous HTTP communication using Bro/Zeek HTTP logs.

References:

Accessing the Lab Data

  • All data for this exercise is already loaded on to the student VM in the lab5 index.

  • To query the data, open Kibana using the desktop icon. Click the Discover tab and search for _index:lab5. This will return all the lab data.

  • Any search or data transformation you perform must include the string _index:lab5 to limit your result set exclusively to this lab.

  • To ensure you see all the data, make sure the time range selection at the top right of the screen is always set to Last 5 Years.

  • Timestamp data for this exercise is irrelevant.

Initial Research

First, start out by doing some basic research on how attackers use HTTP for malicious purposes. These attacks span multiple categories including:

  • Use as a command and control channel

  • A mechanism for redirection to websites hosting malicious content

  • A protocol that can be eavesdropped to intercept credentials and session cookies

  • A pathway towards exploitation of browsers or their plugins (flash, java)

  • A mechanism for carrying web application attacks (SQL injection, XSS, etc.)

This is very broad, which makes data-based hunting (DBH) for HTTP difficult and diverse. You can use the links provided in the introduction for input.

What fields are most likely to contain evidence of attacks?

We can expect that guest Wi-Fi is an isolated network segment, so there are no internal assets that can be directly accessed other than the routing device upstream from the public network. Since we only have HTTP logs, it's unlikely we'll observe any attacks from the public network to the private network. Aside from misconfigurations, we'll mostly be focused on people using the Wi-Fi network to launch attacks against other customers or organizations.

What fields are most useful for anomaly hunting of HTTP data in this context?

Here's a short list of considerations to limit your scope:

  1. Mime Type

  2. Domain

  3. Cookie

What would be anomalous in these fields?

Here are some ideas for things to look for in relation to each field.

Mime Type

  • Executable files

  • Archive files

  • Suspiciously named files

  • Unrecognized file extensions

  • Requests for files with no referrer

  • Multiple requests for the same file

Domain

  • Mirroring legitimacy

  • Generic non-descriptives

  • High degree of randomness

Cookie

  • Cookie/website language mismatches

  • The same session cookie used by multiple sources

  • Unexpected cookie obfuscation

  • Sensitive information leakage

How can I manipulate data to see it?

The technique used to manipulate data will vary based on the field and anomaly you're looking at.

Mime Type

  • Search/Aggregate by unique file name + Apply reductions to exclude normal

    • Executable files

    • Archive files

    • Suspiciously named files

    • Unrecognized file extensions

    • Requests for files with no referrer

  • Aggregate file name by count

    • Multiple requests for the same file

Domain

  • Search/Aggregate by unique domain + Apply reductions to exclude normal

    • Mirroring legitimacy

    • Generic non-descriptives

  • Search + Apply entropy calculation

    • High degree of randomness
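
The entropy calculation for the randomness check can be sketched with Shannon entropy over the characters of each domain. This is one common choice, not necessarily the calculation the course tooling uses, and the sample domains are made up. Keep in mind that short strings naturally score lower, so compare domains of similar length.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(text).values()
    )

# Hypothetical sample domains (not taken from the lab data).
domains = ["example.com", "xk7q2vz9pw4jt1.example"]

# Highest entropy first: the most random-looking names rise to the top.
ranked = sorted(domains, key=shannon_entropy, reverse=True)
for domain in ranked:
    print(round(shannon_entropy(domain), 2), domain)
```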

Cookie

  • Search responses by content language for individual country + aggregate unique cookies + examine individually

    • Cookie/website language mismatches

  • Aggregate by unique src ip + Aggregate by unique cookies

    • The same session cookie used by multiple sources

  • Aggregate by unique cookies + apply entropy calculation

    • Unexpected cookie obfuscation

  • Aggregate by unique cookies + examine individually / Search for unique strings

    • Sensitive information leakage
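
To find the same session cookie used by multiple sources, you can group source IPs by cookie value and keep the cookies seen from more than one source. This is a minimal sketch; the `cookie` and `src_ip` keys and the sample values are assumptions for illustration.

```python
from collections import defaultdict

def cookies_shared_across_sources(events):
    """Map each cookie value to its source IPs, keeping only cookies
    that were presented by more than one source."""
    sources = defaultdict(set)
    for event in events:
        if event.get("cookie"):
            sources[event["cookie"]].add(event["src_ip"])
    return {c: ips for c, ips in sources.items() if len(ips) > 1}

# Hypothetical sample events (not taken from the lab data).
events = [
    {"src_ip": "10.0.0.21", "cookie": "session=abc123"},
    {"src_ip": "10.0.0.21", "cookie": "session=abc123"},
    {"src_ip": "10.0.0.37", "cookie": "session=abc123"},
    {"src_ip": "10.0.0.40", "cookie": "session=zzz999"},
    {"src_ip": "10.0.0.40", "cookie": ""},
]

shared = cookies_shared_across_sources(events)
for cookie, ips in shared.items():
    print(cookie, sorted(ips))
```

Some shared values are expected, such as generic consent or tracking cookies, so treat the output as a list to examine rather than a verdict.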

Does anything in VPN authentication logs look malicious?

Your goal is to use the data-based hunting method to find anomalous VPN authentication logs.

References:

This lab also includes friendly intelligence in the form of a user list.

EXECUTIVES:

  • mark.higgins

  • carmen.hart

  • nancy.bower

  • derek.ellis

  • sheila.scerra

  • katelyn.perez

  • dolores.pepin

  • william.thacker

  • douglas.hubbs

  • tony.fann

IT ADMINS:

  • robert.daugherty

  • mark.moskovitz

  • dominic.elias

  • carlos.snellings

  • janet.morgan

ACCOUNTING:

  • leanna.tung

  • janice.turman

  • russell.martin

  • loyd.emily

  • jenifer.king

SALES:

  • jasmine.miller

  • pam.mcfarland

  • faye.rene

  • dorothy.gray

  • peggy.fondren

  • anita.howard

  • daniel.elmore

  • heather.folks

  • nicole.avella

  • eva.williamson

DEVELOPERS:

  • joe.jackson

  • william.osher

  • kathy.ayala

  • patricia.garcia

  • george.gertsch

  • sarah.jenkins

  • james.sedotal

  • cindy.wilson

  • erin.cheeks

  • anthony.gieger

  • anthony.jones

  • fannie.lewis

  • joseph.bell

  • david.bryan

  • timothy.sirles

  • james.sainz

  • leslie.galbraith

  • philip.fitzgibbon

  • rachael.hall

  • melissa.mcguire

Executive and sales users spend quite a bit of time traveling, but most everyone uses the VPN for remote access.

Accessing the Lab Data

  • All data for this exercise is already loaded on to the student VM in the lab6 index.

  • To query the data, open Kibana using the desktop icon. Click the Discover tab and search for _index:lab6. This will return all the lab data.

  • Any search or data transformation you perform must include the string _index:lab6 to limit your result set exclusively to this lab.

  • To ensure you see all the data, make sure the time range selection at the top right of the screen is always set to Last 5 Years. If you'd like to get more specific, all the data for this lab is timestamped from August through December 2018.

Initial Research

First, start out by doing some basic research on how attackers leverage VPN access as part of their compromise. These attacks span multiple categories including:

  • Attacking flaws in the VPN appliance itself

  • Using the VPN to access the internal network with stolen credentials to a legitimate account

  • Using the VPN to access the internal network with credentials belonging to an attacker-created account established at some other point in the compromise

What fields are most likely to contain evidence of attacks?

VPN logs don't provide much context, but they do tell us what user authenticated to the VPN, and where they authenticated from. We can treat this like we would treat most authentication logs, with the caveat that this authentication mechanism can be accessed from the outside world.

Here's a short list of considerations to limit your scope:

  1. EventTime

  2. username

  3. source_ip

  4. source_country

  5. source_state

What would be anomalous in these fields?

Here are some ideas for things to look for in relation to each field.

EventTime

  • Logins at unexpected times for a specific user

  • Logins from multiple distant locations within a short time window

  • Unexpected input formatting

username

  • Unknown or illegitimate usernames

  • Improperly formatted usernames

  • Unexpected input formatting

source_ip

  • Large or unexpected number of source IPs for a user

  • Unexpected input formatting

source_country

  • Large or unexpected number of source countries for a user

  • Countries that a user doesn't travel to or work from

  • Unexpected input formatting

source_state

  • Large or unexpected number of source states

  • States that a user doesn't travel to or work from

  • Unexpected input formatting

How can I manipulate data to see it?

The technique used to manipulate data will vary based on the field and anomaly you're looking at.

EventTime

  • Search logins per user and compare to normal work hours

  • Search logins per user and identify cases where two logins from distant locations occur within a short time window.

  • Aggregate by timestamp value (reducing precision) and sort by LFO.
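
The distant-locations check can be approximated by looking for consecutive logins by the same user from different countries within a short window. This sketch uses "different country" as a crude stand-in for "distant", and the two-hour window, the country codes, and the sample logins are all assumptions. The usernames are borrowed from the friendly intelligence list.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def rapid_country_changes(logins, window=timedelta(hours=2)):
    """Return (user, earlier_login, later_login) for consecutive logins
    from different countries that are at most `window` apart."""
    by_user = defaultdict(list)
    for login in logins:
        by_user[login["username"]].append(login)

    hits = []
    for user, rows in by_user.items():
        rows = sorted(rows, key=lambda r: r["EventTime"])
        for earlier, later in zip(rows, rows[1:]):
            changed = earlier["source_country"] != later["source_country"]
            if changed and later["EventTime"] - earlier["EventTime"] <= window:
                hits.append((user, earlier, later))
    return hits

# Hypothetical sample logins (not taken from the lab data).
logins = [
    {"username": "mark.higgins", "source_country": "US",
     "EventTime": datetime(2018, 9, 3, 8, 0)},
    {"username": "mark.higgins", "source_country": "DE",
     "EventTime": datetime(2018, 9, 3, 9, 15)},
    {"username": "jasmine.miller", "source_country": "US",
     "EventTime": datetime(2018, 9, 3, 8, 30)},
    {"username": "jasmine.miller", "source_country": "CA",
     "EventTime": datetime(2018, 9, 5, 10, 0)},
]

hits = rapid_country_changes(logins)
for user, earlier, later in hits:
    print(user, earlier["source_country"], "->", later["source_country"])
```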

username

  • Search/aggregate by unique username. Compare to friendly intelligence.

  • Search/aggregate by unique username. Sort by LFO. Look for improper formatting.

source_ip

  • Search/aggregate unique source IPs by user. Compare to friendly intelligence.

source_country

  • Search/aggregate unique countries by user. Compare to friendly intelligence.

source_state

  • Search/aggregate unique states by user. Compare to friendly intelligence.
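
Two of these comparisons, unknown usernames and unique countries per user, are easy to sketch in Python. The known-user set below is only a small subset of the friendly intelligence list, and the sample logins (including the `admin` username and the country codes) are made up for illustration.

```python
from collections import defaultdict

# Subset of the friendly intelligence user list; extend with the full list.
KNOWN_USERS = {"mark.higgins", "carmen.hart", "leanna.tung", "robert.daugherty"}

def unknown_usernames(logins, known):
    """Usernames that appear in the logs but not in friendly intelligence."""
    return sorted({login["username"] for login in logins} - known)

def countries_by_user(logins):
    """Return (user, set of countries) pairs, most countries first."""
    seen = defaultdict(set)
    for login in logins:
        seen[login["username"]].add(login["source_country"])
    return sorted(seen.items(), key=lambda kv: len(kv[1]), reverse=True)

# Hypothetical sample logins (not taken from the lab data).
logins = [
    {"username": "mark.higgins", "source_country": "US"},
    {"username": "mark.higgins", "source_country": "DE"},
    {"username": "mark.higgins", "source_country": "US"},
    {"username": "leanna.tung", "source_country": "US"},
    {"username": "admin", "source_country": "FR"},
]

print(unknown_usernames(logins, KNOWN_USERS))
for user, countries in countries_by_user(logins):
    print(user, sorted(countries))
```

Remember the context from the friendly intelligence: a traveling executive with several countries is less surprising than an accounting user with the same pattern.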
