3. Advanced Filtering Techniques in KQL

Welcome to my deep dive into the world of KQL, where I push beyond the basics to unlock the true power of data analytics. In the previous blogs, I have covered foundational elements and basic querying techniques, but now it's time to elevate our skills to the next level.

Agenda:

Introduction

What is Advanced Filtering in KQL?

What are the benefits?

My recommendations

Conclusion

Introduction

In this third blog post of our series, I will explore advanced filtering techniques in KQL, focusing on essential commands such as `where`, `join`, `extend`, and `parse`. As I delve deeper into these sophisticated commands, I will uncover how they can be utilized to refine and manipulate data more effectively. These techniques are invaluable for analyzing large datasets efficiently and extracting meaningful insights that drive better decision-making in security operations. Mastering these advanced filtering techniques not only enhances your data analysis capabilities but also enables you to perform more granular and complex queries, ultimately empowering you to leverage the full potential of KQL. By the end of this post, you'll have a robust toolkit for data analysis that can significantly improve your ability to monitor and secure systems within your organization.

What is Advanced Filtering in KQL?

Advanced filtering in KQL involves using sophisticated commands to refine and manipulate data more effectively. These commands enable users to filter, combine, and transform datasets to meet specific analytical needs. The primary filtering commands I'll discuss—`where`, `join`, `extend`, and `parse`—each serve distinct functions that, when used together, provide a powerful toolkit for data analysis.

These commands are not only foundational to KQL but also essential for performing refined data analysis. By understanding and utilizing them effectively, users can gain deeper insights and more accurate results from their data queries.

The `where` command is often the starting point for data filtering in KQL. By specifying conditions, users can narrow down large datasets to focus on relevant records. This command is indispensable in the initial stages of data analysis, allowing for quick and efficient filtering based on various criteria.

The `join` command, on the other hand, is crucial for combining data from multiple sources. This command enables users to merge tables based on common columns, facilitating comprehensive analysis by correlating different datasets. Whether you are integrating logs from various systems or combining user data with event data, the `join` command can help create a unified view of your information.

Next, I have the `extend` command, which allows users to add new columns to a dataset. This is particularly useful for creating calculated fields or adding additional data points that are derived from existing columns. By using `extend`, you can enhance your datasets with custom metrics, making your analysis more robust and tailored to your specific needs.

Lastly, the `parse` command is essential for extracting structured data from unstructured text. This command is highly useful when dealing with logs or other text-based data sources that need to be parsed into meaningful fields. With `parse`, you can transform messy text data into a structured format that is easier to analyze and visualize.

Where

The `where` command is used to filter rows based on a specified condition. It is one of the most fundamental and frequently used commands in KQL.

The `where` command is integral to KQL's filtering capabilities, allowing users to isolate data that meets specific criteria. By setting conditions within the `where` clause, you can target precise subsets of data, thus excluding irrelevant information and focusing on what truly matters. This command supports a variety of operators such as `==`, `!=`, `>`, `>=`, `<`, `<=`, `in`, and `contains`, which can be combined with logical operators like `and`, `or`, and `not` to build complex and refined filters.

For instance, you can use the `where` command to filter events based on date ranges, specific values in a column, or even patterns within text fields. This flexibility is particularly useful in scenarios such as security monitoring, where pinpointing specific events or anomalies within large datasets is crucial. The ability to chain multiple `where` conditions together further enhances its utility, enabling more granular control over the data selection process.
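For illustration, here is a condensed sketch (assuming standard SigninLogs columns) that combines several of these operators in one filter:

SigninLogs
| where TimeGenerated > ago(7d) and ResultType != "0"  // failed sign-ins from the last week
| where AppDisplayName in ("Office 365 Exchange Online", "Microsoft Teams")  // only these apps
| where UserPrincipalName contains "admin"  // accounts with "admin" in the UPN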

Moreover, the `where` command can be used in conjunction with other KQL commands to create powerful and efficient queries. By starting with a broad dataset and progressively narrowing it down through successive `where` clauses, you can iteratively refine your search results to obtain highly specific insights.

Example:

Filtering by Event Type

This query retrieves successful sign-ins from the SigninLogs table over the past hour.

SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType == "0"
| project UserPrincipalName, IPAddress, AppDisplayName, Location

Filters SigninLogs records to only include successful sign-ins (ResultType == "0") from the past hour.

Projects (selects) specific columns to make the output more readable: UserPrincipalName, IPAddress, AppDisplayName, and Location.

Searching for Suspicious Login Activity

This query finds suspicious sign-ins where a user logged in from two or more different locations within the same hour, looking across the last 24 hours. This might indicate an account compromise.

SigninLogs
| where TimeGenerated > ago(24h)
| summarize distinct_locations = dcount(Location) by UserPrincipalName, bin(TimeGenerated, 1h)
| where distinct_locations >= 2
| project UserPrincipalName, distinct_locations, TimeGenerated

Filters sign-in events from the past 24 hours.

Groups records by UserPrincipalName and counts distinct locations (dcount(Location)) within each hour.

Returns only records where the user signed in from two or more different locations, suggesting suspicious activity.

Using Multiple `where` Clauses

Queries can become more precise by incorporating multiple `where` clauses, which help to narrow down the results to specific criteria. This technique is highly effective when filtering vast datasets to find the exact information needed. Each `where` clause acts as an additional filter, refining the search to include only the most relevant records.

For example, consider a scenario where you want to find all security alerts for a particular user that occurred within the last 24 hours and have a severity level of "High." You could construct the query as follows:

SecurityAlert
| where TimeGenerated > ago(24h)
| where Severity == "High"
| where AccountName == "John Doe"

This query filters the data to show only "High" severity alerts for "John Doe" within the specified time frame, significantly narrowing down the results to the most critical information.

Join

The `join` command combines rows from two tables based on a related column. This is particularly useful for correlating data from different sources.

The `join` function is a powerful tool for combining data from multiple tables into a single, cohesive result set. By aligning rows based on a related column, it allows you to merge datasets and draw insights that are not immediately apparent when the tables are viewed in isolation.

There are several types of joins you can use, each serving different needs:

- Inner Join: Returns only the rows that have a match in both tables.

- Left Outer Join: Returns all rows from the left table, plus the matched rows from the right table. Where there is no match, the right table's columns are null.

- Right Outer Join: Returns all rows from the right table, plus the matched rows from the left table. Where there is no match, the left table's columns are null.

- Full Outer Join: Returns all rows from both tables, matching them where possible. Where there is no match, the missing side's columns are null.

The syntax for a basic join operation involves specifying the kind of join and the condition on which the join is based.
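A minimal sketch of that syntax, using two hypothetical tables (LeftTable and RightTable) that share a DeviceId column:

LeftTable
| join kind=leftouter (RightTable) on DeviceId  // keep all rows from LeftTable, add matches from RightTable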

The following three examples demonstrate `join` in practice.

1:

This query helps security teams track devices running unsupported software that may require updates or replacement to mitigate security risks. It looks at recent device records (from the last 7 days) in a specific machine group (e.g., "DK") and joins them with the software inventory to surface software that has reached end of support (EOS):

DeviceInfo
| where Timestamp >= ago(7d)
| where MachineGroup contains "DK"
| where isnotempty(DeviceName)
| join DeviceTvmSoftwareInventory on DeviceName
| where EndOfSupportStatus contains "EOS"
| project OSPlatform, DeviceName, SoftwareVendor, SoftwareName, SoftwareVersion, EndOfSupportStatus, EndOfSupportDate

2:

This KQL query helps security teams identify devices with critical vulnerabilities that are exploitable, belonging to a certain machine group (e.g., "DK") and seen within the last 7 days. It begins by identifying critical software vulnerabilities and joins them with a knowledge base of vulnerabilities to check whether exploits are available. This combined data is then joined with device information, filtered for recent data, and further narrowed down to the machine group, helping prioritize remediation efforts.

DeviceTvmSoftwareVulnerabilities
| where VulnerabilitySeverityLevel == "Critical"
| join kind=inner (DeviceTvmSoftwareVulnerabilitiesKB) on CveId
| where IsExploitAvailable == true
| join kind=inner (DeviceInfo) on DeviceId
| where Timestamp >= ago(7d)
| where MachineGroup contains "DK"

3:

The query helps identify devices that may be at risk because they have not had a successful antivirus scan in the last 14 days. This can be a sign that the device may be vulnerable to malware or other security threats. By focusing on onboarded devices, the query ensures that only devices actively managed by the security system are included. This can help security teams prioritize remediation efforts for devices that have not been properly scanned for threats within the designated time frame.

let Timerange = 14d;
DeviceInfo
| where OnboardingStatus == "Onboarded"  // Only onboarded devices
| where isnotempty(OSVersion)  // Devices with OSVersion data
| where Timestamp > ago(Timerange)  // Only recent data from the last 14 days
| summarize LastSeen = arg_max(Timestamp, *) by DeviceId  // Get the latest timestamp for each device
| join kind=leftanti (
   DeviceEvents
   | where ActionType == "AntivirusScanCompleted"  // Focus on AV scan completed events
   | where Timestamp > ago(Timerange)  // Only consider scans in the last 14 days
   | summarize LastSuccessfulAVScan = max(Timestamp) by DeviceId  // Find the most recent successful scan per device
) on DeviceId  // Join on DeviceId to match with devices without a recent scan
| project DeviceId, DeviceName, OSPlatform, OSVersion, LastSeen  // Project relevant fields
| sort by OSPlatform, DeviceName  // Sort by OSPlatform and DeviceName

Extend

The `extend` command adds new columns to the dataset, which are computed from existing columns. It allows for enhanced data manipulation and enrichment.

The `extend` command is an invaluable asset in the realm of data analytics, offering the ability to generate new dimensions of data directly from your existing dataset. When you need to add calculated fields or derived attributes, `extend` steps in to dynamically enhance your data, enabling deeper and more nuanced analysis.

For instance, if you have a dataset of user activities with timestamps, you might want to extract the date and hour of each activity to analyze login patterns more effectively. Using `extend`, you can create new columns that break down the timestamp into more granular components, such as the date and hour.
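A minimal sketch of this idea (assuming standard SigninLogs columns and built-in datetime functions):

SigninLogs
| extend SignInDate = format_datetime(TimeGenerated, "yyyy-MM-dd")  // calendar date of the sign-in
| extend SignInHour = datetime_part("hour", TimeGenerated)  // hour of day (0-23)
| summarize SignIns = count() by SignInHour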

Here are some examples that illustrate the use of the `extend` command:

This query is used to analyze sign-in logs and get a count of different conditional access policies and their results. (This is one of Steven Lim's queries from his GitHub repository.)

SigninLogs
| where TimeGenerated > ago(30d)
| where ConditionalAccessPolicies != "[]"
| mv-expand ConditionalAccessPolicies
| extend CADisplayName = tostring(ConditionalAccessPolicies.displayName)
| extend CAResult = tostring(ConditionalAccessPolicies.result)
| summarize Count=count() by CADisplayName, CAResult
| sort by CADisplayName asc

It starts with the sign-in logs dataset and filters the logs to include only those generated in the last 30 days. It further filters to include only records where conditional access policies are present (i.e., not empty).

Next, it expands the array of conditional access policies so each policy in the array gets its own row. It creates a new column named `CADisplayName` that contains the display name of each conditional access policy, and another new column named `CAResult` that contains the result of each policy.

I then group the results by the policy display name and result, and count the number of occurrences for each group. Finally, I sort the results by the policy display name in ascending order.

In summary, this query filters and processes sign-in logs to count and list the results of various conditional access policies applied in the past 30 days, sorted by the policy names.

The above shows how powerful the `extend` functionality is.

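Here is one more sketch (assuming standard DeviceInfo columns) that uses `extend` to compute a flag before aggregating:

DeviceInfo
| where Timestamp > ago(1d)
| extend IsServer = OSPlatform contains "Server"  // flag server platforms (e.g., WindowsServer2022)
| summarize Devices = dcount(DeviceId) by IsServer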

Parse

The `parse` command is a powerful and versatile tool in Kusto Query Language (KQL) that allows you to extract and create new columns from existing string columns. This is particularly useful when dealing with unstructured or semi-structured data, where important information might be embedded within a string field.

The basic syntax of the `parse` command involves specifying the string column to be parsed and defining a pattern that includes placeholders for the fields you want to extract. These placeholders are then used to create new columns in the result set.

For example, consider a log file where each entry is a single string containing a user identifier, an IP address, and an action. Using the `parse` command, you can extract these pieces of information into separate columns, making the data easier to analyze and query.
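As a sketch, suppose a hypothetical table AppLogs has a Message column shaped like "user=alice ip=10.0.0.1 action=login". The `parse` command can split it into separate columns:

AppLogs
| parse Message with "user=" UserId " ip=" IPAddress " action=" Action
| project UserId, IPAddress, Action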

When working with parsing tasks, I primarily rely on the `parse_json` function.

In particular, `parse_json` is an invaluable tool for handling JSON data. This function allows you to extract and manipulate data from JSON-formatted strings, thereby enhancing your ability to work with complex or nested data structures.

The basic syntax for `parse_json` involves passing a string expression that contains JSON data. The function then parses this string and returns a dynamic object, enabling you to easily access and query the individual elements within the JSON structure. This is particularly useful in scenarios where data is stored in a semi-structured format, such as logs, configuration files, or API responses.

For example, consider a dataset where each record is a JSON string containing details about location. Using the `parse_json` function, you can convert these JSON strings into dynamic objects and extract specific fields for further analysis:

SigninLogs
| extend LocationDetails = parse_json(LocationDetails)
| project 
    city = tostring(LocationDetails.city), 
    state = tostring(LocationDetails.state), 
    countryOrRegion = tostring(LocationDetails.countryOrRegion)

This query is used to extract specific fields from a JSON string in the SigninLogs dataset.

SigninLogs: This part of the query specifies the dataset I am working with: the SigninLogs table.

| extend LocationDetails = parse_json(LocationDetails): The `extend` command is used to create a new column or modify an existing one. Here, it is creating or modifying the column LocationDetails by parsing the JSON string in LocationDetails and converting it into a dynamic object.

| project: The `project` command is used to select the columns I want to keep in the output.

city = tostring(LocationDetails.city): This line extracts the 'city' field from the parsed JSON object and converts it into a string, naming the column 'city'.

state = tostring(LocationDetails.state): Similarly, this line extracts the 'state' field and converts it into a string, naming the column 'state'.

countryOrRegion = tostring(LocationDetails.countryOrRegion): Finally, this line extracts the 'countryOrRegion' field from the JSON object and converts it into a string, naming the column 'countryOrRegion'.

In summary, this query takes a JSON string from the LocationDetails column in the SigninLogs dataset, converts it into a dynamic object, and then extracts specific fields (city, state, and countryOrRegion) as separate columns for further analysis.

What Are the Benefits of Advanced Filtering Techniques?

Advanced filtering techniques in KQL offer several significant benefits:

Precision: By utilizing advanced filtering commands, you can narrow down your dataset with remarkable precision, enabling you to focus exclusively on the most pertinent data points. This ensures that your analysis is both accurate and relevant, leading to more informed decision-making.

Efficiency: These techniques allow you to efficiently filter and process large datasets, significantly reducing the query execution time and resource consumption. This efficiency is crucial for handling vast amounts of data quickly and effectively, making your workflows smoother and more productive.

Correlated Insights: Advanced filtering enables you to combine data from multiple sources, providing a comprehensive view and uncovering patterns that might be missed when analyzing isolated data sets. This holistic approach to data analysis helps in identifying trends and correlations that can offer valuable insights.

Data Enrichment: By adding computed columns or extracting specific fields, you can enrich your datasets, making them more informative and actionable. This enrichment process enhances the value of your data, allowing for more detailed and granular analysis.

Complex Analysis: These techniques facilitate sophisticated analyses that require multi-step data transformations and filtering. This capability is essential for conducting in-depth and meaningful investigations, allowing you to explore complex relationships and derive significant conclusions from your data.

Overall, advanced filtering techniques in KQL empower you to perform more precise, efficient, and insightful data analyses, ultimately leading to better outcomes and more strategic decisions.

My Recommendations for Best Practices

To effectively utilize advanced filtering techniques in KQL, consider the following best practices:

Start with Clear Objectives: Define your goals for the queries. Identify the key metrics and events to monitor, which will guide the structure of your queries and the selection of filtering techniques.

Combine Filters Judiciously: Use multiple filtering clauses to progressively refine your dataset. This approach simplifies your queries and makes them more intuitive.

Leverage Joins for Correlation: When correlating data from different tables, use joins mindfully to avoid unintended data omissions or duplications.

Extend for Enrichment: Create new columns that provide additional context or calculated values to make your data more actionable.

Parse for Unstructured Data: Extract meaningful fields from logs or other unstructured data accurately, ensuring that the parse patterns used are precise to prevent data loss.

Optimize Performance: Structure your queries to enhance performance by applying filters early to reduce dataset size before performing complex operations (see the sketch after this list).

Document and Share Queries: Maintain a repository of commonly used advanced queries and share them with your team to promote collaboration and ensure consistency in security operations. Well-documented queries also serve as valuable references for future analysis and troubleshooting.
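To illustrate the filter-early principle, here is a sketch (using the DeviceInfo and DeviceTvmSoftwareInventory tables) where filtering and deduplication happen before the join, so the join processes far fewer rows:

DeviceInfo
| where Timestamp > ago(1d)  // filter early: only one day of device records
| summarize arg_max(Timestamp, *) by DeviceId  // deduplicate to the latest record per device
| join kind=inner (DeviceTvmSoftwareInventory) on DeviceId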

Conclusion

By now, you should have a comprehensive understanding of advanced filtering techniques in KQL, including the use of `where`, `join`, `extend`, and `parse`. These techniques are crucial for refining and manipulating data to gain deeper insights and drive better decision-making in your security operations. Incorporating these best practices into your use of KQL will enhance your ability to monitor and secure your systems effectively.

In the next post in my series, I will delve deeper into KQL's capabilities and showcase how to leverage its full potential in your security operations. Stay tuned as we continue this journey together, mastering KQL and enhancing our security practices.
