What if I want to see Network Logs on Azure? - Part1

Hello everyone!

in this article, I propose a journey in the Azure Network observability. After choosing what to do in terms of Azure Network topology, at one point, we have to consider operations, which means we need to see which flows go through which firewalls. As most people using Azure probably know, those firewalls can take different forms and the associated visibility will depends of the firewall nature. So, we will start by a rapid review of the possiblility for the Network filtering in Azure and we will have a look at the different logs (because visibility is almost always related to logs) available for the Azure Network services through different scenarios.

Let’s get started!

1. Review of Azure Network filtering options

Depending on the objective, we may use differents solutions for Network filtering in Azure:

locally to a virtual network, we usually rely on Network Security Groups. If you’re familiar with this service, you probably know that it’s perfect to secure flows between workload inside Azure subnet in a granular way, as long as the requirement is on the layer 4.
In a Hub & Spoke scenario, with virtual Network or Virtual WAN and Virtual Hub, we rely on a central Firewall that can be used to filter interspoke traffic, egress traffic to Internet, and in some scenario (but with additional services) Internet exposure. This Hub Firewall can be an Azure Firewall, or a 3rd party appliance. However, We’ll focus on Azure Firewall here, because we are interested in Azure Native observability options

I mentionned additional services for Internet exposure, including WAF related solutions, but we will focus mainly on NSGs and Azure Firewall. IMHO, WAF is a full topic on its own so I’d prefer addressing that in a dedicated article.

Last, we also have some Security control available with Azure Network Manager, but, as for WAF, I think it deserves its own article ^^.

Ok let’s get to the heart of the topic.

2. Getting visibility for flows going through NSGs - Basics

NSGs are L4 statefull Firewall, as we said already, perfect for VNet local filtering. We can use Network watcher to evaluate the flows but it does not show information on real packet going through the NSG. For that we need to rely on logs, which are NOT configured by default.

As for most of Aure managed services, we have Azure Resources Logs available for NSG, that we can set in the Diagnostic Settings section of the resource

As displayed on the picture, there are 2 available categories:

Network Security Group Event
Network Security Group Rule Counter

To be able to look at those logs, we need to send the logs in a sink that allows querying. There are many options, but the native one (and unfortunately quite expensive also) is to use a log analytics workspace. It quite easy to do it from the portal, and doing it through terraform would look like this:

resource "azurerm_monitor_diagnostic_setting" "NsgDiagSettings" {
  for_each                   = local.Subnets
  name                       = local.Subnets[each.key].Nsg.DiagSettingsName #format("%s-%s", "diag", azurerm_network_security_group.Nsgs[each.key].name)
  storage_account_id         = local.StaLogId
  log_analytics_workspace_id = local.LawLogId
  target_resource_id         = azurerm_network_security_group.Nsgs[each.key].id

  dynamic "enabled_log" {
    for_each = data.azurerm_monitor_diagnostic_categories.Nsg[each.key].log_category_types
    content {
      category = enabled_log.value

    }
  }

  dynamic "metric" {
    for_each = data.azurerm_monitor_diagnostic_categories.Nsg[each.key].metrics
    content {
      category = metric.value

    }
  }
}

Notice the reference to the data source for the log categories, which is instrumented this way, and allows us to automatically get all the log categories. On the other hand, it implies a re-evaluation at each plan / apply.

data "azurerm_monitor_diagnostic_categories" "Nsg" {

  for_each    = local.Subnets
  resource_id = azurerm_network_security_group.Nsgs[each.key].id
}

There is also an Azure policy available for configuring NSG that would not have a diagnostic settings. Details on this policy can be found on the fantastic azadvertizer.net corresponding page.

Now looking at the logs, the category Network Security Group Rule Counter gives mostly basic informations on network flows, while Network Security Group Rule Counter

Let’s have a look. As we’ve seen earlier, logs are available from a log analytocs workspace, so we need to have access to one. However, because the diagnostic settings are resources associated directly to the resources, it is possible the query logs from either the workspace , or directly from the resource:

On the log analytics workspace, the following request, executed from the resource, will allows us to see all NSG related event.

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| summarize count() by Category

However, if we run this request directly from the workspace, we need to specify the scope if we want to see only NSG related logs. In this case the query becomes this:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where ResourceId == "/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>"
| summarize count() by Category

Note: The query may not return any result if the resource id is not written in full capital, e.g /SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>

If we want to only limit ourselves to the NSGs, in general, we can use this query:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where ResourceId contains "/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/"
| summarize count() by Category

3. Statistics on rules involved in flows

With the category Network Security Group Rule Counter, we can get basic informations such as the rules involved in network flows:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupRuleCounter"
| where ResourceId == "/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s

We’ll note the difference displayed in the logs between default nsg rules, prefixed with DefaultRule_ and custom rules, prefiexed with UserRule_

Let’s add the summarize and render verbs, and create a diagram:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupRuleCounter"
| where ResourceId == "/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s
| summarize count() by ruleName_s, direction_s
| render barchart 

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupRuleCounter"
| where ResourceId contains "PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s
| summarize count() by ruleName_s, direction_s
| render barchart 

So that’s not bad, but not enough for Network troubleshhoting, if we need any, so let’s move forward and start displaying information in the flows

4. Display informations on flows

Using the category Network Security Group Event, we can get additional informations and filter traffic depending on source or destination IP. However, here, we can only use conditions_sourceIP_s and conditions_destinationIP_s to use as filter on the IPwhich display informaiton on the ip that was evaluated in the rule. But we’ll see that there are limits. The parameter conditions_destinationPortRange_s gives information on the port that the flows are trying to reach. The parameter primaryIPv4Address_s can be useful to display information on the targeted IP.

The below query display the ingress flows blocked on a chosen NSG. In this case there is only 1 VM associated with this NSG.

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupEvent"
| where ResourceId == "/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>"
| where direction_s == "In"
| where type_s == "block"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s, conditions_sourceIP_s, conditions_destinationIP_s, conditions_destinationPortRange_s, ResourceType, ResourceId

On the results displayed, we can see interesting informations on the way the logs are displayed.

first, the parameter primaryIPv4Address_s gives, in this case, only information on the target IP
second, the source IP is not well identified here, an we can either see conditions_sourceIP_s equals to 0.0.0.0/0,0.0.0.0/0 or not defined. It completely depends on the nature of the rule here. If the rule specifies an IP in the source, then, this IP is displayed in conditions_sourceIP_s. In the example, the rule is the default rule DefaultRule_DenyAllInBound, which is coded as below in the portal:

priority	name	direction	access	protocol	source Port Range(s)	destination Port Range(s)	source Address Prefix(es)	destination Address Prefix(es)
65500	DenyAllInBound	Inbound	Deny	*	*	*	*	*

Because the source IP defined is *, we get 0.0.0.0/0,0.0.0.0/0 in the logs.

Let’s see with the allowed ingress flows now. For that we’ll change the value of the parameter type_s to allow. We’ll also switch from a specific resource id to a more broad research to get information on more than one NSG:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupEvent"
| where ResourceId contains "PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/"
| where direction_s == "In"
| where type_s == "allow"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s, conditions_sourceIP_s, conditions_destinationIP_s, conditions_destinationPortRange_s, ResourceType, ResourceId

As we can see, we have on one entry a reference to a public IP that is allowed for SSH. The associated rule is as below, and we can see that the IP is specifically mentionned in the rule and thus visible in the log.

Note: In this rule, WebServer refers to an Application Security Group.

priority	name	direction	access	protocol	source Port Range(s)	destination Port Range(s)	source Address Prefix(es)	destination Address Prefix(es)
100	AllowMyIpAddressSSHInbound	Inbound	Allow	TCP	*	22	`My_Public_Ip`	WebServer

The other entry is on another NSG and we can only see that the traffic is allowed by the default rule DefaultRule_AllowVnetInBound. However, because there is no reference in the default rules to spceific IP, we do not get information in the conditions_sourceIP_s.

To get egress flow blocked, we just have to change the direction_s parameter value from In to Out

The below query display the egress flows blocked:

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupEvent"
| where ResourceId == "/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>"
| where direction_s == "Out"
| where type_s == "block"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s, conditions_sourceIP_s, conditions_destinationIP_s, conditions_destinationPortRange_s, ResourceType, ResourceId

The below query will get the Egress flow allowed

AzureDiagnostics
| where TimeGenerated >= ago(24h)
| where Category == "NetworkSecurityGroupEvent"
| where ResourceId == "/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/RESOURCEGROUPS/<rGName>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsgName>"
| where direction_s == "Out"
| where type_s == "allow"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s, conditions_sourceIP_s, conditions_destinationIP_s, conditions_destinationPortRange_s, ResourceType, ResourceId

An interesting fact: while earlier, the primaryIPv4Address_s was refering to the target IP, this time, it’s the source IP. It does make sense, considering that the NSG is a distributed firewall associated to the NIC.

This means that investigating flows with the diagnostic settings logs should be done considering the direction of the flows first. Then It’s possible to look flows on a specific machine/IP.

For example, we can use this query to identify all the allowed flows in our logs:

AzureDiagnostics
| where TimeGenerated >= ago(4d)
| where direction_s == "In"
| where type_s == "allow"
| project TimeGenerated,ruleName_s,type_s,direction_s,SourceSystem,primaryIPv4Address_s, conditions_sourceIP_s, conditions_destinationIP_s, conditions_destinationPortRange_s, ResourceType, ResourceId
//| summarize count() by primaryIPv4Address_s, ruleName_s
//| render barchart 

If we add the lines | summarize count() by primaryIPv4Address_s, ruleName_s and | render barchart , by removing the // in front, we get a nice summarizing chart

If we just change the parameter type_s value to block, we get the denied flows.

But we do not really get a full picture of the flows, as we only get either the Azure destination IP, or the Azure Source IP. Ok, let’s wrap up

5. Summary

So what have we seen? First, the Azure resource logs on the NSGs can give us some information on flows going through NSG. With some simple queries, we get statistics on the rules responsible for allowed or denied flows. Second, we get different informations depending of the rules. A rule with a specific IP tagged in it wil gives us more details than a generic rule with a wide range (in source ou destination address). Third, well, it’s a corollary of the second point, while we can get information from the NSG diagnostic settings, this is far from complete. We cannot identify a source IP that try to reach a workload associated to the NSG, nor can we identify which IP is trying to reach an Azure workload if it’s not clearly specified in the rule. So this is far from complete, an dnot totally helpfull for troubleshooting.

We’ll stop here for today, but just so you know, we do have means for network flows troubleshooting, and the NSG flow logs to get more visibility on those flows.

That’s all for today! See you soon ^^