Lead Image © Qi Feng, 123RF.com

NetFlow reporting with Google Analytics

Traffic Analysis

Article from ADMIN 27/2015
By
The free Google Analytics service is a convenient way to analyze website usage, but with a few minor modifications, it can also be used for simple evaluations of any data traffic on the company network. We show you how to use Google Analytics to capture and analyze NetFlow data.

Cisco IOS NetFlow [1] collects IP traffic statistics at network interfaces, providing a valuable source of information to system administrators who want to gain in-depth insights into the activities of their enterprise network. Routers and Layer 3 switches that support NetFlow collect client connection information and send it to a central server at irregular intervals. Since the introduction of NetFlow by Cisco, other major network hardware vendors have followed suit and implemented proprietary versions or the RFC-based version [2]. The basic principle is the same.

NetFlow

A NetFlow packet [4] includes up to 30 one-way connection entries (depending on the version and packet size). For example, each entry from version 5 includes:

  • Source and destination IPv4 addresses
  • Source/destination port numbers
  • IP protocol (e.g., TCP, UDP, or ICMP)
  • Incoming and outgoing router interfaces
  • Number of transported bytes and packets
  • Start and end of the connection
  • Type of service (priority bits)
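The fields listed above can be sketched as a small decoder. The following Python example is a minimal illustration, assuming the commonly documented 48-byte NetFlow version 5 record layout in network byte order; the function name parse_v5_record and the dictionary keys are chosen here for illustration and are not taken from the article's tooling.

```python
import socket
import struct

# One NetFlow v5 flow record: 48 bytes, network byte order.
# Fields: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
# First, Last, srcport, dstport, pad, tcp_flags, prot, tos,
# src_as, dst_as, src_mask, dst_mask, pad.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_v5_record(data):
    """Decode a single 48-byte v5 record into a dictionary."""
    (src, dst, _nexthop, in_if, out_if, pkts, octets,
     first, last, sport, dport, _tcp_flags, proto, tos,
     _src_as, _dst_as, _src_mask, _dst_mask) = V5_RECORD.unpack(data)
    return {
        "srcaddr": socket.inet_ntoa(src),
        "dstaddr": socket.inet_ntoa(dst),
        "srcport": sport,
        "dstport": dport,
        "protocol": proto,       # 6 = TCP, 17 = UDP, 1 = ICMP
        "input_if": in_if,
        "output_if": out_if,
        "bytes": octets,
        "packets": pkts,
        # First/Last are router uptime values in milliseconds;
        # the mask guards against counter wraparound.
        "duration_msec": (last - first) & 0xFFFFFFFF,
        "tos": tos,
    }
```

A record is decoded by passing exactly 48 bytes, for example one slice of a received export packet after its header.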

Newer implementations with NetFlow version 9 offer additional information about multicast, IPv6, BGP (Border Gateway Protocol), and MPLS (Multiprotocol Label Switching). The packet's information content can be freely chosen so that no empty fields or uninteresting entries are sent.

When using NetFlow in a professional environment, you are given the choice between a commercial NetFlow analyzer with many features or an open source implementation at zero cost. In this article, I describe a new, third variant: analysis of traffic data from the cloud. A NetFlow collector local to the company collects all the information and sends it (or just random samples) to Google Analytics for further storage and evaluation (see the "Google Analytics" box).

Google Analytics

Google Analytics is a web analysis program that has little in common with NetFlow. Information about visitor behavior on web pages is collected and evaluated. Google Analytics focuses on effectiveness metrics, sales figures, website optimization, and monitoring the success of marketing campaigns. An estimated 50 to 60 percent of all websites use Google Analytics.

Google Analytics offers many approaches to visualizing the flood of information clearly in the form of dashboards and custom reports. In addition to the usual hit lists of the most frequented servers, it can display or discover unwanted protocols (e.g., SIP, OpenVPN, POP3) and tell you which client generates the most Internet traffic. Questions like "Which Windows file servers are being used, and which machines offer unauthorized shares?" or "Which client is accessing the firewall's management interface?" can also be answered with the available reports (see the "Tolerance by Google" box).

Tolerance by Google

Google Analytics in no way restricts its use to web pages or web services. Examples of use provided in the official documentation count the installations of an iPhone/Android app, identify in-app purchases, and log time metrics. The training materials call these web-free applications a "digital environment" or "offline business data." A quote from the Analytics Academy, Lesson 1.2, says: "You can even use Google Analytics in really creative ways to collect 'offline' business data, like purchases that happen in your retail stores, as long as you have an accurate way of collecting and sending that data to your Analytics account" [3].

Providing Information

Routers, multilayer switches, firewalls, and virtual environments (hypervisor, vSwitch) supply information about IP connections, and all major manufacturers provide a way to export this information. Available protocols include NetFlow (Cisco), J-Flow (Juniper), and the standardized variants sFlow and IPFIX.

Existing routers usually provide NetFlow functionality at no additional cost, and the configuration is very simple. Prefer routers that are located close to the collector and have capacity to spare. When selecting the interfaces, make sure that network traffic is not counted twice (e.g., incoming on router A and outgoing on router B). The device configuration typically supports initial filtering, so that uninteresting or security-critical traffic data can be ignored.

NetFlow offerings are thin on the ground in the SOHO area, but with a little luck, you might have a router with DD-WRT or pfSense. Unfortunately, the popular DSL routers from AVM do not support NetFlow.

What happens if your own routers do not offer a flow export? In this case, you can use a workaround in which a Linux computer receives a copy of all network packets via a mirror port and creates a NetFlow export from it. Suitable open source software for this includes, for example, the iptables module ipt_netflow or the pmacct and softflowd programs.
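To illustrate what such a software exporter does conceptually, the following Python sketch groups mirrored packets into one-way flows keyed by the classic 5-tuple. It is a simplified illustration, not how ipt_netflow, pmacct, or softflowd are implemented; the Packet tuple and the aggregate_flows function are hypothetical names, and real exporters additionally expire flows after timeouts.

```python
from collections import namedtuple

# Hypothetical, simplified view of one captured packet.
Packet = namedtuple("Packet", "ts src dst sport dport proto length")


def aggregate_flows(packets):
    """Group packets into one-way flows keyed by the 5-tuple."""
    flows = {}
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = flows.get(key)
        if flow is None:
            # First packet of a new flow.
            flows[key] = {"packets": 1, "bytes": p.length,
                          "first": p.ts, "last": p.ts}
        else:
            # Update counters and the end timestamp.
            flow["packets"] += 1
            flow["bytes"] += p.length
            flow["last"] = p.ts
    return flows
```

Because the key includes source and destination in order, the request and the reply of a connection end up as two separate flows, which matches the one-way entries NetFlow reports.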

Collecting Traffic Data

As soon as the first router is configured as a NetFlow exporter, it sends information about terminated (or timed out) connections to the specified IP address at irregular intervals. The NetFlow collector, which is a Linux service that listens on UDP port 2055, resides behind this IP address. The connection information (see the "NetFlow" box) is taken from the received NetFlow samples and stored briefly on the local hard disk.
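For readers who want to see what arrives on that port, the following Python sketch shows a bare-bones listener that validates the 24-byte NetFlow version 5 header of each datagram. It is only an illustration under the assumption that the exporter sends version 5; it is not a replacement for a real collector, and the names parse_v5_header and serve are made up for this example.

```python
import socket
import struct

# NetFlow v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes).
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD_SIZE = 48


def parse_v5_header(datagram):
    """Return header fields; raise ValueError for non-v5 or short data."""
    (version, count, uptime, secs, _nsecs, sequence,
     _engine_type, _engine_id, _sampling) = V5_HEADER.unpack_from(datagram)
    if version != 5:
        raise ValueError("not a NetFlow v5 packet: version %d" % version)
    if len(datagram) < V5_HEADER.size + count * V5_RECORD_SIZE:
        raise ValueError("datagram shorter than announced record count")
    return {"count": count, "sys_uptime": uptime,
            "unix_secs": secs, "flow_sequence": sequence}


def serve(host="0.0.0.0", port=2055):
    """Listen for export packets and print a one-line summary of each."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, (exporter, _port) = sock.recvfrom(65535)
            try:
                header = parse_v5_header(datagram)
            except (ValueError, struct.error) as err:
                print("ignored packet from %s: %s" % (exporter, err))
                continue
            print("%s sent %d flow records" % (exporter, header["count"]))
```

Calling serve() blocks and prints one line per received export packet; stop it with Ctrl+C.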

The open source nfdump [5] tool does this job on an existing Linux server or on a lean virtual machine (VM). A CPU core, 256MB of RAM, and a 2GB hard drive are sufficient for the VM. You can install on CentOS, Fedora, or Red Hat systems with the Yum package manager. The nfdump package is available from the EPEL repository:

$ yum install nfdump

Before starting, expand the local firewall to include a rule for incoming packets on port 2055 (SELinux requires no adjustment):

$ iptables -I INPUT -p udp -m state \
               --state NEW -m udp \
               --dport 2055 -j ACCEPT
$ ip6tables -I INPUT -p udp -m state \
            --state NEW -m udp \
            --dport 2055 -j ACCEPT

The collector is launched using

$ nfcapd -E -T all -p 2055 -l /tmp -I any

to test the installation. The first NetFlow data should be visible in the Linux console after a short time. (See the "NetFlow Configuration" box.)

NetFlow Configuration

Example of a configuration for a Cisco 1921 router with IOS 15.2.

interface GigabitEthernet0/1
ip flow ingress
ip flow-export version 5
ip flow-export destination 10.10.1.1 2055

Example of a configuration for HP 9300 series.

interface Ethernet 1/1
ip route-cache flow
ip flow-export enable
ip flow-export version 5
ip flow-export destination 10.10.1.1 2055 1

Preparing Google Analytics

To use Google Analytics (GA) [6], you need a Google account, which you can upgrade to include the Analytics service. Cautious users might want to check the GA conditions against their own company policy beforehand. Next, create an account and a property within GA. Google then issues a tracking ID (e.g., UA-12345678-1). This ID is entered in the script flow-ga.pl (see next section) and connects the reported NetFlow data with the Google account.

The property still needs custom definitions that represent the NetFlow field names; they are created manually. The order and spelling are important. The definitions are associated with the Hit scope (as opposed to the Product, Session, or User scopes). These include custom dimensions:

1. srcaddr

2. dstaddr

3. srcport

4. dstport

5. protocol

6. exporter_id

7. input_if

8. output_if

9. tos

and custom metrics:

1. bytes (integer)

2. packets (integer)

3. duration_sec (time)

4. duration_msec (integer)
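The reason the order matters can be sketched in code: custom definitions are addressed by index, not by name. The following Python example assumes that the hits are reported through the Universal Analytics Measurement Protocol, where custom dimensions travel as cd1, cd2, ... and custom metrics as cm1, cm2, ...; whether flow-ga.pl works exactly this way is an assumption. The event category and action ("netflow", "flow"), the function names, and all sample values are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Index order must match the custom definitions created in the property.
DIMENSIONS = ["srcaddr", "dstaddr", "srcport", "dstport", "protocol",
              "exporter_id", "input_if", "output_if", "tos"]
METRICS = ["bytes", "packets", "duration_sec", "duration_msec"]


def build_params(flow, tracking_id, client_id):
    """Map one flow record to Measurement Protocol parameters."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # e.g., UA-12345678-1
        "cid": client_id,    # anonymous client identifier
        "t": "event",        # hit type
        "ec": "netflow",     # event category (hypothetical choice)
        "ea": "flow",        # event action (hypothetical choice)
    }
    for index, name in enumerate(DIMENSIONS, start=1):
        params["cd%d" % index] = str(flow[name])
    for index, name in enumerate(METRICS, start=1):
        params["cm%d" % index] = str(int(flow[name]))
    return params


def encode_hit(params):
    """Return the URL-encoded body for a single hit."""
    return urlencode(params)
```

With this mapping, dstport always lands in cd4 and bytes in cm1, so a property whose definitions were created in a different order would show the values under the wrong names.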
