As I already mentioned in my other post about server-side visitor analytics with .NET Core middleware, if you rely only on Google Analytics, you are missing a good portion of data about your visitors.

Recently I discovered one more way to analyze visitor data, the GoAccess tool with its nice web reports:

GoAccess dashboard

What’s especially great about this tool is that it analyzes web-server access logs, so it is the most trustworthy and “closest to reality” data about your visitors that you can possibly ever get.

GoAccess vs. your custom analytics

So when would you like to use GoAccess instead of your own custom analytics middleware (that’s about my previous post again)?

Well, while the latter certainly gives you great flexibility and customization, you don’t always need that. Plus you are dedicating some of your .NET Core application’s resources to processing every request, and that might eventually hurt your application/website performance. And aside from processing request data, you also need to implement web views/reports and make them look nice.

The former, on the other hand:

  • takes data from “one level lower” - web-server access logs! It’s as low as it gets
  • doesn’t take anything from your application resources
  • already provides ready-made customizable web-reports with a very nice GUI

So after evaluating GoAccess capabilities I actually decided to drop my custom analytics middleware and to rely on GoAccess instead.

Using GoAccess

Installation/building

You can just download pre-built binaries for your platform, but in that case you won’t have certain features such as GeoLocation/GeoIP (matching IP addresses with countries/places). If you would like to have those, you’ll need to build GoAccess from sources with the required configuration options.

First you’ll need to install the following packages:

$ sudo apt install libncursesw5-dev libgeoip-dev libmaxminddb-dev

And then configure and build GoAccess like this (but first read about ignore-referer):

$ wget https://tar.goaccess.io/goaccess-1.4.tar.gz
$ tar -xzvf goaccess-1.4.tar.gz
$ cd goaccess-1.4/
$ ./configure --enable-utf8 --enable-geoip=mmdb
$ make

Here we are enabling better UTF-8 support and GeoIP functionality (the mmdb variant). For the latter to work you will also need to download the MaxMind GeoLite2 database and put the GeoLite2-City.mmdb file into /usr/local/etc/goaccess/.

You can now install it, but be aware that if you are updating a version that you already had, then /usr/local/etc/goaccess/goaccess.conf will be overwritten, so back it up first:

$ sudo cp /usr/local/etc/goaccess/goaccess.conf .
$ sudo make install
$ sudo mv ./goaccess.conf /usr/local/etc/goaccess/

Check if the executable works:

$ which goaccess
/usr/local/bin/goaccess

$ goaccess --version
GoAccess - 1.4.
For more details visit: http://goaccess.io
Copyright (C) 2009-2020 by Gerardo Orellana

Build configure arguments:
  --enable-utf8
  --enable-geoip=mmdb

Settings

Settings are stored in /usr/local/etc/goaccess/goaccess.conf (or /etc/goaccess/goaccess.conf, or whichever path you might have set at configuration time). Here are the modifications I made in my config (you can find a more detailed explanation of the options in the documentation):

# that works with NGINX logs
log-format COMBINED

# which panes to show, which columns to hide, etc
html-prefs # this value is explained in the next section

# ignore requests from these IP addresses (our internal network)
exclude-ip 10.200.0.0-10.200.255.255

# consider requests like /fonts/fontawesome-webfont.woff?v=4.0.3
# to be static files still
all-static-files true

# ignore crawlers requests
ignore-crawlers true

# panels I am not interested in
ignore-panel VISIT_TIMES
ignore-panel REFERRERS
ignore-panel KEYPHRASES
ignore-panel GEO_LOCATION

# ignore spam/trash referrers
ignore-referer anti-crisis-seo.com
ignore-referer *.sel-hoz.com
ignore-referer volcable.ru
ignore-referer *.painting-planet.com
ignore-referer *.xn--74-jlcepmffs7i6a.xn--p1ai
ignore-referer playbox.life
ignore-referer ukrtvory.in.ua
ignore-referer vulkan-klyb.ru
# ...
# https://decovar.dev/database/trash-domains.txt

# sorting in panels
sort-panel REQUESTS,BY_VISITORS,DESC
sort-panel REQUESTS_STATIC,BY_BW,DESC
sort-panel NOT_FOUND,BY_BW,DESC
sort-panel HOSTS,BY_VISITORS,DESC
sort-panel OS,BY_VISITORS,DESC
sort-panel BROWSERS,BY_VISITORS,DESC
sort-panel REFERRING_SITES,BY_VISITORS,DESC
sort-panel STATUS_CODES,BY_DATA,DESC

# which requests should go to static files panel
static-file .css
static-file .js
static-file .jpg
static-file .png
static-file .gif
static-file .ico
static-file .jpeg
static-file .pdf
static-file .csv
static-file .woff
static-file .woff2
static-file .txt
static-file .zip
static-file .mp3
static-file .mp4
static-file .exe
static-file .gz
static-file .rar
static-file .7z
static-file .wasm
# ...
# and so on, that's totally up to you

# absolute path to the MaxMind GeoLite2 database,
# which you should have downloaded earlier
geoip-database /usr/local/etc/goaccess/GeoLite2-City.mmdb
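The ignore-referer entries above can get long, so generating them from a plain list is an option. Here is a minimal sketch, assuming the list contains one domain per line (the actual format of trash-domains.txt is an assumption here) and that the function name is just an illustration:

```shell
#!/bin/bash
# Turn a plain list of domains (one per line) into ignore-referer
# directives; blank lines and lines starting with # are skipped.
domains_to_ignore_referer() {
    sed -E -e '/^[[:space:]]*(#|$)/d' -e 's/^/ignore-referer /'
}

# Hypothetical usage, assuming the list was already downloaded:
# domains_to_ignore_referer < trash-domains.txt >> goaccess.conf
```

Mind the limit on the number of ignore-referer entries, which is described in the ignore-referer section.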

html-prefs

The html-prefs option is a bit special, so I’m listing it separately:

{
    "details": true,
    "layout": "horizontal",
    "perPage": 25,
    "theme": "darkPurple",
    "showTables": true,
    "visitors": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "hosts": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "geolocation": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "requests": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "os": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "referring_sites": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "status_codes": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "browsers": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "static_requests": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            }
        }
    },
    "not_found": {
        "plot": {
            "chartType": "area-spline"
        },
        "columns": {
            "hits": {
                "hide": true
            },
            "visitors": {
                "hide": true
            }
        }
    }
}

As you can see, it controls the web-report preferences: layout, theme, visible columns in tables and so on.

What’s so special about it:

  • you have to provide it as a single line, so minify your JSON before pasting it into the config
  • before v1.4 this value had a limit of 512 characters, so you could customize only that much

You can also get its current value from Local Storage in your browser (AppPrefs value). And by the way, if you made changes in goaccess.conf, but the dashboard preferences did not update - delete this value from Local Storage and refresh the page.
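The single-line requirement can be handled with a tiny helper. This is a naive sketch, not a real JSON minifier: it strips all whitespace, so it is only safe when no key or string value contains spaces (which holds for the preferences listed above). The file name prefs.json is hypothetical.

```shell
#!/bin/bash
# Naive minifier: removes ALL spaces, tabs and line breaks from stdin.
# Safe only if no JSON string in the input contains whitespace.
minify_json() {
    tr -d ' \t\r\n'
}

# Hypothetical usage: print a ready-to-paste config line
# printf 'html-prefs %s\n' "$(minify_json < prefs.json)"
```

If your preferences ever contain strings with spaces, a proper JSON tool would be the safer choice.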

ignore-referer

You might be surprised to learn that the number of these values is limited: MAX_IGNORE_REF is set to 64 (items, I guess) by default. So if you have more than 64 domains that you would like to blacklist, you will need to re-build GoAccess from sources, having modified this line in src/settings.h before running configure:

#define MAX_IGNORE_REF 1024
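If you rebuild often, the edit can be scripted. Below is a sketch under the assumption that the header contains a line of the form `#define MAX_IGNORE_REF <number>`; the function name is made up for illustration.

```shell
#!/bin/bash
# Set MAX_IGNORE_REF in the given header to a new value.
# $1 - path to settings.h, $2 - new limit (a positive integer)
set_max_ignore_ref() {
    local header="$1" limit="$2" tmp
    tmp="$(mktemp)" || return 1
    if sed -E "s/^#define[[:space:]]+MAX_IGNORE_REF[[:space:]]+[0-9]+/#define MAX_IGNORE_REF ${limit}/" \
        "$header" > "$tmp"; then
        cat "$tmp" > "$header"   # write back in place, keeping file permissions
    fi
    rm -f "$tmp"
}

# Hypothetical usage from the unpacked source directory:
# set_max_ignore_ref src/settings.h 1024 && ./configure --enable-utf8 --enable-geoip=mmdb
```

It is worth checking the result with grep MAX_IGNORE_REF src/settings.h before building, since a different spelling of the define would leave the file unchanged.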

Web-server access logs

Before running GoAccess, you might want to check how your web-server is logging requests.

NGINX

In case of NGINX, log rotation is controlled by /etc/logrotate.d/nginx:

/var/log/nginx/*.log {
    weekly
    missingok
    rotate 8
    maxage 90
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
            if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                    run-parts /etc/logrotate.d/httpd-prerotate; \
            fi \
    endscript
    postrotate
            invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}

So logs rotate every week (weekly) and at most 8 rotated files are kept (rotate 8), which covers about 2 months. And maxage 90 deletes rotated logs that are older than 90 days; otherwise they would keep piling up and GoAccess would still be processing them.

To apply the changes:

$ sudo kill -USR1 $(cat /var/run/nginx.pid)
$ sudo systemctl restart nginx.service

In this configuration log files split every week and rotate every 8 weeks (2 months), so here’s what you’ll eventually have in your /var/log/nginx/:

access.log
access.log.1
access.log.10.gz
access.log.11.gz
access.log.12.gz
access.log.13.gz
access.log.14.gz
access.log.2.gz
access.log.3.gz
access.log.4.gz
access.log.5.gz
access.log.6.gz
access.log.7.gz
access.log.8.gz

Running

Manually

You can run GoAccess like this:

zcat -f /var/log/nginx/access.log* | goaccess - -p /usr/local/etc/goaccess/goaccess.conf -o /var/www/YOUR-WEBSITE/admin/analytics.html

here:

  • zcat -f uncompresses and reads compressed logs (-f forces it to read uncompressed ones too)
  • -p path to the GoAccess config file
  • -o where to generate the web-report file

As a result you will have all-in-one self-contained (HTML/CSS/JS) web-page ready to be deployed anywhere on your website (/admin/analytics.html in my case).
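To see what the zcat -f part of that pipeline does on its own, here is a small sketch (the function name is just for illustration) that prints the contents of both plain and gzipped files:

```shell
#!/bin/bash
# Print the contents of all given log files, gzipped or plain.
# Without -f, zcat would complain about files that are not compressed.
read_all_logs() {
    zcat -f "$@"
}

# Hypothetical usage:
# read_all_logs /var/log/nginx/access.log* | wc -l
```

Note that the shell expands the access.log* glob in lexicographic order, not by age, so the lines do not arrive in chronological order.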

Even with default settings and from a pre-built binary, GoAccess reports are very useful, but it’s always nice to customize things to your liking.

For example, here’s a fragment of visitors hostnames panel - that’s what MaxMind GeoLite2 database was for:

GoAccess panel hostnames

It’s much more interesting when you see not just visitors IP addresses but also their country and sometimes even city, innit.

And here’s a fragment of the panel with 404 requests - mostly it’s scumbags trying to guess the admin interface:

GoAccess panel 404

Note how the report doesn’t contain a Hits column (pretty much useless information, in my opinion), and tables are sorted by different columns. Well, actually you cannot see the column names on the screenshots, but that’s what’s happening there.

You can also black/white-list certain requests - here are some more details about that.

In real-time

Obviously, the report won’t update itself as it’s just a static HTML page, so one way would be to run GoAccess with the --real-time-html option (probably as a service), and it will watch the web-server logs, live-updating the report.
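A real-time invocation could look roughly like the sketch below. The wrapper function and the GOACCESS variable are my own illustration (the variable only exists so the binary can be swapped out); the paths are the same ones used elsewhere in this post. As far as I know, the page then receives updates over a WebSocket, port 7890 by default, so that port has to be reachable from the browser.

```shell
#!/bin/bash
# Path to the binary; overridable, e.g. for a dry run with GOACCESS=echo
GOACCESS="${GOACCESS:-/usr/local/bin/goaccess}"

# $1 - log file to watch, $2 - where to write the HTML report
start_realtime_report() {
    local log="$1" report="$2"
    "$GOACCESS" "$log" \
        -p /usr/local/etc/goaccess/goaccess.conf \
        -o "$report" \
        --real-time-html
}

# Hypothetical usage:
# start_realtime_report /var/log/nginx/access.log /var/www/YOUR-WEBSITE/admin/analytics.html
```

In this mode GoAccess keeps running in the foreground, which is why wrapping it into a service makes sense.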

Using cron

…But I prefer to generate reports periodically, for example every hour. That way you can also save them to an archive on a daily basis (although nothing stops you from doing so in addition to running GoAccess in real-time mode).

Anyway, you will need to add a job to cron. A pipeline is awkward to maintain in a crontab line, so it’s easier to put it into a /root/webanalytics.sh script (also remember to use absolute paths):

#!/bin/bash

/bin/zcat -f /var/log/nginx/access.log* | /usr/local/bin/goaccess - -p /usr/local/etc/goaccess/goaccess.conf -o /var/www/YOUR-WEBSITE/admin/analytics.html

And then add a cron job to call this script:

1 * * * * /root/webanalytics.sh > /dev/null 2>&1
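That entry runs the script at minute 1 of every hour. If you want to add it from a script rather than through crontab -e, here is a sketch of an idempotent helper (the function name is hypothetical); it reads an existing crontab on stdin and appends the line only if it is not there yet:

```shell
#!/bin/bash
# Print the crontab given on stdin, with "$1" appended unless an
# identical line is already present.
add_cron_line() {
    local line="$1" current
    current="$(cat)"
    if printf '%s\n' "$current" | grep -Fxq -- "$line"; then
        printf '%s\n' "$current"
    else
        if [ -n "$current" ]; then
            printf '%s\n' "$current"
        fi
        printf '%s\n' "$line"
    fi
}

# Hypothetical usage, assuming your crontab accepts "-" for stdin:
# crontab -l 2>/dev/null \
#   | add_cron_line '1 * * * * /root/webanalytics.sh > /dev/null 2>&1' \
#   | crontab -
```

Running it twice leaves a single copy of the job, so it is safe to use in provisioning scripts.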

Keeping old reports

If you’d like to keep old reports instead of constantly overwriting the same file, you can reorganize the reports folder like this:

/var/www/YOUR-WEBSITE/admin/
└── goaccess
    ├── archive
    └── index.html

And then modify the script so reports are copied to the archive folder every month, for example:

#!/bin/bash

# read all logs, keep whitelisted requests, drop blacklisted ones
# and generate the report as the web-server user
zcat -f /var/log/nginx/access.log* \
| grep -E -i -f /path/to/requests-whitelisted.txt \
| grep -E -v -i -f /path/to/requests-blacklisted.txt \
| sudo -u www-data goaccess - -o /var/www/YOUR-WEBSITE/admin/goaccess/index.html

dayOfTheMonth=$(date '+%d')
hourOfTheDay=$(date '+%H')
monthOfTheYear=$(date '+%m')
yearOfItself=$(date '+%Y')
# archive the report generated at 23:xx on the first day of the month
if [[ $dayOfTheMonth == '01' && $hourOfTheDay == '23' ]]; then
    printf "Today is day %s of the month, the report will be saved to archive\n" "$dayOfTheMonth"
    cp /var/www/YOUR-WEBSITE/admin/goaccess/index.html \
       /var/www/YOUR-WEBSITE/admin/goaccess/archive/"$yearOfItself-$monthOfTheYear-$dayOfTheMonth.html"
else
    echo "Not keeping this report"
fi
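The two filtering stages in the script above can be tried out separately from GoAccess. Here is a minimal sketch, assuming both files contain one extended regular expression per line; the function name and the sample patterns are made up for illustration.

```shell
#!/bin/bash
# Keep only lines matching any pattern from the whitelist file,
# then drop lines matching any pattern from the blacklist file.
# Matching is case-insensitive, as in the script above.
# $1 - whitelist file, $2 - blacklist file; log lines come from stdin
filter_requests() {
    local whitelist="$1" blacklist="$2"
    grep -E -i -f "$whitelist" | grep -E -v -i -f "$blacklist"
}

# Hypothetical usage:
# zcat -f /var/log/nginx/access.log* \
#   | filter_requests /path/to/requests-whitelisted.txt /path/to/requests-blacklisted.txt \
#   | head
```

Keep in mind that an empty whitelist file matches nothing, so the report would end up empty, while an empty blacklist file simply filters nothing out.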