Visitors analytics with GoAccess
As I already mentioned in my other post about server-side visitor analytics with .NET Core middleware, if you rely only on Google Analytics, you are missing a good portion of data about your visitors.
Recently I discovered one more way to analyze visitor data: the GoAccess tool (and its nice web reports).
What’s especially great about this tool is that it analyzes web-server access logs, so it is the most trustworthy and “closest to reality” data about your visitors that you can possibly ever get.
GoAccess vs. your custom analytics
So when would you want to use GoAccess instead of your own custom analytics middleware (that’s about my previous post again)?
Well, while the latter certainly gives you great flexibility and customization, you don’t always need that. Plus you are dedicating some of your .NET Core application’s resources to processing every request, and that might eventually hurt your application/website performance. And aside from processing request data, you also need to implement web views/reports and make them look nice.
The former, on the other hand:
- takes data from “one level lower” - web-server access logs! It’s as low as it gets
- doesn’t take anything from your application resources
- already provides ready-made customizable web-reports with a very nice GUI
So after evaluating GoAccess capabilities I actually decided to drop my custom analytics middleware and to rely on GoAccess instead.
Using GoAccess
Installation/building
You can just download pre-built binaries for your platform, but in that case you won’t have certain features like GeoLocation/GeoIP (matching IP addresses with countries/places). If you would like to have them, you’ll need to build GoAccess from sources with the required configuration options.
First you’ll need to install the following packages:
$ sudo apt install libncursesw5-dev libgeoip-dev libmaxminddb-dev
And then configure and build GoAccess like this (but first read about ignore-referer):
$ wget https://tar.goaccess.io/goaccess-1.4.tar.gz
$ tar -xzvf goaccess-1.4.tar.gz
$ cd goaccess-1.4/
$ ./configure --enable-utf8 --enable-geoip=mmdb
$ make
Here we are enabling better UTF-8 support and GeoIP functionality (the mmdb variant). For the latter to work you will also need to download the MaxMind GeoLite2 database and put the GeoLite2-City.mmdb file into /usr/local/etc/goaccess/.
You can now install it, but be aware that if you are updating a version that you already had, then /usr/local/etc/goaccess/goaccess.conf will be overwritten, so back it up first:
$ sudo cp /usr/local/etc/goaccess/goaccess.conf .
$ sudo make install
$ sudo mv ./goaccess.conf /usr/local/etc/goaccess/
Check if the executable works:
$ which goaccess
/usr/local/bin/goaccess
$ goaccess --version
GoAccess - 1.4.
For more details visit: http://goaccess.io
Copyright (C) 2009-2020 by Gerardo Orellana
Build configure arguments:
--enable-utf8
--enable-geoip=mmdb
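If you script your deployments, the same check can be automated. This is only a minimal sketch: it assumes the --version output lists the configure arguments the way it is shown above, and the function name has_mmdb_geoip is made up for illustration.

```shell
#!/bin/bash
# Reads `goaccess --version` output on stdin and succeeds only if the
# build was configured with the mmdb GeoIP variant.
has_mmdb_geoip() {
    grep -q -- '--enable-geoip=mmdb'
}

# Only query the binary if it is actually installed.
if command -v goaccess >/dev/null 2>&1; then
    if goaccess --version | has_mmdb_geoip; then
        echo "GeoIP (mmdb) support is compiled in"
    else
        echo "GeoIP (mmdb) support is missing, rebuild with --enable-geoip=mmdb"
    fi
fi
```

The function only looks at text, so it would report a pre-built binary without GeoIP the same way as a source build that was configured without the option.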
Settings
Settings are stored in /usr/local/etc/goaccess/goaccess.conf (or /etc/goaccess/goaccess.conf, or whichever path you might have set during configuration). Here are the modifications I made in my config (you can find a more detailed explanation of the options in the documentation):
# that works with NGINX logs
log-format COMBINED
# which panes to show, which columns to hide, etc
html-prefs # this value is explained in the next section
# ignore requests from these IP addresses (our internal network)
exclude-ip 10.200.0.0-10.200.255.255
# consider requests like /fonts/fontawesome-webfont.woff?v=4.0.3
# to be static files still
all-static-files true
# ignore crawlers requests
ignore-crawlers true
# panels I am not interested in
ignore-panel VISIT_TIMES
ignore-panel REFERRERS
ignore-panel KEYPHRASES
ignore-panel GEO_LOCATION
# ignore spam/trash referrers
ignore-referer anti-crisis-seo.com
ignore-referer *.sel-hoz.com
ignore-referer volcable.ru
ignore-referer *.painting-planet.com
ignore-referer *.xn--74-jlcepmffs7i6a.xn--p1ai
ignore-referer playbox.life
ignore-referer ukrtvory.in.ua
ignore-referer vulkan-klyb.ru
# ...
# https://decovar.dev/database/trash-domains.txt
# sorting in panels
sort-panel REQUESTS,BY_VISITORS,DESC
sort-panel REQUESTS_STATIC,BY_BW,DESC
sort-panel NOT_FOUND,BY_BW,DESC
sort-panel HOSTS,BY_VISITORS,DESC
sort-panel OS,BY_VISITORS,DESC
sort-panel BROWSERS,BY_VISITORS,DESC
sort-panel REFERRING_SITES,BY_VISITORS,DESC
sort-panel STATUS_CODES,BY_DATA,DESC
# which requests should go to static files panel
static-file .css
static-file .js
static-file .jpg
static-file .png
static-file .gif
static-file .ico
static-file .jpeg
static-file .pdf
static-file .csv
static-file .woff
static-file .woff2
static-file .txt
static-file .zip
static-file .mp3
static-file .mp4
static-file .exe
static-file .gz
static-file .rar
static-file .7z
static-file .wasm
# ...
# and so on, that's totally up to you
# absolute path to the MaxMind GeoLite2 database,
# which you should have downloaded earlier
geoip-database /usr/local/etc/goaccess/GeoLite2-City.mmdb
html-prefs
The html-prefs option is a bit special, so I’m listing it separately:
{
"details": true,
"layout": "horizontal",
"perPage": 25,
"theme": "darkPurple",
"showTables": true,
"visitors": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"hosts": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"geolocation": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"requests": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"os": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"referring_sites": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"status_codes": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"browsers": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"static_requests": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
}
}
},
"not_found": {
"plot": {
"chartType": "area-spline"
},
"columns": {
"hits": {
"hide": true
},
"visitors": {
"hide": true
}
}
}
}
As you can see, it controls the web-report preferences: layout, theme, visible columns in tables and so on.
What’s so special about it:
- you have to provide it as a single line, so minify your JSON before pasting it to config
- before v1.4 this value had a limit of 512 characters, so you could customize only that much
You can also get its current value from Local Storage in your browser (the AppPrefs value). And by the way, if you made changes in goaccess.conf but the dashboard preferences did not update, delete this value from Local Storage and refresh the page.
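Since the value has to be a single line, here is one possible way to collapse pretty-printed JSON with nothing but standard tools. Treat it as a sketch: the minify_json name and the shortened sample preferences are made up for the demonstration, and it does not validate the JSON in any way.

```shell
#!/bin/bash
# Collapse pretty-printed JSON into a single line: strip leading whitespace
# from every line, then delete the newlines. This is safe because JSON
# strings cannot contain raw line breaks.
minify_json() {
    sed 's/^[[:space:]]*//' | tr -d '\n'
}

# Hypothetical, shortened preferences just for the demonstration.
prefs='{
    "theme": "darkPurple",
    "perPage": 25
}'

# Print a line ready to be pasted into goaccess.conf:
# html-prefs {"theme": "darkPurple","perPage": 25}
printf 'html-prefs %s\n' "$(printf '%s\n' "$prefs" | minify_json)"
```

If you have jq installed, `jq -c . prefs.json` produces a compact single line too, and it will also complain about malformed JSON.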
ignore-referer
You might be surprised to learn that the variable for these values (MAX_IGNORE_REF) is limited to 64 (items, I guess) by default. So if you have more than 64 domains which you would like to blacklist, you will need to re-build GoAccess from sources, having modified this line in src/settings.h before calling configure:
#define MAX_IGNORE_REF 1024
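To find out whether you are affected at all, you can count the ignore-referer entries in your config. A small sketch, assuming the config path used in this post and the default limit of 64 mentioned above; count_ignore_referers is just an illustrative name.

```shell
#!/bin/bash
# Count the active (not commented out) ignore-referer entries
# in a GoAccess config file.
count_ignore_referers() {
    grep -c '^ignore-referer[[:space:]]' "$1"
}

conf=/usr/local/etc/goaccess/goaccess.conf
limit=64   # default MAX_IGNORE_REF, as described above

if [ -r "$conf" ]; then
    total=$(count_ignore_referers "$conf")
    if [ "$total" -gt "$limit" ]; then
        echo "$total ignore-referer entries, more than the limit of $limit"
    else
        echo "$total ignore-referer entries, within the limit of $limit"
    fi
fi
```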
Web-server access logs
Before running GoAccess, you might want to check how your web-server is logging requests.
NGINX
In case of NGINX, it is controlled by /etc/logrotate.d/nginx:
/var/log/nginx/*.log {
weekly
missingok
rotate 8
maxage 90
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
prerotate
if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
run-parts /etc/logrotate.d/httpd-prerotate; \
fi \
endscript
postrotate
invoke-rc.d nginx rotate >/dev/null 2>&1
endscript
}
So logs will rotate every week (weekly) with a maximum of 8 rotated files (rotate 8), which covers about 2 months. And maxage 90 will delete rotated logs that are older than 90 days, otherwise they would keep piling up and GoAccess would still be processing them.
To apply the changes:
$ sudo kill -USR1 $(cat /var/run/nginx.pid)
$ sudo systemctl restart nginx.service
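With this kind of configuration a new log file is started every week and the old ones are kept as numbered, compressed copies, so here’s roughly what you’ll eventually have in your /var/log/nginx/ (the exact number of files depends on your rotate value):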
access.log
access.log.1
access.log.10.gz
access.log.11.gz
access.log.12.gz
access.log.13.gz
access.log.14.gz
access.log.2.gz
access.log.3.gz
access.log.4.gz
access.log.5.gz
access.log.6.gz
access.log.7.gz
access.log.8.gz
Running
Manually
You can run GoAccess like this:
zcat -f /var/log/nginx/access.log* | goaccess - -p /usr/local/etc/goaccess/goaccess.conf -o /var/www/YOUR-WEBSITE/admin/analytics.html
here:
- zcat -f uncompresses and reads compressed logs (-f forces it to read uncompressed ones too)
- -p is the path to the GoAccess config
- -o is where to generate the web-report file
As a result you will have an all-in-one self-contained (HTML/CSS/JS) web page ready to be deployed anywhere on your website (/admin/analytics.html in my case).
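If you are curious what zcat -f actually does with a mix of plain and compressed files, you can try it on a throwaway directory first. This is a sketch with made-up file contents, assuming the GNU gzip tools that come with a typical Linux server.

```shell
#!/bin/bash
# Build a throwaway directory that mimics a rotated log set:
# one plain file and one gzip-compressed file.
demo_dir=$(mktemp -d)
echo "line from access.log"   > "$demo_dir/access.log"
echo "line from access.log.2" > "$demo_dir/access.log.2"
gzip "$demo_dir/access.log.2"   # leaves access.log.2.gz behind

# -f makes zcat pass plain files through unchanged, so one command
# covers both kinds of files.
merged=$(zcat -f "$demo_dir"/access.log*)
echo "$merged"

rm -rf "$demo_dir"
```

Both lines end up in the output, which is exactly what GoAccess then receives on its standard input.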
Even with default settings and from a pre-built binary, GoAccess reports are very useful, but it’s always nice to customize something to your liking.
For example, here’s a fragment of visitors hostnames panel - that’s what MaxMind GeoLite2 database was for:
It’s much more interesting when you see not just visitors IP addresses but also their country and sometimes even city, innit.
And here’s a fragment of the panel with 404 requests - mostly it’s scumbags trying to guess the admin interface:
Note how the report doesn’t contain a Hits column (pretty much useless information, in my opinion), and tables are sorted by different columns. Well, actually you cannot see the column names on the screenshots, but that’s what’s happening there.
You can also black/white-list certain requests - here are some more details about that.
In real-time
Obviously, the report won’t update itself as it’s just a static HTML page, so one way would be to run GoAccess with the --real-time-html option (probably as a service), and it will watch the web-server logs, live-updating the report.
Using cron
…But I prefer to generate reports periodically, for example every hour. That way you can also save them to some archive on a daily basis (but actually nobody stops you from doing so in addition to running GoAccess in real-time mode).
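Anyway, you will need to add a job to cron. Pipes do work in crontab entries (cron hands the command line to a shell), but a long pipeline is easier to read and maintain in a script, so create /root/webanalytics.sh (and remember to use absolute paths, because cron runs jobs with a minimal PATH):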
#!/bin/bash
/bin/zcat -f /var/log/nginx/access.log* | /usr/local/bin/goaccess - -p /usr/local/etc/goaccess/goaccess.conf -o /var/www/YOUR-WEBSITE/admin/analytics.html
And then add a cron job to call this script:
1 * * * * /root/webanalytics.sh > /dev/null 2>&1
Keeping old reports
If you’d like to keep old reports instead of constantly overwriting the same file, you can reorganize the reports folder like this:
/var/www/YOUR-WEBSITE/admin/
└── goaccess
├── archive
└── index.html
And then modify the script so reports are copied to the archive folder every month, for example:
#!/bin/bash
# regenerate the report from all (plain and gzipped) access logs,
# keeping whitelisted requests and dropping blacklisted ones
zcat -f /var/log/nginx/access.log* \
    | grep -Eif /path/to/requests-whitelisted.txt \
    | grep -Evif /path/to/requests-blacklisted.txt \
    | sudo -u www-data goaccess - -o /var/www/YOUR-WEBSITE/admin/goaccess/index.html

dayOfTheMonth=$(date '+%d')
hourOfTheDay=$(date '+%H')
monthOfTheYear=$(date '+%m')
yearOfItself=$(date '+%Y')

# archive the report once a month: during the 23:00 run on the 1st
if [[ $dayOfTheMonth == '01' && $hourOfTheDay == '23' ]]; then
    # %s rather than %d: date pads the day with a zero, and printf
    # would try to read values like "08" as octal numbers
    printf "Today is day %s of the month, the report will be saved to archive\n" "$dayOfTheMonth"
    cp /var/www/YOUR-WEBSITE/admin/goaccess/index.html \
        /var/www/YOUR-WEBSITE/admin/goaccess/archive/"$yearOfItself-$monthOfTheYear-$dayOfTheMonth.html"
else
    echo "Not keeping this report"
fi
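The two filter steps in this script are easy to try out in isolation. Below is a sketch with made-up patterns and made-up log lines (the filter_requests name is hypothetical too); it uses grep -E, which is the current spelling of egrep. Each pattern file holds one extended regular expression per line.

```shell
#!/bin/bash
# Keep only the lines matching the whitelist, then drop those matching
# the blacklist. $1 and $2 are pattern files, one regex per line.
filter_requests() {
    grep -E -i -f "$1" | grep -E -v -i -f "$2"
}

work=$(mktemp -d)
# Hypothetical patterns: keep GET/POST requests, drop probes for PHP files.
printf '%s\n' '"(GET|POST) ' > "$work/whitelist.txt"
printf '%s\n' '\.php'        > "$work/blacklist.txt"

# Three made-up log lines in the COMBINED format.
cat > "$work/access.log" <<'EOF'
203.0.113.5 - - [01/Jun/2020:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.9 - - [01/Jun/2020:10:00:01 +0000] "GET /wp-login.php HTTP/1.1" 404 162 "-" "curl/7.68.0"
203.0.113.7 - - [01/Jun/2020:10:00:02 +0000] "HEAD /blog/ HTTP/1.1" 200 0 "-" "Mozilla/5.0"
EOF

kept=$(filter_requests "$work/whitelist.txt" "$work/blacklist.txt" < "$work/access.log")
echo "$kept"   # only the "GET /blog/" line survives

rm -rf "$work"
```

One thing to watch out for: an empty line in a pattern file matches every log line, so keep both files free of blank lines.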
Social networks
Zuck: Just ask
Zuck: I have over 4,000 emails, pictures, addresses, SNS
smb: What? How'd you manage that one?
Zuck: People just submitted it.
Zuck: I don't know why.
Zuck: They "trust me"
Zuck: Dumb fucks