Malware Log Analysis

An essential part of analyzing and cleaning up websites infected with malware is reading and evaluating the log files. Even here, though, there are pitfalls that may seem odd at first glance.

Let's say you find a suspicious file, xyz.php, on a website, and it contains malware. A first step is to look at the file's metadata:

stat xyz.php

An excerpt of the output might look like this:

Access: 2022-01-25 17:49:11.033089757 +0100
Modify: 2021-12-10 17:35:44.160789235 +0100
Change: 2021-12-10 17:35:44.184790689 +0100

So this file was created or last modified on December 10, 2021. But beware: even this information can be faked! The Modify time in particular can be set to an arbitrary value, for example with touch.
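To illustrate why the timestamps deserve some suspicion, here is a minimal Python sketch. It assumes a Linux system, where st_ctime is the inode change time (the "Change" line in stat). The helper names file_times and backdate and the temporary demo file are made up for illustration. It backdates a file's Modify time the same way touch -d would, while the Change time stays at roughly the moment of the manipulation.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_times(path):
    """Return (mtime, ctime) of a file as timezone-aware UTC datetimes."""
    st = os.stat(path)
    return (
        datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
    )


def backdate(path, when):
    """Set access and modify time to `when`, much like `touch -d` does."""
    ts = when.timestamp()
    os.utime(path, (ts, ts))


if __name__ == "__main__":
    # Hypothetical demo file in a temporary directory, not a real web root.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "xyz.php")
        with open(demo, "w") as fh:
            fh.write("<?php // placeholder\n")
        backdate(demo, datetime(2021, 12, 10, 16, 35, 44, tzinfo=timezone.utc))
        mtime, ctime = file_times(demo)
        print("Modify:", mtime.isoformat())  # the backdated value
        print("Change:", ctime.isoformat())  # roughly the time of this run
```

A Modify time that lies well before the Change time is therefore worth a second look, although legitimate tools (archive extraction, rsync with preserved times) produce the same pattern.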

In the next step, we look in the access.log of the corresponding website for requests that occurred around this time. For low-traffic sites, it's no problem to go through the log file line by line for the affected period, but for high-traffic sites that is not (so easily) feasible. So we narrow it down.

grep '10/Dec/2021:17:35' access.log

This might still return too many results to read comfortably, so we try filtering for successful requests only.

grep '10/Dec/2021:17:35' access.log | grep -E '" (20.|30.)'

This command picks out all requests from this period that were answered with a 20x or 30x status code, that is, where the URL exists or redirects.
However, this is exactly where we lose. Let's look at the following log excerpt:

xx.xx.xx.xx - - [10/Dec/2021:17:35:13 +0100] "POST /wp/wordpress/wp-includes/css/dist/editor/themes.php HTTP/2.0" 404 49793 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"

With our filter, we would not have detected this entry, because the response from the server was 404, which normally means that the requested script does not exist. 404 errors are very common in the log, as hackers and bots try a lot of URLs to randomly find known entry points for vulnerabilities. The thing we need to keep in mind is this: the HTTP status code is NOT set exclusively by the web server.
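As a rough alternative to the status-based grep, the following Python sketch narrows the log down by timestamp only and prints the status code as information instead of filtering on it. It assumes the log uses the common "combined" format seen in the excerpt above. The function name requests_at and the regular expression are illustrative choices, and the sample line is the excerpt with a shortened user agent.

```python
import re

# Assumed layout: the "combined" access log format from the excerpt above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def requests_at(lines, time_prefix):
    """Yield parsed requests whose timestamp starts with time_prefix.

    There is deliberately no filtering on the status code.
    """
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("time").startswith(time_prefix):
            yield match.groupdict()


if __name__ == "__main__":
    # Sample data: the log excerpt with a shortened user agent.
    sample = [
        'xx.xx.xx.xx - - [10/Dec/2021:17:35:13 +0100] '
        '"POST /wp/wordpress/wp-includes/css/dist/editor/themes.php HTTP/2.0" '
        '404 49793 "-" "Mozilla/5.0"',
    ]
    for req in requests_at(sample, "10/Dec/2021:17:35"):
        print(req["time"], req["status"], req["method"], req["path"])
```

For a real log, the list can be replaced by an open file object, since the function only iterates over lines.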

Here is an example from a piece of PHP malware:

<?php
error_reporting(0);
http_response_code(404);
// malicious code follows

Aha! So this script always produces a 404 in the log, even if the request was successful. Consequently, when filtering the log files, we must not use the HTTP status code as an exclusion criterion.
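One way to turn this observation around is to look for requests that were logged as 404 although the requested file actually exists on disk. The Python sketch below is only a heuristic under stated assumptions: it assumes URL paths map one-to-one to files below the document root (rewrite rules or front controllers break this), and the function name existing_404s and the docroot argument are hypothetical. A hit is a reason to inspect the file, not proof of malware, since a server can return 404 for an existing file for other reasons.

```python
import os
import re
from urllib.parse import unquote

# Request line and status code as they appear in common access log formats.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) ')


def existing_404s(lines, docroot):
    """Return (method, path) pairs logged as 404 although the file exists."""
    root = os.path.realpath(docroot)
    hits = []
    for line in lines:
        m = REQUEST.search(line)
        if not m or m.group("status") != "404":
            continue
        # Drop the query string and decode percent-escapes.
        url_path = unquote(m.group("target").split("?", 1)[0])
        try:
            candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
            # Ignore anything that resolves outside the document root.
            if os.path.commonpath([root, candidate]) != root:
                continue
        except ValueError:
            continue  # e.g. an embedded null byte in the requested path
        if os.path.isfile(candidate):
            hits.append((m.group("method"), url_path))
    return hits
```

Called with the lines of an access.log and the site's document root, it returns the 404 requests whose targets are real files.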

There is no silver bullet for log file analysis. The time that stat filename.php reports may not always be correct. And even though most malware requests are made via POST, that doesn't always have to be the case. In the end, cleaning a website is always a lengthy task that needs to be done in a very conscientious and focused manner. Even deleting all affected files is not always the right option, because existing files are often modified to contain or include only a few lines of malware. If such a file is deleted, the website no longer works.

In the best case, a backup exists with (almost) all current data that does not yet contain any compromised files. In the worst case, reinstalling the entire website with (manual) transfer of the content is the safest option.
