Maximum Anti-spam Measures for WordPress

On my blog I’ve had to take what I consider maximum, or perhaps even extreme, measures to minimize spam. I use CAPTCHAs, which I don’t particularly like, along with mandatory approval for comments and shutting off comments after 30 days. Oh, and I’ve also commented out the code that allows for trackbacks and pingbacks. It was the combination of these measures that finally gave me some peace of mind.

Back during the month when I finally implemented all of these anti-spam methods, I had received over 400 spam messages via various channels. It was filling up my mailbox and getting really annoying. Mostly it was the time wasted glancing at each message to weed out legitimate comments from junk before deleting the rest. Now I’m down to around six or so spam comments per month, and while this is still annoying, it’s a much more manageable number.

I’ve gradually accepted that I won’t be able to completely eliminate spam (argh!). And there are several downsides to this heavy-handed approach. Every time I upgrade to a new version of WordPress I have to re-comment out the trackback and pingback PHP code; if I don’t, I start getting spam again within 24 hours. Also, if someone wants to post a legitimate comment after 30 days, they won’t be able to.

10 hard-earned hints on how to increase your website uptime

Here are some tips on how to keep your website up and running from a hardware perspective. It’s these often overlooked or underfunded things that bring a website down. I spend all day working with software, but hardware is often a mystery, especially to software developers.

Twitter’s recent outage brought to light an important fact about any website: it can crash. Of course, Twitter’s outage affected perhaps a hundred million people globally, but the point is that your website can go down as well.

Let’s take a look at how uptime is typically calculated on a monthly basis. 30 days x 24 hours = 720 hours. If the website is down for 8 hours during that time period, it translates into an uptime of 712 hours. And, 712 divided by 720 equals an uptime of 98.888%.
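That arithmetic is easy to wrap in a small helper. Here’s a minimal sketch using the same numbers as the paragraph above (the class and method names are my own, just for illustration):

```java
public class Uptime {
    // Percentage of the period the site was up, given total hours and downtime.
    static double uptimePercent(double totalHours, double downHours) {
        return (totalHours - downHours) / totalHours * 100.0;
    }

    public static void main(String[] args) {
        double total = 30 * 24;                 // 720 hours in a 30-day month
        double pct = uptimePercent(total, 8.0); // 8 hours of downtime
        System.out.printf("Uptime: %.3f%%%n", pct);
    }
}
```

Running it prints an uptime just under 98.9%, matching the hand calculation.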

Generally that doesn’t sound too bad, and most outages are intentional, due to maintenance, and therefore occur in the middle of the night or at some other time when web traffic is expected to be lowest. You start noticing outages when they occur during normal business hours, because that’s when your employees’ or visitors’ productivity takes a hit.

For typical businesses, very little focus is placed on the following. It’s my contention that, for those of us who aren’t Twitter, you can get close to perfect uptime using these guidelines. Even if you only have a handful of customers who hit your website a few times a week, an outage will reflect negatively on you even if you are contractually covered. With a little work, you can significantly reduce downtime.

  1. Use real-time monitoring tools or build your own. At a minimum you need to monitor: CPU, available memory, available hard disk space, current inbound bandwidth usage, and the time to complete an HTTP request to at least your home page. Assign staff members who will get emails and text messages if the monitor detects a problem, and expect them to respond within a certain time period, even on holidays and weekends.
  2. If you use a system monitor, host it somewhere other than where your server farm is located. This seems obvious, but I’ve seen the monitor go down with the server because they were on the same machine, or located at the same site.
  3. Always, always, always take an image backup of the previous version of your website machine. Yes, I mean an image of the entire machine and not just a copy of the website directory. It’s easy to do, it doesn’t take long, and it doesn’t take much storage space these days to do what’s called a snapshot image. With modern software, you should be able to do this with just a couple of mouse clicks.
  4. If you don’t use the cloud, have a cheapo spare server machine that can be swapped in for your more expensive, super ultra-redundant machine. You can buy a decent quad-core, 16GB RAM server for roughly $400–$500 as an insurance policy.
  5. Have a cheapo spare even if you use virtual machines. I’ve personally seen two instances where all virtual instances shut down completely when the master server blew out its memory. Computer memory is like the engine in your car: when it dies, your forward progress comes to a complete halt. I can tell you with certainty that most servers that you and I use don’t have redundant memory capabilities.
  6. If your primary server is down for some unknown reason, you have several immediate choices: restart it and hope it comes back up, spin up a copy in the cloud, or direct traffic to your backup server. All of these are great strategies as long as your site isn’t under a denial of service (DoS) attack or getting overwhelming amounts of standard, non-vengeful traffic. You can find this out by using the monitoring tools mentioned in #1. Here’s one possible approach; whatever approach you use, test it and document it:

    – Restart primary server. If it restarts okay then you’re golden.
    – If the primary crashes again, then either plug in a hot spare or spin up the standby copy.
    – If the backup copy crashes, then go to the rollback snapshot copy that you made.
    – If the rollback snapshot copy crashes then you have no choice but to start doing a detailed investigation of log files and other forensics.

  7. Do your investigation or forensics after you’ve restored service. This should probably be rule number one! Your website visitors will appreciate it. I’ve been in several situations over the years where the focus was on first figuring out why a system went down while the users languished with no service. Do your best to get things started again, and then worry about what happened.
  8. I’m going to mention ISP redundancy because it is definitely overlooked. If you host your server at a major hosting provider, they often contract with multiple internet backbone providers. The backbone providers are the primary carriers of all internet traffic, and the hosting facility will have multiple trunks from various primary carriers coming into different sides of the building in case one trunk is accidentally cut or has an outage. Check with your hosting provider for options on this.
  9. Everyone will tell you to cache your static content on a CDN. I agree, although to me this is more of a performance issue than an uptime issue, which is why it’s down here at number 9. If you serve huge amounts of static content, a CDN could conceivably keep your server from crashing, but for most of us it’s a performance boost.
  10. If you are getting overwhelming amounts of non-vengeful traffic, one technique I’ve used to great success is to serve a simple HTML file to every Nth visitor saying that we were experiencing high traffic volumes. The simple file took very little overhead to serve and produced an immediate, significant reduction in the load on the server. Naturally, when the event was over, we took the message down.
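To make hint #1 concrete, here is a minimal sketch of the home-page timing check. The URL and timeout are placeholders; a real monitor would also watch CPU, memory, disk, and bandwidth, and would run from a separate site per hint #2:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class HomePageCheck {
    // Times an HTTP GET of the given page; returns milliseconds taken,
    // or -1 if the request failed, timed out, or returned a server error.
    static long timeRequest(String pageUrl, int timeoutMs) {
        long start = System.currentTimeMillis();
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(pageUrl).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            int status = conn.getResponseCode();
            conn.disconnect();
            return (status >= 500) ? -1 : System.currentTimeMillis() - start;
        } catch (Exception e) {
            return -1; // unreachable host counts as an outage
        }
    }

    public static void main(String[] args) {
        long ms = timeRequest("https://www.example.com/", 5000);
        if (ms < 0) {
            System.out.println("ALERT: home page check failed -- notify on-call staff");
        } else {
            System.out.println("Home page responded in " + ms + " ms");
        }
    }
}
```

In practice you would run this on a schedule and wire the failure branch to email or SMS, rather than just printing.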

I’m not going to address Denial of Service attacks since I’m not a security expert. There are plenty of papers on the web and experts you can call. I can say I have averted limited DoS attacks in the past by filtering IP addresses and working with an experienced network service provider. But, if you are under a widespread DoS attack get expert help immediately.

[Edited 6/23 – added a tenth hint! Fixed minor typos.]

Troubleshooting multi-threading problems related to Android’s onResume()

This post is a continuation of a previous post I wrote about best practices for using onResume(). I found a particularly nasty bug that cost me two hours of pain to track down. The tricky part was that it would only show up when there was no debugger attached. Right away this told me it was a threading problem. I suspected that the debugger slowed things down just enough that all the threads completed in the expected order, rather than the actual order that occurred when running the device in stand-alone mode.

The test case. This is actually a very common workflow, and perhaps so common that we just don’t think about it much:

  • Cold start the application without a debugger attached. By cold start I mean that the app was in a completely stopped, non-cached state.
  • Minimize the app like you are going to do some other task.
  • Open the app again to ensure that onResume() gets called.

Fortunately, I already had good error handling built in. I kept seeing, in logcat and in a toast message, that a java.lang.NullPointerException was occurring. What happened next was troubleshooting a multi-threaded app without the benefit of a debugger. Not fun. I knew I had to do it because of the visibility of the use case. I couldn’t let this one go.

How to narrow down the problem. The pattern I used to hunt down the bug was to wrap each line of code or code block with Log messages like this:

Log.d("Test","Test1");
setLocationListener(true, true);
Log.d("Test","Test2");

Then I used the following methodology, starting inside the method where the NullPointerException was occurring. I did this step by step, app rebuild by app rebuild, through the next 250 lines of related code:

  1. Click debug in Eclipse to build the new version of the app that included any new logging code as shown above, and load it on the device.
  2. Wait until the application was running, then shut down the debug session through Eclipse.
  3. Restart the app on the device. Note: the debugger was shut down so it wouldn’t re-attach.
  4. Watch the messages in Logcat.
  5. If I saw one message, such as Test1, followed by the NullPointerException with no test message after it, then I knew I had found the offending code block, method, or property. If it was a method, then I followed the same pattern through the individual lines of code inside that method. This looked very much like step-through debugging, except it was done manually. Ugh.
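The manual bisection above can be captured in a tiny helper. This is just a sketch, not Android-specific: System.out stands in for Log.d, and the step labels and lambdas are hypothetical placeholders for the real suspect calls:

```java
public class LogBisect {
    // Runs labeled steps in order, logging before each one; returns the
    // label of the first step that threw, or null if all completed.
    static String firstFailingStep(String[] labels, Runnable[] steps) {
        for (int i = 0; i < labels.length; i++) {
            System.out.println("Test: " + labels[i]); // stand-in for Log.d("Test", ...)
            try {
                steps[i].run();
            } catch (RuntimeException e) {
                System.out.println("Test: " + labels[i] + " threw "
                        + e.getClass().getSimpleName());
                return labels[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String culprit = firstFailingStep(
            new String[] { "Test1", "Test2" },
            new Runnable[] {
                () -> { /* a step that completes */ },
                () -> { throw new NullPointerException(); } // offending step
            });
        System.out.println("Offending step: " + culprit);
    }
}
```

The last label printed before the exception is exactly the signal described in step 5.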

What caused the problem? As time went on, and I was surprised that I had to keep digging deeper and deeper into the code, I became very curious. It turned out to be a multi-threading bug in a third-party library that wasn’t fully initialized, even though it had a method to check whether initialization was complete. The boolean state property was plainly wrong: one portion of the library wasn’t taken into account when declaring initialization complete, and I was trying to access a property that wasn’t initialized. Oops…now that’s a bug.

The workaround? To work around the problem, I simply wrapped the offending property access in a try/catch block. Then, using the pattern I described in the previous blog post, I kept running verification checks until the property was either correctly initialized, or failed after a certain number of attempts. This isn’t 100% ideal, but it lets me keep moving forward with the project until the vendor fixes the bug.
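Here is a hedged sketch of that retry-with-try/catch workaround. FlakyLibrary is a stand-in I wrote to simulate the vendor library’s late initialization; its name, getProvider(), and the retry counts are all hypothetical, not the vendor’s actual API:

```java
public class RetryInit {
    // Simulates a library whose property throws until it is truly initialized,
    // even after the library has reported itself ready.
    static class FlakyLibrary {
        private int callsUntilReady;
        FlakyLibrary(int callsUntilReady) { this.callsUntilReady = callsUntilReady; }
        String getProvider() {
            if (callsUntilReady-- > 0) throw new NullPointerException("not ready");
            return "gps";
        }
    }

    // Wraps the offending access in try/catch and retries with a short delay,
    // giving up after maxAttempts.
    static String readWithRetry(FlakyLibrary lib, int maxAttempts, long delayMs) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return lib.getProvider(); // succeeds once truly initialized
            } catch (NullPointerException e) {
                try { Thread.sleep(delayMs); } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return null; // caller treats this as a failed initialization
    }

    public static void main(String[] args) {
        FlakyLibrary lib = new FlakyLibrary(3); // ready after three failed reads
        System.out.println("Provider: " + readWithRetry(lib, 10, 10));
    }
}
```

On a real device you would run the retries off the UI thread; the bounded attempt count is what keeps the workaround from hanging forever if the library never finishes initializing.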

Lessons Learned. I’ve done kernel-level debugging on Windows applications, but I really didn’t feel like learning how to do it with one or more Android devices. I was determined to narrow down the bug using the rather primitive tools at hand. The good news is it only took two hours. For me, it reaffirmed my own practice of implementing good error handling, because I knew immediately where to start looking. I had multiple libraries and several thousand lines of code to work with. And, as I’ve seen before, there are some bugs in Android that simply fail with little meaningful information. By doubling down and taking it step by step, I was able to mitigate a very visible bug.