This will be a concise report on the spamming of web forms by commercially driven criminal enterprises. I must keep it short because my knowledge is secondhand and incomplete. I have been consulting with an associate who has been trying to counter a deluge of spam pouring into both his own email and his client's. We do not agree on the probable path. I think the spam goes through the form each time, whereas he suspects a single pass through the form followed by cloning of the resulting emails. However, I have no access to the email or the internals of the site, and he just wants the problem solved with minimum effort. Therefore, his answers may not be accurate; I simply cannot be certain.
Somewhat arbitrarily, I am trying to recreate the timeline of the thinking behind the suggested solutions as they evolved. Perhaps this is not the best approach, but given the options, it suffices.
I am proposing solutions that may counter either or both of the spam-dumping modes mentioned above. All the suggestions have been passed on to my associate. However, I have heard nothing about their effectiveness, perhaps because he has already crossed the threshold of his limited patience and stopped expending further effort.
It is ironic that we have exchanged modes of action. He is usually the one given to careful examination of details, whereas I rush to push my model into action. Nonetheless, his original approach gives me the opportunity to rail against overreliance on auto-generated web code and design, which hinders effective critical maintenance fixes later. For example, pernicious external attacks are harder to counter when so much is hidden from the supposed builder, as seems to be true in this case.
My obsession has been web forms as an opening for break-in attempts; hence, I had not considered simple email spam attacks a concern. Now that I recognize their effects, I also see that my model could identify and easily eliminate this threat. That is, if the bot's code carelessly dropped links into every open field, as it seems to have done in my associate's case, the spam is easily seen and countered. On my sites the solution would have been simple: if identified, dump it.
I think the problem is intrinsic to relying on FrontPage (or similar page designers) and Extensions, where it is difficult to effect change in small increments [1.]. Moreover, I think the problems are compounded by too much reliance on client-side scripting (Javascript), which an intruder can simply ignore, or can turn to advantage by studying the page source [2.] to evade the tests. However, it is a bit more complicated. He also uses "Extensions [3.]" that act on the server side, but they (I am guessing here) take more effort to revise. Therefore, remedies are hard to design, test, and implement, which is the worst possible situation.
When you are trying not to harass casual users, having Javascript identify honest input errors is preferable to multiple round trips to the server, each returning a different set of rejection notices. Logically, client-side testing is not a flawed concept per se. It is the pernicious effect of trash purveyors that ruins the experience for many. Therefore, a mix of client-side and server-side validation code is more appropriate.
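To illustrate the mixed approach, here is a minimal server-side sketch in Python, one of the server-side languages suggested later in this piece. It repeats the kind of honest-error checks Javascript would make in the browser and reports every problem in a single pass rather than one rejection per round trip. The field names and the loose email pattern are assumptions for illustration only. They are not taken from my associate's site.

```python
import re

# Hypothetical field names for a simple contact form; adjust to your own form.
REQUIRED_FIELDS = ("name", "email", "message")

# Deliberately loose: it catches typos such as a missing "@" or domain dot,
# not every address the email standards permit.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def collect_input_errors(fields: dict[str, str]) -> list[str]:
    """Return every honest-input problem at once, so the visitor sees one
    list of notices instead of a new rejection on each trip to the server."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not fields.get(name, "").strip():
            errors.append(f"Please fill in the {name} field.")
    email = fields.get("email", "").strip()
    if email and not EMAIL_PATTERN.match(email):
        errors.append("The email address looks incomplete.")
    return errors
```

For example, a submission with an empty name and the address bob@example produces both notices in one response. The same rules would be mirrored in the page's Javascript so honest visitors usually never reach the server check at all.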
My associate said he was easily catching the "href" strings that placed links to dubious websites into fields meant for simple character input. He initially had reservations about simply returning false, which refreshed the form. The first major countermeasure he tried was deactivating the submit button, using a noscript tag, whenever Javascript was suppressed. He claimed this had no discernible effect on the flood of spam-laden entries. Moreover, he claimed the emails appeared to match the data set defined by the form; hence, he concluded that subsequent emails were clones sent to the address captured on the first view. The bots, or their source, were simply repeating the mailings without needing to process the form.
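The "href" check can also be made on the server, where a bot cannot skip it by ignoring Javascript. A minimal Python sketch follows. The field names are hypothetical, and the bare-URL pattern is my own assumption added to the "href" string my associate reported.

```python
import re

# Hypothetical names of fields that should only ever hold plain characters.
PLAIN_TEXT_FIELDS = ("name", "phone", "city")

# "href=" is the marker my associate saw; bare http(s) URLs are an added
# assumption about other ways a bot might drop links.
LINK_MARKERS = re.compile(r"href\s*=|https?://", re.IGNORECASE)


def contains_links(fields: dict[str, str]) -> bool:
    """True if any plain-text field carries link markup."""
    return any(
        LINK_MARKERS.search(fields.get(name, "")) for name in PLAIN_TEXT_FIELDS
    )
```

A form handler would call contains_links() before sending any email. On a match, it would simply dump the submission instead of refreshing the form.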
I know that sending email costs next to nothing. However, if the code is written to harvest email addresses from active forms, cloning the emails would (or should) quickly devalue the address. Processing is also cheap, so if data is being dumped randomly into open forms, passing through the form each time gives the garbage a higher probability of reaching a valid endpoint. But logic plays a limited role when dealing with scum; hence, cloned emails are indeed possible.
Added Inputs - Dump All Subsequent Emails Lacking Field(s)
My next suggestion was based on what I proposed in the "Set Traps" section of the LXer series on Forms and Security issues [4.]. I made two suggestions. The first was to add fields but make them invisible. A bot making a new pass would enter data into them, and that alone would flag the email as junk. The second was to add small extra fields; an email filter would then recognize emails lacking those fields as older clones.
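Both traps can be sketched in a few lines of Python. The names website (the invisible field, assumed hidden from humans with CSS) and form_token (the small added field) are hypothetical. The sketch assumes the form handler always writes the added field into the email it sends, so an email filter can treat any message without that line as a clone produced before the change.

```python
# Hypothetical field names: "website" is the invisible trap field and
# "form_token" is the small extra field added to the form.
HONEYPOT_FIELD = "website"
MARKER_FIELD = "form_token"


def tripped_honeypot(fields: dict[str, str]) -> bool:
    """Humans never see the hidden field, so any value in it flags a bot."""
    return bool(fields.get(HONEYPOT_FIELD, "").strip())


def build_email_body(fields: dict[str, str]) -> str:
    """Format a submission, always ending with the marker field's line."""
    lines = [
        f"{key}: {value}"
        for key, value in fields.items()
        if key not in (HONEYPOT_FIELD, MARKER_FIELD)
    ]
    lines.append(f"{MARKER_FIELD}: {fields.get(MARKER_FIELD, '')}")
    return "\n".join(lines)


def is_old_clone(email_body: str) -> bool:
    """Email-filter check: mail generated before the field was added lacks
    the marker line, so it is taken to be a replayed clone."""
    prefix = f"{MARKER_FIELD}:"
    return not any(line.startswith(prefix) for line in email_body.splitlines())
```

The form handler would dump any submission for which tripped_honeypot() is true. The mail filter would then divert anything is_old_clone() reports.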
Oddly, I do not remember any response other than his remark, at the time, that he had read of this suggestion before. Therefore, I doubt it was even attempted, although it would have cut the flow if cloned emails were the major problem. However, it may simply have been too difficult to accomplish, given the constraints of his web design tools, when he was looking for a quick fix.
Change Email Addresses
My associate told me the actual email address is hidden and generated at run time, probably by one of the FrontPage Extensions he mentioned; that is, these email addresses would be generated on the server. I suggested two changes. First, any content known to be spam would not be sent to the site address. Second, the old address would become a repository for cloned spam emails, and a new email address would be created for good form content messages. Together these steps would effectively stop both new attempts and those using the harvested email address.
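The routing can be sketched in Python using the standard library's email.message.EmailMessage. The example.com addresses are placeholders. The spam verdict is assumed to come from server-side checks made earlier, and actual sending (for instance with smtplib) is omitted. Because the recipient is chosen on the server, the new address never appears in the page source.

```python
from email.message import EmailMessage

# Placeholder addresses, not real mailboxes.
OLD_ADDRESS = "forms@example.com"      # harvested; now a spam repository
NEW_ADDRESS = "inquiries@example.com"  # new; never exposed in the page
SENDER = "webform@example.com"         # hypothetical sending address


def build_routed_message(body: str, is_known_spam: bool) -> EmailMessage:
    """Address the form email on the server according to the spam verdict."""
    msg = EmailMessage()
    msg["From"] = SENDER
    # Known spam joins the clones still arriving at the old address;
    # only clean submissions reach the new one.
    msg["To"] = OLD_ADDRESS if is_known_spam else NEW_ADDRESS
    msg["Subject"] = "Web form submission"
    msg.set_content(body)
    return msg
```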
I made one further suggestion. Newly identified spam emails should be dumped into a junk mailbox so that at least some of them can be reviewed for technique. I advise you to expect the techniques to change, because the perpetrators know their code has a limited effective life. Until there are better tools, site owners and developers must be prepared for an ongoing battle.
First, if you use a web design tool that generates code, know its output well enough to modify it without a complete build and test cycle. If you insist on skipping an understanding of the underlying code, recognize that you are a gambler, and be prepared to pay the price.
When spam input is unequivocally identified, give no hint of failure. At most, send a copy to the webmaster or site developer. Moreover, take these actions on the server side. Spammers operate primarily on statistical probabilities; do not improve their odds by openly providing additional information.
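A server-side handler that gives no hint of failure can be sketched in Python as follows. The spam test, delivery, and webmaster notification are passed in as functions because how they are done depends on the site. Those parameter names and the page text are hypothetical. Whatever the verdict, the visitor sees the same page.

```python
from collections.abc import Callable

Fields = dict[str, str]

# Hypothetical response shown for every submission, spam or not.
THANK_YOU_PAGE = "<p>Thank you. Your message has been received.</p>"


def handle_submission(
    fields: Fields,
    is_spam: Callable[[Fields], bool],
    deliver: Callable[[Fields], None],
    notify_webmaster: Callable[[Fields], None],
) -> str:
    """Silently divert spam; the response never reveals the verdict."""
    if is_spam(fields):
        notify_webmaster(fields)  # at most, a quiet copy for review
    else:
        deliver(fields)
    return THANK_YOU_PAGE  # identical in both branches
```

Because both branches return the identical page, a bot probing the form learns nothing from the response that would improve its odds.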
Javascript code should be reserved for catching honest errors and making the site more inviting to those you wish to serve. Since Javascript can be bypassed, do not rely on it to test for spam or intrusion attempts. As a developer, if you depend on server code you either do not understand or cannot control, you have opted to use the wrong tools.
Before assigning email addresses, plan procedures for rerouting the flow if the spam load into a specific address becomes intolerable. Either use a good filter, such as SpamAssassin, and dump what it flags, or swap the respective endpoint mailboxes for the good and spam loads. Whenever possible, dump the latter into a dead-end box where some of it can be reviewed.
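By default, SpamAssassin marks messages it classifies as spam with an X-Spam-Flag: YES header, so a delivery step can route on that header. A minimal Python sketch follows, assuming SpamAssassin has already processed the message. The mailbox names are hypothetical, and on many servers the same rule would more likely be written in the mail server's own filter configuration.

```python
from email import message_from_string

# Hypothetical mailbox names.
GOOD_BOX = "inbox"
DEAD_END_BOX = "spam-review"  # dead end, kept only for occasional review


def choose_mailbox(raw_message: str) -> str:
    """Route a message on the header SpamAssassin adds to flagged spam."""
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return DEAD_END_BOX if flag == "YES" else GOOD_BOX
```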
I suggest at least considering php or python for server-side coding. The latter has the option of mod_python if performance is an issue. So far I find that php works, and for personal reasons I prefer its syntax.
My associate began this effort by trying captcha vetting, as suggested by the hosting service's support team. However, he said he could not get it to work. At this time I am unsure whether that was because it was ineffective or because of a misinterpretation. Regarding the latter, if he was correct that he was being inundated by cloned emails, the captcha would never be part of the process, since cloned emails bypass the form entirely. I also passed along the disturbing news that captcha had been broken, citing a report that Google's email account registration had been compromised. He may have ceased his efforts as a result of that news.
As I indicated at the beginning, just about everything I have written is speculation. However, I will be meeting my associate and perhaps can see the results and attempted fixes for myself. At this stage, I have no idea whether my suggestions were even tried. If they were, or if another approach was used in their place, I will report by updating this piece.
For corrections, suggested extensions, or comments, write to: How-To-Guy. If the mailto does not work, use this: hcohen[-At-]bst-softwaredevs.com.
© Herschel Cohen, All Rights Reserved
____________________________________________________________________
1. I have seen such a site where I could not understand the code. In a page designer, when viewing the design in graphic mode, I understood the problem: there were multiple nested html tables. I am convinced this is what has helped give html tables such a bad reputation.
2. Firefox allows Javascript to be made inactive by default.
3. That is, extensions associated with the web page designer FrontPage.
4. On my site's form, the idea was to fool the bot into entering data that would invalidate the entire form. Humans would have been instructed to leave the fields empty.