Spam protection for website email?


MollyM/CA
May 29th, 2007, 05:19 PM
A friend, sculptor Kira Od, has been making "Weldies." I asked if I could post the pictures she's sent and offered to embed the copyright statement she always forgets, and said if her website's e-mail contact address is spam-protected I could add the URL or the contact address to the pictures.

Well:

"My e-mail is not “spoofed.” The spam got so bad I disabled the direct link from my website’s contact page. I don’t know how to spam-protect it, or I would. My hosting company doesn’t offer any quick fixes that I know of, either. So if you have any tips, I’m listening."

I've only gotten so far as to register a domain name (with GoDaddy) and register with Domains by Proxy. Any words-of-one-syllable help (or links to same) that those here can provide will be for me, too, as I'll no doubt want a contact address if I ever put together a page.


Here's a Weldie -- but why did I get a message about this .png file having the wrong extension when I tried to upload the .jpg as a URL from its Zenfolio gallery?

Judy G. Russell
May 29th, 2007, 05:37 PM
Any words-of-one-syllable help (or links to same) that those here can provide will be for me, too, as I'll no doubt want a contact address if I ever put together a page.

There are three options in order of simplicity (most to least) and security (least to most):

1. Encode your email address in character entities. Those are the codes that start with an ampersand and a pound sign, followed by a number that represents the character and a closing semicolon (for example, &#64; is @). See this list (http://www.w3schools.com/tags/ref_ascii.asp) at w3schools, and note that this site (http://www.golivecentral.com/pages/txttut/scramble.shtml) will do the scrambling for you.
2. Make it more complicated and encode your email address with a more complex system of JavaScript, etc. See this site (http://www.unwantedlinks.com/protectemail.htm).
3. Use a mail form on your website. See this page (http://faq.1and1.com/scripting_languages_supported/formmail_explained/) for an explanation.

Here (http://istpub.berkeley.edu:4201/bcc/Winter2003/feat.spamharvest.html) is a good explanation of all your options.
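For anyone curious what the first option (character entities) actually produces, here is a minimal JavaScript sketch that turns every character of an address into a decimal numeric character reference. It is not the code the linked scrambler site uses; the function name `toDecimalEntities` and the sample address are made up for illustration.

```javascript
// Convert each character to a decimal numeric character reference,
// e.g. "@" (character code 64) becomes "&#64;".
// Browsers decode these when rendering, so visitors see the normal address.
function toDecimalEntities(text) {
  let out = '';
  for (const ch of text) {
    out += '&#' + ch.codePointAt(0) + ';';
  }
  return out;
}

// Hypothetical address, for illustration only.
const encoded = toDecimalEntities('kira@example.com');
console.log(encoded);
// The result can be pasted into the page both as the link text and inside
// the href, e.g. <a href="mailto:ENCODED">ENCODED</a>.
```

Browsers also decode entities inside attribute values, which is why the encoded form still works in the `mailto:` link.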

why did I get a message about this .png file having the wrong extension when I tried to upload the .jpg as a URL from its Zenfolio gallery?

Errr... is it a png wrongly labeled a jpg (or vice versa)?

davidh
May 29th, 2007, 07:04 PM
Any words-of-one-syllable help (or links to same) that those here can provide will be for me, too, as I'll no doubt want a contact address if I ever put together a page.


I see Judy already gave you some tips.

I don't know whether the sites she linked to talk about this. I put my email address in a simple black-and-white graphic file using Paint Brush, the graphics program built into Windows, which lets you enter text into graphics. I think that most of the web crawler robots that scan the web for email addresses do not bother trying to use Optical Character Recognition (OCR) to extract email addresses from graphic files.

The difficulty of using OCR to scan incoming email is also the reason that spammers put ads for Viagra, OxyContin, etc. into graphic files to evade spam filters.

I think the newer Paint Brush in XP may be all you need to do this (put the text of the email address into an image file). I have Windows 98, so I used Paint Brush to make a BMP file containing the text of the email address and then converted the BMP file to JPG using a Paint Shop program that came with one of my modem cards, I think.

The disadvantage is that the "customer" has to retype the email address or write it down.

I did it with the graphic file myself because I did not want to bother learning the script or whatever else the other methods need.

BTW
Google offers free web hosting (with your own customized domain name) for small businesses. You get up to 25 email addresses, and Google's email spam protection has seemed pretty good to me over the last couple of years. You can keep your domain registration where it is now, and if the registrar provides a reasonable service you will still be able to host the actual web pages on Google. When I uploaded my site to Google, I had to use their web-based software, so it was not as convenient as uploading files with FTP. For simple sites, Google hosting is probably adequate. I have not seen any ads appear on my site yet. I suppose that's possible in the future, but if it happens, at least Google ads are not very intrusive.

Google also offers domain registration (about $10/yr?), but it may be in cooperation with a third-party registrar.

DH

sidney
May 30th, 2007, 01:23 AM
There are three options in order of simplicity

When I was concerned about this for my web site, I did some Googling around and found a study (which I'm too lazy to search for again right now) that checked how effective different methods are against spam harvesting.

The bottom line:

Entity encoding is useless as protection. Don't bother with them.

Anything that requires Javascript to read will give you effectively 100% protection against automated email address harvesters.

The mail form will work perfectly as long as you don't make the foolish mistake of embedding your email address in the HTML of your form (which some people do) instead of keeping it hidden in the script on your server. But a mail form does not perform the same function as displaying your email address in a clickable link, so it isn't really a solution to the same problem.
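To illustrate where the address should live, here is a minimal sketch of the server-side half of a mail form, written as plain JavaScript that could run under Node.js. The recipient is a constant in the server code and is never read from the submitted fields, so it never reaches the browser. The field names (`name`, `email`, `message`), the address, and the function `buildMessage` are assumptions for illustration; actually sending the mail is left out.

```javascript
// The recipient exists only on the server; the HTML form never contains it.
// Hypothetical address, for illustration only.
const RECIPIENT = 'kira@example.com';

// Build the message from the submitted form fields.
// Any "to" field a visitor (or a spammer) adds to the form is ignored,
// which also keeps the form from being used to relay mail to other people.
function buildMessage(fields) {
  // Strip line breaks so a visitor can't inject extra mail headers.
  const name = String(fields.name || '').replace(/[\r\n]/g, ' ').trim();
  const replyTo = String(fields.email || '').replace(/[\r\n]/g, '').trim();
  return {
    to: RECIPIENT,
    replyTo: replyTo,
    subject: 'Website contact from ' + (name || 'a visitor'),
    body: String(fields.message || ''),
  };
}

const msg = buildMessage({
  name: 'A. Visitor',
  email: 'visitor@example.org',
  to: 'someone-else@example.net', // ignored on purpose
  message: 'I love the Weldies!',
});
console.log(msg.to); // kira@example.com
```

The returned object would then be handed to whatever mail-sending facility the hosting company provides.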

There is a fourth option that was found to be about 100% effective: writing something like (sidney AT sidney DOT com), which is easy for a human to interpret but not common enough for spammers to bother writing software to look for. It is harder on the people you are showing it to, though, since it can't be used by clicking or even by copying and pasting.
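If you would rather generate that munged form than type it, a tiny JavaScript sketch will do; the function name `mungeAddress` is made up, and the output is meant only for human readers, not for a `mailto:` link.

```javascript
// Rewrite an address in the "name AT domain DOT com" style.
// Only meant for people to read; it is not a working address.
function mungeAddress(address) {
  return address.replace('@', ' AT ').replace(/\./g, ' DOT ');
}

console.log(mungeAddress('sidney@sidney.com')); // sidney AT sidney DOT com
```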

So all you have to do is put a little bit of JavaScript on the web page, like this snippet from mine. Even this is more complicated than it has to be: it says String.fromCharCode(64) instead of '@' and splits my email address into four parts where two would do. All you really have to do is make sure the email address is not explicitly in the HTML source. Note that this handles the possibility that someone has JavaScript disabled: in that case they see something from which they can figure out the email address.

Look at the link on my name on the bottom line of my web page (http://www.sidney.com/) to see how this comes out:

<script type="text/javascript">
// Assemble the address at run time so it never appears literally in the page source.
document.write('<a href="' + 'mail' + 'to:sidney' +
  String.fromCharCode(64) + 'sidney.com' + '">' + 'Sidney' +
  '</a>');
</script>
<noscript>Sidney (sidney AT sidney DOT com)</noscript>
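The same trick can be packaged as a small reusable function. This is a sketch rather than sidney's code: `buildMailtoLink` and the `contact` element id are hypothetical names, and putting the result into an existing element is shown only as a comment, as an alternative to `document.write`.

```javascript
// Assemble a mailto link at run time so the literal address never appears
// in the page source. buildMailtoLink is a hypothetical helper name.
function buildMailtoLink(user, domain, label) {
  const at = String.fromCharCode(64); // character code 64 is '@'
  return '<a href="mail' + 'to:' + user + at + domain + '">' + label + '</a>';
}

// In a browser page (not run here), the result could be placed into an
// existing element instead of using document.write; "contact" is a
// hypothetical element id:
//   document.getElementById('contact').innerHTML =
//     buildMailtoLink('sidney', 'sidney.com', 'Sidney');

console.log(buildMailtoLink('sidney', 'sidney.com', 'Sidney'));
```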

ktinkel
May 30th, 2007, 03:28 PM
Entity encoding is useless as protection. Don't bother with them.

I have had great success with using various codes (in combination) to cloak e-mail addresses. You can use HTML entities and ascii codes with spaces, parens, symbols (+, %, etc.) in combination. Sort of a pain to work out, but once you have it, you can save it and just plug it in. Maybe I am in a fool’s paradise, but it is okay so far (after about a year).

Anything that requires Javascript to read will give you effectively 100% protection against automated email address harvesters.

The trouble with JavaScript is that some people have it turned off. Of course, your method, which includes name-at-address or similar, would cover that, assuming they felt like bothering.

I should say, though, that I have no idea how many users would be affected. I gave up and turned JS on a couple of years ago, and so far have had no problems (that I know of).

sidney
May 30th, 2007, 04:42 PM
I have had great success with using various codes (in combination) to cloak e-mail addresses. You can use HTML entities and ascii codes with spaces, parens, symbols (+, %, etc.) in combination

Can you give an example of code that includes parens and '+' symbols? The problem with entities is that almost any commonly available software library that is written to read HTML from a web server already has entity decoding built in, making it difficult to write a spam harvester that doesn't automatically decode it as it is reading. I would be interested to see if there is some more obscure encoding, intermediate between entity encoding and javascript, that web browsers handle ok but common HTML parser libraries do not.
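To see why sidney is skeptical of entities alone, here is roughly how little code it takes to undo numeric entity encoding, both decimal and hexadecimal. This is a hand-rolled JavaScript sketch for illustration (the function name is made up); a spider built on an HTML parsing library would get this decoding without writing any of it.

```javascript
// Decode decimal (&#64;) and hexadecimal (&#x40;) numeric character
// references in a single pass.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (_, num) => {
    const code = num[0] === 'x' || num[0] === 'X'
      ? parseInt(num.slice(1), 16)
      : parseInt(num, 10);
    return String.fromCodePoint(code);
  });
}

// "a@b.com" written with a mix of decimal and hex references.
console.log(decodeNumericEntities('&#97;&#x40;&#98;&#X2E;com')); // a@b.com
```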

-- sidney

Judy G. Russell
May 30th, 2007, 09:30 PM
Maybe I am in a fool’s paradise, but it is okay so far (after about a year).

I've had my email address entity-encoded on my website for years. Knock on wood and rub my lucky rabbit's foot (lucky for me, of course, the rabbit probably has other ideas) -- I've never had a problem.

sidney
May 30th, 2007, 11:30 PM
I've had my email address entity-encoded on my website for years. [...] -- I've never had a problem.

How would you know, unless you don't get any spam at that address? In the study I read, newly made-up email addresses that would not be chosen in dictionary attacks were placed on web sites and not used in any other way. The test was whether any spam showed up in the mailboxes. It could be that entity-encoded mail addresses don't get harvested often, but they do get harvested. None of the JavaScript ones, even the simplest, got harvested at all.

-- sidney

Judy G. Russell
May 31st, 2007, 10:12 AM
How would you know, unless you don't get any spam at that address? In the study I read, newly made-up email addresses that would not be chosen in dictionary attacks were placed on web sites and not used in any other way. The test was whether any spam showed up in the mailboxes. It could be that entity-encoded mail addresses don't get harvested often, but they do get harvested. None of the JavaScript ones, even the simplest, got harvested at all.

Sidney, I'm not saying the JavaScript stuff isn't more secure, but I have gotten very, very little spam at the address on the website (knock on wood), and most of that I can trace back to other uses of the address. And I can compare it to one address (sigh... my work address) which has been scarfed up by spammers: the noise-to-signal ratio on my work address is something like 1000 to 1; on the website address, perhaps 1 to 1000. I ain't gonna argue.

ktinkel
May 31st, 2007, 02:23 PM
Can you give an example of code that includes parens and '+' symbols? The problem with entities is that almost any commonly available software library that is written to read HTML from a web server already has entity decoding built in, making it difficult to write a spam harvester that doesn't automatically decode it as it is reading. I would be interested to see if there is some more obscure encoding, intermediate between entity encoding and javascript, that web browsers handle ok but common HTML parser libraries do not.

-- sidney

This is the sample I used as a model. In a browser, it displays a linked ‘Webmaster’; clicking on that opens a new e-mail message, but in the example case the address is gibberish. I don’t have a better example right now; do remember that it took me a while to work it out. The discussion I found with it referred to Internet Message Format RFC 2822 (http://www.faqs.org/rfcs/rfc2822.html).

<a href="mailto:(Webmaster)%20webmaster+(spam%20whammy)randomdigits12345@%20yourhost.berkeley(not%20.com).edu">Webmaster</a>

All the other examples I have using these symbols use JavaScript.

sidney
May 31st, 2007, 05:53 PM
The discussion I found with it referred to Internet Message Format RFC 2822 (http://www.faqs.org/rfcs/rfc2822.html)

Ooh... tricky! I wasn't aware that was legal email address syntax, but here is the relevant portion of RFC2822 that kind of allows it:

Earlier versions of this standard allowed for different (usually more liberal) syntax than is allowed in this version. Also, there have been syntactic elements used in messages on the Internet whose interpretation have never been documented. Though some of these syntactic forms MUST NOT be generated according to the grammar in section 3, they MUST be accepted and parsed by a conformant receiver.

The stuff in parentheses is treated as comments, and the %20 is a URL-encoded space, which is ignored at certain positions in an email address. Mail software that is RFC 2822 conformant is required to accept email addresses that are munged like that, but I can see how spam harvesters would not bother writing their software to look for such a strange case. That's a great idea!
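To make the mechanics concrete, here is a rough JavaScript sketch of what a conformant mail program effectively does with ktinkel's example href: percent-decode it, drop the parenthesized comments, and drop the whitespace. It is a simplification (it ignores nested comments and quoted local parts, which full RFC 2822 parsing handles), and the function name is made up.

```javascript
// Recover the plain address from a mailto: href that uses RFC 2822
// comments "(...)" and percent-encoded spaces "%20" as camouflage.
// Simplified: does not handle nested comments or quoted local parts.
function cleanMailto(href) {
  const decoded = decodeURIComponent(href.replace(/^mailto:/i, ''));
  return decoded
    .replace(/\([^()]*\)/g, '') // remove comments such as "(spam whammy)"
    .replace(/\s+/g, '');       // remove the spaces that were %20
}

const href = 'mailto:(Webmaster)%20webmaster+(spam%20whammy)randomdigits12345' +
  '@%20yourhost.berkeley(not%20.com).edu';
console.log(cleanMailto(href));
// webmaster+randomdigits12345@yourhost.berkeley.edu
```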

The + adds further confusion. Many mail servers let you tack on a '+' and any other text; they deliver to the mailbox name preceding the '+' and pass the rest along for use by mail filters that you set up. If you aren't using the + fields for anything, on such a mail server they are effectively ignored.
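A sketch of how such a server might split the local part; exact behavior varies from one mail server to another, and `splitPlusAddress` is a made-up name for illustration.

```javascript
// Split a "mailbox+tag" local part the way many mail servers do:
// mail is delivered to the mailbox before the first '+',
// and the rest is kept as a tag that filters can match on.
function splitPlusAddress(address) {
  const at = address.lastIndexOf('@');
  if (at === -1) throw new Error('not an email address: ' + address);
  const local = address.slice(0, at);
  const plus = local.indexOf('+');
  return {
    mailbox: plus === -1 ? local : local.slice(0, plus),
    tag: plus === -1 ? '' : local.slice(plus + 1),
    domain: address.slice(at + 1),
  };
}

const parts = splitPlusAddress('webmaster+randomdigits12345@yourhost.berkeley.edu');
console.log(parts.mailbox, parts.tag); // webmaster randomdigits12345
```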

The only problem with all this is that it might confuse someone who sees that your email address is coming up as

(Webmaster) webmaster+(spam whammy)randomdigits12345@ yourhost.berkeley(not .com).edu

because that is what they will see in their email program as your address.

-- sidney

Judy G. Russell
May 31st, 2007, 08:45 PM
The only problem with all this is that it might confuse someone who sees that your email address is coming up as

(Webmaster) webmaster+(spam whammy)randomdigits12345@ yourhost.berkeley(not .com).edu

because that is what they will see in their email program as your address.

But if you wrote it as:

Webmaster+(no%20spam)@%20yourdomain.com

it'd appear as:

Webmaster+(no spam)@ yourdomain.com

and that's pretty understandable!

sidney
June 1st, 2007, 03:43 AM
it'd appear as:

Webmaster+(no spam)@ yourdomain.com

and that's pretty understandable!

I did some experimenting in Thunderbird, and the basic idea doesn't work. The RFC says that a program is required to accept mail addresses that look like that, but programs are also required not to generate such addresses. If I paste that address into the To field of a message I compose in Thunderbird, save the message as a draft, and open it again, I can see that Thunderbird has transformed the To address into something that conforms to the standard. Unfortunately, this complicated mess confuses Thunderbird into converting it to something other than what is intended:

no spam <""\"\"\"\"Webmaster+\"\"\"@ yourdomain.com>

The bottom line is that you can use this to obfuscate an email address into something that someone can figure out when they look at it, but you can't be sure it will work if they copy and paste it into their email program, ignoring how strange it looks.

-- sidney

Judy G. Russell
June 1st, 2007, 07:58 AM
The bottom line is that you can use this to obfuscate an email address into something that someone can figure out when they look at it, but you can't be sure it will work if they copy and paste it into their email program, ignoring how strange it looks.

Rats. We'll have to get Kathleen to explain what she did and see if we can replicate it.

ktinkel
June 3rd, 2007, 03:39 PM
Rats. We'll have to get Kathleen to explain what she did and see if we can replicate it.

I copied something, and it took a lot of finagling. But when I tried to do it the other day, Eudora barfed, and said no dice.

Sorry I even brought it up. But I am pretty sure I had it working for a while, a year or so back.

I will ask Marjolein. She uses some sort of mixed encoding that she swears by, and it does not use + or encoded spaces.

MollyM/CA
June 3rd, 2007, 06:27 PM
Wow. Thanks, everyone, for all the suggestions. Maybe I'll start with the simplest (cut and paste rather than click-on address) and try escalating as needed -- if I ever get around to writing up a web page (sigh).

The problem with the picture was that the gallery

http://Mollym.zenfolio.com/p269254818/

is password-protected (password "Kira"). Why that made the Attachment Manager complain that the picture was a .png I can't imagine, but it worked to block it anyway.

Seems fine now:

Judy G. Russell
June 3rd, 2007, 10:27 PM
Why that made the Attachment Manager complain that the picture was a .png I can't imagine -- worked to block it anyway.
Seems fine now:

Don't understand it, but at least it's working now -- and that's great stuff!

Judy G. Russell
June 3rd, 2007, 10:28 PM
I will ask Marjolein. She uses some sort of mixed encoding that she swears by, and it does not use + or encoded spaces.

Marjolein should have some interesting suggestions.

ktinkel
June 4th, 2007, 08:54 AM
Marjolein should have some interesting suggestions.

Her basic theory is outlined in this DTP thread (http://www.desktoppublishingforum.com/bb/showthread.php?t=2673&highlight=obfuscation). Jump down to message 8 to get to the meat (the earlier discussion is anti-JavaScript, primarily).

It essentially relies on a combination of hex, decimal, and no encoding.
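A minimal sketch of that mixed approach, assuming a random choice for each character among decimal, hexadecimal, and no encoding, in the spirit of the "rolling a die" idea mentioned below. The thread describes the idea, not this code; the function name and the rule of never leaving '@' unencoded are assumptions made here. The random source is a parameter so the output can be reproduced.

```javascript
// Encode each character as decimal (&#64;), hexadecimal (&#x40;), or leave
// it as-is, chosen at random per character.
// random: a function returning a number in [0, 1), such as Math.random.
function mixedEncode(text, random = Math.random) {
  let out = '';
  for (const ch of text) {
    const code = ch.codePointAt(0);
    let choice = Math.floor(random() * 3); // 0, 1, or 2
    if (ch === '@' && choice === 2) choice = 0; // assumption: never leave '@' plain
    if (choice === 0) out += '&#' + code + ';';
    else if (choice === 1) out += '&#x' + code.toString(16) + ';';
    else out += ch;
  }
  return out;
}

// Different every time it runs; paste the result into the page.
console.log(mixedEncode('kira@example.com'));
```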

Jeff
June 4th, 2007, 12:51 PM
That's a right fine Wyoming jackalope. A tad stylized, but still fine. Here's the real deal:

http://www.jackalope.org/

It was a good night time hunt. Didn't see any snipe either.

- Jeff

Judy G. Russell
June 4th, 2007, 01:01 PM
It essentially relies on a combination of hex, decimal, and no encoding.

That's what I already suggested.

ktinkel
June 4th, 2007, 03:55 PM
That's what I already suggested.

Guess so. Marjolein is pretty compulsive about it, though: she suggests rolling a die to determine what should be done with each letter!

Judy G. Russell
June 4th, 2007, 05:12 PM
Guess so. Marjolein is pretty compulsive about it, though: she suggests rolling a die to determine what should be done with each letter!

That is pretty compulsive, for sure.

sidney
June 5th, 2007, 01:31 AM
Her basic theory is outlined in this DTP thread (http://www.desktoppublishingforum.com/bb/showthread.php?t=2673&highlight=obfuscation)

That's very interesting. Her results directly contradict the report I read, but I trust her thoroughness more than that of the authors. She says she puts a slightly different email address on each web page so she knows exactly where spambots did their harvesting. The study I read said that encoding was not useful, but I don't know exactly how they did the encoding. It may be that entity encoding by itself does not confuse the spambots but combined with URL encoding it does. I'm still very surprised, though. Why would someone write a web spider that has to be able to read arbitrary web pages and follow links and not use some common software library, written for web development, that would automatically decode character entities and URL-encoded hrefs?
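For the combination mentioned here, a sketch that percent-encodes every character of the address inside the `mailto:` href (URL encoding) while entity-encoding the visible link text. Whether this combination actually defeats harvesters is exactly what the thread is unsure about; the function names are hypothetical, and the code assumes an ASCII address.

```javascript
// Percent-encode every character (not just the "unsafe" ones that
// encodeURIComponent would touch), e.g. "a" -> "%61", "@" -> "%40".
// Assumes an ASCII address.
function percentEncodeAll(text) {
  let out = '';
  for (const ch of text) {
    out += '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

// Decimal entity-encode the visible link text.
function entityEncode(text) {
  let out = '';
  for (const ch of text) out += '&#' + ch.charCodeAt(0) + ';';
  return out;
}

// Build a link whose href is URL-encoded and whose text is entity-encoded.
function obfuscatedMailto(address) {
  return '<a href="mailto:' + percentEncodeAll(address) + '">' +
    entityEncode(address) + '</a>';
}

console.log(obfuscatedMailto('a@b.c'));
```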

-- sidney

Judy G. Russell
June 5th, 2007, 12:05 PM
Why would someone write a web spider that has to be able to read arbitrary web pages and follow links and not use some common software library, written for web development, that would automatically decode character entities and URL-encoded hrefs?

Perhaps because it's more trouble than it's worth? Think about it: how likely is it that someone who takes the time to encode an email address will fall for a spam or phishing scheme? So why bother when there are so many other suckers out there?

Lindsey
June 5th, 2007, 06:19 PM
That's a right fine Wyoming jackalope. A tad stylized, but still fine. Here's the real deal:

LOL!! I love the mounted one! Looks like something out of a Disney cartoon.

--Lindsey

ktinkel
June 5th, 2007, 08:22 PM
That's very interesting. Her results directly contradict the report I read, but I trust her thoroughness more than that of the authors. She says she puts a slightly different email address on each web page so she knows exactly where spambots did their harvesting. The study I read said that encoding was not useful, but I don't know exactly how they did the encoding. It may be that entity encoding by itself does not confuse the spambots but combined with URL encoding it does. I'm still very surprised, though. Why would someone write a web spider that has to be able to read arbitrary web pages and follow links and not use some common software library, written for web development, that would automatically decode character entities and URL-encoded hrefs?

Like you, I tend to trust M’s pragmatic experience.

Why the spammers do not write scripts to decode any characters, I do not know. But I also do not know why they persisted for so long in attacking the DTP Forum while we had a plug-in that automatically doomed them to moderated status. And then they stopped that. Shrug.

Marjolein may be correct in assuming they are really, really lazy.

sidney
June 5th, 2007, 11:51 PM
Marjolein may be correct in assuming they are really, really lazy.

It's good programmers that are really, really lazy. Spammers are really, really stupid. If I wanted to write a spider, I would start by Googling for an open-source package in Perl that already did most of what I needed. I'm sure that anything like that would already handle both entity and URL encoding; it would be more work for me to come up with a spider that did not handle it. But one of the rules of spamming is that spammers are stupid, and that could explain why the email harvester spiders miss the encoded addresses, if they do.

-- sidney

Judy G. Russell
June 6th, 2007, 07:15 AM
one of the rules of spamming is that spammers are stupid, and that could explain why the email harvester spiders miss the encoded addresses, if they do.

I suspect that a lot more effort (and a lot more intelligence, albeit wasted in my estimation) goes into the kind of programming that will result in much more money -- cracking your bank account, for example.