Solving a cookie problem

Gallica and the cookie problem

For several months, I was having trouble getting things from Gallica.bnf.fr, the wonderful website of the French National Library. I could work my way down to the page on their site that contained a list of volumes of some journal, but when I tried to look at a particular page, I'd get a little message saying the page I wanted was not available.

At first I figured Gallica was being flaky. After all, they were down from time to time for modifications. But the trouble persisted. And there were similar problems at other large websites: usgs.gov and blackwell-synergy.com, for example. Some of them complained that I wasn't allowing cookies to be set, even though I had carefully turned on all cookies in my browser.

As usual, I turned to Google for help. I could find similar complaints from other people, but nobody seemed to have the answer. FAQ lists did not help. The documentation for the browser (I tried both Mozilla and various versions of Netscape) did not offer anything useful.

If at first you don't succeed, give up

Eventually I decided it must be some obscure bug in the browsers, which might be fixed some day. I figured I'd just have to use the machines in the library, which use a browser that isn't available for my operating system (I'm running Debian Linux).

This was a workaround I'd read about in one of those archived newsgroup discussions (Google Groups is a good place to look for answers to such problems). You use the machine in the library to tell Gallica to package the pages you want as a file to obtain by anonymous ftp, then you go back to your own computer to download the file. A big pain, but it works.

Serendipity strikes again!

Last night I was feeling too stupid to do anything useful, so I thought I'd finish reading an issue of Science. The issue contained one of those blow-in advertising cards, which touted the member services offered by the American Association for the Advancement of Science to subscribers; I'd been using the card as a bookmark.

Eventually the blandishments offered on the blow-in card caught my attention, and I decided to have a look at their website. They ask you to register, which I did; then you supposedly can get to the content. But I found I couldn't get past their login page: clicking on the entry button just reloaded the same page again.

Fortunately, they have a place to click if you “Can't get past this page?”. I clicked it. It brings up a page that offers help with several types of problems — among them, “Cookie Problems.” The first item under that heading says:

You are not accepting cookies.

Make sure that your browser's preferences are set to accept cookies. If they are and you are still having problems, try the Cookie Test.

The cookie test was informative. I had in fact tried turning on acceptance of all cookies; but the cookie tester showed that the cookie was not getting sent back to their server.

The last item under Cookie Problems says:

You are accessing SCIENCE Online via a "proxy server" that is deleting cookies automatically.

You should contact your network administrator to determine whether your institution's proxy server is the source of the trouble.

OK, let's have a look at what Mozilla thinks it's using for a proxy. I click on Edit -> Preferences, which brings up a secondary window titled “Preferences.” At its left side is a column headed “Category.” I go down to the “Advanced” category and expand its sub-menu by clicking on the + sign to the left of “Advanced.” In the middle of the sub-menu is the item “Proxies,” which I click.

The main area of the subwindow now says “Proxies” at the top. In the middle is “Manual proxy configuration,” which has been selected instead of “Direct connection to the internet” or “Automatic proxy configuration URL.” The last item under “Manual …” is a text window to fill in with “No proxy for:” domains.

I add ".sciencemag.org" to the comma-separated list of domains there, and click on “OK” at the bottom of the “Preferences” window, which then vanishes.

Then I go back and try to log in to the AAAS website again.

Bingo.

Interoperability

So, if it works for them, will it work for Gallica? I again bring up the Preferences window and add ".bnf.fr" — note the leading dot — to the list of unproxied domains. Then I try to download a journal article from Gallica.

It works! The problem is solved.
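The leading dot is what lets one entry cover every host under a domain. Here is a minimal sketch of that kind of matching, assuming the browser treats a dotted entry as a host-name suffix and an undotted entry as one exact host name. The function name bypasses_proxy and the sample list are made up for illustration; Mozilla's real matching rules may differ in detail.

```python
def bypasses_proxy(host, no_proxy_for):
    """Return True if `host` matches an entry in a comma-separated no-proxy list.

    Hypothetical helper: it only illustrates suffix matching and is not
    taken from any browser's source code.
    """
    host = host.lower().rstrip(".")
    for entry in no_proxy_for.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.startswith("."):
            # ".bnf.fr" covers every host under bnf.fr, and bnf.fr itself.
            if host.endswith(entry) or host == entry[1:]:
                return True
        elif host == entry:
            # An entry without a leading dot is treated as one exact host name.
            return True
    return False


# Assumed contents of the "No proxy for:" box after the two additions.
no_proxy_for = ".sciencemag.org, .bnf.fr"
print(bypasses_proxy("gallica.bnf.fr", no_proxy_for))   # True
print(bypasses_proxy("www.example.org", no_proxy_for))  # False
```

Under these assumptions a single ".bnf.fr" entry takes care of gallica.bnf.fr and any other server the library runs, without listing each host separately.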

But it's never enough to have the problem solved. I need to understand why there was a problem in the first place.

This morning, with a fresher mind, I again Google for help, looking for "cookies" and "proxy" together. Much of what comes up is irrelevant, but a page from junkbuster catches my eye.

I use junkbuster to suppress popup ads. But of course, it does this by pretending to be a proxy web server that claims it can't find pages belonging to various notorious ad sources. That's why I was set up to use “localhost” as the proxy server. And the fake proxy-server that junkbuster set up was interfering with the cookies from Gallica and the other big websites.

So, now I understand the problem. It's a subtle interaction between the browser and junkbuster. If you don't have everything configured just right, you run into these special cases where something mysteriously doesn't work. Conversely, if you have everything configured correctly, everything works like a charm.
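The interaction can be simulated in a few lines. This is only a sketch, assuming the filtering proxy simply drops cookie headers in both directions; what junkbuster really does depends on how it is configured, and the names filter_headers and cookie_round_trip are invented for this example.

```python
COOKIE_HEADERS = {"set-cookie", "cookie"}


def filter_headers(headers, host, cookie_friendly_hosts=()):
    """Drop cookie headers unless `host` is on the proxy's (assumed) allow list."""
    if host in cookie_friendly_hosts:
        return list(headers)
    return [(name, value) for name, value in headers
            if name.lower() not in COOKIE_HEADERS]


def cookie_round_trip(host, via_proxy):
    """Simulate a cookie test: does the server get its cookie back?"""
    # The server's first response tries to set a cookie.
    response = [("Content-Type", "text/html"), ("Set-Cookie", "session=abc123")]
    if via_proxy:
        response = filter_headers(response, host)
    # The browser stores whatever cookies actually reach it ...
    jar = [value for name, value in response if name.lower() == "set-cookie"]
    # ... and sends them back with the next request.
    request = [("Host", host)] + [("Cookie", value) for value in jar]
    if via_proxy:
        request = filter_headers(request, host)
    return any(name.lower() == "cookie" for name, _ in request)


print(cookie_round_trip("gallica.bnf.fr", via_proxy=True))   # False
print(cookie_round_trip("gallica.bnf.fr", via_proxy=False))  # True
```

In this model the browser's own cookie settings never come into play: the cookie is lost before the browser sees it, so only bypassing the proxy (or telling the proxy to let that site's cookies through) changes the outcome.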

An important aspect of understanding the problem is to see that you can still have a restrictive cookie policy while using these big sites. Permissiveness with cookies — which is what some of them recommend as a solution — is not only not sufficient to solve the problem; it isn't even necessary. So I can keep my hard-nosed cookie-acceptance settings.

If you have this kind of problem, maybe my experience will prove useful. (Note that Netscape and Mozilla are very similar, so Netscape Navigator users probably run into this problem as well; indeed, I did when I tried using Netscape.) I suppose a similar situation occurs when popup-ad-rejecting software other than junkbuster is in use.

Other websites

Since writing the text above, I've discovered a number of other sites whose fancy Web pages won't work unless the proxy is turned off for them. These include the domains

.usgs.gov,
.aip.org,
.google.com,
.amazon.com,
.blackwell-synergy.com,
.osa.org,
.kluweronline.com,
.abebooks.com,
.leviton.com

besides .sciencemag.org and .bnf.fr that were mentioned previously.
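For anyone who would rather paste the whole list into the “No proxy for:” box at once, here is a small sketch that builds the comma-separated string from the domains above. The helper name build_no_proxy_list is made up, and the clean-up it does (lower-casing, adding a missing leading dot, dropping duplicates) is my own assumption about what is convenient, not something the browser requires.

```python
def build_no_proxy_list(domains):
    """Join domain names into one comma-separated no-proxy string."""
    seen = []
    for domain in domains:
        # Tolerate stray spaces and trailing commas copied from a list.
        domain = domain.strip().strip(",").lower()
        if not domain:
            continue
        if not domain.startswith("."):
            domain = "." + domain
        if domain not in seen:
            seen.append(domain)
    return ", ".join(seen)


domains = [
    ".usgs.gov", ".aip.org", ".google.com", ".amazon.com",
    ".blackwell-synergy.com", ".osa.org", ".kluweronline.com",
    ".abebooks.com", ".leviton.com", ".sciencemag.org", ".bnf.fr",
]
print(build_no_proxy_list(domains))
```

The printed line can be copied into the preferences field as it stands; any entries already there, such as localhost, would need to be kept as well.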

The symptoms of trouble differ from one site to another. Sometimes you get stuck on one Web page and can't continue. Sometimes you are supposed to see something to interact with, and it never appears.

The case of Amazon.com is particularly interesting. Their site works OK until I'm ready to check out; then, when I try to enter their secure server, instead of getting the proper encrypted webpage, I get an error page that informs me my shopping cart is now empty, and I have to go back and do everything again. So I added them to the no-proxy list in the browser's configuration, and once again, all the problems went away.

Because of the variability of the symptoms from one site to another, I've adopted a policy of trying the no-proxy trick whenever I have trouble with some large organization's web pages. Most of the time, it works.

Of course, your mileage may differ.

 

Copyright © 2004, 2005, 2006 Andrew T. Young

