2008/07/23

By the Book (of law)

A few weeks ago, I was quite excited about a "good idea" I had.
I spent some time on Alexa looking at the most popular sites. It seemed that anything to do with news drew the largest audiences.

So I said to myself, this looks like a good idea: something of interest, a "small" corpus and a lot to do with natural language processing.

My tools being quite modular, I quickly had a news search engine at my fingertips, covering about 40 different sources. I even toyed with graphs comparing term frequencies across articles, which could be used for anything from brands to politicians.
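For readers curious about the term frequency comparison, here is a minimal sketch of the idea in Python. It is not the code I actually used: the function names (`term_frequencies`, `compare_terms`), the per-1000-words normalisation and the sample data are all assumptions made up for illustration.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word occurrences in one article."""
    return Counter(re.findall(r"\w+", text.lower()))


def compare_terms(articles_by_source, terms):
    """Return {source: {term: occurrences per 1000 words}}.

    articles_by_source is a hypothetical structure mapping a source
    name to a list of article texts.
    """
    result = {}
    for source, articles in articles_by_source.items():
        counts = Counter()
        for article in articles:
            counts.update(term_frequencies(article))
        # Avoid dividing by zero for a source with no text.
        total = sum(counts.values()) or 1
        result[source] = {
            term: 1000 * counts[term.lower()] / total for term in terms
        }
    return result


# Made-up sample data, only to show the call.
sample = {
    "paper_a": ["Apple beats Google. Apple wins."],
    "paper_b": ["Google wins"],
}
comparison = compare_terms(sample, ["Apple", "Google"])
```

Normalising by the total word count keeps a source that publishes a lot from dominating the graph simply because it has more text.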

At first I thought to myself, well, I've got a nice idea (something a bit more elaborate than the usual news search engine) and it's going to be a win/win: readers will be led to articles they would be interested in but might have missed, the newspapers will generate more revenue, and my idea will bring me traffic.
Remembering a few articles I had read a while ago, I did a quick search for ongoing lawsuits about this kind of tool. There are a few, involving large sums of money. The Belgian press syndicate seems to object to any link to its newspapers. It seemed quite ridiculous.

Then I took a look at the Berne Convention. A news search engine could fall under its quotation exception (the closest thing it has to "fair use", though the Convention itself speaks of "fair practice"), the result being "quotations from newspaper articles and periodicals in the form of press summaries" (Article 10). But it might not.
French law, for instance, can be even more restrictive: displaying the number of words in an article can be considered a "transformation" (Article L122-4 of the Code de la Propriété Intellectuelle) and thus forbidden without the author's consent. The same goes for showing the size of a document, as just about any search engine does. And it goes on and on. Any f(document) could be illegal.

True, I could just contact every newspaper and wait for their answers. For whatever reason, I don't expect to get any.

If someone is interested in developing this project, that's great: the code is ready. I'm off to other territories for the time being, at least until a few of these court cases come to an end.

Next time, things will be technical again.