Mechanical Turks, Amazon's Mechanical Turk, web services, blog aggregators, web searches, screen scraping, data and information, copyright and attribution… With the rise of “easy access to information” comes a relatively new, still untapped solution vertical: Answers.

Amazon offers its Mechanical Turk API, and Google and Yahoo! both offer Answer products, which in turn rely on people to find and post the answers. A number of startups, such as AskMeNow, have emerged in this space as well.

And with these new answer solutions also come concerns related to copyright, attribution and compensation. Let me give you an example. If I hit the web and ask the question “What is an answer?”, I get back:

      To “reply or respond to”, to “give the correct answer or solution to”.

The above are the answers I got back from a Google search.

But whose definition is this? I see no attribution other than “definitions on the web”. Someone spent the time to define and write this information. So why isn't there an attribution? Is that fair use? I am not a lawyer so I can't really say, but as an author I will say that the use of information without attribution is of great concern.

Another example: if the question that is asked is “What is the weather in Austin, Texas”, where do you think the information will be coming from? I will tell you: NOAA, or Yahoo! Weather, or Weather.com, or similar providers. And I am certain they would also like the source of the information to be attributed. Weather.com's terms of use contain the following: “the contents of the site are copyrighted under the United States copyright laws. You may not modify, publish, transmit, display, participate in the transfer or sale, create derivative works, or in any way exploit, any of the content, in whole or in part.”

Content is king. Creating good and accurate content is not only hard, it is also expensive in many ways. The web provides a shortcut to ask for and find information and answers, and I am afraid the attribution problem will get worse with the new wave of Answer products that are coming out…

The following diagram shows the elements of a possible Answers engine:

[Diagram: Elements of an Answer Aggregator]

Where… we have the source for most of the information/answers: the Web. The diagram also illustrates possible ways to extract information from the web: screen scraping, web searches, Mechanical Turks… Some “answer engines” may even implement a cache for performance purposes. You can see how information can easily be extracted from the web, without proper attribution and compensation. The local cache may even store this information without permission. This is a huge problem, because authors may be relying on users visiting their websites to generate revenue through advertising.
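To make the cache concern concrete, here is a minimal sketch of how an answer engine's local cache could carry the source along with every stored answer, instead of stripping it away. This is my own illustration, not how any of the products mentioned above actually work; the names `Answer`, `AttributedCache` and `render`, and the example source values, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Answer:
    text: str
    source_name: str  # hypothetical field: the site or author the text came from
    source_url: str   # hypothetical field: where the reader can find the original


class AttributedCache:
    """Local cache that only stores answers that carry a source."""

    def __init__(self) -> None:
        self._entries: Dict[str, Answer] = {}

    def put(self, question: str, answer: Answer) -> None:
        # Refuse to keep content whose origin is unknown.
        if not answer.source_name.strip() or not answer.source_url.strip():
            raise ValueError("refusing to cache an answer without attribution")
        self._entries[self._key(question)] = answer

    def get(self, question: str) -> Optional[Answer]:
        return self._entries.get(self._key(question))

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so near-identical questions share an entry.
        return " ".join(question.lower().split())


def render(answer: Answer) -> str:
    """Format an answer for the end-user with its source always attached."""
    return f"{answer.text}\n(Source: {answer.source_name}, {answer.source_url})"
```

A cache like this does not settle permission or compensation, which depend on each provider's terms of use; it only shows that keeping the attribution next to the cached text costs very little.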

Everyone is affected… from bloggers, to authors of books and technical papers, to Wikipedia… to anyone who provides content. The end-user benefits from this, as do the answers companies and the people behind the mturks who research (Google) the information and deliver it. But at whose expense? At the authors' and the content providers' expense. Bloggers and other content providers already have to battle blog aggregators and others that cannibalize without any respect. The next battlefront might be against the answers companies.

Fair use means following the terms of use, and providing proper attribution and fair compensation as appropriate.

Since I don't have an answer to how to enforce attribution and compensation, I will be modifying my weblog's legal terms to not allow the extraction of information by Mechanical Turks, or screen scraping, or blog aggregators, or any other method that extracts information without proper attribution and compensation.

If answers companies don't follow terms of use and don't provide proper attribution and compensation on the information they use, then they have become nothing more than sophisticated (possibly illegal) “screen scrapers”.

Web 2.0 is about collective use and collaboration… And referring to is not the same as extracting and cannibalizing from… Answers companies that use other people's content, directly or indirectly, must do the fair thing: follow the terms of use, give proper attribution and show the money! :-)

Easy, fast, immediate access to answers has great revenue-generating potential, and it is of great benefit to end-users. On mobile handsets we need such a solution: a cheaper one that competes against the expensive (carrier-based) 411. All I ask is for fair use of information on the web.

ceo

Attributions:
* [Mechanical Turk Image source: Wikipedia]
* [Elements of an Answer Aggregator Image source: C. Enrique Ortiz]