If you haven't read the Google Blog post announcing its new project 'Knol', which has been called a potential Wikipedia-killer, or haven't heard the media buzz and speculation that followed, then you've been living under a rock. In short, Google wants to create a user-generated encyclopedia site much like Wikipedia, but with a twist. Instead of allowing anyone to change any content on the site, the Knol project will host articles on just about everything, each with an author who is responsible for the upkeep and quality of the article content. This might give Knol a more authoritative edge over Wikipedia. Google is also going to allow ads on the article pages, and the author can get a cut of the ad revenue.

I won't delve further into the technical details of how Knol differs from Wikipedia or join in the argument about which one is the better model. What I have been wondering is why Google has decided to start up this project when Wikipedia is doing a decent job at this already. Sure, Google's mission is to organize the world's information and make it universally accessible and useful. If it controls (or at least hosts) information, that information is a lot easier to organize and make accessible to the masses. Or maybe Google feels that Wikipedia lacks the authoritative quality whose absence teachers across the country gripe about. Anyone can change anything on Wikipedia, so how can it always be right? Maybe Google wants to provide a more reliable, more truthful (and referenceable) user-generated encyclopedia model.

OR, maybe Google is in it for the money. After all, Google is a for-profit organization in the center of a capitalist society where earning another buck is the number one priority. Why would Google want to replace Wikipedia? Perhaps Google is sending Wikipedia lots of traffic. If Google found a way to retain much of that traffic, then it could greatly increase its page views, and thus its ad revenue.

So, just how much traffic is Google sending Wikipedia's way?


Terms of the Data Collection

I decided to collect data about how Google ranks Wikipedia for a wide range of search terms. I searched for a decent word list and settled on the word list 2of12 from the 12Dicts Official 12Dicts Package ver 5.0. Next, I used the Google AJAX Search API to create a process to automate the retrieval of search results for each word in the list. Because of limitations of the API, I could only retrieve the first eight search results. In the results for each word, the process searched for the first result that linked to Wikipedia and recorded the rank of that result, whether it was first, second, third, etc. Some words did not have any results that linked to Wikipedia. I collected the following data between 12:54 and 18:11 CST on December 17, 2007 using this method.
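The ranking step described above can be sketched roughly as follows. This is only a minimal illustration, not the script I actually ran: the function name first_wikipedia_rank and the sample URLs are hypothetical, and the step that fetches results from the search API is left out entirely. It assumes the results for one word are already available as a list of URLs in ranked order.

```python
from urllib.parse import urlparse


def first_wikipedia_rank(result_urls, max_results=8):
    """Return the 1-based rank of the first Wikipedia result, or None.

    Only the first `max_results` URLs are considered, mirroring the
    eight-result limit mentioned in the post.
    """
    for rank, url in enumerate(result_urls[:max_results], start=1):
        host = urlparse(url).hostname or ""
        # Match wikipedia.org itself and language subdomains such as en.wikipedia.org.
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            return rank
    return None


# Hypothetical results for one word, in ranked order (not real collected data).
sample = [
    "http://www.example.com/apple",
    "http://en.wikipedia.org/wiki/Apple",
    "http://www.example.org/fruit",
]
print(first_wikipedia_rank(sample))
```

A word whose first eight results contain no Wikipedia link comes back as None, which corresponds to the "Not in Top 8" bucket.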


The Data

Below is a summary of the search results rankings in table and graph format. You can also download the full data set that I collected by following the link under the graphs. It is the word list containing the rank for each word. You can get the word list sorted alphabetically by word or sorted by rank.

Number of words in list: 41238
Rank 1: 29.17% (12030 words)
Rank 2: 11.30% (4658 words)
Rank 3: 8.73% (3599 words)
Rank 4: 5.13% (2115 words)
Rank 5: 3.31% (1365 words)
Rank 6: 2.23% (920 words)
Rank 7: 1.66% (683 words)
Rank 8: 1.20% (496 words)
Not in Top 8: 37.28% (15372 words)


Download the word list with search results rank for the first Wikipedia result for each word:
Sorted Alphabetically by Word OR Sorted by Rank


Quick Observations

* Nearly 1 in every 3 words (29.17%) in the word list has a Wikipedia result as the top result.

* Nearly 1 in every 2 words (49.19%) in the word list has a Wikipedia result in the top 3 results.

* Nearly 2 in every 3 words (62.72%) in the word list have a Wikipedia result in the top 8 results.
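The percentages in these observations follow from the per-rank word counts in the summary. Here is a short sketch of that arithmetic; the variable and function names are my own choices for illustration, and the counts are the ones listed in the summary.

```python
# Words whose first Wikipedia result appeared at each rank (from the summary).
counts = {1: 12030, 2: 4658, 3: 3599, 4: 2115, 5: 1365, 6: 920, 7: 683, 8: 496}
not_in_top_8 = 15372
total = sum(counts.values()) + not_in_top_8  # size of the whole word list


def share(word_count):
    """Percentage of the word list represented by word_count words."""
    return 100 * word_count / total


def cumulative_share(top_n):
    """Percentage of words with a Wikipedia result within the top_n results."""
    return share(sum(counts[rank] for rank in range(1, top_n + 1)))


for top_n in (1, 3, 8):
    print(f"top {top_n}: {cumulative_share(top_n):.2f}%")
```

Working from the counts rather than adding up the rounded per-rank percentages avoids small rounding drift in the cumulative figures.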


Word List Selection and an Assumption on Search Patterns

I probably didn't spend enough time searching for the most appropriate word list, although I didn't pick the first one I found either. I wanted something that represented the English language well, like a good clean dictionary word list, rather than one of the many techno-jargon word lists out there. I found the 12dicts package and continued searching, then came back to it when I didn't find anything else I liked better. In hindsight, what I probably should have looked for is a word list containing a sample of real Google search terms.

Keep in mind that I have a good word list, but it is not the most appropriate word list for this study. To relate this search results data to actual search patterns, you have to assume that the word list I used appropriately represents the words users search for and that each word is searched for equally often on a given day. This assumption is not true, so the conclusions you draw from it may not fully represent real world numbers. It is difficult to improve the accuracy of these results without a word list that more accurately represents the words people search for, along with a weight for each word that represents how often it is searched for within a certain period of time.


Further Observations

So, given the above assumption, we can draw further conclusions:

SearchEngineWatch.com says that Google has 91 million US-based searches per day. That means that among those searches:

* 26,544,700 Google searches per day contain a Wikipedia link as the top result

* 36,827,700 Google searches per day contain a Wikipedia link within the top two results

* 44,762,900 Google searches per day contain a Wikipedia link within the top three results

* 49,431,200 Google searches per day contain a Wikipedia link within the top four results

* 52,443,300 Google searches per day contain a Wikipedia link within the top five results

* 54,472,600 Google searches per day contain a Wikipedia link within the top six results

* 55,983,200 Google searches per day contain a Wikipedia link within the top seven results

* 57,075,200 Google searches per day contain a Wikipedia link within the top eight results

* 33,924,800 Google searches per day do not contain a Wikipedia link within the top eight results
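The figures in this list come from multiplying the 91 million searches per day by the cumulative percentage for each cutoff, rounded to two decimal places. Below is a rough sketch of that calculation. The names are hypothetical, and it leans on the same assumption discussed earlier, namely that every word in the list is searched for equally often, so treat the output as a back-of-the-envelope estimate rather than a measurement.

```python
DAILY_US_SEARCHES = 91_000_000  # figure quoted from SearchEngineWatch.com in the post

# Per-rank word counts for ranks 1 through 8, taken from the summary.
rank_counts = [12030, 4658, 3599, 2115, 1365, 920, 683, 496]
not_in_top_8 = 15372
total_words = sum(rank_counts) + not_in_top_8


def estimated_searches(word_count):
    """Scale a share of the word list up to daily searches.

    The share is first rounded to hundredths of a percent, matching the
    two-decimal percentages used in the post.
    """
    hundredths_of_percent = round(10_000 * word_count / total_words)
    return hundredths_of_percent * DAILY_US_SEARCHES // 10_000


running_total = 0
estimates = []  # estimates[n - 1] is the estimate for "within the top n results"
for count in rank_counts:
    running_total += count
    estimates.append(estimated_searches(running_total))

for top_n, estimate in enumerate(estimates, start=1):
    print(f"top {top_n}: {estimate:,}")
print(f"not in top 8: {estimated_searches(not_in_top_8):,}")
```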

For any given search term, you are more likely than not to have a Wikipedia result within the first four results on the first page. Many people first click links near the top of the first page of results, which means that a lot of traffic is directed to Wikipedia.

I am not suggesting that Google's motivation is money, but I have to wonder if it is playing a part in its decision to create a competing service to Wikipedia, just to displace Wikipedia results with Knol entries in search results and reclaim page views for ad revenue.

MyNetworkTV


MyNetworkTV is the television network operated on many CW Network stations. If you didn't already know, the CW is the network that was formed when The WB and UPN networks merged and began airing in September of this year. I don't watch any CW shows, but my wife and I do watch quite a few current TV series.

We've been watching ABC's Grey's Anatomy and Desperate Housewives since they began airing a couple years ago. We picked up Lost at the end of season two, but we started from the first episode and caught up before watching any new episodes. We've also been watching the Sci Fi shows Eureka and Stargate Atlantis (she doesn't watch that one, just me) and most recently we started watching the Fox show Prison Break (like Lost, we started at the beginning and caught up to the most recent airing episodes). We occasionally watch the Fox show House, but we don't follow it regularly and wouldn't know if an episode is new or a re-run unless we've already seen it before. No NBC or CBS shows really interest us, even though those two networks have been at the top of the ratings list for evening shows the past few weeks.

The thing about all of the shows we watch regularly is that you have to start watching from the beginning, otherwise they don't make much sense. That's the way Lost was to me BEFORE I started watching it from the beginning. I recall when the first season of Lost was airing, and I tried to watch an episode of it, but it was the fourth or fifth episode already. It didn't make much sense, since I hadn't seen it from the beginning and didn't understand the context of where they were and how they got there. I didn't understand the ongoing inter-episode stories and it didn't seem like an appealing show to me at all. I wrote Lost off as a fad and assumed it wouldn't last past its first season. Near the end of the second season, I overheard co-workers speculating about the storyline and decided to give the series another chance. I got all the episodes from the beginning up to the last one that aired, and in two weekend days and one weekday evening, we watched them all back to back from the beginning to the present, something like 40 episodes.

Now I think Lost is a great show, and it was an awesome experience watching the series from the beginning without waiting from one week to the next or sitting through commercial breaks. I know these shows are the networks' bread and butter, so to speak, but I really think these shows lose a lot of punch by airing in the conventional TV-style process. I wonder if there is a viable business model around creating and releasing these types of series a whole season at a time, available on-demand (for download, or on digital cable) so they can be watched back to back. I would even settle for them airing a new episode every weekday, like some MyNetworkTV shows are doing, instead of one a week. I understand that a show aired in this way would not be able to fill the full season, but my concern isn't network scheduling practices; it's the quality of the experience when watching a series. I've been considering not watching the shows until the current season completes, then watching them back-to-back over a weekend, but it's hard to ignore the shows when you know when they're playing, so I still turn on the TV and watch them each week.

What shows do you watch regularly and what are your thoughts on how shows air and how you would like to see the process change in the future?


From: The Belfast Telegraph: Heavy use of mobile phones can lead to fertility problems in men
They found that men who were the heaviest users of mobile phones - more than four hours a day - had the lowest sperm counts at 50 million per millilitre (ml) and the least healthy sperm. In contrast, sperm counts were highest (86 million per ml), and the sperm healthiest, among those men who reported that they did not use mobile phones.
I had to stop half-way through reading this and take my cell phone out of my pocket! Here's a related article: So just how dangerous is your mobile?
