The core value of Google's crown jewel, its search engine, is the perception by the majority of search users that they get high quality results. Should Google persist in seeking the lowest cost solution to being gamed, i.e. software algorithms, while at the same time scrounging more user data gratis, it risks alienating a significant fraction of its user base. Its financial model depends upon its search engine retaining its high perceived value. Google's current counter attacks are insufficient, ineffective and sometimes misdirected. Therefore, I think Google is putting its financial model in jeopardy.
Some time ago, I noticed users of Google services complaining about being banned without notice or explanation. Those banned described in detail the arduous efforts required to regain their lost status. This implied that Google had been looking for anomalous usage patterns that skewed ranking. Moreover, I concluded that these detection and punishment routines were autonomous. I based this upon the observation of how difficult it was to reach any human with the authority to reverse the obvious miscarriage of justice.
It has to be obvious that Google has been constantly tweaking its model, both in the effort to plug obvious ranking holes and in the pursuit of its own larger financial reward. When I saw most of my articles recede quickly into oblivion, I assumed that my content either did not meet sufficient quality standards or was simply too insignificant to attract sufficient readership. Despite my doubts, and because personal ego was involved, I opted to consider the error mine. However, in several instances searches for content I knew to exist (where my own content played no role) proved futile. Again I faulted my search skills. Nonetheless, repeated search failures have raised my doubts.
Recently, my own experience, where I could no longer find superior content [1.], and actions by Google itself [2.] indicate all too clearly that many searches are yielding substandard, gamed results. By gamed results I mean inferior, older instructions that seem to be overly linked given their quality, with questionable voices attesting to the veracity of the solutions when they are too easily seen to be repetitive and non-working. Google is obviously aware of this, given the links in footnote 2. However, if it sees this as another opportunity to milk its volunteer users (as suggested by a poster on Slashdot), it may end up wounding itself.
Feel free to reject my guesses. What I am presenting are those factors that I think I have seen, implicitly, to be the basis of Google's ranking methods. However, I admittedly lack the access, time and temperament to attempt to document my suspicions [3.] [4.]. Hence, there is no pretense that my assertions are to be taken as definitive, nor do I expect them to be cited by others as proof. The purpose is to help me structure my basic argument, i.e. Google is making significant errors in evaluating search result content.
Too Heavy Reliance on Link Count
Perhaps wrongly, I sense Google started out using the academic model, primarily used in scientific publications. In that type of publication, sources that are heavily referenced are usually taken to have high value, which tends to build reputations for the authors. At one time reputation and receiving awards from peers were the paramount drivers. Therefore, citation (cf. links) was a good measure of perceived value and over time, most times, brought the best work to the fore.
However, even when gaining an outsized reputation is the driver in truly scientific endeavors, given the wrong environment and personalities it can go horribly wrong, as in the human embryonic stem cell cloning reports, which were deemed to be false. Here is a partial description of how it was allowed to happen.
Here is another recent case, where the proponent of sonic bubble fusion was punished for falsifying experimental data and its supposed confirmation. The problem these two examples show is that both took years of investigation, which (in the latter case) still failed to prove gross misconduct, meaning purposeful deception. There are many more cases I have read about where no action was taken, because the individuals were stealing ideas and results from defenseless students. Moreover, the individual(s) in question were "big name(s)" known to publish the results and findings of others without attribution.
Nonetheless, in academics, over the longer term citation (linking) has worked, despite too many instances of unethical self aggrandizement even without direct financial incentives to subvert the ideal. At the onset, Google might have assumed that, on average, page readings and links would bring the better content to the top. Indeed, for a limited period this seemed to have worked. Moreover, the constant tweaking of the algorithms used to deflect skewing of the results, in many cases, kept ahead of the dishonest. But I think that run of expertise (and luck) has run its course, due in part to Google's own actions.
The problem areas are where specialized topics can be subverted by relatively small groups working in concert. Another failure stems from the financial incentives to do so. Moreover, seeing Google seem to make its own return paramount may make those so inclined less likely to consider their subversion unethical. Finally, Google changed the game by monetizing the ranking results, which opened the floodgates to fraud.
Google's Counter Attacks
As I mentioned, it is obvious that Google has been attempting to monitor and remove bad actors, though it is obviously hitting the innocent at times. Moreover, just observing the same search over time will show changes in the ranked results for active topics. However, the changes do not always reflect the best available discussions. Furthermore, I have heard that Google warns some users of its intent to ban writers due to content choices. Some might conclude that this might suffice, because Google is making the effort. I think it falls short of the need, precisely because Google's actions and standards are opaque to most that use its search engine. A bit of transparency might entice more cooperation from those seeking quality research results.
If we were using the scientific publication model, even highly cited papers could lose credibility rapidly when the methods and/or the data were shown to be spurious. A significant fraction of the peer group would cease to cite that group of papers. In an extreme case, the publications might even be withdrawn. I see no such mechanism in operation in the searches I perform. Indeed, in several cases it is the better results that have disappeared. The question becomes, does Google have anything akin to that corrective mechanism?
I think too that Google errs, when it detects what it believes to be fraudulent skewing of its ranking algorithms, by exacting punishment behind the scenes. I doubt that anonymous removal of the offending material is a sufficient deterrent to those that have the incentive to cheat. Public denunciation could have undesirable legal consequences; however, allowing the perpetrators free rein may just embolden them to corrupt the ranking system further. To be effective, those gaming the system have to be identified, and the spurious link sites and their backers have to be identified and properly excised. The idea is to inflict the maximum pain upon the proper target. If this is happening, it certainly is not obvious to me.
Cheap Circumvention
Unfortunately, for those with the incentive, a relatively small group can find the means to cross link inferior content by creating their own sites or planting references in seemingly independent sources, such as forum comments with links, citations and even sample portions of the content. Most algorithms will miss the spurious value; indeed, in skilled hands the subterfuge will fool the cursory glance of the human eye. I think that Google lacks the means to detect the frustration level of those enticed by catchy titles and page ranking, when each opened instance is shown to be closely related content of questionable value that repetitively cites essentially the same source.
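The mechanics described above can be sketched with a deliberately naive model. The assumption here, and it is only an assumption for illustration, is that a page's score is simply the number of distinct pages linking to it; this is not claimed to be Google's actual ranking method. All page names ("good", "junk", "blogA", "ring1" and so on) are hypothetical. The sketch shows how four cooperating sites are enough to push inferior content above a page that earned two independent links.

```javascript
// Naive illustrative model: score = number of distinct pages linking in.
// This is an assumption made for the example, not Google's algorithm.
function inboundCounts(links) {
  var sources = new Map(); // target page -> Set of pages linking to it
  links.forEach(function (link) {
    var from = link[0];
    var to = link[1];
    if (from === to) return; // a page linking to itself earns nothing
    if (!sources.has(to)) sources.set(to, new Set());
    sources.get(to).add(from);
  });
  var counts = {};
  sources.forEach(function (linkers, page) {
    counts[page] = linkers.size;
  });
  return counts;
}

// Order pages by inbound link count, highest first; ties broken by name.
function rank(links) {
  var counts = inboundCounts(links);
  return Object.keys(counts).sort(function (a, b) {
    return counts[b] - counts[a] || a.localeCompare(b);
  });
}

// Hypothetical honest links: two independent sites cite the good page.
var honest = [
  ["blogA", "good"],
  ["forumB", "good"]
];

// Hypothetical ring: four sites run by one small group. Each links to the
// junk page and to a neighbor in the ring, so they look interconnected.
var ring = [
  ["ring1", "junk"], ["ring2", "junk"], ["ring3", "junk"], ["ring4", "junk"],
  ["ring1", "ring2"], ["ring2", "ring3"], ["ring3", "ring4"], ["ring4", "ring1"]
];

var before = rank(honest);
var after = rank(honest.concat(ring));

console.log("before the ring:", before);
console.log("after the ring: ", after);
```

In this toy model nothing distinguishes an independent link from a coordinated one, which is the weakness the paragraph above describes. Any realistic counter measure would need some notion of how independent the linking sources are.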
I have run into such an experience, which I will attempt to describe below. However, had I not gathered better code samples earlier I might not have realized how poor the rankings were. Moreover, had I not tried multiple times to reach those better samples that I had found earlier on a known site, I would have passed this off as another instance of my being insufficiently skilled in the construction of search queries. Generally, the areas most likely to maintain result skewing are those with limited readership, where some expertise is required to recognize the poor quality.
Incentives to Cheat
Implicitly, Google monetized search rankings to improve its own sales pitch on ad placements. Hence, sites (and individuals) having high page rankings could profit from enhanced traffic volume, either directly via advertisements or indirectly by gaining reputation (e.g. consulting, custom coding, etc.) via views. Therefore, the incentive to attract attention by enhancing the probability of falling near the top has, for some, a significant financial payback.
Google's response seems to be more of the same: mining free labor while gathering more detailed information [5.] on Google search users to further enhance Google's cash flow from its advertising. At least some of that information will identify fraudulent rankings. In my opinion this alone will not suffice; the incentives are too great, and for some Google's use is a rationalization for their dishonesty.
I have more instances where I am dubious of the result set. However, in these other cases, despite many subsequent attempts, I could believe my query writing skills might be the explanation. For instance, my searches consistently failed to bring up the long article that described Microsoft's confiscating the name Internet Explorer [6.] from its rightful owner. Another is the VA's free medical records program, Vista [7.]. Nonetheless, that explanation does not suffice when inferior JavaScript code, cross referenced from too many sources, pushed junk above better quality content.
Limitations
Since there were multiple searches, attempting to obtain a better set of results over a considerable period of time, it is impossible to reproduce the rationale behind each one. Moreover, at best I can only cite a fraction of the result sets, because over time many were dumped.
Dubious, Early Results
The matchHeight function code initially seems to be a reasonable attack on the problem. However, its syntax seems, at least partially, to have been deprecated. For example, the better syntax that I find works is:
function matchHeight() {...}
not
matchHeight=function() {...}
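A minimal sketch of one concrete difference between the two forms follows. Both forms are accepted by current JavaScript engines; what differs is hoisting. A function declaration can be called before the line on which it appears, whereas a function assigned to a variable exists only after the assignment has run. The function body below is a hypothetical stand-in, since the original code is elided above: it works on plain objects with a numeric height rather than on real page columns, and the name matchHeightExpr is invented for the comparison.

```javascript
// Hypothetical stand-ins for page columns: plain objects with a numeric height.
var columns = [{ height: 120 }, { height: 300 }, { height: 210 }];

// Declaration form: hoisted, so this call works although the definition
// appears further down.
var tallest = matchHeight(columns);

function matchHeight(cols) {
  // Find the tallest column, then give every column that height.
  var max = 0;
  for (var i = 0; i < cols.length; i++) {
    if (cols[i].height > max) max = cols[i].height;
  }
  for (var j = 0; j < cols.length; j++) {
    cols[j].height = max;
  }
  return max;
}

// Expression form: only the variable name is hoisted, not the function
// value, so calling it before the assignment below throws an error.
var earlyCallFailed = false;
try {
  matchHeightExpr(columns);
} catch (err) {
  earlyCallFailed = true;
}

var matchHeightExpr = function (cols) {
  return matchHeight(cols);
};

console.log("tallest:", tallest);
console.log("early call to expression form failed:", earlyCallFailed);
console.log("expression form after assignment:", matchHeightExpr(columns));
```

If a page calls the function from a script that runs before the assignment has executed, the expression form fails while the declaration form does not, which may account for some reports of the code not running.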
I found it disconcerting too that so many of these more recent searches persisted in bringing up older citations when I had already seen simpler, better code. Nonetheless, here is a near reproduction of the same code (even citing the author above, though not linking) while claiming the code works. This latter example is more recent. However, I have my doubts, since I am not the only person that has had problems getting the syntax to run.
This should be seen as a subset of the section immediately above, because these results probably came from using the sitepoint.com name [8a.] as part of the search criteria. While the justification for the work cited and the changes seem perfectly reasonable, I am put off by the inclusion of the sitepoint site as part of the page title [8b.]. However, that too might have been innocent were it not appearing on a site differing from its supposed location, i.e. miamiconservanc.org not Cross-Browser.org as it states in the text [9.]. Nonetheless, what finally put me off was the pushing of an unseen library for which I had limited use. Had I not had problems locating and downloading a copy I still might have taken this route.
Regarding the library, I doubted that I would code extensively in JavaScript. Hence, hiding critical code in a library, which made it more difficult to understand the workings of the language, made little sense. Moreover, the functions required seemed to be overkill, particularly for my more limited use. However, the actual code using the X library had a clean appearance. Thus, at least initially, I was attracted towards using the library [10.] [11.].
Wrap Up
I ran many searches in attempts to find JavaScript code solving the differing column heights problem. I rejected out of hand those that focused on the use of cascading style sheets, whether using explicit static or dynamic changes. I broadened the search to focus on JavaScript itself and other variations. Not all results were deficient; indeed, there are some I neglected that I now regret not reading at the time. However, many seemed to include the cross-browser site and its library. Thus, one of the last, sitepoint's "Quick Tip" on Equalizing Column Heights, has the same X library code I mentioned above and explicitly links back to it and the Cross-Browser site.
Suspicions
Personally, given the quality of some of the highly ranked search results, I think Google has been gamed, at least partially. However, I would not assert that my citations prove this conclusively or that every instance is unequivocal. Indeed, I rejected the X library due to my predilection for understanding the code fully and my not seeing the need, given my current limited use of JavaScript. Nonetheless, the scale of cross referencing seems to be well beyond reason for some of the highly ranked results, particularly when the code fails to function. I also stress the point that these results are of interest to a small subset of Google search users. Therefore, it takes a smaller group to skew the results.
Despite my personal certainty in the JavaScript case, I am less certain about content of wider interest that disappears, e.g. the theft of the Internet Explorer name and the use of the name Vista for Microsoft's latest OS when a free, open source version of a medical records system predated MS's use. I may be creating poor search strings, particularly with respect to the latter. However, the former should have brought something up if Google searches are to remain the best available. Add to my own suspicions the fact that Google is allowing users to rate results, and my suspicions can be seen to warrant some credibility.
I have seen at least two inflammatory headlines promising a new killer search engine. I tend to discount these site provocateurs, since I know they are just seeking to maximize their view count. Indeed, the first supposed killer was Ask.com, which I found failed on a mundane, non-technical topic. Ask's responses were dead wrong, substituting rumor and innuendo for fact. I never bothered to retest. Another was Cuil.com, which was nowhere near the state it was reported to have attained, with many results being no more than dead links that led to nothing. Moreover, on first viewings even the search result pages did not function properly [12.]. Thus, in both cases, premature claims of superiority essentially poisoned the well for me and, I suspect, for others.
However, what I found surprising was the vehemence with which some posters greeted the supposed newfound search champion, despite the recognition that the new combatant still lacked the full attributes to compete effectively with Google. I think this shows an undercurrent of dissatisfaction with Google that could be mined, should an effective competitor appear that came close to matching the PR and glowing attributes bestowed by a too adoring internet press.
Recent Searches & Results
Besides the searches pertaining to the origin of the Internet Explorer and Vista names, I had a few unrelated Google search result sets that elicited differing responses from me. One seemed to hold obvious paid placement results that I skipped over, whereas the other search yielded surprisingly good information, well beyond my hopes. Therefore, while I expect to look at new entrants, I expect to return to Google if the new competitors are not markedly superior.
My Case
I wish that I could have built a stronger case with a more complete set of suspicious actions. Indeed, were I the reader I am not sure I would be convinced. However, it was not my original goal to research the veracity of Google search results; I was merely seeking to re-find better results I knew to exist. For a long period I assumed the fault lay solely with my deficient search skills. I ran repeated attempts to recover the better code, many times throwing away both the failed search strings and the results. It took too long to recognize that the code samples I was running across were near duplicates. Furthermore, a longer period passed until I sensed the results were being subverted. Therefore, I inadvertently dispensed with the best examples that would have built a stronger case.
Indirect Support of My Premise
Recently I noticed a story with an official confirmation that Google is likely to be using some users' custom search preferences [13.]. I took this to imply that Google too thinks it has to find means other than software algorithms to evaluate tampering attempts. For the skeptical this will not suffice. However, I think that subclass of individuals is obligated to suggest some more likely driving force. Google knows it could be displaced by a better search engine; hence, it is in its paramount interest to maintain its status as the best tool available.
Suggestions for Better Google Results
Up front and direct, I am going to punt on this. I have already expended too much time and effort that yielded only marginal results. This effort diverted me from more important issues and held up my completing the article on the column heights. Moreover, I now recognize this article to be a sink that consumed too much of my time and energy. However, the biggest reason I will not put in further effort is the realization that I am an unlikely candidate to consider pursuing the ideas I might broach. Finally, similar ideas have failed elsewhere, even where direct financial rewards were absent.
Whenever one seeks corrections, a negative response by the receiver must be expected. The standard argument is that it takes time to identify those that can be trusted. I think too often those vetted are the wrong individuals; hence, it is likely my suggestions would be doomed to fail. Let me cite just a few instances where, with the best of intentions, the process has run amok. One is Wikipedia, where moderation failed to the point where individuals cannot even correct (update) their own pages. Another instance is Slashdot moderation and comments. Too many moderators have their own agenda; hence, in too many instances I have seen a preference for suppression and for misinformation. In most instances, the noise to signal ratio just became too high. Finally, there is eBay with its direct financial rewards, where initial high rankings could, and did, devolve into gross cheating. The inescapable conclusion is that those attempting to subvert a system inherently have the greater incentive to participate [14.].
My Criteria for Google
Personally, from past experience, I will require another search engine to significantly surpass the quality of Google's search results before I consider changing my default choice. That is, as good as ..., is not sufficient. In the past I found, to my surprise, Google's phone number and location search better than Yahoo or Whitepages. For email, I find Gmail better than my cable ISP. My spam load has always been lower on Gmail than via cable. Moreover, my ISP ignores all spam that originates off its network, even when the spam is spoofing its corporate name! In recent experience zero spam has reached my inbox in Thunderbird for Gmail, whereas spam has been light on my ISP (my own filters have played a major role in that success) but has been rising lately. My use of Google maps and directions has been on a steady uptick as I note its monotonic improvements. These are all Google applications I will not easily abandon.
There are, however, Google products I avoid. The first is Chrome. Its release as a Windows only product was not the coup de grace that justified my determination to avoid its use; Firefox, too, gives preference to its Windows product in the development cycle. There are, however, two big "howevers" here. First, upon release Firefox for Linux works as well as or better than the Windows version. Second, I do not believe the overly extravagant claims of ownership by Google were an unintended overstepping of bounds. In too many other areas, Google has pushed its entitlement over users' rights to their own creations and data. The second product is Android, which I must immediately admit is moot in my case, since I seem to be immune to the attraction of smart cell phone technology. In these two instances, for differing reasons, I am unlikely to become a buyer.
Final Conclusions
While, admittedly, my arguments are not air tight, I think Google could lose its advantage if too many perceive the search results as being tweaked to push inferior results. So too, giving obvious buyers of commercial consideration a preponderance over better competitors could devalue Google's credibility. I do think Google is aware that its results are being gamed and sees itself in a war against these adversaries. However, if it holds its narrow interests as paramount, then its actions could be seen as too self serving. That pattern could cause disaffection among many clients it would like to retain. It may be hard, but I think Google needs human allies to document and identify those that purposely corrupt the search engine results. It may have to share part of the wealth too, if Google wishes to maintain its leading status in search. I suspect Google may be focusing too much on milking all sides. Nonetheless, if Google search remains the best available there will be no cost. But if it loses that position, even in a niche, it may suffer longer term negative consequences.
It should be obvious to any reader that made it this far into the text that I am disappointed with the results of my efforts. Therefore, I invite any reader who is sufficiently motivated to better my efforts. Publish your take wherever you please. However, I will gladly post it under your name on this site, provided the text consists of more than an opinion. Thus, I would expect citation and proof that it is not simply a personal view.
There is a second part to my invitation, for those interested. Send me an email with a summary of your near-publication article and the length of your text. That is, I am offering to do a critical reading; of course, you are free to reject my offer and suggestions. However, this offer is absolutely contingent upon my having the time for and interest in your article. You need not agree with my premise; I make no claim that my suspicions are uniformly true. Therefore, feel free to contact me if you think I can offer useful input, independent of where you intend to publish.
For corrections, suggested extensions or comments, write: H. Cohen.
© Herschel Cohen, All Rights Reserved
____________________________________________________________________
1. See my last article, where I described how I lost
the better JavaScript code.
2. This is an incomplete list; however, it shows Google was
worried:
On the Wires, one of the earliest I noticed.
This is a replacement for an AP article that vanished from
the NYT web pages.
A more technical source, where Google was still claiming
the data was only for personal use, which it has now
admitted is no longer true.
3. Unfortunately, the whole exercise has been a costly
diversion from more constructive tasks. It took too
long to recognize the core problem.
4. I think too that I have thrown away some search results
that could have made a stronger case.
5. The cynicism comes from this quote: "... Is this a cunning
way to encourage people to sign in while they search, thus
providing Google with a richer set of data that can be
mapped to specific user accounts?"
6. I still retain the opinion that some early mention should
have been given to the original Internet Explorer and its
subsequent theft by Microsoft.
7. Again, in my opinion, the U.S. Veterans Administration
program for medical records should have shown up in any
search using the name Vista. However, I suspect that had I
used VA as part of the search it might have surfaced.
8a. Originally a combination of sitepoint.com and slashdot
led me to the better code. All subsequent attempts failed.
8b. If my memory is correct, this item ranked higher than the
one having the sitepoint name included. It appears some
search engine optimization, at minimum, has been employed.
In addition, the direct link to the site makes the
title more suspect.
9. I might be too suspicious; the latter may just be playing
the role of hosting multiple sites.
10. To show that I am not overly suspicious, here is the link to
the real cross-brower.com site.
11. I have never used the library; hence, nothing said here is
meant to be taken as an attack on its quality.
12. A more recent view has shown marked improvement, though I
still am not sufficiently enticed to check the links. First
impressions are important; while I have made some return
visits, I still am unlikely to switch.
13. If you wish to view Google's Wiki, this was an early
link. More recent indications are that some users'
assessments will become part of Google's search results.
[Repeated, in case the link in the text was missed.]
14. I now think this is an unlikely project for me, because
counter moves to all my suggestions are too easy. Moreover,
I suspect the efforts will not be appreciated.
____________________________________________________________________