PageRank is a link analysis algorithm used by the Google Internet search engine. The algorithm assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. According to Google's theory, if Page A links to Page B, then Page A is saying that Page B is an important page. If a page has more important links pointing to it, then its own links to other pages also become more important.
PageRank was developed at Stanford University by Larry Page (the term PageRank is named after him) and Sergey Brin as part of a research project about a new kind of search engine. “PageRank” is now a trademark of Google. The PageRank process has been patented, and the patent is assigned to Stanford University, not to Google. Google holds exclusive license rights on this patent from the university. The university received 1.8 million shares of Google in exchange for use of the patent; the shares were sold in 2005 for $336 million.
The first paper about the project, describing PageRank and the initial prototype of the Google search engine, was published in 1998. Shortly after, Page and Brin founded the company Google Inc. Although PageRank is now only one of about 200 factors that determine the ranking of Google search results, it continues to provide the basis for all of Google's web search tools.
As early as 1996, a small search engine called “RankDex”, designed by Robin Li, was already exploring a similar strategy for site scoring and page ranking. This technology was patented by 1999 and was later used by Li when he founded Baidu in China.
Some basic information is needed to understand PageRank.
First, PageRank is a number that only evaluates the voting ability of all incoming (inbound) links to a page.
Second, every unique page of a site that is indexed in Google has its own PageRank.
Third, internal site links interact in passing PageRank to other pages of the site.
Fourth, PageRank stands on its own. It is not tied to the anchor text of links.
Fifth, there are two values of the PageRank that should be distinguished:
a. The PageRank that you can get from the Google Toolbar for Internet Explorer (http://toolbar.google.com);
b. The actual, or real, PageRank that Google uses when calculating the ranking of web pages.
PageRank from the toolbar (sometimes called the Nominal PageRank) has a value from zero to ten. It is not very accurate information about site pages, but it is the only thing that gives you any idea of the value. It is updated approximately once every three months, while the real PageRank is recalculated continuously as the Google bots crawl the web, finding new web pages and new backlinks.
Thus, in the following text the term actual PageRank refers to the PageRank value stored by Google, and the term Toolbar PageRank refers to the representation of that value that you see on the Google Toolbar.
The Toolbar value is just a representation of the actual PageRank. While real PageRank is linear, Google uses a non-linear scale to show its representation. So on the toolbar, moving from a PageRank of 2 to a PageRank of 3 takes a smaller increase in actual PageRank than moving from a PageRank of 3 to a PageRank of 4.
This is illustrated by a comparison table (from PageRank Explained by Chris Ridings). The actual figures are kept secret, so for demonstration purposes some guessed figures were used:
| If the actual PageRank is between | The Toolbar shows |
|---|---|
| 0.00000001 and 5 | 1 |
| 6 and 25 | 2 |
| 25 and 125 | 3 |
| 126 and 625 | 4 |
| 626 and 3125 | 5 |
| 3126 and 15625 | 6 |
| 15626 and 78125 | 7 |
| 78126 and 390625 | 8 |
| 390626 and 1953125 | 9 |
| 1953126 and infinity | 10 |
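The guessed ranges in the table grow by a factor of 5 at each step, so the mapping can be sketched as a lookup against powers of 5. The following Python sketch is only an illustration of those guessed figures; the names `UPPER_BOUNDS` and `toolbar_pagerank` are hypothetical, and the real thresholds used by Google are not public.

```python
from bisect import bisect_left

# Upper bounds of the guessed ranges from the table (5, 25, ..., 1953125).
# Illustrative figures only; the real thresholds are kept secret.
UPPER_BOUNDS = [5 ** k for k in range(1, 10)]


def toolbar_pagerank(actual_pr: float) -> int:
    """Map a guessed actual PageRank to a Toolbar value between 1 and 10."""
    if actual_pr <= 0:
        raise ValueError("actual PageRank must be positive")
    # bisect_left counts how many upper bounds lie strictly below actual_pr,
    # so a value equal to a bound stays in the lower range.
    return bisect_left(UPPER_BOUNDS, actual_pr) + 1


print(toolbar_pagerank(100))  # falls in the "25 and 125" row
```

Under these assumed figures, each additional Toolbar point requires roughly five times more actual PageRank than the previous one.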
Lawrence Page and Sergey Brin have published two different versions of their PageRank algorithm in different papers.
The first version (the so-called Random Surfer Model) was published in 1998 in the Stanford research paper titled The Anatomy of a Large-Scale Hypertextual Web Search Engine:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
where PR(A) is the PageRank of page A; d is a damping factor, which can be set between 0 and 1 and is nominally set to 0.85; PR(T1), …, PR(Tn) are the PageRanks of the pages T1, …, Tn that link to page A; and C(T1), …, C(Tn) are the numbers of outgoing links on those pages.
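Because each page's PageRank depends on the PageRank of the pages linking to it, the formula is usually evaluated by repeated substitution until the values settle. The Python sketch below shows this under stated assumptions: the three-page web, the function name `pagerank_v1`, the starting value of 1 for every page, and the fixed count of 100 iterations are all illustrative choices, not details from the original paper.

```python
def pagerank_v1(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `links` maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link.
    """
    pr = {page: 1.0 for page in links}  # assumed starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            incoming = sum(
                pr[t] / len(links[t]) for t in links if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_v1(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

In this small example page C receives links from both A and B and ends up with the highest value, and the values of all pages add up to the number of pages.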
In the second version of the algorithm, the PageRank of page A is given as:
PR(A) = (1-d) / N + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
where N is the total number of all pages on the Web.
The first model is based on a very simple, intuitive concept. PageRank is framed as a model of user behaviour, in which a surfer clicks on links at random. The probability that the surfer visits a page is that page's PageRank. The probability that the surfer clicks on a particular link on a page is determined solely by the number of links on that page, which is why PR(T) is divided by C(T). At each page the surfer follows a link with probability d, the damping factor, and with probability 1-d gets bored and jumps to another random page.
The second version treats the PageRank of a page as the actual probability that a surfer reaches that page after clicking on many links. The PageRanks then form a probability distribution over web pages, so the sum of all pages' PageRanks is one.
When it comes to computation, the first version is easier to work with because the total number of web pages is disregarded.
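The claim that the second version yields a probability distribution can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical three-page web in which every page has at least one outgoing link; the function name `pagerank_v2`, the uniform starting value 1/N, and the iteration count are assumptions made for the example.

```python
def pagerank_v2(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d)/N + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))."""
    n = len(links)
    pr = {page: 1.0 / n for page in links}  # assumed uniform starting guess
    for _ in range(iterations):
        # The comprehension reads the old `pr` and builds the new one.
        pr = {
            page: (1 - d) / n
            + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
            for page in links
        }
    return pr


# Hypothetical web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_v2(links)
print(sum(ranks.values()))  # close to 1
```

Multiplying each of these values by N gives the values that the first version of the formula produces for the same graph, so the two versions differ only by this scaling.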