Our Blog


SEO, Part V: Pagerank Doesn’t Matter

Wait, how can I tell you that pagerank doesn’t matter? Wasn’t I just talking about the importance of getting links from high-ranking sites? Didn’t I say that pagerank is a measure of how important your site is?

The above is all true. You do need to get links from high-ranking sites, and Google does use PR as a measure of usefulness. What it doesn’t tell you, however, is how relevant the page is. Let me explain.

Suppose you do a search for “how to shampoo the dog”. What would be more useful to you: a PR0 site with detailed dog-washing instructions, or a PR6 site selling dog shampoo? Obviously the former is a lot more useful to you, because it’s relevant to your query.

 

Similarly, while high pagerank is nice to have, your users don’t really care about it; they just want your site to give them what they need. Likewise, you just want your site to come up at the top of the SERPs (Search Engine Results Pages) so your customers can find you. Thus, you want to get links from high-ranking, relevant sites because they count as a strong vote that your page is useful... provided they’re done right.

Next question: suppose you’re the owner of that webpage on how to shampoo a dog. Which would be more useful to you: a link from a PR2 page using the anchor text (that’s the words you click on) “how to shampoo your dog” or one from a PR6 page with the anchor text “click here”? Again, the former is more helpful; while the latter is better for increasing your pagerank, the anchor text only helps you to rank well for the term “click here” (which Adobe dominates), while the former tells Google what your site is about and helps them to return relevant results.

So do you care about getting a high PR on your sites? Well, yes – more pagerank is always preferable to less pagerank, and having a high-ranking site also speaks to the usability of new pages on that site that don’t yet have their own backlinks. Just remember: relevance trumps numbers!

In fact, Google has explicitly said that the PageRank system is one of more than 200 signals they use to index and rank pages. Remember, what they’re trying to do is find relevant, high-quality content; everything else is just a means to that end.

LAMP, WAMP, MAMP

If you’ve been involved with internet content for any length of time, you’ve probably seen the term LAMP, which is an acronym for Linux, Apache, MySQL, and PHP. (For WAMP and MAMP, replace Linux with Windows and Mac, respectively.)

What’s so special about these four things? They’re all you need to build a general purpose web server. (Sometimes Perl or Python will be used instead of PHP). Along with HTML/CSS and AJAX, you have all the tools you need to build interactive websites. So why do we use these technologies?

 

For one, LAMP is cheap. Linux, Apache, MySQL, and PHP are all free to download and use, and tend to be available on any halfway-decent web host. In most cases it’ll all be set up for you in advance, making it easy to get started.

Linux, of course, is your operating system; generally you’ll have the choice of hosting on either a Linux or a Windows box, but computer geeks tend to prefer Linux.

Apache is a web server (actually, the most popular one in the world). It’s been around for over 15 years.

MySQL is a multiuser database management system that uses SQL and is available under the GNU General Public License. SQL is an easy-to-use query language for relational databases that forms the basis for most modern database systems.

PHP, which recursively stands for PHP: Hypertext Preprocessor, is a server-side scripting language that can be embedded into HTML; it’s generally used to create dynamic webpages.

When hiring a web design expert to create your dynamic website, you’ll want to be sure to find someone who’s familiar with all of the above technologies, as well as HTML (preferably HTML5), CSS, and possibly AJAX; together, these technologies allow you to create a clean, fully-functional interactive website.

Building a Blog, Part VI: The Importance of Permalinks

If you’re reading this from the main site page, go ahead and click through to the article. Now look up at the url for this page. What do you see? The link looks something like this:

http://blog.oneearproductions.com/2010/07/building-a-blog-part-vi-the-importance-of-permalinks

Now, if we were using the default WordPress settings, it would actually look like this:

http://blog.oneearproductions.com/?p=316

Which version tells Google what the page is about? Bingo! You always want to use the first type of link because (assuming you picked a good post title), it tells people and search engines what your post is about.

 

Doing this in WordPress is easy; just go to the settings dropdown and click permalinks. You’ll see a half-dozen options; we’re using year, month, and post title, but anything works as long as it includes the title. If you want to make your own custom style, use the last option; just be sure to include %postname% in your url template.
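To make the year, month, and post title structure concrete, here is a minimal JavaScript sketch of how a title might be turned into that kind of URL. The helper names slugify and permalink are hypothetical, and this is not WordPress’s own code (its real sanitizing rules are more involved); it only illustrates the idea.

```javascript
// Hypothetical helper: turn a post title into a URL-friendly slug.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of other characters into one hyphen
    .replace(/^-+|-+$/g, '');    // trim any hyphens left at either end
}

// Hypothetical helper: build a year/month/post-title permalink.
function permalink(baseUrl, year, month, title) {
  const mm = String(month).padStart(2, '0'); // 7 -> "07"
  return baseUrl + '/' + year + '/' + mm + '/' + slugify(title);
}

console.log(permalink(
  'http://blog.oneearproductions.com', 2010, 7,
  'Building a Blog, Part VI: The Importance of Permalinks'
));
// prints http://blog.oneearproductions.com/2010/07/building-a-blog-part-vi-the-importance-of-permalinks
```

Notice that every word a searcher might type survives in the URL, while punctuation and capitalization are dropped.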

How does this affect your SEO results? Suppose Google is indexing this page (which will happen about two minutes after I hit “publish”). First it reads the url and sees the keywords blog and permalinks. Then it reads the page and sees those same words repeated again. Bingo: Google concludes that this webpage (that is, this post) must be related to those terms, and this is a relevant result for people who want to read more about them!

If I were trying to make this page show up in the first SERP, I’d start by making the page search-engine friendly (as well as user-friendly); a relevant url is one way to do that. After that, I’d get started building links to this page from relevant websites. Notice that the link in that last sentence uses anchor text that includes the keyword (links) for the page it’s linking to; that page contains the same keyword in the url and title (and, of course, in the body of the post). At this point, Google has a really good idea what that page is about!

Remember, the whole point of search engines is to help users find the most relevant results; accurately describing the content of your pages helps them match the correct page with the correct user. While some people try to abuse the system to get as many visitors as possible to worthless spam pages in the hopes of getting advertising money, when running a legitimate site you want to attract exactly those users that are looking for the content you can provide. Why waste bandwidth (and people’s time) on users who don’t want what you’re offering? Google, of course, is always on the lookout for spam sites (and has human evaluators, as well as software, to help find them), and will happily remove them from the search results. Stick with white hat SEO; don’t let delisting happen to you!

SEO, Part IV: More on Links

In SEO Part II: Links, we discussed the importance of incoming links from high-ranking sites in the same area as yours. What other factors influence the value of incoming links?

Aside from relative popularity, some sites are considered by Google to be particularly authoritative; Wikipedia is one example, but there are others. Pages on Wikipedia can outrank pages on other sites that have a lot more useful information, due to the site’s importance. A vote from these “expert sites” is thus worth more than most. You can read more about this by Googling ‘Hilltop Algorithm’.

Other good sites to get backlinks from are those with .edu or .gov TLDs. While these pages don’t actually get a pagerank boost (at least, so says Google), they do tend to be considered more authoritative, so backlinks from them increase Google’s estimation of your site’s trustworthiness.

The value of links from web directories varies; generally you want to get links from human-created directories that are well organized and link to other good sites, while avoiding those that are automatically created. Of course, you’ll also want to check that the links aren’t using the nofollow attribute, so they pass along link juice.

As previously mentioned, it’s helpful to be one of just a few links rather than one of thousands, so as to get the largest possible share of link juice. Also, remember that Google considers content closer to the top of the page to be more important; thus, it’s good if your link shows up close to the beginning of the HTML file (which is not the same as showing up at the top of the page when displayed – see the note on CSS in that last link!)

It can be discouraging to go to a lot of effort getting new links without seeing a corresponding increase in your pagerank, but remember that, like sites, links need to “age”; a backlink that’s been around for a while is worth a lot more than a new link that could be gone tomorrow. Slowly building links over time tends to lead to a gradual increase in the pagerank of your site. But be sure to read part five about when pagerank doesn’t matter!

SEO, Part III: What Comes First?

When Google indexes a page on your site, it assumes that whatever is at the top of the page is most important. So what’s at the top of the page? Probably your navigation bar!

Obviously, we’re more interested in having Google index the page content than the page navigation, but on the other hand, the navigation bar is important for users. How do we resolve this?

Remember when I said that your site generally shouldn’t look different to users and search engines? One exception is when you’re displaying the same content, but they read that content differently. Enter CSS!

Suppose your content is first in the html file, followed by your navigation, but your CSS style forces the navigation bar to be rendered at the top of the page. Your users see the page as you intended; Google, however, ignores the style and parses the page in the order it appears in your HTML file. Everybody wins!

To make this work, your HTML body will look something like this:

<body>
<div id="content">Your important stuff goes here</div>
<div id="navigation">Your nav bar goes here</div>
</body>

Meanwhile, your CSS style will define where each div goes in absolute terms:

#navigation {
position: absolute;
top: 10px;
/* other alignment code */
}

#content {
position: absolute;
top: 60px; /* 10px plus the height of your nav bar (assumed to be 50px here) */
/* other alignment code */
}

Congratulations! You now have the layout you want, while still putting your most important content where Google wants to see it!

Next time: more on keywords.

SEO, Part II: Links

So now that we’ve mentioned a few of the things you shouldn’t do, what are some search-engine-approved ways of driving traffic to your site? We’ll focus on Google, since they have the majority of the search market in western countries, but most of what we’ll cover applies to other search engines as well. Today we’ll talk about links and their effect on how Google views your website.

Everyone knows that incoming links are key, but what people don’t always realize is that the quality of those links is also important. Let’s take a quick look at the Google pagerank algorithm.

Every web page indexed by Google has an ever-changing rank between 0 and 10 that indicates how important, useful, and trustworthy it is; of the top 100 US sites according to Alexa, the average pagerank is about 7.5, and with the exception of the adult sites, all have PR at least 6; on the entire internet, there are fewer than two dozen sites that have a PR of 10. (By the site’s PR, we mean the PR of the home page; the PR10 sites generally have multiple pages with that rank). Pagerank uses a logarithmic scale, so as the value decreases, the number of websites with that value increases exponentially. Note that the integer pagerank is only an approximation and can change whenever Google does a PR update, which generally happens at least once a quarter.

Google considers each link from a webpage (except those with the nofollow attribute) to be a “vote” for its destination; the total PR of the page is divided by the number of dofollow links. For example, if a PR6 page has 30 dofollow links, that page contributes a 0.2 “vote” to each page it links to.
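The arithmetic above is simple enough to sketch in a few lines of JavaScript. The function name votePerLink is hypothetical, and this is only the simplified model described in this post, not Google’s actual formula.

```javascript
// Simplified model from this post: a page's PR is split evenly
// among its dofollow links. Not Google's real algorithm.
function votePerLink(pageRank, dofollowLinks) {
  if (dofollowLinks <= 0) {
    return 0; // a page with no dofollow links passes nothing along
  }
  return pageRank / dofollowLinks;
}

console.log(votePerLink(6, 30));  // 0.2, the example above
console.log(votePerLink(6, 300)); // 0.02: same page, ten times as many links
```

The second call shows why being one of just a few links on a page is worth more than being one of hundreds.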

People often try to abuse this system by buying hundreds or thousands of links, a practice frowned upon by Google; their official policy is that sponsored links should use the nofollow attribute. “Organic” links, however, can be very valuable. Let’s look at what does and doesn’t work.

Obviously, links from PR0 sites, particularly those that have a lot of outgoing links, aren’t worth much; ideally you want to have links from higher PR (4 or above) sites that don’t have many other links on the same page. Additionally, links are worth more if they come from a related site; if your site is about X and you get a link from a highly-rated site that is also about X, that link is worth more because Google considers the other site to be an authority on the subject. This can be a problem for ecommerce sites because related sites are likely to be your competitors and their owners won’t be interested in helping to increase your pagerank!

One thing Google watches for is the speed at which links are acquired; a site that goes from 8 incoming links to 20,000 incoming links overnight probably is paying for those links, so they’ll be discounted. Again, Google wants to see “organic” links – those that come about naturally because your site is useful to users of the referring site – and tries to reward them. Google also looks at the “neighborhood” your links are coming from; links from spammy sites tend to be discounted.

Another thing to watch for is the anchor text used for your link; a link of the form <a href=”yoursite”>relevant text</a> is worth a lot more than one that looks like <a href=”yoursite”>click here</a> (although Google does have algorithms in place to try to catch Googlebombs). Of course, if every incoming link to your site uses the same anchor text, that’s another sign Google can use to catch paid links; it’s also a wasted opportunity since you want to rank highly for multiple ways to phrase the same thing. Notice that webpages can even rank highly for terms that don’t appear on them (as in the George Bush / miserable failure Googlebomb) if enough links use that phrase; for example, searching Google for the phrase “click here” will bring up the page to download Adobe Reader, due to the many, many sites that tell their users to “click here” to download it.

More recently, Google also started looking at a site’s outgoing links; a site that mostly links to trustworthy sites such as Wikipedia is likely to be regarded more highly than one that links to spammy sites. In other words, Google judges you by the company you keep!

Tomorrow: how to code your webpages to make Google happy.

SEO, Part I: White Hat vs Black Hat

Today we’ll take a brief intermission from HTML5 to start our discussion of Search Engine Optimization, commonly known as SEO. SEO is the art of getting your website highly ranked in search engines, in order to increase traffic. Getting 50% or better clickthrough from users requires being one of the first seven results; in this series, we’ll discuss how to do that.

SEO is generally broken down into two types, white hat and black hat. White hat techniques are search-engine approved methods of improving your site’s rating; black hat techniques are likely to lead to a brief improvement, followed by a quick ban. In general, black hat techniques are those that increase traffic to your site but create a poor user experience, and are often considered unethical. Today we’ll discuss black hat techniques so that you can be aware of them. Again, you should NOT use these methods to increase traffic; the search engines have ways to detect them and will penalize your site.

Keyword spam
One popular technique is to place a number of keywords on the page such that the search engine will see them but users will not. Usually this is done by setting the words to be the same color as the page background or a very small font size, although there are a number of other methods. Spammers use either keywords that are related to the page (to artificially increase its rating) or unrelated (to suck in people for whom the page is not useful). Search engines hate this. As a general rule of thumb, don’t do anything to make the page look different for humans and computers (but there are exceptions – we’ll cover one in part III).

Cloaking
Cloaking is when, rather than displaying the same page differently to users and search engines, the webmaster actually shows them different pages, using methods such as javascript redirects and 100% frames.


Doorway pages
A doorway page is when the main page appears to direct the user into one of a number of other pages based on some criteria, but every destination is actually the same. For example, the main page may contain a link to another page of the site for each state, but the pages do not actually contain any state-specific information; the point is just to get users to load more advertisements. Again, search engines consider this to be spam; you shouldn’t have duplicate pages on your site. The term is also used for pages that are created to be visible only to search engines.


Computer-generated pages
You’ve probably seen webpages that seemed to be on-topic, but when you read them they were poorly written and didn’t make a lot of sense. Often these pages are computer-generated, pulling content from various sources. Not only do these pages tend to be fairly worthless to the user, they’re a strong spam signal for the search engine.

This isn’t even close to being a comprehensive list, but hopefully you’re getting the idea: don’t try to trick the search engines. Next time: Google-approved SEO.

HTML5, Part III: Local Storage

In the beginning, there were cookies. The cookies were tasty, and removed the need to log in to websites every time, although some people didn’t like them due to privacy concerns. But cookies were sent with every HTTP request, usually unencrypted, slowing down the session and possibly exposing sensitive information. Plus they could only hold 4 KB of data, which is fine for remembering someone’s username but not much more.

Then HTML5 appeared, and it had in its specification a new thing, Web Storage. Web storage was like cookies, but it was never sent to the remote server unless specifically requested, didn’t require a third party add-on, and was big enough to hold useful information. And this was good.

HTML5 storage is currently supported in the latest version of every major browser, but as we discussed yesterday, you can check if it’s available using the Modernizr script and the Modernizr.localstorage bool. Your site can use up to 5 megabytes to store your information, though you unfortunately have to store it as strings, which can greatly expand the size of your data.

There are two storage options you can use: sessionStorage and localStorage. SessionStorage information will be lost when the window is closed, while localStorage info is permanent (until deleted). The code is fairly simple:

localStorage.setItem('itemname', 'data'); // define and set the variable

localStorage.getItem('itemname'); // return the data

Notice that the second line isn’t a complete statement, as you haven’t done anything with the data. A complete statement would be along the lines of

alert("Your data is " + localStorage.getItem('itemname'));

If you don’t need the data anymore, you can get rid of it with localStorage.removeItem('itemname');
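Because everything has to be stored as a string, one common workaround is to serialize objects as JSON on the way in and parse them on the way out. The sketch below assumes that approach; the helper names saveObject, loadObject, and createMemoryStorage are hypothetical, and the in-memory stand-in exists only so the example runs outside a browser.

```javascript
// Hypothetical helper: store any JSON-serializable value as a string.
function saveObject(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Hypothetical helper: read the value back, or null if the key is not set.
function loadObject(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

// Minimal in-memory stand-in exposing the same three methods used above.
function createMemoryStorage() {
  const items = {};
  return {
    setItem(key, value) { items[key] = String(value); },
    getItem(key) {
      return Object.prototype.hasOwnProperty.call(items, key) ? items[key] : null;
    },
    removeItem(key) { delete items[key]; }
  };
}

// In a browser, pass window.localStorage (or window.sessionStorage) instead.
const demoStorage = createMemoryStorage();
saveObject(demoStorage, 'userdata', { name: 'Ann', visits: 3 });
console.log(loadObject(demoStorage, 'userdata').visits); // 3
```

Passing the storage object in as a parameter means the same two helpers work for both localStorage and sessionStorage.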

SessionStorage works the same way; just replace “local” with “session” in the code above. You can also store content in an SQLite database and use standard SQL inside of your script:

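var db = openDatabase('mydb', '1.0', 'Example database', 5 * 1024 * 1024); // name, version, and size are placeholders
db.transaction(function (tx) {
tx.executeSql('SELECT * FROM mytable', [], function (tx, results) { // placeholder query
// do something with the results
});
});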

As with cookies, local storage comes with some privacy concerns. Anyone putting content on multiple sites (most likely an advertiser) could use the store to track a user across multiple sessions, building a fairly detailed profile. As with cookies, the user has the option of refusing to allow data to be stored or expiring it after a given amount of time. Additionally, it may be possible to access another site’s information through spoofing if SSL is not used. One restriction to be aware of is that multiple pages on one hostname (e.g., free website hosting such as the now-defunct geocities) share a local storage object, so any page can access and overwrite information stored by other pages on the same host.

On a side note, Google previously implemented similar functionality with Google Gears, but they are now abandoning it in favor of the local storage and geolocation options in HTML5.

HTML5, Part II: What Features Does My Browser Support?

Last week we dived right into our discussion of HTML5 with a quick tutorial on how to embed video, but we also mentioned that modern browsers support a subset of the new markup – no browser is yet 100% HTML5 compatible. So how can you tell (aside from running around looking up references) which features are supported?

When your browser downloads a webpage, it starts by creating the Document Object Model (DOM), which is a collection of objects representing all elements on the page. If a browser supports a particular element or feature in HTML5, its DOM will contain that information. While you can check for each element you need individually, Modernizr is a javascript library (released under the MIT license) that detects support automatically. After downloading it, you’ll need to include the following element inside the page head:

<script src="modernizr.min.js"></script>

 

The script runs automatically when the page loads and creates an object called Modernizr that you can test to see which features it detects by checking the truth value of the appropriate boolean. For example, to check whether the browser supports HTML5 Audio, you could use a line in your javascript saying:

if (Modernizr.audio) { /* the browser supports HTML5 audio */ }

The website contains sample usage for each of the items it checks. Additionally, it helps keep things from breaking in Internet Explorer. By using Modernizr, you can serve the appropriate HTML5 code to modern browsers, while providing other content that will make sense to older browsers.
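For comparison, checking the DOM for an individual feature by hand might look roughly like the sketch below. The function names supportsVideo and supportsLocalStorage are hypothetical, and this is not how Modernizr itself is written; it just illustrates the idea that a supporting browser’s DOM exposes the feature.

```javascript
// Hypothetical check: a browser that supports <video> gives the
// element a canPlayType() method; older browsers do not.
function supportsVideo(doc) {
  const element = doc.createElement('video');
  return typeof element.canPlayType === 'function';
}

// Hypothetical check: some browsers throw when storage is disabled,
// so the lookup is wrapped in try/catch.
function supportsLocalStorage(win) {
  try {
    return 'localStorage' in win && win.localStorage !== null;
  } catch (e) {
    return false;
  }
}

// Only touch the real objects when running inside a browser.
if (typeof document !== 'undefined' && typeof window !== 'undefined') {
  console.log('video:', supportsVideo(document));
  console.log('local storage:', supportsLocalStorage(window));
}
```

Writing one of these per feature gets tedious quickly, which is the case for letting a library do it.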

Remember last week when we talked about putting three different <source> tags inside a <video> element? Another option would be to use Modernizr to check what type of video the browser is capable of playing:

if (Modernizr.video) { // Does the browser support HTML5 video?
  if (Modernizr.video.ogg) {
    // play Ogg Theora video / Vorbis audio in an Ogg container
  } else if (Modernizr.video.h264) {
    // play H.264 video / AAC audio in an MP4 container
  }
}

Now that we’ve seen how to figure out what features the user’s browser supports, let’s get back to things that most browsers do. Next up, something every modern browser (even IE 8) supports: local storage!

HTML5, Part I: Video

I had originally planned to finish off my Web 2.0 Overview before getting into the specifics of each technology, but yesterday’s introduction to HTML5 got a lot of attention, so we’ll move it up a bit.

HTML5 supersedes HTML 4.0, XHTML 1.0, and XHTML 1.1; it provides new tags for handling many common web design elements that are currently handled with third party applications such as Flash, and standardizes elements that have never been formally documented.

What do you need to get started?
The latest versions of Chrome, Firefox, Opera, and Safari all support some but not all HTML5 features; Internet Explorer 8 has a few features and IE 9 is expected to add more. As always, when designing for Internet Explorer, be particularly careful about elements that compete with Microsoft’s own offerings – for example, IE 8 does not support the canvas tag, which would replace Microsoft’s Silverlight technology. In this series, I’ll attempt to focus on those elements that are supported by all of the major browsers, or at least mention the exceptions.

Show me the video!
Let’s start with how to embed video in a webpage. Traditionally this has been done with a third-party plugin such as QuickTime, RealPlayer, or Flash; now we have a standards-based method that should be supported on all platforms. Currently the <video> tag works in Chrome, Firefox 3.5, IE 9, Opera, and Safari 3; however, each browser supports different codecs and containers. Unfortunately, there is no one container/codec combo that works for all of these browsers, so if you want your video to show up for everyone using an HTML5 browser, you’ll need to encode it multiple times. Fortunately, HTML5 allows you to specify several videos in one video tag, leaving it to the browser to choose which one to display. To have the video display on all browsers, you’ll need the following encodings (converting video into the correct formats is one service a professional website designer will provide):

  • Theora video and Vorbis audio in an Ogg container
  • H.264 video and AAC audio in an MP4 container
  • VP8 video and Vorbis audio in a WebM container

Finally, to accommodate browsers that do not support the <video> tag, you’ll want to fall back to a Flash video.

So give me the tag details, already!
When you only have one video file, the tag works pretty much as expected; however, it has a number of optional (but desirable) attributes:

<video src="videoname" width="XXX" height="YYY" controls> </video>

So what’s all this? The src part, width, and height should be fairly self-explanatory; width and height are optional but should be used anyway. By default, the video tag does not offer the user any type of control over the video; the controls option adds a built-in set. Alternatively, you can write your own; the video element has methods play() and pause() and read/write properties volume, muted, and currentTime.

If you expect every visitor to watch the video, the preload option will start it downloading as soon as the page loads; setting preload="none" will tell the browser not to load the video until it’s requested. Finally, the autoplay option does exactly what you would expect; it downloads the video and starts it playing as soon as possible.
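If you decide to write your own controls, the handlers might look something like this minimal sketch. The function names are hypothetical; play(), pause(), muted, and currentTime are the members mentioned above, and the sketch additionally relies on the element’s read-only paused property.

```javascript
// Hypothetical handler for a play/pause button.
function togglePlayback(video) {
  if (video.paused) {
    video.play();
  } else {
    video.pause();
  }
}

// Hypothetical handler for a mute button.
function toggleMute(video) {
  video.muted = !video.muted;
}

// Hypothetical handler for skip buttons: jump forward or back by a
// number of seconds, never going before the start of the video.
function skip(video, seconds) {
  video.currentTime = Math.max(0, video.currentTime + seconds);
}

// In a page you would wire these to your own buttons, for example:
// playButton.onclick = function () { togglePlayback(videoElement); };
```

Each function takes the video element as a parameter, so the same handlers can serve several videos on one page.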

Several videos, one tag
What if you went ahead and made three encodings of your video so everyone using an HTML5 browser can see it? In that case, your code will look something like this:

<video width="xxx" height="yyy" controls>
  <source src="video.mp4" type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
  <source src="video.webm" type='video/webm; codecs="vp8, vorbis"'>
  <source src="video.ogv" type='video/ogg; codecs="theora, vorbis"'>
  <object width="XXX" height="YYY" type="application/x-shockwave-flash" data="__FLASH__.SWF">
    <param name="movie" value="__FLASH__.SWF" />
    <param name="flashvars" value="image=__POSTER__.JPG&amp;file=__VIDEO__.MP4" />
  </object>
</video>

Do you really need all that? Well, no... but if you don’t specify exactly how each video is encoded, the browser will find out whether it can play a video by downloading it and trying to play it, wasting your bandwidth and the user’s time. Better to put in a little extra effort beforehand! The object tags will be ignored by HTML5 browsers (as will any non-source tags inside the video tag) but will cause older browsers to display the Flash movie.
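To see why the type attribute saves a download, here is a rough javascript sketch of the kind of decision it makes possible. The function name pickSource is hypothetical and the real selection logic lives inside the browser; the sketch leans on the video element’s canPlayType() method, which returns an empty string when the browser knows it cannot play a given type.

```javascript
// The same three encodings as in the markup above, in the same order.
const videoSources = [
  { src: 'video.mp4',  type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
  { src: 'video.webm', type: 'video/webm; codecs="vp8, vorbis"' },
  { src: 'video.ogv',  type: 'video/ogg; codecs="theora, vorbis"' }
];

// Hypothetical helper: return the first file the browser says it can
// play, without downloading anything, or null if none will work.
function pickSource(video, sources) {
  for (const source of sources) {
    // canPlayType() answers "probably", "maybe", or "" (cannot play)
    if (video.canPlayType(source.type) !== '') {
      return source.src;
    }
  }
  return null;
}

// In a browser: pickSource(document.createElement('video'), videoSources)
```

Without the type information, the only way to answer the same question is to fetch each file and try it.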

Side note: the MP4 file format should be listed first, as the iPad only notices the first source listed and will fail to play your video otherwise.

Acknowledgements:
Video for Everybody offers a more in-depth discussion of setting up your videos to be readable by everyone.

HTML5: Up and Running (due out next month from O’Reilly) contains a fascinating discussion of the history of HTML and instructions on encoding video.