Monday, July 30, 2007

What do search engine spammers look like?

You may think that search engine spammers look pretty much the same as anyone else, and that is probably true, unless of course you are a spam detection algorithm.

At last week’s ACM SIGIR conference in the Netherlands, an interesting paper was presented with the title “Know your Neighbors: Web Spam Detection using the Web Topology”.

Essentially, this describes a spam detection system that uses the link structure of web pages, together with their content, to identify spam. Or, as the abstract puts it: “In this paper we present a spam detection system that uses the topology of the Web graph by exploiting the link dependencies among the Web pages, and the content of the pages themselves.”

The following impressive diagram appears in the paper:

[Figure: a host-level graph of part of the web, with spam and non-spam domains marked]



This is a graphical depiction (for a very small part of the web) of domains with more than 100 links between them; black nodes are spam and white nodes are non-spam.

Most of the spammers are clustered together in the upper right of the center portion, and here is a magnified view of that section:

[Figure: magnified view of the spam cluster region]



The other domains fall into either spam clusters or non-spam clusters. Here is a typical spam cluster; it shows what spammers who indulge in nepotistic linking may look like to a spam detection algorithm.

Of course, this is only one line of research into spam detection, but you don’t need to be clairvoyant to know that the major search engines have been including similar components in their ranking algorithms for some time. Good search engine optimizers avoid unnatural linking patterns, and all site owners are well advised to do the same.

You can read the full paper here: http://research.yahoo.com/node/398/2821

Source: http://www.seo-blog.com/search-engine-spammers.php

Sunday, July 29, 2007

Search Engine Optimization for Baidu

Revealing Baidu optimization secrets

Introduction

Baidu is the most popular search engine in China; Google China is second to Baidu. Many Internet marketers in the West do not know much about Baidu and assume its ranking algorithm is similar to Google China’s. That assumption rests purely on the idea that if Baidu can beat Google in such a large market, its ranking algorithm must be comparable to Google’s. Unfortunately, this is wrong. Baidu’s search results mix paid links in with natural links, and it is difficult to distinguish the two. In addition, the natural ranking algorithm is not very sophisticated, and many spammy and illogical results can be found. According to a study by the China Internet Network Information Center (CNNIC) at the end of 2006, Chinese users perceived Google’s search relevancy to be much better than Baidu’s.

Why can Baidu be No. 1 in China?

If its ranking algorithm is so inferior to Google’s, how can Baidu be more popular in the Chinese market? There are four possible reasons I can think of:

1. According to CNNIC, Google China has suffered server downtime, which frustrated users.
2. Baidu was developed by local Chinese; driven by patriotism and language friendliness, general Chinese users tend to prefer it.
3. Baidu was established before Google entered the Chinese market, so Baidu has a first-mover advantage.
4. Baidu is famous for its strong MP3 search, and many young Internet users search for songs in MP3 format daily.

However, CNNIC also revealed that white-collar urban professionals in major Chinese cities, and citizens with overseas study backgrounds, tend to prefer Google China. Their spending power is usually much higher, so Google China has an advantage in this niche.


You can also learn more about Internet usage in China from the information CNNIC published in 2007.



Optimization Tips

1. Title and Meta Tags

As with Google, the title tag is very important. Unlike with Google, the meta description and meta keywords tags are still very useful for improving rankings in Baidu. As always, we recommend that clients add meaningful meta description and keywords tags, because they remain important for some popular localized search engines.
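As a rough sketch (the page topic and all wording here are invented for illustration), the head section of a page targeting Baidu might include:

<title>LED Lighting Manufacturer in Shenzhen</title>
<meta name="description" content="Factory-direct LED panel lights and tubes for the Mainland China market.">
<meta name="keywords" content="LED lighting, LED panel lights, LED 照明">

For a Mainland audience, the description and keywords would normally be written in simplified Chinese.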

2. Content

This is similar to other popular search engines: your website copy should contain the keywords you want to optimize for, and the higher the keyword density, the better the result in Baidu. If your keyword density is too high, however, it can adversely affect your rankings in other search engines. Therefore, we recommend a density of 6-12% for Baidu optimization.
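To make that figure concrete: keyword density is simply the number of occurrences divided by the total word count, so on a 400-word page a 6-12% density means the target keyword appears roughly 24 to 48 times.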

3. Linking

Unlike Google, Baidu does not have a sophisticated algorithm for determining link relevancy and link quality; link quantity seems to matter more than quality. Incorporating keywords into internal anchor text also has some positive effect on Baidu rankings.
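For example (the page name and phrase here are made up), an internal link written as

<a href="/led-panel-lights.html">LED panel lights</a>

puts the keyword in the anchor text, whereas a generic “click here” link passes nothing useful.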

4. Content Language

Since Baidu was developed in Mainland China, a site with simplified Chinese content finds it easier to get exposure in Baidu.

People may also wonder whether English keywords are used in China. From our experience, it really depends on your industry and target visitors. For example, English keywords are used by high-income office workers, manufacturing and trading firms, and banking professionals. If your target is the general mass market, Chinese keywords dominate in frequency of use.

5. Alt Tag

Incorporating keywords into the alt text of your images is good for Baidu optimization. However, it is not advisable to stuff too many keywords into it.
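A reasonable alt attribute might look like this (the file name and wording are illustrative):

<img src="led-panel.jpg" alt="LED panel light">

while an alt text that simply repeats the keyword a dozen times is exactly the kind of stuffing to avoid.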

6. Server

If your site mainly targets Mainland China, we recommend hosting it in Mainland China; this helps your Baidu ranking significantly.

It is not essential to get .com.cn or .cn domain names, however.

7. Geographical Market

China is a very big country, and it is difficult for Internet marketers to target every province and city. You must determine where your high-value customers are located. If they are mainly based in Mainland China, your site should use simplified Chinese. If you are targeting Hong Kong and Taiwan, your site should use traditional Chinese.
Of course, it does no harm to include both Chinese versions. If the wording is localized to the city or province you are targeting, it can yield a better conversion rate.

Also, Baidu is popular only in Mainland China, particularly in the northern part. In Hong Kong and Taiwan, Baidu is an insignificant search engine player.


Keyword Research for Organic SEO

So you have decided to venture out into the world of SEO. The first thing you will need to do is determine the direction of your campaign in relation to the key phrases you are choosing to target. This article will focus on how to find keywords for your organic campaign, as the process is slightly different for PPC.

Many site owners know immediately what phrases they want. If you feel you already know what you want, take a brief step back before you start and assess whether this really is the best phrase for your site. It may very well be the perfect phrase, but if it isn’t, you could wind up spending a lot of time and money pursuing a ranking that either will never happen or will provide very little value to your site.

There are a few key areas to look at when choosing a target phrase:

1. Relevance – Is this phrase even relevant to your site and its content?
2. Search Frequency – Are people even searching for this phrase?
3. Competition – How competitive is this field? Is it even a feasible target?


Where to start – Create a List of Phrases
So where do you even start with all this keyword research? Before looking up search frequencies and competition, you need to create a list of relevant phrases. Open up an Excel sheet and type out all the relevant phrases that come to mind; do a little brainstorming, as there are no wrong answers at this stage.

After you have exhausted your own ideas, move over to your website. Open it up and navigate through it, recording any keyword phrase ideas that spring up as you check your title tags and body content. Once this is done, do the same thing with your competition: visit some sites that you know compete directly with you and go through them, recording any relevant phrases you see.

By now you should have a long list of potential targets, a list that will grow further as you look into their search frequencies.

Find a Keyword Tool
The next step is to open up your favorite keyword research tool. There are many to choose from, two of the more popular being WordTracker and Keyword Discovery, although many still use the free Overture tool. It is important to note that no keyword tool gives you 100% accurate search figures. In most cases you will get numbers representing a sampling from various search engines. These numbers are best used to compare one phrase with another to find out which is more popular, rather than to determine exactly how much traffic to expect.

Check the Search Frequency
Once you’ve opened up a keyword tool, begin entering your keyword phrases and record their reported search frequencies. Be sure to scroll through the results, recording any additional phrases that are both relevant and have acceptable search frequencies. The exact number of searches required to make a phrase acceptable depends widely on the industry, and even on the search tool being used. A phrase with only 100 searches per month may be perfect as a secondary target, but in most cases it will not be the best bet for a primary phrase.

Sorting Your List
You should now have a long list of potential target phrases and their corresponding search frequencies. Sort this list in descending order by number of searches, so that the most popular phrase is at the very top. In many industries the top few phrases may be completely impractical to target because of the competition, but we’ll determine that a bit later.

Check the Competition
The next step is to get a feel for how competitive these phrases are. In the next column of your spreadsheet, record the number of results returned by Google for each individual phrase. The lower the number of competing pages, in most cases, the easier it may be to achieve rankings. (Note: this is not always the case, but it is an indicator.)

At this point you will have a long list sorted by search frequency, along with the number of competing pages. If you are fortunate, one phrase will immediately jump out: solid searches with low competition. That may well be your ideal target phrase.
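For illustration only (every number below is invented), a sorted spreadsheet might look something like this:

Phrase                  Searches/month   Competing pages
widget repair           9,500            2,400,000
widget repair service   1,100            310,000
acme widget repair      450              18,000

Here the third phrase is the kind that jumps out: modest but real search volume against very thin competition.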

Does this phrase fit well with the theme of your site? If so, go to Google and take a closer look at the ranking websites. Does your site fit in with the general feel of these results? In some cases it may not, as your phrase could have different meanings (especially true with acronyms). The phrase may target a completely different part of the world if it is geographically loaded, or the results may simply be littered with mega-competitors such as eBay, Amazon, Wikipedia, and others. If you can see your site fitting in with these results, it’s time to assess the general feasibility of the phrase.

Take a look at the number of back links and indexed pages each site has. Do your numbers compare? If you find that the top 10 ranking sites all have back links well into the tens of thousands, and your site has a dozen or so, you may want to consider a different phrase. If the ranking sites are in the high tens or low hundreds and your site has a dozen links, then you have something to work with, provided you are willing to work on increasing your link counts. The number of pages indexed is less important than links, but if you have a six-page site and you are planning to compete with thousand-page sites, your chances of success will be much lower.

The real key is to try to find a phrase that offers relevance, decent searches, and competition that is not way out of your league.

Pick a Phrase to Drive Qualified Traffic
For organic SEO it is usually best to focus on one primary phrase that best suits your site, while targeting more specific secondary phrases with relevant sections of your site. With organic SEO, the number of phrases you can target is somewhat limited by the size of your site: the larger the site, the more phrases you will be able to work towards.

The phrase with the most searches is not always the best fit. This is especially true in the real estate market.

Because everyone has free access to it, I will use the Overture Keyword Selector Tool for an example. The phrase “real estate” saw 3,057,037 searches in January of ’07. On the surface this phrase seems like a dream come true, but you have to consider the geographic issues.

If your office serves the Seattle area, is someone searching from Orlando likely to be a qualified visitor to your site? In most cases, no. Targeting the phrase “Seattle real estate”, with 12,441 searches, seems like a much better choice, as it would deliver more qualified traffic. While this phrase is still quite competitive, it is not nearly as difficult as “real estate” alone. Look at the big picture and determine not only how likely you are to achieve rankings, but also whether the traffic generated by such a ranking would actually have a positive impact on sales.




Conclusion
Doing some research to find the best target phrase is the groundwork for your SEO campaign. Without it you’ll be flying blind, with no clear direction or goals. Take the time up front to do a little research and determine whether the dream phrase you have in mind is a worthwhile target. If it turns out that it’s not, it’s better to find out before you invest your time and money in an SEO campaign. Knowing the level of competition and the search frequency of a target phrase beforehand will help you make informed decisions and give you the best chance of success.

Source: http://www.isedb.com/

The robots.txt file and search engine optimization



On how to tell the search engine spiders and crawlers which directories and files to include, and which to avoid.

Search engines find your web pages and files by sending out robots (also called bots, spiders or crawlers) that follow the links found on your site, read the pages they find and store the content in the search engine databases.

Dan Crow of Google puts it this way: “Usually when the Googlebot finds a page, it reads all the links on that page and then fetches those pages and indexes them. This is the basic process by which Googlebot ‘crawls’ the web.”

But you may have directories and files you would prefer the search engine robots not to index. You may, for instance, have different versions of the same text, and you would like to tell the search engines which is the authoritative one (see: How to avoid duplicate content in search engine promotion).

How do you stop the robots?

The robots.txt file

If you are serious about search engine optimization, you should make use of the Robots Exclusion Standard by adding a robots.txt file to the root of your domain.

By using the robots.txt file you can tell the search engines which directories and files they should spider and include in their search results, and which directories and files they should avoid.

This file must be uploaded to the root directory of your site, not to a subdirectory. Hence Pandia’s robots.txt file is found at http://www.pandia.com/robots.txt.

Plain ASCII please!

robots.txt should be a plain ASCII text file.

Use a text editor or a plain-text HTML editor to write it, not a word processor like Word.

Pandia’s robots.txt file gives a good example of an uncomplicated file of this type:

User-agent: *
Disallow: /ads/
Disallow: /banners/
Disallow: /cgi-local/
Disallow: /cgi-script/
Disallow: /graphics/

The first line tells the robots which of them are to follow the “commands” given below it. In this case the commands apply to all search engines.
The next lines tell the robots which Pandia directories to avoid (disallow).

Let’s take a closer look at the syntax for disallowing directories and files.

Blocking an entire site

To block the entire site, you include a single forward slash, like this:

Disallow: /

This is not a procedure we recommend! If you want to block search engine spiders from crawling your site, you should make it password protected instead, as the search engines have been known not to respect robots.txt files from time to time.

Blocking directories

To block a directory and all its files, put a slash in front of and after the directory name.

Disallow: /images/
Disallow: /private/photos/

Blocking single files

To stop the search engine(s) from including a single file, write the file name after a slash, like this:

Disallow: /private_file.html

If the file is found in a subdirectory, use the following syntax:

Disallow: /private/conflict.html

Note that there are no trailing slashes in these instances.

Note also that the URLs are case sensitive. /letters/ToMum.html is not the same as /letters/tomum.html!

Identifying robots

The first line, User-agent: *, says that the following lines are for all robots.

You may also make different rules for different robots, like this:

User-agent: Googlebot
Disallow: /graphics/

Most web sites do not need to identify the different robots or crawlers in this way.

These are the names of the most common “bots”:
Googlebot (for Google web search)
Slurp (for Yahoo! web search)
msnbot (for Live Search web search)
Teoma (for Ask web search)
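Putting the pieces together, a file that gives Googlebot its own rule and a broader rule for every other robot could look like this (the directory names are just examples):

User-agent: Googlebot
Disallow: /graphics/

User-agent: *
Disallow: /graphics/
Disallow: /private/

A robot obeys the most specific User-agent record that matches it, so Googlebot would follow only its own section here.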

Source: http://www.pandia.com/sew/489-robots-txt.html

unavailable_after tag - Google Robots Exclusion Protocol

The ‘unavailable_after’ meta tag will soon be recognized by Google, according to Dan Crow, Director of Crawl Systems at Google. (From Loren Baker.)

Google is coming out with a new tag called “unavailable_after” which will allow people to tell Google when a particular page will no longer be available for crawling. For instance, if you have a special offer on your site that expires on a particular date, you might want to use the unavailable_after tag to let Google know when to stop indexing it. Or perhaps you write articles that are free for a particular amount of time but are then moved to a paid-subscription area of your site.
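Based on Google’s announcement, the tag sits in the page’s head section like any other robots META tag, with the removal date given in RFC 850 format. A page due to expire at the end of August 2007 might carry something like this (the date shown is just an example):

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 31-Aug-2007 23:59:59 EST">

Once that date passes, the page becomes eligible for removal from Google’s web search results.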

Two new features added to the protocol will help webmasters govern when an item should stop showing up in Google’s web search, as well as providing some control over the indexing of other data types.

One of the features, support for the unavailable_after tag, has been mentioned previously. Google’s Dan Crow made that initial disclosure.

He has followed that up with a full-fledged post on the official Google blog about the new tag. The unavailable_after META tag informs the Googlebot when a page should be removed from Google’s search results:

“This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. We currently only support unavailable_after for Google web search results.”

“After the removal, the page stops showing in Google search results but it is not removed from our system.”
(Email from David A. Utter)

One of the major issues plaguing search engines right now is the ever-growing number of web documents available online. While no exact figures are available, there are billions of search results to sort through. But they can’t all be relevant, in both content and timeliness, can they?

Of course they’re not, and Google is hoping to solve this problem through the adoption of the unavailable_after META tag.
(From Sujan Patel: SEO Impact of Google’s unavailable_after META Tag)

Source: http://www.searchengineoptimizationcompany.ca

Things to Avoid in Search Engine Optimization




There are a few things you must avoid (or fix accordingly) when optimizing your site for search engine submission. These include the following, among others:

Dead Links - Since search engines index your entire site by crawling through hypertext links, make sure you check for dead links before submitting.

Graphics and Image Maps - Search engines cannot read images, so be sure to include alternative text (alt) tags.
I recently had someone ask me why their site couldn’t get indexed by the search engines. I wasn’t surprised when I looked at it: 41 pages of pure images, not a shred of text on the site. That is the worst-case scenario, of course, but you should also keep pages under 64 KB (maximum) of total graphics and text. Anything more and you are throwing away your search engine food, and the load time is driving users away before the page ever loads.

Frames - Many search engines aren’t frames-compatible; meta tags and the <noframes> tag are important in this instance. Only AltaVista, Google, and Northern Light understand frames. If you use frames, make sure that your first content page is search engine friendly, and that it is linked to the main pages of your site by standard text links. Submit this page to the search engines, not your frameset page.
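A minimal sketch of a frameset page that still gives the spiders something to index (the file names and wording are made up):

<frameset cols="25%,75%">
  <frame src="menu.html">
  <frame src="content.html">
  <noframes>
    <body>
      <p>Acme Widgets: hand-made widgets since 1997.</p>
      <p><a href="content.html">Browse the catalogue</a></p>
    </body>
  </noframes>
</frameset>

The text and links inside the <noframes> element are what a non-frames-aware crawler will read and follow.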

Password protection - Most search engines cannot index content behind a password-protected page unless you make special arrangements to provide password access.

Dynamic Pages - If your site uses database-generated requests, CGI scripts, and the like, consider submitting pointer pages to the search engines.

SPAMMING - Avoid resubmitting your pages repeatedly to search engines if your site does not get listed in the first few weeks. Allow at least six weeks before resubmission. Continual resubmission (such as that caused by automatic submission software) can cause your site to be penalized.

No Flashing! Nothing drives users away, never to return, like flashing text or the abuse of animated GIFs. That scrolling banner text ranks right up there too.

Ban Those Banner Exchanges! Link Exchange is the great modern Internet myth of our time. I’ve talked to hundreds of people in the know about this subject, and the facts are simple: banner exchanges cost you repeat visitors in the short run, the medium run, and the long run. It’s like putting a DO NOT ENTER sign with a big skull and crossbones on your front door. Nothing spells Trailer Park like Link Exchange, and you’re left wondering why your hit rate slowly fades away. It is one thing if you are getting paid for it; it is another entirely if you are giving it away.

Cloaking, Doorway Pages, Mini-sites. It’s all the same. These and some of the tricks mentioned below are considered “SPAMMING” by search engines.

Hiding Text. Padding your page with “hidden” text, using fonts the same color as your background, will prompt search engines not to index those pages.

Tiny Text. Visible text, whose only purpose is to pad the page, will be penalized the same as using hidden text.

Banners and Links. If banners or links are the first things that the search engine spider comes across, it may leave your site and follow the link. Place banners and links to other web sites after your own content, or on a dedicated links page.

Source: http://bill-ray.com/?p=22