Group Admins

WordPress Troubleshooting and Support

Public Group active 1 year, 5 months ago

WordPress support from our community

semalt & baidu


Viewing 7 posts - 1 through 7 (of 7 total)
  • Author
  • #2955
    Vincent Gentile

Hello WPNYC – I have a client whose site is being crawled several times a day by two bots (semalt & baidu). I have added code that I found online to the .htaccess file, but they continue to hit the site.

Has anyone else had this problem?

Below is my .htaccess file:
    Options -Indexes

    # BEGIN WPSuperCache
    # END WPSuperCache

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
</IfModule>

    # END WordPress

# block visitors referred from semalt.com
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule .* - [F]

# block visitors referred from baidu.com
RewriteCond %{HTTP_REFERER} baidu\.com [NC]
RewriteRule .* - [F]

# block by user agent
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (semalt|baidu) [NC]
RewriteRule . - [F,L]

    D.K. Smith

    Hi Vincent,

    I assume you disallowed them in robots.txt.

Semalt also has a removal page where you can ask them to stop crawling a site.

The most effective approach is to check the sites’ server logs and get the IP addresses semalt & baidu are actually using.
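As a sketch of how to pull those IPs out of the logs (assuming Apache’s combined log format and a typical log path – adjust both for your host):

```shell
# List the unique client IPs on any log line mentioning semalt or baidu
# (matches referer or user-agent). The log path is an assumption.
grep -iE 'semalt|baidu' /var/log/apache2/access.log \
  | awk '{ print $1 }' \
  | sort -u
```

The first field of a combined-format log line is the client IP, so `awk '{ print $1 }'` extracts it and `sort -u` de-duplicates the list.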

1. Block those IPs in .htaccess:

order allow,deny
# replace the two IPs below with the actual ones from your logs
deny from 222.333.44.555
deny from 666.777.88.999
# tells Apache to allow everyone else
allow from all

2. Block a range of IPs:

order allow,deny
# replace 222.333. with the first two octets of the actual IPs
deny from 222.333.
# the private range below can be used as written
deny from 10.0.0.
allow from all

3. This may work depending on Apache configuration:

RewriteEngine on
# escape the dots in the actual IP, e.g. ^222\.333\.44\.555
RewriteCond %{REMOTE_ADDR} ^222\.333\.44\.555
RewriteRule ^ - [F]

    4. PM me if you’d like to block all of China and I’ll send the file, which is too large to post here.
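A file like that is essentially just thousands of `deny from` lines. As a rough sketch of how such a list is built (the `cn.zone` filename is hypothetical – any per-country CIDR export with one range per line will do):

```shell
# Prefix every CIDR range in the export with "deny from" so Apache 2.2
# will block it; the result can be pasted into .htaccess.
sed 's/^/deny from /' cn.zone > block-china.conf
```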

We often have to use all of the above depending on the bot (baidu is among the worst) and the server’s Apache config.

    Good luck!


You also received a reply via Twitter.

    Vincent Gentile

Hi DK – thank you for responding. They always come from different IPs, and there isn’t a range.

I did not disallow them in robots.txt. I have never done this, but I guess I need to create the robots.txt file and upload it to the host. If that’s correct, do I upload it to public_html?

Do you think the code below would work?

    User-agent: *
    Disallow: /

    # Some bots are known to be trouble, particularly those designed to copy
    # entire sites. Please obey robots.txt.

    Disallow: /

    Vincent Gentile

Steve – I saw your Twitter message. My client is wary of submitting the site to the semalt crawler’s removal tool.

    D.K. Smith

If baidu is coming from all over, it’s not Baidu – they’re spam bots: harvesters spoofing Baidu’s user agent.

Baidu used to crawl from a couple of known IP ranges – not sure if those IPs are still valid.

1. Try this at the top of .htaccess, before the WordPress rewrite block:

RewriteEngine on
# the referer conditions whitelist your own domain –
# replace example.com with the actual domain
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

SetEnvIfNoCase User-Agent "^baiduspider" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

    2. Or this on top:

SetEnvIfNoCase User-Agent "Baidu" spammer=yes
SetEnvIfNoCase User-Agent "semalt" spammer=yes

<Limit GET POST>
order deny,allow
deny from env=spammer
</Limit>

3. A robots.txt file should be in EVERY WordPress root folder:

    User-agent: Baiduspider
    Disallow: /

    User-agent: semaltspider
    Disallow: /

    4. The only method that’s consistently effective is blocking IPs.

PM me your email and I’ll send our block-China list. There’s nothing to lose by trying it, unless the owner wants traffic from China. If that doesn’t stop it, they’re most likely harvester bots spoofing Baidu.
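One way to tell the two apart (a sketch – genuine Baiduspider IPs typically reverse-resolve to hostnames under crawl.baidu.com; the `check_baidu` helper name is made up here):

```shell
# check_baidu: given the reverse-DNS hostname for an IP (e.g. the last
# field of `host <ip>` output), report whether it looks like a genuine
# Baidu crawler. Spoofed bots reverse-resolve somewhere else entirely.
check_baidu() {
  case "$1" in
    *crawl.baidu.com*) echo "genuine" ;;
    *)                 echo "spoofed" ;;
  esac
}

# usage (requires network):
#   check_baidu "$(host 180.76.15.0 | awk '{ print $NF }')"
```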

    D.K. Smith

    Vincent – How did the stop-baidu effort go?

    Was it Baidu or spambots?
