WordPress Troubleshooting and Support
Public Group, active 2 years, 2 months ago
WordPress support from our community
semalt & baidu
- This topic has 6 replies, 3 voices, and was last updated 6 years, 11 months ago by
D.K. Smith.
April 3, 2014 at 4:59 pm #2955
Vincent Gentile
Participant
Hello WPNYC, I have a client whose site is being crawled several times a day by two sites (semalt.com and baidu.com). I added code that I found online to the .htaccess file, but they continue to hit the site.
Has anyone had this problem?
Below is my .htaccess file:
__________________________________________________________________________
Options -Indexes
# BEGIN WPSuperCache
# END WPSuperCache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
# block visitors referred from semalt.com
RewriteEngine on
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule .* - [F]
# block visitors referred from baidu.com
RewriteEngine on
RewriteCond %{HTTP_REFERER} baidu\.com [NC]
RewriteRule .* - [F]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]
RewriteCond %{HTTP_USER_AGENT} ^.*(semalt|baidu)…. [NC]
RewriteRule . - [F,L]
April 4, 2014 at 7:49 am #2956
D.K. Smith
Participant
Hi Vincent,
I assume you have already disallowed them in robots.txt.
Semalt’s removal page: http://semalt.com/project_crawler.php
The most effective approach is to check the site’s server logs and find the IP addresses semalt and baidu are using.
1. Block those IPs in .htaccess:
# Replace the placeholder addresses with the actual IPs from the logs.
# "allow from all" tells Apache to allow everyone else.
order allow,deny
deny from 222.333.44.555
deny from 666.777.88.999
allow from all
2. Block a range of IPs:
# Replace 222.333. with the first two blocks of the actual IPs.
# Use the 10.0.0. line as written.
order allow,deny
deny from 222.333.
deny from 10.0.0.
allow from all
3. This may work, depending on the Apache configuration:
# Escape each dot in the IP with a backslash, e.g. ^22\.333\.
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^22\.333\.44\.555
RewriteRule ^ - [F]
4. PM me if you’d like to block all of China and I’ll send the file, which is too large to post here.
We often have to use all of the above, depending on the bot (baidu is among the worst) and the server’s Apache config.
Good luck!
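Note that Order/Allow/Deny is the Apache 2.2 access-control syntax; on Apache 2.4 it only works when mod_access_compat is loaded. A minimal sketch of the same "allow everyone except these IPs" idea in the 2.4 Require syntax, assuming the server runs Apache 2.4 and the host permits these directives in .htaccess. The addresses are placeholders from reserved documentation ranges, not real crawler IPs:

```apache
# Apache 2.4 sketch: allow everyone except the listed addresses.
# 203.0.113.7 and 198.51.100.0/24 are placeholder values only;
# replace them with the IPs taken from the server logs.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.7
    Require not ip 198.51.100.0/24
</RequireAll>
```

If the server returns a 500 error after adding this, the host most likely does not allow these directives in .htaccess, and the block should be removed.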
April 4, 2014 at 9:19 am #2957
April 4, 2014 at 9:19 am #2958
Vincent Gentile
Participant
Hi DK, thank you for responding. They always come from different IPs, and there isn’t a range.
I did not disallow them in robots.txt. I have never done this, but I guess I need to create the robots.txt file and upload it to the host. If that is correct, do I upload it to public_html?
Do you think the code below would work?
User-agent: *
Disallow: /
# Some bots are known to be trouble, particularly those designed to copy
# entire sites. Please obey robots.txt.
User-agent: semalt.com
Disallow: /
April 4, 2014 at 9:22 am #2959
Vincent Gentile
Participant
Steve, I saw your Twitter message. My client is paranoid about submitting to the semalt crawler.
April 4, 2014 at 11:23 am #2960
D.K. Smith
Participant
If baidu is coming from all over, it’s not baidu; they’re spam bots, harvesters spoofing baidu.
Baidu used to run on 119.63.192.0 - 119.63.199.255; I’m not sure whether those IPs are still valid.
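That range works out to a single CIDR block, 119.63.192.0/21, so it can be denied in one line. A sketch in the same Order/Allow/Deny syntax used earlier in this thread, assuming the range still belongs to Baidu (which, as noted, is not certain):

```apache
# 119.63.192.0 - 119.63.199.255 covers third octets 192 through 199,
# i.e. 8 x 256 = 2048 addresses, which is exactly 119.63.192.0/21.
Order Allow,Deny
Allow from all
Deny from 119.63.192.0/21
```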
1. Try this at the top of .htaccess, before the WP rewrite:
# Replace yoursite.net with your own domain.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://www.yoursite.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.yoursite.net$ [NC]
RewriteCond %{HTTP_REFERER} !^http://yoursite.net/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://yoursite.net$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
# No ^ anchor here: Baiduspider's user agent can begin with "Mozilla/5.0 (compatible; Baiduspider/...".
SetEnvIfNoCase User-Agent "baiduspider" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
2. Or this at the top:
SetEnvIfNoCase User-agent "Baidu" spammer=yes
SetEnvIfNoCase User-agent "semalt" spammer=yes
<Limit GET POST>
order deny,allow
deny from env=spammer
</Limit>
3. A robots.txt file should be in EVERY WordPress root folder:
#Baidu
User-agent: Baiduspider
Disallow: /
#Semalt
User-agent: semaltspider
Disallow: /
4. The only method that’s consistently effective is blocking IPs.
PM me your email and I’ll send our block-China list. There’s nothing to lose by trying it unless the owner wants traffic from China. If that doesn’t stop it, they’re most likely harvester bots spoofing baidu.
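On the "before the WP rewrite" point in item 1: the WordPress block ends with an [L] rule that hands most requests to index.php, so blocking rules placed after it may never be reached for requests WordPress handles. A rough sketch of the referrer and user-agent rules from the first post, moved above the WordPress block. Joining the two conditions with [OR] is an assumption about the intent (block if either one matches), and the patterns are illustrative rather than a verified list of what these bots send:

```apache
# Place this above "# BEGIN WordPress" so it runs before the WP rules.
<IfModule mod_rewrite.c>
RewriteEngine On
# Referrer mentions semalt.com or baidu.com ...
RewriteCond %{HTTP_REFERER} (semalt|baidu)\.com [NC,OR]
# ... or the user agent mentions semalt or baidu anywhere in the string.
RewriteCond %{HTTP_USER_AGENT} (semalt|baidu) [NC]
# Answer 403 Forbidden and leave the URL untouched.
RewriteRule ^ - [F]
</IfModule>
```

Without [OR], consecutive RewriteCond lines are combined with AND, so a request would have to match both conditions before being blocked. Bots that spoof both the referrer and the user agent will still get through, which is why IP blocking is suggested as the fallback.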
May 4, 2014 at 4:16 am #2977
D.K. Smith
Participant
Vincent, how did the stop-baidu effort go?
Was it Baidu or spambots?