@Per Yngve Berg & Mr. Wimpy
The 3.4.1 update to the robots.txt file still blocks cache, components, modules, and plugins. The only lines they removed were the ones for templates, media, and images. Extensions that store local copies of their CSS and JS, or serve a minified version from cache, cannot be read by Google unless these additional folders are unblocked.
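For reference, this is roughly what the stock 3.4.1 file still disallows (among other folders), plus one possible way to open up just the static assets. This is a sketch, not the official file: Googlebot honors Allow and the * wildcard, but not every crawler does, so test it in Webmaster Tools before relying on it.

    # Still present in the stock Joomla 3.4.1 robots.txt:
    Disallow: /cache/
    Disallow: /components/
    Disallow: /modules/
    Disallow: /plugins/

    # One option: allow only the CSS/JS inside those folders.
    # Googlebot supports Allow and * wildcards; other bots may not.
    Allow: /cache/*.css
    Allow: /cache/*.js
    Allow: /components/*.css
    Allow: /components/*.js
    Allow: /modules/*.css
    Allow: /modules/*.js
    Allow: /plugins/*.css
    Allow: /plugins/*.js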
My worry is that unblocking these folders will generate even more URLs pointing at the same content (duplicate content issues) than before. For example, just from the templates folder: if you have multiple templates installed (the default installation does), Googlebot can crawl them and reach the same content rendered in each template's different layout. I don't think it's smart enough to figure out which templates, plugins, modules, etc. are enabled and assigned to which pages, because it has no direct access to the database and it isn't using curl to render the pages.
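If the duplicate-layout worry is about alternate templates being reachable through a template-switch query string, one hedged mitigation is to disallow that URL pattern. The parameter name below is an assumption for illustration; check what, if anything, your own site actually exposes before adding it.

    # Assumption: alternate layouts are reachable via a ?template=
    # style query string. Verify the real parameter on your site.
    User-agent: *
    Disallow: /*?template=
    Disallow: /*&template=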
@genegun
I'm not sure why you want to allow templates just for Google. Is your thought that you want to block them for security reasons? If so, robots.txt isn't the tool for that: it doesn't enforce any blocking, it only politely requests it. It's up to each bot to respect the robots file or not. For additional security, you can beef up your .htaccess file as described here: http://ift.tt/1FQIgJg (security). There you can allow or block specific folders for specific clients (e.g. Googlebot vs. everyone else). If you use a CDN, it will provide up-to-date bot filtering for you. A CDN is the easiest route, and it also serves pages with faster load times, which helps your SEO.
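As a sketch of the .htaccess approach (standard Apache mod_rewrite syntax; the folder names are just examples, and note the User-Agent header can be spoofed, so verifying the real Googlebot takes a reverse-DNS check):

    # Return 403 for /tmp/ and /logs/ to every client except one
    # claiming to be Googlebot. Folder names are illustrative.
    RewriteEngine On
    RewriteCond %{REQUEST_URI} ^/tmp/ [OR]
    RewriteCond %{REQUEST_URI} ^/logs/
    RewriteCond %{HTTP_USER_AGENT} !Googlebot [NC]
    RewriteRule .* - [F,L]

Unlike robots.txt, this actually refuses the request instead of asking nicely.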
Statistics: Posted by qrusnell — Fri Apr 17, 2015 4:20 am
via Joomla! http://ift.tt/1FQIgZC