
ID: 0005660
Project: OXID eShop (all versions)
Category: 4.08. Cache
View Status: public
Last Update: 2014-08-28 14:47
Reporter: michael_keiluweit
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 4.7.11 / 5.0.11
Target Version: 4.7.14 / 5.0.14
Fixed in Version: 4.9.0_5.2.0_RC1
Summary: 0005660: Varnish Cache default config.vcl removes parameters from the URL but not the ROBOTS tag from the source code
Description: If the shop is called with a GET request that has parameters appended to the URL, the shop always prints "<meta name="ROBOTS" content="NOINDEX, FOLLOW">" in the source code.

Our standard Varnish configuration removes some parameters from the URL, but does not remove this ROBOTS tag from the source code. Under certain circumstances, the page containing this tag is cached and then always delivered.

For example: Google calls the shop with parameters, and Varnish puts this page (with the tag) into the cache.
When Google later calls the shop again, it receives, for example on the start page, the ROBOTS tag telling it not to index the page.
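The following Python sketch simulates the problem described above. It assumes a simplified shop that adds the NOINDEX tag whenever the URL has a query string, and a cache that strips a hypothetical list of tracking parameters (STRIPPED_PARAMS) before building its key; neither is the actual OXID or Varnish code.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical subset of parameters a VCL might strip; the real list is in config.vcl.
STRIPPED_PARAMS = {"gclid", "utm_source"}

NOINDEX_TAG = '<meta name="ROBOTS" content="NOINDEX, FOLLOW">'


def cache_key(url: str) -> str:
    """Old behaviour (simplified): drop tracking parameters, then use the URL as the key."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in STRIPPED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def render_page(url: str) -> str:
    """Stand-in for the shop: any query string makes it emit the NOINDEX tag."""
    head = NOINDEX_TAG if urlsplit(url).query else ""
    return f"<html><head>{head}</head></html>"


cache: dict[str, str] = {}


def fetch(url: str) -> str:
    key = cache_key(url)
    if key not in cache:
        # Cache miss: the page is rendered from the ORIGINAL URL, tag included.
        cache[key] = render_page(url)
    return cache[key]
```

Because the first request below arrives with gclid, the cached copy carries the NOINDEX tag, and the later request for the clean start page is served that same copy:

```python
fetch("http://shop.example/?gclid=abc")
fetch("http://shop.example/")  # also contains NOINDEX_TAG
```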

Steps To Reproduce:
1. Call http://demoshop.oxid-esales.com/professional-edition/
2. Look at the source code and search for "<meta name="ROBOTS" content="NOINDEX, FOLLOW">". You will not find it.
3. Call http://demoshop.oxid-esales.com/professional-edition/?foo=bar
4. Look at the source code and search for "<meta name="ROBOTS" content="NOINDEX, FOLLOW">". You will find it.

So there is a chance that Varnish caches this page and that all search engines are then told not to index it.
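Steps 2 and 4 can be automated with a small check. This is a sketch using Python's standard html.parser and urllib.request; the helper names (RobotsMetaFinder, has_noindex, page_has_noindex) are hypothetical, and the demoshop URLs from the report may no longer be online.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag (name matched case-insensitively)."""

    def __init__(self) -> None:
        super().__init__()
        self.robots: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values may be None.
        if tag == "meta":
            a = {k: (v or "") for k, v in attrs}
            if a.get("name", "").lower() == "robots":
                self.robots.append(a.get("content", ""))


def has_noindex(html: str) -> bool:
    """True if any robots meta tag in the document contains NOINDEX."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in c.lower() for c in finder.robots)


def page_has_noindex(url: str) -> bool:
    """Fetch a URL (network access required) and run the same check on its source."""
    with urlopen(url) as resp:
        return has_noindex(resp.read().decode("utf-8", errors="replace"))
```

Calling page_has_noindex() on the URLs from steps 1 and 3 would reproduce the comparison; on a shop behind the affected cache, a clean URL returning True would indicate a poisoned entry.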
Tags: No tags attached.
Theme: Azure
Browser: All
PHP Version: any
Database Version: any

Activities

martinwegele (reporter), 2014-02-19 12:00, ~0009523

This is probably related to the behaviour of oxUBase::noIndex().

henrik.steffen (reporter), 2014-02-19 15:00, ~0009529

This does not happen for arbitrary parameters like ?foobar=1.

It only affects the parameters defined in the Varnish VCL, such as gclid, utm_source, etc., because only those are stripped from the URL before caching.
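VCL configurations often remove such parameters from req.url with a regsuball() regular expression. The following is a rough Python approximation of that idea using re.sub; the parameter list and the pattern are assumptions, not the contents of the shipped config.vcl. It illustrates the point above: only the listed names are removed, so a request with ?foobar=1 keeps its own URL and therefore its own cache entry.

```python
import re

# Hypothetical parameter list. The lookbehind requires "?" or "&" right before the name,
# so e.g. "xgclid=" is not matched, and consecutive parameters are all removed.
TRACKING_RE = re.compile(r"(?<=[?&])(?:gclid|utm_source|utm_medium|utm_campaign)=[^&]*&?")


def strip_tracking(url: str) -> str:
    """Remove listed tracking parameters, then drop a dangling '?' or '&'."""
    url = TRACKING_RE.sub("", url)
    return re.sub(r"[?&]$", "", url)
```

For example, strip_tracking("http://shop.example/?gclid=a&utm_source=b") yields "http://shop.example/", the same string as the clean start page, whereas "http://shop.example/?foobar=1" is left unchanged.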

mantas.vaitkunas (reporter), 2014-08-28 14:47, ~0010106

The eShop prints "<meta name="ROBOTS" content="NOINDEX, FOLLOW">" because search engines should not crawl the same page under two different URLs, for example http://demoshop.oxid-esales.com/professional-edition and http://demoshop.oxid-esales.com/professional-edition/?foo=bar.
Pages (for example the start page) whose URLs contain parameters like gclid, utm_source, etc. were cached under the same entry as the start page without parameters. As a result, that cache entry could contain the wrong content. To solve this, we split the single cache entry into two: the start page without parameters, and the start page with the parameter campaign=1.
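One plausible way to express this fix as a cache-key rule is sketched below in Python: campaign parameters are still removed, but their presence is recorded as campaign=1, so the start page ends up with exactly two cache entries. The parameter list and the function name are hypothetical; the actual change lives in the shop's Varnish configuration, which is not reproduced here.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical list of campaign/tracking parameters.
CAMPAIGN_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign"}


def cache_key(url: str) -> str:
    """Strip campaign parameters, but mark the key with campaign=1 if any were present."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k not in CAMPAIGN_PARAMS]
    if len(kept) != len(pairs):
        # At least one campaign parameter was removed: keep such requests apart
        # from the clean URL so a NOINDEX page cannot leak into its cache entry.
        kept.append(("campaign", "1"))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this rule, http://shop.example/ maps to itself, while any mix of gclid and utm_* parameters on the start page maps to http://shop.example/?campaign=1, so all campaign visits share one entry without touching the clean one.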