Of Typo3, Site Crawlers and Compression


A Typo3 installation I currently maintain uses the sitecrawler extension to warm up the page cache every morning, before users start visiting our site. We ran into a problem: every cached page referenced only the uncompressed CSS and JavaScript files instead of the gzipped versions.

Typo3 first generates both gzipped and uncompressed versions of the CSS and JS files. It then checks HTTP_ACCEPT_ENCODING (the client's Accept-Encoding request header) to see whether gzip is supported, and decides which of the two versions to reference in the HTML.
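To make that decision concrete, here is a rough shell sketch of the selection step. It is only an illustration under stated assumptions, not Typo3's actual PHP implementation: the function name `choose_asset` is hypothetical, and it assumes the gzipped copy sits next to the original with a `.gzip` suffix, as in the rewrite rules further down.

```shell
#!/bin/sh
# Hypothetical sketch, not Typo3's real (PHP) code: pick the asset URL to
# reference in the HTML from the client's Accept-Encoding header value.
# Assumption: the gzipped copy sits next to the original as "<file>.gzip".
choose_asset() {
    accept_encoding="$1"  # e.g. "gzip, deflate" from a browser, "" from the crawler
    asset="$2"            # e.g. "typo3temp/compressor/main.css"
    # Simple substring match; q-values such as "gzip;q=0" are ignored.
    case "$accept_encoding" in
        *gzip*) printf '%s\n' "$asset.gzip" ;;  # client can decode gzip
        *)      printf '%s\n' "$asset" ;;       # otherwise: uncompressed file
    esac
}

# A browser sends Accept-Encoding, the crawler does not:
choose_asset "gzip, deflate" "typo3temp/compressor/main.css"  # prints typo3temp/compressor/main.css.gzip
choose_asset "" "typo3temp/compressor/main.css"               # prints typo3temp/compressor/main.css
```

Whatever the crawler request produces is what ends up in the page cache, which is why the empty-header branch is the one every visitor sees in the morning.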

Since the crawler does not send an Accept-Encoding header when crawling the pages, Typo3 renders the page HTML with references to the uncompressed files. I tried adding the Accept-Encoding header to the crawler's requestUrl() function, but of course Typo3 then returned the HTML itself gzipped, which completely broke the crawler's logic …

Since Typo3 generates both versions of the files anyway, we eventually came up with these two rewrite rules:

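[code lang="shell"]
# Serve the pre-generated .gzip copies to clients that accept gzip.
# A RewriteCond only applies to the RewriteRule directly after it, hence the repetition.
RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
RewriteRule ^typo3temp/compressor/(.*)\.js$ typo3temp/compressor/$1.js.gzip?%{QUERY_STRING} [L]
RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
RewriteRule ^typo3temp/compressor/(.*)\.css$ typo3temp/compressor/$1.css.gzip?%{QUERY_STRING} [L]
[/code]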

If a page was cached without references to the gzipped versions (which is the normal state every morning after the crawler run), these rules simply rewrite every request from a gzip-capable client to the gzipped file.
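mod_rewrite rules are easy to get subtly wrong, so it can help to reason about them with a small model. The following shell function is a hypothetical, simplified emulation of the two rules above. It is not Apache and ignores any other rules or server configuration. Given a request path (without leading slash), a query string and an Accept-Encoding value, it prints the path the request would be internally rewritten to. The name `emulate_rewrite` is made up for this sketch.

```shell
#!/bin/sh
# Simplified, hypothetical model of the two RewriteRules above.
# $1: request path without leading slash, $2: query string,
# $3: value of the Accept-Encoding request header.
emulate_rewrite() {
    path="$1"
    query="$2"
    accept_encoding="$3"

    # RewriteCond %{HTTP:Accept-Encoding} .*gzip.*
    case "$accept_encoding" in
        *gzip*) ;;                               # condition matches, go on
        *) printf '%s\n' "$path"; return ;;      # condition fails: no rewrite
    esac

    # RewriteRule ^typo3temp/compressor/(.*)\.(js|css)$ -> same path + ".gzip"
    case "$path" in
        typo3temp/compressor/*.js|typo3temp/compressor/*.css)
            if [ -n "$query" ]; then
                printf '%s\n' "$path.gzip?$query"  # ?%{QUERY_STRING} keeps the query
            else
                printf '%s\n' "$path.gzip"
            fi
            ;;
        *)
            printf '%s\n' "$path" ;;             # other files are left alone
    esac
}

emulate_rewrite "typo3temp/compressor/main.js" "1700000000" "gzip, deflate"
# prints typo3temp/compressor/main.js.gzip?1700000000
emulate_rewrite "typo3temp/compressor/main.css" "" ""
# prints typo3temp/compressor/main.css
```

The rewritten path ends in `.gzip`, so neither rule matches again when mod_rewrite re-processes the request in `.htaccess` context, which is why the rules do not loop. The model says nothing about response headers: for browsers to decode the file, the `.gzip` copies still have to be served with `Content-Encoding: gzip` and the proper CSS or JavaScript content type.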
