The crawler should open pages with the same setup in order to get complete results. In my case an authentication cookie is needed to open the pages properly and see their full content, including the crawlable links.
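As a rough illustration of what "the same setup" means here, a minimal sketch, assuming the crawler plugin is built on simplecrawler and that its customHeaders property is the right place to hang a cookie on every request (both assumptions on my part); the crawler.cookie option name is just a placeholder:

```js
// Sketch only: a hypothetical crawler.cookie option forwarded from the CLI,
// applied to every request the crawler makes via simplecrawler's customHeaders.
const Crawler = require('simplecrawler');

const options = { crawler: { cookie: 'auth=secret-token' } }; // example options object

const crawler = new Crawler('https://example.com/start/');

if (options.crawler.cookie) {
  // With the cookie set, the crawler sees the same authenticated content
  // as a logged-in user, including the links it should follow.
  crawler.customHeaders = { Cookie: options.crawler.cookie };
}

crawler.on('fetchcomplete', queueItem => console.log('Discovered', queueItem.url));
crawler.start();
```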
* Add option for crawler to ignore robots.txt
For example, we have an internal test site (a sort of showcase of all our modules) that has a noFollow rule on all its pages, and with that the crawler refuses to discover any pages. However, there is an option in the crawler to ignore robots.txt; this is basically my attempt at passing that option through. I currently have this running as a patched version on our site.
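For reference, a minimal sketch of what "passing that option through" could look like, again assuming simplecrawler underneath; the --crawler.ignoreRobotsTxt flag name is my placeholder for whatever the final option ends up being called:

```js
// Sketch: forward a hypothetical --crawler.ignoreRobotsTxt flag to the crawler.
const Crawler = require('simplecrawler');

const options = { crawler: { ignoreRobotsTxt: true } }; // example options object

const crawler = new Crawler('https://internal-test-site.example/');

// simplecrawler respects robots.txt by default; turning respectRobotsTxt off
// lets it discover pages on sites (like our showcase site) that disallow crawling.
if (options.crawler.ignoreRobotsTxt) {
  crawler.respectRobotsTxt = false;
}

crawler.start();
```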
Lay the foundation for grouping data from multiple URLs: tag all messages originating from a single URL (browsertime.pageSummary, coach.pageSummary, etc.) with a group. Aggregations based on group will be a breaking change, so they will follow in a later changeset.
URLs passed directly on the command line will be tagged with a group based on their domain; when URLs are passed via a text file, the group will be generated from the file name.
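To make the grouping concrete, a small sketch of how a group could be derived and attached to messages; the helper names and the message shape are illustrative only, not the actual message maker:

```js
const path = require('path');
const urlParser = require('url');

// Illustrative helpers (names are my own): a group comes either from the
// domain of a URL given on the command line, or from the name of the text
// file the URLs were read from.
function groupFromUrl(url) {
  return urlParser.parse(url).hostname;
}

function groupFromFile(fileName) {
  return path.basename(fileName, path.extname(fileName));
}

// Every message originating from a single URL carries its group, so that
// browsertime.pageSummary, coach.pageSummary etc. can later be aggregated
// per group instead of only per URL.
function make(type, data, url, group) {
  return { type, data, url, group };
}

make('browsertime.pageSummary', {}, 'https://www.sitespeed.io/', groupFromUrl('https://www.sitespeed.io/'));
// group: 'www.sitespeed.io'

make('coach.pageSummary', {}, 'https://example.com/checkout/', groupFromFile('urls/checkout-flow.txt'));
// group: 'checkout-flow'
```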