Commit Graph

32 Commits

Peter Hedenskog b9456eef6e
Replace intel with sitespeed.io/log (#4381)
2025-01-07 08:53:48 +01:00
Peter Hedenskog 8890a9b256
Update latest eslint and dependencies (#4345) 2024-12-22 15:20:16 +01:00
Peter Hedenskog 3741366d45
Upgrade to eslint/unicorn 54 (#4213) 2024-07-08 08:19:41 +02:00
Peter Hedenskog f85e54941b
Fix broken crawler (#3820) 2023-04-23 05:56:46 +02:00
Peter Hedenskog 631271126f
New plugins structure and esmodule (#3769)
2023-02-25 11:16:58 +01:00
Peter Hedenskog f46a366752
If you set a user agent for Browsertime, also use it for the crawler (#3652) 2022-05-17 05:12:54 +02:00
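Reusing the Browsertime user agent for the crawl is a one-liner in simplecrawler terms. A minimal sketch, with a hypothetical options shape standing in for the real configuration:

```javascript
import Crawler from 'simplecrawler';

// Hypothetical options shape; sitespeed.io's real flag names may differ.
const options = { browser: { userAgent: 'MyAgent/1.0' } };

const crawler = new Crawler('https://example.com/');
if (options.browser.userAgent) {
  // simplecrawler sends this User-Agent header on every request.
  crawler.userAgent = options.browser.userAgent;
}
crawler.start();
```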
Peter Hedenskog 426fb42bca
Tune the cookie handling to handle = in the cookie (#3473)
* fix path
2021-10-08 18:43:36 +02:00
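A minimal sketch of the idea behind the fix above (not the actual sitespeed.io code): split the name=value pair only on the first '=' so the rest of the value survives.

```javascript
// Illustrative sketch: parse a name=value cookie string, splitting only
// on the first '=' so values containing '=' (e.g. base64 tokens) stay intact.
function parseCookie(cookie) {
  const index = cookie.indexOf('=');
  return {
    name: cookie.slice(0, index),
    value: cookie.slice(index + 1) // may itself contain '='
  };
}

console.log(parseCookie('token=YWJjZGU=')); // { name: 'token', value: 'YWJjZGU=' }
```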
dammg ad44d6290d
Allow crawler to also send the configured cookies (#3472)
The crawler should open pages with the same setup in order to get full results. In my case an authentication cookie is needed to open the page properly and see its full content (including crawlable links).
2021-10-07 20:19:00 +02:00
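A rough sketch of forwarding configured cookies to the crawler, assuming simplecrawler's cookie-jar API; the cookie list and its shape are illustrative:

```javascript
import Crawler from 'simplecrawler';

// Hypothetical cookie list, e.g. parsed from a --cookie option.
const cookies = [{ name: 'session', value: 'abc123' }];

const crawler = new Crawler('https://example.com/');
for (const cookie of cookies) {
  // simplecrawler keeps a cookie jar on the instance; name/value is
  // enough for a session-style cookie.
  crawler.cookies.add(cookie.name, cookie.value);
}
crawler.start();
```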
dammg 094f9fda56
Add option for crawler to ignore robots.txt (#3454)
For example, we have an internal test site (a sort of showcase of all our modules) that has a noFollow rule on all its pages. Because of that, the crawler refuses to discover any pages. However, there is an option in the crawler to ignore robots.txt. This is basically my attempt at passing that option through; I currently have it running as a patched version on our site.
2021-09-03 21:16:30 +02:00
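The pass-through described above maps naturally onto simplecrawler's respectRobotsTxt flag. A minimal sketch, with an illustrative option name:

```javascript
import Crawler from 'simplecrawler';

const crawler = new Crawler('https://example.com/');
// simplecrawler respects robots.txt by default; the new option just
// flips this flag (the option name below is illustrative).
const ignoreRobotsTxt = true;
crawler.respectRobotsTxt = !ignoreRobotsTxt;
crawler.start();
```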
Peter Hedenskog caddb34d65
Verify that depth is set when you crawl #2806 (#2807) 2019-11-29 10:10:03 +01:00
Samuli Reijonen b97dce509e Add --crawler.include (#2763) 2019-11-09 21:55:03 +01:00
Ferdinand Holzer 3c5ccc338c Add support for crawler exclude patterns (#2319)
* Add support for excluding patterns from crawling. Resolves #1929

* Make eslint happy, fix error handling issue
2019-02-17 17:38:33 +01:00
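A sketch of exclude patterns built on simplecrawler's addFetchCondition; the patterns themselves are just examples:

```javascript
import Crawler from 'simplecrawler';

// Example exclude patterns; any URL matching one of them is skipped.
const exclude = [/\/logout/i, /\.pdf$/i];

const crawler = new Crawler('https://example.com/');
// addFetchCondition: return false to keep a discovered URL out of the queue.
crawler.addFetchCondition(queueItem => !exclude.some(re => re.test(queueItem.url)));
crawler.start();
```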
Peter Hedenskog 7cc5562204
Remove Bluebird promises and use async/await where we can. (#2205) 2018-11-20 09:14:05 +01:00
Peter Hedenskog da98a06cb6
first go at basic auth for crawl (#1845) 2017-12-05 08:59:11 +01:00
Peter Hedenskog e81be5d689
Feed plugins with messageMaker (#1760) 2017-10-29 09:22:27 +01:00
Tobias Lidskog 3debfec0b4 Format code using the Prettier formatter. (#1677) 2017-07-20 21:24:12 +02:00
soulgalore e5db4be248 info log crawler setup and when we stop 2017-04-12 12:49:43 +02:00
Peter Hedenskog 1e528f65fd set sitespeedio as root name of all loggers (#1545) 2017-03-23 12:21:11 +01:00
Peter Hedenskog e46a7026eb Add log channel names per plugin, thank you @jpvincent (#1544) 2017-03-23 08:57:03 +01:00
Tobias Lidskog 720d3b93c2 Set plugin name by default when loading it 2017-03-13 17:40:29 +01:00
Tobias Lidskog 47dce74074 Upgrade simplecrawler to 1.0.1. 2016-08-27 17:09:23 +02:00
Tobias Lidskog fae6b8ba3d Tag messages with group, based on url or filename. (#1157)
Lay the foundation for grouping data from multiple URLs. Tag all messages originating from a single URL (browsertime.pageSummary, coach.pageSummary, etc.) with a group. Aggregations based on group will be a breaking change, so that will follow in a later changeset.

URLs passed directly on the command line will be tagged with a group based on the domain. When passing URLs via text files, the group will be generated from the file name.
2016-08-25 09:26:26 +02:00
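The grouping rule above can be sketched like this (function names are illustrative, not the actual sitespeed.io helpers):

```javascript
import path from 'node:path';

// URLs given directly on the command line: group by domain.
function groupForUrl(url) {
  return new URL(url).hostname; // e.g. 'www.sitespeed.io'
}

// URLs read from a text file: group by the file name.
function groupForFile(filename) {
  return path.basename(filename); // e.g. 'urls.txt'
}

console.log(groupForUrl('https://www.sitespeed.io/documentation/'));
console.log(groupForFile('/tmp/urls.txt'));
```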
Tobias Lidskog 1779d75693 Fix spinning crawl when using maxPages.
Turns out the 'complete' event wasn't being sent when the parser was explicitly stopped.
2016-05-13 21:55:51 +02:00
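A sketch of the failure mode and the fix (illustrative, not the actual patch): when maxPages is reached and the crawler is stopped explicitly, no 'complete' event arrives, so whoever awaits the crawl spins forever unless the event is raised manually.

```javascript
import Crawler from 'simplecrawler';

const maxPages = 10; // hypothetical value of crawler.maxPages
let fetched = 0;

const crawler = new Crawler('https://example.com/');
crawler.on('fetchcomplete', () => {
  if (++fetched >= maxPages) {
    crawler.stop();
    // An explicit stop() suppresses the normal 'complete' event, so
    // emit it ourselves to unblock listeners waiting on the crawl.
    crawler.emit('complete');
  }
});
crawler.on('complete', () => console.log('crawl finished'));
crawler.start();
```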
Tobias Lidskog 315ae102e1 Implement crawler.maxPages to limit pages in crawl 2016-05-13 18:16:35 +02:00
soulgalore 616dbab278 skip links in HTML comments #896 2016-05-13 08:09:12 +02:00
Tobias Lidskog 06e9933db4 Rename crawler.maxDepth to crawler.depth. 2016-05-10 22:06:29 +02:00
Tobias Lidskog dad7546e95 Filter out non-html pages from crawler. 2016-05-10 22:06:29 +02:00
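One way to sketch that filter, assuming simplecrawler's fetchcomplete event and its stateData.contentType field:

```javascript
import Crawler from 'simplecrawler';

const crawler = new Crawler('https://example.com/');
crawler.on('fetchcomplete', queueItem => {
  // Only pass HTML pages on for testing; images, scripts, PDFs etc. are dropped.
  const contentType = queueItem.stateData.contentType || '';
  if (contentType.includes('text/html')) {
    console.log('page to test:', queueItem.url);
  }
});
crawler.start();
```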
Tobias Lidskog e52b4a8503 Make url crawl much more functional. 2016-05-08 07:32:53 +02:00
Tobias Lidskog 551ef59297 Skip crawl if depth is 0 or 1. 2016-05-08 07:32:53 +02:00
soulgalore 9842b57831 debug log each URL the crawler finds 2016-04-25 14:51:45 +02:00
soulgalore 92503dc909 more logs 2016-04-14 11:47:58 +02:00
Tobias Lidskog c840d65c55 Initial draft of node based crawler. 2016-03-23 00:39:46 +01:00