# AiCrawler
Leverage AI design patterns by using heuristics with the Symfony DomCrawler.
Please crawl on over to the docs, which are also available as a GitBook.
The AiCrawler package is responsible for making boolean assertions on a node in the HTML DOM. It comes with a straightforward data point trait that records the results of your heuristics (rules) for a given "item" or context.
$ composer require dan/aicrawler dev-master
$crawler = new AiCrawler('<html>...</html>');
$node = $crawler->filter('div[id="content-start"]');
$args = ['words' => 15];
// Does the content have at least 15 words?
$assertion = Heuristics::words($node, $args); // true / false
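To make the idea of a boolean word-count assertion concrete, here is a minimal sketch in plain PHP that uses only the built-in DOM extension. It is not the library's implementation: `hasAtLeastWords` is a hypothetical helper, and counting whitespace-separated tokens is an assumption about what "words" means.

```php
<?php

/**
 * Hypothetical helper (not part of AiCrawler): true when the text holds
 * at least $minimum whitespace-separated words.
 */
function hasAtLeastWords(string $text, int $minimum): bool
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    return count($words) >= $minimum;
}

// Load a small sample document with PHP's built-in DOM extension.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="content-start">Just a few words here.</div></body></html>');

// Select the same kind of node the example above filters for.
$xpath = new DOMXPath($dom);
$node = $xpath->query('//div[@id="content-start"]')->item(0);

// The sample div holds five words, so a 15-word threshold is not met.
$enough = hasAtLeastWords($node->textContent, 15); // false
$some = hasAtLeastWords($node->textContent, 5);    // true
```

Keeping the rule a pure function of text and threshold makes it easy to unit test without a crawler instance.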
$crawler = new AiCrawler("<html>...</html>");
$args = [
    'elements' => [
        "elements" => "/p/ /blockquote/ /(u|o)l/ /h[1-6]/",
        "regex" => true,
        'words' => [
            'words' => 15,
            'descendants' => true,
            'words2' => [
                'words' => "/(cod(ing|ed|e)|program|language|php)/",
                'regex' => true,
                'descendants' => true
            ]
        ]
    ],
    'matches' => 3
];
/**
 * Are at least 3 of this div's children p, blockquote, ul, ol, or h1-h6
 * elements that contain at least 15 words (including text from the child's
 * descendants) AND match words such as coding, coded, code, program,
 * language, or php (including text from the child's descendants)?
 */
$crawler->filter("div")->each(function(&$node) use ($args) {
    if (Heuristics::children($node, $args)) {
        $node->setDataPoint("example", "words", 1);
    }
});
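The nested rule above can be read as "count the children that pass every sub-rule, then compare the count with a threshold". The sketch below spells that logic out in plain PHP with the built-in DOM extension. It is only an illustration of the idea, not how `Heuristics::children` is implemented; `childrenMatch` and its parameters are hypothetical names.

```php
<?php

/**
 * Hypothetical helper (not AiCrawler's implementation): true when at least
 * $minMatches child elements are p, blockquote, ul, ol, or h1-h6, contain at
 * least $minWords words, and match $keywordPattern.
 */
function childrenMatch(DOMElement $parent, int $minWords, string $keywordPattern, int $minMatches): bool
{
    $matches = 0;

    foreach ($parent->childNodes as $child) {
        // Skip text nodes and comments; only elements can match.
        if (!$child instanceof DOMElement) {
            continue;
        }

        // Sub-rule 1: the element name must be one of the allowed tags.
        if (!preg_match('/^(p|blockquote|[uo]l|h[1-6])$/i', $child->nodeName)) {
            continue;
        }

        // textContent includes text from the child's descendants.
        $text = trim($child->textContent);
        $wordCount = count(preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY));

        // Sub-rules 2 and 3: enough words AND at least one keyword.
        if ($wordCount >= $minWords && preg_match($keywordPattern, $text)) {
            $matches++;
        }
    }

    return $matches >= $minMatches;
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<div id="post">'
    . '<p>I like coding in php every day</p>'
    . '<h2>My favorite language is php indeed</h2>'
    . '<span>code</span>'
    . '</div>'
);
$div = $dom->getElementsByTagName('div')->item(0);
$pattern = '/(cod(ing|ed|e)|program|language|php)/i';

// The p and h2 both pass; the span is not an allowed tag.
$twoMatch = childrenMatch($div, 5, $pattern, 2);   // true
$threeMatch = childrenMatch($div, 5, $pattern, 3); // false
```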
Sound interesting? Read on about the Heuristics class, or go right to a similar example with complete notes.
- A Heuristics class with some cool rules to get you started.
- A Scorable trait on our AiCrawler class, so there is a pattern for data points.
- An Extra trait on our AiCrawler class, so there is a pattern for storing extra data.
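As a rough picture of what a data point pattern can look like, the sketch below stores values per "item" and per data point name. It is not the Scorable trait itself: `RecordsDataPoints`, `getDataPoint`, `total`, and `ExampleNode` are hypothetical names, and only the `setDataPoint($item, $name, $value)` call shape is taken from the example in this README.

```php
<?php

/**
 * Hypothetical trait (not the library's Scorable trait): records heuristic
 * results grouped by item, then by data point name.
 */
trait RecordsDataPoints
{
    /** @var array<string, array<string, int|float>> */
    private $dataPoints = [];

    public function setDataPoint(string $item, string $name, $value)
    {
        $this->dataPoints[$item][$name] = $value;
    }

    public function getDataPoint(string $item, string $name, $default = null)
    {
        return $this->dataPoints[$item][$name] ?? $default;
    }

    // Sum every data point recorded for one item.
    public function total(string $item)
    {
        return array_sum($this->dataPoints[$item] ?? []);
    }
}

// Hypothetical class standing in for a crawler node.
final class ExampleNode
{
    use RecordsDataPoints;
}

$node = new ExampleNode();
$node->setDataPoint('example', 'words', 1);
$node->setDataPoint('example', 'headings', 2);

$score = $node->total('example'); // 3
```

Grouping by item first lets several independent contexts (for example "headline" and "content") score the same node without clobbering each other.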
- Search GitHub
- Finish related projects. See AiResponders, AiScrapers, and Larascrape.
- Fork this project on GitHub.
- Existing unit tests must pass.
- Contributions must be unit tested.
- New heuristics should be portable (have few or no dependencies).
- New heuristics should have helpful doc blocks.
- Submit a pull request.
- See guide on extending Heuristics for special heuristics.
- Follow PSR-2.
- Add PHPDoc blocks for all classes, methods, and functions.
- Omit the @return tag if the method does not return anything.
- Add a blank line before @param, @return, or @throws.
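The style rules above might look like this in practice. This is one reading of them, shown on a hypothetical `ExampleCollector` class that is not part of the package: `add()` returns nothing, so its doc block has no @return tag, and each tag group is preceded by a blank comment line.

```php
<?php

/**
 * Keeps a running list of positive integers.
 */
class ExampleCollector
{
    /**
     * @var int[]
     */
    private $values = [];

    /**
     * Add a value to the collection.
     *
     * @param int $value
     *
     * @throws InvalidArgumentException
     */
    public function add($value)
    {
        if (!is_int($value) || $value < 1) {
            throw new InvalidArgumentException('Expected a positive integer.');
        }

        $this->values[] = $value;
    }

    /**
     * Sum the collected values.
     *
     * @return int
     */
    public function sum()
    {
        return array_sum($this->values);
    }
}
```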
Please report any issues here.
AiCrawler is free software distributed under the terms of the MIT license.