New collection node: Step 1, set the basic information and the list page URL rules
Node basic information
Node name:
Target page encoding: GB2312 / UTF8 / BIG5
Region matching mode: Regular expression / String
Content import order: Same as the target site / Reverse of the target site
The following options are needed only when anti-hotlinking mode is enabled. If the target site has no anti-hotlinking protection, leave the mode off; enabling it slows down collection.
Anti-hotlinking mode: Off / On
Resource download timeout: seconds
Referer URL: (the URL of any article page on the target site)
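To illustrate why a reference URL and a timeout matter for hotlink-protected resources, here is a minimal Python sketch. It assumes the target site checks the HTTP Referer header, which is a common anti-hotlinking technique but is not confirmed by this form. The function names and example URLs are hypothetical, and this is not DedeCMS code.

```python
import urllib.request


def build_request(resource_url, referer_url):
    """Build a GET request that carries a Referer header (assumed check)."""
    return urllib.request.Request(resource_url, headers={"Referer": referer_url})


def download(resource_url, referer_url, timeout_seconds=10):
    """Download a resource, giving up after timeout_seconds."""
    req = build_request(resource_url, referer_url)
    with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
        return resp.read()


# Hypothetical URLs: an image plus one article page from the same site.
req = build_request("http://example.com/uploads/a.jpg",
                    "http://example.com/html/test/1.html")
```

Each resource needs its own request with the extra header, which is one plausible reason the mode slows collection down.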
List URL rules
Source type: Batch-generate list URLs / Manually specify list URLs / Get from RSS
Batch URL generation settings:
URL pattern:
(For example: http://www.dedecms.com/html/test/list_(*).html. If the pattern cannot match every list URL, add the remaining URLs in the manual URL field.)
(*) from to (page number or sequence number)
Increment per page:
Enable multi-column mode (#)
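As a rough illustration of how the batch settings above could expand into list URLs, here is a minimal Python sketch. The function `expand_list_urls` and its parameters are hypothetical, not DedeCMS code. It assumes (*) is replaced by each number from the start value to the end value, stepping by the per-page increment. The multi-column (#) option is not covered.

```python
def expand_list_urls(pattern, start, end, step=1):
    """Replace the (*) placeholder with each number from start to end (inclusive)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if "(*)" not in pattern:
        # No placeholder: treat the pattern as a single fixed list URL.
        return [pattern]
    return [pattern.replace("(*)", str(n)) for n in range(start, end + 1, step)]


urls = expand_list_urls("http://www.dedecms.com/html/test/list_(*).html", 1, 3)
for url in urls:
    print(url)
```

An increment greater than 1 would suit sites whose list pages are numbered by record offset (0, 10, 20, ...) rather than by page number.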
Manual URLs:
List URLs that the batch rule does not match can be entered here.
Article URL matching rules
Article URL matching mode: Specify the region that contains the article URLs (can capture the URL, title, image, etc.) / Specify a URL regular expression (captures the URL only)
Region containing the article URLs:
HTML at the start of the region:
HTML at the end of the region:
If the link contains an image: Ignore / Collect as thumbnail
Filter the URLs in the region again:
(using regular expressions)
Must contain: (takes priority over the next rule)
Must not contain:
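The region-based matching and filtering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DedeCMS code: the function `extract_article_urls`, its parameters, and the sample HTML are hypothetical. It assumes the region is the text between the first occurrence of the start HTML and the next occurrence of the end HTML, and that "priority" means the must-contain check runs before the must-not-contain check.

```python
import re
from urllib.parse import urljoin


def extract_article_urls(html, base_url, region_start, region_end,
                         must_include=None, cannot_include=None):
    """Collect links found between region_start and region_end, then filter them."""
    start = html.find(region_start)
    if start == -1:
        return []
    start += len(region_start)
    end = html.find(region_end, start)
    if end == -1:
        return []
    region = html[start:end]

    urls = []
    hrefs = re.findall(r'<a\s[^>]*?href=["\']([^"\']+)["\']', region,
                       flags=re.IGNORECASE)
    for href in hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        # Assumed order: the must-contain pattern is checked first.
        if must_include and not re.search(must_include, url):
            continue
        if cannot_include and re.search(cannot_include, url):
            continue
        if url not in urls:  # drop duplicates, keep page order
            urls.append(url)
    return urls


sample_html = (
    '<div>nav <a href="/about.html">About</a></div>'
    '<ul class="list">'
    '<li><a href="/html/test/1.html">One</a></li>'
    '<li><a href="/html/test/ad_2.html">Ad</a></li>'
    '<li><a href="/other/3.html">Three</a></li>'
    '</ul><div>footer</div>'
)
found = extract_article_urls(sample_html, "http://www.example.com/",
                             '<ul class="list">', '</ul>',
                             must_include=r"/html/test/",
                             cannot_include=r"ad_")
print(found)
```

Restricting the search to a region keeps navigation and footer links out of the result before any regular expression filter is applied.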