Edit the collection node
URL index
Content configuration
Basic node information
Node name:
Target page encoding:
GB2312
UTF8
BIG5
Region matching mode:
Regular expression
String
Content import order:
Same as the target site
Reverse of the target site
Set the following options only if you enable anti-hotlinking mode. If the target site has no anti-hotlinking protection, leave the mode off; enabling it slows down collection.
Anti-hotlinking mode:
Off
On
Resource download timeout (seconds):
Reference URL:
(The URL of one of the article pages on the target site)
List URL rules
Source type:
Generate list URLs in batch
Specify list URLs manually
Get list URLs from RSS
RSS URL:
Batch URL generation settings:
URL pattern:
(For example, http://www.dedecms.com/html/test/list_(*).html; if the pattern cannot cover every list URL, enter the remaining URLs in the manual URL field.)
(*) from
to
(enter page numbers or a regular increment) Increment per page:
Enable multi-column distribution (#)
Manual URLs:
After setting the distribution rules, you can list here any URLs that the rules do not match.
Multi-column distribution rules:
If the target site uses a single template for all of its columns, put "(#)" in the URL pattern to stand for the part of the URL that differs between columns. Then define each column in the distribution rules, where you can also specify the export column.
Sample format: [(#)=>labs/list_3; (*)=>1-25; typeid=>7] URL pattern: http://www.aaa.com/(#)_(*).html
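As a rough illustration of how the (*) and (#) placeholders expand, here is a minimal Python sketch. It is not DedeCMS's own code: the function names are hypothetical, and it assumes that the page range is inclusive, that fields in a rule are separated by ";", and that typeid is the numeric id of the export column.

```python
def expand_pages(pattern, first, last, step=1):
    """Substitute each page number in [first, last] for the (*) placeholder."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [pattern.replace("(*)", str(n)) for n in range(first, last + 1, step)]


def parse_rule(rule):
    """Turn '[key=>value; key=>value]' into a dict of stripped strings."""
    fields = {}
    for part in rule.strip().strip("[]").split(";"):
        if "=>" in part:
            key, value = part.split("=>", 1)
            fields[key.strip()] = value.strip()
    return fields


def expand_distribution_rule(pattern, rule):
    """Return (export column id, list URLs) for one distribution rule."""
    fields = parse_rule(rule)
    first, last = (int(x) for x in fields["(*)"].split("-"))
    # Fill in the column-specific part first, then expand the page numbers.
    column_pattern = pattern.replace("(#)", fields["(#)"])
    return int(fields["typeid"]), expand_pages(column_pattern, first, last)


# Batch generation with the example pattern from the form
print(expand_pages("http://www.dedecms.com/html/test/list_(*).html", 1, 3))

# Multi-column distribution with the sample rule
typeid, urls = expand_distribution_rule(
    "http://www.aaa.com/(#)_(*).html",
    "[(#)=>labs/list_3; (*)=>1-25; typeid=>7]",
)
print(typeid, urls[0], urls[-1])
```

Under these assumptions the sample rule produces 25 list URLs for export column 7, all sharing the labs/list_3 prefix.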
Article URL matching rules
Content URL matching mode:
Specify the region that contains the article URLs (can capture the URL, title, image, and so on)
Specify a URL regular expression (captures only the URL)
URL regular expression:
Settings for the region that contains the article URLs:
HTML at the start of the region:
HTML at the end of the region:
If a link contains an image:
Ignore it
Collect it as the thumbnail
Filter the URLs in the region again:
(uses regular expressions)
Must contain:
(takes priority over the following rule)
Cannot contain:
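One way to picture the region mode together with its two filters is the Python sketch below. It is for illustration only and is not DedeCMS's own code: the function names, the sample page, and the href-based link extraction are assumptions, and "takes priority" is interpreted here as "the must-contain pattern is checked first".

```python
import re


def urls_in_region(html, start_html, end_html):
    """Return the href values found between the start and end markers."""
    start = html.find(start_html)
    if start == -1:
        return []
    start += len(start_html)
    end = html.find(end_html, start)
    if end == -1:
        return []
    region = html[start:end]
    return re.findall(r'href=["\']([^"\']+)["\']', region)


def filter_urls(urls, must_contain="", cannot_contain=""):
    """Keep URLs that match must_contain and do not match cannot_contain."""
    kept = []
    for url in urls:
        # Checked first: a URL that fails this test is dropped immediately.
        if must_contain and not re.search(must_contain, url):
            continue
        if cannot_contain and re.search(cannot_contain, url):
            continue
        kept.append(url)
    return kept


sample_page = (
    '<div id="nav"><a href="/about.html">About</a></div>'
    '<ul class="list"><li><a href="/html/a/1.html">One</a></li>'
    '<li><a href="/html/a/print_2.html">Two</a></li>'
    '<li><a href="/tags/x.html">X</a></li></ul>'
)
candidates = urls_in_region(sample_page, '<ul class="list">', "</ul>")
print(filter_urls(candidates, must_contain=r"/html/a/", cannot_contain=r"print_"))
```

Leaving either pattern empty skips that check, which mirrors leaving the corresponding form field blank.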
Web page content collection rules
1. Match rules: a region match rule generally takes the form "unique starting HTML[content]unique ending HTML" (a plain string match, not a regular expression).
2. Field value: if no region match rule is specified for a field, this value is used as the default.
3. Filtering rules: if there are several rules, write them one after another:
{dede:trim replace=""}Rule 1{/dede:trim}
{dede:trim replace=""}Rule 2{/dede:trim}
...
To replace the matched text with a specific value instead of removing it, set that value in replace="".
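The interplay of a match rule and its filtering rules can be sketched in Python as follows. This is an illustration under stated assumptions, not DedeCMS's implementation: the function names are hypothetical, the "[content]" placeholder is taken from the text above, and each {dede:trim} body is assumed to be a regular expression applied in order.

```python
import re

# Matches {dede:trim replace="..."}pattern{/dede:trim}; the replace attribute is optional.
TRIM_TAG = re.compile(
    r'\{dede:trim(?:\s+replace=(["\'])(.*?)\1)?\s*\}(.*?)\{/dede:trim\}',
    re.S,
)


def extract_by_match_rule(html, rule, placeholder="[content]"):
    """Return the text between the rule's start and end HTML (plain string match)."""
    start_html, end_html = rule.split(placeholder, 1)
    start = html.find(start_html)
    if start == -1:
        return None
    start += len(start_html)
    end = html.find(end_html, start)
    if end == -1:
        return None
    return html[start:end]


def apply_trim_rules(text, rules):
    """Apply each trim rule in order, substituting its replace value (default: empty)."""
    for tag in TRIM_TAG.finditer(rules):
        replacement = tag.group(2) or ""
        pattern = tag.group(3).strip()
        # A function replacement keeps the replace value literal (no backreference parsing).
        text = re.sub(pattern, lambda _m, r=replacement: r, text, flags=re.S | re.I)
    return text


page = (
    '<h1>Title</h1><div class="body"><p>Hello <a href="/x">world</a></p>'
    '<script>ad()</script></div><div id="foot">'
)
match_rule = '<div class="body">[content]</div><div id="foot">'
trim_rules = (
    '{dede:trim replace=""}<script.*?</script>{/dede:trim}\n'
    '{dede:trim replace="[link]"}<a [^>]*>.*?</a>{/dede:trim}'
)
body = extract_by_match_rule(page, match_rule)
print(apply_trim_rules(body, trim_rules))
```

The start and end HTML should each occur only once on the page; otherwise the plain string search stops at the first occurrence, which may not be the intended one.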
Preview URL:
Region match rule for content pagination navigation:
Fully listed page list
Previous/next links or an incomplete page list
Custom page list rule, start:
End:
If you set a custom page list rule, you can use a URL rule (a regular expression) in which {p} is the increment variable, starting from 1 each time, for example: {path}{file}_{p}{ext}
Rule description:
{path} Address + directory
{file} File name
{ext} File extension
{p} Page number in the pagination list
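To make the placeholders concrete, the Python sketch below fills a page URL rule from an article URL. It is a hypothetical illustration, not the collector's code: it assumes {path} ends with a trailing slash, {ext} includes the leading dot, and the number of pages is supplied by the caller rather than discovered from the page.

```python
import posixpath


def paging_urls(article_url, rule, pages):
    """Fill {path}, {file}, {ext} from article_url and {p} with 1..pages."""
    directory, filename = article_url.rsplit("/", 1)
    stem, ext = posixpath.splitext(filename)
    parts = {"{path}": directory + "/", "{file}": stem, "{ext}": ext}
    urls = []
    for p in range(1, pages + 1):  # {p} starts at 1
        url = rule.replace("{p}", str(p))
        for placeholder, value in parts.items():
            url = url.replace(placeholder, value)
        urls.append(url)
    return urls


print(paging_urls("http://www.aaa.com/news/123.html", "{path}{file}_{p}{ext}", 3))
```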
The following are the fixed collection items:
(Items can be expanded or hidden. The content summary, keywords, and thumbnail are matched automatically by the system using regular expressions.)
Keyword filtering content:
Filter content:
Article title
Match rule:
Filtering rules:
Author
Match rule:
Filtering rules:
Article source
Match rule:
Filtering rules:
Release time
Match rule:
Filtering rules:
The following are the collection items defined by the content model:
GetOne("Select * From `#@__channeltype` where id='{$channelid}' ");
$dtp = new DedeTagParse();
$dtp->SetNameSpace('field','<','>');
$dtp->LoadString($row['fieldset']);
foreach($dtp->CTags as $ctag)
{
    // Skip fields that are disabled for collection
    $notsend = $ctag->GetAtt('notsend');
    if($notsend==1) continue;
    $fieldtype = $ctag->GetAtt('type');
    $tname = $ctag->GetTagName();
    $iname = $ctag->GetAtt('itemname');
    if(isset($notes[$tname]['item']))
    {
        // Reuse the values already saved for this field
        $tvalue = $notes[$tname]['item']->GetAtt('value');
        $tisunit = $notes[$tname]['item']->GetAtt('isunit');
        $tisdown = $notes[$tname]['item']->GetAtt('isdown');
        $tmatch = $notes[$tname]['match'];
        $ttrim = $notes[$tname]['trim'];
        $tfunction = $notes[$tname]['function'];
    }
    else
    {
        $tvalue = $tisunit = $tisdown = $tmatch = $ttrim = $tfunction = '';
    }
?>
Field value:
Match rule:
Paged content field (only one field of this type is allowed in the rules)
Download the multimedia resources in the field
Filtering rules:
Custom processing interface:
Variables available to the function or program:
@body is the original web page; @litpic is the thumbnail
@me is the current tag value and also holds the final result