WEBNEW(1) WEBNEW(1)
NAME
webnew - Retrieve modification times of HTTP documents
SYNOPSIS
webnew [-PRVadinrvx] [-A username:password] [-c type]
[-e email] [-t title] URL
DESCRIPTION
webnew produces a listing of URLs (web documents) sorted
by the last modification time as reported by the HTTP
server. By default it writes an HTML 2.0 document to
standard output. The URL on the command line is used as a
starting point.
By default the URLs to include in the listing are extracted
from the document specified by the URL. For a recursive
search of URLs to include, see the -R and -r options.
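The ordering described above can be illustrated with a short Python sketch. This is not webnew's own code: the function name sort_by_last_modified and the sample data are hypothetical, and the newest-first direction is an assumption, since this page does not state the sort order. It assumes the server reports Last-Modified in the usual HTTP date format.

```python
from email.utils import parsedate_to_datetime


def sort_by_last_modified(entries):
    """Sort (url, last_modified_header) pairs, newest first.

    Hypothetical helper: entries with a missing or unparseable
    Last-Modified value are left out of the result.
    """
    dated = []
    for url, header in entries:
        if not header:
            continue  # the server reported no modification time
        try:
            when = parsedate_to_datetime(header)
        except (TypeError, ValueError):
            continue  # not a valid HTTP date
        dated.append((when, url))
    # Assumption: most recently modified documents come first.
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in dated]
```

Documents dropped here for lack of a date correspond roughly to the ones the -n option reports.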
OPTIONS
-A Use the provided username and password with HTTP basic
authentication. This is only needed for password-
protected documents.
-P Do not use proxies to access the documents. By
default, proxy definitions are taken from the standard
environment variables.
-R Become a "robot" and turn on -r. To restrict the
retrieval of documents, you can use a "/robots.txt"
file on your server (the user agent name for webnew
is "webnew").
-V Print the version of webnew and exit.
-a Use the text of the first anchor found pointing to
each URL as the anchor text in the produced listing.
The default is to prefer the title specified
in the document. Using this option considerably
speeds up non-recursive listings, as the individual
documents are not retrieved at all.
-c Specify a regular expression to match against the
content-type of documents included in the listing.
The default is "text".
-d Output a trace of the stack of URLs to retrieve.
Automatically turns on -v.
-e Use the given email address in the HTTP requests.
Also causes a <LINK REV=MADE> tag to be included in
the HTML output.
-i Only output the unordered URL items. This produces
HTML that should not be served as a standalone
document. It is intended for including the output
inside another HTML file.
-n Report URLs for which no modification date was
retrieved.
-r Use the specified URL as the initial URL to include
in the listing. Then retrieve that document and
extract URLs from it to be further included and
retrieved. Only URLs beginning with the initial
URL will be retrieved (to avoid infinite listings).
This is very useful for completely automatic
"what's new" listings.
-t Set the title and top level heading to the given
text. The default title is "What's new".
-v Show retrieved document URLs and their modification
times (if reported by the server). If a
URL was not searched for more links, the reason is
reported in parentheses.
-x Exclude pointers to the home page of webnew from
the output. If you use this option, please make
sure you provide a pointer to the home page in some
other fashion. The URL for webnew is
http://www.tac.nyc.ny.us/kim/webnew/ and it will
always contain a pointer to the most recent version
of the software as well as installation and use
instructions.
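Two of the filtering rules above (the prefix restriction of -r and the content-type expression of -c) can be sketched in Python. The function names are hypothetical, and details such as unanchored, case-sensitive matching are assumptions; this page does not say how webnew applies the expression.

```python
import re


def should_follow(url, initial_url):
    """Sketch of the -r rule: only URLs beginning with the initial URL qualify."""
    return url.startswith(initial_url)


def content_type_matches(content_type, pattern="text"):
    """Sketch of the -c rule, with "text" as the documented default.

    Assumption: the expression may match anywhere in the
    Content-Type value and is case-sensitive.
    """
    return re.search(pattern, content_type) is not None
```

With the default expression, "text/html" and "text/plain" would be listed while "image/gif" would not; an expression such as "image|text" would accept all three.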
EXAMPLES
mv new.html old.html
webnew -a http://www.tac.nyc.ny.us/kim/old.html > new.html
webnew -r http://www.tac.nyc.ny.us/kim/ > new.html
BUGS
No known bugs.
AUTHOR
Kimmo Suominen <kim at tac.nyc.ny.us>
SEE ALSO
urlget(1)
Please read the document "A Standard for Robot Exclusion"
for more information on restricting robots.
http://www.robotstxt.org/wc/norobots.html
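As a rough illustration of how a "/robots.txt" file restricts a robot that identifies itself as "webnew", the following Python sketch uses the standard urllib.robotparser module. It is not part of webnew; the sample rules, the example.org URLs, and the helper name allowed_for_webnew are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps webnew out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: webnew
Disallow: /private/
"""


def allowed_for_webnew(robots_txt, url):
    """Return True if the rules permit the user agent "webnew" to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("webnew", url)
```

Under these sample rules, a URL below /private/ is refused while other paths on the same server remain available to the robot.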
TAC 1.2 6 May 1996