webnew(1) - Kimmo Suominen


WEBNEW(1)                                               WEBNEW(1)

NAME
       webnew - Retrieve modification times of HTTP documents

SYNOPSIS
       webnew  [-PRVadinrvx] [-A username:password] [-c type] [-e
       email] [-t title] URL

DESCRIPTION
       webnew produces a listing of URLs (web documents) sorted by
       the last modification time as reported by the HTTP server.
       By default it produces an HTML 2.0 document on standard
       output.  The URL on the command line is used as a starting
       point.

       By default the URLs to include in the listing are extracted
       from the document specified by the URL.  For a recursive
       search of URLs to include, please see the -R and -r
       options.
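
       The ordering described above can be sketched in Python.
       This is only an illustration, not the code webnew uses:
       the example.org URLs and header values are made up, and
       newest-first order is an assumption, since the sort
       direction is not stated here.

```python
from email.utils import parsedate_to_datetime

# Hypothetical (URL, Last-Modified header) pairs. None means the
# server reported no modification time for that document.
headers = [
    ("http://example.org/a.html", "Mon, 06 May 1996 10:00:00 GMT"),
    ("http://example.org/b.html", None),
    ("http://example.org/c.html", "Tue, 07 May 1996 08:30:00 GMT"),
]


def newest_first(pairs):
    """Return (url, datetime) pairs, newest first, skipping undated URLs."""
    dated = [(url, parsedate_to_datetime(value)) for url, value in pairs if value]
    return sorted(dated, key=lambda item: item[1], reverse=True)


listing = newest_first(headers)
for url, when in listing:
    print(when.isoformat(), url)
```

       URLs without a reported modification time are skipped in
       this sketch; the -n option covers reporting them.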

OPTIONS
       -A     Authenticate with the provided username and password
              using HTTP basic authentication.  This is only needed
              for password protected documents.

       -P     Do not use proxies to access the documents.  By
              default proxy definitions are used from the standard
              environment variables.

       -R     Become  a  "robot" and turn on -r.  To restrict the
              retrieval of documents, you can use a "/robots.txt"
              file on your server (the user agent name for webnew
              is "webnew").

       -V     Print the version of webnew and exit.

       -a     Use the text of the first anchor found pointing to
              each URL as the anchor text in the produced listing.
              The default is to prefer the title specified in the
              document.  Using this option will considerably speed
              up non-recursive listings, as the individual
              documents will not be retrieved at all.

       -c     Specify a regular expression to match for the  con-
              tent-type  of  documents  included  in the listing.
              Default is "text".

       -d     Output a trace of the stack of  URLs  to  retrieve.
              Automatically turns on -v.

       -e     Use  the  given email address in the HTTP requests.
              Also causes a <LINK REV=MADE> tag to be included in
              the HTML output.

       -i     Only output the unordered URL items.  This produces
              HTML that should not be served as a standalone doc-
              ument.   It  is  intended  for including the output
              inside another HTML file.

       -n     Report URLs for which no modification date was
              retrieved.

       -r     Use the specified URL as the initial URL to include
              in the listing.  Then retrieve  that  document  and
              extract  URLs  from  it  to be further included and
              retrieved.  Only URLs beginning  with  the  initial
              URL will be retrieved (to avoid infinite listings).
              This  is  very  useful  for  completely   automatic
              "what's new" listings.

       -t     Set  the  title  and top level heading to the given
              text.  The default title is "What's new".

       -v     Show retrieved document URLs and their modification
              times (if reported by the server).  If a URL was not
              searched for more links, the reason is reported in
              parentheses.

       -x     Exclude  pointers  to  the home page of webnew from
              the output.  If you use this  option,  please  make
              sure you provide a pointer to the home page in some
              other   fashion.    The   URL   for    webnew    is
              http://www.tac.nyc.ny.us/kim/webnew/  and  it  will
              always contain a pointer to the most recent version
              of  the  software  as  well as installation and use
              instructions.
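
       The scope rule of -r and the content type filter of -c
       can be sketched as two small checks in Python.  This is a
       hedged illustration, not webnew's implementation: the
       function names are hypothetical, and whether webnew
       matches the expression anywhere in the content type or
       only at its start is an assumption (anywhere, here).

```python
import re


def should_follow(url, initial_url):
    """-r style scope check: keep only URLs beginning with the initial URL."""
    return url.startswith(initial_url)


def type_matches(content_type, pattern="text"):
    """-c style check: regular expression matched against the content type.

    The default pattern "text" mirrors the documented default for -c.
    """
    return re.search(pattern, content_type) is not None


start = "http://example.org/docs/"  # hypothetical starting URL
print(should_follow("http://example.org/docs/intro.html", start))
print(should_follow("http://example.org/other/page.html", start))
print(type_matches("text/html; charset=iso-8859-1"))
print(type_matches("image/gif"))
```

       A prefix test of this kind keeps a recursive walk inside
       one part of a site, which is why the listing cannot grow
       without bound.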

EXAMPLES
       mv new.html old.html
       webnew -a http://www.tac.nyc.ny.us/kim/old.html > new.html

       webnew -r http://www.tac.nyc.ny.us/kim/ > new.html

BUGS
       No known bugs.

AUTHOR
       Kimmo Suominen <kim at tac.nyc.ny.us>

SEE ALSO
       urlget(1)

       Please read the document "A Standard for Robot Exclusion"
       for more information on restricting robots.
       http://www.robotstxt.org/wc/norobots.html
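
       As a rough illustration of how a "/robots.txt" entry for
       the "webnew" user agent would be interpreted, the sketch
       below uses Python's standard urllib.robotparser module.
       The rules and the example.org URLs are made up, and this
       shows the exclusion standard in general, not webnew's own
       parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that keeps webnew out of /private/.
robots_txt = """\
User-agent: webnew
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether the "webnew" user agent may fetch each URL.
allowed = parser.can_fetch("webnew", "http://example.org/index.html")
blocked = parser.can_fetch("webnew", "http://example.org/private/notes.html")
print(allowed, blocked)
```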

TAC 1.2                     6 May 1996                          1