                             DOCUMENTATION

                   WebLog 2.00 by Darryl C. Burgdorf

                    http://awsd.com/scripts/weblog/

WebLog is a comprehensive access log analysis tool.  It allows you to
keep track of activity on your site by month, week, day and hour, to
monitor total hits, bytes transferred and page views, and to keep track
of your most popular pages.  It can also print out secondary reports to
track "user sessions," showing the paths taken through your site by your
visitors and giving you a rough idea of how long they spent looking at
your pages, and to provide you with information on referring sites, the
search engine keywords which brought your visitors and the agents and
platforms they used while visiting.

              ===========================================

I.  THE REPORTS

The primary WebLog access report provides the following information:

    A.  Long-Term Statistics

        1.  Monthly Statistics:  An overview of site activity (number
            of hits, number of bytes transferred, and approximate number
            of visitors) per month for each month since you started
            running WebLog.
        2.  Daily Statistics (Past Five Weeks):  An overview of site
            activity per day for the past five weeks.
        3.  Day of Week Statistics:  An overview of site activity by
            weekday, maintained as a running total since you started
            running WebLog.
        4.  Hourly Statistics:  An overview of site activity by hour of
            the day, maintained as a running total.
        5.  "Record Book":  A simple listing of the days on which your
            site had the most hits, transferred the most data and saw
            the most visitors.

        Each of the "Long-Term Statistics" reports (except the "record
        book," of course) lists four pieces of information:  Hits,
        Bytes, Visits and PViews.  The number of "hits" is the total
        number of files requested from the server.  For example, if a
        visitor loads a page which includes four inline graphics, a
        total of five hits will be recorded in the access log.  The
        number of bytes represents the total amount of information
        transferred by the server in filling those requests.  (Note that
        WebLog automatically factors in a bit extra in its calculations
        to allow for the fact that "header" information -- which is not
        recorded in the server access log -- is sent by the server along
        with each file.)  The number of "visits" is an approximation of
        the number of actual individual visitors to your Web site.  This
        is only a *very* rough approximation, and should be regarded as
        such.  The number of "pviews" is the number of Web pages
        viewed by your visitors.  Each of the "Long-Term Statistics"
        reports also includes a simple "bar graph" representation; the
        graph can be configured to reflect whichever of the four items
        you're most interested in being able to track "at a glance."
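
        As a rough illustration of how the hits, bytes and page view
        totals relate, here is a minimal Python sketch (it is not
        WebLog's own Perl code).  It assumes that a "page" is any
        file not matching the default $DetailsFilter pattern given
        later in this documentation, and it leaves out the header
        allowance, since the size of that allowance isn't stated
        here.  The tally() function and the sample data are
        hypothetical.

```python
import re

# Assumption: pages are told apart from other files with the same default
# pattern this documentation gives for $DetailsFilter.
PAGE_FILTER = re.compile(r"(\.gif|\.jpg|\.jpeg)")

def tally(requests):
    """Tally hits, bytes and page views from (path, bytes_sent) pairs."""
    hits = len(requests)
    total_bytes = sum(size for _, size in requests)
    pviews = sum(1 for path, _ in requests if not PAGE_FILTER.search(path))
    return {"hits": hits, "bytes": total_bytes, "pviews": pviews}

# Hypothetical sample: two pages and three inline graphics.
sample = [
    ("/index.html", 4000),
    ("/logo.gif", 1200),
    ("/photo.jpg", 8000),
    ("/about.html", 2500),
    ("/button.gif", 300),
]
totals = tally(sample)
```

        Every request counts as a hit, but only the two .html
        requests count as page views.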

    B.  Statistics for the Current Month

        1.  Top N Files by Number of Hits (optional):  A list of the
            pages most frequently requested.
        2.  Top N Files by Volume (optional):  A list of the pages which
            resulted in the greatest number of bytes transferred.
        3.  Complete File Statistics (optional):  A list of all pages
            accessed in the current calendar month, with the date of
            last access, number of times requested, and total number of
            bytes transferred.
        4.  Top N Most Frequently Requested 404 Files (optional):  A
            list of the pages people are requesting most often which
            don't actually exist on your site.
        5.  Complete 404 File Not Found Statistics (optional):  A
            complete list of those nonexistent files.
        6.  User ID Statistics (optional):  A complete list of user IDs
            (and the associated second-level domains) utilized by the
            visitors to your Web site.  Note that this report can, of
            course, only be generated if at least part of your Web site
            is password protected through your server's default system.
        7.  "Top Level" Domains:  A breakdown of how many visits you've
            had from each type of domain (.com, .net, .edu, etc.)
        8.  Top N Domains by Number of Hits (optional):  A list of the
            IP addresses (domains) from which people have visited your
            site most often.
        9.  Top N Domains by Volume (optional):  A list of the IP
            addresses from which people have requested the greatest
            amount of information.
       10.  Complete Domain Statistics (optional):  A complete list of
            the IP addresses from which people have visited your site
            since the beginning of the current calendar month.

        Each of the "Current Month" reports resets automatically at the
        beginning of each month.  This allows you to easily keep track
        of things while preventing the report file from reaching too
        ridiculous a size over time.

The optional access details report keeps track of "user sessions."  It
will show you detailed "tracks" of the paths taken through your site by
visitors for however many days you specify, and will give you overview
information regarding how many unique visitors you've had each day and
how long they seem to be staying around.  If logging of referring URLs
is enabled, it will also show you, where possible, where your visitors
came from.  Please note that precise tracking of the number of visitors
is impossible; the information in this report is at best a reasonably
close approximation based on the information in your server access log.

The optional referring URL report logs the URLs reported by browsers as
the "referers" directing them to the various listed pages.  You should
be aware that this information is far from perfect.  Many browsers do
not provide any information on the referring page; even those that do
can at times provide false or misleading data.  And the fact that a page
is listed as the referer to a given page does *not* necessarily mean
that it actually contains a link to that page.  Of course, this report
is only available if your server log contains the necessary information.

The optional keywords report logs the keywords used by your visitors
to find you in the various Internet search engines and directories.  The
major search engines are each listed individually.  (Note that not all
search engines provide search keywords in their URLs, and so some are
not listed here.)  Again, this report is only available if your server
log contains the necessary information.
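
To give an idea of what pulling keywords out of a referring URL
involves, here is a minimal Python sketch (not WebLog's own code).  The
search_keywords() function is hypothetical, and the query parameter
name "q" is an assumption; each search engine uses its own parameter
name, which is why engines have to be handled individually.

```python
from urllib.parse import urlsplit, parse_qs

def search_keywords(referrer, param="q"):
    """Return the search terms found in a referring URL's query string."""
    query = parse_qs(urlsplit(referrer).query)
    terms = query.get(param)
    # parse_qs has already decoded '+' and %XX escapes.
    return terms[0].split() if terms else []

words = search_keywords("http://search.example.com/search?q=perl+log+analysis&hl=en")
```

A referring URL with no query string, or without the expected
parameter, simply yields an empty list.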

The optional agent and platform reports list the agents (browsers)
and platforms (operating systems) utilized by visitors to your pages.
(Browsers which "spoof" other browsers -- such as MSIE or newer versions
of AOL's browser claiming to be Netscape, or WebTV claiming to be MSIE
claiming to be Netscape -- are identified as what they really are,
rather than as what they claim to be.)  The first report details the
agents utilized; the second, the platforms.  The third report combines
the data from the first two.  The fourth report is a complete and
essentially unprocessed listing of the raw data from the agent log.
Again, of course, these reports are only available if your server log
contains the necessary information.
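
The idea behind seeing through "spoofed" agent strings can be sketched
in a few lines of Python.  This is only an illustration under stated
assumptions, not WebLog's actual rule set: the identify_agent()
function and its short token list are hypothetical.  The point is
simply that the more specific tokens must be checked before the
generic "Mozilla" prefix that all of these browsers send.

```python
def identify_agent(user_agent):
    """Name the browser, checking specific tokens before the generic prefix."""
    # Order matters: WebTV and AOL strings also contain "MSIE".
    for token, name in (("WebTV", "WebTV"), ("AOL", "AOL"), ("MSIE", "MSIE")):
        if token in user_agent:
            return name
    if user_agent.startswith("Mozilla/"):
        return "Netscape"
    return "Other"

agents = [
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 5.0; AOL 5.0; Windows 98)",
    "Mozilla/4.0 WebTV/2.6 (compatible; MSIE 4.0)",
    "Mozilla/4.7 [en] (Win98; I)",
    "Lynx/2.8.3",
]
names = [identify_agent(a) for a in agents]
```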

The referring URL, keywords and agent/platform reports will not
automatically reset, so you'll want to keep an eye on their sizes and
delete them periodically when they start to get too large to handle.

    (CAVEAT:

    (Like any log analysis software, WebLog is based squarely upon
    several unfortunately questionable assumptions.  Chief among these
    is the assumption that any accesses from a specific IP address
    within a reasonably short period of time belong to a single user,
    and the assumption that analysis of access logs can actually tell
    you anything useful about site visitors, anyway.

    (It is possible for different users to access your site with the
    same IP address, so a single "user session" might actually reflect
    visits from multiple users.  As well, thanks to the number of
    systems which now employ local caching, it is quite likely that some
    of the pages which seem to be accessed only once are in actuality
    viewed many times by many different users.

    (For more information on these problems, you might want to take
    a look at some or all of the following articles:

        Getting Real About Usage Statistics
          - Tim Stehle
          <http://www.wprc.com/wpl/stats.html>
        Making Sense of Web Usage Statistics
          - Dana Noonan
          <http://www.piperinfo.com/pl01/usage.html>
        Interpreting WWW Statistics
          - Doug Linder
          <http://gopher.nara.gov:70/0h/what/stats/webanal.html>
        Why Web Usage Statistics are (Worse Than) Meaningless
          - Jeff Goldberg
          <http://www.cranfield.ac.uk/stats/>

    (WebLog also assumes that the time between the loading of one page
    and the loading of the next, so long as it is less than 30 minutes,
    is actually spent looking at the first page.  This is clearly not
    necessarily the case.  The user could have gotten up to fix himself
    lunch or use the bathroom.  He could have reloaded another page
    already in his browser's cache, or could even have gone to look at
    pages on other sites before returning to yours.  There is no way of
    knowing.

    (Finally, WebLog assumes that the average length of time spent
    viewing the last -- or only -- page visited in a user session is 30
    seconds.  Again, there is obviously no way to check the validity of
    this assumption.)
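
The two timing assumptions described above can be sketched in Python as
follows.  This is a minimal illustration, not WebLog's own code: the
function names are hypothetical, times are plain seconds, and it is an
assumption here that a gap of 30 minutes or more between page loads
from the same address starts a new session.

```python
SESSION_GAP = 30 * 60   # gaps of 30 minutes or more are not viewing time
LAST_PAGE_SECONDS = 30  # assumed time spent on the last (or only) page

def split_sessions(times):
    """Group page-load times (in seconds) from one IP address into sessions."""
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] < SESSION_GAP:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

def session_seconds(session):
    """Estimated viewing time: gaps between loads plus the last-page allowance."""
    return (session[-1] - session[0]) + LAST_PAGE_SECONDS

loads = [0, 60, 300, 4000, 4100]
sessions = split_sessions(loads)
durations = [session_seconds(s) for s in sessions]
```

Here the five page loads fall into two sessions, estimated at 330 and
130 seconds.  A session with only one page load is estimated at the
flat 30 seconds.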

              ===========================================

II.  SETTING UP AND RUNNING WEBLOG

The files that you need are as follows:

weblog.pl:  This is the main program file.  You don't actually need to
  do anything to it; in fact, you don't even have to execute it.

config.pl:  This is the configuration file.  Everything you need to
  change or modify is contained here.  This is also the file that you
  will execute.  (Things are set up this way so that you can effectively
  maintain multiple versions of the script, for example if you want to
  run separate log analyses for different sites, just by keeping
  separate config files for each.)

bar1.gif, bar2.gif, bar3.gif, bar4.gif, bar5.gif and bar6.gif:  These
  six small graphics files are used to create the bar graphs in the main
  access report.

As noted above, the WebLog configuration file, and not the WebLog
program itself, should be executed.  (And please note that it should
be executed from the telnet command prompt rather than your browser;
WebLog is *not* a CGI script, and most likely won't run correctly if you
try to access it from your browser.)  The configuration file should, of
course, be set executable.  Make sure that the first line of the script
matches the location of your system's Perl interpreter.  As well, the
following variables need to be defined:

$LogFile:  The path (not URL) of the NCSA-format access log file from
  which the log reports will be generated.  Note that this file is
  generated by your server; if you're not sure where to find it or what
  it's called, check with your system administrators.  It is possible,
  though not likely, that you don't actually have access to log data.
  If that is the case, then you won't be able to use WebLog at all.
  The script can read either standard (aka "common format") log files
  or extended (aka "combined") log files.  You don't need to specify
  the type, as WebLog determines it automatically when it reads the
  file.  Obviously, if you're dealing only with a standard format log
  file, WebLog won't be able to generate agent or referer reports.
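
  For reference, the difference between the two formats can be
  sketched in Python.  This is not how WebLog itself is written; the
  pattern and the parse_line() function are illustrative assumptions.
  A combined-format line is a common-format line with two extra
  quoted fields, the referer and the agent, appended.

```python
import re

# Common format: host ident authuser [date] "request" status bytes
# Combined format appends: "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)

def parse_line(line):
    """Return (format_name, fields) for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    kind = "combined" if fields["agent"] is not None else "common"
    return kind, fields

common = '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
combined = common + ' "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98)"'
```

  Because the extra fields are optional in the pattern, one
  expression handles both kinds of file, and the format can be
  detected line by line.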

$IPLog:  The path to an optional file in which resolved IP/domain
  pairs will be stored.  Logging this information will allow WebLog to
  run much faster, especially if you're running multiple reports from a
  single log file.  However, especially on a busy site, the log file
  could become *very* large.  If you define an IP log file, keep an eye
  on its size.
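
  The reason such a log speeds things up can be sketched in Python:
  each address is looked up at most once, and later requests are
  answered from the stored pairs.  This is only an illustration, not
  WebLog's own code.  The make_resolver() function is hypothetical,
  and the cache is kept in a plain dictionary; the on-disk format of
  the $IPLog file is not described here, so reading and writing it is
  left out.

```python
import socket

def make_resolver(cache, lookup=None):
    """Return a function that resolves IP addresses, consulting `cache` first."""
    if lookup is None:
        # Real reverse DNS lookup; element 0 is the primary host name.
        lookup = lambda ip: socket.gethostbyaddr(ip)[0]

    def resolve(ip):
        if ip not in cache:
            try:
                cache[ip] = lookup(ip)
            except OSError:
                cache[ip] = ip  # unresolvable: remember the bare address
        return cache[ip]

    return resolve
```

  The lookup argument exists so that the slow network call can be
  swapped out, for example when testing.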

$FileDir:  The path of the directory in which the various report files
  will be created.

$ReportFile, $DetailsFile, $RefsFile, $KeywordsFile and $AgentsFile:
  The file names to be used for each of the five reports WebLog can
  generate.  All but the first are optional; if you don't assign a
  file name, the report simply won't be generated.

$SystemName:  The name or description which you want to appear at the
  top of your reports (e.g., "WebScripts").

$OrgName and $OrgDomain:  The name and domain of the "host" organization
  (e.g., ISP and isp.com).  If these variables are defined, accesses
  from this organization/domain will be counted separately from other
  accesses in the details report.

$GraphURL:  The URL of the directory containing the bar graph images
  (e.g., "http://awsd.com/graphs").  Do NOT include a trailing slash!

$GraphBase:  This variable defines the information on which you want the
  bar graphs in the main report to be based.  It can be set to "hits",
  "visits", "pviews" or "bytes"; if left undefined (or defined
  incorrectly), graphs will be based on bytes transferred.

$IncludeOnlyRefsTo and $ExcludeRefsTo:  Regexes specifying files or
  directories to include or ignore in the files lists.  For example, to
  include only files in a "scripts" subdirectory, $IncludeOnlyRefsTo =
  "^/scripts" would suffice.  Multiple entries should be "OR"ed
  (e.g., $IncludeOnlyRefsTo = "(^/dir1|^/dir2)").
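
  If you'd like to check a pattern before putting it in your
  configuration file, a few lines of Python will do (simple patterns
  like these behave the same way in Perl and Python).  The keep_file()
  function is hypothetical, and the way it combines the two patterns
  when both are set is an assumption, not a description of WebLog's
  internals.

```python
import re

def keep_file(path, include_only=None, exclude=None):
    """Apply include/exclude patterns to a requested path; unset patterns are ignored."""
    if include_only and not re.search(include_only, path):
        return False
    if exclude and re.search(exclude, path):
        return False
    return True

paths = ["/dir1/a.html", "/dir2/b.html", "/other/c.html", "/x/dir1/d.html"]
kept = [p for p in paths if keep_file(p, include_only="(^/dir1|^/dir2)")]
```

  Note that "/x/dir1/d.html" is rejected: the "^" anchors each
  alternative to the start of the path.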

$IncludeOnlyDomain and $ExcludeDomain:  Regexes specifying domains to
  include or ignore in the log file.  If you want your log analysis to
  ignore any visits by you to your own site, for example, set the
  $ExcludeDomain variable to your own IP address.  (Note that even if
  you don't ignore your own visits completely, you can still track them
  separately in the details report by using the $OrgName and $OrgDomain
  variables.)

$IncludeQuery:  If this variable is set to "0" any query information
  contained in a URL will be stripped as the log file is processed.  If
  it is set to "1" the information will be retained.
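
  The effect of the two settings can be sketched in Python.  The
  normalize_url() function is hypothetical, and treating everything
  from the first "?" onward as the query information is an
  assumption.

```python
def normalize_url(url, include_query):
    """Drop everything from the first '?' onward unless include_query is set."""
    if include_query:
        return url
    return url.split("?", 1)[0]

stripped = normalize_url("/cgi-bin/search.pl?term=perl", include_query=False)
retained = normalize_url("/cgi-bin/search.pl?term=perl", include_query=True)
```

  With queries stripped, every request to the same script is counted
  under one entry; with queries retained, each distinct query string
  gets its own entry.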

$PrintFiles:  A flag specifying whether the lists of accessed files
  should be generated.  (Normally, of course, you'd want to do so.
  However, for example, if you generate a separate access report for
  each site on a server, and also a report for the server as a whole,
  you might want to suppress the files listings on the server-wide
  report.)  0 = no; 1 = yes.

$Print404:  A flag specifying whether the "Code 404" file lists should
  be printed.  0 = no; 1 = yes.

$PrintUserIDs:  A flag specifying whether the User ID list should be
  generated.  If no portion of your site is password protected, or if
  you use a password system other than that which is integral to your
  server software (.htaccess in the case of most UNIX systems), then
  this list can be turned off, as your log file won't contain any user
  IDs, anyway.

$PrintDomains:  A flag specifying whether or not to print lists of
  visiting IP addresses.  0 = no; 1 = yes.  This variable can also be
  set to "2" to indicate that you want only second-level domains
  tracked.  (In other words, for example, one hit each from
  user1.foo.com and user2.foo.com will show up simply as two hits
  from foo.com, which can greatly reduce the size of your log file,
  especially if your site is busy!)
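
  The second-level grouping can be sketched in Python as follows.
  This is an illustration only; the second_level() function is
  hypothetical, and simply keeping the last two labels is an
  assumption that will group country-code names such as
  "host.example.co.uk" too coarsely.

```python
from collections import Counter

def second_level(host):
    """Keep only the last two dot-separated labels of a resolved host name."""
    labels = host.lower().split(".")
    if len(labels) < 2 or labels[-1].isdigit():
        return host  # single label or unresolved IP address: leave unchanged
    return ".".join(labels[-2:])

hits = Counter(second_level(h) for h in ["user1.foo.com", "user2.foo.com", "192.0.2.7"])
```

  The two foo.com hosts end up under a single "foo.com" entry with two
  hits, while the unresolved address is left as it is.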

$PrintTopNFiles:  The number of files to include in the "Top N Files"
  lists.  Set to 0 if you don't want to print the lists.

$TopFileListFilter:  Regex defining files to exclude from the "Top N
  Files" lists.  The default value of "(\.gif|\.jpg|\.jpeg|Code 404)"
  will filter out most image files and any frequently-requested but non-
  existing files.
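
  How the filter and the list length work together can be sketched in
  Python.  This is not WebLog's own code; the top_n_files() function
  and the sample counts are hypothetical, and the way a missing file is
  labeled in the sample ("Code 404" followed by the path) is an
  assumption based on the default filter value.

```python
import re
from collections import Counter

# The default filter value quoted above.
TOP_FILTER = re.compile(r"(\.gif|\.jpg|\.jpeg|Code 404)")

def top_n_files(hit_counts, n):
    """Return the n most requested entries that do not match the filter."""
    kept = Counter({name: count for name, count in hit_counts.items()
                    if not TOP_FILTER.search(name)})
    return kept.most_common(n)

counts = {
    "/index.html": 40,
    "/logo.gif": 95,
    "/faq.html": 12,
    "Code 404 /old.html": 20,
    "/news.html": 25,
}
top = top_n_files(counts, 2)
```

  Without the filter, the image and the missing file would crowd
  actual pages out of a short list.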

$PrintTopNDomains:  The number of domains to include in the "Top N
  Domains" lists.  (This, of course, is irrelevant if you're not
  printing domain lists.)

$LogOnlyNew:  Setting this variable to "1" will instruct WebLog to
  ignore any entries in the log file being analyzed which date from
  before the end of the last log file analyzed.  If you're afraid that
  you might accidentally run the script with the same log file twice in
  a row, setting this to "1" will prevent any data duplication.  If, on
  the other hand, you won't necessarily be analyzing log files in strict
  chronological order, you will want to keep this set to "0" so that all
  information is parsed.
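
  The idea behind this setting can be sketched in Python.  This is an
  illustration only: the only_new() function is hypothetical, and how
  WebLog remembers where the last log file ended is not described
  here, so that time is simply passed in as an argument.  Timestamps
  are assumed to be in the usual NCSA log form, with English month
  abbreviations.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 10/Oct/2000:13:55:36 -0700

def only_new(entries, last_seen):
    """Keep (timestamp, path) entries strictly later than last_seen."""
    cutoff = datetime.strptime(last_seen, LOG_TIME_FORMAT)
    return [(stamp, path) for stamp, path in entries
            if datetime.strptime(stamp, LOG_TIME_FORMAT) > cutoff]

entries = [
    ("10/Oct/2000:13:55:36 -0700", "/a.html"),
    ("10/Oct/2000:14:10:00 -0700", "/b.html"),
]
fresh = only_new(entries, "10/Oct/2000:13:55:36 -0700")
```

  Running the same log file twice then adds nothing the second time,
  since every entry is at or before the remembered time.  It also shows
  why the setting is unsuitable for out-of-order files: an older file
  would be skipped entirely.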

$NoSessions:  If set to "1" this variable will instruct WebLog *not* to
  include visitor counts on the monthly, daily and day-of-week lists.
  It will also disable creation of the details report.

$NoResolve:  By default, WebLog will attempt to resolve any IP numbers
  in the log file to domain names.  This can take a while, especially
  with larger log files.  If you don't want the script to bother -- if,
  for example, you don't care whether visitors came from ".com", ".net"
  or ".jp" sites, or if your log file already contains resolved domain
  names wherever possible, anyway -- just set this variable to "1".

$DetailsFilter:  A regex defining files to exclude from the details
  report.  (It's also used to determine what qualifies as a "page view"
  in the main report.)  The default value of "(\.gif|\.jpg|\.jpeg)" will
  filter out most image files, making it easier to follow which actual
  pages were viewed, and allowing a (theoretically) more accurate
  tracking of the time spent on each page.

$DetailsDays:  The number of days past to include in the details report.
  (This, of course, is only relevant if you're actually printing the
  details report.)  The number cannot be greater than 35.

$refsexcludefrom and $refsexcludeto:  If you want references to or from
  certain files ignored in the referring URLs report, define them here.
  You might want to exclude any references from within the same domain,
  for example, so that you can more easily see what *outside* locations
  are sending visitors to your site.

$RefsStripWWW:  Setting this variable to "1" will instruct the script to
  remove the "www" prefix from URLs.  If you don't strip those, the same
  URL could end up appearing twice in your referring URL list, both as
  "www.foo.com" and as "foo.com"; if you *do* strip the prefix, though,
  while the lists will be a bit easier to read and interpret, you'll end
  up with some URLs which you can't actually follow unless you manually
  put the "www" back.  (On some systems, for whatever reason, it's
  mandatory.)

$RefsMinHits:  This variable defines the minimum number of references
  that must come from a particular page before that page is included in
  the final report.  If you have a very busy site, and just want to know
  where *most* people are coming from, set it relatively high.  On the
  other hand, if you have a fairly quiet site, or if you're interested
  in tracking all accesses, set it low.
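
  The combined effect of $RefsStripWWW and $RefsMinHits can be
  sketched in Python.  This is only an illustration, not WebLog's own
  code; the function names and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

def strip_www(url):
    """Remove a leading 'www.' from the host part of a URL."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        host = host[4:]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

def referrer_report(referrers, strip=True, min_hits=1):
    """Count referring URLs and keep those seen at least min_hits times."""
    counts = Counter(strip_www(r) if strip else r for r in referrers)
    return {url: n for url, n in counts.items() if n >= min_hits}

refs = ["http://www.foo.com/links.html", "http://foo.com/links.html", "http://bar.org/"]
report = referrer_report(refs, strip=True, min_hits=2)
```

  With the prefix stripped, the two foo.com entries merge and reach
  the minimum of two hits.  Without stripping, each would have a
  single hit and none of the three URLs would be listed.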

$AgentsIgnore:  If you wish to ignore references to particular files in
  your agents/platforms report, list them here.  Eliminating references
  to graphic images, for example, will prevent your report from
  indicating an overly-high percentage of graphical browsers, since
  only hits to actual pages will be included.

              ===========================================

This documentation assumes that you have at least a general familiarity
with setting up Perl scripts.  If you need more specific assistance,
check with your system administrators, consult the WebScripts FAQs
(frequently-asked questions) file <http://awsd.com/scripts/faqs.shtml>,
or ask on the WebScripts Forum <http://awsd.com/scripts/forum/>.

-- Darryl C. Burgdorf