Rethinking Sentry Scheduling
Added by K Lars Lohn, last edited by K Lars Lohn on May 11, 2005

The Problem:

Now that the new loader program can rapidly initialize the database, we have, for the first time, the ability to do real tests of the Bouncer system. It seems that Sentry will take much longer to run full hash tests than initially thought.

Sentry has two mandates: test the integrity of the files served by the mirrors and guide the bouncing script as to which mirrors are currently viable. The thought was to schedule the running of Sentry using cron. Simple mirror tests would take place every ten minutes with more thorough tests at less frequent intervals. Sentry does not allow multiple copies of itself to run simultaneously. This means that if a test takes more than ten minutes, any scheduled runs of other tests during that time will not execute.
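
The effect of the single-instance rule on a cron timetable can be sketched with a small simulation. This is only an illustration: the function name, the timetable, and the one-minute duration of a mirror test are assumptions, not details of Sentry itself.

```python
def runs_that_start(schedule, single_instance=True):
    """Return (start_minute, name) for each scheduled run that actually starts.

    schedule: iterable of (start_minute, duration_minutes, name).
    With single_instance=True, a run whose start time falls while an earlier
    run is still going is skipped outright (it is not queued for later).
    """
    started = []
    busy_until = 0  # minute at which the currently running copy finishes
    for start, duration, name in sorted(schedule):
        if single_instance and start < busy_until:
            continue  # another copy is still running, so this launch does nothing
        started.append((start, name))
        busy_until = max(busy_until, start + duration)
    return started


# Hypothetical timetable: one 360-minute hash test at minute 0,
# plus a 1-minute mirror test every 10 minutes.
schedule = [(0, 360, "hash")] + [(m, 1, "mirror") for m in range(10, 400, 10)]

serial = runs_that_start(schedule)
first_mirror = next(start for start, name in serial if name == "mirror")
print("first mirror test that runs starts at minute", first_mirror)
```

Under these assumed numbers, every mirror test scheduled before minute 360 is skipped, and the first one that runs is the one cron launches once the hash test has finished.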

A full hash test of all files on all mirrors takes over six hours to run. Depending on network bandwidth and congestion, Sentry will finish with the fastest mirrors long before it finishes with the slowest ones. Between the time the first mirror finishes and the time the last one finishes, the mirrors that are already done receive no mirror tests. Say, for example, mirror A takes one hour to complete its hash test. Mirror B, however, takes the full six hours. If mirror A goes down at hour 2, Sentry will not detect it and flag the problem until the next mirror test after mirror B finishes its hash test.
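
The arithmetic of the mirror A / mirror B example can be worked through as follows. This is a rough sketch that assumes mirror tests are scheduled on exact ten-minute boundaries; the function name and parameters are hypothetical.

```python
def next_mirror_test(failure_min, interval_min, blocked_until_min=0):
    """Minute of the first mirror test that can observe a failure.

    Mirror tests are scheduled at multiples of interval_min. A scheduled test
    only runs if its slot is at or after blocked_until_min, the minute at
    which a long-running test finishes.
    """
    # First scheduled slot strictly after the failure.
    slot = (failure_min // interval_min + 1) * interval_min
    while slot < blocked_until_min:
        slot += interval_min  # this slot was skipped because Sentry was busy
    return slot


failure = 2 * 60    # mirror A goes down at hour 2
hash_done = 6 * 60  # mirror B's hash test finishes at hour 6
interval = 10       # mirror tests are scheduled every ten minutes

blocked_delay = next_mirror_test(failure, interval, hash_done) - failure
concurrent_delay = next_mirror_test(failure, interval) - failure
print(blocked_delay, concurrent_delay)
```

With these numbers the failure goes unnoticed for 240 minutes while the hash test blocks other runs, against 10 minutes if the mirror test could run alongside it.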

This is contrary to Sentry's mandate to keep tabs on the status of the mirrors in a timely manner.

Discussion

Why does Sentry prevent multiple copies of itself from running simultaneously? The primary reason is report generation. When Sentry runs a test, it saves the result of each individual file and mirror test in a persistent file. This file is then used to generate a chart in HTML at the end of each test run. Initially, this charting capability was for debugging and was expected to be dropped from the final product. Yet it produced such compelling output that the consensus was to keep it in the product. The Sentry visualization and the persistent history file do not handle the interleaving of test types that would occur if multiple tests were to run simultaneously.

A second reason is to avoid overloading slower mirrors. We wanted to avoid the possibility that multiple Sentries would be hitting the same mirror for complete downloads of files.

Accepted Solution

Rework Sentry to allow multiple instances to run simultaneously and rework the existing HTML report writer to work properly. This involves dropping the report writer's history file and replacing it with tables in the database. Moving the history data to the database allows no-pain concurrency. The difficult part is making the report reflect the concurrency of the tests without becoming unreadable.
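
A minimal sketch of what database-backed history might look like is given below. The table name, column names, and helper functions are all assumptions made for illustration, and sqlite3 stands in for whatever database Bouncer actually uses. The point is only that each result row carries its own run and test type, so rows written by concurrent instances can be separated again at report time.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open the (hypothetical) history store and create its table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS test_result (
               run_id    INTEGER NOT NULL,  -- one id per Sentry invocation
               test_type TEXT    NOT NULL,  -- e.g. 'mirror' or 'hash'
               mirror    TEXT    NOT NULL,
               file_name TEXT    NOT NULL,
               tested_at INTEGER NOT NULL,  -- seconds since the epoch
               passed    INTEGER NOT NULL   -- 1 = ok, 0 = failed
           )"""
    )
    return conn


def log_result(conn, run_id, test_type, mirror, file_name, tested_at, passed):
    """Record one file or mirror test result."""
    conn.execute(
        "INSERT INTO test_result VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, test_type, mirror, file_name, tested_at, int(passed)),
    )
    conn.commit()


def results_for_run(conn, run_id):
    """Return one run's results in time order, ignoring rows from other runs."""
    cursor = conn.execute(
        "SELECT mirror, file_name, tested_at, passed FROM test_result "
        "WHERE run_id = ? ORDER BY tested_at",
        (run_id,),
    )
    return cursor.fetchall()
```

A report writer built on such a table could select by run, by test type, or by time window, which leaves the layout question open instead of fixing it in the file format.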

Previous Proposals

Proposal 1

Throw away the HTML report generating component of Sentry and allow multiple copies of Sentry to run simultaneously. Log the individual test results in the database. Produce a new program that could generate the report taking into account interleaving of test types.

discussion:

Throwing away Sentry's report generator has already been discussed and was postponed to the next version. Do we really want to embark on a new idea this late in the game?
advantages: it's the right solution to separate the backend code from the report generation.
disadvantages: development on a new module at the same time that we're supposed to be delivering the code

Proposal 2

Rework Sentry to allow the basic mirror test to run at the same time as other tests. Either turn off the HTML report generation for these types of tests (the email reports will still go out without a problem) or set up two separate reports directories.

discussion:

This is a trivial modification to the program. The HTML reports can already be turned off or redirected with launch time switches.
advantages: not much work
disadvantages: having two places to look for reports or having one type of test not generate reports.

Proposal 3

Rework Sentry to allow multiple instances to run simultaneously and rework the existing HTML report writer to work.

discussion:

What would a reworked HTML report look like? The existing tabular report is useful because the x axis represents both time and the type of test done. Having multiple tests running simultaneously breaks that linkage between time and test type.

Forcing the report to use test type only as the x axis would disrupt the appearance of “cause and effect”. Say for a particular file, test 1 ran before test 2. Test 1 found that the file was fine, but test 2 showed that the server had gone down. For another file, the temporal order of the tests may be reversed. By looking at the report, it would not be possible to determine the file state because it would not be possible to glean which happened first.

If the x axis were time only, then the individual test results would be interleaved. It would no longer be possible to easily see failure patterns because a single column no longer represents a single type of test.
advantages:
disadvantages: this is perhaps even more work than Proposal 1.
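
The two report layouts weighed in the Proposal 3 discussion can be compared on a toy data set. The file names, test names, and outcomes below are made up for illustration; only the shape of the problem comes from the discussion.

```python
# (file, test_type, time, outcome): hypothetical results from two concurrent tests.
results = [
    ("a.tar.gz", "test1", 1, "ok"),
    ("a.tar.gz", "test2", 2, "down"),
    ("b.tar.gz", "test2", 1, "down"),  # for this file the order is reversed
    ("b.tar.gz", "test1", 2, "ok"),
]


def by_test_type(results):
    """One column per test type: the time of each test is thrown away."""
    table = {}
    for file_name, test_type, _time, outcome in results:
        table.setdefault(file_name, {})[test_type] = outcome
    return table


def by_time(results):
    """One column per time slot: each column now mixes test types."""
    table = {}
    for file_name, test_type, time, outcome in results:
        table.setdefault(file_name, {})[time] = (test_type, outcome)
    return table


type_view = by_test_type(results)
time_view = by_time(results)

# Both files get identical rows, although their histories are opposite.
print(type_view["a.tar.gz"] == type_view["b.tar.gz"])

# The column for time slot 1 holds results from two different test types.
print(sorted({row[1][0] for row in time_view.values()}))
```

In the test-type layout the two files are indistinguishable even though one went down after a good test and the other was tested in the opposite order. In the time layout a single column no longer corresponds to a single kind of test.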
