Fix replay job to work as advertised #6
In addition: use the direct method of retrieving an Access object from the database. Do not retry the ping if there is not enough information to form one. Actually reschedule for 24 hours' time: a return code of 0 is treated by `EPrints::DataObj::EventQueue::execute` the same as `HTTP_NOT_FOUND`, which means changes to the event are not committed, and if `cleanup` is true (the default), the event is immediately deleted. `HTTP_RESET_CONTENT` is the correct return value to do what is promised in the `$fail_message`.
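For illustration, here is a minimal sketch of a replay action that returns the value needed for the event to survive and be retried. The sub name, arguments, and surrounding plugin structure are assumptions for illustration only, not the actual plugin code:

```perl
use HTTP::Status qw( HTTP_RESET_CONTENT );

# Hedged sketch: names and signature are illustrative, not the real plugin.
sub replay
{
	my( $self, $accessid, $request_url ) = @_;

	# ... attempt the ping to IRUS here ...

	# Returning 0 would make EPrints::DataObj::EventQueue::execute treat
	# the job like HTTP_NOT_FOUND: changes to the event are not committed
	# and, with cleanup enabled, the event is deleted immediately.
	# HTTP_RESET_CONTENT instead keeps the event so it can be retried.
	return HTTP_RESET_CONTENT;
}
```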
Also, `EPrints::DataObj::EventQueue::create_unique` computes `eventqueueid` itself, discarding any value passed in, but will use `params` if given.
---
Hi @alex-ball, currently the PIRUS integration creates one 'replay' job; that's the purpose of lines 90 to 91 in 0f0561a. The event queue job stores the failed access ID. When the 'replay' job runs, it replays relevant items from the access dataset, from the stored accessid to the most recent access record. Does your PR change this behaviour?
---
Please do correct me if I'm wrong, but the way I'm reading it, the current workflow is as follows:
It won't be possible to know which accessid failed unless you have… As far as I can see,

```perl
sub create_unique
{
	my( $class, $session, $data, $dataset ) = @_;

	$dataset ||= $session->dataset( $class->get_dataset_id );

	my $md5 = Digest::MD5->new;
	$md5->add( $data->{pluginid} );
	$md5->add( $data->{action} );
	$md5->add( EPrints::MetaField::Storable->freeze( $session, $data->{params} ) )
		if EPrints::Utils::is_set( $data->{params} );
	$data->{eventqueueid} = $md5->hexdigest;

	[...]
}
```

This PR does indeed change the behaviour so there is one replay job per failed request.
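To make the hashing behaviour concrete, here is a hedged sketch of how per-request jobs could be created; the plugin id and the layout of `params` are assumptions, not the actual PR code:

```perl
# Because create_unique hashes pluginid, action and params into
# eventqueueid, giving each failed request its own params yields one
# job per failed request. The plugin id and params are illustrative.
EPrints::DataObj::EventQueue->create_unique( $session, {
	pluginid => "Event::PIRUS",
	action   => "replay",
	params   => [ $accessid, $request_url ],  # per-request, so unique
});
```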
Benefits:
Downsides:
---
Another way of handling it could be for the initial retry to be postponed for 24 hours (as opposed to ASAP), and then, if it fails again, return…
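That suggestion might look something like the following sketch. Whether `start_time` is honoured this way, and the helper names used, are assumptions to be checked against the EPrints API:

```perl
# Hedged sketch: postpone the first attempt by 24 hours. The plugin id,
# params layout and start_time handling are assumptions, not PR code.
my $event = EPrints::DataObj::EventQueue->create_unique( $session, {
	pluginid   => "Event::PIRUS",
	action     => "replay",
	params     => [ $accessid, $request_url ],
	start_time => EPrints::Time::get_iso_timestamp( time + 24*60*60 ),
});
```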
---
That's a good point about the `eventqueueid` not being referenced, although passing it a set of unchanging parameters each time also led to the same outcome. I think your changes seem reasonable and sensible. Have you spoken to the folk at IRUS at all about this? I think they'd welcome the change, but it would be good to get their feedback.
---
For many years my indexer logs have been full to bursting with error messages from the PIRUS replay event. I have been in contact with IRUS a few times to try and work out what's going on, but as far as they were concerned they weren't seeing any problems at their end. It was also a bit of a problem that (a) I didn't understand where… I finally tracked down my problems to the User Agent not picking up the proxy settings properly, but since some data at least has been getting through to IRUS, I think that means the proxy issue was only affecting the Indexer. I am still not sure why the original pings were failing, but since fixing the proxy settings and trying out these changes in our repository we have only had successful pings, and no failed ones yet.
Event params are not changed, and attempting to set them to the same value again triggers an error.
This PR fixes several problems with the implementation:
- The request URL (`$request_url`) is a required part of the ping to IRUS (`svc_dat`), but is not available to the `replay` job (it is not recorded in the Access instance). This PR adds it as a parameter to the job so it can be passed on to `PIRUS::log`.
- In the `replay` job, steps are taken to reschedule the job for 24 hours' time, and this is reported in the error message. However, the job then returns a `0` value, which causes `EPrints::DataObj::EventQueue::execute` to log an additional error and then delete the job; so the ping is not in fact retried. This PR restores the `HTTP_RESET_CONTENT` return value that allows the job to be retried.
- The `_archive_id` method is out of sync with `EPrints::OpenArchives::archive_id`, since it does not fall back to `securehost`.

NB. I have also simplified the idioms used for loading an Access instance and creating the `replay` job. These work for EPrints v3.3+, but I don't have access to the source code for v3.2 to be able to check compatibility there.
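For the `_archive_id` point, a minimal sketch of the fallback behaviour described above; the config keys are assumed to match what `EPrints::OpenArchives::archive_id` consults:

```perl
# Hedged sketch: fall back to 'securehost' when 'host' is unset,
# as EPrints::OpenArchives::archive_id is described as doing.
sub _archive_id
{
	my( $self ) = @_;

	my $session = $self->{session};
	return $session->config( "host" ) || $session->config( "securehost" );
}
```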