Save extracted URLs in Wayback Machine #73

@finnless

Extracted URLs should be saved to the Wayback Machine as soon as they are found.

Reference code:

pragma.archivelab.org/pragma/api/pragmas.py
Lines 51 to 72 in ArchiveLabs/pragma.archivelab.org@7587f6d

def save(url):
    url = url if '://' in url else 'http://' + url
    r = requests.get('http://web.archive.org/save/%s' % url)    
    if 'X-Archive-Wayback-Runtime-Error' in r.headers:
        return {
            'error': r.headers['X-Archive-Wayback-Runtime-Error']
        }
    print(r.headers)
    content_location = r.headers.get('content-location', url)
    if 'x-archive-wayback-liveweb-error' in r.headers:
        raise core.HTTPException(r.headers['x-archive-wayback-liveweb-error'],
                                 r.status_code)
    protocol = 'https' if 'https://' in content_location else 'http'
    uri = content_location.split("://")[1] 
    path = uri[uri.index('/'):] if '/' in uri else '/'
    return {
        'date': r.headers['date'],
        'protocol': protocol,
        'domain': uri.split('/')[0],
        'path': path,
        'id': content_location
    }
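
For this project, something along these lines might work. It is only a sketch, not tested against the live service: it assumes the `http://web.archive.org/save/<url>` endpoint and the header names from the snippet above still behave the same way. `normalize`, `parse_save_headers`, `save_url` and `archive_as_found` are hypothetical names, and since pragma's `core.HTTPException` is not available here, both error headers are returned as an `error` dict instead of raising.

```python
import urllib.error
import urllib.request


def normalize(url):
    """Prepend http:// when the URL has no scheme, as the reference code does."""
    return url if "://" in url else "http://" + url


def parse_save_headers(url, headers):
    """Build the result dict from the response headers of a save request.

    Header names are compared case-insensitively. Both error headers are
    reported as {'error': ...}; this is a simplification of the reference code.
    """
    h = {k.lower(): v for k, v in headers.items()}
    for name in ("x-archive-wayback-runtime-error",
                 "x-archive-wayback-liveweb-error"):
        if name in h:
            return {"error": h[name]}
    location = h.get("content-location", url)
    protocol = "https" if "https://" in location else "http"
    # Drop everything up to the first '://', then split host from path.
    rest = location.split("://", 1)[-1]
    domain, slash, tail = rest.partition("/")
    return {
        "date": h.get("date"),
        "protocol": protocol,
        "domain": domain,
        "path": (slash + tail) or "/",  # no '/' after the host means root
        "id": location,
    }


def save_url(url, timeout=30):
    """Ask the Wayback Machine to save `url` (assumed endpoint, see above)."""
    target = normalize(url)
    request = urllib.request.Request("http://web.archive.org/save/" + target)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = dict(response.headers.items())
    except urllib.error.HTTPError as exc:
        # Error responses may still carry the Wayback error headers.
        headers = dict(exc.headers.items())
    return parse_save_headers(target, headers)


def archive_as_found(urls, save=save_url):
    """Call `save` once per distinct normalized URL, in discovery order."""
    seen = set()
    results = {}
    for url in urls:
        target = normalize(url)
        if target in seen:
            continue
        seen.add(target)
        results[target] = save(target)
    return results
```

Keeping the header parsing separate from the network call means it can be checked without touching the network, and `archive_as_found` accepts any callable as `save`, so the extraction step can pass URLs through it as they turn up without archiving the same one twice.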
