Save extracted URLs in Wayback Machine #73

URLs should be archived in the Wayback Machine as soon as they are found.

Reference code:

**[pragma.archivelab.org/pragma/api/pragmas.py](https://github.com/ArchiveLabs/pragma.archivelab.org/blob/7587f6de4cd380bfb10ba56cea376e38d7124ec5/pragma/api/pragmas.py#L51-L72)**
Lines 51 to 72 in https://github.com/ArchiveLabs/pragma.archivelab.org/commit/7587f6de4cd380bfb10ba56cea376e38d7124ec5

```python
def save(url):
    url = url if '://' in url else 'http://' + url
    r = requests.get('http://web.archive.org/save/%s' % url)    
    if 'X-Archive-Wayback-Runtime-Error' in r.headers:
        return {
            'error': r.headers['X-Archive-Wayback-Runtime-Error']
        }
    print(r.headers)
    content_location = r.headers.get('content-location', url)
    if 'x-archive-wayback-liveweb-error' in r.headers:
        raise core.HTTPException(r.headers['x-archive-wayback-liveweb-error'],
                                 r.status_code)
    protocol = 'https' if 'https://' in content_location else 'http'
    uri = content_location.split("://")[1] 
    path = uri[uri.index('/'):] if uri.index('/') is not None else '/';
    return {
        'date': r.headers['date'],
        'protocol': protocol,
        'domain': uri.split('/')[0],
        'path': path,
        'id': content_location
    }
```
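One thing to watch in the reference code: `str.index` raises `ValueError` when the substring is missing and never returns `None`, so a `content-location` without a path (for example `http://example.com`) would raise instead of falling back to `'/'`. Below is a minimal sketch of the header-parsing step alone, with the HTTP request left out so it can be tested offline. The names `normalize_url` and `describe_capture` are hypothetical, and `RuntimeError` is only a stand-in for the project's `core.HTTPException`.

```python
def normalize_url(url):
    """Prepend http:// when the URL has no scheme, as the reference code does."""
    return url if '://' in url else 'http://' + url


def describe_capture(headers, requested_url):
    """Build the result dict from response headers (hypothetical helper).

    `headers` is any mapping of header names to values. Keys are
    lower-cased so lookups do not depend on header case.
    """
    h = {k.lower(): v for k, v in headers.items()}
    if 'x-archive-wayback-runtime-error' in h:
        return {'error': h['x-archive-wayback-runtime-error']}
    if 'x-archive-wayback-liveweb-error' in h:
        # Stand-in for core.HTTPException in the reference code.
        raise RuntimeError(h['x-archive-wayback-liveweb-error'])

    content_location = h.get('content-location', requested_url)
    protocol = 'https' if 'https://' in content_location else 'http'
    # Everything after the first "://" is host plus optional path.
    _, _, uri = content_location.partition('://')
    # partition never raises: a missing "/" yields an empty separator.
    domain, slash, rest = uri.partition('/')
    return {
        'date': h.get('date'),
        'protocol': protocol,
        'domain': domain,
        'path': slash + rest if slash else '/',
        'id': content_location,
    }
```

With `requests`, this could be called as `describe_capture(r.headers, url)` after the save request returns.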

# Scope

* Daily cron job that runs @jjjake's derive module to generate genomes for books. For each resulting genome.json, test any URLs it contains with curl, then archive them in the Wayback Machine. This is also related to #51 and https://github.com/internetarchive/openlibrary/issues/8756, as the same job could likely also handle TOC identification and extraction.
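
The per-genome step could be sketched roughly as follows. This is an assumption-heavy outline, not the derive module's actual interface: the layout of genome.json is not specified here, so the sketch simply walks whatever JSON it is given and collects anything that looks like a URL. `extract_urls`, `archive_genome_urls`, and `load_genome` are hypothetical names, and the liveness check and the save call are passed in as callables so either can be swapped out (for example, a curl-style check and the `save` function from the reference code).

```python
import json
import re

# Naive pattern, assumed for illustration: it can pick up trailing
# punctuation and will miss URLs written without a scheme.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+')


def load_genome(path):
    """Read a genome.json file into Python objects."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def extract_urls(node):
    """Recursively collect URL-looking strings from parsed JSON."""
    found = []

    def walk(value):
        if isinstance(value, str):
            found.extend(URL_PATTERN.findall(value))
        elif isinstance(value, dict):
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(node)
    # De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(found))


def archive_genome_urls(genome, is_live, save):
    """Save every URL that passes the liveness check.

    `is_live(url)` returns a bool and `save(url)` returns the
    archive result; both are supplied by the caller.
    """
    results = {}
    for url in extract_urls(genome):
        if is_live(url):
            results[url] = save(url)
    return results
```

A daily cron job would then call `archive_genome_urls(load_genome(path), is_live, save)` for each genome produced that day.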
