mauprogramador/scopus-survey-api

Scopus Survey API

Web API for bibliographic survey of Scopus articles

FastAPI Pydantic Pandas AIOHTTP Bootstrap Poetry Pytest MkDocs

Black MyPy Pylint Bandit Pip-Audit


🌐 Languages: Read in Portuguese [pt-BR]

Instituto Federal de Educação, Ciência e Tecnologia de Mato Grosso do Sul  •  IFMS Campus Três Lagoas
Tecnologia em Análise e Desenvolvimento de Sistemas  •  TADS

Federal Institute of Education, Science and Technology of Mato Grosso do Sul
Technology in Systems Analysis and Development

Data provided by Scopus®  •  © Elsevier


1. Overview

This web API is designed, within its limitations, to perform systematic bibliographic surveys using data from the Scopus database, promoting access to relevant and high-quality bibliographic sources through a simple and well-documented interface, thus reducing the initial barrier to entry for students and academics.

As a free, non-commercial academic automation tool, the application integrates multiple selection criteria, including multiple query parameters, keyword combinations, and Boolean search, with mechanisms for retrieval, validation, serialization, and customized filtering of large volumes of data from the Scopus APIs.

This way, only the most relevant and recent data is retained and returned in a CSV file, making it suitable for bibliometric studies, surveys, research, and systematic reviews, and allowing students to quickly gather a set of peer-reviewed literature sources for a thesis or project.


2. Configuration

Create an .env file to configure the following options:

| Parameter | Description | Default |
|---|---|---|
| HOST | Sets the host address to listen on | 127.0.0.1 |
| PORT | Sets the server port on which the application will run | 8000 |
| RELOAD | Enables auto-reload on file changes for local development | false |
| WORKERS | Sets the number of worker processes | 1 |
| LOGGING_FILE | Enables saving logs to files | false |
| DEBUG | Enables debug mode and debug logs | false |
| PROGRESS_BAR | Displays the progress bar of the request process | true |

  • The RELOAD and WORKERS options are mutually exclusive.

  • Setting the HOST to 0.0.0.0 makes the application externally available.

Note

The address 0.0.0.0 is not a valid domain for the Cross-Origin-Opener-Policy; use localhost instead.

  • Set WORKERS, maximum 4, to start multiple server processes.

  • In production, RELOAD, DEBUG, and PROGRESS_BAR are automatically disabled.

Tip

Take a look at the .env.example file.
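
The rules above (defaults, the RELOAD/WORKERS exclusion, and the maximum of 4 workers) can be checked before the server starts. The following is a minimal sketch using only the standard library; `Settings`, `load_settings`, and the accepted boolean spellings are hypothetical and are not taken from the project's code. It also assumes that "mutually exclusive" means RELOAD cannot be combined with more than one worker.

```python
from dataclasses import dataclass
from typing import Mapping

MAX_WORKERS = 4  # "maximum 4", as stated in the notes above


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the configuration table.
    host: str = "127.0.0.1"
    port: int = 8000
    reload: bool = False
    workers: int = 1
    logging_file: bool = False
    debug: bool = False
    progress_bar: bool = True


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Assumption: "true", "1" and "yes" (any case) mean enabled.
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"true", "1", "yes"}


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from a mapping such as os.environ and validate it."""
    settings = Settings(
        host=env.get("HOST", "127.0.0.1"),
        port=int(env.get("PORT", "8000")),
        reload=_flag(env, "RELOAD", False),
        workers=int(env.get("WORKERS", "1")),
        logging_file=_flag(env, "LOGGING_FILE", False),
        debug=_flag(env, "DEBUG", False),
        progress_bar=_flag(env, "PROGRESS_BAR", True),
    )
    if not 1 <= settings.workers <= MAX_WORKERS:
        raise ValueError(f"WORKERS must be between 1 and {MAX_WORKERS}")
    if settings.reload and settings.workers > 1:
        raise ValueError("RELOAD and WORKERS are mutually exclusive")
    return settings
```

For example, `load_settings({"HOST": "0.0.0.0", "WORKERS": "4"})` is accepted, while `load_settings({"RELOAD": "true", "WORKERS": "2"})` raises `ValueError`.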


3. Run

3.1. Set Up a Python Venv

You will need Python 3.12 with Pip and Venv installed.

# Create new Venv (.venv)
make venv

# Activate Venv
source .venv/bin/activate

3.2. Development (Poetry)

Install Poetry and all dependency groups (app, dev, tests, docs), then run with Uvicorn.

# Install all dependencies groups from pyproject.toml with Poetry
(.venv) make install-dev

# Run with Poetry
(.venv) make run-dev

3.3. Production (Pip)

Install only the main dependencies (app), then run with Gunicorn.

# Install only main dependencies from requirements.txt with Pip
(.venv) make install-prod

# Run with Gunicorn
(.venv) make run-prod

3.4. Docker

You will need Docker installed. Build the scopus-survey-api image from the Dockerfile, install only the main dependencies from requirements.txt with Pip, and run with Uvicorn.

# Run in Docker Container from Dockerfile
make docker

# Follow and show the last logs
make docker-logs

4. Important Information

4.1. Data Source

We declare that all use of the Scopus® database and its APIs, owned and maintained by © Elsevier B.V., is intended only for non-commercial academic research, without implying endorsement or affiliation, and is subject to our Terms, as well as Elsevier's Terms and Scopus's Policy. All data we handle is retrieved and obtained "AS IS" and, therefore, despite its known reliability, we do not guarantee or assume responsibility for any errors or inaccuracies in the data in the Scopus database.

Caution

You are strictly prohibited from misusing or attempting to misuse data obtained from the Scopus APIs in violation of the Elsevier API Service Agreement.

4.2. Data Manipulation

In general, the data is preserved without any direct alteration. However, because it is obtained "AS IS", it must be validated against the fields of the HTTP responses from the APIs:

  • Those that returned a value will be kept as is;
  • Those that did not return any value will be set to "null" by default;
  • The "authors" field will be set to the first author ("dc:creator") or all authors ("authors") concatenated, depending on what is returned.
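
The three validation rules above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the entry is a simplified dictionary (here "authors" is a plain list of names, which is not the raw Scopus response shape), the ", " separator is an assumption, and so is the choice to prefer the full author list over "dc:creator" when both are present.

```python
from typing import Any

NULL = "null"  # default for fields that returned no value


def pick_authors(entry: dict[str, Any]) -> str:
    """Return all authors concatenated, else the first author, else "null"."""
    authors = entry.get("authors")
    if authors:
        # Assumption: names are joined with ", ".
        return ", ".join(authors)
    return entry.get("dc:creator") or NULL


def normalize(entry: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Keep returned values as is and fill missing ones with "null"."""
    row: dict[str, Any] = {}
    for field in fields:
        value = entry.get(field)
        row[field] = NULL if value in (None, "") else value
    row["authors"] = pick_authors(entry)
    return row
```

With this sketch, an entry that lacks "prism:doi" yields `"prism:doi": "null"`, and an entry that only carries "dc:creator" uses that value as its "authors" field.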

Finally, the documents will be filtered and removed in the following order:

  1. Exact duplicates, where the first one will be kept.
  2. Exactly the same title and same author(s), where the first one will be kept.
  3. Same author(s) with similar titles, where the one with the most recent publication date will be kept.
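
The filtering order above can be sketched with the standard library. This is not the project's implementation: the source does not say which similarity measure is used, so `difflib.SequenceMatcher` stands in for it here, the dictionary keys ("title", "authors", "date") are hypothetical, dates are assumed to be ISO strings, and the default threshold of 80 is only borrowed from the `ratio=80` value shown in the CSV metadata example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Title similarity on a 0 to 100 scale (stand-in measure)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_documents(docs: list[dict[str, str]], threshold: float = 80.0) -> list[dict[str, str]]:
    # 1. Exact duplicates: keep the first occurrence.
    seen_docs: set[tuple] = set()
    unique: list[dict[str, str]] = []
    for doc in docs:
        key = tuple(sorted(doc.items()))
        if key not in seen_docs:
            seen_docs.add(key)
            unique.append(doc)

    # 2. Same title and same author(s): keep the first occurrence.
    seen_pairs: set[tuple[str, str]] = set()
    distinct: list[dict[str, str]] = []
    for doc in unique:
        pair = (doc["title"], doc["authors"])
        if pair not in seen_pairs:
            seen_pairs.add(pair)
            distinct.append(doc)

    # 3. Same author(s) with similar titles: keep the most recent one.
    kept: list[dict[str, str]] = []
    for doc in distinct:
        for i, other in enumerate(kept):
            same_authors = other["authors"] == doc["authors"]
            if same_authors and similarity(other["title"], doc["title"]) >= threshold:
                if doc["date"] > other["date"]:  # ISO dates sort as text
                    kept[i] = doc
                break
        else:
            kept.append(doc)
    return kept
```

The pairwise comparison in step 3 is quadratic in the number of documents, which is acceptable for a sketch but may be slow for very large result sets.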

4.3. Search

In accordance with the API Service Agreement and Use Policies, Elsevier will issue you an API Key that grants you a limited license to use the Scopus APIs, so that you can properly authenticate to query the Scopus database. It can be obtained by accessing the Elsevier Developer Portal and registering. If you are part of an educational institution, you can try signing in using your organization's or academic email.

About the fields we use in the search to produce more relevant results:

  1. The combined field "TITLE-ABS-KEY" to simultaneously search for keyword combinations in abstracts, keywords, and titles, and retrieve the literature where they are found.
  2. The "date" and "sort" fields to delimit the period of interest for publications, and sort by year and date of publication and by relevance.
  3. Other optional additional fields that we can send by combining them with the Boolean operator "AND", such as subject area and language.
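
As an illustration of how the fields above could be assembled into request parameters, here is a minimal sketch. The function name is hypothetical, the parameter names `query`, `date`, and `sort` are assumed to match the public Scopus Search API, wrapping each keyword in its own TITLE-ABS-KEY clause is an assumption about how the application builds the query, and the default sort value is only an example.

```python
from typing import Optional


def build_search_params(
    keywords: list[str],
    years: tuple[int, int],
    extra_filters: Optional[list[str]] = None,
    sort: str = "coverDate,relevancy",  # example value, not the app's setting
) -> dict[str, str]:
    """Combine keywords and optional filters with the Boolean operator AND."""
    clauses = [f"TITLE-ABS-KEY({keyword})" for keyword in keywords]
    clauses.extend(extra_filters or [])
    start, end = years
    return {
        "query": " AND ".join(clauses),
        "date": f"{start}-{end}",
        "sort": sort,
    }


params = build_search_params(
    ["Python", "Scopus"],
    (2022, 2025),
    extra_filters=["LANGUAGE(english)"],
)
print(params["query"])
```

No request is sent here; the dictionary would be passed to whichever HTTP client performs the search.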

The searches will be conducted as follows:

  1. Retrieve the total number of results found for each keyword combination, concatenating them with the Boolean operator "AND".
  2. Effectively perform the final search with the selected combination and obtain the Scopus ID of each result in the pagination.
  3. Retrieve a complete dataset with comprehensive metadata by Scopus ID, obtaining all fields with relevant bibliographic information for each result of the previous search.

4.4. Institutional Network Required

Please be aware that the API Key will only authenticate correctly if you submit it while using your academic institution's network, which must be registered with Elsevier. This does not include VPN or proxy access. Therefore, if you are fully remote and off-campus, some data may not be returned.

4.5. Quota and Rate Limits

There's a maximum limit to the number of requests we can make to Scopus APIs using your API Key. This request quota resets every seven days, is unique to each API, and you can check its availability in the details panel after each operation. If requests exceed the quota or throttling rate, an error will be returned. See the API Key Settings.

| Scopus API | Weekly Quota | Rate Limit |
|---|---|---|
| Search API | 20,000 | 9 req/s |
| Abstract Retrieval API | 10,000 | 9 req/s |
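
Before starting a large survey, it can help to estimate how many requests it will consume against the weekly quotas above. The sketch below is a rough estimate under stated assumptions: one Search API request per result page plus one per keyword combination counted, and one Abstract Retrieval API request per result. The function names and the page size argument are hypothetical.

```python
import math

# Weekly quotas from the table above.
WEEKLY_QUOTA = {"search": 20_000, "abstract": 10_000}


def estimate_requests(total_results: int, items_per_page: int, combinations_counted: int = 0) -> dict[str, int]:
    """Estimate the requests needed per API for one survey."""
    pages = math.ceil(total_results / items_per_page)
    return {
        "search": combinations_counted + pages,
        "abstract": total_results,
    }


def fits_quota(estimate: dict[str, int], remaining: dict[str, int] = WEEKLY_QUOTA) -> bool:
    """Check the estimate against the remaining quota of each API."""
    return all(estimate[api] <= remaining[api] for api in estimate)


estimate = estimate_requests(total_results=1073, items_per_page=25, combinations_counted=11)
print(estimate, fits_quota(estimate))
```

The remaining quota reported in the details panel after each operation can be passed as `remaining` instead of the full weekly values.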

4.6. Async HTTP Client

To avoid exceeding the API's request Rate Limit and Quota when making several requests, we built an asynchronous HTTP client with flow control and error handling mechanisms to handle this large volume of requests concurrently, while respecting the API limits based on the total number of requests to be made. We employ:

  • asyncio.Semaphore and asyncio.sleep to control concurrency and insert additional delays;
  • aiohttp.ClientSession and aiohttp.ClientTimeout to manage the client session, timeout, and connection;
  • aiolimiter.AsyncLimiter for rate limiting;
  • aiohttp_retry.RetryClient and aiohttp_retry.JitterRetry for automatic retry mechanisms, with jitter, backoff, and timeout.

For retries, up to 3 attempts will be made, and for rate limiting, a dynamic strategy will be used based on the number of requests to be made.

| Requests | Rate Limit | Backoff Factor | Sleep | Concurrent Requests |
|---|---|---|---|---|
| 100 | 8.0 req/s | 2.0 | 0.0s | 10 |
| 500 | 6.0 req/s | 3.0 | 0.15s | 5 |
| 1000 | 5.0 req/s | 3.5 | 0.25s | 3 |
| 2000 | 4.0 req/s | 4.5 | 0.35s | 2 |
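
The dynamic strategy in the table can be sketched as a profile lookup combined with `asyncio.Semaphore` and `asyncio.sleep`. This is a simplified illustration, not the project's client: it treats each "Requests" value as an upper bound (an assumption), falls back to the last row for larger volumes (also an assumption), and leaves out the `aiolimiter` rate limiting and `aiohttp_retry` retries. `Profile`, `pick_profile`, and `fetch_all` are hypothetical names, and `fetch` is any coroutine function supplied by the caller.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Iterable


@dataclass(frozen=True)
class Profile:
    max_requests: int      # assumed upper bound for this row
    rate_limit: float      # requests per second
    backoff_factor: float
    sleep: float           # extra delay in seconds
    concurrency: int       # concurrent requests


# Values copied from the table above.
PROFILES = (
    Profile(100, 8.0, 2.0, 0.0, 10),
    Profile(500, 6.0, 3.0, 0.15, 5),
    Profile(1000, 5.0, 3.5, 0.25, 3),
    Profile(2000, 4.0, 4.5, 0.35, 2),
)


def pick_profile(total_requests: int) -> Profile:
    """Pick the first row whose bound covers the number of requests."""
    for profile in PROFILES:
        if total_requests <= profile.max_requests:
            return profile
    return PROFILES[-1]  # assumption: reuse the most conservative row


async def fetch_all(
    items: Iterable[Any],
    fetch: Callable[[Any], Awaitable[Any]],
    profile: Profile,
) -> list[Any]:
    """Run fetch for every item with bounded concurrency and a delay."""
    semaphore = asyncio.Semaphore(profile.concurrency)

    async def guarded(item: Any) -> Any:
        async with semaphore:
            result = await fetch(item)
            await asyncio.sleep(profile.sleep)
            return result

    return await asyncio.gather(*(guarded(item) for item in items))
```

`asyncio.gather` returns results in the same order as the input items, so the output can be matched back to the Scopus IDs that were requested.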

5. Fields

5.1. Multi-Step Form

  • STEP 1 - API Key

In this step, you will need to enter the API Key, issued by Elsevier. This is the main parameter without which the application cannot be run, as it is necessary for correct authentication and use of the Scopus APIs. If you have already conducted a survey before, you can also try downloading the previously generated CSV file, which may still be stored.

  • STEP 2 - Additional Params

In this step, you can enter and select multiple fields that will be combined using the AND operator and sent, when filled in, as parameters to perform a Boolean query in the Scopus database and produce more relevant results.

  • STEP 3 - Keyword Combination

In this step, you must select keywords based on the theme or subject of your research. These keywords will be concatenated using the Boolean operator AND, generating all possible combinations. Finally, the total number of documents found in Scopus will be retrieved, searching abstracts, keywords, and titles for each combination, thus narrowing the scope of your search based on the chosen combination and the total results returned.

  • STEP 4 - Final Survey

In this final step, all filled-in fields are submitted to run the systematic survey, which removes duplicates and filters similar documents, leaving only the most relevant and recent data.
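
The combination generation described in Step 3 can be sketched with `itertools.combinations`. The function name is hypothetical, and the minimum combination size of two is an assumption based on the examples in this document; the application may generate a different set.

```python
from itertools import combinations


def keyword_combinations(keywords: list[str], min_size: int = 2) -> list[str]:
    """Join every subset of at least min_size keywords with AND."""
    return [
        " AND ".join(combo)
        for size in range(min_size, len(keywords) + 1)
        for combo in combinations(keywords, size)
    ]


for query in keyword_combinations(["Python", "Scopus", "Web API"]):
    print(query)
```

Under this assumption, three keywords give four combinations and four keywords give eleven, each of which would need one request to retrieve its total number of results.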

5.2. Required Fields

  • API Key: The API key issued by Elsevier, obtained by accessing the Elsevier Portal and registering.
  • Keywords: Between two and four keywords that the documents you are searching for must contain. They must be written in English, contain at most 70 characters, and may include letters, numbers, spaces, hyphens, underscores, phrases, wildcards, and the Boolean operators OR and AND NOT.

Warning

Because AND NOT can generate unexpected results, it should be used in the last field.

  • Combination: The keyword combination option that best suits your needs based on the total number of documents found.

5.3. Optional Fields

  • Date: The range of years, from the last ten years to the current year, as the target of interest for published articles. By default, the last three years are considered.
  • Doctype: The type in which the document is classified.
  • Pubstage: The publication stage of the document.
  • Language: The language in which the original document was written.
  • Open Access: Whether the indexed content is open access or not.
  • Srctype: The type of source from which the document originates.
  • Subjarea: The subject area in which the document is classified.
  • Pages: Whether the document is short (up to 4 pages, such as research notes) or complete (5 pages or more), by the number of pages.
  • Similarity Threshold: The threshold value, in the range of 0 to 100, used to filter documents with the same author(s) and similar titles, keeping the one with the most recent publication date.

6. Results

6.1. Fields Retrieved

Mapped fields of the CSV file

| Field | Column | Description |
|---|---|---|
| link ref=scopus | Article Preview Page URL | Scopus article preview page URL |
| dc:identifier | Scopus ID | Article Scopus ID |
| authors or dc:creator | Authors | Complete author list or only the first author |
| dc:title | Title | Article title |
| prism:publicationName | Publication Name | Source title |
| dc:description | Abstract | Article complete abstract |
| prism:coverDate | Date | Article publication date |
| eid | Electronic ID | Article Electronic ID |
| prism:doi | DOI | Document Object Identifier |
| prism:volume | Volume | Identifier for a serial publication |
| citedby-count | Citations | Cited-by count |

6.2. CSV Metadata

Since the result of the survey is a CSV file, which is essentially a dataset obtained from the Scopus APIs, we must acknowledge both Scopus and Elsevier as data sources. Therefore, we will add some metadata at the top of the file (4 lines) as comments indicating the parameters used, survey details, and the date the data was obtained.

Example:

# GeneratedBy: ScopusSurveyAPI https://github.com/mauprogramador/scopus-survey-api
# Params: api_key=..., date=2022-2025, keywords=['Python', 'Web API', 'Scopus', 'bibliographic survey'], combination=Python AND Web API AND Scopus, ratio=80
# Survey: total=5, items_per_page=5, pages_count=1, loss=0doc / 0.00%
# Source: data retrieved from Scopus APIs on November 22, 2025 via http://api.elsevier.com and http://www.scopus.com.
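
Because the file starts with comment lines, a plain CSV reader would treat them as data. The following sketch reads the four metadata lines first and then parses the rest. It assumes a comma-delimited file with a header row; `read_survey_csv` is a hypothetical helper, and the column names in the usage comment depend on the actual file.

```python
import csv
from typing import TextIO


def read_survey_csv(stream: TextIO, header_lines: int = 4) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split the leading '# Key: value' comments from the CSV rows."""
    metadata: dict[str, str] = {}
    for _ in range(header_lines):
        line = stream.readline().strip().lstrip("# ")
        key, _, value = line.partition(": ")
        metadata[key] = value
    # The remaining lines are regular CSV, including quoted multi-line fields.
    rows = list(csv.DictReader(stream))
    return metadata, rows


# Usage with a file on disk (the filename is illustrative):
# with open("survey_docs.csv", newline="", encoding="utf-8") as file:
#     metadata, rows = read_survey_csv(file)
#     print(metadata["Survey"], len(rows))
```

Reading the comments from the stream, rather than filtering lines that start with "#", keeps quoted fields such as abstracts intact even when they contain line breaks or a leading "#".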

More than one survey can be performed using the keyword combinations from the table. To avoid confusion, each CSV file is saved by default with a name that includes the combination used to produce it.

Example:

| Combination | Filename |
|---|---|
| Python AND Scopus | [API Key]_python-scopus_docs.csv |
| Python AND Scopus AND Web API | [API Key]_python-scopus-web-api_docs.csv |
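
The naming pattern in the examples above can be reproduced with a small helper. This is inferred from the two filenames shown and is not taken from the project's code; `survey_filename` is a hypothetical name, and keywords containing characters other than letters, digits, and spaces may be handled differently by the application.

```python
def survey_filename(api_key: str, combination: str) -> str:
    """Build '[API Key]_<keywords-in-lowercase>_docs.csv' from a combination."""
    parts = [
        keyword.strip().lower().replace(" ", "-")
        for keyword in combination.split(" AND ")
    ]
    return f"{api_key}_{'-'.join(parts)}_docs.csv"


print(survey_filename("[API Key]", "Python AND Scopus AND Web API"))
```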

6.3. Visualization

To visualize the documents (at least the preview), you can use:

6.4. Performance

| Keywords | Total gathered | Process time | Loss |
|---|---|---|---|
| Web API AND Scopus | 25 | 3.63s | 0doc / 0.00% |
| Python AND Scopus | 141 | 19.26s | 1doc / 0.71% |
| Bibliographic Survey | 1073 | 246.38s (4.10m) | 7doc / 0.65% |

For questions or concerns please contact me at sir.silvabmauricio@gmail.com.

Terms of Service  •  Privacy Policy  •  Cookie Policy  •  Attributions

License  •  Translations  •  Latest Release  •  Changelog