Web API for bibliographic survey of Scopus articles
🌐 Languages: Read in Portuguese [pt-BR]
Instituto Federal de Educação, Ciência e Tecnologia de Mato Grosso do Sul • IFMS Campus Três Lagoas
Tecnologia em Análise e Desenvolvimento de Sistemas • TADS
Federal Institute of Education, Science and Technology of Mato Grosso do Sul
Technology in Systems Analysis and Development
Data provided by Scopus® • © Elsevier
- Documentation: https://mauprogramador.github.io/scopus-survey-api/
- Web API: http://127.0.0.1:8000/v2/scopus-survey/en-US/search-articles
- Swagger UI: http://127.0.0.1:8000/
This web API is designed, within its limitations, to perform systematic bibliographic surveys using data from the Scopus database, promoting access to relevant and high-quality bibliographic sources through a simple and well-documented interface, thus reducing the initial barrier to entry for students and academics.
As a free, non-commercial academic automation tool, the application integrates multiple selection criteria, including multiple query parameters, keyword combinations, and Boolean search, with mechanisms for retrieval, validation, serialization, and customized filtering of large volumes of data from the Scopus APIs.
This way, only the most relevant and recent data will be retained and returned in a CSV file, making it suitable for bibliometric studies and surveys, research, systematic reviews, etc., and allowing students to quickly gather a set of peer-reviewed literature sources for a thesis or project.
Create an .env file to configure the following options:
| Parameter | Description | Default |
|---|---|---|
| `HOST` | Sets the host address to listen on | 127.0.0.1 |
| `PORT` | Sets the server port on which the application will run | 8000 |
| `RELOAD` | Enable auto-reload on file changes for local development | false |
| `WORKERS` | Sets multiple worker processes | 1 |
| `LOGGING_FILE` | Enable saving logs to files | false |
| `DEBUG` | Enable the debug mode and debug logs | false |
| `PROGRESS_BAR` | Displays the progress bar of the request process | true |
- The `RELOAD` and `WORKERS` options are mutually exclusive.
- Setting the `HOST` to `0.0.0.0` makes the application externally available.
Note
The address 0.0.0.0 is not a valid domain for the Cross-Origin-Opener-Policy; use localhost instead.
- Set `WORKERS`, maximum 4, to start multiple server processes.
- In production, `RELOAD`, `DEBUG`, and `PROGRESS_BAR` are automatically disabled.
Tip
Take a look at the .env.example file.
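The rules above can be sketched as a small validation step. This is only an illustration, not the application's actual settings code: the `Settings` class, the `load_settings` and `parse_bool` functions, the `production` flag, and the choice to raise `ValueError` are all assumptions. Only the option names, defaults, and the limit of 4 workers come from the table and notes above.

```python
from dataclasses import dataclass

# Option names and defaults are taken from the table above.
DEFAULTS = {
    "HOST": "127.0.0.1",
    "PORT": "8000",
    "RELOAD": "false",
    "WORKERS": "1",
    "LOGGING_FILE": "false",
    "DEBUG": "false",
    "PROGRESS_BAR": "true",
}
MAX_WORKERS = 4  # "maximum 4" from the notes above


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the parsed options."""

    host: str
    port: int
    reload: bool
    workers: int
    logging_file: bool
    debug: bool
    progress_bar: bool


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings (the accepted set is an assumption)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: dict[str, str], production: bool = False) -> Settings:
    """Merge `env` over the defaults and apply the documented rules."""
    raw = {**DEFAULTS, **env}
    workers = int(raw["WORKERS"])
    reload_ = parse_bool(raw["RELOAD"])
    debug = parse_bool(raw["DEBUG"])
    progress_bar = parse_bool(raw["PROGRESS_BAR"])

    if production:
        # In production these three options are disabled automatically.
        reload_ = debug = progress_bar = False
    if not 1 <= workers <= MAX_WORKERS:
        raise ValueError(f"WORKERS must be between 1 and {MAX_WORKERS}")
    if reload_ and workers > 1:
        # Assumption: the conflict is reported as an error.
        raise ValueError("RELOAD and WORKERS are mutually exclusive")

    return Settings(
        host=raw["HOST"],
        port=int(raw["PORT"]),
        reload=reload_,
        workers=workers,
        logging_file=parse_bool(raw["LOGGING_FILE"]),
        debug=debug,
        progress_bar=progress_bar,
    )
```

Calling `load_settings({})` yields the defaults from the table, while `load_settings({"RELOAD": "true", "WORKERS": "2"})` is rejected in this sketch.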
You will need Python 3.12 with Pip and Venv installed.
    # Create new Venv (.venv)
    make venv
    # Activate Venv
    source .venv/bin/activate

Install Poetry with all dependencies: app, dev, tests, docs, and run with Uvicorn.

    # Install all dependencies groups from pyproject.toml with Poetry
    (.venv) make install-dev
    # Run with Poetry
    (.venv) make run-dev

Install only the main dependencies: app and run with Gunicorn.

    # Install only main dependencies from requirements.txt with Pip
    (.venv) make install-prod
    # Run with Gunicorn
    (.venv) make run-prod

You will need Docker installed. Build the scopus-survey-api image from the Dockerfile, install only the main dependencies from requirements.txt with Pip, and run with Uvicorn.

    # Run in Docker Container from Dockerfile
    make docker
    # Follow and show the last logs
    make docker-logs

We declare that all use of the Scopus® database and its APIs, owned and maintained by © Elsevier B.V., is intended only for non-commercial academic research, without implying endorsement or affiliation, and is subject to our Terms, as well as Elsevier's Terms and Scopus's Policy. All data we handle is retrieved and obtained "AS IS" and, therefore, despite its known reliability, we do not guarantee or assume responsibility for any errors or inaccuracies in the data in the Scopus database.
Caution
You are strictly prohibited from misusing or attempting to misuse data obtained from the Scopus APIs in violation of the Elsevier API Service Agreement.
In general, the data will be preserved without any direct alteration; however, since it is obtained "AS IS", it needs to be validated based on the fields of the HTTP responses from the APIs:
- Those that returned a value will be kept as is;
- Those that did not return any value will be set to "`null`" by default;
- The "`authors`" field will be set to the first author ("`dc:creator`") or all authors ("`authors`") concatenated, depending on what is returned.
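The validation rules above could look roughly like the following. This is a hedged sketch rather than the application's code: `normalize_entry` and `normalize_authors` are hypothetical names, the input is assumed to be a plain dictionary in which `authors` (when present) is already a list of names, and the `", "` separator is an assumption.

```python
from typing import Any

NULL = "null"  # default for fields that returned no value


def normalize_authors(entry: dict[str, Any]) -> str:
    """Use all authors when available, otherwise the first author."""
    authors = entry.get("authors")
    if authors:
        # Assumption: `authors` is a list of author names.
        return ", ".join(authors)
    return entry.get("dc:creator") or NULL


def normalize_entry(entry: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Keep returned values as is and fill the missing ones with "null"."""
    row: dict[str, Any] = {}
    for field in fields:
        value = entry.get(field)
        row[field] = value if value not in (None, "") else NULL
    row["authors"] = normalize_authors(entry)
    return row
```

For instance, an entry that has a title and a `dc:creator` but no DOI keeps its title, gets `"null"` for `prism:doi`, and uses the first author as its `authors` value.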
Finally, the documents will be filtered and removed in the following order:
- Exact duplicates, where the first one will be kept.
- Exactly the same title and same author(s), where the first one will be kept.
- Same author(s) with similar titles, where the one with the most recent publication date will be kept.
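The three filtering passes can be illustrated as follows. This is a minimal sketch under stated assumptions: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure (the measure actually used by the application is not stated here), the `title`, `authors`, and `date` keys are hypothetical, dates are assumed to be ISO strings so they compare chronologically, and the default threshold of 80 is illustrative.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Title similarity on a 0-100 scale (difflib is a stand-in measure)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def filter_documents(docs: list[dict], threshold: float = 80) -> list[dict]:
    # Pass 1: exact duplicates, keeping the first occurrence.
    seen: set = set()
    unique = []
    for doc in docs:
        key = tuple(sorted(doc.items()))
        if key not in seen:
            seen.add(key)
            unique.append(doc)

    # Pass 2: same title and same author(s), keeping the first occurrence.
    seen = set()
    distinct = []
    for doc in unique:
        key = (doc["title"], doc["authors"])
        if key not in seen:
            seen.add(key)
            distinct.append(doc)

    # Pass 3: same author(s) with similar titles, keeping the most recent.
    kept: list[dict] = []
    for doc in distinct:
        for i, other in enumerate(kept):
            same_authors = doc["authors"] == other["authors"]
            if same_authors and similarity(doc["title"], other["title"]) >= threshold:
                if doc["date"] > other["date"]:  # ISO dates sort chronologically
                    kept[i] = doc
                break
        else:
            kept.append(doc)
    return kept
```

The third pass compares each document against those already kept, which is quadratic in the worst case; that is acceptable for a sketch but may not reflect how the application handles large result sets.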
In accordance with the API Service Agreement and Use Policies, Elsevier will issue you an API Key that grants you a limited license to use the Scopus APIs, so that you can properly authenticate to query the Scopus database. It can be obtained by accessing the Elsevier Developer Portal and registering. If you are part of an educational institution, you can try signing in using your organization's or academic email.
About the fields we use in the search to produce more relevant results:
- The combined field "`TITLE-ABS-KEY`" to simultaneously search for keyword combinations in abstracts, keywords, and titles, and retrieve the literature where they are found.
- The "`date`" and "`sort`" fields to delimit the period of interest for publications, and sort by year and date of publication and by relevance.
- Other optional additional fields that we can send by combining them with the Boolean operator "`AND`", such as subject area and language.
The searches will be conducted as follows:
- Retrieve the total number of results found for each keyword combination, concatenating them with the Boolean operator "`AND`".
- Effectively perform the final search with the selected combination and obtain the Scopus ID of each result in the pagination.
- Retrieve a complete dataset with comprehensive metadata by Scopus ID, obtaining all fields with relevant bibliographic information for each result of the previous search.
Please be aware that the API Key will only authenticate correctly if you submit it while using your academic institution's network, which must be registered with Elsevier. This does not include VPN or proxy access. Therefore, if you are fully remote and off-campus, some data may not be returned.
There's a maximum limit to the number of requests we can make to Scopus APIs using your API Key. This request quota resets every seven days, is unique to each API, and you can check its availability in the details panel after each operation. If requests exceed the quota or throttling rate, an error will be returned. See the API Key Settings.
| Scopus API | Weekly Quota | Rate Limit |
|---|---|---|
| Search API | 20,000 | 9 req/s |
| Abstract Retrieval API | 10,000 | 9 req/s |
To avoid exceeding the API's request Rate Limit and Quota when making several requests, we built an asynchronous HTTP client with flow control and error handling mechanisms to handle this large volume of requests concurrently, while respecting the API limits based on the total number of requests to be made. We employ:
- `asyncio.Semaphore` and `asyncio.sleep` to control concurrency and insert additional delays;
- `aiohttp.ClientSession` and `aiohttp.ClientTimeout` to manage the client session, timeout, and connection;
- `aiolimiter.AsyncLimiter` for rate limiting;
- `aiohttp_retry.RetryClient` and `aiohttp_retry.JitterRetry` for automatic retry mechanisms, with jitter, backoff, and timeout.
For retries, up to 3 attempts will be made, and for rate limiting, a dynamic strategy will be used based on the number of requests to be made.
| Requests | Rate Limit | Backoff Factor | Sleep | Concurrent Requests |
|---|---|---|---|---|
| 100 | 8.0 req/s | 2.0 | 0.0s | 10 |
| 500 | 6.0 req/s | 3.0 | 0.15s | 5 |
| 1000 | 5.0 req/s | 3.5 | 0.25s | 3 |
| 2000 | 4.0 req/s | 4.5 | 0.35s | 2 |
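The table can be read as a lookup keyed by the number of requests. The sketch below shows one possible reading using only the standard library: it assumes each "Requests" value is an upper bound and that larger workloads fall back to the last row. The `Strategy` class and the `pick_strategy` and `run_all` functions are hypothetical names, and the rate limiting and retry parts (handled by `aiolimiter` and `aiohttp_retry` in the application) are deliberately left out.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    max_requests: int      # assumed upper bound for this row
    rate_limit: float      # requests per second
    backoff_factor: float
    sleep: float           # extra delay in seconds
    concurrency: int       # concurrent requests


# Values copied from the table above.
STRATEGIES = (
    Strategy(100, 8.0, 2.0, 0.0, 10),
    Strategy(500, 6.0, 3.0, 0.15, 5),
    Strategy(1000, 5.0, 3.5, 0.25, 3),
    Strategy(2000, 4.0, 4.5, 0.35, 2),
)


def pick_strategy(total_requests: int) -> Strategy:
    """Return the first row whose bound covers the workload."""
    for strategy in STRATEGIES:
        if total_requests <= strategy.max_requests:
            return strategy
    return STRATEGIES[-1]  # assumption: reuse the most conservative row


async def run_all(jobs, strategy: Strategy) -> list:
    """Run zero-argument coroutine functions under a concurrency cap."""
    semaphore = asyncio.Semaphore(strategy.concurrency)

    async def guarded(job):
        async with semaphore:
            result = await job()
            await asyncio.sleep(strategy.sleep)  # additional delay per request
            return result

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Holding the semaphore during the extra sleep spaces out successive requests in each slot, which is one simple way to combine the "Sleep" and "Concurrent Requests" columns.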
- STEP 1 - API Key
In this step, you will need to enter the API Key, issued by Elsevier. This is the main parameter without which the application cannot be run, as it is necessary for correct authentication and use of the Scopus APIs. If you have already conducted a survey before, you can also try downloading the previously generated CSV file, which may still be stored.
- STEP 2 - Additional Params
In this step, you can enter and select multiple fields that will be combined using the AND operator and sent, when filled in, as parameters to perform a Boolean query in the Scopus database and produce more relevant results.
- STEP 3 - Keyword Combination
In this step, you must select keywords based on the theme or subject of your research. These keywords will be concatenated using the Boolean operator AND, generating all possible combinations. Finally, the total number of documents found in Scopus will be retrieved, searching abstracts, keywords, and titles for each combination, thus narrowing the scope of your search based on the chosen combination and the total results returned.
- STEP 4 - Final Survey
In this final step, all filled fields will be submitted for systematic information survey, removing duplicates and filtering similar documents, leaving only the most relevant and recent data.
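The combination step in STEP 3 can be sketched with `itertools.combinations`. This is an illustration only: the function name `keyword_combinations` is hypothetical, and it assumes that combinations range from two keywords up to all of them.

```python
from itertools import combinations


def keyword_combinations(keywords: list[str], min_size: int = 2) -> list[str]:
    """Join every subset of at least `min_size` keywords with AND."""
    result = []
    for size in range(min_size, len(keywords) + 1):
        for combo in combinations(keywords, size):
            result.append(" AND ".join(combo))
    return result
```

Under these assumptions, three keywords produce four combinations and four keywords produce eleven, each of which would then be counted in Scopus before one is chosen for the final survey.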
- API Key: The API key issued by Elsevier, obtained by accessing the Elsevier Portal and registering.
- Keywords: Between two (the required minimum) and four keywords that the documents you are searching for must contain. They must be written in English, with a maximum of 70 characters, and can contain letters, numbers, spaces, hyphens, underscores, phrases, wildcards, and the Boolean operators `OR` and `AND NOT`.
Warning
Because AND NOT can generate unexpected results, it should be used in the last field.
- Combination: The keyword combination option that best suits your needs based on the total number of documents found.
- Date: The range of years, from the last ten years to the current year, as the target of interest for published articles. By default, the last three years are considered.
- Doctype: The type in which the document is classified.
- Pubstage: The publication stage of the document.
- Language: The language in which the original document was written.
- Open Access: Whether the indexed content is open access or not.
- Srctype: The type of source from which the document originates.
- Subjarea: The subject area in which the document is classified.
- Pages: Whether the document is short (up to 4 pages, such as research notes) or complete (5 pages or more), by the number of pages.
- Similarity Threshold: The threshold value, in the range of 0 to 100, used to filter documents with the same author(s) and similar titles, keeping the one with the most recent publication date.
Mapped fields of the CSV file
| Field | Column | Description |
|---|---|---|
| `link ref=scopus` | Article Preview Page URL | Scopus article preview page URL |
| `dc:identifier` | Scopus ID | Article Scopus ID |
| `authors` or `dc:creator` | Authors | Complete author list or only the first author |
| `dc:title` | Title | Article title |
| `prism:publicationName` | Publication Name | Source title |
| `dc:description` | Abstract | Complete article abstract |
| `prism:coverDate` | Date | Publication date |
| `eid` | Electronic ID | Article Electronic ID |
| `prism:doi` | DOI | Digital Object Identifier |
| `prism:volume` | Volume | Identifier for a serial publication |
| `citedby-count` | Citations | Cited-by count |
Since the result of the survey is a CSV file, which is essentially a dataset obtained from the Scopus APIs, we must acknowledge both Scopus and Elsevier as data sources. Therefore, we will add some metadata at the top of the file (4 lines) as comments indicating the parameters used, survey details, and the date the data was obtained.
Example:
    # GeneratedBy: ScopusSurveyAPI https://github.com/mauprogramador/scopus-survey-api
    # Params: api_key=..., date=2022-2025, keywords=['Python', 'Web API', 'Scopus', 'bibliographic survey'], combination=Python AND Web API AND Scopus, ratio=80
    # Survey: total=5, items_per_page=5, pages_count=1, loss=0doc / 0.00%
    # Source: data retrieved from Scopus APIs on November 22, 2025 via http://api.elsevier.com and http://www.scopus.com.

It is possible to perform more than one survey using the keyword combinations from the table. Therefore, in order to avoid confusion, we will save the CSV files with the combination used to produce that result by default.
Example:
| Combination | Filename |
|---|---|
| Python AND Scopus | [API Key]_python-scopus_docs.csv |
| Python AND Scopus AND Web API | [API Key]_python-scopus-web-api_docs.csv |
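The naming pattern in the table can be reproduced with a few string operations. The function `survey_filename` below is a hypothetical helper inferred from the two examples only; how the application treats other characters or the `OR` and `AND NOT` operators is not covered.

```python
def survey_filename(api_key: str, combination: str) -> str:
    """Derive a filename like the examples above from a combination."""
    keywords = combination.split(" AND ")
    # Lowercase each keyword and replace inner spaces with hyphens.
    slug = "-".join(kw.strip().lower().replace(" ", "-") for kw in keywords)
    return f"{api_key}_{slug}_docs.csv"
```

Passing the placeholder `"[API Key]"` and `"Python AND Scopus AND Web API"` yields `[API Key]_python-scopus-web-api_docs.csv`, matching the second row of the table.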
To visualize the documents (at least the preview), you can use:
- The URL in the Article Preview Page URL column: `https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=[SCOPUS_ID]&origin=inward`
- The Digital Object Identifier (DOI) in the DOI column: `https://doi.org/[DOI]`
| Keywords | Total gathered | Processing time | Loss |
|---|---|---|---|
| Web API AND Scopus | 25 | 3.63s | 0doc / 0.00% |
| Python AND Scopus | 141 | 19.26s | 1doc / 0.71% |
| Bibliographic Survey | 1073 | 246.38s (4.10m) | 7doc / 0.65% |
Tip
Download a sample survey CSV file and take a look.
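Because a survey file starts with comment lines, a reader has to skip them before parsing the rows. The sketch below shows one way to do that with the standard library; `read_survey_csv` and `doi_url` are hypothetical helpers, and the file is assumed to be comma-delimited with a header row using the column names from the mapped fields table.

```python
import csv
import io


def read_survey_csv(text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Split leading '#' metadata lines from the CSV rows."""
    stream = io.StringIO(text, newline="")
    metadata = []
    while True:
        position = stream.tell()
        line = stream.readline()
        if not line.startswith("#"):
            stream.seek(position)  # rewind to the start of the header row
            break
        metadata.append(line.rstrip("\r\n"))
    # The rest is parsed as regular CSV, so quoted multi-line fields survive.
    rows = list(csv.DictReader(stream))
    return metadata, rows


def doi_url(row: dict[str, str]) -> str:
    """Build the DOI link for a row, assuming a 'DOI' column."""
    return f"https://doi.org/{row['DOI']}"
```

Only the leading comment lines are stripped, so a `#` character appearing later inside a title or abstract is left untouched.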
For questions or concerns please contact me at sir.silvabmauricio@gmail.com.
Terms of Service • Privacy Policy • Cookie Policy • Attributions
