@@ -159,8 +159,8 @@ and **New Files Only** to create a configuration that incrementally loads all ne
- **Add Row Number Column**: A new column `s3_row_filename` is added to the table and will contain the row number in each
of the downloaded files.

The data source connector also supports [Advanced mode](/components/#advanced-mode), all supported
The data source connector also supports [Debug mode](/components/#debug-mode), all supported
parameters are described in the [GitHub repository](https://github.com/keboola/aws-s3-extractor).

## Limitations
All files stored in [AWS Glacier](https://aws.amazon.com/glacier/) are ignored.
All files stored in [AWS Glacier](https://aws.amazon.com/s3/glacier/) are ignored.
@@ -16,7 +16,7 @@ you save them to Keboola Storage.
## Configuration
[Create a new configuration](/components/#creating-component-configuration) of the **Azure Datalake Gen 2** connector.

In order to access the files in you need to prepare an account name, account key, and file system.
In order to access the files in Azure Datalake Gen2, you need to prepare an account name, account key, and file system.

### Authentication
Fill in the Account name, key and file system you wish to retrieve data from.
@@ -71,5 +71,5 @@ adds the row number in the source file.

![Screenshot - FTP extractor audit](/components/extractors/storage/ftp/ftp-6.png)

The connector also supports [Advanced mode](/components/#advanced-mode), all supported
The connector also supports [Debug mode](/components/#debug-mode), all supported
parameters are described in the [GitHub repository](https://github.com/keboola/ex-ftp).
@@ -61,5 +61,5 @@ There are three options for determining column names:

**Primary Key** can be used to specify the primary key in Storage, which can be used with **Incremental Load**.

The data source connector also supports [Advanced mode](/components/#advanced-mode), all supported
The data source connector also supports [Debug mode](/components/#debug-mode), all supported
parameters are described in the [GitHub repository](https://github.com/keboola/http-extractor).
18 changes: 9 additions & 9 deletions src/content/docs/components/extractors/storage/index.md
@@ -1,17 +1,17 @@
---
title: Storage Data Source Connectors
slug: 'components/extractors/storage'
redirect_from:
- /extractors/storage/
---
title: Storage Data Source Connectors
slug: 'components/extractors/storage'
redirect_from:
- /extractors/storage/
---

Data source connectors import data from external sources and integrate it to the Keboola environment.
The following data source connectors allow access to data from generic storage services:

- [AWS S3](/components/extractors/storage/aws-s3) --- imports CSV files from multiple AWS S3 buckets into multiple tables with additional postprocessing.
- [Azure Datalake Gen2](/components/extractors/storage/azure-datalake-gen2) --- imports CSV files from Azure Datalake Gen2 into multiple tables.
- [FTP](/components/extractors/storage/ftp) --- imports CSV files from the FTP, FTPS, and SFTP servers.
- [GoogleDrive](/components/extractors/storage/google-drive/) --- imports data from Google Sheets (also part of the [Tutorial](/tutorial/load/googlesheets/)).
- [AWS S3](/components/extractors/storage/aws-s3/) --- imports CSV files from multiple AWS S3 buckets into multiple tables with additional postprocessing.
- [Azure Datalake Gen2](/components/extractors/storage/azure-datalake-gen2/) --- imports CSV files from Azure Datalake Gen2 into multiple tables.
- [FTP](/components/extractors/storage/ftp/) --- imports CSV files from the FTP, FTPS, and SFTP servers.
- [Google Drive](/components/extractors/storage/google-drive/) --- imports data from Google Sheets (also part of the [Tutorial](/tutorial/load/googlesheets/)).
- [HTTP](/components/extractors/storage/http/) --- imports CSV files stored on HTTP or HTTPS.
- [Keboola Storage](/components/extractors/storage/storage-api/) --- loads single or multiple tables from a Keboola project and stores them in a bucket in your current project; can be used where [Data Catalog](/catalog/) cannot.
- [OneDrive Excel Sheets](/components/extractors/storage/onedrive-excel-sheets/) --- imports data from OneDrive Excel sheets.
@@ -1,10 +1,10 @@
---
title: OneDrive Files
slug: 'components/extractors/storage/onedrive-files'
---
title: OneDrive Files
slug: 'components/extractors/storage/onedrive-files'
---





The OneDrive Files data source connector downloads files from the [Microsoft OneDrive](https://www.microsoft.com/en-us/microsoft-365/onedrive/online-cloud-storage) cloud storage and stores them in your project.

The Microsoft OneDrive cloud storage integrates with the [Office365](https://www.office.com/) and [SharePoint](https://www.microsoft.com/en-us/microsoft-365/sharepoint/collaboration) sites.
@@ -31,7 +31,7 @@ The **File Path** parameter defines the location of the file/s that you will be
- **`*.csv`**: Downloads all available CSV files.
- **`/reports/*.csv`**: Downloads all available CSV files from the "reports" folder and its subfolders.
- **`db_exports/report_*.xlsx`**: Downloads all .xlsx files named "report_*" (* is a wildcard) from the "db_exports" folder and its subfolders.
- **`db_exports/2022_*/.csv`**: Downloads all CSV files from folders matching "db_exports/2022_*" (* is a wildcard).
- **`db_exports/2022_*/*.csv`**: Downloads all CSV files from folders matching "db_exports/2022_*" (* is a wildcard).
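The difference between the old pattern `db_exports/2022_*/.csv` and the corrected `db_exports/2022_*/*.csv` can be illustrated with a small sketch. It uses Python's `fnmatch` module only as a stand-in for wildcard matching; the connector's actual matcher is not shown in this change and may behave differently. The file listing is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing, for illustration only.
paths = [
    "db_exports/2022_01/report.csv",
    "db_exports/2022_02/sales.csv",
    "db_exports/2022_02/.csv",
    "db_exports/2021_12/report.csv",
]


def matching(pattern, candidates):
    """Return the candidates that match a shell-style wildcard pattern."""
    return [p for p in candidates if fnmatchcase(p, pattern)]


# Old pattern: no wildcard before ".csv", so only a file literally
# named ".csv" inside a matching folder is selected.
old_matches = matching("db_exports/2022_*/.csv", paths)

# Corrected pattern: "*.csv" selects every CSV file in matching folders.
new_matches = matching("db_exports/2022_*/*.csv", paths)

print(old_matches)
print(new_matches)
```

Under these assumptions, the old pattern skips ordinary files such as `report.csv`, which is why the example in the docs needed the extra `*`.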

**new_files_only (optional)**: If set to true, the component will use the timestamp of the freshest file downloaded at the last run to download only newer files. The `LastModifiedAt` value from the GraphAPI is used.
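The `new_files_only` behavior described above can be sketched as a timestamp high-water mark. This is a minimal illustration, not the component's code: the `select_new_files` helper, the `last_modified` key, and the sample data are all hypothetical, and the real component reads `LastModifiedAt` from the GraphAPI and keeps its own state.

```python
from datetime import datetime, timezone


def select_new_files(files, last_seen):
    """Return files modified after last_seen, plus the updated high-water mark.

    files: list of dicts with hypothetical keys "name" and "last_modified".
    last_seen: timestamp of the freshest file from the previous run, or None.
    """
    if last_seen is None:
        selected = list(files)  # first run: nothing to compare against
    else:
        selected = [f for f in files if f["last_modified"] > last_seen]
    # Keep the previous mark when no newer file was found.
    newest = max((f["last_modified"] for f in selected), default=last_seen)
    return selected, newest


# Hypothetical listing, for illustration only.
files = [
    {"name": "a.csv", "last_modified": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"name": "b.csv", "last_modified": datetime(2022, 2, 1, tzinfo=timezone.utc)},
]

first_run, mark = select_new_files(files, None)
second_run, mark_after = select_new_files(files, mark)
```

In this sketch, a second run over an unchanged listing selects nothing and leaves the stored timestamp as it was.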
