8 changes: 4 additions & 4 deletions src/content/docs/storage/buckets/index.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: Buckets
slug: 'storage/buckets'
---
title: Buckets
slug: 'storage/buckets'
---

Buckets are containers for tables in Storage. They are further organized into the following two **stages**:
@@ -9,7 +9,7 @@ Buckets are containers for tables in Storage. They are further organized into th
2. **out** --- for processed data (usually results of transformations or applications)

The distinction between the input and output stages is purely conventional differentiation between raw and processed data.
When creating a new bucket, select one of the stages and a suitable [database backend](/storage/#backend-properties) based on its properties.
When creating a new bucket, select one of the stages and a suitable [database backend](/storage/#storage-backend-types-and-features) based on its properties.
For information on how to load data into Storage, see the corresponding part of our [tutorial](/tutorial/load/).

![Screenshot - Create bucket](/storage/buckets/create-bucket.png)
12 changes: 6 additions & 6 deletions src/content/docs/storage/byodb/external-buckets/index.md
@@ -7,7 +7,7 @@ slug: 'storage/byodb/external-buckets'

If you operate Keboola in Bring-Your-Own-Database (BYODB) mode using your own data warehouse, the data in the warehouse won't automatically be visible or accessible within Keboola. To address this, we offer the **External Datasets** feature.

The implementation of **External Datasets requires the [BYODB](https://help.keboola.com/storage/byodb/) to be enabled first**.
The implementation of **External Datasets requires the [BYODB](/storage/byodb/) to be enabled first**.
Unless specified otherwise, this description refers to the implementation of Snowflake and BigQuery.

## What Is an External Dataset?
@@ -37,14 +37,14 @@ grant Keboola correct access to the schema in your Snowflake. Once access has be

### BigQuery
Fill in the **name** of the new dataset and **dataset** name. Click **Next Step**. Keboola will generate a code that you can use to grant Keboola
correct access to the dataset in BigQuery. Once access has been grated, click **Register Dataset** to start using it.
correct access to the dataset in BigQuery. Once access has been granted, click **Register Dataset** to start using it.

:::note
**Note:** By adding the Keboola service account as a subscriber, you enable read-only access to the data.
:::

### BigLake Tables
Keboola generaly does not support external tables, except for [BigLake tables](https://cloud.google.com/bigquery/docs/create-cloud-storage-table-biglake).
Keboola generally does not support external tables, except for [BigLake tables](https://cloud.google.com/bigquery/docs/create-cloud-storage-table-biglake).
Please ensure that any table you are using is of this type. External tables of other types will not work in transformations and workspaces due to permission issues.

Please ensure that you can perform a `SELECT * FROM <table> LIMIT 1` query on your created BigLake table. Keboola checks this during the registration process.
@@ -109,7 +109,7 @@ WHERE `source` = "mql";
:::

## Sharing an External Dataset
It is possible to share an external dataset using the same process as [any other Storage bucket](https://help.keboola.com/catalog/#enable-sharing). Once the bucket is shared, the refresh operation is only available in the source project (the project where the external dataset was registered). Currently, it is possible to share entire buckets, not specific tables within them.
It is possible to share an external dataset using the same process as [any other Storage bucket](/catalog/#enable-sharing). Once the bucket is shared, the refresh operation is only available in the source project (the project where the external dataset was registered). Currently, it is possible to share entire buckets, not specific tables within them.

### Snowflake
Sharing a Snowflake external dataset works out of the box — no additional configuration is required beyond the standard bucket sharing flow.
@@ -132,7 +132,7 @@ The scope of this custom role depends on where your external datasets live:
Grant the built-in `roles/analyticshub.listingAdmin` role to the Keboola service account on your listing. This role includes the required permissions, but also covers additional capabilities (such as updating or deleting the listing) that Keboola does not use.

:::note
**Note:** Sharing permissions can be granted at any time after initial registration, but the registration process navigates you to provide such permissions to enable sharing. If not provided during the reigstration (e.g. for the previously registered datasets) Keboola detects the change on the next refresh and enables sharing from that point on. Revoking the permission will prevent new shares; projects that are already linked remain unaffected.
**Note:** Sharing permissions can be granted at any time after initial registration, but the registration process navigates you to provide such permissions to enable sharing. If not provided during the registration (e.g. for the previously registered datasets) Keboola detects the change on the next refresh and enables sharing from that point on. Revoking the permission will prevent new shares; projects that are already linked remain unaffected.
:::

:::note
@@ -160,7 +160,7 @@ If you wish to remove the schema, you must do so manually in your warehouse.
* Table names can't be longer than **92 characters** and can contain only **alphanumeric** characters, **dashes**, and **underscores**. Tables that do not meet these requirements **will be ignored**.
* Table names are not case-sensitive. You cannot create two tables with the same name that differ only in letter case.
* [Creating snapshots](https://api.keboola.com/?service=storage#post-/v2/storage/branch/-branchId-/tables/-id-/snapshots) from tables in external buckets is not supported.
* A read-only input mapping with an external dataset has a limitation. If you delete and recreate a registered table in the source schema, our [read-only input mapping](/workspace/#read-only-input-mapping) will lose access to this table. This occurs because we aim to limit clients from having excessive permissions, such as [OWNERSHIP](https://docs.snowflake.com/en/sql-reference/sql/grant-privilege#restrictions-and-limitations), on their external schemas. **However, manually refreshing the bucket addressess this issue.** <br> To permanently resolve this issue, you can manually grant the read-only input mapping role future access to your tables and views as illustrated below: <br>
* A read-only input mapping with an external dataset has a limitation. If you delete and recreate a registered table in the source schema, our [read-only input mapping](/workspace/#read-only-input-mapping) will lose access to this table. This occurs because we aim to limit clients from having excessive permissions, such as [OWNERSHIP](https://docs.snowflake.com/en/sql-reference/sql/grant-privilege#restrictions-and-limitations), on their external schemas. **However, manually refreshing the bucket addresses this issue.** <br> To permanently resolve this issue, you can manually grant the read-only input mapping role future access to your tables and views as illustrated below: <br>
```
GRANT SELECT ON FUTURE TABLES IN SCHEMA "REPORTING"."sales_schema" TO ROLE KEBOOLA_8_RO;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA "REPORTING"."sales_schema" TO ROLE KEBOOLA_8_RO;
@@ -14,7 +14,7 @@ This guide explains how to share a database and its objects with one or more acc
The process involves **two roles**:

- **Producer:** The Snowflake account that owns and shares the data.
- **Consumer:** The Snowflake account that accesses the shared data. This account must be used in Keboola as [BYODB](https://help.keboola.com/storage/byodb/#main-header).
- **Consumer:** The Snowflake account that accesses the shared data. This account must be used in Keboola as [BYODB](/storage/byodb/).

## Producer Workflow
As the producer, your role involves creating the share, adding the necessary database objects, and granting access to consumer accounts. Follow these steps to configure the share:
4 changes: 2 additions & 2 deletions src/content/docs/storage/tables/csv-files.md
@@ -78,7 +78,7 @@ The resulting file `import-data.txt.csv` can now be imported into Keboola Storag
When you export a table from Storage, the same format is used for import:

- **Delimiter** is set to comma `,`.
- **Enclosure** is set double quote `"`.
- **Enclosure** is set to double quote `"`.
- Enclosure is escaped by preceding it with another enclosure character.
- Header row is always present.
- Unix line breaks are used (LF -- \n)
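The format listed above can be reproduced with Python's standard `csv` module. This is a minimal sketch, not Keboola's own exporter: the function names are hypothetical, and quoting every field (`csv.QUOTE_ALL`) is an assumption of this example, since the list only states which enclosure character is used.

```python
import csv
import io


def write_storage_style_csv(header, rows):
    """Serialize rows with a comma delimiter, double-quote enclosure,
    doubled enclosures as the escape, a header row, and LF line breaks."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        doublequote=True,       # an enclosure inside a value is written as ""
        quoting=csv.QUOTE_ALL,  # assumption: enclose every field
        lineterminator="\n",    # Unix line breaks (LF)
    )
    writer.writerow(header)     # the header row is always present
    writer.writerows(rows)
    return buffer.getvalue()


def read_storage_style_csv(text):
    """Parse text written in the same format back into a list of rows."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",
        quotechar='"',
        doublequote=True,
    )
    return list(reader)


sample = write_storage_style_csv(
    ["name", "note"],
    [["John", 'said "hi", then left']],
)
parsed = read_storage_style_csv(sample)
```

Reading the text back with the same settings returns the original values, including the embedded comma and quotes.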
@@ -90,7 +90,7 @@ The above format is again compatible with many applications; you can
- open it in OpenOffice / LibreOffice Calc without any conversion (just make sure you use only comma as a delimiter
when asked about the file format).
- import it into Google Drive without any conversion (notice, however, that you might want to
use the Google Drive Writer instead)
use the [Google Drive data destination connector](/components/writers/storage/google-drive/) instead)
- import it into Microsoft Excel by following the below instructions.

*Note: The rows are exported in random order and there is no way to specify ordering of rows in the exported file.*
18 changes: 9 additions & 9 deletions src/content/docs/storage/tables/index.md
@@ -1,10 +1,10 @@
---
title: Tables
slug: 'storage/tables'
---
title: Tables
slug: 'storage/tables'
---





The *Table Storage* for your project is available under the **Tables** tab in the Storage section.
All data tables are organized into [buckets](/storage/buckets/) that can also be
used to [share tables](/catalog/) between projects.
@@ -123,7 +123,7 @@ Their uniqueness is checked and the data are de-duplicated. The result table loo

The order of rows in the imported file is not important and is not kept. That means that from each of
the duplicate rows a randomly selected one is kept and all others are discarded.
In our example, the rows `John,$150`, `John,$340` and `Darla,$60000` were discarded.
In our example, the rows `John,$150`, `John,$340` and `Darla,$600000` were discarded.
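The de-duplication rule described above can be sketched in a few lines of Python. This is an illustration only, not how Storage implements it: the `deduplicate` function and the sample rows are hypothetical, and `random.choice` stands in for the "randomly selected" row mentioned in the text.

```python
import random


def deduplicate(rows, primary_key):
    """Keep exactly one row per primary key value.

    Row order is ignored; for each group of rows sharing the same key,
    one arbitrary row is kept and the others are discarded.
    """
    groups = {}
    for row in rows:
        # A multi-column primary key is the combination of its column values.
        key = tuple(row[column] for column in primary_key)
        groups.setdefault(key, []).append(row)
    return [random.choice(candidates) for candidates in groups.values()]


# Hypothetical input; not the table used in the documentation.
rows = [
    {"name": "Alice", "money": "$10"},
    {"name": "Alice", "money": "$20"},
    {"name": "Bob", "money": "$30"},
]
result = deduplicate(rows, ["name"])
```

Because the survivor is chosen arbitrarily, a caller can rely only on the keys being unique afterwards, not on which of the duplicate rows remains.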

With a primary key defined on **multiple columns**, the combination of their values is unique.
Let's say you have a table with three columns: `name`, `age` and `money`. The primary key is defined
@@ -187,9 +187,9 @@ The above applies only when **incremental load** is used.
When an incremental load is not used, the contents of the target table are cleared before the load. When a primary key
is not defined and an incremental load is used, it simply appends the data to the table and does not update anything.

#### Difference between tables with [native datatypes](/storage/tables/data-types/#native-datatypes) and string tables
#### Difference between tables with [native datatypes](/storage/tables/data-types/) and string tables

There is significant change when loading incrementally into table with native datatypes on. If a table does not have native datatypes eanbled during incremental loading, the `_timestamp` column is updated based on the primary key only when a value in the row changes. In tables with native datatypes, the `_timestamp` column is updated every time when duplicate primary keys are imported. This behavior has an impact on [incremental processing](/storage/tables/#incremental-processing). When rows with duplicate primary keys are imported into tables with native types, they are treated as new rows.
There is significant change when loading incrementally into table with native datatypes on. If a table does not have native datatypes enabled during incremental loading, the `_timestamp` column is updated based on the primary key only when a value in the row changes. In tables with native datatypes, the `_timestamp` column is updated every time when duplicate primary keys are imported. This behavior has an impact on [incremental processing](/storage/tables/#incremental-processing). When rows with duplicate primary keys are imported into tables with native types, they are treated as new rows.
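The `_timestamp` rule described above can be modeled with a small Python sketch. It is a simplified model under stated assumptions, not Storage's implementation: the table is a plain dictionary keyed by primary key, each row carries a single `value`, and the `incremental_load` function and its `native_types` flag are hypothetical names.

```python
def incremental_load(table, incoming, now, native_types):
    """Apply an incremental load to `table` in place and return it.

    table:        {primary_key: {"value": ..., "_timestamp": ...}}
    incoming:     {primary_key: value} rows being imported
    now:          timestamp assigned to rows touched by this load
    native_types: True for a table with native datatypes enabled
    """
    for key, value in incoming.items():
        existing = table.get(key)
        if existing is None:
            # New primary key: the row is inserted with the current timestamp.
            table[key] = {"value": value, "_timestamp": now}
        elif native_types or existing["value"] != value:
            # Native datatypes: every imported duplicate key gets a new timestamp.
            # String table: the timestamp changes only when a value changed.
            table[key] = {"value": value, "_timestamp": now}
        # Otherwise (string table, identical row) the old timestamp is kept.
    return table


string_table = {1: {"value": "a", "_timestamp": 100}}
native_table = {1: {"value": "a", "_timestamp": 100}}

incremental_load(string_table, {1: "a", 2: "b"}, now=200, native_types=False)
incremental_load(native_table, {1: "a", 2: "b"}, now=200, native_types=True)
```

In this model, the unchanged row with key `1` keeps timestamp `100` in the string table but moves to `200` in the native-datatype table, which is why incremental processing then treats it as a new row.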

**Example:**

@@ -249,7 +249,7 @@ Here we can see a **significant change in the incremental load**. The `_timestam
| existing row, no new values => |7|Edith|ED-BT-13| 9471 |1996-12-18|
| new row => |8|Kate|CD-CZ-01| 5282 |2008-06-07|
| new row => |9|Josh|BA-AB-11| 6624 |2004-10-04|
| new row => |10|Arthur|EE-FF-66| 596 2021-04-06 |
| new row => |10|Arthur|EE-FF-66| 596 |2021-04-06 |

- Result of incremental import A3

10 changes: 5 additions & 5 deletions src/content/docs/storage/tables/uploads.md
@@ -1,6 +1,6 @@
---
title: Table Import & Export
slug: 'storage/tables/uploads'
---
title: Table Import & Export
slug: 'storage/tables/uploads'
---

All tables imported to and exported from Storage go through [Files](/storage/files/).
@@ -10,14 +10,14 @@ the CSV file is first stored in *Files* and only then imported to an actual tabl
This means that the Storage Files contain a history of data uploaded to the Storage Tables.
It is useful mainly in the two following cases:

1. [Reverting table](/storage/tables/#events) content to a particular imported version
1. [Reverting table](/storage/tables/) content to a particular imported version
2. Analyzing how something got into a table (useful mainly for incremental loads)

Every time a table is **exported** from Storage, the process is reversed: first, a file is
created in *Files* and then it is actually downloaded from there. This does not apply when exporting
Storage tables manually though.
Beware, however, that due to the nature of database exports, the exported table may be **sliced** and require
[substantial effort to reconstruct](http://developers.keboola.com/integrate/storage/api/import-export/#working-with-sliced-files).
[substantial effort to reconstruct](https://developers.keboola.com/integrate/storage/api/import-export/#working-with-sliced-files).
To make sure your tables are exported as merged files, always use the **Export** feature in
the **Action** tab of the table detail:
