2 changes: 1 addition & 1 deletion .github/workflows/gh-pages.yml
@@ -46,7 +46,7 @@ jobs:
# extended: true

- name: Build
run: hugo --minify
run: hugo --minify --panicOnWarning

- name: Copy .asf.yaml
run: cp .asf.yaml ./public
44 changes: 22 additions & 22 deletions content/docs/latest/language/languagemanual-udf.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions content/docs/latest/user/hive-transactions.md
@@ -101,7 +101,7 @@ This module is responsible for discovering which tables or partitions are due fo

#### Worker

Each Worker handles a single compaction task.  A compaction is a MapReduce job with name in the following form: \<hostname\>-compactor-\<db\>.\<table\>.\<partition\>.  Each worker submits the job to the cluster (via [hive.compactor.job.queue]({{< ref "#hive-compactor-job-queue" >}}) if defined) and waits for the job to finish.  [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions.
Each Worker handles a single compaction task.  A compaction is a MapReduce job with name in the following form: <hostname>-compactor-<db>.<table>.<partition>.  Each worker submits the job to the cluster (via [hive.compactor.job.queue]({{< ref "#hive-compactor-job-queue" >}}) if defined) and waits for the job to finish.  [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions.

#### Cleaner

@@ -178,7 +178,7 @@ A number of new configuration parameters have been added to the system to suppor
| metastore.compactor.long.running.initiator.threshold.error | *Default:* 12h | Metastore | Initiator cycle duration after which an error will be logged. Default time unit is: hours |
| hive.compactor.worker.sleep.time | *Default:*10800ms | HiveServer2 | Time in milliseconds for which a worker threads goes into sleep before starting another iteration in case of no launched job or error |
| hive.compactor.worker.max.sleep.time | *Default:* 320000ms | HiveServer2 | Max time in milliseconds for which a worker threads goes into sleep before starting another iteration used for backoff in case of no launched job or error |
| [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) deprecated. Use metastore.compactor.worker.threads instead. | *Default:* 0*Value required for transactions:* \> 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 |
| [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) deprecated. Use metastore.compactor.worker.threads instead. | *Default:* 0*Value required for transactions:* > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 |
| [hive.compactor.worker.timeout]({{< ref "#hive-compactor-worker-timeout" >}}) | *Default:* 86400s | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. |
| [hive.compactor.cleaner.run.interval]({{< ref "#hive-compactor-cleaner-run-interval" >}}) | *Default*: 5000ms | Metastore | Time in milliseconds between runs of the cleaner thread. ([Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-8258) and later.) |
| [hive.compactor.check.interval]({{< ref "#hive-compactor-check-interval" >}}) | *Default:* 300s | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3 |
@@ -244,7 +244,7 @@ If a table owner does not wish the system to automatically determine when to com

Table properties are set with the TBLPROPERTIES clause when a table is created or altered, as described in the [Create Table]({{< ref "#create-table" >}}) and [Alter Table Properties]({{< ref "#alter-table-properties" >}}) sections of Hive Data Definition Language. The "`transactional`" and "`NO_AUTO_COMPACTION`" table properties are case-sensitive in Hive releases 0.x and 1.0, but they are case-insensitive starting with release 1.1.0 ([HIVE-8308](https://issues.apache.org/jira/browse/HIVE-8308)).

More compaction related options can be set via TBLPROPERTIES as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354). They can be set at both table-level via [CREATE TABLE](/docs/latest/language/languagemanual-ddl#createdroptruncate-table), and on request-level via [ALTER TABLE/PARTITION COMPACT](/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact).  These are used to override the Warehouse/table wide settings.  For example, to override an MR property to affect a compaction job, one can add "compactor.\<mr property name\>=\<value\>" in either CREATE TABLE statement or when launching a compaction explicitly via ALTER TABLE.  The "\<mr property name\>=\<value\>" will be set on JobConf of the compaction MR job.   Similarly, "tblprops.\<prop name\>=\<value\>" can be used to set/override any table property which is interpreted by the code running on the cluster.  Finally, "compactorthreshold.\<prop name\>=\<value\>" can be used to override properties from the "New Configuration Parameters for Transactions" table above that end with ".threshold" and control when compactions are triggered by the system.  Examples:
More compaction related options can be set via TBLPROPERTIES as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354). They can be set at both table-level via [CREATE TABLE](/docs/latest/language/languagemanual-ddl#createdroptruncate-table), and on request-level via [ALTER TABLE/PARTITION COMPACT](/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact).  These are used to override the Warehouse/table wide settings.  For example, to override an MR property to affect a compaction job, one can add "compactor.<mr property name>=<value>" in either CREATE TABLE statement or when launching a compaction explicitly via ALTER TABLE.  The "<mr property name>=<value>" will be set on JobConf of the compaction MR job.   Similarly, "tblprops.<prop name>=<value>" can be used to set/override any table property which is interpreted by the code running on the cluster.  Finally, "compactorthreshold.<prop name>=<value>" can be used to override properties from the "New Configuration Parameters for Transactions" table above that end with ".threshold" and control when compactions are triggered by the system.  Examples:

**Example: Set compaction options in TBLPROPERTIES at table level**

24 changes: 12 additions & 12 deletions content/docs/latest/user/tutorial.md
@@ -108,7 +108,7 @@ Explicit type conversion can be done using the cast operator as shown in the [#B
Complex Types can be built up from primitive types and other composite types using:

* Structs: the elements within the type can be accessed using the DOT (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a
* Maps (key-value tuples): The elements are accessed using ['element name'] notation. For example in a map M comprising of a mapping from 'group' -\> gid the gid value can be accessed using M['group']
* Maps (key-value tuples): The elements are accessed using ['element name'] notation. For example in a map M comprising of a mapping from 'group' -> gid the gid value can be accessed using M['group']
* Arrays (indexable lists): The elements in the array have to be of the same type. Elements can be accessed using the [n] notation where n is an index (zero-based) into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] returns 'b'.
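
The three access notations above can be sketched in a single query. This is a minimal illustration only: the table `complex_demo` is hypothetical and does not appear in the tutorial, and the column names simply mirror the names used in the list above.

```sql
-- Hypothetical table for illustration; not part of the tutorial's schema.
CREATE TABLE complex_demo (
  c STRUCT<a:INT, b:INT>,
  M MAP<STRING, INT>,
  A ARRAY<STRING>
);

SELECT
  c.a,        -- struct field, accessed with DOT notation
  M['group'], -- map value, looked up by key
  A[1]        -- second array element (indexes are zero-based)
FROM complex_demo;
```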

Using the primitive types and the constructs for creating complex types, types with arbitrary levels of nesting can be created. For example, a type User may comprise of the following fields:
@@ -143,7 +143,7 @@ Java's "Instant" timestamps define a point in time that remains constant regardl

#### Comparisons with other tools

| | SQL 2003 | Oracle | Sybase | Postgres | MySQL | Microsoft SQL | IBM DB2 | Presto | Snowflake | Hive \>= 3.1 | Iceberg | Spark |
| | SQL 2003 | Oracle | Sybase | Postgres | MySQL | Microsoft SQL | IBM DB2 | Presto | Snowflake | Hive >= 3.1 | Iceberg | Spark |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| timestamp | Local | Local | Local | Local | Instant | Other | Local | Local | Local | Local | Local | Instant |
| timestamp with local time zone | | Instant | | | | | | | Instant | Instant | | |
@@ -177,10 +177,10 @@ All Hive keywords are case-insensitive, including the names of Hive operators an
| --- | --- | --- |
| A = B | all primitive types | TRUE if expression A is equivalent to expression B; otherwise FALSE |
| A != B | all primitive types | TRUE if expression A is *not* equivalent to expression B; otherwise FALSE |
| A \< B | all primitive types | TRUE if expression A is less than expression B; otherwise FALSE |
| A \<= B | all primitive types | TRUE if expression A is less than or equal to expression B; otherwise FALSE |
| A \> B | all primitive types | TRUE if expression A is greater than expression B; otherwise FALSE |
| A \>= B | all primitive types | TRUE if expression A is greater than or equal to expression B otherwise FALSE |
| A < B | all primitive types | TRUE if expression A is less than expression B; otherwise FALSE |
| A <= B | all primitive types | TRUE if expression A is less than or equal to expression B; otherwise FALSE |
| A > B | all primitive types | TRUE if expression A is greater than expression B; otherwise FALSE |
| A >= B | all primitive types | TRUE if expression A is greater than or equal to expression B otherwise FALSE |
| A IS NULL | all types | TRUE if expression A evaluates to NULL otherwise FALSE |
| A IS NOT NULL | all types | FALSE if expression A evaluates to NULL otherwise TRUE |
| A LIKE B | strings | TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to **.** in posix regular expressions), and the % character in B matches an arbitrary number of characters in A (similar to **.*** in posix regular expressions). For example, `'foobar' LIKE 'foo'` evaluates to FALSE where as `'foobar' LIKE 'foo___'` evaluates to TRUE and so does `'foobar' LIKE 'foo%'`. To escape % use \ (% matches one % character). If the data contains a semicolon, and you want to search for it, it needs to be escaped, `columnValue LIKE 'a\;b'` |
@@ -217,7 +217,7 @@ All Hive keywords are case-insensitive, including the names of Hive operators an
| **Operator** | **Operand types** | **Description** |
| --- | --- | --- |
| A[n] | A is an Array and n is an int | returns the nth element in the array A. The first element has index 0, for example, if A is an array comprising of ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar' |
| M[key] | M is a Map\<K, V\> and key has type K | returns the value corresponding to the key in the map for example, if M is a map comprising of {'f' -\> 'foo', 'b' -\> 'bar', 'all' -\> 'foobar'} then M['all'] returns 'foobar' |
| M[key] | M is a Map<K, V> and key has type K | returns the value corresponding to the key in the map for example, if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar' |
| S.x | S is a struct | returns the x field of S, for example, for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct. |

### Built In Functions
@@ -242,9 +242,9 @@ All Hive keywords are case-insensitive, including the names of Hive operators an
| string | ltrim(string A) | returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar ' |
| string | rtrim(string A) | returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar' |
| string | regexp_replace(string A, string B, string C) | returns the string resulting from replacing all substrings in A that match the Java regular expression B (see [Java regular expressions syntax](http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html)) with C. For example, regexp_replace('foobar', 'oo|ar', '') returns 'fb' |
| int | size(Map\<K.V\>) | returns the number of elements in the map type |
| int | size(Array\<T\>) | returns the number of elements in the array type |
| *value of \<type\>* | cast(*\<expr\>* as *\<type\>*) | converts the results of the expression expr to \<type\>, for example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. |
| int | size(Map<K.V>) | returns the number of elements in the map type |
| int | size(Array<T>) | returns the number of elements in the array type |
| *value of <type>* | cast(*<expr>* as *<type>*) | converts the results of the expression expr to <type>, for example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. |
| string | from_unixtime(int unixtime) | convert the number of seconds from the UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" |
| string | to_date(string timestamp) | Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01" |
| int | year(string date) | Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 |
@@ -827,7 +827,7 @@ Array columns in tables can be as follows:
CREATE TABLE array_table (int_array_column ARRAY<INT>);
```

Assuming that pv.friends is of the type ARRAY\<INT\> (i.e. it is an array of integers), the user can get a specific element in the array by its index as shown in the following command:
Assuming that pv.friends is of the type ARRAY<INT> (i.e. it is an array of integers), the user can get a specific element in the array by its index as shown in the following command:

```
SELECT pv.friends[2]
@@ -847,7 +847,7 @@ The user can also get the length of the array using the size function as shown b

### Map (Associative Arrays) Operations

Maps provide collections similar to associative arrays. Such structures can only be created programmatically currently. We will be extending this soon. For the purpose of the current example assume that pv.properties is of the type map\<String, String\> i.e. it is an associative array from strings to string. Accordingly, the following query:
Maps provide collections similar to associative arrays. Such structures can only be created programmatically currently. We will be extending this soon. For the purpose of the current example assume that pv.properties is of the type map<String, String> i.e. it is an associative array from strings to string. Accordingly, the following query:

```
INSERT OVERWRITE page_views_map
8 changes: 4 additions & 4 deletions content/docs/latest/webhcat/webhcat-configure.md
@@ -52,7 +52,7 @@ The webhcat-log4j.properties file sets the location of the log files created by
| **templeton.hcat** | The path to the HCatalog executable. |
| **templeton.hive.archive** | The path to the Hive archive. |
| **templeton.hive.path** | The path to the Hive executable. |
| **templeton.hive.properties** | Properties to set when running Hive (during job submission).  This is expected to be a comma-separated prop=value list. If some value is itself a comma-separated list, the escape character is '\\' (from [Hive 0.13.1](https://issues.apache.org/jira/browse/HIVE-4576) onward).To use it in a cluster with Kerberos security enabled, set `hive.metastore.sasl.enabled=false` and add `hive.metastore.execute.setugi=true`. Using localhost in metastore URI does not work with Kerberos security. |
| **templeton.hive.properties** | Properties to set when running Hive (during job submission).  This is expected to be a comma-separated prop=value list. If some value is itself a comma-separated list, the escape character is '\' </description> (from [Hive 0.13.1](https://issues.apache.org/jira/browse/HIVE-4576) onward).To use it in a cluster with Kerberos security enabled, set `hive.metastore.sasl.enabled=false` and add `hive.metastore.execute.setugi=true`. Using localhost in metastore URI does not work with Kerberos security. |
| **templeton.exec.encoding** | The encoding of the stdout and stderr data. |
| **templeton.exec.timeout** | How long in milliseconds a program is allowed to run on the WebHCat box. |
| **templeton.exec.max-procs** | The maximum number of processes allowed to run at once. |
@@ -74,15 +74,15 @@ The webhcat-log4j.properties file sets the location of the log files created by
| **templeton.kerberos.keytab** | The keytab file containing the credentials for the Kerberos principal. |
| **templeton.hadoop.queue.name** | MapReduce queue name where WebHCat map-only jobs will be submitted to. Can be used to avoid a deadlock where all map slots in the cluster are taken over by Templeton launcher tasks.Versions: [Hive 0.12.0](https://issues.apache.org/jira/browse/HIVE-4679) and later. |
| **templeton.mapper.memory.mb** | WebHCat controller job's Launch mapper's memory limit in megabytes. When submitting a controller job, WebHCat will overwrite `mapreduce.map.memory.mb` with this value. If empty, WebHCat will not set `mapreduce.map.memory.mb` when submitting the controller job, therefore the configuration in mapred-site.xml will be used.Versions: [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7155) and later. |
| **templeton.frame.options.filter** | Adds web server protection from clickjacking using X-Frame-Options header. The possible values are DENY, SAMEORIGIN, ALLOW-FROM \<uri\>.Versions: [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-17679) and later. |
| **templeton.frame.options.filter** | Adds web server protection from clickjacking using X-Frame-Options header. The possible values are DENY, SAMEORIGIN, ALLOW-FROM <uri>.Versions: [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-17679) and later. |

#### Default Values

Some of the default values for WebHCat configuration variables depend on the release number. For the default values in the Hive release you are using, see the webhcat-default.xml file. It can be found in the SVN repository at:

* http://svn.apache.org/repos/asf/hive/branches/branch-*\<release_number\>*/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml
* http://svn.apache.org/repos/asf/hive/branches/branch-*<release_number>*/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml

where *\<release_number\>* is 0.11, 0.12, and so on. Prior to Hive 0.11, WebHCat was in the Apache incubator.
where *<release_number>* is 0.11, 0.12, and so on. Prior to Hive 0.11, WebHCat was in the Apache incubator.

For example:

2 changes: 1 addition & 1 deletion content/docs/latest/webhcat/webhcat-installwebhcat.md
@@ -81,7 +81,7 @@ hadoop fs -put <hadoop streaming jar> \

```

where *\<templeton.streaming.jar\>* is a property value defined in `webhcat-default.xml` which can be overridden in the `webhcat-site.xml` file, and *\<hadoop streaming jar\>* is the Hadoop streaming jar in your Hadoop version:
where *<templeton.streaming.jar>* is a property value defined in `webhcat-default.xml` which can be overridden in the `webhcat-site.xml` file, and *<hadoop streaming jar>* is the Hadoop streaming jar in your Hadoop version:

+ `hadoop-1.*/contrib/streaming/hadoop-streaming-*.jar` in the Hadoop 1.x tar
+ `hadoop-2.*/share/hadoop/tools/lib/hadoop-streaming-*.jar` in the Hadoop 2.x tar
4 changes: 2 additions & 2 deletions content/general/PrivacyPolicy.md
@@ -36,9 +36,9 @@ the following:
5. The addresses of pages from where you followed a link to our site.

Part of this information is gathered using a tracking cookie set by the
[Google Analytics](http://www.google.com/analytics/)
<a href="http://www.google.com/analytics/">Google Analytics</a>
service and handled by Google as
described in their [privacy policy](http://www.google.com/privacy.html).
described in their <a href="http://www.google.com/privacy.html">privacy policy</a>.
See your browser documentation for instructions on how to disable the
cookie if you prefer not to share this data with Google.
