diff --git a/.github/workflows/gh-pages.yml b/.github/workflows/gh-pages.yml index 485db67f..c1b5dbb0 100644 --- a/.github/workflows/gh-pages.yml +++ b/.github/workflows/gh-pages.yml @@ -46,7 +46,7 @@ jobs: # extended: true - name: Build - run: hugo --minify + run: hugo --minify --panicOnWarning - name: Copy .asf.yaml run: cp .asf.yaml ./public diff --git a/content/docs/latest/language/languagemanual-udf.md b/content/docs/latest/language/languagemanual-udf.md index b6247878..92341edf 100644 --- a/content/docs/latest/language/languagemanual-udf.md +++ b/content/docs/latest/language/languagemanual-udf.md @@ -188,12 +188,12 @@ The following built-in collection functions are supported in Hive: | **Return Type** | **Name(Signature)** | **Description** | | --- | --- | --- | -| int | size(Map\) | Returns the number of elements in the map type. | -| int | size(Array\) | Returns the number of elements in the array type. | -| array\ | map_keys(Map\) | Returns an unordered array containing the keys of the input map. | -| array\ | map_values(Map\) | Returns an unordered array containing the values of the input map. | -| boolean | array_contains(Array\, value) | Returns TRUE if the array contains value. | -| array\ | sort_array(Array\) | Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version [0.9.0](https://issues.apache.org/jira/browse/HIVE-2279)). | +| int | size(Map) | Returns the number of elements in the map type. | +| int | size(Array) | Returns the number of elements in the array type. | +| array | map_keys(Map) | Returns an unordered array containing the keys of the input map. | +| array | map_values(Map) | Returns an unordered array containing the values of the input map. | +| boolean | array_contains(Array, value) | Returns TRUE if the array contains value. 
| +| array | sort_array(Array) | Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version [0.9.0](https://issues.apache.org/jira/browse/HIVE-2279)). | ### Type Conversion Functions @@ -202,7 +202,7 @@ The following type conversion functions are supported in Hive: | Return Type | Name(Signature) | Description | | --- | --- | --- | | binary | binary(string|binary) | Casts the parameter into a binary. | -| **Expected "=" to follow "type"** | cast(expr as \) | Converts the results of the expression expr to \. For example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. If cast(expr as boolean) Hive returns true for a non-empty string. | +| **Expected "=" to follow "type"** | cast(expr as ) | Converts the results of the expression expr to . For example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. If cast(expr as boolean) Hive returns true for a non-empty string. | ### Date Functions @@ -272,9 +272,9 @@ The following built-in String functions are supported in Hive: | int | character_length(string str) | Returns the number of UTF-8 characters contained in str (as of Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-15979)). The function char_length is shorthand for this function. | | string | chr(bigint|double A) | Returns the ASCII character having the binary equivalent to A (as of Hive [1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13063)). If A is larger than 256 the result is equivalent to chr(A % 256). Example: select chr(88); returns "X". | | string | concat(string|binary A, string|binary B...) | Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. 
Note that this function can take any number of input strings. | -| array\\> | context_ngrams(array\\>, array\, int K, int pf) | Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | +| array> | context_ngrams(array>, array, int K, int pf) | Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | | string | concat_ws(string SEP, string A, string B...) | Like concat() above, but with custom separator SEP. | -| string | concat_ws(string SEP, array\) | Like concat_ws() above, but taking an array of strings. (as of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2203)) | +| string | concat_ws(string SEP, array) | Like concat_ws() above, but taking an array of strings. (as of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2203)) | | string | decode(binary bin, string charset) | Decodes the first argument into a String using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null. (As of Hive [0.12.0](https://issues.apache.org/jira/browse/HIVE-2482).) | | string | elt(N int,str1 string,str2 string,str3 string,...) | Return string at index number. For example elt(2,'hello','world') returns 'world'. Returns NULL if N is less than 1 or greater than the number of arguments.(see ) | | binary | encode(string src, string charset) | Encodes the first argument into a BINARY using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null. (As of Hive [0.12.0](https://issues.apache.org/jira/browse/HIVE-2482).) 
| @@ -289,7 +289,7 @@ The following built-in String functions are supported in Hive: | string | lower(string A) lcase(string A) | Returns the string resulting from converting all characters of B to lower case. For example, lower('fOoBaR') results in 'foobar'. | | string | lpad(string str, int len, string pad) | Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null. | | string | ltrim(string A) | Returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar '. | -| array\\> | ngrams(array\\>, int N, int K, int pf) | Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | +| array> | ngrams(array>, int N, int K, int pf) | Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | | int | octet_length(string str) | Returns the number of octets required to hold the string str in UTF-8 encoding (since Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-15979)). Note that octet_length(str) can be larger than character_length(str). | | string | parse_url(string urlString, string partToExtract [, string keyToExtract]) | Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. Also a value of a particular key in QUERY can be extracted by providing the key as the third argument, for example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'. 
| | string | printf(String format, Obj... args) | Returns the input formatted according do printf-style format strings (as of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2695)). | @@ -311,10 +311,10 @@ The following built-in String functions are supported in Hive: | string | reverse(string A) | Returns the reversed string. | | string | rpad(string str, int len, string pad) | Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null. | | string | rtrim(string A) | Returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar'. | -| array\\> | sentences(string str, string lang, string locale) | Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The 'lang' and 'locale' are optional arguments. For example, sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ). | +| array> | sentences(string str, string lang, string locale) | Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The 'lang' and 'locale' are optional arguments. For example, sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ). | | string | space(int n) | Returns a string of n spaces. | | array | split(string str, string pat) | Splits str around pat (pat is a regular expression). | -| map\ | str_to_map(text[, delimiter1, delimiter2]) | Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2. 
| +| map | str_to_map(text[, delimiter1, delimiter2]) | Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and ':' for delimiter2. | | string | substr(string|binary A, int start) substring(string|binary A, int start) | Returns the substring or slice of the byte array of A starting from start position till the end of string A. For example, substr('foobar', 4) results in 'bar' (see []). | | string | substr(string|binary A, int start, int len) substring(string|binary A, int start, int len) | Returns the substring or slice of the byte array of A starting from start position with length len. For example, substr('foobar', 4, 1) results in 'b' (see []). | | string | substring_index(string A, string delim, int count) | Returns the substring from string A before count occurrences of the delimiter delim (as of Hive [1.3.0](https://issues.apache.org/jira/browse/HIVE-686)). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'. | @@ -433,9 +433,9 @@ The following built-in aggregate functions are supported in Hive: | DOUBLE | covar_samp(col1, col2) | Returns the sample covariance of a pair of a numeric columns in the group. | | DOUBLE | corr(col1, col2) | Returns the Pearson coefficient of correlation of a pair of a numeric columns in the group. | | DOUBLE | percentile(BIGINT col, p) | Returns the exact pth percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral. 
| -| array\ | percentile(BIGINT col, array(p1 [, p2]...)) | Returns the exact percentiles p1, p2, ... of a column in the group (does not work with floating point types). pi must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral. | +| array | percentile(BIGINT col, array(p1 [, p2]...)) | Returns the exact percentiles p1, p2, ... of a column in the group (does not work with floating point types). pi must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral. | | DOUBLE | percentile_approx(DOUBLE col, p [, B]) | Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value. | -| array\ | percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]) | Same as above, but accepts and returns an array of percentile values instead of a single one. | +| array | percentile_approx(DOUBLE col, array(p1 [, p2]...) [, B]) | Same as above, but accepts and returns an array of percentile values instead of a single one. | | double | regr_avgx(independent, dependent) | Equivalent to avg(dependent). As of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15978). | | double | regr_avgy(independent, dependent) | Equivalent to avg(independent). As of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15978). | | double | regr_count(independent, dependent) | Returns the number of non-null pairs used to fit the linear regression line. As of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15978). 
| @@ -445,7 +445,7 @@ The following built-in aggregate functions are supported in Hive: | double | regr_sxx(independent, dependent) | Equivalent to regr_count(independent, dependent) * var_pop(dependent). As of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15978). | | double | regr_sxy(independent, dependent) | Equivalent to regr_count(independent, dependent) * covar_pop(independent, dependent). As of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15978). | | double | regr_syy(independent, dependent) | Equivalent to regr_count(independent, dependent) * var_pop(independent). As of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15978). | -| array\ | histogram_numeric(col, b) | Computes a histogram of a numeric column in the group using b non-uniformly spaced bins. The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights | +| array | histogram_numeric(col, b) | Computes a histogram of a numeric column in the group using b non-uniformly spaced bins. The output is an array of size b of double-valued (x,y) coordinates that represent the bin centers and heights | | array | collect_set(col) | Returns a set of objects with duplicate elements eliminated. | | array | collect_list(col) | Returns a list of objects with duplicates. (As of Hive [0.13.0](https://issues.apache.org/jira/browse/HIVE-5294).) | | INTEGER | ntile(INTEGER x) | Divides an ordered partition into `x` groups called buckets and assigns a bucket number to each row in the partition. This allows easy calculation of tertiles, quartiles, deciles, percentiles and other common summary statistics. (As of Hive [0.11.0](https://issues.apache.org/jira/browse/HIVE-896).) | @@ -456,14 +456,14 @@ Normal user-defined functions, such as concat(), take in a single input row and | **Row-set columns types** | **Name(Signature)** | **Description** | | --- | --- | --- | -| T | explode(ARRAY\ a) | Explodes an array to multiple rows. 
Returns a row-set with a single column (*col*), one row for each element from the array. | -| Tkey,Tvalue | explode(MAP\ m) | Explodes a map to multiple rows. Returns a row-set with a two columns (*key,value)* , one row for each key-value pair from the input map. (As of Hive [0.8.0](https://issues.apache.org/jira/browse/HIVE-1735).). | -| int,T | posexplode(ARRAY\ a) | Explodes an array to multiple rows with additional positional column of *int* type (position of items in the original array, starting with 0). Returns a row-set with two columns (*pos,val*), one row for each element from the array. | -| T1,...,Tn | inline(ARRAY\\> a) | Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive [0.10](https://issues.apache.org/jira/browse/HIVE-3238).) | +| T | explode(ARRAY a) | Explodes an array to multiple rows. Returns a row-set with a single column (*col*), one row for each element from the array. | +| Tkey,Tvalue | explode(MAP m) | Explodes a map to multiple rows. Returns a row-set with a two columns (*key,value)* , one row for each key-value pair from the input map. (As of Hive [0.8.0](https://issues.apache.org/jira/browse/HIVE-1735).). | +| int,T | posexplode(ARRAY a) | Explodes an array to multiple rows with additional positional column of *int* type (position of items in the original array, starting with 0). Returns a row-set with two columns (*pos,val*), one row for each element from the array. | +| T1,...,Tn | inline(ARRAY> a) | Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive [0.10](https://issues.apache.org/jira/browse/HIVE-3238).) | | T1,...,Tn/r | stack(int r,T1 V1,...,Tn/r Vn) | Breaks up *n* values V1,...,Vn into *r* rows. Each row will have *n/r* columns. *r* must be constant. 
| | | | | | string1,...,stringn | json_tuple(string jsonStr,string k1,...,string kn) | Takes JSON string and a set of *n* keys, and returns a tuple of *n* values. This is a more efficient version of the `get_json_object` UDF because it can get multiple keys with just one call. | -| string 1,...,stringn | parse_url_tuple(string urlStr,string p1,...,string pn) | Takes URL string and a set of *n* URL parts, and returns a tuple of *n* values. This is similar to the `parse_url()` UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:\. | +| string 1,...,stringn | parse_url_tuple(string urlStr,string p1,...,string pn) | Takes URL string and a set of *n* URL parts, and returns a tuple of *n* values. This is similar to the `parse_url()` UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:. | @@ -576,7 +576,7 @@ Also see [Writing UDTFs]({{< ref "developerguide-udtf" >}}) if you want to creat As an example of using `explode()` in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows: -| Array\ myCol | +| Array myCol | | --- | | [100,200,300] | | [400,500,600] | @@ -615,7 +615,7 @@ Available as of Hive 0.13.0. 
See [HIVE-4943](https://issues.apache.org/jira/brow As an example of using `posexplode()` in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows: -| Array\ myCol | +| Array myCol | | --- | | [100,200,300] | | [400,500,600] | diff --git a/content/docs/latest/user/hive-transactions.md b/content/docs/latest/user/hive-transactions.md index 2a368ab1..30258c3e 100644 --- a/content/docs/latest/user/hive-transactions.md +++ b/content/docs/latest/user/hive-transactions.md @@ -101,7 +101,7 @@ This module is responsible for discovering which tables or partitions are due fo #### Worker -Each Worker handles a single compaction task.  A compaction is a MapReduce job with name in the following form: \-compactor-\.\.\.  Each worker submits the job to the cluster (via [hive.compactor.job.queue]({{< ref "#hive-compactor-job-queue" >}}) if defined) and waits for the job to finish.  [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions. +Each Worker handles a single compaction task.  A compaction is a MapReduce job with name in the following form: -compactor-...  Each worker submits the job to the cluster (via [hive.compactor.job.queue]({{< ref "#hive-compactor-job-queue" >}}) if defined) and waits for the job to finish.  [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions. #### Cleaner @@ -178,7 +178,7 @@ A number of new configuration parameters have been added to the system to suppor | metastore.compactor.long.running.initiator.threshold.error | *Default:* 12h | Metastore | Initiator cycle duration after which an error will be logged. 
Default time unit is: hours | | hive.compactor.worker.sleep.time | *Default:*10800ms | HiveServer2 | Time in milliseconds for which a worker threads goes into sleep before starting another iteration in case of no launched job or error | | hive.compactor.worker.max.sleep.time | *Default:* 320000ms | HiveServer2 | Max time in milliseconds for which a worker threads goes into sleep before starting another iteration used for backoff in case of no launched job or error | -| [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) deprecated. Use metastore.compactor.worker.threads instead. | *Default:* 0*Value required for transactions:* \> 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 | +| [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) deprecated. Use metastore.compactor.worker.threads instead. | *Default:* 0*Value required for transactions:* > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 | | [hive.compactor.worker.timeout]({{< ref "#hive-compactor-worker-timeout" >}}) | *Default:* 86400s | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. | | [hive.compactor.cleaner.run.interval]({{< ref "#hive-compactor-cleaner-run-interval" >}}) | *Default*: 5000ms | Metastore | Time in milliseconds between runs of the cleaner thread. ([Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-8258) and later.) 
| | [hive.compactor.check.interval]({{< ref "#hive-compactor-check-interval" >}}) | *Default:* 300s | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3 | @@ -244,7 +244,7 @@ If a table owner does not wish the system to automatically determine when to com Table properties are set with the TBLPROPERTIES clause when a table is created or altered, as described in the [Create Table]({{< ref "#create-table" >}}) and [Alter Table Properties]({{< ref "#alter-table-properties" >}}) sections of Hive Data Definition Language. The "`transactional`" and "`NO_AUTO_COMPACTION`" table properties are case-sensitive in Hive releases 0.x and 1.0, but they are case-insensitive starting with release 1.1.0 ([HIVE-8308](https://issues.apache.org/jira/browse/HIVE-8308)). -More compaction related options can be set via TBLPROPERTIES as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354). They can be set at both table-level via [CREATE TABLE](/docs/latest/language/languagemanual-ddl#createdroptruncate-table), and on request-level via [ALTER TABLE/PARTITION COMPACT](/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact).  These are used to override the Warehouse/table wide settings.  For example, to override an MR property to affect a compaction job, one can add "compactor.\=\" in either CREATE TABLE statement or when launching a compaction explicitly via ALTER TABLE.  The "\=\" will be set on JobConf of the compaction MR job.   Similarly, "tblprops.\=\" can be used to set/override any table property which is interpreted by the code running on the cluster.  Finally, "compactorthreshold.\=\" can be used to override properties from the "New Configuration Parameters for Transactions" table above that end with ".threshold" and control when compactions are triggered by the system.  
Examples: +More compaction related options can be set via TBLPROPERTIES as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354). They can be set at both table-level via [CREATE TABLE](/docs/latest/language/languagemanual-ddl#createdroptruncate-table), and on request-level via [ALTER TABLE/PARTITION COMPACT](/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact).  These are used to override the Warehouse/table wide settings.  For example, to override an MR property to affect a compaction job, one can add "compactor.=" in either CREATE TABLE statement or when launching a compaction explicitly via ALTER TABLE.  The "=" will be set on JobConf of the compaction MR job.   Similarly, "tblprops.=" can be used to set/override any table property which is interpreted by the code running on the cluster.  Finally, "compactorthreshold.=" can be used to override properties from the "New Configuration Parameters for Transactions" table above that end with ".threshold" and control when compactions are triggered by the system.  Examples: **Example: Set compaction options in TBLPROPERTIES at table level** diff --git a/content/docs/latest/user/tutorial.md b/content/docs/latest/user/tutorial.md index e03d3488..f2906226 100644 --- a/content/docs/latest/user/tutorial.md +++ b/content/docs/latest/user/tutorial.md @@ -108,7 +108,7 @@ Explicit type conversion can be done using the cast operator as shown in the [#B Complex Types can be built up from primitive types and other composite types using: * Structs: the elements within the type can be accessed using the DOT (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a -* Maps (key-value tuples): The elements are accessed using ['element name'] notation. 
For example in a map M comprising of a mapping from 'group' -\> gid the gid value can be accessed using M['group'] +* Maps (key-value tuples): The elements are accessed using ['element name'] notation. For example in a map M comprising of a mapping from 'group' -> gid the gid value can be accessed using M['group'] * Arrays (indexable lists): The elements in the array have to be in the same type. Elements can be accessed using the [n] notation where n is an index (zero-based) into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] retruns 'b'. Using the primitive types and the constructs for creating complex types, types with arbitrary levels of nesting can be created. For example, a type User may comprise of the following fields: @@ -143,7 +143,7 @@ Java's "Instant" timestamps define a point in time that remains constant regardl #### Comparisons with other tools -| | SQL 2003 | Oracle | Sybase | Postgres | MySQL | Microsoft SQL | IBM DB2 | Presto | Snowflake | Hive \>= 3.1 | Iceberg | Spark | +| | SQL 2003 | Oracle | Sybase | Postgres | MySQL | Microsoft SQL | IBM DB2 | Presto | Snowflake | Hive >= 3.1 | Iceberg | Spark | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | timestamp | Local | Local | Local | Local | Instant | Other | Local | Local | Local | Local | Local | Instant | | timestamp with local time zone | | Instant | | | | | | | Instant | Instant | | | @@ -177,10 +177,10 @@ All Hive keywords are case-insensitive, including the names of Hive operators an | --- | --- | --- | | A = B | all primitive types | TRUE if expression A is equivalent to expression B; otherwise FALSE | | A != B | all primitive types | TRUE if expression A is *not* equivalent to expression B; otherwise FALSE | -| A \< B | all primitive types | TRUE if expression A is less than expression B; otherwise FALSE | -| A \<= B | all primitive types | TRUE if expression A is less than or equal to expression B; otherwise FALSE | -| A 
\> B | all primitive types | TRUE if expression A is greater than expression B] otherwise FALSE | -| A \>= B | all primitive types | TRUE if expression A is greater than or equal to expression B otherwise FALSE | +| A < B | all primitive types | TRUE if expression A is less than expression B; otherwise FALSE | +| A <= B | all primitive types | TRUE if expression A is less than or equal to expression B; otherwise FALSE | +| A > B | all primitive types | TRUE if expression A is greater than expression B] otherwise FALSE | +| A >= B | all primitive types | TRUE if expression A is greater than or equal to expression B otherwise FALSE | | A IS NULL | all types | TRUE if expression A evaluates to NULL otherwise FALSE | | A IS NOT NULL | all types | FALSE if expression A evaluates to NULL otherwise TRUE | | A LIKE B | strings | TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to **.** in posix regular expressions), and the % character in B matches an arbitrary number of characters in A (similar to **.*** in posix regular expressions). For example, `'foobar' LIKE 'foo'` evaluates to FALSE where as `'foobar' LIKE 'foo___'` evaluates to TRUE and so does `'foobar' LIKE 'foo%'`. To escape % use \ (% matches one % character). If the data contains a semicolon, and you want to search for it, it needs to be escaped, `columnValue LIKE 'a\;b'` | @@ -217,7 +217,7 @@ All Hive keywords are case-insensitive, including the names of Hive operators an | **Operator** | **Operand types** | **Description** | | --- | --- | --- | | A[n] | A is an Array and n is an int | returns the nth element in the array A. 
The first element has index 0, for example, if A is an array comprising of ['foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar' |
-| M[key] | M is a Map\ and key has type K | returns the value corresponding to the key in the map for example, if M is a map comprising of {'f' -\> 'foo', 'b' -\> 'bar', 'all' -\> 'foobar'} then M['all'] returns 'foobar' |
+| M[key] | M is a Map and key has type K | returns the value corresponding to the key in the map for example, if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar' |
 | S.x | S is a struct | returns the x field of S, for example, for struct foobar {int foo, int bar} foobar.foo returns the integer stored in the foo field of the struct. |
 ### Built In Functions
@@ -242,9 +242,9 @@ All Hive keywords are case-insensitive, including the names of Hive operators an
 | string | ltrim(string A) | returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar ' |
 | string | rtrim(string A) | returns the string resulting from trimming spaces from the end(right hand side) of A. For example, rtrim(' foobar ') results in ' foobar' |
 | string | regexp_replace(string A, string B, string C) | returns the string resulting from replacing all substrings in B that match the Java regular expression syntax(See [Java regular expressions syntax](http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html)) with C. For example, regexp_replace('foobar', 'oo|ar', ) returns 'fb' |
-| int | size(Map\) | returns the number of elements in the map type |
-| int | size(Array\) | returns the number of elements in the array type |
-| *value of \* | cast(*\* as *\*) | converts the results of the expression expr to \, for example, cast('1' as BIGINT) will convert the string '1' to it integral representation. A null is returned if the conversion does not succeed. |
+| int | size(Map) | returns the number of elements in the map type |
+| int | size(Array) | returns the number of elements in the array type |
+| *value of * | cast(** as **) | converts the results of the expression expr to , for example, cast('1' as BIGINT) will convert the string '1' to it integral representation. A null is returned if the conversion does not succeed. |
 | string | from_unixtime(int unixtime) | convert the number of seconds from the UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" |
 | string | to_date(string timestamp) | Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01" |
 | int | year(string date) | Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 |
@@ -827,7 +827,7 @@ Array columns in tables can be as follows:
 CREATE TABLE array_table (int_array_column ARRAY);
 ```
-Assuming that pv.friends is of the type ARRAY\ (i.e. it is an array of integers), the user can get a specific element in the array by its index as shown in the following command:
+Assuming that pv.friends is of the type ARRAY (i.e. it is an array of integers), the user can get a specific element in the array by its index as shown in the following command:
 ```
 SELECT pv.friends[2]
@@ -847,7 +847,7 @@ The user can also get the length of the array using the size function as shown b
 ### Map (Associative Arrays) Operations
-Maps provide collections similar to associative arrays. Such structures can only be created programmatically currently. We will be extending this soon. For the purpose of the current example assume that pv.properties is of the type map\ i.e. it is an associative array from strings to string. Accordingly, the following query:
+Maps provide collections similar to associative arrays. Such structures can only be created programmatically currently. We will be extending this soon. For the purpose of the current example assume that pv.properties is of the type map i.e. it is an associative array from strings to string. Accordingly, the following query:
 ```
 INSERT OVERWRITE page_views_map
diff --git a/content/docs/latest/webhcat/webhcat-configure.md b/content/docs/latest/webhcat/webhcat-configure.md
index 831b68e8..31d18537 100644
--- a/content/docs/latest/webhcat/webhcat-configure.md
+++ b/content/docs/latest/webhcat/webhcat-configure.md
@@ -52,7 +52,7 @@ The webhcat-log4j.properties file sets the location of the log files created by
 | **templeton.hcat** | The path to the HCatalog executable. |
 | **templeton.hive.archive** | The path to the Hive archive. |
 | **templeton.hive.path** | The path to the Hive executable. |
-| **templeton.hive.properties** | Properties to set when running Hive (during job submission).  This is expected to be a comma-separated prop=value list. If some value is itself a comma-separated list, the escape character is '\\' (from [Hive 0.13.1](https://issues.apache.org/jira/browse/HIVE-4576) onward).To use it in a cluster with Kerberos security enabled, set `hive.metastore.sasl.enabled=false` and add `hive.metastore.execute.setugi=true`. Using localhost in metastore URI does not work with Kerberos security. |
+| **templeton.hive.properties** | Properties to set when running Hive (during job submission).  This is expected to be a comma-separated prop=value list. If some value is itself a comma-separated list, the escape character is '\' (from [Hive 0.13.1](https://issues.apache.org/jira/browse/HIVE-4576) onward).To use it in a cluster with Kerberos security enabled, set `hive.metastore.sasl.enabled=false` and add `hive.metastore.execute.setugi=true`. Using localhost in metastore URI does not work with Kerberos security. |
 | **templeton.exec.encoding** | The encoding of the stdout and stderr data. |
 | **templeton.exec.timeout** | How long in milliseconds a program is allowed to run on the WebHCat box. |
 | **templeton.exec.max-procs** | The maximum number of processes allowed to run at once. |
@@ -74,15 +74,15 @@ The webhcat-log4j.properties file sets the location of the log files created by
 | **templeton.kerberos.keytab** | The keytab file containing the credentials for the Kerberos principal. |
 | **templeton.hadoop.queue.name** | MapReduce queue name where WebHCat map-only jobs will be submitted to. Can be used to avoid a deadlock where all map slots in the cluster are taken over by Templeton launcher tasks.Versions: [Hive 0.12.0](https://issues.apache.org/jira/browse/HIVE-4679) and later. |
 | **templeton.mapper.memory.mb** | WebHCat controller job's Launch mapper's memory limit in megabytes. When submitting a controller job, WebHCat will overwrite `mapreduce.map.memory.mb` with this value. If empty, WebHCat will not set `mapreduce.map.memory.mb` when submitting the controller job, therefore the configuration in mapred-site.xml will be used.Versions: [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7155) and later. |
-| **templeton.frame.options.filter** | Adds web server protection from clickjacking using X-Frame-Options header. The possible values are DENY, SAMEORIGIN, ALLOW-FROM \.Versions: [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-17679) and later. |
+| **templeton.frame.options.filter** | Adds web server protection from clickjacking using X-Frame-Options header. The possible values are DENY, SAMEORIGIN, ALLOW-FROM .Versions: [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-17679) and later. |
 #### Default Values
 Some of the default values for WebHCat configuration variables depend on the release number. For the default values in the Hive release you are using, see the webhcat-default.xml file. It can be found in the SVN repository at:
-* http://svn.apache.org/repos/asf/hive/branches/branch-*\*/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml
+* http://svn.apache.org/repos/asf/hive/branches/branch-**/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml
-where *\* is 0.11, 0.12, and so on. Prior to Hive 0.11, WebHCat was in the Apache incubator.
+where ** is 0.11, 0.12, and so on. Prior to Hive 0.11, WebHCat was in the Apache incubator.
 For example:
diff --git a/content/docs/latest/webhcat/webhcat-installwebhcat.md b/content/docs/latest/webhcat/webhcat-installwebhcat.md
index 3053ddca..4273b81a 100644
--- a/content/docs/latest/webhcat/webhcat-installwebhcat.md
+++ b/content/docs/latest/webhcat/webhcat-installwebhcat.md
@@ -81,7 +81,7 @@ hadoop fs -put \
 ```
-where *\* is a property value defined in `webhcat-default.xml` which can be overridden in the `webhcat-site.xml` file, and *\* is the Hadoop streaming jar in your Hadoop version:
+where ** is a property value defined in `webhcat-default.xml` which can be overridden in the `webhcat-site.xml` file, and ** is the Hadoop streaming jar in your Hadoop version:
 + `hadoop-1.*/contrib/streaming/hadoop-streaming-*.jar` in the Hadoop 1.x tar
 + `hadoop-2.*/share/hadoop/tools/lib/hadoop-streaming-*.jar` in the Hadoop 2.x tar
diff --git a/content/general/PrivacyPolicy.md b/content/general/PrivacyPolicy.md
index 71ce8211..99f64980 100644
--- a/content/general/PrivacyPolicy.md
+++ b/content/general/PrivacyPolicy.md
@@ -36,9 +36,9 @@ the following:
 5. The addresses of pages from where you followed a link to our site.
 Part of this information is gathered using a tracking cookie set by the
-[Google Analytics](http://www.google.com/analytics/)
+Google Analytics
 service and handled by Google as
-described in their [privacy policy](http://www.google.com/privacy.html).
+described in their privacy policy.
See your browser documentation for instructions on how to disable the cookie if you prefer not to share this data with Google.