[core] Add row filter & column masking support for table read #7034
Conversation
TableQueryAuth queryAuth =
        ((FileStoreTable) table).catalogEnvironment().tableQueryAuth(options);
TableQueryAuthResult authResult =
        queryAuth.auth(readType == null ? null : readType.getFieldNames());
A new Read will be invoked in every worker. We should not let so many workers make RPCs to the metastore.
I think we could introduce a QueryAuthSplit that carries the filter and masking, and wrap the new TableRead here to apply the filter and masking.
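To make the suggestion concrete, here is a minimal, self-contained sketch of the idea: the auth result is resolved once at planning time, travels inside the split, and a wrapping read applies it. None of the types below are the project's real API; `SimpleSplit`, `SimpleRead`, `AuthSplit`, and `AuthAwareRead` are hypothetical stand-ins, and evaluating the row filter before masking is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; these are stand-ins, not the project's Split/TableRead types. */
public class QueryAuthSketch {

    /** Stand-in for a split: here it simply exposes its rows. */
    public interface SimpleSplit {
        List<Map<String, Object>> rows();
    }

    /** Stand-in for a reader that turns a split into rows. */
    public interface SimpleRead {
        List<Map<String, Object>> read(SimpleSplit split);
    }

    /** Wraps a split and carries the auth result, so workers need no metastore RPC. */
    public static final class AuthSplit implements SimpleSplit {
        final SimpleSplit inner;
        final Predicate<Map<String, Object>> rowFilter;
        final Map<String, UnaryOperator<Object>> columnMasking;

        public AuthSplit(
                SimpleSplit inner,
                Predicate<Map<String, Object>> rowFilter,
                Map<String, UnaryOperator<Object>> columnMasking) {
            this.inner = inner;
            this.rowFilter = rowFilter;
            this.columnMasking = columnMasking;
        }

        @Override
        public List<Map<String, Object>> rows() {
            return inner.rows();
        }
    }

    /** Wraps a reader: filters rows first, then masks columns on a copy of each row. */
    public static final class AuthAwareRead implements SimpleRead {
        private final SimpleRead delegate;

        public AuthAwareRead(SimpleRead delegate) {
            this.delegate = delegate;
        }

        @Override
        public List<Map<String, Object>> read(SimpleSplit split) {
            if (!(split instanceof AuthSplit)) {
                return delegate.read(split); // no auth attached: plain read
            }
            AuthSplit auth = (AuthSplit) split;
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : delegate.read(auth.inner)) {
                if (!auth.rowFilter.test(row)) {
                    continue; // row filter is evaluated on unmasked values (assumption)
                }
                Map<String, Object> masked = new HashMap<>(row);
                auth.columnMasking.forEach(
                        (col, mask) -> masked.computeIfPresent(col, (k, v) -> mask.apply(v)));
                out.add(masked);
            }
            return out;
        }
    }

    /** Builds two rows, keeps those with id > 1, and masks the "secret" column. */
    public static List<Map<String, Object>> demo() {
        Map<String, Object> r1 = new HashMap<>();
        r1.put("id", 1);
        r1.put("secret", "alpha");
        Map<String, Object> r2 = new HashMap<>();
        r2.put("id", 2);
        r2.put("secret", "beta");
        SimpleSplit raw = () -> Arrays.asList(r1, r2);

        Map<String, UnaryOperator<Object>> masks = new HashMap<>();
        masks.put("secret", v -> "****");
        SimpleSplit planned = new AuthSplit(raw, row -> (Integer) row.get("id") > 1, masks);

        SimpleRead read = new AuthAwareRead(SimpleSplit::rows);
        return read.read(planned);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the shape is that only the planner talks to the metastore; each worker just receives a split that already contains what it needs.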
public void testColumnMasking() {
    spark.sql(
            "CREATE TABLE t_column_masking (id INT, secret STRING, email STRING, phone STRING) TBLPROPERTIES"
                    + " ('bucket'='1', 'bucket-key'='id', 'file.format'='avro', 'query-auth.enabled'='true')");
No need to test a bucketed table. A table without buckets is enough.
done
Map<String, Transform> columnMasking = new HashMap<>();
columnMasking.put("secret", maskTransform);
restCatalogServer.setColumnMaskingAuth(
Also add a case for row filtering.
done
7bc811f to 5cc62fd
  /** Input splits. Needed by most batch computation engines. */
- public class DataSplit implements Split {
+ public class DataSplit implements QueryAuthSplit {
Don't modify DataSplit; just create a new class that wraps it.
DataSplit carries a heavy historical burden, and we need to keep it compatible. Its serialization protocol is shared across multiple languages.
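As an illustration of why wrapping keeps compatibility, the sketch below puts the auth payload in a separate wrapper that has its own framing and delegates to the wrapped split's existing serializer, so the wrapped bytes stay identical. `LegacySplit` and `AuthWrappedSplit` are hypothetical stand-ins, and the wire layout (version int, filter string, then the untouched legacy bytes) is an assumption of this example, not DataSplit's actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch: add auth data around a split without touching its wire format. */
public class SplitWrapperSketch {

    /** Stand-in for the existing split; its serialization must not change. */
    public static final class LegacySplit {
        public final String path;
        public final long length;

        public LegacySplit(String path, long length) {
            this.path = path;
            this.length = length;
        }

        public void writeTo(DataOutput out) throws IOException {
            out.writeUTF(path);
            out.writeLong(length);
        }

        public static LegacySplit readFrom(DataInput in) throws IOException {
            return new LegacySplit(in.readUTF(), in.readLong());
        }
    }

    /** Wrapper with its own versioned header; the legacy payload is written as-is. */
    public static final class AuthWrappedSplit {
        static final int VERSION = 1;
        public final LegacySplit inner;
        public final String rowFilterExpr; // illustrative auth payload

        public AuthWrappedSplit(LegacySplit inner, String rowFilterExpr) {
            this.inner = inner;
            this.rowFilterExpr = rowFilterExpr;
        }

        public void writeTo(DataOutput out) throws IOException {
            out.writeInt(VERSION);
            out.writeUTF(rowFilterExpr);
            inner.writeTo(out); // delegate: legacy bytes are unchanged
        }

        public static AuthWrappedSplit readFrom(DataInput in) throws IOException {
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("Unsupported wrapper version: " + version);
            }
            String filter = in.readUTF();
            return new AuthWrappedSplit(LegacySplit.readFrom(in), filter);
        }
    }

    public static byte[] legacyBytes(LegacySplit split) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            split.writeTo(new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static byte[] wrappedBytes(AuthWrappedSplit split) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            split.writeTo(new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static AuthWrappedSplit fromBytes(byte[] bytes) {
        try {
            return AuthWrappedSplit.readFrom(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AuthWrappedSplit split = new AuthWrappedSplit(new LegacySplit("part-0", 128L), "id > 1");
        AuthWrappedSplit copy = fromBytes(wrappedBytes(split));
        System.out.println(copy.inner.path + " " + copy.rowFilterExpr);
    }
}
```

Readers that only know the legacy format never see the wrapper, and any other language implementation that decodes the legacy split keeps working unchanged.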
Got it!
dc73579 to 942da63
+1
Purpose
Add row filter & column masking support for table read