Original issue:
dask/dask#2852
I am tagging @bdrosen96 at the recommendation of @martindurant
Summary:
import hdfs3
token = hdfs3.HDFileSystem().delegate_token(user='jlord')
hdfs = hdfs3.HDFileSystem(token=token)
hdfs.ls('/user/jlord')  # works as expected
hdfs.ls('/user/hive/warehouse/database_name.db/table_name')  # raises FileNotFoundError (traceback below)
The error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-35-1412502a9b50> in <module>()
----> 1 hdfs.ls('/user/hive/warehouse/database_name.db/table_name')
/nas/isg_prodops_work/jlord/conda/envs/dask/lib/python3.6/site-packages/hdfs3/core.py in ls(self, path, detail)
333 """
334 if not self.exists(path):
--> 335 raise FileNotFoundError(path)
336 num = ctypes.c_int(0)
337 fi = _lib.hdfsListDirectory(self._handle, ensure_bytes(path),
FileNotFoundError: /user/hive/warehouse/database_name.db/table_name
I can use hdfs.ls on any directory I expect to have access to, except for the Hive warehouse, which is protected by Sentry. I have been told that Sentry uses ACLs to manage permissions on files and directories in the Hive warehouse.
Looking at the ACLs yields:
$ hadoop fs -getfacl /user/hive/warehouse/database_name.db/table_name
# file: /user/hive/warehouse/database_name.db/table_name
# owner: hive
# group: hive
user::rwx
user:hive:rwx
group::---
group:hive:rwx
group:hdpruff:rwx
group:hdpqra:r-x
mask::rwx
other::--x
I am not in group hive (which I assume is managed by Sentry), but I am in groups hdpruff and hdpqra.
hadoop fs -ls can list that directory, and I can use Spark to read the underlying Parquet file for that Hive table.
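As a sanity check on the ACL above, here is a minimal sketch of how a POSIX-style ACL is usually evaluated, which is the order I understand HDFS to follow: owner first, then named users, then group entries filtered by the mask, then other. It is not code from hdfs3 or Hadoop. The helper names parse_acl and effective_permissions are hypothetical, and the user name jlord is taken from the delegate_token call above. It only shows what the printed ACL should allow for my groups; it does not explain why hdfs3 reports the path as missing.

```python
# Simplified sketch of POSIX-style ACL evaluation (assumption: HDFS follows
# this order). Helper names are hypothetical; this is not hdfs3 or Hadoop code.

ACL_TEXT = """\
# file: /user/hive/warehouse/database_name.db/table_name
# owner: hive
# group: hive
user::rwx
user:hive:rwx
group::---
group:hive:rwx
group:hdpruff:rwx
group:hdpqra:r-x
mask::rwx
other::--x
"""


def parse_acl(text):
    """Parse `hadoop fs -getfacl` output into (owner, owning_group, entries)."""
    owner = owning_group = None
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# owner:"):
            owner = line.split(":", 1)[1].strip()
        elif line.startswith("# group:"):
            owning_group = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            kind, name, perms = line.split(":")
            # Keep only the granted bits, e.g. "r-x" becomes {"r", "x"}.
            entries.append((kind, name, set(perms.replace("-", ""))))
    return owner, owning_group, entries


def effective_permissions(user, groups, text):
    """Return the set of permission bits the ACL grants to `user`."""
    owner, owning_group, entries = parse_acl(text)
    mask = next((p for k, n, p in entries if k == "mask"), set("rwx"))

    # 1. The file owner uses the unnamed user entry (not filtered by the mask).
    if user == owner:
        return next(p for k, n, p in entries if k == "user" and n == "")

    # 2. A named user entry, filtered by the mask.
    for kind, name, perms in entries:
        if kind == "user" and name == user:
            return perms & mask

    # 3. Owning-group and named-group entries, filtered by the mask.
    #    The union is a simplification: for a single permission bit it matches
    #    the "any matching group entry grants access" rule.
    matched = [
        perms & mask
        for kind, name, perms in entries
        if kind == "group"
        and (name in groups or (name == "" and owning_group in groups))
    ]
    if matched:
        return set().union(*matched)

    # 4. Everyone else falls through to the other entry.
    return next(p for k, n, p in entries if k == "other")


if __name__ == "__main__":
    print(sorted(effective_permissions("jlord", {"hdpruff", "hdpqra"}, ACL_TEXT)))
```

Under these assumptions, membership in hdpruff alone gives read and execute on the directory, which matches what hadoop fs -ls and Spark show. A user with none of the listed groups would fall through to other and get execute only.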