Several cluster parameters have to be configured client-side, in different files, in order to interact with a Hadoop cluster.
These parameters can be retrieved from the Web interfaces of the various Hadoop components and have to be placed in the following files:
**core-site.xml**

Parameter name | Parameter description | Parameter example |
---|---|---|
fs.defaultFS | The URI (IP and port) of the NameNode, used for filesystem metadata interaction | hdfs://1.2.3.4:8020 |
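In each of these files, parameters are declared as standard Hadoop `<property>` entries. As an illustration, a minimal core-site.xml sketch using the example value above (replace the address with the target cluster's NameNode):

```xml
<configuration>
  <!-- NameNode URI of the target cluster (example value from the table above) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://1.2.3.4:8020</value>
  </property>
</configuration>
```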
**hdfs-site.xml**

Parameter name | Parameter description | Parameter example |
---|---|---|
dfs.datanode.address | The IP and port of the DataNode, used for file transfer | 1.2.3.4:50010 |
dfs.client.use.datanode.hostname | true or false depending on the cluster architecture (multihomed vs. single-homed); specifies whether HDFS clients should use hostnames instead of IPs to connect to DataNodes. In most cases, it should be set to true | true |
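Similarly, a minimal hdfs-site.xml sketch based on the two parameters above (the DataNode address is a placeholder to adapt to the target cluster):

```xml
<configuration>
  <!-- DataNode data transfer address of the target cluster -->
  <property>
    <name>dfs.datanode.address</name>
    <value>1.2.3.4:50010</value>
  </property>
  <!-- Connect to DataNodes using hostnames rather than IPs (recommended in most cases) -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
```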
**yarn-site.xml**

Parameter name | Parameter description | Parameter example |
---|---|---|
yarn.resourcemanager.hostname | The hostname of the ResourceManager, used to execute jobs | foobar.example.com |
yarn.resourcemanager.address | The IP and port of the ResourceManager | ${yarn.resourcemanager.hostname}:8050 |
yarn.application.classpath | The classpath used by YARN applications, to be included when executing jobs | $HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/* |
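A minimal yarn-site.xml sketch using the example values above; yarn.application.classpath follows the same `<property>` pattern with the classpath string as its value:

```xml
<configuration>
  <!-- ResourceManager of the target cluster (example values from the table above) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>foobar.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8050</value>
  </property>
</configuration>
```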
**mapred-site.xml**

Parameter name | Parameter description | Parameter example |
---|---|---|
mapreduce.framework.name | The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn, but should be yarn | yarn |
mapreduce.jobhistory.address | The IP and port of the JobHistory server, used to track jobs | foobar.example.com:10020 |
mapreduce.application.classpath | The classpath of MapReduce, to be included when executing jobs | $PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure |
mapreduce.application.framework.path | The path of a custom job framework, notably used in Hortonworks clusters | /hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework |
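And a minimal mapred-site.xml sketch; the classpath and framework path parameters follow the same pattern, using the values from the table above:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- JobHistory server of the target cluster (example value from the table above) -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>foobar.example.com:10020</value>
  </property>
</configuration>
```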
By default, these files have to be placed in the following local folder of the attacker environment:
<hadoop_installation>/etc/hadoop
If you followed the Setting up an Hadoop attack environment tutorial, they should therefore be placed in:
/opt/hadoop-2.7.3/etc/hadoop
You can also use a custom folder and specify its path in Hadoop commands with the --config
option:
$ hadoop -h
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
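For instance, assuming the generated configuration files are stored in a hypothetical /tmp/target-cluster folder, listing the root of the remote HDFS filesystem would look like:

```
$ hadoop --config /tmp/target-cluster fs -ls /
```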
The cluster configuration can be retrieved at the /conf URI of the WebUI of any native Hadoop component, including:
- HDFS NameNode WebUI, on port HTTP/50070 or HTTPS/50470
- HDFS DataNode WebUI, on port HTTP/50075 or HTTPS/50475
- Secondary NameNode WebUI, on port HTTP/50090
- YARN ResourceManager WebUI, on port HTTP/8088 or HTTPS/8090
- YARN NodeManager WebUI, on port HTTP/8042 or HTTPS/8044
- MapReduce v2 JobHistory Server WebUI, on port HTTP/19888 or HTTPS/19890
- MapReduce v1 JobTracker WebUI, on port HTTP/50030
- MapReduce v1 TaskTracker WebUI, on port HTTP/50060
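For example, the configuration exposed by a NameNode WebUI can be dumped with a simple HTTP request (the hostname below is a placeholder for the target NameNode):

```
$ curl http://namenode.example.com:50070/conf
```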
HadoopSnooper
has been developed to allow attackers to easily retrieve a suitable minimal client-side configuration from the configuration files exposed on Hadoop components' Web interfaces.
It simply grabs the remote cluster configuration, parses it and generates the appropriate configuration files to be used in the attacker environment.
$ python hadoopsnooper.py -h
usage: hadoopsnooper.py [-h] --nn NN [--dn DN] [-o OUTPUT_DIR] [--batch] host
positional arguments:
host host
optional arguments:
-h, --help show this help message and exit
--nn NN Cluster namenode (format: hdfs://namenode.hadoop:8020)
--dn DN Cluster datanodes addresses
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
Output directory
--batch Never ask for user input, use the default behaviour
HadoopSnooper requires:
- Python 2
- the requests and lxml modules: $ pip install requests lxml
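A hedged usage sketch following the usage string above (the hostnames and output directory are placeholders to adapt to the target cluster and local setup):

```
$ python hadoopsnooper.py --nn hdfs://namenode.hadoop:8020 -o /opt/hadoop-2.7.3/etc/hadoop namenode.hadoop
```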