什么是大数据 PB = 1024tb 7123913827189tb Reids 无共享 HDFS 优点 :特别适合存储大型文件 TFS hdfs 架构 NameNode: 整个hadoop总管,只有一个,DataNode down了 存储为镜像文件fsimage 和edites secondary 定期合并日志文件及镜像文件 DataNode 负责存储数据 以固定大小的block为基本单位组织文件内容 默认大小是64M MapReduce JobTracker 主要负责资源监控及作业调度。 TaskTrachker slot 分为Map slot Reduce slot Task map Task Reduce Tack 配置单台hadoop 伪分布式环境 1编辑 ~/.bashrc export HADOOP_HOME=/usr/local/hadoop //hadoop 安装路径 export HADOOP_INSTALL=$HADOOP_HOME export HADOOP_MAPRED_HOME=$HADOOP_HOME export HADOOP_COMMON_HOME=$HADOOP_HOME export HADOOP_HDFS_HOME=$HADOOP_HOME export YARN_HOME=$HADOOP_HOME export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin 保存后让设置生效 source ~/.bashrc ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+' 数据本地化 运算移动,数据不移动 需求:查询哪个账户钱最多 moneys[] ....//moneys = 56789778687687; max = 0L; for(i=0L:moneys){ if(i>max){ max=i; } } MapReduce Map1 Map 2 Map4 1233 4223423 423432 1000 800 1200 1200 ./etc/hadoop/core-site.xml <configuration> <property> <name>hadoop.tmp.dir</name> <value>file:/usr/local/hadoop/tmp</value> <description>Abase for other temporary directories.</description> </property> <property> <name>fs.defaultFS</name> <value>hdfs://localhost:9000</value> </property> </configuration> hdfs-site.xml <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.namenode.name.dir</name> <value>file:/usr/local/hadoop/tmp/dfs/name</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>file:/usr/local/hadoop/tmp/dfs/data</value> </property> </configuration> 优酷Java视频总汇地址:http://i.youku.com/i/UMTI4MTEzNTA0MA==?spm=a2hww.20023042.uerCenter.5~5!2~A Java企业框架原视频视频免费资料获取加群:575057881 |
小黑屋|在路上
( 蜀ICP备15035742号-1 )
GMT+8, 2025-7-10 09:57
Copyright 2015-2025 djqfx