#yyds干货盘点# Hadoop split logic: a walkthrough of the FileInputFormat.getSplits() source
To be honest, calling this a source-code walkthrough is a stretch: I don't really have that level of skill. I just watched a video and typed some comments into the source as I followed along. When Hadoop runs a MapReduce job, the input files are cut into splits, and that is where FileInputFormat.getSplits() comes in: it produces the list of splits. The code below is the original source; the inline comments are the notes I typed while following along. First, the package declaration, the imports, and the class declaration:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.mapreduce.lib.input;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.mapred.LocatedFileStatusFetcher;
import org.apache.hadoop.mapred.SplitLocationInfo;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.security.TokenCache;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.util.StopWatch;
import org.apache.hadoop.util.StringUtils;

import com.google.common.collect.Lists;

/**
 * A base class for file-based {@link InputFormat}s.
 *
 * <p><code>FileInputFormat</code> is the base class for all file-based
 * <code>InputFormat</code>s. This provides a generic implementation of
 * {@link #getSplits(JobContext)}.
 *
 * Subclasses of <code>FileInputFormat</code> can also override the
 * {@link #isSplitable(JobContext, Path)} method to ensure input-files are
 * not split-up and are processed as a whole by {@link Mapper}s.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class FileInputFormat<K, V> extends InputFormat<K, V> {
Then getSplits() itself (this is the Hadoop 2.x version; SPLIT_SLOP, NUM_INPUT_FILES and LOG are class members I haven't pasted, and the comments are my notes):

  /**
   * Generate the list of files and make them into FileSplits.
   * @param job the job context
   * @throws IOException
   */
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    StopWatch sw = new StopWatch().start();
    // minSize defaults to 1, maxSize defaults to Long.MAX_VALUE
    long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
    long maxSize = getMaxSplitSize(job);

    // generate splits
    List<InputSplit> splits = new ArrayList<InputSplit>();
    List<FileStatus> files = listStatus(job);
    // note: splitting is done file by file, never across files
    for (FileStatus file: files) {
      Path path = file.getPath();
      long length = file.getLen();
      if (length != 0) {
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
        } else {
          FileSystem fs = path.getFileSystem(job.getConfiguration());
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(job, path)) {
          long blockSize = file.getBlockSize();
          // with default min/max the split size is exactly the block size
          long splitSize = computeSplitSize(blockSize, minSize, maxSize);

          long bytesRemaining = length;
          // SPLIT_SLOP is 1.1: only cut another split while the remainder
          // is more than 10% larger than splitSize
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                        blkLocations[blkIndex].getHosts(),
                        blkLocations[blkIndex].getCachedHosts()));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            // the tail (at most 1.1 * splitSize) becomes the last split
            int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining);
            splits.add(makeSplit(path, length-bytesRemaining, bytesRemaining,
                       blkLocations[blkIndex].getHosts(),
                       blkLocations[blkIndex].getCachedHosts()));
          }
        } else { // not splitable: the whole file becomes one split
          splits.add(makeSplit(path, 0, length, blkLocations[0].getHosts(),
                      blkLocations[0].getCachedHosts()));
        }
      } else {
        // Create empty hosts array for zero length files
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
    // Save the number of input files for metrics/loadgen
    job.getConfiguration().setLong(NUM_INPUT_FILES, files.size());
    sw.stop();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Total # of splits generated by getSplits: " + splits.size()
          + ", TimeTaken: " + sw.now(TimeUnit.MILLISECONDS));
    }
    return splits;
  }
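The while loop is the interesting part: the condition compares against SPLIT_SLOP (1.1) instead of just splitSize, so a file only slightly bigger than one split is not cut in two. Here is a little self-contained demo I wrote to convince myself (my own code, not Hadoop's; the loop mirrors the one above):

public class SplitSlopDemo {
  // same value as FileInputFormat's private SPLIT_SLOP constant
  private static final double SPLIT_SLOP = 1.1;

  // mirrors the bytesRemaining loop in getSplits(), counting splits only
  static int countSplits(long length, long splitSize) {
    int splits = 0;
    long bytesRemaining = length;
    while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
      bytesRemaining -= splitSize;
      splits++;
    }
    if (bytesRemaining != 0) {
      splits++; // the tail split, at most 1.1 * splitSize
    }
    return splits;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // 129/128 ≈ 1.008 <= 1.1, so ONE split of 129 MB, no tiny tail
    System.out.println(countSplits(129 * mb, 128 * mb)); // prints 1
    // 142/128 ≈ 1.109 > 1.1, so this one does get cut in two
    System.out.println(countSplits(142 * mb, 128 * mb)); // prints 2
  }
}

So with 128 MB splits, a 129 MB file stays a single split; only past roughly 140.8 MB (1.1 x 128 MB) do you get two.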
And here is the source of the computeSplitSize() method it calls:
protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
  return Math.max(minSize, Math.min(maxSize, blockSize));
}
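In other words, the split size is the block size clamped into [minSize, maxSize]: with the defaults (minSize = 1, maxSize = Long.MAX_VALUE) it is simply the block size, i.e. 128 MB on a stock HDFS. If you want a different split size, the two static setters on FileInputFormat are the knobs. A minimal sketch of a driver-side tweak (the Job setup around it is assumed, not shown):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeTuning {
  // assumes "job" is an already-created Job; input paths, mapper
  // class and the rest of the driver are omitted
  static void tune(Job job) {
    // make splits LARGER than a block: raise minSize above blockSize
    // computeSplitSize(128MB, 256MB, Long.MAX_VALUE) -> 256MB
    FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);

    // or make splits SMALLER than a block: lower maxSize below blockSize
    // computeSplitSize(128MB, 1, 64MB) -> 64MB
    // FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
  }
}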
That's all for now~