BT - The type of records from the build side that are stored in the hash table.PT - The type of records from the probe side that are stored in the hash table.public class MutableHashTable<BT,PT> extends Object implements MemorySegmentSource
The design of this class follows on many parts the design presented in "Hash joins and hash teams in Microsoft SQL Server", by Goetz Graefe et al. In its current state, the implementation lacks features like dynamic role reversal, partition tuning, or histogram guided partitioning.
+----------------------------- Bucket x ---------------------------- |Partition (1 byte) | Status (1 byte) | element count (2 bytes) | | next-bucket-in-chain-pointer (8 bytes) | reserved (4 bytes) | | |hashCode 1 (4 bytes) | hashCode 2 (4 bytes) | hashCode 3 (4 bytes) | | ... hashCode n-1 (4 bytes) | hashCode n (4 bytes) | |pointer 1 (8 bytes) | pointer 2 (8 bytes) | pointer 3 (8 bytes) | | ... pointer n-1 (8 bytes) | pointer n (8 bytes) | +---------------------------- Bucket x + 1-------------------------- |Partition (1 byte) | Status (1 byte) | element count (2 bytes) | | next-bucket-in-chain-pointer (8 bytes) | reserved (4 bytes) | | |hashCode 1 (4 bytes) | hashCode 2 (4 bytes) | hashCode 3 (4 bytes) | | ... hashCode n-1 (4 bytes) | hashCode n (4 bytes) | |pointer 1 (8 bytes) | pointer 2 (8 bytes) | pointer 3 (8 bytes) | | ... pointer n-1 (8 bytes) | pointer n (8 bytes) +------------------------------------------------------------------- | ... |
| Modifier and Type | Class and Description |
|---|---|
static class |
MutableHashTable.HashBucketIterator<BT,PT> |
static class |
MutableHashTable.ProbeIterator<PT> |
| Modifier and Type | Field and Description |
|---|---|
protected List<MemorySegment> |
availableMemory
The free memory segments currently available to the hash join.
|
protected MemorySegment[] |
buckets
The array of memory segments that contain the buckets which form the actual hash-table
of hash-codes and pointers to the elements.
|
protected int |
bucketsPerSegmentBits
The number of bits that describe the position of a bucket in a memory segment.
|
protected int |
bucketsPerSegmentMask
The number of hash table buckets in a single memory segment - 1.
|
protected TypeComparator<BT> |
buildSideComparator
The utilities to hash and compare the build side data types.
|
protected TypeSerializer<BT> |
buildSideSerializer
The utilities to serialize the build side data types.
|
protected AtomicBoolean |
closed
Flag indicating that the closing logic has been invoked.
|
protected FileIOChannel.Enumerator |
currentEnumerator
The channel enumerator that is used while processing the current partition to create
channels for the spill partitions it requires.
|
protected int |
currentRecursionDepth
The recursion depth of the partition that is currently processed.
|
protected boolean |
furtherPartitioning |
protected IOManager |
ioManager
The I/O manager used to instantiate writers for the spilled partitions.
|
protected boolean |
keepBuildSidePartitions
If true, build side partitions are kept for multiple probe steps.
|
protected int |
numBuckets
The number of buckets in the current table.
|
protected ArrayList<HashPartition<BT,PT>> |
partitionsBeingBuilt
The partitions that are built by processing the current partition.
|
protected MutableHashTable.ProbeIterator<PT> |
probeIterator
Iterator over the elements from the probe side.
|
protected TypeSerializer<PT> |
probeSideSerializer
The utilities to serialize the probe side data types.
|
protected int |
segmentSize
The size of the segments used by the hash join buckets.
|
protected LinkedBlockingQueue<MemorySegment> |
writeBehindBuffers
The queue of buffers that can be used for write-behind.
|
protected int |
writeBehindBuffersAvailable
The number of buffers in the write behind queue that are actually not write behind buffers,
but regular buffers that only have not yet returned.
|
| Constructor and Description |
|---|
MutableHashTable(TypeSerializer<BT> buildSideSerializer,
TypeSerializer<PT> probeSideSerializer,
TypeComparator<BT> buildSideComparator,
TypeComparator<PT> probeSideComparator,
TypePairComparator<PT,BT> comparator,
List<MemorySegment> memorySegments,
IOManager ioManager) |
MutableHashTable(TypeSerializer<BT> buildSideSerializer,
TypeSerializer<PT> probeSideSerializer,
TypeComparator<BT> buildSideComparator,
TypeComparator<PT> probeSideComparator,
TypePairComparator<PT,BT> comparator,
List<MemorySegment> memorySegments,
IOManager ioManager,
int avgRecordLen) |
| Modifier and Type | Method and Description |
|---|---|
void |
abort() |
static byte |
assignPartition(int bucket,
byte numPartitions)
Assigns a partition to a bucket.
|
protected void |
buildInitialTable(MutableObjectIterator<BT> input) |
protected void |
buildTableFromSpilledPartition(HashPartition<BT,PT> p) |
protected void |
clearPartitions()
This method clears all partitions currently residing (partially) in memory.
|
void |
close()
Closes the hash table.
|
protected void |
createPartitions(int numPartitions,
int recursionLevel) |
MutableHashTable.HashBucketIterator<BT,PT> |
getBuildSideIterator() |
PT |
getCurrentProbeRecord() |
List<MemorySegment> |
getFreedMemory() |
static int |
getInitialTableSize(int numBuffers,
int bufferSize,
int numPartitions,
int recordLenBytes) |
MutableHashTable.HashBucketIterator<BT,PT> |
getMatchesFor(PT record) |
protected HashPartition<BT,PT> |
getNewInMemoryPartition(int number,
int recursionLevel)
Returns a new inMemoryPartition object.
|
static int |
getNumWriteBehindBuffers(int numBuffers)
Determines the number of buffers to be used for asynchronous write behind.
|
MutableObjectIterator<BT> |
getPartitionEntryIterator() |
static int |
getPartitioningFanOutNoEstimates(int numBuffers)
Gets the number of partitions to be used for an initial hash-table, when no estimates are
available.
|
TypeComparator<PT> |
getProbeSideComparator() |
static int |
hash(int code,
int level)
This function hashes an integer value.
|
protected void |
initTable(int numBuckets,
byte numPartitions) |
protected void |
insertIntoTable(BT record,
int hashCode) |
boolean |
nextRecord() |
MemorySegment |
nextSegment() |
void |
open(MutableObjectIterator<BT> buildSide,
MutableObjectIterator<PT> probeSide)
Opens the hash join.
|
protected boolean |
prepareNextPartition() |
protected boolean |
processProbeIter() |
protected void |
releaseTable()
Releases the table (the array of buckets) and returns the occupied memory segments to the list of free segments.
|
protected int |
spillPartition()
Selects a partition and spills it.
|
protected final TypeSerializer<BT> buildSideSerializer
protected final TypeSerializer<PT> probeSideSerializer
protected final TypeComparator<BT> buildSideComparator
protected final List<MemorySegment> availableMemory
protected final LinkedBlockingQueue<MemorySegment> writeBehindBuffers
protected final IOManager ioManager
protected final int segmentSize
protected final int bucketsPerSegmentMask
protected final int bucketsPerSegmentBits
protected final ArrayList<HashPartition<BT,PT>> partitionsBeingBuilt
protected MutableHashTable.ProbeIterator<PT> probeIterator
protected FileIOChannel.Enumerator currentEnumerator
protected MemorySegment[] buckets
protected int numBuckets
protected int writeBehindBuffersAvailable
protected int currentRecursionDepth
protected AtomicBoolean closed
protected boolean keepBuildSidePartitions
protected boolean furtherPartitioning
public MutableHashTable(TypeSerializer<BT> buildSideSerializer, TypeSerializer<PT> probeSideSerializer, TypeComparator<BT> buildSideComparator, TypeComparator<PT> probeSideComparator, TypePairComparator<PT,BT> comparator, List<MemorySegment> memorySegments, IOManager ioManager)
public MutableHashTable(TypeSerializer<BT> buildSideSerializer, TypeSerializer<PT> probeSideSerializer, TypeComparator<BT> buildSideComparator, TypeComparator<PT> probeSideComparator, TypePairComparator<PT,BT> comparator, List<MemorySegment> memorySegments, IOManager ioManager, int avgRecordLen)
public void open(MutableObjectIterator<BT> buildSide, MutableObjectIterator<PT> probeSide) throws IOException
IOException - Thrown, if an I/O problem occurs while spilling a partition.protected boolean processProbeIter()
throws IOException
IOExceptionprotected boolean prepareNextPartition()
throws IOException
IOExceptionpublic boolean nextRecord()
throws IOException
IOExceptionpublic MutableHashTable.HashBucketIterator<BT,PT> getMatchesFor(PT record) throws IOException
IOExceptionpublic PT getCurrentProbeRecord()
public MutableHashTable.HashBucketIterator<BT,PT> getBuildSideIterator()
public MutableObjectIterator<BT> getPartitionEntryIterator()
public void close()
public void abort()
public List<MemorySegment> getFreedMemory()
protected void buildInitialTable(MutableObjectIterator<BT> input) throws IOException
input - IOExceptionprotected void buildTableFromSpilledPartition(HashPartition<BT,PT> p) throws IOException
p - IOExceptionprotected final void insertIntoTable(BT record, int hashCode) throws IOException
record - hashCode - IOExceptionprotected HashPartition<BT,PT> getNewInMemoryPartition(int number, int recursionLevel)
protected void createPartitions(int numPartitions,
int recursionLevel)
numPartitions - protected void clearPartitions()
This method is intended for a hard cleanup in the case that the join is aborted.
protected void initTable(int numBuckets,
byte numPartitions)
protected void releaseTable()
protected int spillPartition()
throws IOException
IOExceptionpublic MemorySegment nextSegment()
nextSegment in interface MemorySegmentSourcepublic static final int getNumWriteBehindBuffers(int numBuffers)
numBuffers - The number of available buffers.public static final int getPartitioningFanOutNoEstimates(int numBuffers)
The current logic makes sure that there are always between 10 and 127 partitions, and close to 0.1 of the number of buffers.
numBuffers - The number of buffers available.public static final int getInitialTableSize(int numBuffers,
int bufferSize,
int numPartitions,
int recordLenBytes)
public static final byte assignPartition(int bucket,
byte numPartitions)
bucket - numPartitions - public static final int hash(int code,
int level)
code - The integer to be hashed.public TypeComparator<PT> getProbeSideComparator()
Copyright © 2015 The Apache Software Foundation. All rights reserved.