We did not find clear definitions of these properties in the official documentation or the source code, so we examined the properties listed in the Spark Web UI and copied their descriptions below.
Property | Category | Reference | Description |
---|---|---|---|
avg hash probe bucket list iters | Cost | Link | the average bucket list iterations per lookup during aggregation |
data size | Cardinality | Link | the estimated size of broadcast/shuffled/collected data of the operator |
data size of build side | Cost | Link | the size of the built hash map |
fetch wait time | Cost | Link | the time spent on fetching data (local and remote) |
local blocks read | Cardinality | Link | the number of blocks read locally |
local bytes read | Cardinality | Link | the number of bytes read locally |
metadata time | Cost | Link | the time spent on getting metadata, such as the number of partitions and the number of files |
number of output rows | Cardinality | Link | the number of output rows of the operator |
peak memory | Cost | Link | the peak memory usage in the operator |
records read | Cardinality | Link | the number of records read |
remote blocks read | Cardinality | Link | the number of blocks read remotely |
remote bytes read | Cardinality | Link | the number of bytes read remotely |
remote bytes read to disk | Cardinality | Link | the number of bytes read from remote to local disk |
scan time | Cost | Link | the time spent on scanning data |
shuffle bytes written | Cardinality | Link | the number of bytes written |
shuffle records written | Cardinality | Link | the number of records written |
shuffle write time | Cost | Link | the time spent on shuffle writing |
sort time | Cost | Link | the time spent on sorting |
spill size | Cost | Link | the number of bytes spilled to disk from memory in the operator |
time in aggregation build | Cost | Link | the time spent on aggregation |
time to build hash map | Cost | Link | the time spent on building a hash map |
time to collect | Cost | Link | the time spent on collecting data |
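The Cost/Cardinality split above can be applied programmatically when processing metrics scraped from the Web UI or its REST API. The sketch below is a minimal, hypothetical example: the metric names mirror the table, and the sample `node` payload only loosely imitates the shape of entries returned by Spark's `/api/v1/applications/{appId}/sql` monitoring endpoint; it is not a definitive client for that API.

```python
# Hypothetical sketch: map Spark Web UI metric names to the Cost/Cardinality
# categories used in the table above. The sample payload below is illustrative,
# not an exact copy of a real REST API response.

CARDINALITY = {
    "data size", "local blocks read", "local bytes read",
    "number of output rows", "records read", "remote blocks read",
    "remote bytes read", "remote bytes read to disk",
    "shuffle bytes written", "shuffle records written",
}
COST = {
    "avg hash probe bucket list iters", "data size of build side",
    "fetch wait time", "metadata time", "peak memory", "scan time",
    "shuffle write time", "sort time", "spill size",
    "time in aggregation build", "time to build hash map",
    "time to collect",
}

def categorize(metric_name: str) -> str:
    """Return the category of a Web UI metric name, or 'unknown'."""
    if metric_name in CARDINALITY:
        return "Cardinality"
    if metric_name in COST:
        return "Cost"
    return "unknown"

# Illustrative plan-node metrics, roughly shaped like a REST API entry.
node = {
    "nodeName": "HashAggregate",
    "metrics": [
        {"name": "number of output rows", "value": "1024"},
        {"name": "peak memory", "value": "256.0 KiB"},
    ],
}
for m in node["metrics"]:
    print(m["name"], "->", categorize(m["name"]))
```

Classifying by name rather than by value type keeps the mapping explicit and easy to audit against the table.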