
We have not found clear definitions of these properties in the official documentation or the source code, so we examined the properties listed in the Spark Web UI and copied their descriptions as follows.

| Property | Category | Reference | Description |
|---|---|---|---|
| avg hash probe bucket list iters | Cost | Link | the average bucket list iterations per lookup during aggregation |
| data size | Cardinality | Link | Estimated size of broadcast/shuffled/collected data of the operator |
| data size of build side | Cost | Link | the size of the built hash map |
| fetch wait time | Cost | Link | the time spent on fetching data (local and remote) |
| local blocks read | Cardinality | Link | the number of blocks read locally |
| local bytes read | Cardinality | Link | the number of bytes read locally |
| metadata time | Cost | Link | the time spent on getting metadata like the number of partitions, number of files |
| number of output rows | Cardinality | Link | the number of output rows of the operator |
| peak memory | Cost | Link | the peak memory usage in the operator |
| records read | Cardinality | Link | the number of read records |
| remote blocks read | Cardinality | Link | the number of blocks read remotely |
| remote bytes read | Cardinality | Link | the number of bytes read remotely |
| remote bytes read to disk | Cardinality | Link | the number of bytes read from remote to local disk |
| scan time | Cost | Link | the time spent on scanning data |
| shuffle bytes written | Cardinality | Link | the number of bytes written |
| shuffle records written | Cardinality | Link | the number of records written |
| shuffle write time | Cardinality | Link | the time spent on shuffle writing |
| sort time | Cost | Link | the time spent on sorting |
| spill size | Cost | Link | number of bytes spilled to disk from memory in the operator |
| time in aggregation build | Cost | Link | the time spent on aggregation |
| time to build hash map | Cost | Link | the time spent on building a hash map |
| time to collect | Cost | Link | the time spent on collecting data |
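
The categories in the table can be used to split an operator's metrics into cost and cardinality groups. The following is a minimal sketch, assuming the metric values have already been copied from the Spark Web UI into a plain dictionary. The function name `group_by_category` and the sample values are hypothetical and only serve as an illustration; the property-to-category mapping is taken directly from the table.

```python
# Category of each property, copied from the table above.
PROPERTY_CATEGORY = {
    "avg hash probe bucket list iters": "Cost",
    "data size": "Cardinality",
    "data size of build side": "Cost",
    "fetch wait time": "Cost",
    "local blocks read": "Cardinality",
    "local bytes read": "Cardinality",
    "metadata time": "Cost",
    "number of output rows": "Cardinality",
    "peak memory": "Cost",
    "records read": "Cardinality",
    "remote blocks read": "Cardinality",
    "remote bytes read": "Cardinality",
    "remote bytes read to disk": "Cardinality",
    "scan time": "Cost",
    "shuffle bytes written": "Cardinality",
    "shuffle records written": "Cardinality",
    "shuffle write time": "Cardinality",  # categorized as listed in the table
    "sort time": "Cost",
    "spill size": "Cost",
    "time in aggregation build": "Cost",
    "time to build hash map": "Cost",
    "time to collect": "Cost",
}


def group_by_category(metrics):
    """Split {property: value} into {"Cost": {...}, "Cardinality": {...}}.

    Hypothetical helper: properties missing from the table are collected
    under "Unknown" instead of being dropped silently.
    """
    grouped = {"Cost": {}, "Cardinality": {}}
    for name, value in metrics.items():
        category = PROPERTY_CATEGORY.get(name)
        if category is None:
            grouped.setdefault("Unknown", {})[name] = value
        else:
            grouped[category][name] = value
    return grouped


# Made-up values for a single operator, for illustration only.
sample = {"number of output rows": 1000, "sort time": 12, "peak memory": 65536}
grouped = group_by_category(sample)
```

Keeping unknown properties in their own group makes it easier to notice metrics that appear in the Web UI but are not yet covered by the table.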

References