We did not find clear definitions of these properties in the official documentation or the source code, so we examined the properties listed in the Spark Web UI and copied their descriptions below.
Property | Category | Reference | Description |
---|---|---|---|
avg hash probe bucket list iters | Cost | Link | the average bucket list iterations per lookup during aggregation |
data size | Cardinality | Link | the estimated size of broadcast/shuffled/collected data of the operator |
data size of build side | Cost | Link | the size of the built hash map |
fetch wait time | Cost | Link | the time spent on fetching data (local and remote) |
local blocks read | Cardinality | Link | the number of blocks read locally |
local bytes read | Cardinality | Link | the number of bytes read locally |
metadata time | Cost | Link | the time spent on getting metadata, such as the number of partitions and the number of files |
number of output rows | Cardinality | Link | the number of output rows of the operator |
peak memory | Cost | Link | the peak memory usage in the operator |
records read | Cardinality | Link | the number of records read |
remote blocks read | Cardinality | Link | the number of blocks read remotely |
remote bytes read | Cardinality | Link | the number of bytes read remotely |
remote bytes read to disk | Cardinality | Link | the number of bytes read from remote to local disk |
scan time | Cost | Link | the time spent on scanning data |
shuffle bytes written | Cardinality | Link | the number of bytes written |
shuffle records written | Cardinality | Link | the number of records written |
shuffle write time | Cost | Link | the time spent on shuffle writing |
sort time | Cost | Link | the time spent on sorting |
spill size | Cost | Link | the number of bytes spilled to disk from memory in the operator |
time in aggregation build | Cost | Link | the time spent on aggregation |
time to build hash map | Cost | Link | the time spent on building a hash map |
time to collect | Cost | Link | the time spent on collecting data |
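The Cost/Cardinality split above can be applied programmatically when processing metrics scraped from the Web UI or its REST API. The sketch below is a minimal, hypothetical example: the metric names mirror the table, and the sample `node` payload only loosely imitates the shape of entries returned by Spark's `/api/v1/applications/{appId}/sql` monitoring endpoint; it is not a definitive client for that API.

```python
# Hypothetical sketch: map Spark Web UI metric names to the Cost/Cardinality
# categories used in the table above. The sample payload below is illustrative,
# not an exact copy of a real REST API response.

CARDINALITY = {
    "data size", "local blocks read", "local bytes read",
    "number of output rows", "records read", "remote blocks read",
    "remote bytes read", "remote bytes read to disk",
    "shuffle bytes written", "shuffle records written",
}
COST = {
    "avg hash probe bucket list iters", "data size of build side",
    "fetch wait time", "metadata time", "peak memory", "scan time",
    "shuffle write time", "sort time", "spill size",
    "time in aggregation build", "time to build hash map",
    "time to collect",
}

def categorize(metric_name: str) -> str:
    """Return the category of a Web UI metric name, or 'unknown'."""
    if metric_name in CARDINALITY:
        return "Cardinality"
    if metric_name in COST:
        return "Cost"
    return "unknown"

# Illustrative plan-node metrics, roughly shaped like a REST API entry.
node = {
    "nodeName": "HashAggregate",
    "metrics": [
        {"name": "number of output rows", "value": "1024"},
        {"name": "peak memory", "value": "256.0 KiB"},
    ],
}
for m in node["metrics"]:
    print(m["name"], "->", categorize(m["name"]))
```

Classifying by name rather than by value type keeps the mapping explicit and easy to audit against the table.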