Operations

SparkSQL defines operations in spearate classes under the package org.apache.spark.sql.execution, and uses the class names as operation names. Therefore, we identified the operations via the linux command

git clone https://github.com/apache/spark
cd spark
git checkout f4e53fa9
cd sql/core/src/main/scala/org/apache/spark/sql/execution
find . -type f -name "*Exec.scala"| sed -n 's|.*/\([^/]*\)Exec.scala|\1|p'

We further manually examined the identified operations and removed three abstract base classes: BaseAggregate, BaseJoin, and BaseScriptTransformation, which are used for other operations and are not shown as operation names in query plans.

Operation	Category	Reference	Description
AdaptiveSparkPlan	Executor	Link	The operation appears in the root of query plans and all children nodes are executed adaptively.
AddPartition	Executor	Link	It adds partitions to the table.
AggregateInPandas	Folder	Link	Aggregation with group aggregate Pandas UDF.
AlterNamespaceSetProperties	Consumer	Link	Setting properties of the namespace.
AlterTable	Consumer	Link	Altering a table.
AQEShuffleRead	Executor	Link	A wrapper of shuffle query stage
ArrowEvalPython	Executor	Link	A physical plan that evaluates a [[PythonUDF]].
AttachDistributedSequence	Consumer	Link	Adds a new long column with `sequenceAttr` that increases one by one.
BatchEvalPython	Producer	Link	A physical plan that evaluates a [[PythonUDF]].
BatchScan	Producer	Link	Scanning a batch of data from a data source v2.
BroadcastExchange	Executor	Link	A BroadcastExchangeExec collects, transforms, and finally broadcasts the result of a transformed SparkPlan.
BroadcastHashJoin	Executor	Link	Performs an inner hash join of two child relations. When the output RDD of this operator is being constructed, a Spark job is asynchronously started to calculate the values for the broadcast relation. This data is then placed in a Spark broadcast variable. The streamed relation is not shuffled.
BroadcastNestedLoopJoin	Executor	Link	Performs a nested loop join of two child relations.
CacheTable	Executor	Link	Cache table into memory.
CartesianProduct	Folder	Link	Apply the cartesian product of two children.
CollectMetrics	Executor	Link	Collect arbitrary (named) metrics from a [[SparkPlan]].
CommandResult	Executor	Link	Physical plan node for holding data from a command.
ContinuousScan	Producer	Link	Physical plan node for scanning data from a streaming data source with continuous mode.
CreateIndex	Consumer	Link	Physical plan node for creating an index.
CreateNamespace	Consumer	Link	Physical plan node for creating a namespace.
CreateTable	Consumer	Link	Physical plan node for creating a table.
DataSourceScan	Producer	Link	Read data from any data source.
DeleteFromTable	Consumer	Link	Delete data from a table.
DescribeColumn	Executor	Link	Obtain information about a column.
DescribeNamespace	Executor	Link	Obtain information about a namespace.
DescribeTable	Executor	Link	Obtain information about a table.
DropIndex	Consumer	Link	Delete an index.
DropNamespace	Consumer	Link	Remove a namespace.
DropPartition	Consumer	Link	Remove a partition.
DropTable	Consumer	Link	Delete a table.
EvalPython	Executor	Link	A physical plan that evaluates a PythonUDF, one partition of tuples at a time.
EventTimeWatermark	Executor	Link	Used to mark a column as containing the event time for a given record.
Expand	Executor	Link	Apply all of the GroupExpressions to every input row.
FlatMapCoGroupsInPandas	Executor	Link	Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapCoGroupsInPandas]]
FlatMapGroupsInPandas	Executor	Link	Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandas]]
FlatMapGroupsInPandasWithState	Executor	Link	Physical operator for executing [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandasWithState]]
FlatMapGroupsWithState	Executor	Link	Physical operator for executing `FlatMapGroupsWithState`
Generate	Executor	Link	Applies a [[Generator]] to a stream of input rows, combining the output of each into a new stream of rows.
HashAggregate	Folder	Link	Aggregates rows for a GROUP BY operation using a hash table.
InMemoryTableScan	Producer	Link	Scan the tables in memory.
LocalTableScan	Producer	Link	Physical plan node for scanning data from a local collection.
MapInBatch	Executor	Link	A relation produced by applying a function that takes an iterator of batches, such as pandas DataFrame or PyArrow’s record batches, and outputs an iterator of them.
MapInPandas	Executor	Link	A relation produced by applying a function that takes an iterator of pandas DataFrames and outputs an iterator of pandas DataFrames.
MergingSessions	Executor	Link	This node is a variant of SortAggregateExec which merges the session windows based on the fact child node will provide inputs as sorted by group keys + the start time of the session window.
MicroBatchScan	Producer	Link	Physical plan node for scanning a micro-batch of data from a data source.
ObjectHashAggregate	Folder	Link	A hash-based aggregate operator that supports [[TypedImperativeAggregate]] functions that may use arbitrary JVM objects as aggregation states.
PythonMapInArrow	Executor	Link	A relation produced by applying a function that takes an iterator of PyArrow’s record batches and outputs an iterator of PyArrow’s record batches.
QueryStage	Executor	Link	A query stage is an independent subgraph of the query plan.
RefreshTable	Executor	Link	Refresh all caches referencing the given table.
RenamePartition	Consumer	Link	Physical plan node for renaming a table partition.
RenameTable	Consumer	Link	Physical plan node for renaming a table.
ReplaceTable	Consumer	Link	Physical plan node for replacing a table.
SetCatalogAndNamespace	Consumer	Link	Physical plan node for setting the current catalog and/or namespace.
ShowCreateTable	Executor	Link	Physical plan node for showing creating a table.
ShowFunctions	Executor	Link	Physical plan node for showing functions.
ShowNamespaces	Executor	Link	Physical plan node for showing namespaces.
ShowPartitions	Executor	Link	Physical plan node for showing partitions.
ShowTableProperties	Executor	Link	Physical plan node for showing table properties.
ShowTables	Executor	Link	Physical plan node for showing tables.
ShuffledHashJoin	Executor	Link	Performs a hash join of two child relations by first shuffling the data using the join keys.
ShuffleExchange	Executor	Link	Performs a shuffle that will result in the desired partitioning.
Sort	Combinator	Link	Performs (external) sorting.
SortAggregate	Folder	Link	Sort-based aggregate operator.
SortMergeJoin	Join	Link	Performs a sort-merge join of two child relations.
SparkScriptTransformation	Executor	Link	Transforms the input by forking and running the specified script.
StreamingSymmetricHashJoin	Join	Link	Performs stream-stream join using symmetric hash join algorithm.
SubqueryAdaptiveBroadcast	Executor	Link	Similar to [[SubqueryBroadcastExec]], this node is used to store the initial physical plan of DPP subquery filters when enabling both AQE and DPP. It is an intermediate physical plan and not executable.
SubqueryBroadcast	Executor	Link	Physical plan for a custom subquery that collects and transforms the broadcast key values.
TruncatePartition	Consumer	Link	Physical plan node for table partition truncation.
TruncateTable	Consumer	Link	Physical plan node for table truncation.
UpdatingSessions	Consumer	Link	This node updates the session window spec of each input row by analyzing neighbor rows and determining rows belonging to the same session window. The number of input rows remains the same.
V2Command	Executor	Link	A physical operator that executes run() and saves the result to prevent multiple executions.
WholeStageCodegen	Executor	Link	WholeStageCodegen compiles a subtree of plans that support codegen together into a single Java function.
Window	Folder	Link	This class calculates and outputs (windowed) aggregates over the rows in a single (sorted) partition.
WindowInPandas	Executor	Link	This operation calculates and outputs windowed aggregates over the rows in a single partition.
WriteToContinuousDataSource	Executor	Link	The physical plan for writing data into continuous processing [[StreamingWrite]].
WriteToDataSourceV2	Executor	Link	The physical plan for writing data into a data source.

References