Skip to main content Link Menu Expand (external link) Document Search Copy Copied

SparkSQL defines operations in spearate classes under the package org.apache.spark.sql.execution, and uses the class names as operation names. Therefore, we identified the operations via the linux command

git clone https://github.com/apache/spark
cd spark
git checkout f4e53fa9
cd sql/core/src/main/scala/org/apache/spark/sql/execution
find . -type f -name "*Exec.scala"| sed -n 's|.*/\([^/]*\)Exec.scala|\1|p'

We further manually examined the identified operations and removed three abstract base classes: BaseAggregate, BaseJoin, and BaseScriptTransformation, which are used for other operations and are not shown as operation names in query plans.

Operation Category Reference Description
AdaptiveSparkPlan Executor Link The operation appears in the root of query plans and all children nodes are executed adaptively.
AddPartition Executor Link It adds partitions to the table.
AggregateInPandas Folder Link Aggregation with group aggregate Pandas UDF.
AlterNamespaceSetProperties Consumer Link Setting properties of the namespace.
AlterTable Consumer Link Altering a table.
AQEShuffleRead Executor Link A wrapper of shuffle query stage
ArrowEvalPython Executor Link A physical plan that evaluates a [[PythonUDF]].
AttachDistributedSequence Consumer Link Adds a new long column with `sequenceAttr` that increases one by one.
BatchEvalPython Producer Link A physical plan that evaluates a [[PythonUDF]].
BatchScan Producer Link Scanning a batch of data from a data source v2.
BroadcastExchange Executor Link  A BroadcastExchangeExec collects, transforms, and finally broadcasts the result of a transformed SparkPlan.
BroadcastHashJoin Executor Link Performs an inner hash join of two child relations.  When the output RDD of this operator is being constructed, a Spark job is asynchronously started to calculate the values for the broadcast relation.  This data is then placed in a Spark broadcast variable.  The streamed relation is not shuffled.
BroadcastNestedLoopJoin Executor Link Performs a nested loop join of two child relations.
CacheTable Executor Link Cache table into memory.
CartesianProduct Folder Link Apply the cartesian product of two children.
CollectMetrics Executor Link Collect arbitrary (named) metrics from a [[SparkPlan]].
CommandResult Executor Link Physical plan node for holding data from a command.
ContinuousScan Producer Link Physical plan node for scanning data from a streaming data source with continuous mode.
CreateIndex Consumer Link Physical plan node for creating an index.
CreateNamespace Consumer Link Physical plan node for creating a namespace.
CreateTable Consumer Link Physical plan node for creating a table.
DataSourceScan Producer Link Read data from any data source.
DeleteFromTable Consumer Link Delete data from a table.
DescribeColumn Executor Link Obtain information about a column.
DescribeNamespace Executor Link Obtain information about a namespace.
DescribeTable Executor Link Obtain information about a table.
DropIndex Consumer Link Delete an index.
DropNamespace Consumer Link Remove a namespace.
DropPartition Consumer Link Remove a partition.
DropTable Consumer Link Delete a table.
EvalPython Executor Link A physical plan that evaluates a PythonUDF, one partition of tuples at a time.
EventTimeWatermark Executor Link Used to mark a column as containing the event time for a given record.
Expand Executor Link Apply all of the GroupExpressions to every input row.
FlatMapCoGroupsInPandas Executor Link Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapCoGroupsInPandas]]
FlatMapGroupsInPandas Executor Link Physical node for [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandas]]
FlatMapGroupsInPandasWithState Executor Link Physical operator for executing [[org.apache.spark.sql.catalyst.plans.logical.FlatMapGroupsInPandasWithState]]
FlatMapGroupsWithState Executor Link Physical operator for executing `FlatMapGroupsWithState`
Generate Executor Link Applies a [[Generator]] to a stream of input rows, combining the output of each into a new stream of rows.
HashAggregate Folder Link Aggregates rows for a GROUP BY operation using a hash table.
InMemoryTableScan Producer Link Scan the tables in memory.
LocalTableScan Producer Link Physical plan node for scanning data from a local collection.
MapInBatch Executor Link A relation produced by applying a function that takes an iterator of batches, such as pandas DataFrame or PyArrow’s record batches, and outputs an iterator of them.
MapInPandas Executor Link A relation produced by applying a function that takes an iterator of pandas DataFrames and outputs an iterator of pandas DataFrames.
MergingSessions Executor Link This node is a variant of SortAggregateExec which merges the session windows based on the fact child node will provide inputs as sorted by group keys + the start time of the session window.
MicroBatchScan Producer Link Physical plan node for scanning a micro-batch of data from a data source.
ObjectHashAggregate Folder Link A hash-based aggregate operator that supports [[TypedImperativeAggregate]] functions that may use arbitrary JVM objects as aggregation states.
PythonMapInArrow Executor Link A relation produced by applying a function that takes an iterator of PyArrow’s record batches and outputs an iterator of PyArrow’s record batches.
QueryStage Executor Link A query stage is an independent subgraph of the query plan.
RefreshTable Executor Link Refresh all caches referencing the given table.
RenamePartition Consumer Link Physical plan node for renaming a table partition.
RenameTable Consumer Link Physical plan node for renaming a table.
ReplaceTable Consumer Link Physical plan node for replacing a table.
SetCatalogAndNamespace Consumer Link Physical plan node for setting the current catalog and/or namespace.
ShowCreateTable Executor Link Physical plan node for showing creating a table.
ShowFunctions Executor Link Physical plan node for showing functions.
ShowNamespaces Executor Link Physical plan node for showing namespaces.
ShowPartitions Executor Link Physical plan node for showing partitions.
ShowTableProperties Executor Link Physical plan node for showing table properties.
ShowTables Executor Link Physical plan node for showing tables.
ShuffledHashJoin Executor Link Performs a hash join of two child relations by first shuffling the data using the join keys.
ShuffleExchange Executor Link Performs a shuffle that will result in the desired partitioning.
Sort Combinator Link Performs (external) sorting.
SortAggregate Folder Link Sort-based aggregate operator.
SortMergeJoin Join Link Performs a sort-merge join of two child relations.
SparkScriptTransformation Executor Link Transforms the input by forking and running the specified script.
StreamingSymmetricHashJoin Join Link Performs stream-stream join using symmetric hash join algorithm.
SubqueryAdaptiveBroadcast Executor Link Similar to [[SubqueryBroadcastExec]], this node is used to store the initial physical plan of DPP subquery filters when enabling both AQE and DPP. It is an intermediate physical plan and not executable.
SubqueryBroadcast Executor Link Physical plan for a custom subquery that collects and transforms the broadcast key values.
TruncatePartition Consumer Link Physical plan node for table partition truncation.
TruncateTable Consumer Link Physical plan node for table truncation.
UpdatingSessions Consumer Link This node updates the session window spec of each input row by analyzing neighbor rows and determining rows belonging to the same session window. The number of input rows remains the same.
V2Command Executor Link A physical operator that executes run() and saves the result to prevent multiple executions.
WholeStageCodegen Executor Link WholeStageCodegen compiles a subtree of plans that support codegen together into a single Java function.
Window Folder Link This class calculates and outputs (windowed) aggregates over the rows in a single (sorted) partition.
WindowInPandas Executor Link This operation calculates and outputs windowed aggregates over the rows in a single partition.
WriteToContinuousDataSource Executor Link The physical plan for writing data into continuous processing [[StreamingWrite]].
WriteToDataSourceV2 Executor Link The physical plan for writing data into a data source.

References