Question

Where can I find an exhaustive list of actions for Spark?

I want to know exactly what I can do in Spark without triggering the computation of an RDD/DataFrame.

It's my understanding that only actions trigger execution of the transformations that produce a DataFrame. The problem is that I can't find a comprehensive list of Spark actions.

The Spark documentation lists some actions, but the list is not exhaustive. For example, show is not there, even though it is considered an action.

  • Where can I find a full list of actions?
  • Can I assume that all methods listed here are also actions?
1 Jan 1970

Solution


All methods annotated with @group action in the source are actions. They are listed together here in the Scaladocs. They can also be found in the source, where each method's definition looks like this:

   * @group action
   * @since 1.6.0
   */
  def show(numRows: Int): Unit = show(numRows, truncate = true)
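If you have a local copy of the source, the annotation lookup can be automated. The following is only a minimal sketch: the object name GroupActionScan, the helper actionMethods, and the trimmed sample text are all hypothetical, and the line-by-line scan assumes each Scaladoc comment is immediately followed by the def it documents.

```scala
object GroupActionScan {
  // Hypothetical, heavily trimmed stand-in for Dataset.scala.
  // Only the @group annotations and the def names matter here.
  val sample: String =
    """  /**
      |   * (description elided)
      |   * @group action
      |   * @since 1.6.0
      |   */
      |  def show(numRows: Int): Unit = show(numRows, truncate = true)
      |
      |  /**
      |   * (description elided)
      |   * @group untypedrel
      |   * @since 2.0.0
      |   */
      |  def select(cols: Column*): DataFrame = ???
      |""".stripMargin

  private val DefName = """\bdef\s+(\w+)""".r

  /** Names of methods whose preceding doc comment contains "@group action". */
  def actionMethods(source: String): List[String] = {
    var inActionDoc = false
    val found = scala.collection.mutable.ListBuffer.empty[String]
    for (line <- source.linesIterator) {
      val trimmed = line.trim
      if (trimmed.startsWith("/**")) {
        inActionDoc = false // a new doc comment starts: forget the previous one
      } else if (trimmed.startsWith("*")) {
        if (trimmed.contains("@group action")) inActionDoc = true
      } else {
        DefName.findFirstMatchIn(trimmed).foreach { m =>
          if (inActionDoc) found += m.group(1)
          inActionDoc = false // the comment has been consumed by this def
        }
      }
    }
    found.toList.distinct
  }

  def main(args: Array[String]): Unit =
    println(actionMethods(sample)) // prints List(show)
}
```

To run it against the real file, you could replace sample with the contents of your checkout of Dataset.scala, for example via scala.io.Source.fromFile.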

Additionally, some methods lack that annotation but still trigger eager evaluation: those that call withAction. checkpoint, for example, performs an action but isn't grouped as such in the docs:

private[sql] def checkpoint(eager: Boolean, reliableCheckpoint: Boolean): Dataset[T] = {
    val actionName = if (reliableCheckpoint) "checkpoint" else "localCheckpoint"
    withAction(actionName, queryExecution) { physicalPlan =>
      val internalRdd = physicalPlan.execute().map(_.copy())
      if (reliableCheckpoint) {

To find all of them:

  1. Go to the source
  2. Press Ctrl+F
  3. Search for private def withAction
  4. Click on withAction
  5. On the right you should see a list of the methods that use it. This is how that list currently looks:

[Image: current withAction methods]
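The same search can be approximated offline. This is a rough sketch, not what the code browser does: WithActionScan, withActionCallers, and the sample text are hypothetical, and the heuristic simply attributes every withAction( call to the most recent def seen above it, so nested defs or comments mentioning withAction( would mislead it.

```scala
object WithActionScan {
  // Hypothetical, trimmed stand-in for Dataset.scala; method bodies are elided with ???.
  val sample: String =
    """  def head(n: Int): Array[T] = withAction("head", limit(n).queryExecution)(collectFromPlan)
      |
      |  private[sql] def checkpoint(eager: Boolean, reliableCheckpoint: Boolean): Dataset[T] = {
      |    val actionName = if (reliableCheckpoint) "checkpoint" else "localCheckpoint"
      |    withAction(actionName, queryExecution) { physicalPlan =>
      |      ???
      |    }
      |  }
      |
      |  def select(cols: Column*): DataFrame = ???
      |
      |  private def withAction[U](name: String, qe: QueryExecution)(action: SparkPlan => U) = ???
      |""".stripMargin

  private val DefName = """\bdef\s+(\w+)""".r

  /** Names of methods that appear to call withAction (line-based heuristic). */
  def withActionCallers(source: String): List[String] = {
    var current = Option.empty[String]
    val callers = scala.collection.mutable.ListBuffer.empty[String]
    for (line <- source.linesIterator) {
      // Remember the enclosing method before checking for a call on the same line.
      DefName.findFirstMatchIn(line).foreach(m => current = Some(m.group(1)))
      if (line.contains("withAction("))
        current.filter(_ != "withAction").foreach(name => callers += name)
    }
    callers.toList.distinct
  }

  def main(args: Array[String]): Unit =
    println(withActionCallers(sample)) // prints List(head, checkpoint)
}
```

Treat the result as a starting point and confirm each hit by reading the method in the source.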

2024-07-09
Chris

Solution


I don't think an exhaustive list of all Spark actions exists. But I think it helps to build a mental model of the difference between transformations and actions, and to refer to the documentation when needed.

Calling a transformation on its own produces no output; it only extends the execution plan. Spark starts computing results only when you call an action. There are three kinds of actions, as listed in the excerpt below:

(Image: excerpt from Spark: The Definitive Guide)
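One way to check whether a given method is lazy is to count how often a mapped function actually runs. The sketch below assumes Spark is on the classpath and runs in local mode; LazyDemo and run are hypothetical names, and the accumulator is used only as a call counter for illustration.

```scala
import org.apache.spark.sql.SparkSession

object LazyDemo {
  /** Returns the counter value before and after calling an action. */
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: local mode, no cluster needed
      .appName("lazy-demo")
      .getOrCreate()
    try {
      val sc = spark.sparkContext
      val calls = sc.longAccumulator("mapCalls")

      // Transformation: only recorded in the lineage, nothing executes yet.
      val doubled = sc.parallelize(1 to 5).map { x => calls.add(1L); x * 2 }
      val before: Long = calls.value // no map call has happened so far

      // Action: triggers the job, so the map function now runs per element.
      doubled.count()
      val after: Long = calls.value

      (before, after)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = run()
    println(s"map calls before action: $before, after action: $after")
  }
}
```

Swapping count() for the method you are unsure about gives a quick empirical answer: if the counter stays at zero afterwards, the call did not trigger execution.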

The link you provided lists some actions, but it also includes transformations.

2024-07-08
Ahmed Nader