Question
distinct on data from multiple executors
When performing the distinct operation in Spark, my understanding is that it works in two steps:
- Initially each partition computes distinct values based on hashing.
- These distinct values are then passed to the driver or another executor for a final computation of distinct values across all partitions.
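To make the two steps concrete, here is a minimal plain-Python sketch of the model described above. It does not use Spark at all, and the function names `local_distinct` and `merge_distinct` are hypothetical, chosen only for illustration. It deliberately says nothing about which process runs the second step.

```python
from typing import Hashable, Iterable, List, Set


def local_distinct(partition: Iterable[Hashable]) -> Set[Hashable]:
    """Step 1: deduplicate within a single partition using a hash set."""
    return set(partition)


def merge_distinct(partials: Iterable[Set[Hashable]]) -> Set[Hashable]:
    """Step 2: deduplicate across the per-partition results."""
    merged: Set[Hashable] = set()
    for partial in partials:
        merged |= partial  # set union drops values seen in earlier partitions
    return merged


# Three toy "partitions" with duplicates both within and across partitions.
partitions: List[List[int]] = [[1, 2, 2, 3], [3, 4, 4], [1, 5]]

partials = [local_distinct(p) for p in partitions]
result = merge_distinct(partials)

print(sorted(result))  # [1, 2, 3, 4, 5]
```

The value 3 survives step 1 in two different partitions, so it is only removed in step 2, which is the step I am asking about.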
Question: Where does the second level of distinct computation occur? Does it happen at the executor level or directly at the driver?