Filling null values in a Scala DataFrame with coalesce :: 개발하다가 쓰는 블로그


Programming/Scala 2020. 12. 3. 20:05
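coalesce is not only for reducing the number of partitions; it can also be used to fill in null values. Strictly speaking these are two functions that share a name: the coalesce method on a Dataset merges partitions, while coalesce in org.apache.spark.sql.functions returns the first non-null value among its column arguments.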
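To make the two meanings concrete, here is a minimal sketch that uses both on the same data frame. The object name CoalesceTwoMeanings, the helper names, the sample rows, and the local SparkSession are all made up for illustration; only the two coalesce calls come from Spark itself.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object CoalesceTwoMeanings {
  // Hypothetical sample data: Option[Int] becomes a nullable integer column.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("2020-12-01", Option(1)),
      ("2020-12-02", Option.empty[Int]),
      ("2020-12-03", Option(3))
    ).toDF("DT", "ex_col")
  }

  // Dataset.coalesce(n): merge the existing partitions down to at most n.
  def mergePartitions(df: DataFrame): DataFrame =
    df.repartition(4).coalesce(1)

  // functions.coalesce(c1, c2, ...): per row, take the first non-null argument.
  // Here a null ex_col falls back to the literal 0.
  def fillWithZero(df: DataFrame): DataFrame =
    df.withColumn("ex_col", coalesce(col("ex_col"), lit(0)))

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder().master("local[2]").appName("coalesce-demo").getOrCreate()
    val df = sample(spark)

    println(mergePartitions(df).rdd.getNumPartitions)
    fillWithZero(df).orderBy("DT").show()

    spark.stop()
  }
}
```

The rest of this post only uses the second form, with a window function as the fallback value instead of a constant.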

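Given a data frame ex_df with a DT column and an ex_col column that contains nulls, the code below fills each null with the first value in DT order (1), because it uses first. The second argument, true, tells first to skip nulls.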

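import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, first, last}

// Order the rows by DT; coalesce keeps ex_col when it is non-null and otherwise uses first(...).
val w = Window.orderBy("DT")
ex_df.withColumn("ex_col", coalesce(col("ex_col"), first(col("ex_col"), true).over(w)))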
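Since ex_df itself is not shown here, the following is a rough, self-contained sketch of the same idea. The sample rows (1, null, null, 3 over four dates), the object name FillWithFirst, and the helper names are assumptions for illustration, not the original data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, first}

object FillWithFirst {
  // Hypothetical stand-in for ex_df: ex_col is null on two of the four dates.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("2020-12-01", Option(1)),
      ("2020-12-02", Option.empty[Int]),
      ("2020-12-03", Option.empty[Int]),
      ("2020-12-04", Option(3))
    ).toDF("DT", "ex_col")
  }

  // Replace each null in ex_col with the first non-null value seen so far in DT order.
  def fillWithFirst(df: DataFrame): DataFrame = {
    // No partitionBy here, so Spark evaluates the window over all rows in one partition.
    val w = Window.orderBy("DT")
    df.withColumn("ex_col", coalesce(col("ex_col"), first(col("ex_col"), true).over(w)))
  }

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("fill-with-first").getOrCreate()
    // With the sample above, ex_col comes out as 1, 1, 1, 3.
    fillWithFirst(sample(spark)).orderBy("DT").show()
    spark.stop()
  }
}
```

On real data you would probably add a partitionBy to the window so that the fill happens per key rather than across the whole table.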

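If you would rather have the nulls filled with the last value, 3, either sort the window in descending order (desc is applied to the column, not to the window)

val wDesc = Window.orderBy(col("DT").desc)
ex_df.withColumn("ex_col", coalesce(col("ex_col"), first(col("ex_col"), true).over(wDesc))).show(10)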

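or use last instead of first. With an ordered window the default frame ends at the current row, so last would only see values up to that row; extending the frame over the whole window makes it return the overall last non-null value.

val wAll = Window.orderBy("DT").rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
ex_df.withColumn("ex_col", coalesce(col("ex_col"), last(col("ex_col"), true).over(wAll))).show(10)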

This gives the following result.
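As a rough check of how the window frame changes what last returns, here is a small sketch that runs it both ways on made-up data. The rows (1, null, null, 3), the object name FillForwardVsLast, and the helper names are assumptions for illustration and are not the original ex_df.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, last}

object FillForwardVsLast {
  // Hypothetical stand-in for ex_df.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("2020-12-01", Option(1)),
      ("2020-12-02", Option.empty[Int]),
      ("2020-12-03", Option.empty[Int]),
      ("2020-12-04", Option(3))
    ).toDF("DT", "ex_col")
  }

  // Default frame (start of window up to the current row):
  // each null takes the most recent non-null value before it, i.e. a forward fill.
  def forwardFill(df: DataFrame): DataFrame = {
    val w = Window.orderBy("DT")
    df.withColumn("ex_col", coalesce(col("ex_col"), last(col("ex_col"), true).over(w)))
  }

  // Frame widened to the whole window:
  // each null takes the last non-null value of the entire column.
  def fillWithOverallLast(df: DataFrame): DataFrame = {
    val w = Window.orderBy("DT")
      .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    df.withColumn("ex_col", coalesce(col("ex_col"), last(col("ex_col"), true).over(w)))
  }

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("fill-with-last").getOrCreate()
    val df = sample(spark)
    forwardFill(df).orderBy("DT").show()         // ex_col: 1, 1, 1, 3
    fillWithOverallLast(df).orderBy("DT").show() // ex_col: 1, 3, 3, 3
    spark.stop()
  }
}
```

Which of the two you want depends on the data: the forward fill carries the latest known value forward, while the full-frame version stamps one value on every gap.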
License: Attribution-NonCommercial-NoDerivs