Problems encountered with spark-sql
- Referencing a column with the `$` shorthand (e.g. `$"year"`) in Spark SQL fails without `import spark.implicits._`: the `$` string-to-Column interpolator only comes into scope through that import, so code that uses it without the import does not compile (a minimal sketch follows this list).
- When a SparkSession reads a JSON file, it treats each line as one complete JSON object by default; if a single record is split across multiple lines, parsing fails and the record is routed to the internal `_corrupt_record` column, which can make the job abort with the error shown below.
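A minimal sketch of the first point, assuming a standalone local Scala app (the app name and sample data are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object DollarColumnExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dollar-column-example") // illustrative name
      .master("local[*]")
      .getOrCreate()

    // Without this import, $"..." and Seq(...).toDF(...) do not compile:
    // both the $ interpolator and the toDF conversion live in spark.implicits.
    import spark.implicits._

    val df = Seq(("2021", "202102", "01")).toDF("year", "month", "day")

    // $"year" is shorthand for col("year").
    df.select($"year", $"month").show()

    spark.stop()
  }
}
```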
In this case the job aborted with the following `AnalysisException`:

```
ERROR FileFormatWriter: Aborting job null.
org.apache.spark.sql.AnalysisException: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
referenced columns only include the internal corrupt record column
(named _corrupt_record by default). For example:
spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()
and spark.read.schema(schema).json(file).select("_corrupt_record").show().
Instead, you can cache or save the parsed results and then send the same query.
For example, val df = spark.read.schema(schema).json(file).cache() and then
df.filter($"_corrupt_record".isNotNull).count().;
```
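The message itself points at the workaround: cache (or save) the parsed result before querying only `_corrupt_record`. A sketch along those lines, with a hypothetical input path and a schema matching the sample data below:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object CorruptRecordExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("corrupt-record-example") // illustrative name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Explicit schema that includes the default corrupt-record column.
    val schema = StructType(Seq(
      StructField("year", StringType),
      StructField("month", StringType),
      StructField("day", StringType),
      StructField("_corrupt_record", StringType)
    ))

    // cache() materializes the parse, so a query that touches only
    // _corrupt_record no longer hits the Spark 2.3+ restriction above.
    val df = spark.read.schema(schema)
      .json("/path/to/data.json") // hypothetical path
      .cache()

    df.filter($"_corrupt_record".isNotNull).count()
    spark.stop()
  }
}
```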
1,{"year":"2021","month":"202102","day":"01"}
2,{"year":"2021",
"month":"202102","day":"01"}
To be continued...
