小编典典

Unable to pass a Pig tuple to a Python UDF

python

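I have a master.txt file with 10K records. Each line of it is a tuple, and I need to pass all of them together to a Python UDF. Because the relation contains multiple records, I get the error below when storing p2preportmap. Please help.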

The error is as follows:

Unable to open iterator for alias p2preportmap. Backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st: (010301,MTS,MM), 2nd: (010B06,MTS,TN) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar")

The Pig script is as follows:

REGISTER 'smsiuc_udf.py' using streaming_python as smsiuc_udfs;
cdrs = load '2016040111*' USING PigStorage('|','-tagFile') ;

mastergtrec = load 'master.txt' USING PigStorage(',','-tagFile');

mastergt = FOREACH mastergtrec GENERATE (chararray) UPPER($1) as opcdpc, (chararray) UPPER($2) as gtoptname,(chararray) UPPER($3) as gtoptcircle;

mastergttup = FOREACH mastergt generate TOTUPLE(opcdpc,gtoptname,gtoptcircle) as mstgttup;

cdrrecord = FOREACH cdrs GENERATE (chararray) UPPER($1) as aparty, (chararray) UPPER($2) as bparty,$3 as smssentdate,$4 as smssenttime,($29=='6' ? 'S' : 'F') as status,(chararray) UPPER($26) as srcgt,(chararray) UPPER($27) as destgt,($12=='405899136999995' ? 'MTSDEL-CDMA' : ($12=='919875089998' ? 'MTSRAJ-GSM' : ($12=='405899150999995' ? 'MTSCHN-CDMA' : $12) ) ) as smscgt, (chararray)$0 as cdrfname,(chararray) $13 as prepost;

filteredp2pcdrs = FILTER cdrrecord by smsiuc_udfs.pullp2pcdrs(aparty,bparty,srcgt,destgt) and status == 'S' and SUBSTRING(smssentdate,4,6) == '$MON';

groupp2pcdrs = GROUP filteredp2pcdrs by (srcgt,destgt,aparty,bparty,smscgt,status,prepost);

distinctp2pcdrs= FOREACH groupp2pcdrs {
uniq = DISTINCT filteredp2pcdrs.(srcgt,destgt,aparty,bparty,smscgt,status,prepost);
GENERATE FLATTEN(group),COUNT(uniq) as cnt;
};

p2preportmap = FOREACH distinctp2pcdrs GENERATE smsiuc_udfs.p2preport(srcgt,destgt,aparty,bparty,mastergttup ),smscgt,status,prepost,cnt;

2021-01-20

1 answer

小编典典

This can be done by adding a dummy column and then grouping on it, so that all rows end up in a single bag.

dummy = FOREACH p2preportmap GENERATE 1, $0, $1 ....

grouped = GROUP dummy BY $0;
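
Once the rows are grouped into a single bag, the UDF has to accept that bag as one argument. The sketch below shows roughly what the Python side could look like. It is not the asker's actual code: the body of p2preport is not shown in the question, so the lookup by opcdpc, the helper build_lookup, the "UNKNOWN" default, and the output schema are all hypothetical. It also assumes that a Pig bag reaches a streaming_python UDF as a list of tuples.

```python
# Sketch only: the real p2preport logic is not shown in the question.
try:
    # Provided by Pig when the script is registered with streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Fallback so the sketch can also be run outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


def build_lookup(master_bag):
    """Hypothetical helper: index (opcdpc, gtoptname, gtoptcircle) rows by opcdpc."""
    return {row[0]: (row[1], row[2]) for row in master_bag}


@outputSchema("report:chararray")  # assumed schema, for illustration
def p2preport(srcgt, destgt, aparty, bparty, master_bag):
    # Assumption: master_bag is a list of tuples, one per master.txt row.
    lookup = build_lookup(master_bag)
    src_name, src_circle = lookup.get(srcgt, ("UNKNOWN", "UNKNOWN"))
    dst_name, dst_circle = lookup.get(destgt, ("UNKNOWN", "UNKNOWN"))
    return "|".join([aparty, bparty, src_name, src_circle, dst_name, dst_circle])
```

Rebuilding the dictionary on every call is simple but repeats work for each input row. With 10K master records it may be worth caching the lookup in a module-level variable if the bag is the same for every call.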

2021-01-20