[06:25:15] Hi team
[06:57:06] Analytics, Product-Analytics: event_user_id is always NULL for anonymous edits in Mediawiki History table - https://phabricator.wikimedia.org/T232171 (JAllemandou) Good point @nettrom_WMF, there is indeed a discrepancy between doc and reality, introduced with the database change of using a normalized tab...
[07:00:38] Analytics, Analytics-Kanban, Product-Analytics: event_user_id is always NULL for anonymous edits in Mediawiki History table - https://phabricator.wikimedia.org/T232171 (JAllemandou)
[07:01:17] Analytics, Analytics-Kanban, Product-Analytics: event_user_id is always NULL for anonymous edits in Mediawiki History table - https://phabricator.wikimedia.org/T232171 (JAllemandou) a: JAllemandou
[07:01:18] Analytics, Analytics-Kanban: Sqoop: remove cuc_comment and join to comment table - https://phabricator.wikimedia.org/T217848 (JAllemandou) a: JAllemandou→None
[09:48:35] (PS5) Fdans: (wip) Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[10:32:44] (PS6) Fdans: (wip) Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[10:33:17] (PS7) Fdans: (wip) Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[12:09:22] Analytics, Analytics-Kanban, Product-Analytics: event_user_id is always NULL for anonymous edits in Mediawiki History table - https://phabricator.wikimedia.org/T232171 (Nuria) Open→Resolved
[13:25:39] Analytics, Analytics-EventLogging, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), Services (watching): Migrate all event-schemas schemas to current.yaml and materialize with jsonschema-tools. - https://phabricator.wikimedia.org/T232144 (Ottomata) Resolved→Open Let's use thi...
[13:25:44] Analytics, Analytics-EventLogging, EventBus, CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: CI Support for Schema Registry - https://phabricator.wikimedia.org/T206814 (Ottomata)
[15:16:10] (PS1) Fdans: Add per file media requests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T207208)
[15:17:30] (PS8) Fdans: (wip) Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[15:24:20] (PS2) Fdans: Add per file media requests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T207208)
[15:24:50] (PS9) Fdans: Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[15:38:34] (CR) Nuria: [C: -1] "I have suggested some rewording and have just one question about the code" (11 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T207208) (owner: Fdans)
[15:43:53] (CR) Nuria: [C: -1] Add cassandra loading job for requests per file metric (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[15:46:02] Analytics, Services (watching), cloud-services-team (Kanban): Mediarequests: Add endpoint for aggregated counts per file type per project - https://phabricator.wikimedia.org/T231589 (Nuria) a: fdans
[15:52:47] Hey team - Melissa is stuck on the road behind an accident, I'll miss standup or arrive super late
[15:52:52] sorry for the short notice
[15:53:10] I'll send an escrum if needed when she arrives
[15:55:31] joal: Ok!
[15:56:10] Analytics, Analytics-Kanban, Services (watching): Mediarequests: Add endpoint for aggregated counts per file type per project - https://phabricator.wikimedia.org/T231589 (Nuria)
[15:57:22] Analytics, Analytics-Kanban, Tool-Pageviews: Create new mediarequests table - https://phabricator.wikimedia.org/T229817 (Nuria)
[15:57:54] Analytics, Analytics-Kanban, Tool-Pageviews: Create new mediarequests table - https://phabricator.wikimedia.org/T229817 (Nuria) ping @fdans seems like if the oozie job is working we can move this ticket to done, right?
[15:58:21] Analytics, Analytics-Kanban, StructuredDataOnCommons, Tool-Pageviews: Change name and format of partition column in mediarequest table - https://phabricator.wikimedia.org/T231030 (Nuria) Open→Declined
[15:58:30] Analytics, Analytics-Kanban, StructuredDataOnCommons, Tool-Pageviews, Patch-For-Review: Add literal transcoding to media file properties UDF - https://phabricator.wikimedia.org/T230312 (Nuria)
[15:59:59] Analytics, Tool-Pageviews: Add referrer to mediarequests dataset to inform about project - https://phabricator.wikimedia.org/T228151 (Nuria)
[16:01:38] Analytics, Tool-Pageviews: Add referrer to mediarequests dataset to inform about project - https://phabricator.wikimedia.org/T228151 (Nuria) Ping @fdans this ticket can probably be moved to "done" in the canvas, no?
[16:02:06] ping nuria standuuup!
[16:02:19] and ottomata, I think
[16:35:45] (PS5) Mforns: [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612)
[18:59:52] RECOVERY - Check if active EventStreams endpoint is delivering messages. on icinga1001 is OK: OK: An EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[19:17:53] looking at alerts
[19:21:17] a-team wiki sites down for you or just me?
[19:21:38] uoo
[19:21:49] fdans: huge issue ongoing - probably DDoS - SRE on it
[19:22:31] :(
[19:22:34] joal: I'm guessing that's where our webrequest data loss errors are coming from?
[19:22:56] it probably is, yes
[19:23:23] issue started at about 18:00 UTC IIUC
[19:23:46] The ops channel is a mess :(
[19:31:56] joal, I changed to gzip and it worked fine
[19:32:13] I think there was some problem in the way I specified Bzip2
[19:32:20] mforns: possible
[19:32:23] because it failed in the end
[19:32:35] maybe it was retrying, and that's why it took so long
[19:32:51] mforns: I've been wondering how your job is structured :)
[19:33:05] mforns: you are generating the base files currently, is that it?
[19:33:21] what do you mean, base files?
[19:33:36] Files split by wiki and possibly time
[19:34:55] joal, yes, and also there's a part that is supposed to rename the files and move them to their final location with prettified names
[19:35:07] but that didn't work :'(
[19:35:18] still, I'm happy that the other part worked
[19:35:39] ok - I was thinking about this last bit and realized I might have misled you
[19:35:43] mforns: --^
[19:36:05] why?
[19:36:23] mforns: While the job runs in Spark for convenience, the renaming/moving code doesn't make use of Spark, only the HDFS API
[19:36:48] yes, I know
[19:36:54] ok - I
[19:36:59] again sorry
[19:37:16] why?
[19:37:17] ok - I had wondered if you had used Spark to read / write the files
[19:37:27] I wasn't expecting... no no
[19:37:31] which would be very inefficient
[19:37:36] cool
[19:37:45] no, it's using FileSystem
[19:37:49] fs.rename
[19:37:51] Awesome
[19:38:21] well, soon awesome, when fixed :)
[19:38:45] mforns: can you explain to me how you specified your bz2 compression?
[19:39:32] mforns: from https://yarn.wikimedia.org/cluster/app/application_1564562750409_160003
[19:39:45] mforns: Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.Bzip2Codec not found
[19:39:48] joal, same as with gzip, but replacing org.apache.hadoop.io.compress.GzipCodec with org.apache.hadoop.io.compress.Bzip2Codec
[19:40:00] yea, that's what I saw
[19:40:36] joal, I thought that's how you would specify it, given: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/BZip2Codec.html
[19:40:39] mforns: And indeed, 6 AppMaster tries - a long time before failing
[19:40:50] ooooooooh!
[19:41:07] mforns: same page, at the very bottom
[19:41:59] mforns: Spark has an API for compression codecs
[19:42:11] for instance: df.write().option("compression","bzip2").text(filePath)
[19:42:22] I see
[19:42:42] mforns: using gzip should be done the same way
[19:42:48] joal, so it should not be specified in the SparkConf?
[19:42:56] Nope
[19:43:25] mforns: well, no - you can, I guess, but it's probably having side effects
[19:44:21] ok
[19:45:03] will do! thanks :D
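For context, a minimal Scala sketch of the pattern discussed above: writing the per-wiki dump files with the DataFrameWriter's `compression` option (instead of Hadoop codec settings in SparkConf), then prettifying names with the HDFS FileSystem API, as mforns's job does with fs.rename. The `history` DataFrame, column names, and paths are hypothetical; this is not the actual refinery/source job.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame

// Assumes `history` has exactly one string column plus a "wiki_db"
// partition column; all paths below are illustrative only.
def writeAndRename(history: DataFrame, baseDir: String): Unit = {
  // Spark resolves the codec from the short name "bzip2", so there is no
  // need to spell out org.apache.hadoop.io.compress.BZip2Codec (note the
  // capital Z: the "Bzip2Codec" spelling used in the SparkConf attempt is
  // exactly what raised the ClassNotFoundException quoted above).
  history.write
    .partitionBy("wiki_db")
    .option("compression", "bzip2")
    .text(s"$baseDir/tmp")

  // The rename/move step needs only the HDFS API, not Spark.
  val fs = FileSystem.get(new Configuration())
  val src = new Path(s"$baseDir/tmp/wiki_db=enwiki/part-00000.txt.bz2")
  val dst = new Path(s"$baseDir/enwiki.txt.bz2")
  fs.rename(src, dst) // returns false instead of throwing if the move fails
}
```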
[19:45:36] Thanks mforns :)
[19:52:40] joal, then, do we need the other options in the SparkConf: spark.hadoop.mapred.output.compress, spark.hadoop.mapred.output.compression.codec, spark.hadoop.mapred.output.compression.type?
[20:03:33] mforns: nope
[20:03:43] ok :]
[20:03:58] mforns: I think those options were useful in previous Spark versions we used (1.6)
[20:04:22] But I have not tested for real (maybe I'm completely mistaken)
[20:13:27] Ok, gone for tonight, team
[20:15:34] byeeeee
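On the trailing question about the mapred.* options, a hedged sketch of why they are redundant with the Spark 2 writer, under the same caveat joal gives above (untested claim about the 1.6-era behavior). The SparkSession, sample data, and output paths are made up for illustration.

```scala
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("compression-demo").getOrCreate()
import spark.implicits._

// Spark 1.x / RDD era: compression for saveAsTextFile came from Hadoop conf
// keys such as the spark.hadoop.mapred.output.compress(.codec/.type) trio
// quoted above, or from passing the codec class explicitly:
spark.sparkContext
  .parallelize(Seq("a", "b"))
  .saveAsTextFile("/tmp/rdd_out", classOf[BZip2Codec])

// Spark 2 DataFrame writer: the "compression" option is sufficient on its
// own; the mapred.* keys in SparkConf add nothing here.
Seq("a", "b").toDF("line").write.option("compression", "bzip2").text("/tmp/df_out")
```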