[00:31:23] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (Krinkle) Swift bucket identifiers like `wikipedia/commons/thumb` are internal and should imho not be turned into public APIs as strings that users should someh...
[00:32:35] (CR) Krinkle: Improve flexibility of media file paths (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/571968 (https://phabricator.wikimedia.org/T244712) (owner: Fdans)
[01:16:38] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (Krinkle) > Why allow both forms? About 0.12% of Mediarequests are directed to files that weren't uploaded from a specific wiki, therefore not having a File: pa...
[01:44:21] Analytics, Operations, Research, Traffic: Wikipedia Accessibility, check false positives and false negatives of traffic alarms - https://phabricator.wikimedia.org/T245166 (Nuria)
[01:48:44] (PS9) Nuria: Classification of actors for bot detection [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361)
[01:49:26] (CR) Nuria: [C: -1] "Oozie jobs run, need to add a field to label table per irc conversation and also asses quality of data" [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)
[06:12:14] Analytics, Operations: Remove references to m4-master - https://phabricator.wikimedia.org/T245238 (Marostegui)
[07:09:14] \o/ for nuria's typos :)
[07:30:09] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (MZMcBride) I appreciate the effort being made toward maintaining capability. Since 2011, I've run the "snatch" IRC bot that sits on irc.wikimedia.org and relays to the "snitch"...
[10:44:49] Analytics, Operations, serviceops, vm-requests, and 2 others: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey) Open→Resolved a:elukey
[10:44:52] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (elukey)
[10:45:04] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (elukey) @faidon irc2001 is ready for testing!
[11:32:12] Analytics, Analytics-Cluster, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) The following was needed in zookeeper to fix https://issues.apache.org/jira/browse/YARN-8310 permanently for the RM: ` setAcl /yarn-rmstore/analytics-test-hadoop/Z...
[11:32:41] * elukey lunch!
[12:23:36] hi elukey - Sorry for not answering, Naé is home with chicken-pox
[13:07:13] Analytics, Operations: Remove references to m4-master - https://phabricator.wikimedia.org/T245238 (jbond) p:Triage→Medium
[13:14:48] Analytics, Operations, Research, Traffic: Wikipedia Accessibility, check false positives and false negatives of traffic alarms - https://phabricator.wikimedia.org/T245166 (jbond) p:Triage→Medium
[13:47:40] joal: np! Please take care of Nae! I just write silly things during the day you know :)
[13:54:31] thinking out loud in here to see if anybody has ideas
[13:55:00] so the problem is that after the upgrade to BigTop in Hadoop test, I cannot see yarn apps in RUNNING state
[13:55:21] all the jobs are correctly submitted but they reach the ACCEPTED stated, and never go in RUNNING
[13:55:33] from the Yarn UI I can see all the nodes recognized
[13:55:39] vcores, ram, etc..
[13:55:52] all "Free", nothing that indicates cluster busy
[13:56:06] no error messages on Node managers or Resource Managers
[13:56:15] tried also to restart multiple times, same thing
[13:57:25] usually this behavior happens when there are no resources available for Yarn
[13:57:30] but it doesn't seem the case
[14:00:04] via cumin I don't see any jvm created on the workers other than datanode/journalnode/nodemanager
[14:00:35] so it shouldn't be anything happening when the app master is created (I'd also expect logs in case this steps errors out)
[14:03:19] elukey: dumb double check: do you use the correct version number for examples jar?
[14:09:43] joal: no no atm I just tried re-enabling jobs or simply trying to query hive
[14:09:59] didn't try the test job, lemme check
[14:15:02] elukey: I guess there is a probability of version mismatch between oozie and the new cluster version
[14:15:53] joal: no no sorry jobs as timers
[14:16:00] Ah
[14:16:06] camus for instance?
[14:16:20] yep
[14:16:27] same thing with word count, stuck in ACCEPTED
[14:16:32] it is like there were no resources
[14:16:38] it must be a stupid thing
[14:16:57] Same idea elukey: camus is compiled with hadoop low versions (maybe not that though)
[14:18:55] yep yep I took into account that some weird behavior related to that could have happened, but with some logs screaming something :D
[14:20:20] (CR) Milimetric: [C: -1] Url encode file name before querying the data store (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/570608 (https://phabricator.wikimedia.org/T244373) (owner: Fdans)
[14:20:31] right
[14:20:34] hmmmm
[14:22:56] elukey: may try to kill one of the app luanched to look at logs?
[14:23:05] if any, not even started :(
[14:24:25] ah yes please
[14:24:39] didn't want to force you doing that, it was only a chat to see if you have ideas
[14:24:41] elukey: killing your rgep-search
[14:24:43] I can do it, don't worry
[14:24:44] :)
[14:24:51] so you can keep going with your tasks
[14:24:59] * joal likes trying to help :)
[14:25:17] yes I know but I don't want to nerd snipe you :)
[14:27:10] elukey: from which machine can I use hdfs user keytab?
[14:27:42] analytics1028,29,30
[14:27:46] (masters and coord)
[14:27:53] can't from 30 :(
[14:28:18] PBCAK sorry - all goods
[14:28:53] no log - obviously
[14:29:49] elukey: could be related to the 2 nodes with issues
[14:30:14] elukey: see http://localhost:8088/cluster/nodes
[14:30:43] yes yes but there is plenty of capacity
[14:31:04] it would be really bad if it wasn't running for that
[14:31:06] yes but if local dirs are not set properly, maybe yarn doesn't accept to do stuff?
[14:31:10] yeah I agree
[14:31:31] I'll fix them so we'll rule out that variable
[14:31:35] ack!
[14:35:07] actually elukey, easiest might be to just remove them from the cluster for now, should do the trick I guess (to check)
[14:35:48] it is a quick patch, deploying now
[14:36:03] sure elukey
[14:39:27] all healthy now, still no running app :(
[14:39:35] that is good though
[14:42:20] milimetric: how would there be no / present at all?
[14:42:38] fdans: like if someone passes "blah" or something
[14:43:29] I guess it throws a 404 but I was lazy and didn't check if there'd be an out of bounds array error
[14:43:54] * elukey coffee
[14:44:08] fdans: also, I guess you're changing that behavior anyway with your next change that I'm reviewing now, if you want you can squash them together, they're mostly related
[14:44:39] milimetric: i think I'd rather not because one is more controversial than the other
[14:44:50] this one is absolutely necessary
[14:45:06] (aka one fixes a bug, the other one adds features)
[14:45:44] makes sense
[14:52:38] hm, looks like webrequest_sampled_128 has continent and country code, but not a subdivision code (e.g. for which US state)?
[14:56:39] cdanis: yep we didn't add it add it at the time since it wasn't needed, is it now? :)
[14:56:48] (less dimensions etc..)
[14:57:06] not sure about needed! it would be helpful for me in this one case, but that doesn't necessarily mean needed
[14:57:24] let me check in with traffic folks and see how they usually do what I'm trying to do
[15:14:19] hey teammm
[16:06:17] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (Nuria) This is an eventlogging event that will be persisted on the InukaPageView table on the events database. There is nothing additional that needs doing to do for that to h...
[16:19:22] Analytics, Operations, serviceops, vm-requests, User-Elukey: decom kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (Dzahn)
[16:20:49] Analytics, Operations, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (Dzahn)
[16:21:12] Analytics, Operations, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (Dzahn) a:elukey→Dzahn
[16:22:46] Analytics, Operations, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (elukey) As reminder for everybody (to avoid issues): please don't do it until irc2001 is fully set up and working with the new tool. We'll keep kraz working side to side with irc2001 for a...
[16:23:05] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (Dzahn) >@MoritzMuehlenhoff wrote: > Given that Luca also had an error during initial setup related to name resolution, this sounds like some er...
[16:28:12] Analytics, Operations, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (Dzahn) Open→Stalled Setting to stalled to reflect that. We should change status to Open when it's ready to go.
[16:28:19] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (Dzahn)
[16:28:44] Analytics, Operations, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (Dzahn) a:Dzahn→None
[16:36:19] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (elukey) Resolved→Open p:Medium→High
[16:40:48] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (elukey) Netflow data is not sanitized or dropped, we forgot to follow up on this task. I had a chat with SRE and ideally they'd like to keep historical/sanitized data. By now a lot of router in our...
[16:40:55] mforns: I reopened ==^
[16:42:50] also, we are not dropping raw data either
[16:56:37] elukey, thanks!
[16:57:06] elukey, I don't know much about all fields contained in netflow data set, not sure how sensitive each one of those is...
[16:57:55] mforns: we can pair together on this if you want
[16:58:25] I have a good idea about all the fields (more or less), so I can tell you their meaning and you can give me your opinion
[16:59:03] ok :]
[17:25:20] Analytics: Set up automatic deletion/snatization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (Nuria)
[17:27:57] Analytics: Setup refinment/sanitization on netflow data similar to how it happens for other event-based data - https://phabricator.wikimedia.org/T245287 (Nuria)
[17:30:12] Analytics: Setup refinment/sanitization on netflow data similar to how it happens for other event-based data - https://phabricator.wikimedia.org/T245287 (Nuria) We would need to: - create a schema for netflow data - post data to eventgate-analytics (similar to how api and cirrusserachrequets work) This wou...
[17:47:55] Analytics: Set up automatic deletion/snatization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (Nuria) Per our conversation in standup we will deleting the raw data and planning on switching this peippeline to the eventgate/refine workflow: {T245287}
[18:02:35] joal: !!!!!!!!!!! https://issues.apache.org/jira/browse/YARN-5774 !!!!!!!!!!
[18:02:38] it works now!
[18:02:43] * elukey cries in a corner
[18:02:50] 2+ hours on this
[18:32:37] Analytics, Analytics-Cluster, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) https://issues.apache.org/jira/browse/YARN-5774 was really difficult to find, since all the Yarn job's AM were stuck into ACCEPTED state, lik...
[18:41:52] ok hive works, spark shows some errors, but expected
[18:42:05] so far it looks promising
[18:42:18] I also haven't finalized the HDFS upgrade
[18:42:42] what I'd like to do is create new data, with oozie/hive/etc.. , and then make sure it will be present after the rollback
[18:42:54] should be possible on paper, good as test
[18:43:05] ok logging off, have a good weekend folks!
[18:43:07] * elukey off!
[19:26:33] pffff elukey - I had reached many bugs, but not that one :(
[19:27:13] Have a good weekend elukey - You man rock :)
[19:44:57] nuria: heya - are you ok for me to "merge" my changes in the design doc and continue to work in edit mode instead of suggestion?
[21:09:30] Analytics, Operations, decommission, serviceops: decommission kraz.wikimedia.org - https://phabricator.wikimedia.org/T245279 (RobH)
[21:50:41] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (nshahquinn-wmf) @hueitan, @Nuria , sorry for the confusing name—the InukaPageView data stream is meant for a [separate, time-limited analysis](https://docs.google.com/document...
[21:53:23] (CR) Milimetric: [C: -1] Improve flexibility of media file paths (6 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/571968 (https://phabricator.wikimedia.org/T244712) (owner: Fdans)
[21:59:25] Analytics, LDAP-Access-Requests, Operations: LDAP access to the wmf group for CherRaye Glenn (superset, turnilo, hue) - https://phabricator.wikimedia.org/T244410 (Dzahn)
[22:19:37] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (Nuria) >It seems logical that for consistency, we should set up basic pageview counting the same way the Android and iOS apps do We need to know the urls the application is hi...
[22:35:59] Analytics: Set up automatic deletion/snatization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (Nuria) @elukey FYI that the filepath you listed above is now "raw" data but rather already refined data. The wm.netflow table is on top of: hdfs://analytics-hadoop/wmf/data/wmf/netflow...
[22:59:31] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (nshahquinn-wmf) >>! In T244547#5886251, @Nuria wrote: >>It seems logical that for consistency, we should set up basic pageview counting the same way the Android and iOS apps d...