[03:40:49] Analytics, Analytics-Kanban: Problems with external referrals? - https://phabricator.wikimedia.org/T195880#4247791 (JKatzWMF) @Nuria Sorry for the confusion. (context for anyone who didn't see the email exchange: I wrote the email quoted above). I wasn't suggesting the issues with referrer class were rel...
[05:50:53] joal: o/ - there seems to be no issue for webrequest-load
[05:51:17] the only issue that I can see is unrelated (Druid loading issues for virtualpageview monthly
[05:55:25] the virtualpageview failure seems to be
[05:55:25] javax.servlet.jsp.el.ELException: variable [target_datasource] cannot be resolved
[05:55:50] that is one of the variables of the workflow?
[06:39:57] running errand for ~30 min, brb!
[06:55:55] Hi elukey
[06:59:46] o/
[06:59:57] This error sounds bizarre :(
[07:00:04] elukey: Have we deployed refinery yesterday?
[07:00:54] nope!
[07:01:05] elukey: The run was the first of that kind
[07:01:07] but I think it is the first time that virtualpageviews-monthly runs?
[07:01:11] correct
[07:01:33] ah didn't ask sorry! All good today?
[07:02:02] elukey: yessir :)
[07:02:08] elukey: for you as well?
[07:02:18] yep!
[07:02:20] great
[07:02:24] git up
[07:02:27] oops
[07:03:46] elukey: I'm gonna retry it - It's very bizarre this failure
[07:03:48] pivot nuked from thorium and puppet
[07:03:56] elukey: \o/
[07:04:12] elukey: the daily job doesn't fail but is confirmed the same exact way as the monthly one
[07:04:26] s/confirmed/configured
[07:04:30] weird..
[07:05:09] Rerun virtualpageview-druid-monthly-wf-2018-5
[07:05:15] !log Rerun virtualpageview-druid-monthly-wf-2018-5
[07:05:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:14:48] Analytics, User-Elukey: Upgrade AQS to Debian Stretch - https://phabricator.wikimedia.org/T196138#4247998 (elukey) p:Triage>Normal
[07:15:19] Analytics, Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642#4248009 (elukey)
[07:15:27] elukey: Are we finished with druid->stretch or not yet (I'm suspecting d1004 is still to be done)
[07:16:13] nope all done, both clusters have been updated
[07:16:17] what do you see on druid1004?
[07:16:39] It's just I don't recall :)
[07:17:16] seems legit :)
[07:17:25] if you see anything weird and/or remember let me know
[07:20:05] elukey: As you can see, I have no problem asking ;)
[07:27:42] elukey: Job failed again
[07:28:02] elukey: the patch I've provided adding datasource to every job will fix that
[07:28:12] elukey: I'll deploy early next week, and will rerun that job
[07:31:50] ack!
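An illustrative aside on the "variable [target_datasource] cannot be resolved" error discussed above: the sketch below shows, by analogy only, how a template that references a variable fails when the properties handed to it do not define that variable. It uses Python's string.Template, not Oozie's EL engine; the render helper, the template string and the property values are hypothetical, and the log does not confirm that a missing property was the actual cause.

```python
from string import Template

# Hypothetical stand-in for a workflow argument that references a variable.
# This is NOT Oozie's EL engine, only an analogy for unresolved variables.
workflow_arg = Template("--datasource ${target_datasource}")


def render(template, properties):
    """Render the template, or return an error naming the undefined variable."""
    try:
        return template.substitute(properties)
    except KeyError as missing:
        return "variable [%s] cannot be resolved" % missing.args[0]


# Illustrative property sets; the names and values are assumptions.
props_with_datasource = {"target_datasource": "example_datasource"}
props_without_datasource = {}

print(render(workflow_arg, props_with_datasource))
print(render(workflow_arg, props_without_datasource))
```

Under this analogy, defining the variable in every job's properties removes the failure mode, which is the spirit of the patch mentioned at 07:28:02.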
[07:37:10] I know that we have the rule of not deploying on a Friday but I am ok if we want to do it
[07:37:24] it is a relative light day (hopefully)
[07:37:32] but I am also ok to do it on Monday/Tue
[07:41:19] (PS2) Joal: Parameterize datasource of druid loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/436080 (https://phabricator.wikimedia.org/T195882)
[07:41:48] elukey: Let's do it then :)
[07:42:34] (CR) Joal: [V: 1] "Thanks nuria" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/436080 (https://phabricator.wikimedia.org/T195882) (owner: Joal)
[07:43:43] (CR) Joal: [C: 2] Update mediawiki-history stats [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434987 (https://phabricator.wikimedia.org/T192481) (owner: Joal)
[07:44:56] nice patch 436080
[07:45:53] (PS6) Joal: Update mediawiki-history stats [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434987 (https://phabricator.wikimedia.org/T192481)
[07:46:23] (PS7) Joal: Update mediawiki-history stats [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434987 (https://phabricator.wikimedia.org/T192481)
[07:46:45] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434987 (https://phabricator.wikimedia.org/T192481) (owner: Joal)
[07:49:44] (PS1) Joal: Update changelog.md for version 0.0.65 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/436732
[07:49:46] elukey: --^
[07:50:40] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/435169 (owner: Joal)
[07:51:01] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/436080 (https://phabricator.wikimedia.org/T195882) (owner: Joal)
[07:51:55] (CR) Elukey: [C: 1] Update changelog.md for version 0.0.65 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/436732 (owner: Joal)
[07:52:35] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/436732 (owner: Joal)
[07:53:15] !log Releasing refinery-source v0.0.65 to archiva
[07:53:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:56:07] (PS1) Joal: Bump jar version for mediawiki-history job [analytics/refinery] - https://gerrit.wikimedia.org/r/436734
[08:06:33] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/436734 (owner: Joal)
[08:08:45] !log Deploying refinery using scap
[08:08:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:21:37] joal: is the deploy finished?
[08:21:46] not yet
[08:21:49] wow
[08:22:08] refinery-source + refinery takes some time
[08:22:22] Currently scaping after canary
[08:22:53] I thought something happened on the canary (like disk space exhaustion etc..)
[08:23:34] no no - Just me not following closely, and therefore waiting a bit
[08:23:42] finished now elukey :)
[08:24:02] !log Deploy refinery on HDFS
[08:24:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:32:02] Ok elukey - Let's restart everything that needs it
[08:32:12] \o/
[08:33:47] !log Restart mediawiki-history-denormalize oozie job after deploy
[08:33:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:35:38] after the webrequest druid indexation I'll add the new turnilo config
[08:36:04] (and we'll also need to drop the test_ and old webrequest datasources)
[08:36:30] elukey: I already dropped the test ones
[08:36:40] super :)
[08:37:36] !log Restart every druid loading oozie job (except mediawiki reduced) to pick new configuration
[08:37:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:39:21] joal: the webrequest one's restart procedure is the one that we discussed some days ago?
[08:54:08] elukey: I'll start a new daily oozie job (datasource name has changed) starting 7 days ago, a new hourly one starting today, and when they have caught up, I'll stop the other webrequest one and delete the old datasource
[08:54:48] When I'm done with all datasources (virtualpageviews, pageviews, webrequest, one of the uniques), I'll send an email to analytics
[08:58:32] super
[08:58:39] let me know if I can help
[08:59:13] also, if you could send me the snippet that you are using to start the new webrequest oozie job (whenever you have time) I'll save it in my notes :)
[09:01:26] Analytics, Analytics-Kanban, Beta-Cluster-Infrastructure, Puppet: deployment-eventlog05 puppet error about missing mysql heartbeat.heartbeat table - https://phabricator.wikimedia.org/T191109#4248105 (elukey)
[09:01:52] Analytics, Analytics-Kanban, Beta-Cluster-Infrastructure, Puppet: deployment-eventlog05 puppet error about missing mysql heartbeat.heartbeat table - https://phabricator.wikimedia.org/T191109#4093870 (elukey) Thanks for the report! I added the following to the heartbeat database, and puppet now ru...
[09:02:18] fixed!
[09:02:24] I assume you mean webrequest-druid, right?
[09:03:05] yes yes
[09:03:24] I'll do :)
[09:03:31] thanks :)
[09:14:43] elukey: https://gist.github.com/jobar/88f66d6f47a31e3c23bfeb97212c4a05
[09:15:14] thanks!
[09:15:36] mmm I can see two virtualpageviews-hourly in turnilo now
[09:20:28] elukey: one has an _, the other a -
[09:20:49] elukey: part of the patch was to rename datasources to having _
[09:24:44] ahhh yes the new standard
[09:25:01] I'll shut up
[09:25:03] :)
[09:25:04] elukey: well, new from a long time ago, but actions were not taken :)
[09:25:25] elukey: You'll be happier next time you'll try to use sql querying :-P
[09:27:41] yep!
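An illustrative aside on the rename from - to _ discussed above: in standard SQL an unquoted hyphen is read as a minus sign, so a hyphenated datasource name generally has to be quoted, while an underscore name works as a plain identifier. The sketch below is a minimal illustration of that rename; the helper names to_underscore_name and needs_sql_quoting are hypothetical, and the identifier pattern is a simplifying assumption rather than the rule of any particular SQL dialect.

```python
import re


def to_underscore_name(datasource):
    """Replace every hyphen in a datasource name with an underscore."""
    return datasource.replace("-", "_")


def needs_sql_quoting(name):
    """True if the name is not a plain identifier (letters, digits, underscores).

    Assumption: a simplified identifier rule, not tied to a specific SQL dialect.
    """
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is None


# Old-style names mentioned in the conversation above.
old_names = ["virtualpageviews-hourly", "pageviews-daily", "pageviews-hourly"]

for old in old_names:
    new = to_underscore_name(old)
    print(old, needs_sql_quoting(old), "->", new, needs_sql_quoting(new))
```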
[09:39:19] https://turnilo.wikimedia.org/#webrequest_sampled_128 \o/
[09:44:15] I am updating turnilo's config
[09:44:28] to replace all - with _
[09:46:08] elukey: webrequest has almost catch up
[09:48:34] joal: https://gerrit.wikimedia.org/r/#/c/436754/
[09:51:18] elukey: I think you are missing pageviews-daily and pageviews-hourly in this patch :)
[09:52:37] elukey: Need to drop, will be back to continue rolling up oozie jobs in 1/2hours
[09:53:04] what do you mean? I renamed them to pageviews_hourly and pageviews_daily no?
[09:53:50] too late :)
[10:46:45] elukey: Indeed you changed them! Maaaaaan - My memory seems out of order today :( Sorry about that
[10:48:35] merging!!
[10:52:07] joal: turnilo did something very smart! the duplicate datasource names have now the '1' appended
[10:53:44] elukey: we have duplicated datasources?
[10:57:03] (Abandoned) Zhuyifei1999: Change database connection & table charset to 'utf8mb4' [analytics/quarry/web] - https://gerrit.wikimedia.org/r/436576 (owner: Zhuyifei1999)
[10:57:37] joal: name of datasources
[10:58:38] mmm but the old ones shouldn't appear at all
[10:58:49] ok turnilo is probably a bit confused
[10:58:51] lemme check
[11:00:46] or I am stupid, I bet on it
[11:01:44] yes I am
[11:03:39] fixed it, sending the code review joal
[11:05:01] hm ... I must be stupid as well - I don't get what's wrong :)
[11:06:41] joal: https://gerrit.wikimedia.org/r/#/c/436776/1/modules/turnilo/templates/config.yaml.erb
[11:08:17] you were indeed right about pageviews :P
[11:09:21] puppet fixed!
[11:24:39] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Add maxmind ip info to webrequest dataset on druid - https://phabricator.wikimedia.org/T194055#4187092 (elukey) New datasource and fields available in Turnilo now (thanks to Joseph!)
[11:25:05] brb
[12:35:21] Analytics, Analytics-Kanban: Review Burrow alarms in order to avoid false positives when restarting it - https://phabricator.wikimedia.org/T196158#4248549 (elukey)
[12:54:38] one more peer review to go!
[13:01:26] Analytics, EventBus, MediaWiki-JobQueue, MediaWiki-extensions-ORES, and 4 others: ORESFetchScoreJob fails quite a lot - https://phabricator.wikimedia.org/T196076#4248602 (Ladsgroup)
[13:02:46] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4248605 (Vgutierrez) it would be nice to be able to use X25519 curve here, OpenSSL provides support for X25519 since version 1.1.0. Regarding...
[13:11:51] Analytics, Analytics-Kanban: Review Burrow alarms in order to avoid false positives when restarting it - https://phabricator.wikimedia.org/T196158#4248624 (elukey)
[13:30:29] Analytics, EventBus, ORES, Scoring-platform-team, and 3 others: Numeric keys in ORES models causing downstream Hive ingestion to fail - https://phabricator.wikimedia.org/T195979#4248660 (Ladsgroup) Pinging @Tpt since he was involved in the [[https://github.com/wiki-ai/articlequality/pull/34 | dis...
[13:33:19] Quarry, I18n: Quarry cannot save queries with emojies - https://phabricator.wikimedia.org/T196153#4248669 (Aklapper)
[13:53:50] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4248709 (Ottomata) Hm, ya, sounds like a way off before we get that in Debian then, ya? Is that something that would block removal of IPSec?
[14:01:10] heya teaaam
[14:03:43] o/
[14:23:39] yoohooo
[14:25:10] o/
[14:43:23] heeey
[15:14:16] Quarry, I18n: Quarry cannot save queries with emojies - https://phabricator.wikimedia.org/T196153#4249013 (zhuyifei1999)
[15:18:04] Analytics, EventBus, MediaWiki-JobQueue, MediaWiki-extensions-ORES, and 4 others: ORESFetchScoreJob fails quite a lot - https://phabricator.wikimedia.org/T196076#4249020 (mobrovac) p:Triage>Normal
[15:20:32] Analytics, EventBus, Services (watching): EventBus service can drop a few messages during kafka leadership change - https://phabricator.wikimedia.org/T196077#4246078 (mobrovac) I think we should raise the priority of this issue. Not being able to deliver messages (and not retrying to do so) should be...
[15:47:24] Analytics, Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418#3974509 (Ottomata) a:Ottomata
[16:29:35] * elukey off!
[16:30:34] bye elukey :]
[16:37:15] (PS1) Joal: Add oozie jobs loading druid daily uniques monthly [analytics/refinery] - https://gerrit.wikimedia.org/r/436826
[16:47:35] (PS2) Joal: Add oozie jobs loading druid daily uniques monthly [analytics/refinery] - https://gerrit.wikimedia.org/r/436826
[16:50:09] Analytics, EventBus, Services (watching): EventBus service can drop a few messages during kafka leadership change - https://phabricator.wikimedia.org/T196077#4249188 (Ottomata) Agree! Ah the 'priority' here was kind of set accidentally to low. Phab will change the priority sometimes when we move it...
[16:50:19] Analytics, EventBus, Services (watching): EventBus service can drop a few messages during kafka leadership change - https://phabricator.wikimedia.org/T196077#4249189 (Ottomata) p:Low>Normal
[16:54:40] Analytics, Analytics-Cluster, Patch-For-Review: Move EventStreams to new jumbo cluster. - https://phabricator.wikimedia.org/T185225#4249202 (Ottomata) I think concurrent connections would be a good enough start. Would that be easy enough? If so, I'd postpone the EventStreams switch to jumbo until i...
[17:17:05] Analytics, Android-app-Bugs, Wikipedia-Android-App-Backlog: Count link previews on the Android app - https://phabricator.wikimedia.org/T194961#4249286 (Tbayer) >>! In T194961#4221056, @mpopov wrote: >>>! In T194961#4221017, @LGoto wrote: >> Hi @mpopov Is this for you? > > Analytics Engineering as fa...
[18:01:39] (PS3) Joal: Add oozie jobs loading druid daily uniques monthly [analytics/refinery] - https://gerrit.wikimedia.org/r/436826
[18:04:54] Analytics, Operations, hardware-requests: Site: eqiad | hardware request for a dedicated stat analytics host for the Research team - https://phabricator.wikimedia.org/T196080#4249469 (RobH) a:elukey We CANNOT move the GPU between hosts. It is in that chassis (stat1005), specifically ordered to h...
[18:23:33] (CR) Joal: [V: 1] "Tested on cluster" [analytics/refinery] - https://gerrit.wikimedia.org/r/436826 (owner: Joal)
[18:51:44] Analytics, Analytics-Kanban: Archive old geowiki data (editors per country) and make it easily available at WMF - https://phabricator.wikimedia.org/T190856#4249554 (fdans) OK, the following tables have been created: **In hdfs:** `/wmf/data/archive/geowiki/geowiki_archive_active_editors_world` `/wmf/data...
[18:58:01] Analytics, Operations, hardware-requests: Site: eqiad | hardware request for a dedicated stat analytics host for the Research team - https://phabricator.wikimedia.org/T196080#4249584 (Ottomata) That is not a bad idea. Although moving folks between stat boxes is not the easiest thing to do... :)
[19:12:56] Analytics, EventBus, ORES, Scoring-platform-team, and 3 others: Numeric keys in ORES models causing downstream Hive ingestion to fail - https://phabricator.wikimedia.org/T195979#4249618 (Tpt) > Do you think it's okay to change the keys for ores responses from "1" to "one". The key are the "page...
[21:40:47] Analytics, Analytics-Kanban, Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: deployment-eventlog05 puppet error about missing mysql heartbeat.heartbeat table - https://phabricator.wikimedia.org/T191109#4249907 (EddieGP) Open>Resolved
[21:46:39] (PS1) Krinkle: scap: Enable require_valid_service [analytics/statsv] - https://gerrit.wikimedia.org/r/436920 (https://phabricator.wikimedia.org/T195314)
[21:49:23] (CR) Krinkle: [C: 2] scap: Enable require_valid_service [analytics/statsv] - https://gerrit.wikimedia.org/r/436920 (https://phabricator.wikimedia.org/T195314) (owner: Krinkle)
[21:50:50] (CR) Krinkle: [C: 2] "Repo has no tests :(" [analytics/statsv] - https://gerrit.wikimedia.org/r/436920 (https://phabricator.wikimedia.org/T195314) (owner: Krinkle)
[21:50:57] (CR) Krinkle: [V: 2 C: 2] scap: Enable require_valid_service [analytics/statsv] - https://gerrit.wikimedia.org/r/436920 (https://phabricator.wikimedia.org/T195314) (owner: Krinkle)
[22:46:11] hi all. i'm trying to use hue.wikimedia.org, and i'm not succeeding in logging in with my Wikitech login. is there a different login i should request or use?
[23:52:23] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Access to usergroups for Marshall Miller - https://phabricator.wikimedia.org/T194550#4250064 (MMiller_WMF) @herron @Ottomata -- I would like to use hue.wikimedia.org to query Hadoop, but it looks like my Wikitech login doesn't get me i...