[02:38:36] (PS1) GoranSMilovanovic: 09/05/2016 commit to Gerrit from current WDCM official [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380427
[02:38:39] (PS1) GoranSMilovanovic: WDCM Usage Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380428
[02:41:33] (CR) GoranSMilovanovic: [V: +2 C: +2] WDCM Usage Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380428 (owner: GoranSMilovanovic)
[02:42:28] (CR) GoranSMilovanovic: [V: +2 C: +2] 09/05/2016 commit to Gerrit from current WDCM official [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380427 (owner: GoranSMilovanovic)
[02:42:34] (Merged) jenkins-bot: 09/05/2016 commit to Gerrit from current WDCM official [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380427 (owner: GoranSMilovanovic)
[02:42:36] (Merged) jenkins-bot: WDCM Usage Dashboard [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380428 (owner: GoranSMilovanovic)
[07:24:11] ok to install apache updates on bohrium now or is it a bad time?
[07:25:46] +1
[07:30:50] done
[07:34:59] Heya team
[07:35:46] o/
[08:05:30] (Abandoned) Hashar: Update banner with prototype consultation [analytics/wikistats] - https://gerrit.wikimedia.org/r/349217 (owner: Milimetric)
[08:29:35] elukey: I have noticed a really weird thing in the mediawiki_history table
[08:31:30] (PS1) Joal: Correct typo in mobile_apps_sessions oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/380446
[08:34:34] * elukey waits for the weird thing :)
[08:37:24] elukey: I corrected the wmf.mediawiki_history hive table on Aug. 22: https://tools.wmflabs.org/sal/log/AV4LgNM2wg13V6285cYt
[08:37:42] Or so I thought - because it seems to have reverted back :(
[08:37:50] I guess I have dreamt doing it
[08:37:57] Mwarf
[08:38:12] logging my dreams is really not something I should do
[08:38:51] hahahaha
[08:41:15] !log Rerun mobile_apps-session_metrics-wf- 2017-9-17 after failure
[08:41:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:43:21] hi joal
[08:43:32] Hi dsaez
[08:43:49] Analytics-Kanban: Correct typo in oozie mobile_apps_session - https://phabricator.wikimedia.org/T176599#3630855 (JAllemandou)
[08:44:00] Analytics-Kanban: Correct typo in oozie mobile_apps_session - https://phabricator.wikimedia.org/T176599#3630866 (JAllemandou) a:JAllemandou
[08:44:06] today is miriam_'s first day, she is our new research scientist
[08:44:36] Hi miriam__ :)
[08:44:37] (PS2) Joal: Correct typo in mobile_apps_sessions oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/380446 (https://phabricator.wikimedia.org/T176599)
[08:44:44] Welcome to the analytics world :)
[08:45:10] Hello, nice to meet you all. Happy to be onboard!!
[08:45:23] miriam_ is in London, so one more in our timezone ;) (almost)
[08:45:46] miriam__: o/
[08:46:28] miriam_, we will need to do more introductions in the evening when the USA folks join :)
[08:46:52] This is the quiet time of the day :)
[08:47:19] Analytics-Kanban: Add "PhantomJS" to the list of bots in webrequest definition. - https://phabricator.wikimedia.org/T175707#3630869 (JAllemandou) a:JAllemandou
[08:50:51] (PS1) Joal: Add PhantomJS to the bot_flagging regex [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380447 (https://phabricator.wikimedia.org/T175707)
[08:52:20] Analytics-Kanban: Correct (AGAIN ??!!) mediawiki_history cumulative count names - https://phabricator.wikimedia.org/T176600#3630877 (JAllemandou)
[08:52:30] Analytics-Kanban: Correct (AGAIN ??!!) mediawiki_history cumulative count names - https://phabricator.wikimedia.org/T176600#3630889 (JAllemandou) a:JAllemandou
[08:53:33] Analytics-Kanban, Patch-For-Review: Add mediawiki-history metrics to AQS - https://phabricator.wikimedia.org/T175805#3630894 (JAllemandou)
[08:53:36] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Add edits endpoint to AQS using druid as a backend - https://phabricator.wikimedia.org/T174174#3630896 (JAllemandou)
[08:58:36] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3630917 (fgiunchedi) >>! In T175922#3623734, @elukey wrote: >>>! In T175922#3622156, @Ottomata wrote: >> HM, why are we making a...
[09:12:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3630931 (fgiunchedi) >>! In T175922#3626668, @elukey wrote: > ``` > # elukey@kafka-jumbo1001:~$ curl http://10.64.0.175:7800/met...
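The PhantomJS patch (08:50:51) extends the refinery's user-agent bot-flagging regex. The Python below is a minimal sketch of regex-based flagging in general, not the refinery code: BOT_UA_PATTERN and is_flagged_as_bot are hypothetical names, and the pattern is a short stand-in, not the actual regex in analytics/refinery/source.

```python
import re
from typing import Optional

# Hypothetical stand-in pattern, NOT the regex in analytics/refinery/source.
# Each alternative is a user-agent token treated as a sign of automated traffic;
# "PhantomJS" is the token the 08:50:51 patch is about.
BOT_UA_PATTERN = re.compile(r"(bot|crawler|spider|PhantomJS)", re.IGNORECASE)


def is_flagged_as_bot(user_agent: Optional[str]) -> bool:
    """Return True if the user agent contains any token from BOT_UA_PATTERN."""
    if not user_agent:
        return False  # missing user agents are left unflagged in this sketch
    return BOT_UA_PATTERN.search(user_agent) is not None
```

Case-insensitive substring matching is deliberately crude here; a production pattern has to be tuned against false positives.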
[09:13:11] Analytics-Kanban: Add monthly unique devices dataset to Druid - https://phabricator.wikimedia.org/T163327#3630932 (JAllemandou) a:JAllemandou
[09:13:44] (PS2) Joal: Add oozie jobs loading druid monthly uniques [analytics/refinery] - https://gerrit.wikimedia.org/r/348052 (https://phabricator.wikimedia.org/T159471)
[09:25:44] (PS3) Joal: Add oozie jobs loading druid monthly uniques [analytics/refinery] - https://gerrit.wikimedia.org/r/348052 (https://phabricator.wikimedia.org/T159471)
[09:57:31] joal: just created https://gerrit.wikimedia.org/r/#/c/380449/4 in puppet to allow a better firewall configuration for druid
[09:57:48] so it should be easy to open the broker port when needed
[09:57:59] elukey: Thanks mate :)
[09:58:00] and the proxy patch is ready
[09:58:12] so now it is only a matter of waiting for Andrew and deciding what to do
[09:59:09] elukey: ok great - On my side I'm updating AQS-CR and restbase-PR based on comments - hopefully we'll have something ready later on tonight
[10:54:36] * elukey lunch!
[11:20:23] (CR) Joal: "@Mforns - See replies in comment, most of them are applied :) Thanks for your review!" (25 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[11:37:23] helloooo
[11:38:49] o/
[11:42:30] hm - We have an issue in mobile-apps job :(
[11:46:14] (PS1) Addshore: instanceof.php extra logging on sparql result [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/380490 (https://phabricator.wikimedia.org/T176577)
[11:46:28] (PS1) Addshore: instanceof.php extra logging on sparql result [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/380491 (https://phabricator.wikimedia.org/T176577)
[11:46:33] It turns out there are null timestamps in our webrequest logs - This is really unexpected I think !!!
[11:47:05] joal, hmmmm
[11:47:07] (CR) Addshore: [C: +2] instanceof.php extra logging on sparql result [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/380491 (https://phabricator.wikimedia.org/T176577) (owner: Addshore)
[11:47:09] (CR) Addshore: [C: +2] instanceof.php extra logging on sparql result [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/380490 (https://phabricator.wikimedia.org/T176577) (owner: Addshore)
[11:47:15] (Merged) jenkins-bot: instanceof.php extra logging on sparql result [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/380491 (https://phabricator.wikimedia.org/T176577) (owner: Addshore)
[11:47:18] (Merged) jenkins-bot: instanceof.php extra logging on sparql result [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/380490 (https://phabricator.wikimedia.org/T176577) (owner: Addshore)
[11:47:40] mforns: I have a patch to mitigate in the job, but I think before applying that we should double check data quality
[11:47:49] joal, yea
[11:48:10] omw to look
[11:50:53] joal: null timestamps??
[11:51:01] looks like so yeah
[11:51:04] weird, huh
[11:51:41] is it raw data coming from varnishkafka?
[11:51:47] yes
[11:57:41] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3631404 (elukey)
[11:59:51] joal: any specific timeframe?
[12:00:00] elukey: last 7 days :(
[12:00:09] all of them???
[12:00:24] somewhere (over the rainbow?)
[12:01:17] ??
[12:01:43] The job that has failed uses the past 7 days of data to compute, so I don't know where in those 7 days the error is
[12:03:11] ahhhh okok
[12:03:20] so it fails when it encounters a null timestamp
[12:03:29] correct
[12:03:29] I was a bit worried in the beginning :D
[12:03:31] today at 10h utc doesn't have any dt=null
[12:03:54] to be precise, it fails when it encounters a null: CAST(ts AS int)
[12:04:07] aha
[12:06:45] ok, for the same hour all cast(dt as int) are null
[12:07:36] mforns: ts, not dt
[12:09:01] oh
[12:09:42] but are we looking into wmf_raw.webrequest?
[12:09:52] it has no ts
[12:10:02] mforns: no, the job uses wmf/webrequest
[12:10:08] ok ok
[12:11:42] ok, 531 are null on that hour
[12:12:01] mforns: Wow
[12:12:41] (CR) Joal: "@milimetric: Comments on comments, most of them addressed, Thanks for the review!" (18 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[12:13:42] (PS7) Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805)
[12:16:59] milimetric, elukey - Can I take a break and let you investigate that ts error stuff?
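The failure comes down to converting every ts to an integer with no null check (CAST(ts AS int), 12:03:54). A minimal Python sketch of one mitigation, skipping and counting null timestamps instead of failing; epoch_seconds_skipping_nulls is a hypothetical helper, and this is neither joal's patch nor the actual Spark job.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def epoch_seconds_skipping_nulls(
    timestamps: Iterable[Optional[datetime]],
) -> Tuple[List[int], int]:
    """Convert timestamps to integer epoch seconds, skipping nulls.

    Returns (converted_values, null_count). Counting the nulls instead of
    silently dropping them keeps the data-quality question visible.
    """
    converted: List[int] = []
    null_count = 0
    for ts in timestamps:
        if ts is None:
            null_count += 1  # the case that made CAST(ts AS int) blow up
            continue
        converted.append(int(ts.timestamp()))
    return converted, null_count
```

Whether to drop, count, or fail on nulls is the data-quality call raised at 11:47:40; count-and-skip is chosen here only for illustration.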
[12:17:51] you mean mforns right :]
[12:17:56] yes, fine by me
[12:18:03] yes mforns - sorry
[12:18:10] ok, breaking then
[12:18:14] * joal breaks
[12:19:32] elukey, here's a result from those weird records: https://pastebin.com/eBfkbgFA
[12:21:39] it's strange, almost all results come from requests to PDFs
[12:22:05] but, there's also a couple to wikipedia landing pages, like: ar.wikipedia.org
[12:22:41] mforns: it would help to approximate the dt, maybe by checking the prev/next sequence numbers for a specific host
[12:22:52] aha
[12:22:56] last week we had huge issues with Varnish melting down in some use cases
[12:23:02] ic
[12:23:12] not sure if related or not
[12:28:19] elukey, the dt immediately before a corrupt record is: 2017-09-25T10:26:12
[12:28:25] not around the end of the hour at all
[12:29:21] but wait! it's curious, within 20 consecutive webrequests I found 2 corrupted ones
[12:29:33] let me expand the query
[12:39:14] (in a meeting, will read in a bit)
[12:54:56] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3631676 (GoranSMilovanovic) Hi, I need continuing access to the stat boxes beyond the set expiration date (2018-01-01). I will be wor...
[13:35:02] hey all
[13:53:08] mforns: holaaaa, I addressed all the stuff you put in the line graph review
[13:53:17] mforns: thank you for that :D
[13:56:07] mforns: the 2017-09-25T10:26:12 is related to which cp host?
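The suggestion at 12:22:41, approximating a missing dt from the previous and next sequence numbers on the same host, can be sketched as below. The sketch assumes records shaped as (hostname, sequence, dt) tuples, mirroring the hostname and sequence fields of webrequest, with dt set to None where it is missing. bracket_missing_dt is a hypothetical helper.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (hostname, sequence, dt); dt is None when the timestamp is missing.
Record = Tuple[str, int, Optional[str]]


def bracket_missing_dt(
    records: List[Record],
) -> List[Tuple[str, int, Optional[str], Optional[str]]]:
    """For each record without dt, return (host, seq, prev_dt, next_dt).

    prev_dt / next_dt come from the nearest records on the same host that do
    have a dt, ordered by sequence number (assumed unique per host).
    """
    by_host: Dict[str, List[Tuple[int, Optional[str]]]] = defaultdict(list)
    for host, seq, dt in records:
        by_host[host].append((seq, dt))

    result = []
    for host, rows in by_host.items():
        rows.sort(key=lambda row: row[0])
        known = [(seq, dt) for seq, dt in rows if dt is not None]
        known_seqs = [seq for seq, _ in known]
        for seq, dt in rows:
            if dt is not None:
                continue
            i = bisect_left(known_seqs, seq)  # index of first known seq after this one
            prev_dt = known[i - 1][1] if i > 0 else None
            next_dt = known[i][1] if i < len(known) else None
            result.append((host, seq, prev_dt, next_dt))
    return result
```

Because the null-ts records come in bursts of nearby but not necessarily consecutive sequence numbers (14:19:47), the bracket can span several missing records at once.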
[14:04:15] also it seems all pdf/electron related
[14:04:31] I don't see landing pages (ar.w.o seems to be pdf related too but I could be wrong)
[14:04:54] now if for some reason those requests use a protocol other than plain HTTP, varnish might not add the timestamps
[14:05:02] it happened in the past
[14:05:07] for example with websockets
[14:08:55] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3632014 (elukey) Tested the patch in labs: ``` kafka_server_delayedoperationpurgatory_numdelayedoperations{delayedoperation="De...
[14:09:25] hiiiiii :)
[14:09:38] elukey: i'm sure you've looked at this, but why do the metrics get all squished like that?
[14:09:48] no separator between delayedoperation?
[14:10:09] is there any problem with stat1005? it is super slow
[14:10:10] o/
[14:10:24] I think because they are keys in the mbeans
[14:10:32] like delayedOperation=somethingcool
[14:10:44] and then the exporter lowercases them
[14:11:09] for example, directly from jconsole: kafka.server:type=DelayedOperationPurgatory,name=NumDelayedOperations,delayedOperation=Fetch
[14:11:13] aye hmm, wasn't there some option to make it snake case them?
[14:11:50] I think so, wondering if prometheus' best practices allow it?
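The squashed names come from the exporter turning the mbean's key properties into a metric name plus labels and then lowercasing them. Below is a simplified Python model of that mapping, assuming a fixed domain_type_name{other keys} layout. The real Prometheus JMX exporter builds names through configurable rules; mbean_to_metric is a hypothetical helper whose two flags only mimic the exporter's lowercaseOutputName and lowercaseOutputLabelNames options.

```python
def mbean_to_metric(object_name: str, lowercase_name: bool = True,
                    lowercase_labels: bool = True) -> str:
    """Map a JMX ObjectName string to a Prometheus-style series name (simplified)."""
    domain, _, props = object_name.partition(":")
    keys = dict(prop.split("=", 1) for prop in props.split(","))
    # Metric name: domain (dots -> underscores), then the "type" and "name" keys.
    parts = [domain.replace(".", "_")]
    parts += [keys.pop(k) for k in ("type", "name") if k in keys]
    metric = "_".join(parts)
    if lowercase_name:
        metric = metric.lower()  # this is what squashes the CamelCase words together
    # Every remaining key property (e.g. delayedOperation) becomes a label.
    labels = {(k.lower() if lowercase_labels else k): v for k, v in keys.items()}
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{label_str}}}" if label_str else metric
```

With lowercase_name=False the output has the same shape as the alternative pasted at 14:36:06, which is the readability trade-off discussed in the thread.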
[14:12:30] ah yes there are two options set to lowercase everything
[14:14:19] (CR) Ottomata: [V: +2 C: +2] Correct typo in mobile_apps_sessions oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/380446 (https://phabricator.wikimedia.org/T176599) (owner: Joal)
[14:15:05] elukey, cp3031.esams.wmnet
[14:15:21] (CR) Ottomata: [C: +1] Add PhantomJS to the bot_flagging regex [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380447 (https://phabricator.wikimedia.org/T175707) (owner: Joal)
[14:16:48] elukey, but it's distributed across all hostnames
[14:17:21] some more, others less, but it does not seem to be a single-host problem
[14:17:57] hi fdans! np, will look at the changes thank you!
[14:18:07] ottomata: ah I just saw the attrNameSnakeCase, going to test it
[14:19:39] mforns: I don't see any errors in varnishkafka for sudo journalctl -u varnishkafka-webrequest --since '1 day ago'
[14:19:44] (cp3031)
[14:19:47] elukey, the ts=NULLs seem to be grouped in bursts that affect sequence numbers that are close to each other, but not necessarily consecutive
[14:20:32] hmmm
[14:25:09] ah snap attrNameSnakeCase is only for the attribures
[14:25:13] *attributes
[14:25:14] not the keys
[14:25:16] grrrr
[14:25:18] * elukey hates mbeans
[14:27:09] mforns: so I am running elukey@cp3031:~$ sudo varnishlog -n frontend -g request -q 'not Timestamp:Resp and not HttpGarbage'
[14:27:38] and I can see some errors, ending up in VSL timeout
[14:27:40] dsaez: yah it looks like flemmerich is doing some heavy stuff with a jupyter notebook on stat1005?
[14:27:43] not sure
[14:27:46] elukey, I think it's a refining problem...
[14:27:53] let me confirm
[14:28:00] ReqURL /api/rest_v1/page/pdf/Raman_scattering
[14:28:37] there's lots of requests to PDFs but also other requests
[14:28:51] i.e. to wikipedia landing pages like: ar.wikipedia.org
[14:29:09] hmmmm, no, not a refine problem
[14:29:28] the corresponding records in the wmf_raw.webrequest table have '-' in their dt field
[14:30:50] mforns: do you see landing pages not pdf related?
[14:30:56] yes
[14:30:56] in pastebin ar.w.o is pdf related
[14:31:03] really?
[14:31:04] ok
[14:31:24] I mean please re-check since I am not the most trustable person in the world checking logs :D
[14:31:54] ottomata: ok, could be, I also saw a heavy R
[14:32:08] aye
[14:32:22] dsaez: if stat1005 is busy, you can also try stat1004
[14:32:29] elukey, there's also an image /wikipedia/commons/thumb/a/aa/RO_MS_Biserica_reformata_din_Gornesti_%2810%29.jpg/120px-RO_MS_Biserica_reformata_din_Gornesti_%2810%29.jpg
[14:32:45] elukey, but yea, the rest are pdfs!
[14:34:08] ottomata: yep, but home is not shared, and also there is no sshfs
[14:34:25] dsaez: there is rsync
[14:34:32] mforns: so IIRC we have 1500s of maximum time from Start-Timestamp to End-Timestamp, otherwise varnishkafka places a nice dt: - and issues a VSL: timeout error
[14:34:45] (CR) Milimetric: "replies to your questions" (5 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[14:35:08] elukey, aha, and is that VSL timeout error visible in the webrequest record?
[14:35:13] ok!
[14:35:17] dsaez: https://wikitech.wikimedia.org/wiki/Analytics/FAQ#How_do_I_transfer_files_between_stat1005_and_stat1006.3F
[14:35:21] stat1004 works too
[14:35:22] and
[14:35:31] lemme verify that...
[14:35:31] mforns: nope we don't log it :(
[14:35:37] AH
[14:35:39] it does not
[14:35:40] but it should
[14:35:42] fixing...
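The chain described at 14:29:28 and 14:34:32 can be modelled in two steps. First, varnishkafka writes dt: - when a request's end timestamp does not arrive in time. Second, refinement turns that '-' into a null ts. The sketch below rests on stated assumptions: the 1500 s limit is the "IIRC" figure from the chat rather than a checked config value, dt is assumed to be taken from the end timestamp, and raw_dt / refined_ts are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumption: the "IIRC 1500s" figure from the chat, not a verified setting.
MAX_TRANSACTION_SECONDS = 1500
DT_FORMAT = "%Y-%m-%dT%H:%M:%S"


def raw_dt(start_epoch: float, end_epoch: Optional[float]) -> str:
    """Hypothetical model of the dt field varnishkafka writes for one request."""
    if end_epoch is None or end_epoch - start_epoch > MAX_TRANSACTION_SECONDS:
        return "-"  # no usable end timestamp: placeholder instead of a date
    # Assumption: dt is rendered from the end timestamp, in UTC.
    return datetime.fromtimestamp(end_epoch, tz=timezone.utc).strftime(DT_FORMAT)


def refined_ts(dt: str) -> Optional[datetime]:
    """'-' in wmf_raw.webrequest.dt ends up as a null ts after refinement."""
    if dt == "-":
        return None
    return datetime.strptime(dt, DT_FORMAT).replace(tzinfo=timezone.utc)
```

This also shows why the varnishkafka journal (14:19:39) can stay clean: in this model the timeout only changes a field value and never produces an error on the producer side.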
[14:36:06] ottomata: the alternative is kafka_server_DelayedOperationPurgatory_NumDelayedOperations{delayedoperation="DeleteRecords",} 0.0
[14:36:09] but looks horrible :D :D :D
[14:36:29] elukey, this is the list of uri_paths with NULL in the 10th hour of today UTC:
[14:36:35] https://pastebin.com/5N0F7XiP
[14:36:37] ottomata: ok, thanks
[14:37:01] elukey: i think that looks better
[14:37:07] than all lower case
[14:37:24] I respectfully disagree :P
[14:37:42] dsaez: actually
[14:37:46] you can rsync, but you have to pull from stat1004
[14:37:47] so
[14:37:56] rsync from stat1004
[14:38:03] but like it says in the wikitech link
[14:38:13] also, ::home is an available rsync module
[14:38:14] so you can do
[14:38:44] rsync -av stat1005.eqiad.wmnet::home/dsaez/path/to/whatever/ /home/dsaez/path/to/whatever/
[14:38:46] from stat1004
[14:39:16] elukey, 497 out of 531 errors are calls to the api for PDFs
[14:39:36] elukey: but with lower case you can't tell where the words end and begin
[14:39:59] elukey, 23/531 are thumbnail requests
[14:40:40] ottomata: I find it clear to read, but let's sync with Filippo about naming conventions ok?
[14:41:24] mforns: I have no idea, investigating it now :(
[14:41:45] elukey, 11 are api requests for html
[14:42:04] elukey: aye
[14:42:08] doesn't that match the bean names then?
[14:42:40] ottomata: sorry not following
[14:43:44] i assume the jmx bean names are uppercased like that?
[14:43:57] ah yes sorry
[14:44:06] those are keys in the mbeans yes
[14:46:00] elukey: i see words like 'opera' 'lay' 'deleter', 'layed'
[14:46:09] in lower case
[14:49:07] elukey: i would probably be fine with using something other than 'cluster' for kafka jumbo
[14:49:10] but why 'job'?
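The breakdown of the 531 null-ts requests given between 14:39 and 14:41 (497 PDF API calls, 23 thumbnails, 11 HTML API calls) amounts to a prefix classification of uri_path. The sketch assumes path prefixes inferred from the paths quoted in the log; the HTML prefix in particular is a guess. categorize and breakdown are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable

# Assumed prefixes: the pdf and thumb ones match paths quoted in the log,
# the html one is a guess at the REST API layout.
CATEGORY_PREFIXES = (
    ("pdf_api", "/api/rest_v1/page/pdf/"),
    ("html_api", "/api/rest_v1/page/html/"),
    ("thumbnail", "/wikipedia/commons/thumb/"),
)


def categorize(uri_path: str) -> str:
    """Return the first category whose prefix matches uri_path, else 'other'."""
    for label, prefix in CATEGORY_PREFIXES:
        if uri_path.startswith(prefix):
            return label
    return "other"


def breakdown(uri_paths: Iterable[str]) -> Counter:
    """Count uri_paths per category."""
    return Counter(categorize(path) for path in uri_paths)
```

Run over the uri_paths of the null-ts records for one hour, a result dominated by pdf_api would be consistent with the counts reported in the chat.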
[14:50:37] ottomata: it was confusing for me too, a job is basically a label that you set when you configure the metric pollers on the prometheus master instances
[14:51:14] for example, you can configure an analytics prometheus instance that polls all the hadoop hosts, adding to them the job hadoop
[14:51:21] same thing for druid, etc..
[14:51:38] so Filippo's proposal is to avoid adding cluster-related tags on the host's agent
[14:51:59] but let the prometheus master pollers tag them (via separate "jobs")
[14:55:53] hmm, so we'd have a kafka-jumbo job?
[14:56:02] on the master
[14:56:08] but no special labels on the instances?
[14:56:16] exactly
[14:56:24] what about multiple kafka clusters?
[14:56:32] what about metrics from different services on the same nodes?
[14:56:37] e.g. if we had mirror maker on a kafka cluster
[14:56:50] Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3632150 (mforns)
[14:56:58] you'd have separate agents on the host + jobs on the master
[14:58:07] mforns: I am finding occurrences of the issue, but it looks really weird, since with -T 1500 in varnishlog I can't see the timeout error, only the absence of end timestamp.. This has already happened in the past and it is horrible to debug in Varnish :)
[14:58:21] hmmm
[14:59:57] mforns: I'll try to explain in post-standup if you have patience
[15:00:05] ok
[15:01:29] elukey: standup?
[15:01:45] milimetric: comingggg
[15:43:43] (PS1) Joal: Correct mobile-apps-sessions spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380533
[15:54:32] Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3632407 (mforns) x1-analytics-slave.eqiad.wmnet points to dbstore1002, so should be replaced as well!
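In the job arrangement described between 14:50 and 14:58, the exporter on each host exposes series with no cluster label, and the Prometheus server adds labels when it scrapes. Prometheus does attach job and instance labels to every scraped series, and the sketch below models only that step. kafka-jumbo1001 and port 7800 come from the log; the job names, the second port and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# (metric name, labels, value) as exposed by an exporter on a host.
Sample = Tuple[str, Dict[str, str], float]

# Hypothetical scrape setup: one job per service/cluster, so two jobs can scrape
# two exporters ("separate agents") on the same host, e.g. broker and MirrorMaker.
SCRAPE_JOBS: Dict[str, List[str]] = {
    "kafka_jumbo": ["kafka-jumbo1001:7800"],
    "kafka_mirrormaker": ["kafka-jumbo1001:7900"],  # hypothetical port
}


def scrape(job: str, instance: str, exposed: List[Sample]) -> List[Sample]:
    """Attach job/instance labels the way the server does at scrape time.

    Simplification: target labels simply win here; real Prometheus (with the
    default honor_labels: false) keeps a clashing scraped label as exported_<name>.
    """
    return [(name, {**labels, "job": job, "instance": instance}, value)
            for name, labels, value in exposed]


def scrape_all(exposed_by_target: Dict[str, List[Sample]]) -> List[Sample]:
    """Scrape every target of every job in SCRAPE_JOBS."""
    out: List[Sample] = []
    for job, targets in SCRAPE_JOBS.items():
        for target in targets:
            out.extend(scrape(job, target, exposed_by_target.get(target, [])))
    return out
```

Multiple Kafka clusters would map to additional entries in SCRAPE_JOBS, with nothing cluster-specific configured on the hosts.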
[16:04:19] elukey: in case you missed it, it makes sense to keep dchen and dduvall accounts
[16:04:28] so i think you can check those boxes and move that task to done
[16:08:18] super
[16:12:49] milimetric: Any chance you could take a look at https://gerrit.wikimedia.org/r/#/c/379441/ today, or perhaps tomorrow? Would be great to get that moving.
[16:13:38] Nettrom: sure, will do, may have to wait until tomorrow
[16:15:14] milimetric: Sounds good, thanks!
[16:16:18] (CR) Joal: "Thanks @milimetric :)" (2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[16:16:25] (PS8) Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805)
[16:16:35] mforns: --^ this one should be better :)
[16:16:50] joal, ok, will review
[16:17:18] mforns: should I review too or just focus on the front end stuff?
[16:17:56] milimetric, you already did 2 reviews, I can do this one
[16:50:41] (PS6) Joal: Update mediawiki-history-reduced oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174)
[16:51:03] Hi a-team, leaving for now - There are a few CRs I'll merge tomorrow morning and then deploy (if you agree) :)
[16:51:21] joal, CRing your code right now
[16:51:30] not actually crying though
[16:51:40] the code is good :D
[16:51:47] mforns: Thanks man - Was thinking of small CRs I pushed today :)
[16:51:54] oh ok
[16:51:55] The AQS code can wait
[16:51:59] oooooh
[16:52:11] k
[16:52:15] will do the others as erll
[16:52:18] well
[16:52:22] Thanks as well then :)
[17:05:24] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3632663 (phuedx) Let's leave this open until Thursday, 28th so that we can verify th...
[17:09:06] (CR) Mforns: "LGTM! A couple comments, but they are more questions than observations. Awesome work :D" (9 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[17:25:54] are you guys ok with the failing jobs due to the dt:null issue?
[17:26:20] I didn't understand from standup if this is preventing us from relaunching jobs or if it is only annoying
[17:26:33] I don't have a good answer now, need to spend a bit of time tomorrow
[17:29:19] I am going to check later on, lemme know if this is super urgent
[17:29:22] * elukey off!
[17:29:57] Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3632150 (mforns) @elukey What about having another cname to analytics-slave that does not have the word slave in it?
[17:31:54] elukey, joal said he was going to deploy fixes tomorrow morning, so I guess troubleshooting would be more for the sake of correctness, not tier1-urgent
[17:33:36] Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3632795 (elukey) It would be great not to create more alias/cnames since if not used they will be forgotten (like it happened to `x1-analytics-slave.eqiad.wmnet` recently). Is there...
[17:33:54] mforns: ack thanks!
[17:48:52] Analytics-Kanban: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3632819 (mpopov) >>! In T176639#3632795, @elukey wrote: > Is there a valid reason why `analytics-slave` shouldn't be used? Are we talking about eventlogging-queries? It's an old a...
[17:56:34] (CR) Mforns: [C: +2] Correct mobile-apps-sessions spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380533 (owner: Joal)
[18:00:37] (CR) Mforns: [C: +2] Add PhantomJS to the bot_flagging regex [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380447 (https://phabricator.wikimedia.org/T175707) (owner: Joal)
[18:01:28] (Merged) jenkins-bot: Correct mobile-apps-sessions spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380533 (owner: Joal)
[18:04:50] (Merged) jenkins-bot: Add PhantomJS to the bot_flagging regex [analytics/refinery/source] - https://gerrit.wikimedia.org/r/380447 (https://phabricator.wikimedia.org/T175707) (owner: Joal)
[18:09:39] eh... a-team I'm gonna fix that unexpected whitelist values oozie thing, it's annoying me
[18:09:50] hey milimetric, ok
[18:11:08] Heya mforns - Thanks for the quick answer to Luca and for the reviews :)
[18:11:22] hey joal np
[18:16:07] (PS9) Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805)
[18:18:50] hm, hi.wikivoyage was already in the tsv but not uploaded. Gotta make sure to put that up on hdfs whenever it's changed, wonder if we should just auto-sync it
[18:20:22] milimetric: It has just not been deployed - Will do so tomorrow
[18:20:30] nono, i synced it
[18:20:38] milimetric: ?
[18:20:48] it doesn't need to be deployed, it's just the tsv file
[18:21:14] maybe there's some confusion about this, though
[18:21:20] Right, that is referenced in /wmf/refinery/current - Therefore subject to deploy
[18:21:24] :)
[18:21:27] heading to a cafe! bbib
[18:21:35] Going for dinner - later lads
[18:21:57] joal, right, but the way we've always done it is sync the latest version separate from deploys so we stop getting the alerts
[18:26:20] Are iPython notebooks still a current recommended way to munge and graph data from the cluster? I remember a while ago hearing something about Jupyter being available and more current?
[19:19:15] (CR) Joal: "Thanks mforns for the review." (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: Joal)
[19:20:09] Analytics-Cluster, Analytics-Kanban, Operations, User-Elukey: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3633057 (Ottomata) Can/should we take this out of Kanban?
[19:54:46] (CR) Mforns: Update mediawiki-history-reduced oozie job (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: Joal)
[19:55:45] Analytics, Analytics-Cluster, Operations, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3633236 (dr0ptp4kt) Following up on this, I arranged some time with @Ottomata to take a look into this.
[21:29:31] (PS1) GoranSMilovanovic: WDCM Usage Dashboard - Crosstabs [analytics/wmde/WDCM] - https://gerrit.wikimedia.org/r/380652