[00:01:28] Analytics, Wikimedia-Stream, service-runner, Services (watching): Support node cluster sticky-session in service-runner - https://phabricator.wikimedia.org/T145805#2711497 (GWicke) p:Normal>Lowest I would really prefer to avoid messing with sticky sessions & related issues. If we do end u... [00:04:54] Analytics, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2711508 (Pchelolo) [00:05:06] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching): Kasocki Prototype - https://phabricator.wikimedia.org/T145095#2711509 (Pchelolo) [06:45:35] goood morning! [06:45:45] sigh I am reading oozie emails [06:46:13] something must have gone really wrong [06:51:20] so first of all I am going to relaunch the oozie jobs [06:56:32] !log launched 0029278-160922102909979-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-13-2 with higher error threshold [06:57:30] !log launched 0029282-160922102909979-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-13-5 with higher error threshold [07:07:54] going to put notes in https://etherpad.wikimedia.org/p/analytics-oozie-13102016 [07:20:56] so I don't see a particular datacenter affected [07:23:12] elukey: Thanks for taking care of that :) [07:24:07] joal: hola! 
See after a while your patience is giving back something :D [07:24:14] huhuhu :) [07:24:37] elukey: You know I'd have done it and explain it without problem :) [07:24:49] elukey: I don't blame people for understand faster than I do ;) [07:29:55] yes I know but you told me the same things about Hadoop probably 10 times before me getting it :P [07:30:28] elukey: I think I had to work on it for more than 10 weeks before getting as well, no bother :) [07:42:24] :) [07:42:53] so I am thinking that we are seeing again a timeout issue [07:42:59] (VSL timeout) [07:43:14] hm [07:43:22] I reported the issue in #traffic and asked the permission to start tmux sessions with varnishlog on most of the upload hosts [07:43:31] k [07:43:34] with the same timeout settings that we use [07:44:33] originally the issue seemed to be related to 503s and varnish misbehaving in upload, but a lot of progress has been made [07:44:44] for example the caching hosts are restarted weekly, not daily [07:46:21] elukey: leaving, will be back in 1 hour or so [07:47:49] o/ [08:39:11] FYI I just restarted aqs on aqs1004 to pick up the new node package [08:39:14] all good for the moment [08:39:20] moritzm: --^ [08:39:29] (I did a depool, restart, re-pool) [08:40:00] will do aqs100[56] in a bit [08:43:10] ok, nice! [09:12:11] (Abandoned) Mforns: [WIP] Add data cube title to config [analytics/pivot] - https://gerrit.wikimedia.org/r/315315 (https://phabricator.wikimedia.org/T138262) (owner: Mforns) [09:13:52] mforns: :( [09:13:58] hey elukey ! 
[09:14:02] sorry if you spent time on it for nothing :( [09:14:05] o/ [09:14:09] xD no no it was 5 minutes [09:14:44] :D ok feeling a bit better [09:14:49] and the config file is way better [10:41:05] Analytics-Dashiki, Continuous-Integration-Config: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2712346 (hashar) [10:42:04] Analytics-Dashiki, Continuous-Integration-Config: Dashiki bower has a version conflict for jQuery - https://phabricator.wikimedia.org/T148020#2712358 (hashar) [10:42:49] Analytics-Dashiki, Continuous-Integration-Config: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2712346 (hashar) [10:47:59] hey elukey, how many nodes has the Hadoop cluster? [10:48:57] so an102[89], an103[0-9], an104[0-9], an105[0-7] IIRc [10:49:26] should be 30 [10:49:31] elukey, so around 30 [10:49:32] ok ok [10:49:34] thanks [10:49:37] :] [10:49:40] ah plus the master nodes [10:49:41] 32 [11:07:14] (PS1) Hashar: build: run karma test with just "npm test" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) [11:09:32] mforns: hello :) Looks like dashiki / mediawiki-storage need some Jenkins jobs :] [11:09:56] there is a jquery version issue when running bower install for dashiki though :( [11:10:03] hashar, hi! 
would be cool [11:10:14] aha [11:10:38] * mforns looks [11:11:43] Analytics-Dashiki, Continuous-Integration-Config, Patch-For-Review: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2712433 (hashar) [11:12:17] Analytics-Dashiki, Continuous-Integration-Config: Add CI job for analytics/mediawiki-storage - https://phabricator.wikimedia.org/T148023#2712435 (hashar) [11:12:44] I am hacking in mediawiki-storage :D [11:13:18] Analytics, Operations, Traffic: The WMF-Last-Access Set-Cookie header should follow RFC 2965 syntax rather than the pre-RFC Netscape format - https://phabricator.wikimedia.org/T147967#2712450 (ema) p:Triage>Normal [11:13:19] hashar, it looks that mediawiki-storage jquery version is conflicting with dashiki, I guess upgrading in mediawiki-storage will remove the conflict [11:16:53] (PS1) Hashar: package.json: mark as private, drop version [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315660 [11:16:55] I guess yes [11:17:07] which lead me to consider adding some Jenkins job for mediawiki-storage as well hehe [11:20:03] mforns: about nodes in cluster: yarn.w.o tells you (first table in page) ) [11:20:21] mforns: gives you even more info: how many cores and RAM [11:20:33] joal, hi! oh, thanks [11:20:47] np mforns, it's an interesting view of the cluxster ;) [11:23:58] (PS5) Joal: Update casssandra loading classes for new AQS [analytics/refinery/source] - https://gerrit.wikimedia.org/r/295663 (https://phabricator.wikimedia.org/T147841) [11:35:12] (PS2) Hashar: build: run karma test with just "npm test" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) [11:37:03] (PS1) Hashar: build run karma test with just "npm test" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) [11:37:36] bah karma start never exists chromium :( [11:43:28] hashar, yea... 
[11:43:52] hashar, is that a showstopper [11:43:53] ? [11:44:58] (CR) Joal: "Just tested, works like a charm :)" (5 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/295663 (https://phabricator.wikimedia.org/T147841) (owner: Joal) [11:48:16] mforns: found out. Have to pass to karma start : --single-run [11:48:21] (PS2) Hashar: build run karma test with just "npm test" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) [11:48:25] hashar, ok [11:48:32] going out for lunch, will add some experimental jenkins job and see what happens :] [11:50:15] (PS3) Hashar: build: run karma test with just "npm test" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) [12:03:24] Analytics-Kanban: Remove version conflict in Dashiki bower install - https://phabricator.wikimedia.org/T148027#2712544 (mforns) [12:03:49] (PS1) Mforns: Make jquery version compatible with Dashiki [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315666 (https://phabricator.wikimedia.org/T148027) [12:04:24] (CR) Mforns: [C: 2 V: 2] "Self merge" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315666 (https://phabricator.wikimedia.org/T148027) (owner: Mforns) [12:14:54] joal, do you know how to push tags to the github mirror associated to a gerrit repo? [12:15:46] or elukey ^? [12:15:57] mforns: I have never done it :( [12:16:31] mforns: you'd need to push them to the gerrit repo first [12:16:36] then they'll get mirrored [12:16:56] I do it with varnishkafka [12:17:10] elukey, oh ok [12:17:42] elukey, oh it worked, I though I had tried this before [12:17:44] thanks! [12:39:40] * elukey afk for a bit! 
[12:42:13] (PS2) Hashar: package.json: mark as private, drop version [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315660 [12:42:15] (PS3) Hashar: build run karma test with just "npm test" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) [12:42:30] (CR) Hashar: "check experimental" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) (owner: Hashar) [12:48:52] bah karma doesn't run :( [12:49:37] (CR) Hashar: "check experimental" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) (owner: Hashar) [12:51:01] mforns: thanks for the quick fix about jquery version :D [12:51:20] CI somehow does not run karma though bah [12:53:33] stupid npm [12:56:12] (PS4) Hashar: build: run karma test with just "npm test" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) [12:56:47] (CR) Hashar: "I have adjusted the npm "test" script to invoke "karma" directly, that is to please npm 2.x which CI uses." [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) (owner: Hashar) [12:56:51] (CR) Hashar: "check experimental" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) (owner: Hashar) [12:57:54] (PS4) Hashar: build run karma test with just "npm test" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) [12:58:06] (CR) Hashar: "I have adjusted the npm "test" script to invoke "karma" directly, that is to please npm 2.x which CI uses." 
[analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) (owner: Hashar) [12:58:13] (CR) Hashar: "check experimental" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) (owner: Hashar) [12:58:29] mforns: might have something working. We can continue tomorrow if you have any interest [12:58:37] gotta run SWAT then move out for some shopping [12:59:09] (CR) Hashar: [C: 1] build run karma test with just "npm test" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/315664 (https://phabricator.wikimedia.org/T148023) (owner: Hashar) [12:59:46] (CR) Hashar: [C: -1] "Something is not right:" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) (owner: Hashar) [13:06:35] (PS15) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric) [13:23:18] hashar, sure, let's continue tomorrow :] [13:27:15] yup [13:46:27] milimetric: hiyaaa, yt? [14:02:32] mforns: , yt? [14:02:38] ottomata, hey! [14:02:49] hiyaaa! i need some js brain bounce! [14:02:51] you got a sec? [14:02:59] sure, batcave? [14:03:02] ja [14:03:06] omw [14:32:39] mforns: come back? you got 5 more min? [14:32:44] milimetric, sure! [14:34:01] hi, would it be possible to have recent user agent statistics? [14:45:22] jynus: you can try pivot [14:45:30] https://pivot.wikimedia.org [14:45:43] is it for the OCSP issue? 
[14:45:57] yes [14:46:05] I would like to see impact [14:46:27] also I think that https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os would be good too [14:47:03] pivot might be more useful if you want to run more complex query [14:47:19] but if you want raw number the above dashboard should be good [14:47:35] that is not very granular [14:48:25] I need today's statistics compared, for example, to yesterday [14:48:38] I suppose we do not have that [14:49:09] I checked https://pivot.wikimedia.org/#pageviews-hourly and it goes up to the 12th [14:54:02] joal: qq - let's say that I have a webrequest with dt: "-", should I find it in webrequest_raw? I'd say no since camus should have already replaced the '-' with a date right? [14:54:53] elukey: camus isn't going to change the data [14:55:02] it will just choose a timestamp for bucketing [14:55:09] so '-' will be in webrequest_raw [14:55:17] and camus will just assign it to current timestamp bucket during import [14:55:18] elukey: you know the truth, ottomata is always right ;) [14:55:41] haha [14:56:06] elukey: between joal and I: one of us always tells the truth, and the other always lies [14:56:16] i always tell the truth [14:56:19] ahhahaah [14:56:39] I am trying to verify some data on hive but my brain is a bit melted [14:57:02] The jottoal paradox ! [14:57:24] I was sure about the '-' stored but then I haven't found what I was expecting so I started to doubt about my understanding, that when comes to hadoop is always -2 [14:59:07] elukey: concern with the '-' is that it is stored in camus runtime partition [15:00:11] (I reworked https://grafana.wikimedia.org/dashboard/db/varnishkafka today) [15:01:23] elukey, ottomata : standdduppppp [15:01:49] Oo oyaaaa nice elukey [15:01:50] AHHHk [15:02:26] trying to join... [15:02:53] weird [15:03:22] am I in the meeting? 
[15:03:52] You were for a few seconds ottomata [15:12:39] Analytics-Kanban, Patch-For-Review: Remove version conflict in Dashiki bower install - https://phabricator.wikimedia.org/T148027#2713051 (hashar) [15:12:42] Analytics-Dashiki, Continuous-Integration-Config: Dashiki bower has a version conflict for jQuery - https://phabricator.wikimedia.org/T148020#2713053 (hashar) [15:13:04] Analytics-Kanban, Patch-For-Review: Remove version conflict in Dashiki bower install - https://phabricator.wikimedia.org/T148027#2712544 (hashar) [15:13:07] Analytics-Dashiki, Continuous-Integration-Config, Patch-For-Review: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2713055 (hashar) [15:13:18] Analytics-Kanban, Patch-For-Review: Remove version conflict in Dashiki bower install - https://phabricator.wikimedia.org/T148027#2712544 (hashar) Looks good to me now :] Thank you! [15:13:29] Analytics-Dashiki, Continuous-Integration-Config, Patch-For-Review: Add CI job for Dashiki - https://phabricator.wikimedia.org/T148019#2712346 (hashar) [15:25:10] Analytics, EventBus, Wikimedia-Stream: Fix consumer.disconnect() node-rdkafka bug - https://phabricator.wikimedia.org/T148043#2713087 (Ottomata) [15:25:33] elukey: whats up? [15:26:42] ottomata: today I have built https://etherpad.wikimedia.org/p/analytics-oozie-13102016 [15:27:07] and I was trying to figure out what the numbers meant [15:27:08] day hour hostname count_expected count_different [15:27:09] 13 0 cp4015.ulsfo.wmnet 5323737 496794 [15:27:18] the "count_different" [15:27:45] because I can find only 1024 dt: "-" for the combination host/hour in hive [15:27:49] count different is calculated from count_expected and count_actual (uhh) [15:27:55] yeah yeah [15:28:13] yeah count_actual [15:28:22] but why that different? 
the 1024 '-' are all facebook's crawler for https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg/720px-Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg [15:28:27] that ends up in a 400 [15:28:27] :O [15:28:57] elukey: not sure what you are saying [15:28:58] 1024 - [15:28:59] ? [15:29:07] ahahhaa sorry [15:29:08] oh [15:29:09] so [15:29:13] in 13 0 [15:29:15] day hour [15:29:18] there are 1024 events [15:29:19] with dt - [15:29:20] ? [15:29:23] 1024 records* [15:29:26] yes exactly [15:29:28] ok [15:29:39] all of them requesting the same thing [15:29:42] over and over [15:29:44] hm, i don't think the dt - will have much to do with count_different [15:30:13] count_expected is just max(seq) - min(seq) [15:30:16] wasn't it the cause of bucket misplacements ? [15:30:23] hmm [15:30:25] hmmm [15:30:28] yes but if there are holes? [15:30:30] if it is that misplaced yes [15:30:42] maybe 1024 holes cause weird calculations [15:30:42] usually it doesn't happen like that, but it could [15:30:52] elukey: you can examine sequence max and min [15:30:57] the table has that [15:31:11] maybe the - records have a really large seq max that cause a large hole [15:31:49] so ja [15:31:50] if [15:31:55] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2713117 (Nuria) [15:32:21] if the dt - record had a really low seq, say seq 100 [15:32:44] but the hour of the current camus run really started with seq 1000 [15:32:44] say [15:32:51] current hour = 0 [15:32:56] (current time of camus run) [15:32:58] and [15:33:06] lowest seq with record with dt hour == 0 [15:33:07] is 1000 [15:33:17] AND, there are record coming from kakfa with dt - [15:33:19] but seq low, like 100 [15:33:20] then yeah [15:33:23] that would make a big hole 
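(Editor's sketch, not part of the log.) The scenario ottomata describes above can be reproduced with a few lines of Python. This is a minimal illustration under stated assumptions, not the refinery code: the record layout, the `bucket_hour` and `sequence_stats` helpers, and the hostname `cp-example` are all hypothetical. The chat gives count_expected as max(seq) - min(seq); the sketch adds 1 so that a gap-free run yields a difference of zero, which is also an assumption.

```python
from collections import defaultdict


def bucket_hour(record, import_hour):
    """Return the hour bucket a record lands in.

    Assumption: a record whose dt is '-' carries no usable timestamp, so it
    is placed in the hour of the import run, as described in the chat.
    """
    dt = record["dt"]
    if dt == "-":
        return import_hour
    return int(dt[11:13])  # hour field of 'YYYY-MM-DDTHH:MM:SS'


def sequence_stats(records, import_hour):
    """Compute per (hostname, hour) sequence statistics."""
    seqs_by_bucket = defaultdict(list)
    for record in records:
        key = (record["hostname"], bucket_hour(record, import_hour))
        seqs_by_bucket[key].append(record["sequence"])

    stats = {}
    for key, seqs in seqs_by_bucket.items():
        count_actual = len(seqs)
        # Assumption: +1 so that a contiguous run has count_different == 0.
        count_expected = max(seqs) - min(seqs) + 1
        stats[key] = {
            "count_actual": count_actual,
            "count_expected": count_expected,
            "count_different": count_expected - count_actual,
        }
    return stats


# 100 well-formed records for hour 0, sequence numbers 1000..1099.
records = [
    {"hostname": "cp-example", "dt": "2016-10-13T00:00:00", "sequence": seq}
    for seq in range(1000, 1100)
]
# One stray record with dt '-' and a much lower sequence number.
records.append({"hostname": "cp-example", "dt": "-", "sequence": 100})

example_stats = sequence_stats(records, import_hour=0)
print(example_stats[("cp-example", 0)])
```

In this toy case a single misplaced record turns a complete hour into one that appears to be missing 899 records, which is the "big hole" effect discussed above.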
[15:33:38] * elukey cries in a corner [15:34:42] so I can double check listing for that hour the first say 10 min seq and the last 10 max seq [15:39:55] elukey: https://github.com/wikimedia/analytics-refinery/blob/master/hive/webrequest/select_missing_sequence_runs.hql [15:39:57] might help [15:40:18] that will print holes :) [15:40:35] then you can examine the records that match the seqs at the borders of the hole [15:41:21] Analytics-Kanban, Research-and-Data, Research-collaborations, Research-management, Patch-For-Review: Oozie job to extract data for WDQS research - https://phabricator.wikimedia.org/T146064#2713157 (leila) Thank you all! [15:43:19] Analytics-Kanban, Operations: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713164 (Nuria) [15:43:45] Analytics-Kanban, Operations: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (jcrespo) The workaround for now is to use Firefox, as it has its own TLS stack different from the OS one. [15:43:56] Analytics-Kanban, Operations: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713167 (jcrespo) p:Triage>Unbreak! [15:44:27] Analytics-Kanban, Operations, Traffic: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713169 (jcrespo) [15:47:23] Analytics-Kanban, EventBus, Wikimedia-Stream: Fix consumer.disconnect() node-rdkafka bug - https://phabricator.wikimedia.org/T148043#2713087 (Nuria) [15:47:26] Analytics-Kanban, Operations, Traffic: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713178 (Zppix) Doesn't appear to affect the iOS 8.1 app for Wikipedia. [15:48:50] I found the repro,https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg/720px-Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg generates a weird VSL log... [15:49:14] so dt: '-', etc.. 
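(Editor's sketch, not part of the log.) The idea of printing holes and then inspecting the records at their borders can be illustrated in Python. This is not a translation of the linked select_missing_sequence_runs.hql; the function name `missing_sequence_runs` and the output layout are hypothetical, and the input is assumed to be the sequence numbers of one host for one hour.

```python
def missing_sequence_runs(sequences):
    """List runs of missing sequence numbers between observed ones.

    Each run is reported with the first and last missing number, the size
    of the run, and the observed sequence numbers on either border.
    """
    ordered = sorted(set(sequences))
    runs = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            runs.append({
                "missing_start": previous + 1,
                "missing_end": current - 1,
                "missing_count": current - previous - 1,
                "border_before": previous,  # record to examine before the hole
                "border_after": current,    # record to examine after the hole
            })
    return runs


# A stray low sequence number, then two observed runs with a small gap.
observed = [100] + list(range(1000, 1005)) + list(range(1010, 1013))
example_runs = missing_sequence_runs(observed)
for run in example_runs:
    print(run)
```

The border values are the sequence numbers to look up in the raw data, for example to check whether the record just before a large hole is one of the dt '-' requests.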
[15:49:14] Analytics-Kanban, Operations, Traffic: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713181 (jcrespo) These has been some of the updates we had recently: > We don't yet understand the full scope or specifics of either the > underlying issue GlobalSign is having, or any impa... [15:49:19] Analytics-Kanban, Operations, Traffic: MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713182 (Zppix) [15:52:08] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713203 (Zppix) [15:55:10] Analytics, Easy: Standardize logic, names, and null handling across UDFs in refinery-source {hawk} - https://phabricator.wikimedia.org/T120131#1845910 (Nuria) Null treatment might be different per UDFs as nulls in hive depend on the columns they operate on . Changes need to be backwards compatible so i... [15:55:28] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713217 (Zppix) Clearing CertUlti on edge doesn't fix the issue [15:55:47] Analytics-Kanban, Easy: Standardize logic, names, and null handling across UDFs in refinery-source {hawk} - https://phabricator.wikimedia.org/T120131#1845910 (Nuria) [15:56:49] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713223 (ema) [15:58:27] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713243 (Zppix) [15:59:23] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (Zppix) Edge is completely blocking access to WMF sites as shown in screenshot number 2 in the task description [15:59:47] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713248 (BBlack) [15:59:51] 
Analytics: Evaluate a unit testing framework and add tests for the formatter function - https://phabricator.wikimedia.org/T147440#2692917 (Nuria) Goal is to apply unit testing to 1 function to start, "format function" seems the easiest to get started with. https://github.com/wikimedia/varnishkafka/blob/maste... [16:01:20] Analytics-Kanban: Evaluate a unit testing framework and add tests for the formatter function - https://phabricator.wikimedia.org/T147440#2692917 (Nuria) [16:02:11] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713281 (Zppix) [16:02:23] Analytics-Kanban, Operations, Traffic: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (Zppix) [16:05:12] Analytics-Kanban, Patch-For-Review: Better Compiler warnings in Makefile - https://phabricator.wikimedia.org/T147436#2713303 (Nuria) [16:06:04] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713305 (Zppix) [16:09:23] milimetric: sqoop path doesn't exists it seems :( [16:09:49] joal: /user/milimetric/wmf/data/raw/mediawiki/tables [16:09:51] sorry? [16:10:02] oh, sorry mate [16:10:08] milimetric: --^ [16:10:27] milimetric: I thought it was from root, not your user ! my mistaje [16:10:36] my bad [16:10:49] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713344 (Zppix) [16:12:25] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (Zppix) [16:20:13] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713388 (Zppix) An user reports on IRC: earlier today (9-10 a.m. US eastern time) only *.wikimedia.org and wikimediafoundation.org sites were affected by the cert pro... 
[16:26:09] Analytics-Kanban: Examine puppet code for Event Logging and make sure monitoring is using the best counts - https://phabricator.wikimedia.org/T147321#2713408 (Nuria) Changes; - remove server-side as it is no longer relevant - alarm on EventError values rather than raw/valid [16:27:15] Analytics-Kanban: Examine puppet code for Event Logging and make sure monitoring is using the best counts - https://phabricator.wikimedia.org/T147321#2713410 (Nuria) [16:30:50] Analytics-Kanban: Optimize Edit History denormalized table extraction for big wikis - https://phabricator.wikimedia.org/T146481#2713427 (Nuria) [16:31:07] Analytics-Kanban: Examine puppet code for Event Logging and make sure monitoring is using the best counts - https://phabricator.wikimedia.org/T147321#2688873 (Ottomata) While we're at it, let's remove unused references to server_side events in `role::eventlogging`. [16:36:42] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713447 (Zppix) A user in ENWIKI's help irc channel reports the error on Windows 10 Professional latest version, on Chrome - Version 54.0.2840.59 beta-m [16:37:35] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713448 (Zppix) [16:40:30] Analytics, Analytics-Dashiki: Migrate from bower to yarn or just npm - https://phabricator.wikimedia.org/T147884#2706880 (Nuria) Dependencies need to exist on npm after the replacement is a drop in. Remove references to bower. 
[16:42:13] Analytics-Dashiki, Analytics-Kanban: Migrate from bower to yarn or just npm - https://phabricator.wikimedia.org/T147884#2706880 (Nuria) [16:44:57] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713503 (Zppix) [16:45:56] Analytics, Analytics-Dashiki: Switch to fetch away from jquery - https://phabricator.wikimedia.org/T148053#2713455 (Nuria) We need a polyfill otherwise our level of support drops too much. We need to remove references to jquery , we use it little but some, also chnages need to be done on tests. [16:47:44] Analytics-Dashiki, Analytics-Kanban: Switch to fetch away from jquery - https://phabricator.wikimedia.org/T148053#2713455 (Nuria) [16:48:33] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713526 (Paladox) Chrome works for me still, seems to be spreading so may affect firefox soon. [16:49:27] milimetric: When cleaning up your hdfs folder, you deleted the namespace_db_mapping file :( [16:49:53] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713529 (Joe) @Paladox: firefox will keep working fine as it uses a different TLS stack from the one provided by the OS. 
[16:50:00] joal: from what I can see all the dt: '-' use cases are the fb crawler trying to get https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg/720px-Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg [16:50:05] over and over [16:50:32] (that generates a VSL timeout for some reason) [16:52:29] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (Pietrodn) > GlobalSign suggested the following workaround, it's unclear whether it actually works or not: https://support.globalsign.com/customer/portal/article... [16:53:15] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713539 (Zppix) @Pietrodn so firefox doesnt work on mac? [16:54:19] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713543 (Mholloway) [16:54:50] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713544 (Pietrodn) >>! In T148045#2713539, @Zppix wrote: > @Pietrodn so firefox doesnt work on mac? Wikipedia on Firefox works fine on macOS Sierra. Seems to be the onl... [16:55:35] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713545 (Zppix) @Pietrodn ack [16:58:35] elukey: Weird !!! [16:59:01] merging your patch in the meantime! [16:59:10] I applied one of it manually to https://pivot.wikimedia.org/ [16:59:12] looks good [16:59:50] Cool elukey, thanks :) [17:00:50] milimetric: ping? 
[17:00:56] hey joal [17:01:08] I've got my 1/1 soon, what's up [17:01:09] milimetric: I bet you have not seen my previous message :) [17:01:24] 18:49:27 < joal> milimetric: When cleaning up your hdfs folder, you deleted the namespace_db_mapping file :( [17:01:24] joal: oh no, sorry! [17:01:25] shit [17:01:28] milimetric: no prob :) [17:01:33] milimetric: pinging again :) [17:01:37] I'll import it now, I'll tell you where it goes to [17:01:44] thanks ! [17:02:06] milimetric: like that I can finish testing with the latest sqooped data [17:04:02] joal: https://pivot.wikimedia.org/ - done :) [17:04:16] Yay ! [17:04:24] as you said elukey, looking good :) [17:09:01] milimetric, do you want to look at the task together? [17:09:11] (CR) Nuria: "We are actually removing bower/npm in favor of yarn so now might not be the best time for these changes?" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) (owner: Hashar) [17:09:31] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713587 (Paladox) More detailed GlobalSign explanation of the problem https://twitter.com/globalsign/status/786612660397715456 [17:10:14] (CR) Nuria: ">Uncaught Error: Module name "lodash" has not been loaded yet for context" [analytics/dashiki] - https://gerrit.wikimedia.org/r/315659 (https://phabricator.wikimedia.org/T148019) (owner: Hashar) [17:10:18] Analytics-Kanban, Operations, Traffic, HTTPS: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713594 (Pietrodn) More detailed explanation of the technical problem by GlobalCert: https://downloads.globalsign.com/acton/fs/blocks/showLandingPage/a/2674/p/p-008f/t/p... 
[17:10:59] mforns: yeah, I promise to look at all of them by the end of today, but I have to meet with Nuria first and I should eat eventually [17:11:11] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: Public Event Streams - https://phabricator.wikimedia.org/T130651#2713595 (Ottomata) Hm, EventSource kinda sucks? I'd like to send a proper HTTP error response status if something is wrong before I start sending... [17:11:32] milimetric, ok, I'll do one now, and let the comments without voting [17:12:24] Analytics-Visualization: Limn ignores timespan: {start: ..., end: ..., step: ...} configuration - https://phabricator.wikimedia.org/T74088#2713601 (Nuria) Open>declined [17:12:31] Analytics-Visualization: Limn ignores timespan: {start: ..., end: ..., step: ...} configuration - https://phabricator.wikimedia.org/T74088#768662 (Nuria) Limn is deprecated [17:13:04] Analytics-Visualization: Don't show current date on Limn graphs - https://phabricator.wikimedia.org/T64339#2713603 (Nuria) Open>declined Limn is deprecated [17:13:21] Analytics-Visualization: Remove /graphs/new for creating charts - https://phabricator.wikimedia.org/T56809#2713605 (Nuria) Open>declined Limn is deprecated [17:13:33] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713607 (Legoktm) [17:13:48] Analytics-Visualization: Split RC UV charts into region and country charts. - https://phabricator.wikimedia.org/T49316#2713609 (Nuria) Open>declined Limn is deprecated. Declining. [17:14:27] Analytics-Visualization: The examples should be hosted on limn1 and include proper examples of limn functionality - https://phabricator.wikimedia.org/T65637#2713611 (Nuria) Open>declined Limn is deprecated.declining. [17:14:51] Analytics-Visualization: Delta/change rates displaying incorrectly - https://phabricator.wikimedia.org/T56305#2713618 (Nuria) Open>declined Limn is deprecated. 
Declining. [17:15:57] Analytics, Analytics-Visualization: country data identified by row number rather than actual country name (or ISO code) - https://phabricator.wikimedia.org/T56359#2713621 (Nuria) [17:17:27] (PS20) Milimetric: Script sqooping mediawiki tables into hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476) [17:23:31] Analytics: Mark documentation about limn as deprecated - https://phabricator.wikimedia.org/T148058#2713676 (Nuria) [17:26:00] joal: /user/milimetric/wmf/data/raw/mediawiki/project_namespace_map/project_namespace_map [17:26:28] (I'm using that path for both of these because it'll be very close to the final production path, just the /user/milimetric would be deleted when we go live) [17:27:03] joal: so let me know if all this new data looks good and I'll start a full import right away if it does [17:27:16] * milimetric tries really hard to remember to use screen :) [17:30:22] (CR) Nuria: [C: 2] Update casssandra loading classes for new AQS [analytics/refinery/source] - https://gerrit.wikimedia.org/r/295663 (https://phabricator.wikimedia.org/T147841) (owner: Joal) [17:32:39] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713690 (Pietrodn) Working workaround for Chrome and Safari on macOS Sierra: http://apple.stackexchange.com/a/257112/33925 ``` $ sqlite3 ~/Library... [17:37:18] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (BBlack) We've received an updated intermediate cert from GlobalSign that's compatible with our existing end-certs and supposedly fixes the... 
[17:38:03] ottomata: I ran select sequence,uri_path from webrequest where webrequest_source = 'upload' and year = 2016 and month = 10 and day = 13 and hour = 14 and hostname = 'cp4014.ulsfo.wmnet' order by sequence limit 20; [17:38:17] that was a host/hour showing the data loss [17:38:34] Analytics-Visualization: Split RC UV charts into region and country charts. - https://phabricator.wikimedia.org/T49316#2713712 (ezachte) comScore data are also a thing of the past [17:38:44] and all the first seqnos showing holes are related to the Miley_Cyrus link [17:39:02] huh! [17:39:05] ok interesting [17:39:17] so that's good to know [17:39:24] so the root cause of the issue is that requests for the Miley_Cyrus link get dt:'-' for $reason (that I need to figure out), then get bucketed in the wrong hour [17:39:28] this is related to what we talked about in June a little, right? potentially discarding records with -? [17:40:04] not sure, in this case I'd like to solve the problem [17:40:14] elukey: when manually requesting the URL you pasted, it seems to be related to generating thumbnails [17:40:26] yeah.. I get a 400 [17:40:46] but I was not able to repro a VSL timeout or similar with varnishlog [17:41:25] (PS16) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric) [17:41:25] so I am thinking that some of the requests that the fb bot makes for Miley_Cyrus are ending up in a weird state (like VSL timeout) [17:41:57] elukey: would be interesting to see if thumb sizes in URLs are different (just thinking out loud) [17:42:40] joal: are you able to get a 200 from that link? 
[17:42:47] elukey: didn't even try [17:43:01] ahh okok [17:43:21] anyhow the URI is always the same [17:43:33] ok, makes sense elukey [17:44:02] elukey: was just wondering if it could be related to a big backend treatment taking a long time (like huge image to thumb) [17:46:15] I don't think that it even causes any generation since it ends up in a 400 [17:46:18] https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg/720px-Miley_Cyrus_on_2015_Rock_and_Roll_Hall_of_Fame_Induction_Ceremony_%28cropped%29.jpg [17:46:24] and I can see the 400 in the varnish log [17:46:30] for the ones not ending up in a VSL timeout [17:47:05] anyhow, gotta go, will restart the investigation tomorrow morning [17:47:15] at least now we have a sort of root cause :) [17:47:46] talk with you tomorrow team!! thanks for the help ottomata and joal! [17:51:12] milimetric: user and logging table for user works without casts [17:51:17] milimetric: But it fails for pages [17:53:06] milimetric: I'll keep casts everywhere, just to be safe [17:55:29] (PS17) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric) [18:02:45] wow actually milimetric, I get the same error with casting in Spark !!! [18:02:57] sigh [18:03:05] ok, done with my meeting joal [18:03:08] let's batcave? [18:03:12] milimetric: sorry for the bad news :( [18:03:20] oh - other meeting [18:03:20] milimetric: we are in research meeting now [18:03:22] after? [18:03:23] k [18:03:48] so maybe I should just not map whatever field fails... we can talk after [18:04:47] milimetric: I don't have the field which is failing :( [18:07:07] mforns: wanna chat in like 20 min? 
I'll try to run out and get some food [18:08:38] milimetric, not sure I can, the owners of the house are coming to talk in 10 mins, but I'll ping you after [18:32:48] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713909 (BBlack) We're working through the other minor one-off cert issues now on smaller (mostly for technical folks sites), I'm breaking off a se... [18:40:38] milimetric: around for a quick chat on sqoop? [18:41:18] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2713969 (BBlack) [18:42:06] milimetric, back [18:49:30] mforns: ok, great, you still working? [18:49:42] milimetric: was about to say I'm going for dinner [18:49:43] milimetric, yes, for 1 hour [18:50:09] sorry joal, we can catch up tomorrow on these tables then? [18:50:13] milimetric: sounds good [18:50:27] later a-team ! [18:50:29] mforns: to the batcave, let's coordinate [18:50:33] ok [18:59:01] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2714049 (BBlack) [19:06:01] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2714092 (BBlack) [19:11:01] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2713969 (MoritzMuehlenhoff) seaborgium and serpens use certs from our internal CA, not from GlobalSign. [19:12:40] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2714139 (BBlack) The ones in the puppet repo under files/ssl/ are signed by GlobalSign.... I wonder what's out of sync here? 
[19:19:01] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2714151 (MoritzMuehlenhoff) When we setup the openldap replacement servers for the OpenDJ setup, we started with an internal cert from the beginning. From what I... [19:20:16] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2714152 (BBlack) [19:25:56] Analytics-Kanban, Operations, Traffic, HTTPS: GlobalSign intermediate updates for one-offs - https://phabricator.wikimedia.org/T148069#2714157 (BBlack) These are all fixed up now I believe, except for the 3x externally-hosted sites, which still link to the R1 root.... [19:27:33] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2713145 (hashar) OCG on ocg1001 ocg1002 ocg1003, started yielding CERT_UNTRUSTED error at 17:30 UTC One can monitor it via Grafana backend success... [19:30:46] Analytics-Kanban, Operations, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2714176 (BBlack) [19:32:53] Analytics-Kanban, Operations, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2714191 (BBlack) @akosiaris found https://github.com/nodejs/node/blob/db1087c9757c31a82c50a1eba368d8cba95b57d0/src/node_root_certs.h [19:54:44] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2714251 (Nuria) Will get numbers for Mac OS requests on Chrome and Safari per hour for the last 3 days to quantify impact, let me know if you no lo... 
[21:02:19] (PS1) Milimetric: Report session funnel weekly instead of daily [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/315829 (https://phabricator.wikimedia.org/T147492) [21:02:54] (CR) Milimetric: "NOTE: this needs to be deployed in sync with running the scripts/aggregate-and-filter-sessions.py script on the existing output." [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/315829 (https://phabricator.wikimedia.org/T147492) (owner: Milimetric) [21:26:25] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2714531 (BBlack) [21:26:29] Analytics-Kanban, Operations, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2714528 (BBlack) Open>Resolved a:BBlack Resolved for now. To recap: Initial symptom was lots of errors the ocg logs after we deployed the... [21:30:26] Analytics-Kanban, Operations, Traffic, HTTPS, Patch-For-Review: Windows 10 & MacOS Sierra Cert errors - https://phabricator.wikimedia.org/T148045#2714536 (BBlack) @Nuria - it would have to be specifically for MacOS Sierra (the new version that came out less than a month ago). There were othe... [21:32:07] Analytics-Kanban, Operations, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2714537 (hashar) That is a very nice fix and summary. Thank you! [21:38:14] Analytics-Kanban, Operations, Traffic, HTTPS: OCG failing with new GlobalSign intermediate workaround - https://phabricator.wikimedia.org/T148076#2714544 (Volans) FYI it's worth noticing that the upgrade of NodeJS for this service looks a bit broken by design to me, given that `apt-get` will over... 
[21:47:50] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: Public Event Streams - https://phabricator.wikimedia.org/T130651#2714575 (GWicke) We discussed error handling on IRC, and found that HTTP status codes / headers for SSE requests are reported in the dev console i...