[05:48:04] Analytics, Jupyter-Hub, Operations, SRE-Access-Requests: JupyterHub access for meps not working (was: Requesting access to analytics servers for mepps) - https://phabricator.wikimedia.org/T192472#4281768 (Dzahn) @mepps Try again now. I added you to "wmf". Seems you were actually _not_ in that yet...
[06:12:58] hello :)
[06:46:19] joal: morningggg
[06:46:30] if you are ok I'd do the following this morning
[06:47:31] 1) increase the number of journal nodes from 3 to 5. The idea is to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440130/, set up the new hosts (copying the edit journal, etc..) and then do the namenodes restart
[06:47:49] 2) upgrade cassandra on aqs to the new patch version (in labs it seems working fine
[06:53:18] Wow elukey - Morning! Big plans :)
[06:53:24] elukey: I'll follow you :)
[07:03:51] \o/
[07:04:01] doing a code review, then I'll start with the journal nodes
[07:50:39] ok joal, journal nodes partitions created on analytics1069 and analytics1072
[07:51:00] now I am going to stop one journalnode, backup its data, and then transfer over to them
[07:51:34] elukey: +1
[07:51:41] elukey: anything I can help with?
[07:52:09] joal: just sanity check my craziness, that's all :)
[07:52:14] :)
[08:04:09] joal: the new journalnodes are up and running
[08:04:16] \o/ !
[08:04:24] in the logs I don't see errors but also I don't see any "initializing journal etc.."
[08:04:33] so I'd restart all the other three journal nodes now
[08:04:35] before the name nodes
[08:08:42] (CR) Fdans: Add glue code to turn "ceiled" pageview values into intervals (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/440136 (https://phabricator.wikimedia.org/T188928) (owner: Fdans)
[08:11:04] ooook done!
[08:11:38] nothing really changed, I think it needs a restart of the namenode
[08:11:43] I'll do analytics1002 now
[08:11:48] and then check hdfs health
[08:20:42] currently watching metrics in https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1
[08:20:47] the new ones are coming in
[08:33:09] just restarted an1002, dfsHealth.html looks good, I can see the new journal nodes
[08:33:20] I'll do a failover to it in a bit
[08:33:26] check that everything is good
[08:33:30] and then restart an1001
[08:33:36] (and finally failover again)
[08:39:04] first failover done, an1002 is the new hdfs master
[08:40:08] dfsHealth.html looks good on an1002
[08:43:45] namenode restarted on 1001 (currently standby)
[08:48:12] all good
[08:49:24] \o/
[08:49:49] last step is the failover to 1001
[08:49:56] will do it in a bit when metrics are settled
[08:50:01] buuut seems working fine so far :)
[08:51:09] I was looking at that as well :)
[08:54:34] done, maintenance completed :)
[08:55:19] * joal bows to master elukey :)
[08:55:22] Analytics, Analytics-Kanban, User-Elukey: Expand the Hadoop Journal nodes from 3 to 5 to improve resiliency - https://phabricator.wikimedia.org/T189105#4282173 (elukey)
[08:56:03] Analytics, Analytics-Kanban, User-Elukey: Expand the Hadoop Journal nodes from 3 to 5 to improve resiliency - https://phabricator.wikimedia.org/T189105#4031525 (elukey) Maintenance done today, the new journal nodes are analytics1069 and analytics1072
[08:58:45] \o/
[09:22:24] elukey: the journal nodes GC count drop is surprising :)
[09:24:37] it might be due to the restarts, but who knows :D
[09:24:58] I am not even sure what kind of GC algorightm would suit best the journal nodes
[09:26:34] joal: now it is cassandra's turn :D
[09:27:04] :D
[09:27:08] (maybe after lunch, let's not chain too many things at once)
[09:27:37] elukey: You've taken over the task-slaying role - Thanks you for that: )
[09:30:56] ahahhah :D
[09:31:07] it is good to have some
time for these tasks
[09:39:09] (brb)
[10:05:44] * elukey lunch!
[11:19:17] back! Upgrading cassandra on aqs :)
[11:22:52] * joal continues to follow in the steps of elukey
[11:23:51] * elukey is fearless with joal on his side
[11:24:18] aqs1004-a up and running with the new cassadra, all good so far
[11:24:34] elukey: do you give me a minute to query it?
[11:24:53] yep sure, I just restarted aqs1004-b as well
[11:25:27] (the only change is in dependencies so I wasn't too worried)
[11:25:37] elukey: problem with cqlsh
[11:26:02] --verbose :)
[11:26:14] ah ImportError: cannot import name cql_keywords_reserved
[11:26:23] elukey: correct
[11:27:22] mmm seems https://issues.apache.org/jira/browse/CASSANDRA-11850 but it should have been fixed by Eric's work
[11:27:35] but maybe not on jessie
[11:27:57] could the thing
[11:28:16] elukey: AQS happy, so me rather happy as well (even with the cqlsh issue)
[11:30:39] weird https://phabricator.wikimedia.org/T196044
[11:30:41] it shold be fixed
[11:32:03] confirmed that it is patched
[11:32:55] elukey: I confirm that CQLSH_NO_BUNDLED=TRUE cqlsh doesn't work for us as is
[11:33:10] elukey: Should we reinstall python-cassandra?
[11:34:45] joal: what do you mean?
[11:35:09] elukey: should we install/upgrade pyhon-cassandra?
[11:35:28] given the problem seems to be coming from that package (just saying after reading the ticket you pasted)
[11:35:54] ah! it is already there, version 2.5.1-1~bpo8+1
[11:36:21] I am checking the stretch version
[11:36:31] k
[11:36:41] 3.7.1-2.1
[11:36:43] there you go
[11:37:16] ooooook
[11:37:49] elukey: how do we go with that?
Stretch upgrade and then test I assume
[11:39:02] it might be possible to simply backport python-cassandra 3.7 in jessie-wikimedia
[11:42:02] I should have checked cqlsh, ufffff
[11:43:31] weird thing is that one of the suggested workarounds is use python <= 2.7.11
[11:43:34] elukey@aqs1004:~$ python -V
[11:43:36] Python 2.7.9
[11:44:00] but surely a different thing (in the bug cqlsh doesn't connect)
[11:44:11] so yes python-cassandra needs to be upgraded probably
[11:48:04] tried to force python == python3.4 but nope
[11:53:29] :(
[11:55:05] so joal we could either rollback aqs1004, or just wait for the reimage
[11:55:19] but without cqlsh might be hard to manage cassandra if anything goes wrong
[11:55:25] we don't really have any other client right?
[11:55:26] elukey: that's the thing
[11:55:41] elukey: Could we reimage and see how it behaves?
[11:56:16] I am pretty sure it would work since G*ehel reimaged maps-test2004 with stretch, and everything works
[11:56:57] elukey: from a procedure perspective, do we need to upgrade every cassandra instance before starting to reiamge one of them?
[11:56:58] heyaaaaa
[11:57:03] \o mforns
[11:57:38] joal: nope
[11:58:19] elukey: then maybe we can reimage and see if the cqlsh problem is fixed?
[11:58:37] elukey: cqlsh aqs1004-a works from aqs1005
[11:58:56] elukey: So now we still have a way to manage cassadra
[11:59:03] joal: I am pretty sure that it will be fixed since we have a maps node that works with the same config
[11:59:55] * joal trusts elukey
[12:03:36] ok joal let's do this
[12:04:05] I am going to re-set the previous version in the puppet role, and add an override for aqs1004
[12:04:21] then we stop the upgrade as it is, not using cqlsh on aqs1004 as you were saying
[12:04:27] and then we reimage all the nodes
[12:04:59] hm - I don't get it
[12:05:35] ok I'll try to explain it better sorry
[12:05:40] If we reimage all the nodes, we won't have any backup solution to use cqlsh in case of issue
[12:05:54] at the moment puppet is disabled on all aqs nodes except aqs1004
[12:06:00] the change was applied only in there
[12:06:05] ok
[12:06:43] (doorbell sorry)
[12:06:47] np
[12:10:48] here I am
[12:11:18] so if I rollback in puppet to the previous version, then it will try to downgrade cassandra on aqs1004
[12:11:32] and the previous package is not there anymore in our apt repo
[12:11:53] so what I want to do is temporarily allow a special override for aqs1004, to use this new version
[12:11:56] so puppet will be happy
[12:12:10] and rollback for the other nodes (basically a no op since they haven't been upgraded)
[12:12:23] so we'll be able to use cqlsh except on aqs1004
[12:12:29] and that will be resolved once we reimage
[12:13:30] joal: makes sense?
[12:14:48] makes sense
[12:15:02] elukey: they have not been upgraded because not restarted, right?
[12:16:00] also because puppet is disabled (so it didn't try to install any package)
[12:16:50] yessir
[12:17:10] So what you suggest is to get back to a more stable puppet conf
[12:17:30] Where aqs1004 is the special case instead of the other ones
[12:18:12] exactly
[12:18:43] ok - And then, we'll reimage aqs1004 and check that everything is fine, right?
[12:19:52] yep this is the plan
[12:20:56] joal: this is the patch https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440337/
[12:21:26] elukey: sorry for asking - I'm just making sure I can follow :)
[12:22:12] joal: nono please keep asking, I am happy to get your opinion, it prevents pebcaks :)
[12:22:28] I completely forgot to check cqlsh
[12:24:13] elukey: Currently, what is the value of cassandra::version: (I have the impression it is not set in the hieradata files)
[12:25:38] after the patch, it is 2.2.6-wmf3 for the role aqs and -wmf5 for the aqs1004 host
[12:28:04] elukey: no worriesyes, but before the patch?
[12:28:10] Since it's not set?
[12:28:21] it was set, I removed it as part of the upgrade
[12:28:25] (2.2.6-wmf3)
[12:28:34] Ah !
[12:28:37] and since this morning the default is 2.2.6-wmf5
[12:28:39] :(
[12:28:52] Makes sense -- I get it now
[12:29:06] Ok - Let's go :) I feel I understand neough :)
[12:29:50] thanks for the brainbounce :)
[12:29:59] I am going to upgrade the tasks and alert Eric
[13:12:56] ottomata: o/
[13:14:06] !log re-run failed webrequest-upload/text jobs (namenodes restarted)
[13:14:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:15:41] hiyaa
[13:20:47] ottomata: I might join 5 mins late to the meeting that we have in a bit
[13:22:40] Thanks elukey for the restarts - Didn't notice them
[13:24:37] joal: I am not sure though what is the issue, oozie reports failures for refine but the mapred logs show success
[13:24:41] I am a bit confused
[13:24:45] :(
[13:24:52] I don't elukey
[13:24:59] +know
[13:25:59] let's see what the rerun will do
[13:27:15] ok elukey no probs
[13:40:19] Analytics, Patch-For-Review, User-Elukey: Upgrade Cassandra on AQS to 2.2.6-wmf5 - https://phabricator.wikimedia.org/T197062#4282812 (elukey) The new python-cassandra package seems to break cqlsh on Jessie: ``` elukey@aqs1004:~$ sqls -bash: sqls: command not found elukey@aqs1004:~$ cqlsh Traceback (...
[13:45:37] elukey: Have we done something special in journal-nodes or HDFS around the 5th??
[13:46:12] elukey: 15:12:24 -!- ottomata [~ottomata@2604:2000:12c1:273:ac0a:744d:5d38:e748] has quit [Changing host]
[13:46:23] meh????
[13:46:34] elukey: https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1&panelId=64&fullscreen&from=now-30d&to=now
[13:54:15] joal: what is the problem that you are seeing? IIRC nothing done on the 5th
[13:54:47] ah interesting, increase in gc count
[13:54:53] Same with GC time
[13:55:00] So I wonder :)
[13:56:07] I don't see anything special in the sal logs
[13:56:21] I am wondering if it is more pressure on the edit log due to increased activity
[13:56:46] but it doesn't seem so
[13:57:43] and it doesn't make a lot of sense, a lot of old gen collections when there is plenty of space
[13:58:45] hah?
[14:00:59] Gone to catch kids
[14:02:06] but after today's restart it went back down to normal levels
[14:02:07] weird indeed
[14:34:44] so the upload/text jobs failed agai
[14:34:46] *again
[14:34:51] buuuut the next ones did not
[14:35:25] oozie reports a failure in refine, the mapred logs for the related jobs no errors
[14:35:48] not sure if I am looking in the right directyion
[14:35:52] *direction
[14:38:51] ah no oozie is a bit weird
[14:39:00] oozie -log shows this
[14:39:01] org.apache.oozie.command.CommandException: E0800: Action it is not running its in [OK] state, action [0047251-180510140726946-oozie-oozi-W@mark_raw_dataset_done]
[14:39:18] info this instead
[14:39:19] 0047251-180510140726946-oozie-oozi-W@mark_raw_dataset_done OK 0047268-180510140726946-oozie-oozi-WSUCCEEDED -
[14:39:22] ------------------------------------------------------------------------------------------------------------------------------------
[14:39:25] 0047251-180510140726946-oozie-oozi-W@refine ERROR job_1527607587036_60161FAILED/KILLED2
[14:49:25] ACTION[0047249-180510140726946-oozie-oozi-W@refine] Launcher ERROR, reason: Main class
[org.apache.oozie.action.hadoop.HiveMain], exit code [2]
[14:49:32] * elukey hates oozie logs
[15:01:52] nuria_: sorry, omw
[15:03:27] Analytics, EventBus, MediaWiki-JobQueue, Operations, and 3 others: Clean up cpjobqueue metrics - https://phabricator.wikimedia.org/T196067#4282976 (fgiunchedi)
[15:10:06] Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4282994 (Varnent)
[15:11:43] Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4007049 (Varnent) With that in mind - is internal hosting possible or should we look at third-party hosting for Piwik? We are looking to finalize that in the next few days in case we need to negotiate a t...
[15:22:15] elukey: when you get a moment: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440162/
[15:23:36] heya Amir1 did you see https://phabricator.wikimedia.org/T197000
[15:23:37] ?
[15:23:55] Analytics, Jupyter-Hub, Operations, SRE-Access-Requests: JupyterHub access for meps not working (was: Requesting access to analytics servers for mepps) - https://phabricator.wikimedia.org/T192472#4283037 (mepps) That was it! I'm in. Thank you @Dzahn and @Ottomata!!
[15:26:38] ottomata: from a quick pass it looks good! If you want to go ahead please do, it might take me a bit to figure out how all the things are working (and I have meetings :()
[15:30:59] (CR) Nuria: "Per our conversation, we agreed to keep 100-999 intervals so new and older data (preexisting in cassandra) match." [analytics/aqs] - https://gerrit.wikimedia.org/r/440136 (https://phabricator.wikimedia.org/T188928) (owner: Fdans)
[15:33:15] ok!
[15:33:16] :)
[15:52:50] yo a-team CHECK IT OUT!
[15:52:51] curl https://stream.wikimedia.org/v2/stream/page-create,page-delete?since=2018-06-13T00:00:00Z
[15:53:30] wwwwoooooooooo
[15:55:03] Wow this incredible :)
[15:55:04] \o/
[15:55:55] oldest revision create curr avail
[15:55:56] curl -s https://stream.wikimedia.org/v2/stream/revision-create?since=0 | grep data: | head -n 1 | awk -F 'data: ' '{print $NF}' | jq .meta.dt
[15:56:00] "2018-06-05T08:05:21+00:00"
[15:56:06] wow that's a lot
[15:56:26] milimetric: we just modified these mw topics to keep 31 days
[15:56:28] !
[15:56:42] That'll be even more :)
[15:57:07] heh, love that oneliner, so good
[15:59:50] Analytics, Analytics-Kanban, Readers-Web-Backlog: Change virtualpageview agreggation so it does not use source_url - https://phabricator.wikimedia.org/T197243#4283139 (Nuria) p:Triage>High
[16:00:02] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4283163 (Nuria)
[16:01:17] ping fdans
[16:01:23] ping joal
[16:01:26] standdduppp
[16:04:50] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review, Services (doing): Move EventStreams to main Kafka clusters - https://phabricator.wikimedia.org/T185225#4283180 (Nuria) please update wikitech docs about awesome time based consumption
[16:08:05] (PS4) Mforns: Trying to improve routing logic [analytics/wikistats2] - https://gerrit.wikimedia.org/r/438030 (owner: Milimetric)
[16:08:09] milimetric, ^
[16:08:34] mforns: oh that was fast :)
[16:08:41] heh
[16:12:11] Analytics, Jupyter-Hub, Operations, SRE-Access-Requests: JupyterHub access for meps not working (was: Requesting access to analytics servers for mepps) - https://phabricator.wikimedia.org/T192472#4283214 (herron) Open>Resolved a:herron
[16:12:13] Analytics, Patch-For-Review, User-Elukey: Upgrade Cassandra on AQS to 2.2.6-wmf5 -
https://phabricator.wikimedia.org/T197062#4277872 (Eevans) >>! In T197062#4282812, @elukey wrote: > The new python-cassandra package seems to break cqlsh on Jessie: > > ``` > elukey@aqs1004:~$ sqls > -bash: sqls: comm...
[16:14:06] Analytics, Patch-For-Review, User-Elukey: Upgrade Cassandra on AQS to 2.2.6-wmf5 - https://phabricator.wikimedia.org/T197062#4283220 (elukey) >>! In T197062#4283216, @Eevans wrote: > * Fix the patch to source a templated file that defines `CQLSH_NO_BUNDLED` conditionally (i.e. don't set it on Jessie)...
[16:21:05] elukey: Super bizarre --> https://yarn.wikimedia.org/jobhistory/attempts/job_1527607587036_60367/m/FAILED
[16:21:25] also elukey - Never realized our hadoop icon on the yarn UI was having some red on it :)
[16:22:14] ahhh I wanted to ask to you how to get these logs
[16:22:26] I went crazy trying via mapred job -logs
[16:22:28] * elukey cries
[16:22:59] * joal pads elukey on the back and will show after grooming :)
[16:23:47] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Cassandra on AQS to 2.2.6-wmf5 - https://phabricator.wikimedia.org/T197062#4283233 (fdans)
[16:23:54] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Cassandra on AQS to 2.2.6-wmf5 - https://phabricator.wikimedia.org/T197062#4277872 (fdans) p:Triage>Normal
[16:24:31] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000#4283240 (fdans) p:Triage>High
[16:29:46] Analytics, Analytics-Wikistats: Improve UI for ‘All wikis’ & ‘Explore Topics’ dropdown - https://phabricator.wikimedia.org/T196982#4283260 (Milimetric) Some quick opinions: * I like the secondary information being in boxes. In the case of metrics it echoes the Detail page metric boxes nicely. * I don't...
[16:32:56] Analytics, Analytics-Kanban, Readers-Web-Backlog: Change virtualpageview agreggation so it does not use source_url - https://phabricator.wikimedia.org/T197243#4283139 (fdans) p:High>Normal
[16:34:09] Analytics, Analytics-Wikistats: Improve UI for ‘All wikis’ & ‘Explore Topics’ dropdown - https://phabricator.wikimedia.org/T196982#4283291 (Nuria) >I don't like the lines separating the options, it just makes the selectors visually busy. +1 Like pink too.
[16:34:11] Analytics, Analytics-Kanban, Analytics-Wikistats: Correct mixed case naming of metrics WIkistats 2.0 - https://phabricator.wikimedia.org/T197103#4283292 (fdans) p:Normal>Triage
[16:34:45] Analytics-Kanban: Fix issue with prod/labs jars for sqoop - https://phabricator.wikimedia.org/T196737#4283298 (JAllemandou)
[16:35:28] Analytics, Analytics-Kanban: Scoop jars , automate generation at the beginning of job - https://phabricator.wikimedia.org/T196912#4272521 (JAllemandou) a:JAllemandou
[16:36:42] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review, Services (doing): Move EventStreams to main Kafka clusters - https://phabricator.wikimedia.org/T185225#4283309 (Ottomata)
[16:37:02] Analytics, Analytics-Kanban, Wikimedia-Stream, Patch-For-Review: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009#4283311 (Ottomata) Docs added: https://wikitech.wikimedia.org/w/index.php?title=EventStreams&type=revision&diff=1794735&old...
[16:37:15] joal: do you have a minute to explain me how to find the stdout?
[16:37:26] yessir !
[16:37:29] To the cave?
[16:37:50] sure!
[16:40:24] elukey: i'm making some lunch real quick, then will you sit with me to add ACLs
[16:40:25] should be quick
[16:49:40] (CR) Milimetric: [C: 1] "Will test and add another patch, then ping mforns and we can deploy."
(2 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/438030 (owner: Milimetric)
[16:49:58] milimetric, I think I got it
[16:50:40] oh, cool, push?
[16:50:47] I was just about to test, finished reviewing the code
[16:51:11] that user preferences stuff looks too spiffy, all fancy code, I haven't quite groked it yet, everything else looks great
[16:53:32] milimetric, yes, it works, will push
[16:53:40] sweet
[16:53:54] milimetric, do you think the userPreferences "defaultdict" could be moved to utils?
[16:54:16] ottomata: going afk for a bit as well!
[16:54:43] mforns: I wish there was a lightweight defaultdict implementation somewhere that we can just install. But like... nothing's lightweight anymore
[16:55:52] milimetric, I can make that setUserPreference(path, value) generic like set(userPreferences)(path, value)
[16:56:13] and move it to util, so we have a "safeObject" implementation
[16:56:24] mforns: nah, leave it like this for now
[16:56:27] ok :]
[16:56:49] (PS5) Mforns: Trying to improve routing logic [analytics/wikistats2] - https://gerrit.wikimedia.org/r/438030 (owner: Milimetric)
[16:56:59] there are bigger fish to fry and I think the two of us would probably ship this particular feature in 2021. But it would be AMAZING
[16:57:14] xDDDD
[16:57:16] yea
[16:57:17] :)
[16:57:23] ok, it's there
[16:58:10] jajaja 2021
[17:03:37] elukey: lemme know when you back
[17:05:08] https://www.irccloud.com/pastebin/0n02ZTpE/
[17:05:11] mforns: ^
[17:05:42] ottomata: I am
[17:05:54] k!
[17:06:11] https://phabricator.wikimedia.org/T196081#4277186
[17:06:12] elukey: ^
[17:06:53] i'm going to add ANONYMOUS to main-codfw
[17:06:54] k?
[17:07:05] i'm watching kafka server log
[17:07:06] s
[17:07:43] yep yep
[17:08:37] ok, is fine
[17:08:41] going to add mirror
[17:08:41] I don't remember what happens when you have no acls
[17:08:51] everything allowed?
[17:08:52] still fine
[17:08:53] yes
[17:09:01] to be super sure, i'm going to bounce kafka2003 kafka broker, ok?
[17:09:12] just to make sure it doesn't have any problems reconnecting and replicating
[17:09:18] mforns: never mind that doesn't even work, sorry
[17:09:24] it should be fine anyway IIRC
[17:09:27] yup
[17:09:28] but yes we can do it
[17:09:40] i just had an issue in deployment prep i didn't totally explain where it wasn't ok
[17:09:44] but, it broke before it got to this point
[17:09:51] must have been some weird thing i had wrong there for a short period of tim
[17:09:55] just want to be super user
[17:09:57] sure
[17:10:48] great looks good as expected
[17:10:57] super
[17:11:07] ok, doing the same in eqiad, not going to bother bouncing a broker there
[17:11:30] +!
[17:11:31] +1
[17:11:48] cool, done
[17:12:05] thanks elukey, if you are fine with it, i'll proceed with mirror maker tls stuff now, but you don't need to stick around for that
[17:12:06] thank you!
[17:12:15] didn't do much :)
[17:12:17] that was easy, just wanted you around in the rare chance it wasn't
[17:12:21] yep everything seems ok!
[17:12:40] if you need me ping me on hangouts and I'll join in a minute
[17:12:48] I am at home anyway
[17:14:06] milimetric, the code looks more organized, I think the only thing needed is switch the order of the assign
[17:14:28] oh, I always get that wrong, so unintuitive
[17:14:29] like override defaults with preferences, not viceversa
[17:15:21] (CR) Ottomata: [C: 2] Allow partial whitelisting of map fields [analytics/refinery/source] - https://gerrit.wikimedia.org/r/437269 (https://phabricator.wikimedia.org/T193176) (owner: Mforns)
[17:21:16] thanks ottomata :]
[17:21:35] * elukey off!
[17:23:09] (PS6) Milimetric: Improve routing logic [analytics/wikistats2] - https://gerrit.wikimedia.org/r/438030 (https://phabricator.wikimedia.org/T179444)
[17:23:24] mforns: dude, this is already amazing
[17:23:27] like, so smooth
[17:23:46] love the clarity you had to refactor bars/lines to "time"
[17:23:52] milimetric, :] looks good
[17:24:12] (Merged) jenkins-bot: Allow partial whitelisting of map fields [analytics/refinery/source] - https://gerrit.wikimedia.org/r/437269 (https://phabricator.wikimedia.org/T193176) (owner: Mforns)
[17:24:32] ok, mforns, nuria_, I'm ready to deploy this
[17:24:36] it's kind of epic
[17:24:40] milimetric, wanna pair?
[17:24:53] well, first I'll put it up on staging
[17:24:54] pair-deploy?
[17:24:57] ok
[17:25:06] so I can hit it with the phone and make sure it's all good
[17:25:12] yea
[17:25:17] good idea
[17:25:23] then I might need to take a break and deploy later, and you might be gone by then
[17:25:36] ok no prob
[17:26:02] mforns: I'm sorry, have you not done a deploy before?
[17:26:18] you can do it if you want, after we test staging
[17:26:18] yes
[17:26:26] you want me to deploy?
[17:26:32] sure!
[17:26:34] (CR) Milimetric: [C: 2] Improve routing logic [analytics/wikistats2] - https://gerrit.wikimedia.org/r/438030 (https://phabricator.wikimedia.org/T179444) (owner: Milimetric)
[17:27:16] milimetric, just let me know when you've tested it in mobile
[17:27:29] mforns: should I merge https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/439978/ ?
[17:27:56] (CR) Mforns: [V: 2 C: 2] "LGTM!"
[analytics/wikistats2] - https://gerrit.wikimedia.org/r/439978 (https://phabricator.wikimedia.org/T196983) (owner: Sahil505)
[17:28:04] milimetric, I was doing that now
[17:28:05] thanks
[17:28:15] cool, grat
[17:28:18] *great
[17:29:17] (PS3) Milimetric: Corrected Dashboard Metric value CSS [analytics/wikistats2] - https://gerrit.wikimedia.org/r/439978 (https://phabricator.wikimedia.org/T196983) (owner: Sahil505)
[17:29:22] (CR) Milimetric: [V: 2] Corrected Dashboard Metric value CSS [analytics/wikistats2] - https://gerrit.wikimedia.org/r/439978 (https://phabricator.wikimedia.org/T196983) (owner: Sahil505)
[17:34:11] ok, a-team, bookmarks are ready to test in staging: https://wikistats-canary.wmflabs.org/bookmarks/#/all-projects/reading/total-page-views/normal|line|1-Month~2018040100~2018061400|access~mobile-app*mobile-web
[17:35:57] mforns: feel free to deploy, I'll ping if I see anything crazy on my phone, and for now I'll be off for a couple hours
[17:36:50] ok, thanks milimetric :]
[17:37:25] heh, mforns it’s totally broken on mobile :)
[17:37:37] uoooo
[17:38:43] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4283563 (Jdlrobson)
[17:39:28] Analytics, Analytics-Kanban, Analytics-Wikistats: Correct mixed case naming of metrics WIkistats 2.0 - https://phabricator.wikimedia.org/T197103#4283566 (sahil505) p:Triage>Normal
[17:39:52] it keeps track of the state but there’s a bug on reading it from the url, must be because some components are hidden
[17:40:23] milimetric, I'm seeing now that the production site is also broken in mobile
[17:40:40] at least, the wikiselector does not work
[17:43:23] mforns: the wiki selector?
[17:43:40] yes
[17:43:56] it doesn't suggest for me
[17:43:58] mforns: in prod?
[17:44:07] yes, for mobile
[17:44:14] milimetric, mforns works in chrome iphone
[17:44:24] mforns: you are using which browser?
[17:44:33] nuria_, but does it suggest on edits?
[17:44:39] android chrome
[17:45:04] I mean, suggest when typing?
[17:45:16] mforns: ah one sec
[17:45:51] mforns: yes
[17:45:59] ha, not for me...
[17:45:59] Analytics, Analytics-Wikistats: Improve UI for ‘All wikis’ & ‘Explore Topics’ dropdown - https://phabricator.wikimedia.org/T196982#4283587 (sahil505) @Milimetric @Nuria : Thanks for your views on this. Seems like team is quite happy with the pink background, so I'll keep it the same. So, in this task w...
[17:46:00] mforns: both safari and chrome
[17:46:03] ok
[17:46:47] Analytics, Analytics-Kanban, Analytics-Wikistats: Improve UI for ‘All wikis’ & ‘Explore Topics’ dropdown - https://phabricator.wikimedia.org/T196982#4283589 (sahil505) p:Triage>Normal
[17:46:48] huh, more testing on mobile seems to be a thing
[17:50:44] milimetric, what is completely broken on your side, everything else besides the wiki selector works for me
[17:50:45] mforns: but it is strange that it is broken on android chrome, do you have latest chrome
[17:51:35] nuria_, I have 67.0.3396.81
[17:51:52] should be up to date
[17:52:05] mforns: do give this a shot , see what consile says, it takes a minute: https://developers.google.com/web/tools/chrome-devtools/remote-debugging/
[17:52:21] mforns: you just send request to your desktop via usb
[17:52:25] mforns: and see console
[17:52:49] milimetric: let me take a look at staging
[17:54:04] milimetric: did you deploy it?
[17:54:27] milimetric: ah, not yet
[17:58:44] nuria_: it’s deployed, for some reason the html is cashed hard, go to /bookmarks
[18:00:10] milimetric, I see html fine in canary!
[18:00:44] nuria_, after following that tutorial, my laptop does not recognize my phone...
[18:00:47] back in 1 hr
[18:00:49] That root page with the sub-links like mobile and bookmarks is cashed
[18:01:39] but I opened bookmarks directly
[18:04:57] (PS1) Joal: Update sqoop script to include jar generation [analytics/refinery] - https://gerrit.wikimedia.org/r/440382 (https://phabricator.wikimedia.org/T196912)
[18:05:34] (CR) Joal: [V: 1] "Tested on cluster with labsdb." [analytics/refinery] - https://gerrit.wikimedia.org/r/440382 (https://phabricator.wikimedia.org/T196912) (owner: Joal)
[18:21:01] Analytics, Analytics-Kanban, Services (watching): Re-enable cross DC mirroring of job and change-prop Kafka topics over TLS - https://phabricator.wikimedia.org/T197254#4283667 (Ottomata) p:Triage>Normal
[18:22:04] Quarry, Cloud-Services: GoogleDocs bot has download 125 000 csv exports in the last month - https://phabricator.wikimedia.org/T197256#4283696 (Framawiki)
[18:23:34] Analytics, Analytics-Kanban, Services (watching): Re-enable cross DC mirroring of job and change-prop Kafka topics over TLS - https://phabricator.wikimedia.org/T197254#4283708 (Ottomata) I'm a little worried about {T196032} breaking MirrorMaker instances, but I think we should try this and see.
[18:26:35] (PS1) Sahil505: Corrected mixed case naming of metrics [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440387 (https://phabricator.wikimedia.org/T197103)
[18:31:07] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4283720 (mforns) Hey all, The source_url has the advantage that it gets project and language_variant the exact way pageview_hourly do...
[18:40:24] Quarry: Ask python scripts to use custom user agents - https://phabricator.wikimedia.org/T197258#4283738 (Framawiki)
[18:45:24] (CR) Mforns: "I think we should make all metric names capitalized only in the first word, like: "Total page views".
They are just metric names, not page" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440387 (https://phabricator.wikimedia.org/T197103) (owner: Sahil505)
[18:46:47] Analytics, Analytics-Kanban, Readers-Web-Backlog (Tracking): Change virtualpageview agreggation so it does not use source_url - https://phabricator.wikimedia.org/T197243#4283756 (mforns) See my comment on the other task: https://phabricator.wikimedia.org/T196904#4283720
[18:51:29] Quarry: Ask python scripts to use custom user agents - https://phabricator.wikimedia.org/T197258#4283791 (Framawiki)
[18:53:08] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4283794 (Jdlrobson) That's definitely an option. I'm not sure what the limit would be though - we'd need to fine tune that. @Tbayer an...
[18:59:23] Quarry: Ask python scripts to use custom user agents - https://phabricator.wikimedia.org/T197258#4283799 (Framawiki) % of python requests: ``` 60797 42.92% GET HTTP/1.1 / 40367 28.50% GET HTTP/1.1 /login?next=/query/new 20240 14.29% GET HTTP/1.1 /query/new 20214 14.27% GET HTTP/1.1 /query/runs/all ```...
[19:04:13] Analytics, Analytics-Kanban, Readers-Web-Backlog (Tracking): Change virtualpageview agreggation so it does not use source_url - https://phabricator.wikimedia.org/T197243#4283811 (Nuria) ah, that would work too, can @Jdlrobson trim the url?
[19:04:42] Analytics, Analytics-Kanban, Readers-Web-Backlog (Tracking): Change virtualpageview agreggation so it does not use source_url - https://phabricator.wikimedia.org/T197243#4283813 (mforns) @Nuria he responded in the other task.
[19:07:18] bearloga: yt?
[19:07:28] i will merge this if you can test that cron works as expected
[19:14:21] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Cassandra on AQS to 2.2.6-wmf5 - https://phabricator.wikimedia.org/T197062#4283822 (Eevans) >>! In T197062#4283220, @elukey wrote: >>>! In T197062#4283216, @Eevans wrote: >> * Fix the patch to source a templated file that defines...
[19:18:46] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4283826 (mforns) Yea, I also don't know what the limit should be. I think it depends on how large the rest of the schema is. IIRC the...
[19:23:42] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4283844 (mforns) Not sure also if base64 would make url's shorter, probably not! :/
[19:24:19] mforns: maybe they could just send the domain without the path?
[19:24:23] that's all you need, rigght?
[19:24:27] rather than trucating, just send domain?
[19:24:46] ottomata, maybe for some requests, it looks at other things
[19:24:55] oh ya?
[19:25:05] but you have source_page_title, right?
[19:25:20] like API ones, but yea, that schema does not have those
[19:25:28] yes
[19:26:45] I thought of 1000, cause if the URL is reasonable, it might be of use later, but yea
[19:29:38] !log try rerunning webrequest-load-wf-upload-2018-6-14-11
[19:29:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:05:54] (Abandoned) Joal: Correct sqoop-jar-generation script [analytics/refinery] - https://gerrit.wikimedia.org/r/438213 (owner: Joal)
[20:36:56] Analytics: Report updater should support Graphite mapping plugins - https://phabricator.wikimedia.org/T152257#4284061 (mpopov)
[20:40:01] Team, the pageview job I tried to relauch has failed again with the same reason
[20:40:21] I'm going to investigate tomorrow
[22:22:46] Analytics, Analytics-Kanban, Product-Analytics: Partially purge MobileWikiAppiOSUserHistory eventlogging schema - https://phabricator.wikimedia.org/T195269#4284298 (chelsyx) I'm so sorry for the late response. Things has been a bit crazy on my end... @mforns Yes, I agree with you that with `os_minor...
[23:18:29] Analytics: turnilo x axis improperly labeled - https://phabricator.wikimedia.org/T197276#4284396 (JKatzWMF)