[00:12:04] bearloga: reworked friendly intro and friendly 1st paragraph, please take a look cc joal [00:13:27] 10Analytics: Pivot "MediaWiki history" data lake: Feature request for "Time" dimension to split by calendar month / quarter / year -- needs druid 0.10 - https://phabricator.wikimedia.org/T161186#3876892 (10Jdforrester-WMF) >>! In T161186#3875841, @fdans wrote: > https://superset.wikimedia.org/ > > We can creat... [00:15:57] James_F: ottomata told me that i am admin and can create users let me see if i can do it through ui [00:17:51] James_F: what is your username in ldap? [00:18:26] 10Analytics: Pivot "MediaWiki history" data lake: Feature request for "Time" dimension to split by calendar month / quarter / year -- needs druid 0.10 - https://phabricator.wikimedia.org/T161186#3124009 (10Nuria) @Jdforrester-WMF username in ldap? [01:42:31] 10Analytics, 10Beta-Cluster-Infrastructure, 10Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877064 (10Krenair) p:05Triage>03Normal [01:43:05] 10Analytics, 10Beta-Cluster-Infrastructure, 10Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877064 (10Krenair) ```krenair@deployment-kafka03:~$ sudo puppet agent -tv Warning: Setting configtimeout is deprecated. (at /usr/lib/ruby/vendor_ruby/pup... [01:46:46] 10Analytics, 10Beta-Cluster-Infrastructure, 10Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877076 (10Krenair) 2.7G /var/log/daemon.log 2.6G /var/log/daemon.log.1 221M /var/log/kafka/controller.log 257M /var/log/kafka/kafka-mirror-main-deployment-pr... [01:54:53] 10Analytics, 10Beta-Cluster-Infrastructure, 10Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3877117 (10Krenair) Repeat of T174742 ? [08:24:01] hello people!
[08:24:15] alter table reached 244/314 [08:24:46] hope that the last 70 tables will be done by EOD so we'll be able to re-enable eventlogging before the weekend [08:24:49] that would be awesome [08:34:46] elukey: +1 ! [08:34:48] Hi elukey :) [08:38:21] elukey: I deployed yesterday, will restart mediacounts archive (even if the job that failed yesterday actually managed to finish ... There are some things I don't understand here) [08:41:02] weird! Might have been on the edge of failing for OOM and succeeding [08:41:56] elukey: I think that's the thing [08:42:46] elukey: I think that when cluster is under pressure, node-managers are more eager to track jobs that don't keep their limits well, while when there is no resource fight they might be less strict [08:44:38] elukey: Yesterday's deploy also means we have the jar for automation of Banner-stream job :D [08:44:48] * joal run and hides [08:46:31] :) [08:48:50] I'd need to do some refactoring for the refinery stuff first, but I'll do it asap [08:48:58] I am fighting with hadoop in labs now [08:49:05] because puppet is acting weirdly [08:49:15] elukey: please please please let me know if there is anything I can help with [08:50:31] I think I found the issue, it might be in the labs puppetmaster [08:50:34] it is not in sync [08:50:40] * elukey cries in a corner [08:56:36] * joal offers some nice coffee to elukey [09:20:24] joal: new kernel on analytics1030, shall we drain + reboot and see what changes? [09:20:34] elukey: +1 !
[09:22:16] all right doing it [09:35:31] joal: https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html [09:40:01] elukey: http://www.commitstrip.com/en/2018/01/04/reactions-to-meltdown-and-spectre-exploits/ [09:40:31] ahahahah [09:40:41] :D [09:44:32] super good article, thanks elukey :) [09:53:11] analytics1030 up and running [09:53:29] elukey: monitoring from grafana [09:54:14] KPTI is active there, verified with "sudo dmesg | grep isolation" [09:54:25] great moritzm [09:54:35] elukey: hadoop tells me the machine already runs containers [10:38:07] ah now I found why the cluster doesn't bootstrap [10:38:14] firewall rules for prod applied to labs [10:40:38] 10Analytics-EventLogging, 10Analytics-Kanban, 10Tracking: Update client-side event validator to support (at least) draft 3 of JSON Schema - https://phabricator.wikimedia.org/T182094#3877689 (10phuedx) [10:42:41] elukey, joal: looking at the CPU/load metrics of analytics1030: https://grafana-admin.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=analytics1030&var-network=eth0&from=now-3h&to=now and https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cluster=analytics&var-instance=analytics1030 [10:43:07] it seems highly variable (maybe due to different jobs running), but not fundamentally more loaded than before the reboot [10:43:36] but I have really no idea on usual load patterns, input welcome [10:44:38] moritzm: I was monitoring that as well - Very difficult indeed to see if there are performance impacts [10:45:28] I don't see dramatic changes either [10:52:10] I'd say we keep an eye on it throughout the day and if it stays as it is, hadoop is good to go for reboots next week? [10:52:30] can we reboot a kafka node or does that suck too much on a Friday? [10:52:32] works for me moritzm :) [10:52:53] maybe one of the kafka-jumbo ones which are not fully in production?
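The KPTI verification mentioned above ("sudo dmesg | grep isolation") can be sketched like this. Since real dmesg output needs a patched kernel (and usually root), this hypothetical snippet greps a canned dmesg line so the logic is visible; the exact message text varies by kernel build.

```shell
# Hypothetical sketch of the KPTI check discussed above. On a live host the
# equivalent is: sudo dmesg | grep -i isolation
# Here we grep a canned line instead, since dmesg needs root and a patched kernel.
sample='[    0.000000] Kernel/User page tables isolation: enabled'
if printf '%s\n' "$sample" | grep -q 'page tables isolation: enabled'; then
  kpti_status="active"
else
  kpti_status="not detected"
fi
echo "KPTI: $kpti_status"
```

On newer kernels the flag also shows up as `pti` in /proc/cpuinfo, which avoids the dmesg ring buffer rotating the message away.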
[10:55:06] moritzm: kafka-jumbo would work better :) [10:55:18] feel free to install the updates on kafka-jumbo1001 [10:56:29] doing that now [10:57:53] kafka-jumbo1001 upgraded [10:58:36] moritzm: whenever you have time, would you mind to review https://gerrit.wikimedia.org/r/#/c/402323/1 ? [11:00:54] having a look [11:03:45] added a comment [11:04:58] ahhhh didn't know that! [11:05:40] those ferm constants are generated from constants.pp [11:05:52] moritzm: but I don't know in advance the IPs/hostnames of the clusters, since I build them on the fly to test features [11:06:42] but you could just reuse the definition for labs_networks, there, then all of WMCS is granted access (which seems fine) [11:08:37] 10Analytics-EventLogging, 10Analytics-Kanban, 10Tracking: Update client-side event validator to support (at least) draft 3 of JSON Schema - https://phabricator.wikimedia.org/T182094#3877734 (10phuedx) [11:11:41] moritzm: so something like [11:11:41] 'analytics' => $labs_networks, [11:11:41] 'druid_public_hosts' => $labs_networks, [11:13:59] or maybe 'analytics_networks' [11:18:15] I think so, but best to ask Alex for confirmation, he's the most familiar with constants.pp [11:19:20] thanks! [11:49:32] unrelated news, yesterday was my wikimedia birthday [11:49:35] time flies people [11:52:24] * elukey lunch! [12:02:14] Happy wikibirthday elukey :) [12:46:51] ;) [12:46:57] rebooting kafka-jumbo1001 now [13:05:24] (03PS1) 10Fdans: Update aqs to 23cb4de [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/402352 [13:05:55] joal: “A CPU predicts you will walk into a bar, you do not. Your wallet has been stolen” [13:05:58] — The Internet [13:06:08] :D [13:06:18] it is so brilliant [13:06:24] It is indeed !!! [13:08:48] joal: is it ok for me to merge this? 
I just verified submodule hash [13:08:50] https://gerrit.wikimedia.org/r/#/c/402352/ [13:15:54] fdans: if hash is good, you can go :) [13:16:14] merciiii [13:16:45] (03CR) 10Fdans: [V: 032 C: 032] Update aqs to 23cb4de [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/402352 (owner: 10Fdans) [13:18:41] !log deploying AQS [13:18:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:19:45] revisiting analytics1030 after some more time at https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&from=now-6h&to=now-1m&var-server=analytics1030&var-datasource=eqiad%20prometheus%2Fops it seems performance is in fact not measurably changed [13:20:45] I agree, no relevant changes afaict. One host is also not enough though, but it is a good indicator that we can start the rollout next week [13:20:54] I [13:21:08] I'll try to reboot the other 9 hosts by Tue [13:21:17] and then let it boil for a couple of days more [13:26:05] ok! given that all our hosts support PCID, I don't expect any significant surge either [13:26:20] elukey: I'm getting permission denied on aqs1004 when running scap deploy [13:26:25] https://www.irccloud.com/pastebin/1AjqnyNh/ [13:26:30] Taking a break folks - see you later [13:26:35] this sounds familiar? <3 [13:28:41] (brb - 5min) [13:33:37] fdans: try to ssh to aqs1004 from tin, it should ask you to accept the ssh key [13:33:44] type "yes" and retry [13:34:28] but it might not work, it is indeed a known issue, checking what I did the last time [13:37:02] fdans: ?
[13:38:26] also when you do scap deploy please log a message after that [13:38:32] otherwise in #ops you'll see fdans@tin Started deploy [analytics/aqs/deploy@792c95d]: (no justification provided) [13:38:36] :) [13:47:47] thank youuu elukey sorry, had to go out to do a quick errand [13:47:52] gonna try this now [13:49:23] elukey: hm, I don't think I have a key in tin, because it's asking me for a password [13:50:14] sure, the goal was only to cache the host key of aqs1004 [13:50:27] quit ssh and try to deploy [13:53:25] elukey: same thing I'm afraid [13:55:45] fdans: can I try? [13:55:53] is the repo ready for the deploy? [13:55:56] elukey: prego! [13:55:59] git pull, submodules, etc.. ? [13:56:00] yes it is [13:56:04] okok [13:58:30] fdans: it works for me, so there is something weird with your user. It has already happened to me, I need to figure out how I solved it :) [13:58:40] the next issue is that I had to rollback on aqs1004 [13:58:41] 13:56:51 [aqs1004.eqiad.wmnet] Check 'endpoints' failed: /analytics.wikimedia.org/v1/pageviews/top-by-country/{project}/{access}/{year}/{month} (Get top countries by page views) is CRITICAL: Test Get top countries by page views returned the unexpected status 404 (expecting: 200) [14:00:06] goddammit [14:00:22] thank you elukey [14:00:47] ah wait you are not in the deploy-aqs group [14:01:35] ahhh there you go, I think you can't access the keyholder Fran [14:01:41] hence the ssh failure [14:01:44] it makes sense [14:02:24] oh I see [14:03:03] I'm getting all crazy about the error below tho [14:20:02] it might have been a temp issue, let's discuss it with joal [14:23:51] ottomata o/ [14:24:05] whenever you have time I'd have a question about zk on the master nodes [14:24:45] so zkfc init fails due to the absence of /usr/lib/zookeeper/bin/zkCli.sh [14:25:10] and afaics that file is only in the cdh pkg version of zookeeper, not in the debian on [14:25:14] *one [14:25:30] but apt-cache policy on my labs host seems to prefer the debian one
[14:25:33] over the cdh one [14:29:43] o/ [14:30:44] hello :) [14:30:55] so I am confused by apt-cache policy [14:33:56] now that I think about it, we used to force a zookeeper version for the clients IIRC [14:36:30] fdans: any luck? [14:37:22] elukey: I'm working on the restbase patch until we can discuss it with joal, should I try again? [14:38:02] elukey: [14:38:09] not sure if this is your problem [14:38:13] but cdh has a version of zookeeper [14:38:14] fdans: ack [14:38:16] that hadoop depends on [14:38:24] so hadoop clients need that installed [14:38:30] but we use debian's package for zookeeper server [14:39:07] yeah I remember that, but on my labs nodes apt-cache policy prefers the debian version, not the cdh one [14:39:24] are you running zookeeper there too? [14:39:24] hmmm [14:40:12] it is on a separate host [14:40:15] hm [14:41:04] elukey: when is this happening? when running puppet to install hadoop? [14:41:17] yep exactly [14:41:41] I was debugging why zkfc wasn't starting and /usr/lib/zookeeper/bin/zkCli.sh showed up as no such file or etc.. [14:42:10] and dpkg -S /usr/lib/zookeeper/bin/zkCli.sh showed the diff (it is in the cdh version but not on the debian one) [14:42:32] ahhhh [14:42:33] elukey@hadoop-master-2:/etc/apt/sources.list.d$ dpkg -S zkCli.sh [14:42:33] zookeeper: /usr/share/zookeeper/bin/zkCli.sh [14:43:21] ? [14:43:38] oh you were just looking in the wrong place? [14:43:39] orrrr? [14:43:41] this is the debian version, /usr/share vs /usr/lib [14:43:46] ah [14:43:49] but you need the cdh one? [14:44:21] yeah, from the comments that (you ?) put in puppet it should prevent issues like java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException [14:44:36] ya hm [14:44:41] e.ma just showed to me a trick with dpkg [14:44:43] dpkg --compare-versions 3.4.5+cdh5.10.0+104-1.cdh5.10.0.p0.71~jessie-cdh5.10.0 gt 3.4.5+dfsg-2+deb8u2 [14:45:29] 3.4.5+dfsg-2+deb8u2 might be preferred because 'gt' than the cdh one? 
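The version-ordering question above can be checked directly. This is a sketch: `dpkg --compare-versions` is the authoritative tool on a Debian host, but GNU `sort -V` agrees with Debian ordering for these two strings (after the common "3.4.5+" prefix, "dfsg" sorts above "cdh" alphabetically, which is why apt prefers the Debian build).

```shell
# Sketch of why apt prefers the Debian zookeeper build over Cloudera's.
# On a Debian host the authoritative check is:
#   dpkg --compare-versions "$deb" gt "$cdh" && echo "debian wins"
# GNU sort -V gives the same ordering for these particular strings.
deb='3.4.5+dfsg-2+deb8u2'
cdh='3.4.5+cdh5.10.0+104-1.cdh5.10.0.p0.71~jessie-cdh5.10.0'
# sort ascending, take the highest-sorting (i.e. preferred) version
highest=$(printf '%s\n%s\n' "$deb" "$cdh" | sort -V | tail -n1)
echo "apt would prefer: $highest"
```

With equal pin priorities apt always installs the highest version, so once the Debian package's version string sorts above the CDH one, only an explicit pin can keep the CDH build installed.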
[14:46:38] elukey: trying to find...where is my comment? :) [14:47:06] cdh::hadoop::nodemanager :D [14:47:25] 'Install the CDH [14:47:26] # zookeeper package here explicitly.' [14:47:32] i don't see the explicit version... [14:47:36] time for git blame! [14:47:59] I am betting it is my fault [14:48:28] hmmmm no way [14:48:46] I remember that we had it in hiera [14:48:56] we did? [14:48:58] but it isn't set... [14:49:16] joal: when you said "Time boundaries - [start; end [" did you mean [start, end) ? (as in, inclusive of start, exclusive of end?) [14:49:28] ahhh no no ottomata [14:49:29] # To avoid version conflicts with Cloudera zookeeper package, this [14:49:29] # class manually specifies which debian package version should be installed. [14:49:30] elukey: is it possible the zk version has changed in apt? [14:49:32] profile::zookeeper::zookeeper_version: '3.4.5+dfsg-2+deb8u2' [14:49:56] yeah hm, but that will only affect zookeeper server stuff, right? [14:50:01] not hadoop clients [14:50:12] yep exactly, this is what I was remembering, so partially related [14:50:15] aye [14:50:17] ok [14:50:24] i wonder if the debian version got updated in apt [14:50:26] and you are right [14:50:31] in that now it is newer so has priority [14:50:32] ? [14:50:52] https://apt.wikimedia.org/wikimedia/pool/main/z/zookeeper/ has mtime may 2017 [14:50:56] dunno the last time we had to do this... [14:52:02] elukey: i betcha that's it [14:52:17] i'm looking in email and i have reprepro changes notifications for that package may 31 [14:52:35] dunno what version was there before, but iirc we got it from upstream debian apt, not ours [14:52:39] so someone updated it in ours (was it me?!) [14:53:30] ah no it was moritzm! IIRC there was a CVE for 4-letter commands causing a potential ddos to zookeeper [14:54:02] ahhhh!
[14:54:04] ok cool [14:54:04] so [14:54:06] hm [14:54:13] i guess we need to specify version for hadoop then too :/ [14:54:51] elukey: maybe we need a simple cdh::zookeeper class with $ensure parameter [14:54:55] and we can set $ensure to version [14:55:05] and set it in hiera somewhere, HOPEFULLY in only one place [14:55:15] I updated zookeeper both in Debian and for apt.wikimedia.org: https://lists.debian.org/debian-security-announce/2017/msg00131.html [14:55:38] and I've set that version in https://gerrit.wikimedia.org/r/#/c/354449/42/hieradata/role/common/configcluster.yaml [14:55:41] okok now I remember [14:56:06] moritzm: basically that version is now preferred over 3.4.5+cdh5.10.0+104-1.cdh5.10.0.p0.71~jessie-cdh5.10.0 :( [14:56:06] ya, but we need the cdh version for cdh [14:56:09] actually>>>..>> [14:56:15] those are both 3.4.5 [14:56:17] hmmm [14:56:26] oh no, the cdh version installs into /usr/lib, right? [14:56:33] yeah, and hadoop will expect the jars to be there [14:56:46] I can see this for zkfc Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException [14:56:51] that is what we expected [14:57:29] aye, but the debian version is installed elukey? [14:57:43] elukey: alternatively, we could mayyyybe get away with symlinking the debian jar in somewhere...
[14:58:20] strange that it can't find zookeeper jar in the regular /usr/share/java classpath though [14:58:39] i dunno, probably best to just ensure cdh version is installed via puppet [14:58:41] it uses 3.4.5+dfsg-2+deb8u2 [14:59:22] aye, elukey if you want, see what happens if you do something like [14:59:41] mv /usr/lib/hadoop/lib/zookeeper.jar /tmp/ # This is a symlink anyway, probably broken currently [14:59:55] ln -sf /usr/share/java/zookeeper.jar /usr/lib/hadoop/lib/zookeeper.jar [15:00:11] see if that error goes away [15:00:20] so I tried dpkg -L on analytics1002 and in labs [15:00:25] for the zookeeper package [15:00:29] actually, even if ^^ works [15:00:34] on an1002 I can see a lot of jars [15:00:36] in labs no [15:00:52] it's probably a bad idea, because you'd have to symlink it in more places [15:00:58] e.g. /usr/lib/hive/lib [15:00:59] etc. [15:01:08] sooo, best to ensure cdh version is installed with puppet yaaaa [15:01:43] elukey: may I try and do ^ real quick in puppet? [15:02:43] ottomata: sure! [15:04:12] * elukey grabs a coffee [15:06:03] does the cdh release contain the security fixes for CVE-2017-5637? then we might also move to them [15:06:39] moritzm: when were those released? 2017 may? [15:06:58] if so, probably not, the cdh packages haven't been updated in a while afaik [15:07:13] but, we don't run zookeeper servers with the cdh packages [15:07:16] just use client libs [15:07:32] ah, ok.
I thought you needed the server [15:08:00] yeah, we'd use the debian ones if we could [15:08:08] but cdh and debian install the .jars into different locations [15:08:26] /usr/lib/zookeeper vs /usr/share/java [15:08:39] and all the stuff builds classpaths expecting it to be in /usr/lib/zookeeper [15:08:42] elukey: CPU/load on kafka-jumbo1001 seems fine with the new kernel: https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&from=now-3h&to=now-1m&var-server=kafka-jumbo1001&var-datasource=eqiad%20prometheus%2Fops [15:08:42] cdh stuff* [15:08:46] agreed? [15:08:47] :) [15:09:03] moritzm: those boxes aren't yet used much anyway [15:09:05] so hard to tell [15:09:10] but i'm for proceeding with the others :) [15:09:30] elukey: when you get back, maybe we can talk about this hadoop profile yaml stuff [15:09:40] it's bothering me as I try to do zk [15:09:48] and i don't know how some of it works... [15:10:58] sure, we picked these as a test since we didn't want to break a kafka/analytics node on a Friday, but the main point is that there's no visible spike, we'll have a closer look after the first real world kafka node is running KPTI [15:12:33] gr8 [15:15:53] 10Analytics-Cluster, 10Analytics-Kanban, 10Language-Team, 10MediaWiki-extensions-UniversalLanguageSelector, and 3 others: Migrate table creation query to oozie for interlanguage links - https://phabricator.wikimedia.org/T170764#3878438 (10Milimetric) 05Resolved>03Open [15:25:37] moritzm: ack! [15:25:52] ottomata: sure, do you want to bc ?
[16:03:31] elukey: ok to you if I retry scap deploy? [16:04:43] ah it works ok [16:04:46] had wrong priority in my test [16:04:48] ok patch coming... [16:04:51] fdans: I'd avoid doing it on a Friday buuut if Andrew is ok you can proceed :) [16:05:01] ottomata: shall we batcave? [16:05:14] not sure if you need me or not :D [16:05:20] yeah that makes sense elukey will hold until Monday [16:05:48] <3 [16:05:55] elukey: i'm cool if you are ok with me merging stuff...:) [16:06:56] ottomata: I am cool with that, but I'd disable puppet where zookeeper is installed just in case, deploy on one and then run apt-get update && apt-cache policy just in case [16:07:07] (it is really easy with cumin, I can do it now) [16:07:21] hmm actually elukey...i'm not going to mess with it today, it is friday as you say [16:07:28] and if you are almost done with your workday you don't need it in labs yet, ya? [16:07:40] https://gerrit.wikimedia.org/r/#/c/402370/ [16:08:16] oops standby... [16:10:10] seems ok, Class['::profile::cdh::apt_pin'] -> Exec['apt-get update'] -> Class['::cdh::hadoop'] is a bit long but should be safer [16:10:29] yeah, it's what we use in profile common for cdh::apt [16:10:37] i'd prefer if we could pin in common [16:10:45] buut, i don't want to pin on stat boxes, etc. [16:11:04] i have some refactoring ideas here...we'll see ;) [16:11:07] joal or mforns you around? Got a weird oozie bug I don't understand [16:11:25] ottomata: sure :) Do you want me to test that on Monday morning? [16:11:34] ya!
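The profile::cdh::apt_pin idea discussed above boils down to an apt preferences entry that makes the CDH build win even though its version string sorts below the Debian one. A hypothetical sketch of the generated pin file; the package list and pin expression used in the real puppet module may differ:

```
# Hypothetical /etc/apt/preferences.d/cdh.pref, sketching the pin behind
# the profile::cdh::apt_pin class; the actual generated file may differ.
# A priority above 1000 allows apt to "downgrade" to the CDH version.
Package: zookeeper
Pin: version *+cdh*
Pin-Priority: 1001
```

Wrapping the pin in its own class and ordering it before `Exec['apt-get update']` and `Class['::cdh::hadoop']` (as in the chat above) guarantees the pin is on disk and the apt cache refreshed before any CDH package gets installed.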
:) [16:11:44] I'm here, but I need a couple mins to set my camera env up, I'm in the dark right now [16:11:48] elukey: i manually put the pin file on hadoop-master-1 and ran apt-get update [16:11:53] milimetric, ^ [16:11:53] now: [16:11:58] np mforns [16:11:59] thx [16:12:06] gimme 5 mins [16:12:09] apt-show-versions | grep zookeeper [16:12:12] zookeeper:all/jessie-wikimedia 3.4.5+dfsg-2+deb8u2 downgradeable to 3.4.5+cdh5.10.0+104-1.cdh5.10.0.p0.71~jessie-cdh5.10.0 [16:12:21] niceee [16:13:06] just installed zookeeper, works fine [16:13:59] gr8 [16:15:07] applied to hadoop-master-2, now let's see if the cluster is up [16:15:41] Notice: /Stage[main]/Cdh::Hadoop::Namenode/Exec[hadoop-hdfs-zkfc-init]/returns: executed successfully [16:18:19] milimetric, omw to bc [16:18:51] yeehaw [16:18:58] (I'm there) [16:20:38] (03CR) 10Fdans: "Just a few minor comments about naming/typos, rest looks good!" (034 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192) (owner: 10Nuria) [16:21:14] elukey@hadoop-master-1:~$ sudo -u hdfs hdfs dfs -ls / [16:21:14] Found 3 items [16:21:14] drwxrwxrwt - hdfs hdfs 0 2018-01-05 16:17 /tmp [16:21:14] drwxr-xr-x - hdfs hadoop 0 2018-01-05 16:18 /user [16:21:14] drwxr-xr-x - hdfs hdfs 0 2018-01-05 16:17 /var [16:21:17] \o/ [16:22:16] 10Analytics, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3878555 (10Nuria) I think notes look good. @mforns main point that I missed is that we probably also want to remove geolocation fro... [16:22:21] Good job elukey ! [16:22:46] milimetric: first question: Yes, to [start, end[ is (inclusive, exclusive) [16:23:05] milimetric: I didn't know the [ ) style :) [16:23:31] milimetric: Now that I'm here, can I help with oozie? [16:23:48] fdans: Have you found the issue for AQS not willing to deploy? 
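The "[start; end [" convention joal confirms above, i.e. the half-open interval [start, end), is what lets consecutive time windows tile a range with no overlap and no gap: each window's end is exactly the next window's start. A minimal sketch with hypothetical 6-hour windows over one day:

```shell
# Sketch: half-open [start, end) windows tile 0..24 with no overlap/gap,
# because each window's (exclusive) end equals the next window's start.
# The 6-hour window size is a made-up example.
start=0
windows=""
while [ "$start" -lt 24 ]; do
  end=$((start + 6))
  windows="$windows[$start,$end) "
  start=$end
done
echo "$windows"
```

With closed intervals ([start, end]) the boundary hour would belong to two windows, which is exactly the kind of double counting half-open time boundaries avoid in batch jobs.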
[16:23:49] joal: I'm talking to mforns but if you wanna help you can, this one looks weird [16:23:56] sure - in da cave? [16:24:32] joal: nope, I mean it's giving me a mean error about the endpoint responding with a 404 [16:24:55] but the tests in aqs are all okay and the data is in cassandra so I don't understand what's going on [16:29:25] hm fdans [16:29:40] fdans: We'll review that on monday, with some logging in place [16:29:50] sounds good joal [16:30:25] could be related to cassandra schema creation for instance, or some other weird things happening in AQS [16:34:36] (03PS1) 10Milimetric: Fix bad input events [analytics/refinery] - 10https://gerrit.wikimedia.org/r/402379 (https://phabricator.wikimedia.org/T170764) [16:35:28] (03CR) 10Milimetric: "Apologies for this mistake, the data has to be re-created from the beginning. The query was fine but the oozie job was wrong." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/402379 (https://phabricator.wikimedia.org/T170764) (owner: 10Milimetric) [16:38:01] ottomata: https://gerrit.wikimedia.org/r/#/c/402382/1 - makes sense? [16:38:09] it is the only puppet thing that breaks now [16:51:09] all right leaving for today people, will check later on! [16:51:11] * elukey off! [16:51:39] 10Analytics-Kanban, 10Analytics-Wikistats: Wrong y-axis labels on wikistats graph - https://phabricator.wikimedia.org/T184138#3878596 (10fdans) a:03fdans [16:51:44] Bye elukey :) [16:51:44] 10Analytics, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3878597 (10JAllemandou) @Nuria , @Smalyshev : Given all wikidata-query tagged rows belong in misc, which is super small, I have no o...
[16:51:47] Care your wallet :) [16:57:07] (03PS1) 10Mforns: [WIP] Improve WikiSelector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) [17:00:27] holaaa elukey ottomata milimetric [17:00:41] holaa fdans [17:00:44] standduppp [17:01:32] ottomata: coming? [17:01:43] joal:? [17:03:36] OO [17:03:55] sorryyy [17:04:07] (03CR) 10Joal: [C: 032] "Good for me! Please merge whenever you want" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/402379 (https://phabricator.wikimedia.org/T170764) (owner: 10Milimetric) [17:10:55] 10Analytics, 10Analytics-Wikimetrics: Problem opening Alfagems cohort - https://phabricator.wikimedia.org/T95530#3878630 (10Aklapper) Hi @Kipala! You assigned this issue to yourself a while ago. Could you please share a status update? Are you still working (or still plan to work) on this issue? Is there anythi... [17:15:30] James_F: what's your wikitech ldap username? [17:17:20] nuria_ mforns I shouted felices reyes at the last second in standup :) [17:17:30] hehehe [17:17:32] fdans: MY FAVORITE Day [17:17:38] fdans: I SO MISS IT [17:18:06] James_F: I think it is JForrester [17:18:06] mforns is actually celebrating them on the 4th of january because they are from oriente [17:18:12] i added you in superset if you want to try to log in [17:18:14] xD [17:18:27] yes, they pass earlier here :] [17:18:45] they can do anything! [17:18:54] they are here and there AT THE SAME TIME [17:18:55] I'm off to A Coruña in a lil bit, family tradition [17:18:58] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#3878662 (10Aklapper) @Tnegrin: This issue has been assigned to you a while ago. Could you please share a status update? Are you still wo... 
[17:19:06] last year I was on my way to SF, but not this time :) [17:20:47] nuria_: the Reyes Magos aren't the parents, they're actually a cluster of distributed gift delivery nodes [17:21:25] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#3878671 (10Nuria) a:05Tnegrin>03None [17:22:31] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#685364 (10Nuria) Not sure who owns this now but likely toby. Removing him, adding @Tgr just in case he knows, removing analytics tag [17:25:31] (03CR) 10Nuria: "Let's plan on deploying this next week." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/402379 (https://phabricator.wikimedia.org/T170764) (owner: 10Milimetric) [17:50:38] (03PS1) 10Ottomata: Update build_wheels.sh to python3; update build artifacts for python3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/402397 (https://phabricator.wikimedia.org/T182688) [17:52:06] 10Analytics, 10CirrusSearch, 10Discovery, 10Discovery-Search: Load cirrussearch data into druid - https://phabricator.wikimedia.org/T156037#3878750 (10TJones) This ticket came up again in a discussion earlier in the week, and we decided that adding a few more use cases wouldn't hurt, even if we don't work...
[18:09:47] (03PS2) 10Ottomata: Update build_wheels.sh to python3; update build artifacts for python3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/402397 (https://phabricator.wikimedia.org/T182688) [18:10:25] (03PS3) 10Ottomata: Update build_wheels.sh to python3; update build artifacts for python3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/402397 (https://phabricator.wikimedia.org/T182688) [18:10:51] (03CR) 10Ottomata: [V: 032 C: 032] Update build_wheels.sh to python3; update build artifacts for python3 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/402397 (https://phabricator.wikimedia.org/T182688) (owner: 10Ottomata) [18:20:18] ottomata: It's jforrester, though I think thanks to the magic of MediaWiki some things think I'm [Jj]Forrester and some think I'm [Jj]forrester. My shell account is "jforrester". [18:21:37] James_F: cool, I added you as JForrestor, maybe try that first? [18:21:51] although, I'm working on superset now, it might be broken a bit. but you should be able to log in i think [18:24:11] ottomata: Yup, I'm in. Thanks! [18:24:38] coool [18:24:57] James_F: as I said, i'm working on something, you might see errors atm [18:25:16] Kk. [18:27:44] James_F: You can copy a dashboard and make it your own [18:28:00] James_F: if you try to modify one of the existing ones it will give you that option [18:28:40] James_F: it is a bit dry but at the same time much more powerful than pivot [18:28:54] * James_F nods. [18:29:57] James_F: data lake data is accessible just like it is in pivot, much less friendly than pageview data for sure, let us know if you need help [18:30:08] I'll have a play. [18:33:12] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#3878798 (10Tgr) @Nuria it's not clear what needs to be owned here. Do you want to clean up past logs somehow?
Or do you think this is st... [18:35:53] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#3878801 (10Nuria) The initial problem is a bunch of accounts generated with the same user-name, that seems to be a problem. found via logging... [18:36:26] o.O [18:36:33] milimetric, ready to talk about DevSummit [18:37:47] sorry halfak [18:42:56] joal: read this one https://wikitech.wikimedia.org/wiki/User:Joal/Clickstream [18:43:30] joal: but i think we still need a bit more substance (if small) for blogpost, even if we include that at the end as an example of how-to [18:44:06] joal: in more of a jupyter notebook style [18:47:25] nuria_: https://superset.wikimedia.org/superset/dashboard/4/ is early days but pretty impressive. [18:51:15] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#3878823 (10Tgr) The query does not give me any dupes. Presumably the issue is not happening anymore and old EventLogging entries have al... [18:51:16] James_F: looking [18:52:41] James_F: WOW you got that done fast, do evangelize on your team please Deskana might be interested too, we are working on user creation and will announce it more broadly in the next couple weeks [18:53:21] nuria_: Totally. The system is a bit unwieldy as you say, but once you understand the source/slice model it works out. [18:54:26] James_F: ya, not my favorite I have to say but best we can do with OS [18:54:40] James_F: if you run a workshop for PMs on it it will be AWESOME [18:54:53] James_F: we can create users beforehand as need be [18:56:06] * James_F nods. [18:56:32] The key thing is getting the right data into the system as sources and picking out what needs showing.
;-) [18:57:01] James_F: right, which for edits is a lot harder than pageviews [18:57:59] 10Analytics, 10Analytics-EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://phabricator.wikimedia.org/T68101#3878865 (10Nuria) 05Open>03Resolved [19:03:43] 10Analytics: Make superset more scalable - https://phabricator.wikimedia.org/T182688#3878870 (10Ottomata) Ok! Currently running python3.4 and with async gthread worker mode. Things seem snappier...and MySQL locking isn't happening, sooo GREAT! Along the way, I ran into some nasty python2 -> python3 database c... [19:12:33] 10Analytics, 10Patch-For-Review: Make superset more scalable - https://phabricator.wikimedia.org/T182688#3878886 (10Ottomata) [19:12:54] 10Analytics-Kanban, 10Patch-For-Review: Make superset more scalable - https://phabricator.wikimedia.org/T182688#3831095 (10Ottomata) [19:13:22] 10Analytics-Kanban, 10Patch-For-Review: Make superset more scalable - https://phabricator.wikimedia.org/T182688#3831095 (10Ottomata) https://gerrit.wikimedia.org/r/#/c/402397/ [19:14:47] ok James_F I'm good for now, it should be working! lemme know if you run into any problems [19:15:42] (03PS5) 10Nuria: Replacing JSON download with CSV download [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192) [19:18:21] (03PS6) 10Nuria: Replacing JSON download with CSV download [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192) [19:18:55] (03CR) 10Nuria: Replacing JSON download with CSV download (032 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192) (owner: 10Nuria) [19:19:06] nuria_: i think superset is a wee better now! [19:19:09] seems snappier to me... [19:19:25] ottomata: nice, does it have user creation or we create those? 
[19:19:39] no [19:19:53] we have to wait for both flask-app-builder and superset to make releases i think [19:19:54] or fork them... [19:21:59] the other thing is celery workers [19:22:05] not sure if we should do that or not [19:22:05] i guess we'll need to [19:33:49] ottomata: celery for processing what? [19:34:34] async queries [19:34:36] long running ones [19:34:46] https://superset.incubator.apache.org/installation.html#sql-lab [19:39:29] take it back James_F, i'm still messing with it :) [19:43:57] nuria_: , yt? [19:44:08] ottomata: i see yes [19:44:21] ottomata: i am trying to get a dashboard together and feeling like a moron [19:44:26] can you try logging into superset? [19:44:32] does it still work for you? [19:44:48] i'm going to change the login to use shell username rather than wikitech username [19:45:01] this will help with hadoop authentication, like Hue [19:46:32] nuria_: ^? [19:46:33] ottomata: i ma logged in, want me to log out? [19:46:39] *i am [19:47:20] hmm, ok so case insensitive, i think your ldap and shell usernames are the same [19:47:31] nuria_: , and now? [19:47:33] still good? [19:47:53] ottomata: still logged in [19:48:25] ok great [19:48:40] ok nuria_ i need some help :) [19:48:47] ottomata: yessir [19:49:12] hmm one min... 
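The Celery discussion above refers to Superset's SQL Lab support for long-running async queries (the linked installation docs). As a minimal sketch of what that configuration looks like in a `superset_config.py`, assuming a local Redis instance as the Celery broker and result backend (the hostnames and database numbers here are illustrative, not the actual WMF setup):

```python
# Sketch of a superset_config.py fragment enabling Celery workers for
# SQL Lab's async queries. Assumes a local Redis as broker and result
# store; URLs below are placeholders, not a real deployment's values.

class CeleryConfig:
    # Where Celery workers pick up query tasks from.
    BROKER_URL = "redis://localhost:6379/0"
    # The module containing Superset's SQL Lab task definitions.
    CELERY_IMPORTS = ("superset.sql_lab",)
    # Where task state/results are stored.
    CELERY_RESULT_BACKEND = "redis://localhost:6379/1"

# Superset reads this name from its config module.
CELERY_CONFIG = CeleryConfig

# Async SQL Lab also needs a results backend for fetched rows
# (e.g. a cache or object store); left unset in this sketch.
RESULTS_BACKEND = None
```

With this in place, a separate `celery worker` process would be started against the Superset app so queries can run detached from the web workers.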
[19:50:12] o/ [19:50:21] hola elukey [19:50:30] Ah, nm i think i'm ok [19:50:36] i thought i was going to lock myself out changing my username [19:50:37] but i'm cool [19:50:51] great ok [19:50:58] now now now, how to make hive work..HMMMmMMm [19:52:04] ottomata: I think we can leave that for later [19:52:32] ottomata: let's get it so users create their dashboards with pageview/edit data on druid [19:52:45] ottomata: once that has widespread use we can do hive [19:52:56] ottomata: let me know what you think [19:53:14] wanna poke around for a few mins [19:53:18] not going to deal with celery stuff [19:53:19] for now [19:55:23] ottomata: creating dashboard as of now [19:55:36] ottomata: will be in a meeting for the next 30 mins [19:56:10] cool [19:56:12] thanks i'm good [19:56:21] i thought i'd need you to change my username, but i don't! :) [20:08:57] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): Have "Last Attracted Developers" information for Gerrit (already exists for Git) automatically updated - https://phabricator.wikimedia.org/T151161#3879020 (10Aklapper) For the records, we need this data 8 times a year (beginning of Jan,M... [20:12:00] ok so hopefully the last huge tables will be done during the next few hours, and tomorrow morning I'll be free to re-enable EL's consumers [20:15:51] nice [20:16:03] elukey: https://gerrit.wikimedia.org/r/#/c/402424/ and https://gerrit.wikimedia.org/r/#/c/402425/ thoughts? they can also wait, no hurry at all [20:18:07] 10Analytics-Tech-community-metrics: "Patchsets Statistics Per Review" widget on "Gerrit" is incomprehensible (due to missing units and misleading custom label) - https://phabricator.wikimedia.org/T151218#3879030 (10Aklapper) 05Open>03Resolved a:03Aklapper Panel is now called "Patchset Statistics per Change... [20:23:47] ottomata: lgtm, only one question - what is the line that marks an "extra property" that goes in the hash vs a regular one? 
[20:28:32] there isn't really one [20:28:34] i guess [20:28:37] for this the line is clear [20:28:50] since the property names themselves are variable [20:28:54] elukey: ^ [20:29:08] but, yeah, if i were doing this from scratch, i might do more like druid [20:29:27] and a single has for all/most properties [20:29:30] hash* [20:31:07] okok, makes sense [20:31:18] druid's hashes are indeed a lot more flexible [20:36:18] all right going offline again, will log in here tomorrow morning for el :) [20:36:25] and also send an email [20:37:55] ottomata: can I kill your screen session on an1003? [20:38:23] it is a CRITICAL on icinga for the long standing sessions on hosts etc.. [20:41:10] done [20:41:20] thanks! [21:35:42] 10Analytics, 10CirrusSearch, 10Discovery, 10Discovery-Search: Load cirrussearch data into druid - https://phabricator.wikimedia.org/T156037#3879173 (10debt) @TJones: 🌓 +🦄 == 🍰 🤣 [22:19:29] I can't seem to ssh to analytics-slave.eqiad.wmnet after switching to new ssh keys. Does someone need to update my public key there? [22:19:45] ottomata: ^ ? [22:21:03] although it's been a long time since I've sshed there, so it might be a totally different server and not even have my user on it [22:54:06] Dear analytics people, [22:54:13] https://meta.wikimedia.org/wiki/Schema_talk:ChangesListFilters is the correct way to indicate a purging policy, right? [22:54:45] Cause it looks like events for this schema are getting purged (at least from MySQL) *completely* after 90 days, instead of partially as I requested on that page [22:55:10] (select min(timestamp) from blah returned 20171006nnnnnn yesterday, and 20171007nnnnnn today) [23:04:19] Nevermind, it looks like I have to go through stat1006 :) [23:04:30] which I forgot
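The exchange above is about a schema design choice: a few well-known fields as top-level columns versus folding variable "extra" properties into a single hash, the more flexible Druid-style approach. A hypothetical Python sketch of that split (all field names here are invented for illustration, not the actual schema):

```python
# Hypothetical illustration of the schema choice discussed above: keep a
# small set of fixed, well-known fields as top-level columns, and fold
# every variable ("extra") property into one catch-all hash, the way a
# Druid-style schema keeps dimensions flexible. Field names are invented.

FIXED_FIELDS = {"timestamp", "event_type", "wiki"}

def split_event(event):
    """Split a raw event dict into (fixed_columns, extra_properties)."""
    fixed = {k: v for k, v in event.items() if k in FIXED_FIELDS}
    extra = {k: v for k, v in event.items() if k not in FIXED_FIELDS}
    return fixed, extra

raw = {"timestamp": 1515700000, "event_type": "banner", "wiki": "enwiki",
       "campaign": "fundraising2018", "impressions": 42}
fixed, extra = split_event(raw)
```

The point made in the chat is that with this layout there is no hard "line": anything not in the fixed set lands in the hash, which is why variable property names are easier to handle than a schema with one column per property.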