[08:22:45] heya elukey o/
[08:22:57] hola!
[08:23:04] wassup?
[08:23:57] trying to get pivot working, need to work on puppet because there are some issues :(
[08:24:02] the rest is good
[08:24:06] )
[08:24:06] great :)
[08:24:28] anything I can help with for decommissioning old aqs?
[08:25:13] joal: all good from your side?
[08:25:33] if you could just stop the oozie machinery that updates the old cluster it would be enough
[08:25:46] I have looked at graphs regularly, and spotted nothing
[08:25:47] after that it is only a matter of merging
[08:25:52] elukey: ok
[08:26:13] elukey: Stopping old aqs loading is reasonably easy (easier than starting it at least ;)
[08:26:30] joal: with "from your side" I meant "how's it going?" :)
[08:26:39] Ah ;)
[08:27:48] Well, looking in hive if the ratio views / distinct UserAgent doesn't give too many false positives, and continuing to move on with scala for edit history (this one is my never ending one ...)
[08:28:45] :D
[08:49:12] (PS1) Elukey: Updated scap configuration [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315050 (https://phabricator.wikimedia.org/T138262)
[08:49:39] (CR) Elukey: [C: 2 V: 2] Updated scap configuration [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315050 (https://phabricator.wikimedia.org/T138262) (owner: Elukey)
[09:04:24] elukey: ping?
[09:05:20] pong
[09:06:13] elukey: you probably have missed my previous message :)
[09:06:25] elukey: Do you mind if I deploy refinery now ?
[09:09:32] ah yes I didn't see it :(
[09:09:34] not at all
[09:09:38] elukey: no prob :)
[09:09:41] ok
[09:09:45] Deploy now then !
[09:09:47] whenever you have time I'd need some help with Pivot's first deploy
[09:10:13] elukey: sure !
I'm no javascript nor web-server expert, but whatever I can help with :)
[09:10:27] it is more scap first time deployment and permissions
[09:10:29] :)
[09:10:45] hm, I'm no scap expert either :-P
[09:10:47] huhu
[09:11:03] I can however do some rubberducking :)
[09:16:03] !log Deploy refinery
[09:48:34] elukey: i can help
[09:48:45] * mobrovac last famous words
[09:49:52] mobrovac: I am pretty sure it is a PEBKAC as always :D
[09:50:01] haha
[09:51:43] so I have on tin /srv/deployment/analytics/pivot/deploy and both scap deploy on it and scap deploy-local on stat1001 (the target) are failing.. the former due to perms even opening the log files
[09:52:23] so I am surely missing something because I have little experience with new repos in scap
[09:52:50] k, lemme take a look on tin
[09:57:38] elukey: in /srv/deployment/analytics/pivot/deploy, you need to chmod g+w scap
[09:57:56] scap can't create its log dir under your user so it fails flat out
[10:00:23] mobrovac: yep I noticed that but I was wondering if the problem is "normal" or if my perms were not right due to some misconfiguration from my side
[10:00:33] for example, after the chmod that you suggested I get "error: insufficient permission for adding an object to repository database .git/objects"
[10:00:51] that follows the same pattern
[10:01:47] elukey: that is indeed strange that .git has trebuchet:trebuchet perms
[10:02:10] elukey: easiest way out: chown -R trebuchet:wikidev .
in the deploy repo
[10:02:40] yeah
[10:02:43] the first-time set-up for scap isn't really streamlined
[10:02:57] so let's not fight windmills here
[10:06:20] :)
[10:09:28] mobrovac: done, much better now
[10:09:42] cool
[10:12:00] now I have a weird 10:11:39 deploy failed: Command '/usr/bin/git submodule foreach --recursive '/usr/bin/git update-server-info'' returned non-zero exit status 1
[10:12:13] but probably some other issue more related to the repo
[10:12:17] no more perm issues
[10:13:49] ah yes
[10:13:50] error: unable to update /srv/deployment/analytics/pivot/deploy/.git/modules/src/info/refs+
[10:14:05] uf
[10:14:10] the perms are all messed up there
[10:15:20] elukey: chmod -R g+w .git in the deploy repo ought to fix it though
[10:27:52] (was afk sorry, back now)
[10:28:59] mobrovac: did you fix them by any chance?
[10:30:31] puppet runs fine now, deploy-local on stat1001 went fine
[10:37:57] no ok still broken on tin side
[10:39:33] yessss now it works :) \o/
[10:43:01] cool
[10:44:00] i couldn't have fixed it, i don't have sudo rights on tin
[10:45:03] okok
[10:59:17] (PS1) Elukey: Update src submodule's SHA [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315067 (https://phabricator.wikimedia.org/T146389)
[10:59:45] joal_: do you have a min?
[10:59:52] elukey: I do !
[11:00:04] thanks!! would you mind double-checking the above code review?
[11:02:27] joal_: --^
[11:02:29] elukey: Just a quick headsup for me: Same idea as with aqs, right?
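The permission fixes from the scap troubleshooting above (chmod g+w, chown -R trebuchet:wikidev) can be checked before retrying a deploy. The following is only a sketch under stated assumptions: it runs against a throwaway temporary directory rather than the real deploy repo, and the helper name `list_not_group_writable` is hypothetical, not part of scap.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print every path under DIR that lacks group write.
list_not_group_writable() {
    find "$1" ! -perm -g+w
}

# Throwaway stand-in for a deploy repo (assumption: not the real path on tin).
repo=$(mktemp -d)
mkdir -p "$repo/.git/objects"
chmod -R g-w "$repo"

# Count offending paths before and after the recursive fix.
before=$(list_not_group_writable "$repo" | wc -l | tr -d ' ')
chmod -R g+w "$repo"
after=$(list_not_group_writable "$repo" | wc -l | tr -d ' ')

echo "paths lacking g+w: before=$before after=$after"
```

An empty listing after the chmod suggests group members can write everywhere in the tree; ownership (the chown part) would still need a separate check.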
[11:02:42] There is a pivot repo, and a pivot/deploy repo
[11:02:59] We sync up pivot repo as a submodule of pivot/deploy one
[11:03:26] yes pivot is src of pivot deploy
[11:03:57] the pivot systemd unit is not able to find the build dir
[11:04:11] and I checked on tin that src is not the one that I need
[11:04:25] I don't get this part elukey :)
[11:04:43] ah ok :)
[11:05:39] so basically the systemd unit responsible for running the pivot service on stat1001 is instructed to execute the build/bin/pivot script
[11:05:52] but at the moment we don't have it deployed :)
[11:06:22] in https://phabricator.wikimedia.org/T146389 Marcel updated the deploy repo
[11:06:33] but I think he didn't follow up with the submodule change
[11:06:43] so I am trying to finish the work to allow pivot to rise and shine!
[11:06:44] hm
[11:06:51] I think I get it :)
[11:07:19] elukey: sha1 of the new src submodule is indeed the last one
[11:08:20] elukey: building my understanding while checking the CR
[11:08:26] sorry for the delay :)
[11:08:32] we do the same with the operations/puppet submodules, because otherwise git submodule update --init does not pull the last commit but the one that it knows
[11:08:55] nono please do sanity check my assumptions
[11:08:59] :)
[11:09:01] elukey: I recall having had that issue with aqs as well
[11:09:46] elukey: Ok !
[11:09:57] elukey: I'll +2 your CR :)
[11:10:03] super thanks!
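A minimal sketch of the submodule point made above: `git submodule update --init` checks out the commit recorded in the deploy repo, not the newest upstream one, so the recorded SHA has to be bumped explicitly. Everything here is hypothetical scaffolding (temporary directories, the file `f`, the `demo` identity, the `g` wrapper) and not the team's actual procedure; `protocol.file.allow=always` is only set because the demo clones the submodule from a local path.

```shell
#!/bin/sh
set -eu

# Wrapper with a throwaway identity; protocol.file.allow lets newer git
# versions clone a submodule from a local path (demo-only assumption).
g() {
    git -c protocol.file.allow=always \
        -c user.name=demo -c user.email=demo@example.org "$@"
}

work=$(mktemp -d)

# Hypothetical upstream "src" repo with a first commit.
g init -q "$work/src"
( cd "$work/src" && echo v1 > f && g add f && g commit -q -m "v1" )

# Hypothetical deploy repo that tracks src as a submodule.
g init -q "$work/deploy"
cd "$work/deploy"
g submodule --quiet add "$work/src" src
g commit -q -m "Add src submodule"

# Upstream moves on with a second commit.
( cd "$work/src" && echo v2 > f && g commit -q -a -m "v2" )
latest=$(g -C "$work/src" rev-parse HEAD)

# update --init keeps the commit the deploy repo recorded, not the latest.
g submodule --quiet update --init
pinned=$(g -C src rev-parse HEAD)

# Bump the recorded SHA and put the short SHA in the commit message.
g -C src fetch -q origin
g -C src checkout -q "$latest"
g add src
g commit -q -m "Update src submodule to $(g -C src rev-parse --short HEAD)"

echo "pinned=$pinned latest=$latest"
```

Putting the short SHA in the commit message follows the practice suggested later in the conversation; the message wording itself is illustrative.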
[11:10:12] elukey: ooops, talked too fast :)
[11:10:23] elukey: A better commit message would be good though :)
[11:10:33] you are absolutely right
[11:10:43] (CR) Joal: [C: -1] "Please update commit message to be more explicit :)" [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315067 (https://phabricator.wikimedia.org/T146389) (owner: Elukey)
[11:12:57] (PS2) Elukey: Update the src submodule's SHA [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315067 (https://phabricator.wikimedia.org/T146389)
[11:14:51] (PS3) Elukey: Update the src submodule's SHA to the latest commit [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315067 (https://phabricator.wikimedia.org/T146389)
[11:14:59] elukey: joal_: the practice we've found useful is to have the sha1 in the commit message
[11:15:00] joal_: --^ better?
[11:15:16] like: Update src to
[11:15:42] nice I like it!
[11:15:58] elukey: I think mobrovac's approach is good, but the last message you provided works for me :)
[11:16:06] elukey: Ok, I let you update ;)
[11:16:54] but if you use service-runner to build the deploy repo, it'll do it for you
[11:16:55] :P
[11:17:05] we can leave this discussion for another time though
[11:17:47] (PS4) Elukey: Update the src submodule's SHA to 2b19082 [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315067 (https://phabricator.wikimedia.org/T146389)
[11:19:04] mobrovac: yes I completely support it, and we tried but at some point Andrew and Dan suggested to proceed for the moment without it.. I think that they tried but the pivot nodejs code was a bit difficult to integrate with service runner. A good next step could be to follow up with upstream (when their repo is available again)
[11:19:44] joal_: ready!
[11:19:55] elukey: Mergiiiiiing !
[11:20:00] i'm 99% sure that the service doesn't actually need to be using service-runner in order to use it to build the deploy repo, though
[11:20:27] (CR) Joal: [C: 2 V: 2] "LGTM, merging."
[analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315067 (https://phabricator.wikimedia.org/T146389) (owner: Elukey)
[11:20:39] elukey: Done !
[11:20:43] this is a good point that I haven't investigated (elukey's ignorance++)
[11:20:59] thanksss!
[11:38:21] 13:35 RECOVERY - pivot on stat1001 is OK: TCP OK - 0.002 second response time on port 9090
[11:38:24] \o/
[11:41:50] joal_: I temporarily moved the httpd proxy rules for https://pivot.wikimedia.org to the new localhost:9090 pivot
[11:41:52] on stat1001
[11:42:04] but I am missing the data labels
[11:42:05] weird
[11:42:12] same thing for you?
[11:43:03] elukey: I can't access http://pivot.wikimedia.org/
[11:43:18] worse than me then :D
[11:43:28] what do you see? error while logging in
[11:43:29] ?
[11:45:35] elukey: yes
[11:45:52] elukey: It was slow to answer, but now the issue is with logging in
[11:46:16] can you retry?
[11:46:25] elukey: sure
[11:46:39] Yay :)
[11:47:24] elukey: Bizarre though: everything works except for the datasource labels (field labels are ok)
[11:47:35] all right now we have the same issue, good :)
[11:49:55] maybe it could be firejail
[11:50:40] mmm doesn't make any sense
[11:50:47] and from the logs it recognizes the labels
[11:52:41] elukey: :(
[11:52:48] elukey: maybe some caching thing?
[11:53:50] it is really weird since if you check the links on the pivot page they have the right names :D
[11:54:06] elukey: Indeed !
[12:01:12] AFK for a bit
[12:18:53] Back
[12:21:45] Oct 10 12:21:34 stat1001 firejail[27954]: Got the latest time for 'pageviews-daily' (2016-09-30T00:00:00.000Z)
[12:21:48] Oct 10 12:21:34 stat1001 firejail[27954]: Got the latest time for 'edit-history-test' (2016-09-07T21:47:21.000Z)
[12:21:51] Oct 10 12:21:34 stat1001 firejail[27954]: Got the latest time for 'pageviews-hourly' (2016-10-09T23:00:00.000Z)
[12:22:05] maybe firejail somehow filters them?
[12:22:47] ah snap the logs are going to syslog
[12:25:50] moritzm: question for you about firejail if you have time :)
[12:28:03] sure
[12:28:59] thanks! So the pivot nodejs service is now running fine on stat1001, rendering https://pivot.wikimedia.org/#
[12:29:31] you can log in if you want with your Wikitech credentials
[12:30:06] basically the javascript should display pageviews-daily, edit-history-test and pageviews-hourly rather than "no description available"
[12:30:16] the info is retrieved on the fly from druid
[12:30:42] and with the "screen" version on stat1002 this problem was never noticed
[12:30:47] having a look
[12:30:52] thanks!
[12:31:07] just wanted to know if in your opinion there might be something trivial to fix
[12:36:44] can I restart pivot.service for debugging?
[12:37:08] sure
[12:37:16] ok
[12:43:22] this is unrelated to firejail, I tried minimising the profile and also removing firejail from the systemd unit entirely, made no difference
[12:43:46] also added some ferm logging to druid1001 to see whether there's anything possibly dropped, but didn't happen either
[12:44:25] not sure what's going on here, where does it pull "pageviews-daily" from?
[12:44:39] oh wow thanks a lot, I was about to do something similar but you anticipated me, sorry for the extra work :)
[12:45:02] it should pull them from druid itself, while connecting
[12:45:12] probably a bug somewhere in pivot
[12:45:32] maybe it's a setting in druid somewhere
[12:45:42] that it still expects stat1002 somewhere or so
[12:46:13] will check, thanks again!
[12:46:37] yw
[13:05:16] mornin!
[13:06:15] hola! o/
[13:06:25] hiiiiii
[13:06:31] who's workin today? cause I am!
[13:06:42] heya ottomata :)
[13:06:45] I am as well
[13:08:38] yeehaw
[13:08:49] sounds like enough for a standup to me! i miss yall!
[13:09:02] huhuhu
[13:09:04] :)
[13:09:13] :)
[13:11:07] ottomata: whenever you have time can we chat about the kafka broker list in hiera?
Varnishkafka is spamming that kafka1018 is not reachable (librdkafka complains) and also Brandon found https://gerrit.wikimedia.org/r/#/c/267243/ (never reverted)
[13:12:24] should we update also puppet when a broker is down? WDYT?
[13:13:21] elukey: hmMmmmM, since the list is used for bootstrapping, i would say we shouldn't
[13:13:29] but, it seems some clients are annoying if a broker is down
[13:13:37] maybe we should use lvs?
[13:13:42] pool/depool?
[13:13:44] instead of puppet?
[13:13:45] hmm
[13:13:53] not sure if that is possible
[13:14:53] confd could be used but it might be a bit overkill
[13:15:24] Analytics-Kanban: Migrate the simplest limn dashboards to dashiki tabular {frog} - https://phabricator.wikimedia.org/T126358#2703620 (mforns) a:mforns
[13:16:48] (Abandoned) Mforns: [WIP] Add 2 wikistats reports [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/311961 (https://phabricator.wikimedia.org/T141479) (owner: Mforns)
[13:17:13] yeah
[13:17:32] elukey: i think if we know the broker is going to be down for long enough (like this one) then it'd be fine to take it out of hiera
[13:22:07] !log moved pivot apache vhost to localhost:9090
[13:22:39] ottomata: all right, maybe we can discuss post-standup?
[13:22:49] aye k
[13:23:21] so the new pivot daemon on stat1001 works fine
[13:23:27] except the front page of https://pivot.wikimedia.org/#
[13:23:48] I am not sure why it is not displaying the labels
[13:45:40] joal: pivot on stat1002 is 0.8.34 meanwhile on stat1002 is 0.9.39
[13:45:50] hmmmmm
[13:45:59] sorry the first is stat1002 the second stat1001
[13:46:23] so 0.9.39 seems to have a bug
[13:46:24] k elukey
[13:46:32] mwarf :(
[13:52:35] (PS1) Elukey: Add scap configuration to restart the Pivot service after deploy [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315092 (https://phabricator.wikimedia.org/T146389)
[13:52:54] (CR) Elukey: [C: 2 V: 2] Add scap configuration to restart the Pivot service after deploy [analytics/pivot/deploy] - https://gerrit.wikimedia.org/r/315092 (https://phabricator.wikimedia.org/T146389) (owner: Elukey)
[14:11:15] hi team :]
[14:11:17] milimetric, yt?
[14:11:44] mforns: o/
[14:11:46] hi mforns, what's up
[14:12:01] it's a holiday here
[14:12:36] milimetric, I'm reviewing a candidate's task and trying to execute es6 on node, have you done it before? I'm having problems
[14:12:59] milimetric, oh!!! sorry then
[14:13:09] mforns: whenever you have time can we chat about https://pivot.wikimedia.org? The version on the gerrit repo, that I've deployed to stat1001 via scap, is 0.9.39 meanwhile the one running in screen on stat1002 is 0.8.something
[14:13:12] go have fun :]
[14:13:20] Yeah, you need that harmony extension thing or something, or to compile it with Babel or some nonsense like that
[14:13:31] mforns: and there is a visualization bug of the data labels :(
[14:13:47] milimetric, ok, thanks, in nodejs' site it says you don't but anyway, thanks!
[14:13:53] but nice, I like the effort! Also, ottomata might know, he's doing latest js on node
[14:14:00] ok
[14:14:43] elukey, sure, whenever you want
[14:15:15] elukey: I can try pulling in more recent changes to my fork, see if it fixes it. We can do that tomorrow?
[14:15:58] But yall can do it on the gerrit repo. Let me know if you want push to my github, just in case
[14:16:00] milimetric: sure! everything works fine now I just wanted to know what to do with the bug :)
[14:16:12] reverting to 0.8.x, etc..
[14:16:14] no hurry
[14:16:32] I wouldn't worry, especially if it's easy to upgrade I can pull tomorrow and do that
[14:16:35] I assumed that the version on stat1002 and on the repo was the same
[14:16:41] sure sure
[14:16:43] thanks!
[14:16:55] thx
[14:33:05] just created https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Druid#Pivot_Administration
[14:51:57] ottomata: I had a chat with Erik Zachte about statistics::wikistats, more specifically the stat1-cron-script.sh cron
[14:52:12] he said that it is a very old script that could be deprecated
[14:52:35] but afaics the statistics::wikistats class is there for the cron right?
[14:52:42] so we could in theory nuke it entirely
[14:52:59] (or maybe only the cron)
[14:54:31] !
[14:54:39] elukey: anything we can nuke there I am for
[14:59:46] ottomata:
[14:59:47] https://gerrit.wikimedia.org/r/#/c/315101/1
[14:59:52] maybe this one as starter
[14:59:57] :)
[15:00:13] nice elukey thanks for the docs
[15:01:00] a-team: standduppp
[15:01:56] coming...
[15:37:55] Analytics, Analytics-EventLogging: Ensure no dropped messages in analytics eventlogging processors when stopping broker - https://phabricator.wikimedia.org/T142430#2703856 (Nuria) a:Ottomata
[15:41:06] Analytics: Puppetize job that saves old versions of geoIP database - https://phabricator.wikimedia.org/T136732#2703862 (Nuria) @Milimetric : should we do this or rather kill cron and task? We have not had the need to do this thus far (in two years).
[15:46:26] Analytics, New-Readers, Easy: Split opera mini in proxy or turbo mode - https://phabricator.wikimedia.org/T138505#2703880 (Nuria)
[15:48:28] Analytics, Pageviews-API, Easy: Add monthly request stats per article title to pageview api - https://phabricator.wikimedia.org/T139934#2703889 (Nuria)
[15:56:28] Analytics: Pre-generate mysql ORM code for sqoop - https://phabricator.wikimedia.org/T143119#2557510 (Nuria) @Milimetric : is this still relevant for Q2 if we run sqoop once a week?
[15:57:42] Analytics, Easy: Don't accept data from automated bots in Event Logging - https://phabricator.wikimedia.org/T67508#2703917 (Nuria)
[16:00:28] Analytics, Pageviews-API: Improve user management for AQS - https://phabricator.wikimedia.org/T142073#2703934 (Nuria)
[16:10:31] (CR) Nuria: [C: 2 V: 2] "Merging, will be deployed this week." [analytics/refinery] - https://gerrit.wikimedia.org/r/311964 (https://phabricator.wikimedia.org/T146064) (owner: Nschaaf)
[16:12:53] ottomata: so PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 happened on Oct 4th
[16:12:59] and I restarted it manually
[16:13:06] but I also have other recurrences
[16:14:00] ah yes the ones from kafka1018
[16:26:37] hm, interesting
[16:26:39] anything in logs elukey?
[16:28:18] I should have opened a phab task, sigh
[16:28:27] maybe journalctl still has some logs
[16:29:11] yes!
[16:29:42] so ottomata - [2016-10-04 07:13:54,513] on kafka1012, sudo journalctl -u kafka-mirror-main-eqiad_to_analytics.service
[16:43:53] hm, interesting, looks like it couldn't produce at some point
[16:44:02] and now it doesn't have any consumer groups assigned
[16:44:16] am headed to a cafe, bbs
[17:13:00] going afk team! talk with you tomorrow!
[17:13:01] o/
[17:21:17] laters!
[17:35:30] Analytics: Missing raw pageview data for 7/29/2015 - 03:00 - https://phabricator.wikimedia.org/T147801#2704063 (Nuria)
[18:55:17] elukey: (for tomorrow) changed commit msg , let me know if it is Ok to merge: https://gerrit.wikimedia.org/r/#/c/314722/
[19:54:27] nuria, do you know how the apache config in dashiki-01 is altered? directly editing /etc/apache2/sites-available/50-simplestatic.conf?
[19:54:47] mforns: yes, there is a git depot local to teh box with a bunch of commits
[19:54:50] *the
[19:55:17] mforns: ah sorry, no
[19:55:25] mforns: i read *limn* somehow
[19:55:39] it's a hiera puppet config in wikitech
[19:55:47] mforns: we are talking the new good config? if so it is hiera
[19:55:51] right,
[19:56:02] ok milimetric thanks
[19:56:11] :]
[20:17:45] (PS7) Nuria: [WIP] Service Worker to cache locally AQS data [analytics/dashiki] - https://gerrit.wikimedia.org/r/302755 (https://phabricator.wikimedia.org/T138647)
[20:34:15] (PS8) Nuria: [WIP] Service Worker to cache locally AQS data [analytics/dashiki] - https://gerrit.wikimedia.org/r/302755 (https://phabricator.wikimedia.org/T138647)
[20:42:02] nuria, :] I see very little data in http://glam-metrics.wmflabs.org, and it is not updated any more. Do we want to keep it as historical reference? Also, there is a world-map chart, is the idea to create a new visualizer in dashiki, or should we represent it in another way?
[20:43:47] mforns: given that data is pretty old (2014?) and not updated since then ( am I right?) i vote for kill
[20:44:17] mforns: dashiki needs a map yes, but I am not sure it needs one for this project
[20:44:28] mforns: i think we will need to develop one eventually
[20:47:05] (PS1) Nuria: Changing link to browsers to avoid a 301 [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/315184
[20:47:13] nuria, aha, agree, but since dan listed it as to move to dashiki, I guess there is a reason to it no? although it's pretty old...
[20:47:48] cc milimetric but ..
i do not think so.. I think at the time we thought data was updating
[20:47:56] we can talk about this tomorrow
[20:49:10] nuria, ok :]
[20:50:01] mforns: take a look at this small change: https://gerrit.wikimedia.org/r/#/c/315184/
[20:50:15] mforns: I think it will avoid 1 extra redirect in the dashboards
[20:50:20] nuria, ok
[20:50:29] mforns: thank you
[21:00:33] (CR) Mforns: [C: 2 V: 2] Changing link to browsers to avoid a 301 [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/315184 (owner: Nuria)
[21:00:48] nuria, cool