[09:37:50] 10Analytics: Create robots.txt policy for datasets - https://phabricator.wikimedia.org/T159189#3059607 (10Peachey88) Is there any reason we are actually concerned about bandwidth usage? [10:30:21] 10Analytics-Tech-community-metrics, 10Phabricator, 06Developer-Relations (Jan-Mar-2017): Decide on wanted metrics for Maniphest in kibana - https://phabricator.wikimedia.org/T28#3060641 (10Aklapper) p:05Low>03Normal a:03Aklapper [10:32:17] 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Identify Wikimedia's most important/used info panels in korma.wmflabs.org - https://phabricator.wikimedia.org/T132421#2197987 (10Aklapper) p:05Low>03Normal [10:32:40] 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Go through default Kibana widgets; decide which ones are not relevant for us and remove them - https://phabricator.wikimedia.org/T147001#3060649 (10Aklapper) p:05Low>03Normal [11:10:03] hello a-team! [11:10:27] Moritz upgraded apache on thorium and I was reviewing all the websites as consistency check [11:10:30] but https://analytics.wikimedia.org/dashboards/browsers/ looks weird [11:10:50] same thing for https://analytics.wikimedia.org/dashboards/vital-signs/#empty [11:10:59] (they seems empty or not functioning correctly) [11:11:11] probably I am missing something but can somebody please double check? [11:38:00] elukey: it indeed look weird ! [11:38:50] both browser report and vital sign seem broken :( [11:40:26] elukey: Also, do you want me to pause / stop oozie jobs in preparation for cluster upgrade? [11:41:27] 10Analytics, 10Analytics-Cluster: Apply Xms Java Heap settings to all the Hadoop daemons - https://phabricator.wikimedia.org/T159219#3060783 (10elukey) [11:45:02] joal: we can do it later if you want, it is just a suspend and wait.. so we'll let a bit more jobs going [11:45:14] sure elukey [11:45:16] but I am fine whatever you want to do :) [11:45:18] you are the master [11:45:22] huhu [11:45:35] elukey: about website, what can we try to do? [11:46:07] I checked apache logs and JS console logs in Chrome but nothing comes up [11:46:19] it might be a subtle JS problem [11:46:26] I am going to check the apache changelog [11:47:46] elukey: it's as if there was no data / nothing to display [11:53:33] joal: I restarted apache reverting a new setting (that is the big change) but nothing changes, so it must be something else.. [11:54:22] everything else seems to work fine [11:54:38] maybe we'd need to wait for Marcel [11:54:53] (03CR) 10Joal: "2 comments:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/340093 (https://phabricator.wikimedia.org/T156312) (owner: 10Fdans) [11:55:37] elukey: marcel is off this week :( [11:55:45] elukey: I think we need Dan [11:56:26] argh you are righttttt [11:57:58] (03PS3) 10Fdans: Use v2 table in Cassandra, switch to padded day timestamp [analytics/refinery] - 10https://gerrit.wikimedia.org/r/340093 (https://phabricator.wikimedia.org/T156312) [11:58:39] we can ping our new JS expert and lover fdans!! [11:58:55] what do you mean by your lover? [11:59:05] 😛 [11:59:21] nono JS lover, don't read sentences as you would like to [11:59:27] hahahah [11:59:32] :D :D :D [11:59:52] elukey looking... 
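For reference, the "pause / stop oozie jobs in preparation for cluster upgrade" step discussed above is normally a one-liner against the Oozie CLI. A minimal sketch, where the bundle id is a placeholder and the OOZIE_URL value is an assumption, not taken from the log:

    # Suspend a running bundle before the upgrade, resume it afterwards.
    # <bundle-id> is a placeholder; real ids come from the `oozie jobs` listing.
    export OOZIE_URL=http://localhost:11000/oozie   # assumption: local Oozie server URL
    oozie jobs -jobtype bundle -filter status=RUNNING   # list candidate bundles
    oozie job -suspend <bundle-id>
    # ... upgrade work happens here ...
    oozie job -resume <bundle-id>

Suspending (rather than killing) keeps the coordinator state so the backlog of hours is processed automatically once the bundle is resumed, which matches the "just a suspend and wait" approach described above.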
[11:59:58] jokes aside, I am checking why https://analytics.wikimedia.org/dashboards/browsers/index.html and https://analytics.wikimedia.org/dashboards/vital-signs/#empty looks weird [11:59:59] (03CR) 10Joal: [C: 04-1] "Comments inline" (032 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/339419 (https://phabricator.wikimedia.org/T156312) (owner: 10Fdans) [12:00:14] Hi fdans, thanks for the changes :) [12:00:18] I am not sure if they were working before the apache upgrade though [12:00:24] all the other websites are good [12:03:22] so I'm looking at https://analytics.wikimedia.org/dashboards/browsers/ [12:03:48] and the following request is getting a 400 https://piwik.wikimedia.org/piwik.php?action_name=Simple%20Request%20Breakd…p=0&wma=0&dir=0&fla=1&java=0&gears=0&ag=0&cookie=1&res=3360x2100>_ms=148 [12:06:07] Hey yall [12:06:45] Hm... piwik shouldn't stop stuff from working... [12:07:03] I double checked after deploying last and everything was ok [12:07:12] Lemme take a closer look [12:08:05] Thanks fdans and milimetric (i'm really bad at front end debugging :( [12:08:17] I got it, no worries [12:09:23] of course they did :) [12:09:35] the pages on meta vanished [12:09:38] (that stored the configs) [12:09:46] after we deployed dashiki last night probably [12:09:54] that doesn't make sense, must've been moved [12:10:54] (talking about pages like https://meta.wikimedia.org/wiki/Config:VitalSigns) [12:11:42] also, I got permissions revoked on meta.... [12:11:44] wtf is going on [12:12:59] ok so this is not related to the apache upgrade [12:13:02] no [12:13:18] choices are: migrate all configs from backups to the new Config:Dashiki namespace [12:13:45] or... I guess that's the only choice because I don't have rights to create Config: pages anymore [12:13:53] maybe that's what happened during the deploy [12:14:13] which is not how it worked on beta... [12:14:29] biggest problem right now is how do I get old text... [12:14:54] sorry talking out loud, morning brain :) [12:15:01] nono it is really useful :) [12:15:09] I am wondering if another change wiped the config [12:15:13] and not the deployment [12:15:22] did you guys check the websites after the deployment? [12:15:33] no, didn't think to, was late last night [12:15:36] (just to understand when this mess could have happened) [12:17:09] ok, so page is still in the page table, that means it's being masked probably, and then it's definitely the deploy [12:17:43] yeah, 'cause if it was moved or deleted it would either have page_is_redirect or be in the archive table instead [12:18:39] wonder if the API will get me the text [12:18:50] duh obviously not, that's what dashiki does :) [12:18:51] hahaha [12:20:08] milimetric: I am going to step away a bit to eat something before the CDH upgrade, will be back in a bit if you need me ok? [12:20:23] yeah, no problem, elukey, this is all me I think [12:20:54] just gotta query and find this text, move to the Config:Dashiki: namespace, update Dashiki and re-deploy all dashboards [12:20:55] no biggie [12:21:35] super :) [12:21:48] * elukey lunch! [12:26:41] (03PS3) 10Fdans: Format timestamps correctly in per-project aggregation [analytics/aqs] - 10https://gerrit.wikimedia.org/r/339419 (https://phabricator.wikimedia.org/T156312) [12:28:51] hm, anyone know where I can get revision text from? joal did you happen upon the db that has that? [12:29:08] milimetric: which wiki? 
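The "masked vs. moved/deleted" reasoning above boils down to two quick checks against the wiki's database. A rough sketch using the mysql client, assuming access to a metawiki replica (host is a placeholder) and using Config:VitalSigns, the example page from the log, stored as a namespace-0 title with the literal "Config:" prefix:

    # Does the page still exist under its old title, and is it a redirect?
    mysql -h <metawiki-replica> metawiki -e "
      SELECT page_id, page_namespace, page_title, page_is_redirect, page_latest
      FROM page
      WHERE page_namespace = 0 AND page_title = 'Config:VitalSigns';"

    # If it had been deleted instead, its revisions would show up in archive:
    mysql -h <metawiki-replica> metawiki -e "
      SELECT ar_namespace, ar_title, ar_rev_id, ar_timestamp
      FROM archive
      WHERE ar_title = 'Config:VitalSigns';"

A row in page with no redirect flag and nothing in archive is consistent with the "masked by a newly registered Config: namespace" conclusion reached above.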
[12:29:10] like I have page_id, rev_id, rev_text_id on meta [12:29:28] give me a minute milimetric [12:31:11] milimetric: I can find that yes, but I don't have very recent stuff [12:31:20] oh from dumps [12:31:25] ummm, how recent? [12:31:25] milimetric: correct [12:31:37] milimetric: currently checking how recent I can get [12:32:06] milimetric: 2017-02-20 [12:32:09] (8 days ago) [12:32:21] oh plenty, all updated last before that [12:32:28] ok, great, where do I go? [12:32:51] milimetric: I need to import the file, then parse it with my utilities (shouldn't be long) [12:33:05] awesome, thx joal [12:33:24] milimetric: np, those things I'm working will at some point be useful ;) [12:33:32] I should really figure out where to get this from the db as well [12:35:27] milimetric: before I go for the full thing, can you give me the list of rev_ids? [12:36:17] milimetric: cause I actually don't have history data from 2017-02-02, but only from 2017-02-01 [12:36:47] So if rev_ids are the last ones for each page, should be ok, but I prefer double check [12:36:51] https://www.irccloud.com/pastebin/CdzOmB7r/ [12:37:46] milimetric: from the rev_timestamp, things should be ok even with 2017-02-01 [12:37:47] (sorry pasted some bad ones in there joal) [12:37:57] yep, all old [12:38:07] milimetric: waiting for an updat? [12:38:16] what do you mean? [12:38:23] milimetric: which ones should I take? [12:38:31] oh I pasted them in, you don't see it? [12:38:36] it's a pastebin [12:38:45] milimetric: I see them, but you said there were too many [12:38:50] ohoh [12:39:08] https://www.irccloud.com/pastebin/NQllyN5b/ [13:07:18] joal: how's it going? [13:07:47] milimetric: I got results, trying to wrote them [13:10:31] milimetric: stat1004:/home/joal/metamili.txt [13:11:03] thx joal, looking [13:11:07] format: rev_id\ntxt\n\n [13:11:54] milimetric: (probably old) but I saw this on #operations: 14:03 @ !log ran namespaceDupes on meta to fix some Config pages [13:14:17] yep, been talking to him in -databases, he's being very nice and explaining a lot [13:14:45] so the dashboards should be ok now that he did that actually [13:14:46] https://meta.wikimedia.org/wiki/Config:SimpleRequestBreakdowns [13:15:15] so we're not against the clock anymore, I'll slow down and see the right way to move these pages [13:15:30] milimetric: indeed, vital signs back in place [13:15:47] heh, Seinfeld's so smart. He said pain is lots of knowledge rushing in to fill a gap [13:15:52] what happened?? [13:16:12] you mean overall elukey? [13:25:56] yes, just curious [13:29:23] Hi ottomata [13:29:29] I'm sorry I missed your ping yesterday [13:29:33] hi! [13:29:55] ottomata: o/ [13:30:02] joal: I think that we can start the suspend procedure [13:30:15] elukey: sure ! [13:30:26] elukey: please go ahead with camus, I'll care oozie [13:30:38] sure [13:30:51] !log stopping camus as prep step for the CDH upgrade [13:30:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:31:05] !log Suspend webrequest-load bundle for CDH upgrade [13:31:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:31:21] bearloga: Hi ! [13:31:26] thanks! [13:31:36] joal: uhhhh i think iw as gonna talk about spark 2 stuff with ya [13:31:39] let's chat later [13:31:44] sure ottomata [13:31:45] about taht [13:32:49] bearloga: We are going to upgrade hadoop soon - Can you please stop launching requests (or they'll fail soon) [13:35:23] ottomata: do we need to add the apt source.d config before starting ? 
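As an aside to the "wonder if the API will get me the text" question above: revision wikitext can usually be fetched directly by rev_id, independent of the current title, as long as the revisions themselves were not deleted. Whether that would have worked in this masking situation is not settled in the log; a minimal sketch with curl, using a made-up rev_id rather than one from the pastebin:

    # Fetch the wikitext of specific revisions by id from meta.
    # 12345 is a placeholder rev_id; several ids can be passed separated by |.
    curl -s 'https://meta.wikimedia.org/w/api.php?action=query&prop=revisions&revids=12345&rvprop=ids|content&format=json'
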
[13:35:35] yeah [13:35:37] on that in a min... [13:35:42] super [13:36:58] want to get through morning emails before starting :) [13:38:37] sure sure [13:38:43] I am going through the prep steps [13:38:48] (silencing alarms, etc..) [13:41:42] k awesome :) [13:48:02] joal: can't stop to my knowledge; they're automatically spawned by reportupdater and crontab :\ no big deal if they fail, though :) [13:48:16] bearloga: ok :) [13:54:52] all right puppet disabled and all hosts silenced (drud100[123], an1027->an1057, stat100[234]) [13:57:50] 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Kibana's Mailing List data sources do not include recent activity on wikitech-l mailing list - https://phabricator.wikimedia.org/T146632#3061018 (10Aklapper) 05Open>03Resolved Thanks Lcanasdiaz! Closing as resolved as I can confirm t... [13:58:30] great [13:58:55] elukey: pcc looks good to me [13:59:05] (cept drud1001 :p) :) [13:59:21] haha, elukey actually we probably want to RUN puppet everywhere [13:59:23] to pick up this change :p [13:59:26] i guess we can do it one by one [13:59:28] as we upgrade them [13:59:29] yaaa [13:59:53] ah yeah I forgot that :D [14:00:17] yeah pcc looks good to me too! [14:00:26] drud is not good :P [14:00:27] ottomata: still a webrequest job running on cluster [14:00:56] Arf, actually, gone as I speak :) [14:01:01] yeah i see that joal [14:01:01] we'll wait for it to finish [14:01:01] oh great! [14:01:06] joal: you are suspending the jobs then? [14:01:09] elukey: hm [14:01:16] ooooook [14:01:17] ottomata: Did that a while ago [14:01:27] great :) [14:01:32] eliu icinga is silenced? [14:01:35] elukey: ? [14:02:14] !log Suspend mediawiki-load jobs as well (forgot about those) [14:02:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:02:42] ottomata: I updated the etherpad, all silenced :) [14:02:50] I think I'd need to re-enable puppet for the moment [14:02:55] great :) [14:03:03] gimme 1 min [14:03:06] k [14:03:10] joal: i see a few bundles running [14:03:11] still [14:03:16] they ok? [14:03:55] and also quite a few coords (but they might belong to the bundles, not sure) [14:03:56] they should be blocked by webrequest, joal is a fan of minimal suspention [14:04:00] oh ok [14:04:07] even the elasticsearch ones? [14:04:20] 'transfer_to_es-eqiad-coord' ? [14:04:21] etc.? [14:04:28] don't know them, probably it is safer to suspend [14:04:47] hm, i think they are blocked by camus stop too [14:04:48] pretty sure those come from mw avro search stuff [14:04:49] ottomata: I think they depend on cirrussearch being loaded, but not sure [14:04:53] which is imported by camus [14:04:54] yeah [14:04:56] ottomata: give me aminute [14:05:20] /wmf/data/discovery/popularity_score/ ? [14:05:20] hm [14:05:30] puppet re-enabled! [14:05:35] ottomata, joal: feel free to disable this one if it causes issues [14:05:54] no real issue, just trying to prevent disabling everything:) [14:06:00] ok :) [14:06:07] ok i think those come from pageview_hourly [14:06:15] i think we are good joal, afaict [14:06:45] sounds good indeed ottomata [14:06:51] pv hourly -> discorvery popularity score -> transfer to es [14:07:05] no prod job running on the cluster, ready to be stopped [14:07:11] correct ottomata [14:07:20] elukey: ok, let's run puppet everywhere [14:07:20] and then disable, ja? [14:07:51] ottomata: change already merged? [14:08:00] ohp... [14:08:42] it is now elukey :). oh we should be careful about puppet vs. 
camus crons on an27 [14:09:11] yeah :( [14:09:20] all right running puppet! [14:09:25] ooook! [14:10:05] also we'd need apt-get update [14:10:16] (not sure if puppet does it automagically) [14:10:21] puppet should do it [14:10:29] pretty sure puppet runs update every run [14:10:31] Scheduling refresh of Exec[apt-get update] [14:10:35] :) [14:15:05] 1027 and all the worker nodes done, just a min to finish the rest [14:15:35] cool [14:15:56] nice, see Version: 2.6.0+cdh5.10.0+2102-1.cdh5.10.0.p0.72~trusty-cdh5.10.0 avail [14:15:58] great :) [14:17:42] ooook [14:18:00] bearloga: joal ok if i kill that job? [14:18:11] i know you said it was above, but i just want to double check [14:18:15] ottomata: I asked earlier, bearloga said yes [14:18:19] Ah ok :P) [14:18:23] ok [14:19:29] killed [14:19:55] ok elukey ready when you are, i guess i'll follow along? if you want me to do a part let me know? [14:21:01] mmmm an1003 also reports W: Failed to fetch http://ubuntu.wikimedia.org/ubuntu/dists/trusty-backports/Release.gpg Could not resolve 'ubuntu.wikimedia.org' [14:21:08] that is not a blocker but let's remember to check it [14:21:11] hm [14:21:23] that's weird [14:21:24] I think that the domain was killed [14:21:26] just an03? [14:21:49] maybe that's not related to the thirdparty-cloudera stuff [14:22:01] ah no all of them [14:22:08] nono I don't think it is [14:22:11] it was there before [14:22:26] maybe we were using backports from ubuntu.w.o and it has been killed? [14:22:39] i see that domain in sources.list on these hoses [14:22:41] hosts [14:22:43] hm [14:22:51] ok, dunno, let's move forward though, seems unrelated [14:23:53] elukey, ottomata: Can I take a 1 hour brake or do you need me around? [14:24:35] /etc/apt/sources.list.distUpgrade [14:24:35] joal: ya no problem, go ahead [14:24:52] joal: we always need you but you can go :D [14:24:53] k, will be back for restart (normnally:) [14:25:02] * joal blushes [14:25:35] anyhow, super weird but we can proceed [14:25:53] ja [14:27:03] one min for final checks and I'll start [14:28:58] ottomata: ready! [14:29:38] if you are ok I am going to proceed with the stop of the daemons [14:32:14] proceed! [14:32:17] elukey: ! [14:34:03] * elukey proceeds! [14:38:48] elukey@stat1002:~$ sudo lsof -n | grep "mnt/hdfs" [14:38:48] bash 16517 ebernhardson cwd DIR 0,24 4096 17590204 /mnt/hdfs/user/ebernhardson/discovery-analytics/current [14:39:01] I think I can just kill this right [14:39:01] ? [14:39:14] it is preventing me to umount /mnt/hdfs [14:39:44] ottomata: --^ [14:39:59] hm [14:40:00] oh [14:40:03] yea [14:40:04] kill that [14:40:30] gooood [14:48:10] (03PS1) 10Milimetric: Move dashboard configuration to Config:Dashiki: [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/340318 [14:49:12] ottomata: interesting thing :) [14:49:13] sudo -i salt 'analytics*' cmd.run "for service in $(ls /etc/init.d/hadoop-*); do echo $service; done" [14:49:25] (it is a modified version) [14:49:36] tries to eval the $() as local on neodymium [14:49:50] that is not what I expected :D [14:50:00] (03CR) 10Milimetric: [V: 032 C: 032] Move dashboard configuration to Config:Dashiki: [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/340318 (owner: 10Milimetric) [14:50:28] 10Analytics-Tech-community-metrics: "Email senders" widget empty though "Mailing Lists" widget states that there are email senders - https://phabricator.wikimedia.org/T159229#3061153 (10Aklapper) [14:50:28] hm, also, won't ls /etc/init.d/hadoop-* do full path [14:50:31] ? 
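The /mnt/hdfs unmount dance a bit further up (lsof showing a shell parked inside the fuse mount, blocking umount) is essentially "find whoever holds the mount, kill it, unmount". A small illustrative sketch; the fuser variant is an assumption about what is available on the host, not something used in the log:

    # List processes holding /mnt/hdfs open (e.g. a shell whose cwd is inside it).
    sudo lsof -n | grep /mnt/hdfs

    # Kill the offending pid(s) from the lsof output, then unmount the fuse mount.
    sudo kill <pid>            # <pid> is a placeholder
    sudo umount /mnt/hdfs

    # fuser can combine the lookup and kill if preferred:
    # sudo fuser -km /mnt/hdfs
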
[14:50:34] oh sorry [14:50:40] you aren't doing service $service [14:50:44] you are just lsing them? [14:51:00] yeah I wanted to know why it wasn't doing anything or trying to stop "stop" :D [14:51:08] so I modified the script [14:51:16] IIRC I never encountered this issue [14:51:21] oh ha [14:51:28] oh you have the cd in there now [14:51:29] right? [14:51:32] wait, am confused [14:51:38] 10Analytics-Tech-community-metrics: Fix incorrect mailing list activity of AKlapper (=Phabricator) in Technical Community Metrics user data - https://phabricator.wikimedia.org/T132907#3061165 (10Aklapper) 05Open>03Resolved a:03Aklapper This specific issue is not a problem anymore (maybe fixed by {T146632}... [14:51:38] are you running what is in the etherpad? [14:51:45] I tried but it fails [14:52:23] example [14:52:24] elukey@neodymium:~$ sudo -i salt 'analytics*' cmd.run 'for service in $(cd /etc/init.d; ls hadoop-*); do echo $HOSTNAME $service; done' [14:52:27] analytics1049.eqiad.wmnet: neodymium neodymium [14:52:31] horrible paste [14:52:40] but you get what happens [14:53:17] OHHHH [14:53:17] i see, hm, but its in single quotes [14:53:17] checking too [14:53:32] hm, it works for me [14:53:33] hm [14:53:47] really? [14:54:07] maybe it is tmux? [14:54:07] elukey: its the -i [14:54:07] on your sudo [14:54:07] don't know why [14:54:07] but if you remove that, it works [14:54:18] I tried even with -i [14:54:21] *without [14:54:23] weird [14:54:31] I'll redo yours without -i [14:54:33] 10Analytics-Tech-community-metrics: Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io? - https://phabricator.wikimedia.org/T157898#3061177 (10Aklapper) >>! In T157898#3024337, @Lcanasdiaz wrote: > @Aklapper I confirm this is broken right now. I'm appliying it manually today and talkin... [14:54:56] it works! [14:54:59] weeeeeird [14:55:08] 10Analytics-Tech-community-metrics, 07Regression: Git repo blacklist config not applied on wikimedia.biterg.io - https://phabricator.wikimedia.org/T146135#3061178 (10Aklapper) >>! In T146135#2815705, @Lcanasdiaz wrote: > I confirm that blacklist is not working. Working on it .. @Lcanasdiaz: Any news to share? [14:55:14] thanks, I'll fix my ignorance later on :) [14:55:21] no, remove -i [14:55:21] oh? [14:55:21] sudo salt 'analytics*' cmd.run 'for service in $(cd /etc/init.d; ls hadoop-*); do echo $HOSTNAME $service; done' [14:55:21] analytics1049.eqiad.wmnet: [14:55:21] hadoop-hdfs-datanode [14:55:21] hadoop-yarn-nodemanager [14:55:57] yes yes without the -i works for me too [14:56:00] no idea why [14:56:46] proceeding with the backups [14:57:15] k [15:00:21] ready to upgrade packages [15:00:28] proceeding with journal nodes [15:03:27] uh, what... who made this: http://datavis.wmflabs.org/where/ [15:03:45] weird elukey sorry accidentaly quit irc and couldnt' get back on [15:04:13] oliver ... [15:05:37] ottomata: just upgraded the journal nodes, all good! [15:06:15] yeehaw cool! [15:06:43] going for 1001 and 1002 [15:07:09] coo [15:08:04] oh elukey were we going to do fstab now too? [15:08:08] or wait til we upgrade to jessie? [15:08:19] oh no [15:08:20] you already did that?! [15:12:33] a-team: did you all want to meet in 20 minutes to discuss solving this webrequest-counting problem in hadoop? 
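For the record, the salt quoting puzzle above is consistent with `sudo -i` pushing the whole command line through the target user's login shell, which gives the single-quoted `$(...)` an extra round of expansion on neodymium before salt ever ships it to the minions; that reading is inferred from the observed behaviour, not stated in the log. The two variants side by side:

    # Expands $(...) locally on the salt master (the extra login shell from -i eats the quotes):
    sudo -i salt 'analytics*' cmd.run 'for service in $(cd /etc/init.d; ls hadoop-*); do echo $HOSTNAME $service; done'

    # Expands $(...) on each minion, as intended:
    sudo salt 'analytics*' cmd.run 'for service in $(cd /etc/init.d; ls hadoop-*); do echo $HOSTNAME $service; done'
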
[15:12:58] (I had put it on the team calendar but it was last-minute so no pressure) [15:13:05] ottomata: I was planning to do it after reimaging the nodes to debian [15:13:16] so avoid adding too many things [15:13:39] ok col [15:13:40] I only ran the script on an1039 to test and reboot the new fstab [15:13:42] yeah makes sense elukey [15:13:42] cool [15:13:45] super [15:13:52] milimetric: we doing upgrade today/now so prob not us [15:13:54] in the meantime, I am upgrading the workers [15:13:57] elukey: great [15:14:05] k, postponing [15:17:17] 10Analytics, 10Analytics-Dashiki: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3061266 (10Milimetric) [15:34:36] elukey: how we doin? [15:34:50] I am testing HDFS! [15:35:02] I had to remove the tmp/etc.. dirs since it was complaining [15:35:05] but all seems good [15:35:42] yeppp all good, just finished the checks [15:35:54] ottomata: proceeding with stat* [15:36:33] back a-team [15:36:45] how is it going hadoop-prod-guys? [15:36:54] oh ya [15:36:57] ha from a previous test :) [15:36:59] great [15:37:03] milimetric: We can meet if you want, but might be better with ottomata and elukey [15:37:17] yeah, let's wait for everyone [15:37:43] milimetric: just to be sure: it's the rest thing, or something else? [15:37:56] joal: going well. elukey is driving 100%, i hear good things :) [15:38:13] yeah, it seems counter-productive to start setting up infrastructure that is hardening the tech debt we all want to get rid of anyway [15:38:25] awesome ottomata :) Thanks a lot elukey for driving ! [15:38:36] makes sense milimetric [15:38:50] milimetric: I had not seen the map in tools, it's cool ! [15:39:03] it's super old [15:39:18] I think [15:39:28] 2014 from what I read [15:39:43] I'd love to get a handle on all this and centralize / simplify [15:40:09] so I am proceeding with stat, druid and an1003/1027 nodes [15:40:10] milimetric: the curse and the blessing of openness: plenty, but messy :) [15:40:44] milimetric: are you talking about the services api metrics thing? [15:40:55] yes [15:40:59] ah [15:41:27] joal: restarting druid (one node at the time) [15:41:33] k elukey [15:41:47] I thought it was a hadoop upgrade ! [15:42:00] joal: druid uses hdfs! [15:42:09] ottomata: I know ;) [15:42:11] haha [15:45:14] (03CR) 10Joal: "One last thing, thanks again fdans :)" (031 comment) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/339419 (https://phabricator.wikimedia.org/T156312) (owner: 10Fdans) [15:47:25] joal: so, spark 2.x Qs, in general vs. flink etc. [15:47:41] so, i played with newer flink the other day, and it is v cool, but i'm wondering how much adoption we could get on it [15:47:42] for streaming [15:47:46] since it doesn't support python for streaming [15:48:01] and, maybe spark 2.x streaming would be good enough for our use cases? [15:52:57] (03PS4) 10Fdans: Format timestamps correctly in per-project aggregation [analytics/aqs] - 10https://gerrit.wikimedia.org/r/339419 (https://phabricator.wikimedia.org/T156312) [15:55:28] ottomata: I noticed that on worker nodes spark is still 5.5 [15:55:40] so I guess I'd need to add spark and spark-core in the apt-get install right? 
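A quick way to spot the kind of "spark is still 5.5" leftovers mentioned just above is to compare installed and candidate versions per package before and after the upgrade. A minimal sketch; the package names simply mirror the ones discussed in the log:

    # Installed versions of the CDH packages on this worker:
    dpkg -l | grep -E 'spark|hadoop|hive'

    # Installed vs. candidate version from the apt repo, per package:
    apt-cache policy spark-core spark-python hadoop-hdfs-datanode
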
[15:56:04] huh yes [15:56:44] elukey: good catch [15:56:45] spark-core spark-python [15:56:46] spark-core and spark-python [15:56:46] i guess [15:56:48] yeah [15:56:50] all right :) [15:56:56] adding it to command in etherpad [15:56:58] so we don't forget next time [15:58:00] ah also bigtop-tomcat, flume-ng, hbase, kite, solr and sqoop [15:58:05] oh ok [15:58:06] ottomata: --^ [15:58:08] adding [15:58:25] flume-ng? [15:58:29] hmmm actually [15:58:46] i think you can uninstall flume-ng [15:59:08] not sure what that is :D [15:59:12] but I am reporting :) [15:59:29] aye, uninstall flume-ng, the others can be upgraded [15:59:37] we don't use flume, andi really doubt anything depends on it [15:59:45] its a data injestion pipeline thing [15:59:47] for an100[12] bigtop-tomcat, bigtop-utils, hive, hive-jdbc [16:00:05] hm ok [16:00:17] a-team: any issue if I skip standup? I'd prefer to keep going with the upgrade [16:00:28] good for me elukey [16:00:54] elukey: np [16:01:01] ottomata: you coming to standup [16:04:01] ottomata: I think that flume-ng is a spark dep [16:05:15] really? [16:05:31] bwh! [16:05:31] it is [16:05:32] ok [16:05:33] welp [16:05:34] crazy [16:05:39] i guess upgrade it then [16:06:47] yeah :( [16:12:28] ottomata: I think that flume-ng is in the blacklist of the cdh repo [16:12:32] it doesn't get upgraded [16:13:07] bwah yeah [16:13:09] ok fixing [16:21:31] elukey: did spark-python get upgraded? [16:21:37] yeah [16:21:37] it isn't whitelisted explicitly [16:21:39] ok [16:21:40] hm [16:21:55] elukey@analytics1039:~$ dpkg --list | grep spark [16:21:55] ii spark-core 1.6.0+cdh5.10.0+457-1.cdh5.10.0.p0.71~trusty-cdh5.10.0 all Lightning-Fast Cluster Computing [16:21:58] ii spark-python [16:22:12] missed one version but it is the same [16:22:15] ok merged elukey try apt-get update and install [16:22:18] super [16:22:50] oh wait [16:22:56] i gott run puppet on apt how [16:22:56] host [16:25:05] ok we good elukey https://apt.wikimedia.org/wikimedia/pool/thirdparty/cloudera/f/flume-ng/ [16:25:18] ottomata: piece of cake now [16:25:57] thanks :) [16:26:20] I am going to restart the Yarn Node Manager since I can see the old spark dependencies in lsof [16:26:28] *Managers [16:29:01] also the datanode [16:29:03] sss [16:29:04] sigh [16:53:24] joal: you there? [16:53:41] I am [16:54:15] what's up elukey ? [16:55:19] joal: would you mind to do a rapid spark test? [17:12:09] cluster upgraded people, all back and running! [17:22:08] ottomata: do you have a minute to help me debug some python on stat1002? [17:22:59] joal: sure [17:23:04] what's up? [17:23:31] ottomata: I'm trying to run the sqoop script (in refinery/bin/sqoop-...) with hdfs user -- no luck :( [17:23:43] works fine with my user, but not from hdfs user [17:23:50] seems a problem path related [17:24:19] export PYTHONPATH=/srv/deployment/analytics/refinery/python [17:24:20] ? [17:24:29] hm, will try [17:24:39] not sure if it will get carried over to the sudo shell though [17:25:13] Yay ! Worked :) [17:25:18] Thanks ottomata :) [17:25:54] great! [17:25:56] ottomata: I used the /bin/bash trick (did not try without sudo -u) [17:28:46] joal can you see this query result map graph? [17:28:49] just wonderin gif linking works [17:28:50] https://hue.wikimedia.org/notebook/editor?editor=15 [17:29:19] ottomata: nope: Document2 matching query does not exist. 
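The "/bin/bash trick" mentioned above for the sqoop script presumably amounts to setting PYTHONPATH inside the shell that sudo spawns, so the refinery python package is importable for the hdfs user. A sketch of that shape; the script name is left as a placeholder because it is truncated in the log:

    # Run a refinery script as hdfs with the refinery python package on the path.
    # refinery/bin/sqoop-<script> is a placeholder for the actual script name.
    sudo -u hdfs /bin/bash -c '
      export PYTHONPATH=/srv/deployment/analytics/refinery/python
      /srv/deployment/analytics/refinery/bin/sqoop-<script> --help
    '

Exporting PYTHONPATH before the plain `sudo -u hdfs ...` call would not help on its own, since sudo resets the environment by default; wrapping it in the child shell sidesteps that.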
[17:29:23] hm [17:29:37] ok must be per user [17:30:01] removed downtimes from icinga [17:30:02] yes, history seems to be per user [17:33:42] hmmm joal [17:33:43] try now [17:33:43] https://hue.wikimedia.org/notebook/editor?editor=15 [17:50:08] hey people, are all the oozie emails expected? [17:52:18] mmm it seems that two coordinators in misc/maps failed [17:52:22] restarting them [17:52:25] joal: ---^ [17:53:28] !log restarted via Hue Feb 2017 14:00:00 webrequest-load-coord-misc/maps [17:53:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:54:28] all right team I am logging off, will check back later if things are ok!! [17:54:31] byyyyeeee o/ [17:57:21] elukey: Back from lighting the fire, will look after oozie [18:03:52] !log restart pageview oozie job for 2017-02-28T12:00 [18:03:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:14:59] ottomata: looks like we have an issue :( [18:19:02] what's up? [18:19:05] joal: [18:19:16] oh looking [18:19:19] ottomata: looking at maps/misc failed jobs [18:22:59] hm am looking at 0000107-170228165458841-oozie-oozi-W [18:23:03] webrequest-load-wf-maps-2017-2-28-14 [18:23:14] i don't see any insightful errors [18:23:18] just that it was killed? [18:23:21] ottomata: Job failed, did almost the same [18:24:01] weirdo [18:24:15] even more weirdoh is the fact that we can't open the job in hue [18:24:56] but others ran? even the one before that [18:25:01] ran succesffully [18:25:09] ottomata: the ones AFTER that actually ran ! [18:25:10] and i'm pretty sure that is after we restarted oozie, right? [18:25:16] Started : 2017-02-28 17:20 GMT [18:25:19] that's an hour ago [18:25:20] so ya [18:25:30] that one is 0000023-170228165458841-oozie-oozi-W [18:25:34] webrequest-load-wf-maps-2017-2-28-13 [18:25:35] which ran [18:25:41] (unless you manually re-ran that?) [18:25:56] joal: ? [18:26:08] Elukey re-ran it, I did as well [18:26:40] oh [18:26:42] ? [18:27:09] So weird ! Misc finally did it correctly ... [18:27:24] hm, let's retry it for maps - I think I have an idea [18:27:33] ok [18:27:33] doing it ottomata [18:28:38] ottomata: I think there are good reasons for us to restart camus a while before oozie jobs [18:31:26] joal you think they got launched prematurely? [18:37:41] ottomata: I think the camus-checker can be messed-up if multiple hours are written qt the same time (but maybe not fully) [18:38:32] ottomata: It could flag an hour as done (because next has started), while not being fully imported [18:43:55] hm [18:44:15] i guess if a single partition finishes way before another? [18:44:23] but the offset/timestamps are checked for each partitoin, right? [18:44:43] * elukey sees that everything seems fine :) [18:45:34] yes elukey, everything back to normal [18:45:47] ok phew [18:45:51] thanks yall [18:47:14] super :) [18:47:18] * elukey afk again! [18:49:00] wikimedia/mediawiki-extensions-EventLogging#637 (wmf/1.29.0-wmf.14 - 838abb7 : Translation updater bot): The build has errored. [18:49:00] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.29.0-wmf.14 [18:49:00] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/206268405 [19:10:13] logging off for tonight a-team [19:10:29] Thanks again prod guys for the smooth version bump ! [19:14:26] laters joal! 
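The re-runs done through Hue above can also be done from the Oozie CLI, which is handy when Hue refuses to open a job. A minimal sketch; the coordinator id and action number are placeholders, not the real ones from the log:

    # Re-run a specific coordinator action, re-checking its input dependencies first.
    oozie job -oozie $OOZIE_URL -rerun <coordinator-id> -action 14 -refresh
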
[19:55:30] 10Analytics: Prototype counting of requests with real time (streaming data) - https://phabricator.wikimedia.org/T159264#3062177 (10Nuria) [19:56:15] 10Analytics: Prototype counting of requests with real time (streaming data) - https://phabricator.wikimedia.org/T159264#3062192 (10Nuria) [19:56:18] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062193 (10Nuria) [20:01:28] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062220 (10Nuria) >If you can budget some time to help us get access to the data as a stream or in Hadoop, then I think we should be able to work something out tha... [20:03:22] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062237 (10Pchelolo) @Nuria Could you explain a bit on what's the difference between refined data and raw data in this context? All we need here is URIs that we ca... [20:08:10] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062261 (10Ottomata) Yeah, in this case raw vs refined doesn't make a difference, but as part of a stream refinement, we had talked about splitting the firehose we... [20:18:31] 10Analytics, 10Analytics-Dashiki: Clean up remaining Dashiki configs on meta - https://phabricator.wikimedia.org/T159269#3062327 (10Milimetric) [20:22:21] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062369 (10Milimetric) +1 to what Andrew said. We don't want to block you on doing that. We will start building a simple infrastructure to do things like what yo... [20:25:08] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062381 (10Nuria) @Pchelolo: For at least two reasons I can think of: urls and hosts are "normalized" as part of refine process and It is likely that you want mo... [20:26:16] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3062390 (10Nuria) @Pchelolo: this is the bulk of code that runs as part of refinement process: https://github.com/wikimedia/analytics-refinery-source/tree/master/r... [20:38:47] 10Analytics: Prototype counting of requests with real time (streaming data) - https://phabricator.wikimedia.org/T159264#3062447 (10Milimetric) [20:45:45] 10Analytics: Prototype counting of requests with real time (streaming data) - https://phabricator.wikimedia.org/T159264#3062177 (10Ottomata) >- produce to a new topic >- count and send stats to grafana by [some granularity? hour/day...] These two can probably be made generic, but it would be important to be ab... [20:47:33] 10Analytics: Create robots.txt policy for datasets - https://phabricator.wikimedia.org/T159189#3062475 (10Milimetric) @Peachey88 not particularly, this is low priority, but it just seems like a bad idea to waste it for no reason, especially on larger files like datasets. I mean it downloads the whole thing just... [21:26:16] ottomata: is cluster fit to run jobs in again? 
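For the raw-vs-refined discussion in T122245 above, the batch equivalent of "count REST API entry points" is a straightforward query against the refined webrequest table. A rough sketch, assuming the usual wmf.webrequest layout with uri_host/uri_path and time partitions (column and partition names are from memory, not from the log):

    # Hourly counts of REST API (/api/rest_v1/) requests for one example hour.
    hive -e "
      SELECT uri_host, COUNT(*) AS requests
      FROM wmf.webrequest
      WHERE webrequest_source = 'text'
        AND year = 2017 AND month = 2 AND day = 28 AND hour = 14
        AND uri_path LIKE '/api/rest_v1/%'
      GROUP BY uri_host
      ORDER BY requests DESC
      LIMIT 50;
    "

The streaming prototype described in T159264 would compute essentially the same aggregation continuously on the Kafka topic instead of hourly partitions.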
[21:43:52] (03PS1) 10Milimetric: Clean up Config [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/340375 [22:27:12] nuria, yup! [22:27:13] all is well [23:15:44] not sure if its related to the cluster upgrade, but i 4 different runs of a job that does an hourly hive script have failed today. looking into what it is, early review of logs suggests being killed for exceeding memory [23:20:12] hmm, yea thats the problem, but unknown how related it is. Will just bump the memory to 1.5G or some such: Diagnostics report from attempt_1488294419903_1121_m_000000_2: Container [pid=75888,containerID=container_e42_1488294419903_1121_01_000005] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 5.0 GB of 2.1 GB virtual memory used. Killing container. [23:52:42] ottomata: thanks for the upgrade :) paws internal is doing good! is there info on what's changed and what new shiny things we get? [23:53:08] no hurry just curious
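The "bump the memory to 1.5G" fix above usually means raising the map container size (and the JVM heap with it) for that one Hive script rather than cluster-wide; the failed attempt ids in the log are map tasks, hence the map-side properties. A sketch of the knobs involved, with values mirroring the 1.5G mentioned; the property names are standard YARN/MR2 ones, not taken from the job itself:

    hive -e "
      SET mapreduce.map.memory.mb=1536;        -- container size ~1.5G
      SET mapreduce.map.java.opts=-Xmx1228m;   -- heap kept below the container limit
      -- ... the hourly query goes here ...
    "

Keeping the -Xmx value comfortably below the container size leaves headroom for off-heap usage, which is what the "1.1 GB of 1 GB physical memory used" kill message is really complaining about.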