[05:49:23] <fdans>	 nuria milimetric: thank you for the review and the changes!
[05:58:39] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 05WMF-NDA: Drop tables with no events in last 90 days. - https://phabricator.wikimedia.org/T161855#3145706 (10Marostegui) That is fine, thanks for the heads up. Normally I like to run those statements with:  ``` set session sql_log_bin=0 ; DROP TABLE IF...
[07:14:47] <elukey>	 so very good news, the test on analytics1039 seems a success
[07:15:12] <elukey>	 /var/log/mcelog (cpu temperature alarms) was logging every 10 minutes throttling alerts
[07:15:26] <elukey>	 and the last one is exactly before Chris applied the new thermal paste
[07:15:33] <elukey>	 \o/
[07:29:39] <moritzm>	 the same is also needed for quite a range of app servers... https://phabricator.wikimedia.org/T149287
[07:33:45] <wikibugs>	 10Analytics, 10Analytics-Cluster, 06Operations, 10ops-eqiad, 15User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3145801 (10elukey) I checked today mcelog and the thermal errors stopped right after Chris applied the thermal paste!   New list of affect...
[07:35:59] <elukey>	 commented :)
[07:36:36] <elukey>	 I think that we have those alarms everywhere, not sure if there is a bigger problem (high temp in the DC?) or not
[07:50:18] <elukey>	 moritzm: ah I didn't know about cat /sys/class/thermal/thermal_zone*/temp !
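(Editor's aside: on Linux the thermal_zone*/temp files mentioned above hold readings in millidegrees Celsius. A minimal sketch for printing them in whole degrees follows; the function name read_thermal_zones and its optional base-directory argument are made up for illustration, not something used in the channel.)

```shell
# Hypothetical helper: print "<zone> <degrees C>" for each thermal zone.
# $1 (optional): base directory, defaults to the usual sysfs location.
read_thermal_zones() {
  base="${1:-/sys/class/thermal}"
  for f in "$base"/thermal_zone*/temp; do
    # Skip the unexpanded glob or unreadable entries.
    [ -r "$f" ] || continue
    read -r milli < "$f"
    zone=$(basename "$(dirname "$f")")
    # Integer division: millidegrees Celsius -> whole degrees Celsius.
    printf '%s %s\n' "$zone" "$((milli / 1000))"
  done
}
```

Calling read_thermal_zones with no argument reads the live sysfs tree; passing a directory makes it easy to try against copied files.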
[07:51:09] <wikibugs_>	 10Analytics, 10Analytics-Cluster, 06Operations, 10ops-eqiad, 15User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3145858 (10elukey)
[07:51:26] <wikibugs_>	 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 05WMF-NDA: Drop tables with no events in last 90 days. - https://phabricator.wikimedia.org/T161855#3145859 (10jcrespo) +1
[07:54:37] <moritzm>	 elukey: no, the ones on the app servers seem specific to some servers and crusty thermal paste
[07:54:53] <moritzm>	 (or none at all, no idea about how they're configured)
[07:55:51] <elukey>	 moritzm: in the age of cloud we still deal with crusty thermal paste :P
[08:00:33] <moritzm>	 in the age of cloud it's just someone else dealing with the thermal paste :-)
[08:07:22] <elukey>	 hahahah exactly
[08:07:44] <elukey>	 I added the analytics task under the main one for thermal alarms
[08:08:10] <elukey>	 maybe we could schedule with Chris how to apply the new thermal paste everywhere
[08:08:19] <elukey>	 but it is a big work :)
[08:18:18] <joal>	 Hi elukey
[08:20:31] <elukey>	 o/
[08:22:39] <joal>	 Thanks for following up on thermal paste :)
[08:25:17] <joal>	 elukey: About spark jobs failing on workers restart: https://issues.apache.org/jira/browse/SPARK-17485
[08:27:28] <elukey>	 will check :)
[08:27:54] <joal>	 elukey: nothing to do really, just that 1 container failure means full application failure
[08:27:59] * joal is sad
[08:30:27] <elukey>	 yeah this is not really good
[08:31:27] <joal>	 elukey: this however explains why my jobs are failing
[09:18:57] <elukey>	 (brb)
[10:46:39] <joal>	 taking a break, a-team
[10:50:15] <elukey>	 I am going to step away from keyboard for early lunch + errand 
[12:24:18] <wikibugs_>	 (03CR) 10Mforns: [V: 032 C: 032] "Merging, as all comments in reviews have been addressed, and code has already been used to populate the production pipeline." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/339421 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[12:30:00] <wikibugs_>	 (03CR) 10Mforns: [V: 032 C: 032] "Merging, as code has +1, and has already been used to populate the production pipeline." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/344914 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[12:56:40] <elukey>	 joal: reimaging analytics1047 and 1048 ok?
[12:56:54] <elukey>	 let me check first if I see your username in there :P
[12:57:29] <elukey>	 yesss
[12:57:41] <joal>	 elukey: please don't touch machines having spark executors, please, job is 80% done :(
[12:57:53] <elukey>	 hahahah
[12:57:57] <elukey>	 yep yep I am checking
[12:57:57] <joal>	 ;)
[12:58:02] <joal>	 Thanks mate
[12:58:33] <elukey>	 I will probably skip reimages today, you are everywhere :D
[12:58:46] <joal>	 elukey: I have 64 executors, so I'm pretty sure I'm everywhere :)
[12:58:51] <joal>	 thanks for that elukey 
[12:59:08] <joal>	 elukey: I really want to find a way to overcome the problem, but I don't know how ...
[12:59:13] <joal>	 Actually I might know how
[12:59:18] <joal>	 hm, need to think
[13:00:42] <elukey>	 analytics1034.eqiad.wmnet seems free :P
[13:01:27] <joal>	 not from my view
[13:02:05] <joal>	 elukey: https://yarn.wikimedia.org/proxy/application_1488294419903_102181/jobs/
[13:02:16] <joal>	 elukey: even better: https://yarn.wikimedia.org/proxy/application_1488294419903_102181/executors/
[13:04:44] <elukey>	 ahh okok!
[13:04:52] <elukey>	 I tried ps aux | grep joal
[13:04:54] <elukey>	 :P
[13:05:02] <elukey>	 anyhow, will restart on Monday
[13:19:57] <mforns>	 elukey, joal, hi! can any of you guys give me a hand with git?? :]
[13:20:29] <elukey>	 I usually follow this mantra: http://ohshitgit.com/
[13:20:34] <elukey>	 :D
[13:20:43] <elukey>	 jokes aside, I am not an expert but shoot!
[13:21:01] <mforns>	 elukey, xD
[13:21:03] <mforns>	 ok
[13:21:17] <mforns>	 soooo, I have a change that I want to merge
[13:21:29] <mforns>	 https://gerrit.wikimedia.org/r/#/c/344914/
[13:21:58] <mforns>	 but when I try to, it says: depends on change that was not submitted
[13:22:58] <mforns>	 now, the thing is, all dependencies have been merged indeed!
[13:23:49] <mforns>	 In my machine I try to: git checkout master; git pull; git checkout <patch>; git rebase master
[13:24:29] <mforns>	 and it does a weird thing, it duplicates a commit that was already merged... dunno very weird
[13:25:41] <joal>	 mforns: I think I have experienced that before, not cool :(
[13:26:38] <joal>	 mforns: I suggest creating a new branch from master, then cherry pick the commit you want to submit
[13:26:50] <joal>	 But I'm no expert either
[13:27:14] <mforns>	 aha
[13:27:19] <mforns>	 will try, thanks!
[13:27:49] <elukey>	 mforns: wait a min
[13:28:00] <mforns>	 elukey, you mean creating another patch and abandon the other one, right?
[13:28:03] <elukey>	 I tried to rebase https://gerrit.wikimedia.org/r/#/c/344914/ and it tells me about a merge conflict
[13:28:06] <elukey>	 so what I'd do is
[13:28:16] <elukey>	 1) git fetch ssh://elukey@gerrit.wikimedia.org:29418/analytics/refinery refs/changes/14/344914/1 && git cherry-pick FETCH_HEAD
[13:28:36] <elukey>	 (not with my user of course.. I found it in the top right corner, download -> cherry pick)
[13:28:42] <mforns>	 yep
[13:28:46] <elukey>	 2) Fix it
[13:28:54] <elukey>	 3) git review again
[13:29:01] <elukey>	 before 3)
[13:29:07] <elukey>	 3-a) git pull --rebase
[13:29:10] <elukey>	 to be sure
[13:29:13] <mforns>	 K
[13:29:13] <elukey>	 and you should be ok
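(Editor's aside: the two suggestions above, a fresh branch from master plus a cherry-pick of the one commit, can be sketched as a small shell function. The function name and its arguments are hypothetical; in the real workflow the commit would come from the Gerrit fetch shown in step 1, and a git pull --rebase on the base branch followed by git review would wrap around it.)

```shell
# Hypothetical helper: recreate a single change on a fresh branch.
# $1: base branch (e.g. master), $2: commit to carry over, $3: new branch name.
rebuild_change_on_fresh_branch() {
  base_branch="$1"
  commit="$2"
  new_branch="$3"
  # Start from the base branch (assumed to be up to date already).
  git checkout -q "$base_branch" || return 1
  # Cut a clean branch so no already-merged commits come along.
  git checkout -q -b "$new_branch" || return 1
  # Re-apply only the wanted commit on top of the base.
  git cherry-pick "$commit" || {
    echo "cherry-pick stopped: fix conflicts, then run 'git cherry-pick --continue'" >&2
    return 1
  }
}
```

Because the new branch contains only the base plus the one picked commit, a dependency that was already merged cannot show up a second time.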
[13:29:23] <mforns>	 :]
[13:30:22] <mforns>	 the thing is, there's nothing to fix!
[13:30:55] * elukey tries
[13:31:15] <wikibugs>	 (03PS2) 10Mforns: Fix domain_abbrev_map job to disambiguate wikimedia projects [analytics/refinery] - 10https://gerrit.wikimedia.org/r/344914 (https://phabricator.wikimedia.org/T156388)
[13:31:45] <mforns>	 elukey, nvm! I think it will work
[13:32:06] <wikibugs_>	 (03PS3) 10Elukey: Fix domain_abbrev_map job to disambiguate wikimedia projects [analytics/refinery] - 10https://gerrit.wikimedia.org/r/344914 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[13:32:17] <elukey>	 mforns: sorry I tried git review with --^
[13:32:19] <elukey>	 :(
[13:32:30] <mforns>	 xD
[13:32:31] <elukey>	 you can discard PS3 then
[13:32:36] <elukey>	 or use it
[13:33:33] <elukey>	 mforns: do you want to merge?
[13:33:47] <mforns>	 elukey, looking, I think it looks good, one sec
[13:34:20] <mforns>	 elukey, yea, there's no difference between ps2 and ps3
[13:34:45] <mforns>	 I'll merge
[13:35:29] <wikibugs>	 (03CR) 10Mforns: [V: 032 C: 032] "Merging, as code has +1, and has already been used to populate the production pipeline." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/344914 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[13:35:42] <mforns>	 elukey, \o/ thanks
[13:36:02] <elukey>	 nice!
[13:38:45] <wikibugs>	 (03PS5) 10Mforns: Use both projectcounts raw and all sites to load cassandra [analytics/refinery] - 10https://gerrit.wikimedia.org/r/345144 (https://phabricator.wikimedia.org/T161494)
[13:40:20] <wikibugs_>	 (03CR) 10Mforns: [V: 032 C: 032] "Merging, as code has already been used to populate the production pipeline, as discussed in stand-up." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/345144 (https://phabricator.wikimedia.org/T161494) (owner: 10Mforns)
[13:52:10] <wikibugs_>	 10Analytics, 10Analytics-Dashiki, 13Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3146585 (10Milimetric) @Nuria: let's not make @matthiasmullie repeat himself too much, I had already started working with him and this use case seems fine to me.  You and I can...
[14:02:45] <wikibugs>	 10Analytics-Tech-community-metrics: Git code repository is listed but not all recent activity in it is shown on wikimedia.biterg.io - https://phabricator.wikimedia.org/T161211#3146614 (10Aklapper)
[14:02:55] <ottomata>	 elukey:  i'm going to do analytics1047, ok?
[14:03:39] <elukey>	 ottomata: hello! I didn't do anything today since joal is running a huge Spark job and one container failure == all failed
[14:04:05] <elukey>	 it seems due to a bug in spark if I got it correctly from Joseph
[14:04:12] <ottomata>	 oo
[14:04:17] <elukey>	 yeah :(
[14:04:18] <ottomata>	 ok so better to wait then
[14:04:19] <ottomata>	 ok
[14:05:48] <joal>	 ottomata, elukey: Job failed for another reason - Killing it
[14:05:56] <joal>	 ottomata, elukey: Please go ahead !
[14:06:03] <wikibugs_>	 10Analytics-EventLogging, 06Analytics-Kanban: Research Spike: Better support for Eventlogging data  on hive - https://phabricator.wikimedia.org/T153328#3146622 (10Ottomata)
[14:06:07] <ottomata>	 oh ok
[14:07:23] <fdans>	 joal: do you have a few minutes pre standup to talk about removal of is_productive?
[14:07:34] <joal>	 sure fdans !
[14:07:41] <ottomata>	 ok elukey i'm proceeding with 1047
[14:07:52] <fdans>	 joal: now a la grotte?
[14:08:01] <elukey>	 ottomata: let's double up! 104[78]
[14:08:04] <joal>	 Oui fdans, a la grotte !
[14:08:11] <ottomata>	 ok!
[14:08:15] <ottomata>	 you doing 1048?
[14:08:21] <ottomata>	 i can do both if you prefer?
[14:08:22] <elukey>	 the wmf-reimage script can take multiple hosts! 
[14:08:33] <elukey>	 then in the end I'll fix 1048
[14:08:35] <ottomata>	 ok
[14:08:38] <ottomata>	 oh
[14:08:38] <elukey>	 super
[14:08:39] <ottomata>	 cool
[14:08:43] <ottomata>	 so i should give it both then?
[14:08:49] <elukey>	 yep!
[14:10:05] <ottomata>	 ok stopping  nodemanager after icinga downtime scheduled
[14:10:38] <ottomata>	 waiting for jobs to clear
[14:10:47] <elukey>	 ottomata: in the meantime I am merging the an1027 nuke change
[14:10:51] <ottomata>	 cool
[14:19:56] <elukey>	 an1027 is officially role spare
[14:20:22] <icinga-wm>	 PROBLEM - Hue Server on analytics1027 is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[14:20:44] <ottomata>	 :)
[14:21:03] <elukey>	 aaah icinga
[14:22:42] <wikibugs_>	 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3146635 (10chasemp) 05Open>03Resolved a:03chasemp
[14:23:39] <wikibugs>	 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#2656945 (10chasemp)
[14:26:14] <ottomata>	 ok jobs drained, proceeding
[14:26:51] <wikibugs_>	 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3146647 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1047.eqiad.wmnet', 'analytics1048.eq...
[14:27:34] <wikibugs>	 06Analytics-Kanban: Update DataLake History schema to only contain "objective" measures - https://phabricator.wikimedia.org/T157362#3146648 (10JAllemandou) Notes from talk with @fdans:  - Big picture:     - Changes apply to mediawiki denormalized history dataset.     - It impacts Spark job and Hive table creatio...
[14:28:42] <wikibugs_>	 06Analytics-Kanban: Update DataLake History schema (`revision_is productive` --> `revision_time_to_identity_revert`) - https://phabricator.wikimedia.org/T157362#3146652 (10JAllemandou)
[14:30:20] <wikibugs_>	 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3146654 (10Marostegui) Hi! How's the process to decommission db1047 going?
[14:32:17] <wikibugs>	 06Analytics-Kanban, 06DC-Ops, 06Operations, 10ops-eqiad, 13Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3146682 (10elukey)
[14:33:01] <wikibugs>	 06Analytics-Kanban, 06DC-Ops, 06Operations, 10ops-eqiad, 13Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3136552 (10elukey) a:05elukey>03Cmjohnson
[14:33:55] <wikibugs>	 06Analytics-Kanban, 06DC-Ops, 06Operations, 10ops-eqiad, 13Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3136552 (10elukey) Chris the host is atm set as role::spare, but whenever you are ready I can cleanup the rest of the puppet entries (DHCP, etc..). Let me k...
[14:52:04] <wikibugs>	 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3146723 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1047.eqiad.wmnet', 'analytics1048.eqiad.wmnet'] ```  and were **ALL** successful.
[14:53:03] <elukey>	 ottomata: ---^ 
[14:53:09] <elukey>	 taking 1048 ok?
[14:54:40] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban: Research Spike: Better support for Eventlogging data  on hive - https://phabricator.wikimedia.org/T153328#3146728 (10Ottomata) > Filters mediawiki.revision-create  On second thought, this seems a little crazy to me.  Perhaps we can just always import with `revision...
[14:54:56] <ottomata>	 elukey: ya proceed
[14:54:58] <ottomata>	 do the & thing!
[14:54:59] <ottomata>	 :)
[14:55:03] <ottomata>	 i'll do 1047
[14:55:45] <elukey>	 sure! good suggestion, didn't think about it 
[14:55:46] <elukey>	 :)
[14:55:55] <elukey>	 does the rest look good? 
[14:55:58] <elukey>	 (the doc)
[14:57:30] <ottomata>	 ya!, i mean, i know to create the mount dirs, but the doc doesn't say that
[14:57:31] <ottomata>	 maybe it should
[14:57:33] <ottomata>	 before mount -a
[14:58:05] <elukey>	 it is done by the puppet run
[14:58:15] <elukey>	 that then fails before starting hdfs datanode
[14:58:24] <elukey>	 maybe I should have mentioned it
[14:58:25] <elukey>	 :(
[14:58:28] <ottomata>	 hm
[14:58:38] <ottomata>	 the letter mounts are created?  hmm i guess so
[14:58:43] <elukey>	 yep yep!
[14:58:45] <elukey>	 I just did it
[14:58:45] <ottomata>	  /var/lib/hadoop/data doesn't exist for me
[14:58:46] <elukey>	 :)
[14:58:49] <ottomata>	 but i didn't do a manual puppet run
[14:59:04] <elukey>	 ah yes that one is the only one that needs manual creation (we discussed it yesterday IIRC)
[15:00:08] <nuria>	 a-team: standddupppp
[15:00:38] <elukey>	 ottomata: what is the command that you use with '&' ?
[15:00:42] <elukey>	 I got syntax error :/
[15:01:00] <ottomata>	 replace ; with &
[15:01:01] <ottomata>	 and space
[15:01:17] <ottomata>	 just edited
[15:01:27] <elukey>	 ahhhh
[15:01:31] <elukey>	 without the ;
[15:01:36] * elukey ignorant as always
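(Editor's aside: the exchange above turns on two shell control operators. ';' runs commands one after another, while '&' sends the preceding command to the background so the next one starts right away; writing ';' immediately followed by '&' is a syntax error. A minimal sketch, assuming a made-up per-host step called prepare_host, since the actual command from the doc is not shown in this log.)

```shell
# Hypothetical stand-in for a slow per-host step; not the real command.
# $1: host name, $2: directory where a marker file is written.
prepare_host() {
  sleep 1
  echo "done" > "$2/$1.done"
}

# Run the step for several hosts at once.
# $1: marker directory, remaining arguments: host names.
run_in_parallel() {
  outdir="$1"
  shift
  for host in "$@"; do
    # '&' backgrounds this job, so the loop moves on immediately.
    prepare_host "$host" "$outdir" &
  done
  # Block until every background job has exited.
  wait
}
```

With ';' in place of '&' the hosts would be handled strictly one after another; the trailing wait is what keeps the caller from continuing before all background jobs finish.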
[15:03:09] <ottomata>	 trying to join...
[15:19:22] <elukey>	 ottomata: sudo cumin 'R:class = role::analytics_cluster::hadoop::worker' 'uname -a'
[15:19:38] <elukey>	 this is an example of the cumin's power :)
[15:19:49] <ottomata>	 elukey:  does that work with any puppet class?
[15:19:51] <ottomata>	 or just roles?
[15:19:58] <elukey>	 any puppet class!
[15:20:01] <ottomata>	 !
[15:20:03] <ottomata>	 that's awesome!
[15:20:26] <ottomata>	 soooo yeah!
[15:20:28] <ottomata>	 sudo cumin 'R:class = cdh::oozie::server' 'uname -a'
[15:20:29] <ottomata>	 awesome!
[15:20:31] <elukey>	 sudo cumin 'R:File = /etc/ssl/localcerts/api.svc.eqiad.wmnet.chained.crt' "openssl x509 -in /etc/ssl/localcerts/api.svc.eqiad.wmnet.chained.crt -text -noout | grep DNS: | sed -e 's/DNS://g' -e 's/ //g'"
[15:20:40] <elukey>	 I used this one to compare certs the other day
[15:20:43] <elukey>	 (credits to Riccardo)
[15:20:59] <elukey>	 so even R:File is awesome
[15:21:12] <elukey>	 you can select hosts that have File resources in puppet 
[15:21:18] <ottomata>	 woah cool
[15:45:22] <elukey>	 ottomata: I left 1047 chowning in tmux, will get back in ~1hour to finish
[15:45:28] <elukey>	 err 1048
[15:45:38] <ottomata>	 k cool
[15:45:40] <ottomata>	 1047 is done
[15:46:31] <elukey>	 ah mine too nice!
[15:48:29] <ottomata>	 bbiab
[15:49:44] <elukey>	 1048 completed too
[15:49:50] <elukey>	 9 hosts on Debian :)
[15:49:54] <wikibugs>	 10Analytics-Tech-community-metrics, 06Developer-Relations (Apr-Jun 2017): Have "Last Attracted Developers" information for Gerrit (already exists for Git) - https://phabricator.wikimedia.org/T151161#3146820 (10Aklapper) Trying to find short term workarounds for this problem,  * MZ's https://www.mediawiki.org/w...
[15:49:57] <elukey>	 and a much faster procedure thanks to Andrew
[15:50:26] <elukey>	 with a bit of work we can probably complete the cluster (12 new nodes + other ~20 reimages) during April
[15:51:04] <elukey>	 all right, logging off a bit earlier today
[15:51:10] <elukey>	 have a nice weekend people :)
[16:29:47] <ottomata>	 elukey:  i'm going to do 2 more nodes
[16:29:50] <ottomata>	 ja?
[16:31:20] <wikibugs>	 (03PS1) 10DCausse: [cirrus] Distinguish morelike vs fulltext api search requests [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/345863
[16:31:55] <ottomata>	 doing 49 and 50
[16:54:15] <mforns>	 bye team! see you in a week :]
[16:56:41] <joal>	 Bye mforns ! Safe trip home !
[16:56:47] <mforns>	 :D
[16:58:19] <ottomata>	 joal:  i've stopped 2 nodemanagers, but its taking a long to drain them
[16:58:28] <ottomata>	 maybe I shouldn't reinstall right now?
[16:58:30] <joal>	 ottomata: drain?
[16:58:32] <ottomata>	 long time*
[16:58:34] <ottomata>	 ya jobs
[16:58:36] <ottomata>	 i guess
[16:58:42] <ottomata>	 we stop node managers so no new jobs get assigned
[16:58:44] <ottomata>	 to those nodes
[16:58:47] <wikibugs_>	 (03CR) 10Bearloga: "Commenting here (as well as IRC) that my recommendation is to put these patterns into a separate enum the way we split SearchQueryFeatureR" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/345863 (owner: 10DCausse)
[16:58:47] <ottomata>	 but we wait for any running jobs to finish
[16:58:49] <ottomata>	 to be nice
[16:58:51] <joal>	 ottomata: You mean you've stopped node managers but didn't kill the containers?
[16:58:55] <ottomata>	 ya
[16:59:05] <joal>	 Ah !
[16:59:12] <joal>	 Which nodes are they?
[16:59:44] <ottomata>	 1049 1050
[17:00:23] <joal>	 ottomata: please go ahead and kill them
[17:00:30] <joal>	 ottomata: stuff should be robust enough
[17:00:38] <joal>	 If it's not, it's an issue
[17:01:41] <joal>	 ottomata: In certain jobs, I got containers lost and no failure
[17:01:51] <ottomata>	 hm ok
[17:01:57] <joal>	 ottomata: I have not been able to understand why certain jobs fail :(
[17:02:19] * joal checks log
[17:03:06] <ottomata>	 ya its mw history hmmm
[17:03:11] <ottomata>	 at least one of the jobs is
[17:03:17] <joal>	 ottomata: please go :)
[17:03:18] <ottomata>	 you sure?  I can wait joal
[17:03:20] <ottomata>	 we don't have to do this now
[17:03:42] <joal>	 ottomata: I've had plenty of jobs failing lately, at least if this one fails it'll give me interesting info
[17:03:46] <ottomata>	 ok
[17:04:01] <joal>	 ottomata: AppMaster is on 1028, so no big deal should happen
[17:04:51] <ottomata>	 ooook
[17:05:00] <wikibugs_>	 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3147080 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1049.eqiad.wmnet', 'analytics1050.eq...
[17:06:06] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 06Operations, 13Patch-For-Review: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#3147093 (10Ottomata) ping @Marostegui, in case you didn't see it: https://gerrit.wikimedia.org/r/345646
[17:08:49] <joal>	 ottomata: I have seen logs telling me 49 and 50 were gone, but job has not failed
[17:10:41] <wikibugs>	 10Analytics: Add new interesting fields in Mediawiki Denormalized History - https://phabricator.wikimedia.org/T161896#3147108 (10JAllemandou)
[17:11:48] <ottomata>	 great
[17:11:54] <joal>	 A-Team - Last day of the month, I'll take care of the new cassandra jobs setup between this evening and tomorrow (drop legacy pre-project loading, move new per-project to production code)
[17:12:23] <joal>	 ottomata: While this is great, it also means less and less clue about why those jobs fail :(
[17:12:42] <milimetric>	 hm
[17:29:52] <wikibugs_>	 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3147181 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1049.eqiad.wmnet', 'analytics1050.eqiad.wmnet'] ```  and were **ALL** successful.
[17:37:22] <ottomata>	 joal: ?
[17:38:33] <joal>	 yes ottomata ?
[17:42:25] <ottomata>	 "ottomata: While this is great, it also means less and less clue about why those jobs fail 
[17:42:26] <ottomata>	 "
[17:43:12] <joal>	 ottomata: The spark MWH history jobs have been failing lately, and I suspected machines restarts
[17:43:41] <joal>	 I think it has happened sometimes, but from what I have seen on the specific job run, I need to look for something else
[17:48:47] <ottomata>	 hm ok
[17:49:05] <ottomata>	 joal:  got a min for quick brain bounce about EL hive stuff?  have some thoughts
[17:57:00] <wikibugs>	 10Analytics, 06Discovery-Analysis: Get 'sparklyr' working on stats1002 - https://phabricator.wikimedia.org/T139487#3147238 (10mpopov) >>! In T139487#3125467, @Nuria wrote: > We are going to take a look at this, we can probably do it if it does not involve changing all the cluster configuration, migration to sp...
[17:58:14] <joal>	 ottomata: sure
[17:58:27] <joal>	 ottomata: batcave?
[18:00:26] <ottomata>	 ya 1 min...
[18:21:32] <wikibugs>	 10Analytics, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library, 10Wikimedia-General-or-Unknown, and 4 others: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#3147272 (10Samwalton9) Of my last series of 9 test edits (30th March: https://test.wikipedia.org/w/ind...
[18:39:13] <wikibugs>	 10Analytics-Tech-community-metrics: List of open tasks in "maniphest_backlog" is not always sorted by "Days open" - https://phabricator.wikimedia.org/T161923#3147327 (10Aklapper)
[18:41:50] <wikibugs_>	 10Analytics-Tech-community-metrics, 06Developer-Relations (Apr-Jun 2017): Go through default Kibana widgets; decide which ones are not relevant for us and remove them - https://phabricator.wikimedia.org/T147001#3147338 (10Aklapper)
[18:57:05] <wikibugs_>	 10Analytics-EventLogging, 06Analytics-Kanban: Write Spark schema differ / Hive DDL generator - https://phabricator.wikimedia.org/T161924#3147352 (10Ottomata)
[19:03:50] <wikibugs>	 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3147369 (10Ottomata) > Messages like the above one were logged frequently in the Yarn NodeManager logs. A daemon restart fixed the issue, but we didn't find any go...
[19:38:53] <wikibugs>	 10Analytics-Tech-community-metrics, 06Developer-Relations (Apr-Jun 2017): Go through default Kibana widgets; decide which ones are not relevant for us and remove them - https://phabricator.wikimedia.org/T147001#3147403 (10Aklapper)
[19:51:37] <wikibugs>	 10Analytics-Tech-community-metrics: Maniphest Backend: Provide statistics on resolving tasks - https://phabricator.wikimedia.org/T161926#3147431 (10Aklapper)
[19:56:43] <wikibugs_>	 10Analytics-Tech-community-metrics: Add remaining KPIs to Overview once available in kibana - https://phabricator.wikimedia.org/T116572#3147449 (10Aklapper)
[19:56:49] <wikibugs>	 10Analytics-Tech-community-metrics, 10Phabricator, 06Developer-Relations (Jan-Mar-2017): Decide on wanted metrics for Maniphest in kibana - https://phabricator.wikimedia.org/T28#3147447 (10Aklapper) 05stalled>03Open As T138002 got resolved a few days ago, * https://wikimedia.biterg.io/app/kibana#/dashboa...
[20:03:26] <wikibugs>	 10Analytics-Tech-community-metrics: Maniphest Backend: Consider having metrics covering *any* user activity in Maniphest - https://phabricator.wikimedia.org/T161928#3147468 (10Aklapper)
[20:04:23] <wikibugs>	 10Analytics-Tech-community-metrics, 10Phabricator, 06Developer-Relations (Jan-Mar-2017): Decide on wanted metrics for Maniphest in kibana - https://phabricator.wikimedia.org/T28#3147479 (10Aklapper) 05Open>03Resolved Going through / updating the way too long list of proposed metrics in the task descripti...
[20:05:07] <wikibugs_>	 10Analytics-Tech-community-metrics, 10Phabricator, 06Developer-Relations (Jan-Mar-2017): Decide on wanted metrics for Maniphest in kibana - https://phabricator.wikimedia.org/T28#3147487 (10Aklapper)
[20:24:42] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban: Research Spike: Better support for Eventlogging data  on hive - https://phabricator.wikimedia.org/T153328#3147537 (10Ottomata) Just parking this here:  I tried to `unionAll` two EventLogging DataFrames, and got: `org.apache.spark.sql.AnalysisException: unresolved o...
[20:40:41] <wikibugs_>	 (03PS9) 10Nuria: Support reportcard in Dashiki [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344114 (https://phabricator.wikimedia.org/T143906) (owner: 10Fdans)
[20:45:30] <wikibugs>	 (03CR) 10Nuria: "I had to revert changes on aqs-api.js as they broke the vital-signs layout, base problem is that the way the multi-project queries were se" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344114 (https://phabricator.wikimedia.org/T143906) (owner: 10Fdans)
[20:57:30] <wikibugs_>	 10Analytics, 10Analytics-Dashiki: Refactor aqs api and usage for simplicity - https://phabricator.wikimedia.org/T161933#3147633 (10Milimetric)
[20:57:39] <wikibugs>	 (03CR) 10Milimetric: [V: 032 C: 032] "This looks good, I've made T161933 to follow-up on the refactor." [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344114 (https://phabricator.wikimedia.org/T143906) (owner: 10Fdans)
[20:58:10] <wikibugs_>	 10Analytics, 10Analytics-Dashiki: Refactor aqs api and usage for simplicity - https://phabricator.wikimedia.org/T161933#3147646 (10Milimetric) @fdans can of course feel free to jump on this too.
[21:09:30] <wikibugs>	 (03PS1) 10Milimetric: Deploy reportcard [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/345889 (https://phabricator.wikimedia.org/T143906)
[21:09:41] <wikibugs_>	 (03CR) 10Milimetric: [V: 032 C: 032] Deploy reportcard [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/345889 (https://phabricator.wikimedia.org/T143906) (owner: 10Milimetric)
[21:13:58] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10MediaWiki-Vagrant, 06Services (watching): Vagrant git-update error for event logging - https://phabricator.wikimedia.org/T161935#3147676 (10Pchelolo)
[21:49:05] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10MediaWiki-Vagrant, 06Services (watching): Vagrant git-update error for event logging - https://phabricator.wikimedia.org/T161935#3147710 (10Pchelolo) The connection timeout is not an issue here, it fails even without it.
[23:07:14] <wikibugs_>	 10Analytics-EventLogging, 06Analytics-Kanban: Research Spike: Better support for Eventlogging data  on hive - https://phabricator.wikimedia.org/T153328#3147768 (10Tbayer) >>! In T153328#3146728, @Ottomata wrote: >> Filters mediawiki.revision-create >  > On second thought, this seems a little crazy to me.   Yup...