[03:48:39] <HaeB>	 bearloga: BTW (saw your tweet) there is now https://phabricator.wikimedia.org/T160941 ;)
[04:06:28] <milimetric>	 aah, I wish I could get to all the things :(
[06:25:20] <wikibugs__>	 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3142820 (10Marostegui) The eventlogging script on db1047 is failing due to: ``` Thu Mar 30 06:07:49 UTC 2017 localhost ContentTranslationError_11767097, createERROR 1005 (...
[08:28:09] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#3143065 (10jcrespo) That would be huge win! And it would make the terbium back-filling unnecessary finally!
[08:44:32] <wikibugs__>	 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3143073 (10jcrespo) Should we increase open_files_limit or do you think this was a one-time issue due to the rename process?
[08:46:06] <wikibugs>	 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3143081 (10Marostegui) I thought about it, but it has not happened until we did the rename thingy yesterday. So I would leave it for now.
[11:18:26] <wikibugs>	 10Analytics-Tech-community-metrics: Git code repository is listed but not all recent activity is shown on wikimedia.biterg.io - https://phabricator.wikimedia.org/T161211#3143339 (10Aklapper)
[12:12:29] <wikibugs__>	 10Analytics-Tech-community-metrics, 10Gerrit: Numerous Gerrit (draft) patchsets cannot be accessed: "Cannot display change because it has no revisions." - https://phabricator.wikimedia.org/T161207#3143407 (10Aklapper)
[12:38:36] <elukey>	 ah snap I found https://github.com/wikimedia/varnishkafka/issues/5
[12:38:39] <elukey>	 from October..
[12:38:47] <elukey>	 I put myself in the watchlist
[12:45:17] <milimetric>	 elukey: yeah, gerrit is just not good for actual open source work.  It gets the review part right, but completely fails on the social part
[12:46:03] <milimetric>	 I wonder if we should think about moving to differential
[12:48:11] <joal>	 elukey: Would you have aminute for some explanation to the newby I am on puppet in labs?
[12:48:24] <milimetric>	 joal: whatcha doin?
[12:49:45] <joal>	 milimetric: I have launched an instance on labs yesterday, and couldn't find how to apply it puppet role
[12:51:21] <milimetric>	 well, did you make it a self-hosted puppetmaster yet?
[12:51:36] <joal>	 milimetric: didn't do anything :)
[12:51:53] <milimetric>	 one sec, looking up instructions in case they changed
[12:52:02] <joal>	 milimetric: it makes a long time I had not done some labs infra, and last time there was possi
[12:52:05] <joal>	 bility
[12:52:10] <joal>	 to apply roles through UI
[12:52:12] <milimetric>	 https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster
[12:52:13] <joal>	 IIRC
[12:52:24] <milimetric>	 right, those are the built-in roles
[12:52:35] <milimetric>	 you can do that from wikitech... unless they moved it, one sec
[12:52:48] <elukey>	 wait wait, why a self hosted puppet master? :)
[12:53:08] <elukey>	 joal: have you access to horizon?
[12:53:29] <joal>	 elukey: I think I don't - I tried yesterday and didn't manage to login
[12:54:33] <elukey>	 joal: you should be able to access via wikitech credentials..
[12:54:42] <milimetric>	 aha, they moved puppet stuff to horizon
[12:55:20] <milimetric>	 I thought he needed a role he was working on, to test, self-hosted is still the only way to do that, right?
[12:55:23] <elukey>	 if the puppet role is already in puppet and you don't need to live-hack it etc.. it suffice to connect to Horizon and select the role
[12:55:36] <joal>	 elukey: I suspected something like that :)
[12:56:04] <elukey>	 milimetric: yes exactly, my understanding is that self-hosted pm is a super painful process that you need only if you have to hack :D
[12:56:06] <joal>	 elukey: horizin gives me invalid creds message (I use GAuthenticator as 2nd validation)
[12:56:45] <elukey>	 mmmm
[12:57:52] <joal>	 elukey: I think 2 factor auth was not setup (from what I see), even if I had an Gauth in place (weird)
[12:58:27] <elukey>	 so you can now login?
[12:58:32] <milimetric>	 yeah, they sent an email a while back that it was required, we should've mentioned it at standup
[13:02:24] <joal>	 I triple checked 2-factor auth on wikitech - works fine - And Horizon style doesn't allow me :(
[13:02:57] <milimetric>	 hm, I remember something weird like that, one sec
[13:04:42] <milimetric>	 joal: it looks like you need to be a projectadmin on a project, elukey is that just being in the admin role? or something else?
[13:04:56] <joal>	 Yay ! Finally managed to
[13:05:04] <joal>	 milimetric: it was a space problem
[13:05:21] <milimetric>	 oh yay.  Ok, I just checked and you're projectadmin on analytics
[13:05:22] <elukey>	 :O
[13:05:25] <ottomata>	 hiii
[13:05:34] <joal>	 milimetric: My 2 auth program gives me a code with 6 digits, 3 then space then 3
[13:05:42] <milimetric>	 whaaaa?!
[13:05:49] <joal>	 Well that space causes a problem to horizon
[13:05:58] <joal>	 But not to wikitech from what I have tested
[13:06:03] <joal>	 WEIRDOOOOH
[13:06:06] <milimetric>	 I've never heard of a space in a 2fa
[13:06:30] <milimetric>	 so you just leave the space out?
[13:06:40] <joal>	 correct
[13:06:53] <milimetric>	 cool, and did you find puppet in there?
[13:07:04] <joal>	 not yet, still looking around :)
[13:07:25] <joal>	 I did !
[13:07:29] <elukey>	 joal: if you click on the instance there should be a puppet tab
[13:07:31] <elukey>	 ok :)
[13:07:42] <joal>	 And it looks ottomata already did the things for me yesterday :)
[13:07:54] <ottomata>	 i broke some things
[13:07:57] <ottomata>	 we both broke things
[13:07:59] <joal>	 Ah by the way elukey - Looks like I'll have a strong incentive to moving fast to jessie ;)
[13:08:07] <joal>	 ottomata: I surely did !
[13:08:26] <elukey>	 joal: should I be scared? :D
[13:08:27] <ottomata>	 joal:  what is status?  i was just going to stand you up a new jessie cluster this morning
[13:08:28] <joal>	 ottomata: I kinda felt removing 3 instances like that would have broken something :)
[13:08:52] <joal>	 ottomata: I managed to run a job with jessie when running local
[13:09:03] <joal>	 I'd love to have a small cluster to test it yarn mode
[13:09:17] <joal>	 (like 1 master and 1 or 2 workers)
[13:09:23] <ottomata>	 ok, i'm going to just blast all the existing nodes and make a new 3 node node cluster non HA
[13:09:27] <joal>	 needs to be jessie, and to have some packages installed
[13:09:29] <wikibugs>	 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Investigate why there is a mismatch between six names and certain email address in mediawiki-identities data - https://phabricator.wikimedia.org/T123643#3143480 (10Aklapper) a:03Aklapper
[13:09:45] <joal>	 awesome, thanks a lor andrew
[13:09:58] <joal>	 ottomata: By the way, can I shadow you on that (learning will bei nteresting)
[13:10:21] <wikibugs__>	 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Investigate why there is a mismatch between six names and certain email address in mediawiki-identities data - https://phabricator.wikimedia.org/T123643#1934774 (10Aklapper) 05Open>03Resolved **<tl;dr>** This is wrong / corrupted dat...
[13:11:24] <ottomata>	 joal:  sure!
[13:11:25] <ottomata>	 bc?
[13:11:31] <joal>	 OMW !
[13:11:47] <wikibugs>	 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Investigate why there is a mismatch between six names and certain email address in mediawiki-identities data - https://phabricator.wikimedia.org/T123643#3143493 (10Aklapper)
[13:25:13] <wikibugs>	 (03PS7) 10Fdans: Add changes to support reportcard in Dashiki [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344114 (https://phabricator.wikimedia.org/T143906)
[13:26:11] <milimetric>	 fdans: want me to review that?  I see it doesn't have a WIP
[13:27:11] <fdans>	 milimetric: that'd be great! It's missing a couple of tests that I'm adding now, just fyi
[13:27:18] <milimetric>	 k
[13:31:44] <wikibugs__>	 (03CR) 10Milimetric: [V: 032 C: 032] Update sqoop script to better handle failures (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/345327 (owner: 10Joal)
[13:32:16] <joal>	 lost you andrew
[13:33:02] <joal>	 ottomata: --^
[13:34:01] <joal>	 thanks milimetric :)
[13:59:38] <wikibugs>	 06Analytics-Kanban, 15User-Elukey: Review the recent Varnishkafka patches - https://phabricator.wikimedia.org/T158854#3143560 (10elukey) Patch reviewed! Definitely a little bug (not relevant to our settings btw) gets resolved, but I am unsure about the other ones. Asked clarifications.
[14:00:51] <elukey>	 finally vk patches reviewed \o/
[14:01:35] <musikanimal>	 hello analytics! I need access to some EventLogging data, I understand you may need to set it up to replicate in Hive?
[14:01:46] <musikanimal>	 specifically, this is for cookie blocks
[14:03:41] <elukey>	 ottomata: https://gerrit.wikimedia.org/r/#/c/345509/2/wmf-config/ProductionServices.php
[14:07:43] <musikanimal>	 nvm, I found it :)
[14:14:04] <ottomata>	 musikanimal:  ya!  hi EL has been going to HDFS for a while
[14:14:12] <ottomata>	 we plan to make the hive integration better soonish
[14:14:19] <ottomata>	 for now you have to do it like you probably just found on wikitech
[14:14:33] <ottomata>	 elukey:  .discovery.?!
[14:20:36] <joal>	 ottomata: looks like a success :)
[14:20:47] <joal>	 ottomata: I had to have another manual step though :(
[14:20:49] <musikanimal>	 ottomata: is EventLogging data exposed in grafana?
[14:20:59] <musikanimal>	 I found what looks to be like what I want, but I see no data
[14:21:04] <musikanimal>	 so we could have just set it up wrong
[14:21:18] <musikanimal>	 but also I don't see "grafana" mentioned in the wikitech docs
[14:21:45] <musikanimal>	 https://grafana.wikimedia.org/dashboard/db/eventlogging?orgId=1&var-topic=eventlogging_CookieBlock&from=1469071723354&to=1490882923354
[14:23:17] <ottomata>	 musikanimal:  sorta, but nothing beyond # of events
[14:23:45] <ottomata>	 you can see there aren't many messages for that schema, so it is hard to graph there
[14:23:46] <ottomata>	 hm
[14:24:30] <ottomata>	 musikanimal:  you can also kinda see it here
[14:24:31] <ottomata>	 https://grafana.wikimedia.org/dashboard/db/kafka-by-topic?from=1469071723354&to=1490882923354&refresh=5m&orgId=1&var-cluster=analytics-eqiad&var-kafka_brokers=All&var-topic=eventlogging_CookieBlock
[14:24:34] <ottomata>	 but yeah hardly any messages
[14:25:00] <ottomata>	 joal:  what's your other manual step?
[14:25:01] <musikanimal>	 ottomata: nice! now I see the data I expected
[14:25:09] <elukey>	 ottomata: yep! Now the auth dns is able to respond with the active DC IP
[14:25:11] <musikanimal>	 we only had it set up on testwiki
[14:25:15] <elukey>	 just wanted to let you know
[14:25:20] <elukey>	 it is for the switchover
[14:25:21] <joal>	 ottomata: downloading some NLTK resources through python
[14:25:40] <joal>	 ottomata: I think this step can be seen as putting some resource under /usr/local/share/nltk_data
[14:25:47] * elukey brb
[14:28:27] <ottomata>	 elukey:  cool!
[14:28:29] <ottomata>	 that's pretty awesome
[14:28:40] <ottomata>	 hmmm
[14:28:41] <ottomata>	 ok...
[14:31:01] <wikibugs__>	 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3143638 (10Ottomata) Wow, ok.  Thanks.
[14:32:44] <musikanimal>	 ottomata: hey, so what does "Kafka Messages by topic" mean?
[14:32:56] <musikanimal>	 sorry for not reading the docs
[14:33:32] <ottomata>	 musikanimal:  do you know what kafka is?
[14:33:50] <musikanimal>	 haha no I guess not
[14:33:57] <ottomata>	 ok, simple answer then :)
[14:34:03] <ottomata>	 kafka is our distributed log buffer
[14:34:17] <ottomata>	 so, its what allows us to ship logs and streaming data around the cluster 
[14:34:19] <ottomata>	 kinda likea  message queue
[14:34:31] <ottomata>	 EventLogging uses it
[14:35:00] <musikanimal>	 oh ok
[14:35:04] <ottomata>	 it gets incoming events from browsers or wherever, parses them, then produces them to separate kafka 'topics' (queues)
[14:35:06] <ottomata>	 so
[14:35:18] <ottomata>	 the EL schema based ones  are all prefixed with 'eventlogging_'
[14:35:18] <wikibugs__>	 10Analytics, 10EventBus, 05MW-1.29-release (WMF-deploy-2017-03-28_(1.29.0-wmf.18)), 06Services (done): Page properties-change event is rejected if page was deleted - https://phabricator.wikimedia.org/T158702#3143645 (10Pchelolo) 05Open>03Resolved Deployed, verified, resolving.
[14:35:33] <ottomata>	 so all messages for a particular schema name will go into 'eventlogging_MySchemaName' topic
[14:35:51] <ottomata>	 from there
[14:35:59] <ottomata>	 the messages are consumed by various pieces
[14:36:03] <ottomata>	 one of which is a mysql inserter
[14:36:07] <ottomata>	 that's how the data gets into mysql
[14:36:15] <ottomata>	 another is an HDFS writer
[14:36:20] <ottomata>	 that is how the data gets into HDFS
[14:36:24] <joal>	 ottomata: Tried to provide the nltk corpora as resource to spark, but didn't work :(
[14:36:51] <ottomata>	 joal:  i don't know what that is, but is that something we can just put into hdfs via refinery or something?
[14:37:05] <musikanimal>	 ottomata: I see, thank you for the explanation!
[14:37:24] <musikanimal>	 so i was kind of hoping to know in integers how many events were logged, is that possible?
[14:37:38] <ottomata>	 musikanimal:  https://wikitech.wikimedia.org/wiki/EventLogging
[14:37:41] <ottomata>	 has a little diagram
[14:38:08] <ottomata>	 also,i gave a lightning tech talk about it a while ago
[14:38:10] <ottomata>	 if you are curious
[14:38:10] <ottomata>	 https://www.youtube.com/watch?v=yUQ5d192z3M
[14:38:16] <ottomata>	 https://www.mediawiki.org/wiki/File:EventLogging_on_Kafka_-_Lightning_Talk.pdf
[14:39:19] <musikanimal>	 ottomata: ok thank you!
[14:41:45] <wikibugs__>	 10Analytics, 10Analytics-Dashiki, 13Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3143678 (10matthiasmullie) Yes and no. I do have time to work on this, but no experience with Dashiki. So far, I've not had much luck getting anywhere myself, so some pointers...
[14:46:08] <joal>	 ottomata: it won't work with HDFS as far as I understand :(
[14:49:55] <joal>	 ottomata: nltk package as used in revscoring expects data in specific places: https://gist.github.com/jobar/2ee394541c4cec991b6b3e7078dfd85f
[14:50:26] <ottomata>	 oof, so it'd need to be on all the workers in one of those dirs?
[14:50:28] <ottomata>	 ergh
[14:50:54] <ottomata>	 joal:  is there an nltk data python package?
[14:50:56] <joal>	 ottomata: Mwarf indeed
[14:51:04] <joal>	 ottomata: didn't check !!!1
[14:51:18] <ottomata>	 http://www.nltk.org/data.html
[14:51:25] <ottomata>	 If you did not install the data to one of the above central locations, you will need to set the NLTK_DATA environment variable to specify the location of the data
[14:52:22] <ottomata>	 joal:  i betcha we could make a deb package that includes the data
[14:52:35] <joal>	 ottomata: Seems not too complicated
[14:52:50] <joal>	 ottomata: except that I don't know how to make packages :S
[14:54:19] <ottomata>	 haha yeah i can do for ya
[14:54:40] <joal>	 ottomata: from other tuto I see, they run the download command on every work node (ugly)
[14:54:44] <ottomata>	 yeah
[14:54:48] <ottomata>	 no good
[14:55:42] <joal>	 some utilities comming with cluster make ugly easy (command: acluster --> runs the command on every worker - WROOOOONG !)
[15:03:27] <wikibugs__>	 06Analytics-Kanban: Document and publicize AQS legacy page counts endpoint - https://phabricator.wikimedia.org/T159959#3143706 (10mforns) a:03mforns
[15:14:29] <wikibugs>	 06Analytics-Kanban, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#3143721 (10Nuria) a:03Ottomata
[15:15:28] <wikibugs>	 06Analytics-Kanban, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#1952524 (10Nuria) Let's take advantage of the fact that after the rename we have now autoincrement ids on new tables .
[15:29:43] <wikibugs>	 10Analytics, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library, 10Wikimedia-General-or-Unknown, and 4 others: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#3143785 (10Milimetric) Sam knows about the replication, that's why he knows there's a delay.  But Sam,...
[15:43:45] <fdans>	 nuria: the full config for the reportcard is now in https://meta.wikimedia.org/wiki/Config:Dashiki:Sample/tabs
[15:43:56] <fdans>	 if you want to take a look
[15:44:02] <nuria>	 looking
[15:45:20] <nuria>	 fdans: I see, i should be "daily" unique devices only, no monthly, other than that it looks good
[15:45:41] <fdans>	 nuria: cool, removing monthly
[15:54:16] <wikibugs>	 10Analytics, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library, 10Wikimedia-General-or-Unknown, and 4 others: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#3143837 (10Samwalton9) >>! In T115119#3143785, @Milimetric wrote: > Sam knows about the replication, t...
[15:54:40] <wikibugs__>	 10Analytics, 06Operations, 07Documentation: Improve SSH access information in onboarding documentation - https://phabricator.wikimedia.org/T160941#3115695 (10Nuria) Can we be specific as to what needs improvements to help ops document what is needed?  cc @Zareenf, @Tbayer @mpopov who had had trouble with thi...
[15:56:56] <wikibugs__>	 06Analytics-Kanban, 06DC-Ops, 06Operations, 10ops-eqiad, 13Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3143852 (10Nuria)
[15:57:17] <wikibugs>	 06Analytics-Kanban, 06DC-Ops, 06Operations, 10ops-eqiad, 13Patch-For-Review: Decom/Reclaim analytics1027 - https://phabricator.wikimedia.org/T161597#3136552 (10Nuria) a:03elukey
[15:57:48] <wikibugs__>	 10Analytics: Improve Oozie error emails for testing - https://phabricator.wikimedia.org/T161619#3143856 (10Nuria) p:05Triage>03Normal
[15:58:44] <wikibugs__>	 10Analytics: Update pivot to latest version - https://phabricator.wikimedia.org/T161630#3143858 (10Nuria) p:05Triage>03Normal
[16:01:12] <wikibugs>	 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: CODFW: 6 Nodes for Kafka refresh/upgrade - https://phabricator.wikimedia.org/T161637#3138006 (10Nuria) p:05Triage>03Normal
[16:02:40] <wikibugs__>	 10Analytics: Upgrade pivot - https://phabricator.wikimedia.org/T161725#3143873 (10Nuria) 05Open>03Invalid
[16:11:03] <wikibugs>	 06Analytics-Kanban: Security Upgrade for piwik - https://phabricator.wikimedia.org/T158322#3143894 (10Nuria) a:05Milimetric>03None
[16:11:34] <wikibugs>	 10Analytics, 10Fundraising-Backlog: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3143898 (10DStrine) We are checking some points with legal here: T161656
[16:11:54] <wikibugs__>	 06Analytics-Kanban: Create robots.txt policy for datasets - https://phabricator.wikimedia.org/T159189#3143900 (10Milimetric) a:03Milimetric
[16:15:10] <wikibugs>	 10Analytics, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library, 10Wikimedia-General-or-Unknown, and 4 others: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#3143902 (10Milimetric) The two tables are part of a maintenance on EventLogging, we needed to delete d...
[16:37:29] <wikibugs>	 10Analytics, 06Operations, 07Documentation: Improve SSH access information in onboarding documentation - https://phabricator.wikimedia.org/T160941#3144005 (10Tbayer) >>! In T160941#3143836, @Nuria wrote: > Can we be specific as to what needs improvements to help ops document what is needed?  cc @Zareenf, @Tb...
[16:39:02] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#3144009 (10Nuria)
[16:44:58] <wikibugs__>	 10Analytics, 10Analytics-Dashiki, 13Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3144111 (10Nuria) Mathias:  Let's backtrack a little here: seems to me that your case has little do do with analytics but rather error logging, I understand that mediawiki does...
[16:47:34] <wikibugs__>	 10Analytics, 06Operations, 07Documentation: Improve SSH access information in onboarding documentation - https://phabricator.wikimedia.org/T160941#3144130 (10Nuria) This task will be seen by the ops on cleaning duty next week. It will help to have a list of issues so they can know what problems the documenta...
[17:29:14] <wikibugs>	 10Analytics: Add zero carrier to pageview_hourly data on druid - https://phabricator.wikimedia.org/T161824#3144399 (10Nuria)
[17:30:10] <elukey>	 team I am shutting off analytics1039
[17:30:18] <elukey>	 so Chris will be able to apply thermal paste
[17:39:42] <nuria>	 elukey: i love the fact that that 1039 is a metal box somewhere now cover in some ooo
[17:41:54] <elukey>	 :D
[17:45:24] <elukey>	 joal: I think that I killed one of your spark jobs, sorry :(
[17:46:11] <elukey>	 ottomata: ping?
[17:46:21] <ottomata>	 hii
[17:46:35] <ottomata>	 oh elukey yeah!
[17:46:39] <ottomata>	 lets do one?
[17:48:36] <elukey>	 sure
[17:49:03] <ottomata>	 in bc if you want
[17:49:10] <ottomata>	 which one?
[17:49:41] <elukey>	 we can do in here too! 
[17:49:47] <elukey>	 so 1046 could be a good candidate
[17:49:53] <elukey>	 next inline for the reimage :)
[17:50:13] <ottomata>	 ok here is good
[17:50:17] <ottomata>	 ok
[17:50:25] <ottomata>	 ok if i just follow https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Worker_Reimage_.2812_disk.2C_2_flex_bay_drives_-_analytics1028-analytics1057.29
[17:50:26] <ottomata>	 and ask qs?
[17:50:47] <elukey>	 so what I usually do is schedule a bit of downtime and shutdown yarn/hdfs, to let jobs drain. Not sure if it is a good procedure but with the new settings (Yarn stop doesn't kill the containers) it might let computations to survive?
[17:50:58] <elukey>	 anyhow, then neodymium and type: 
[17:51:40] <elukey>	 sudo -E wmf-auto-reimage analytics1046.eqiad.wmnet -p T160333
[17:51:41] <stashbot>	 T160333: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333
[17:51:57] <elukey>	 it will ask the pwstore management passzorz
[17:52:10] <elukey>	 after that it will be following the above guide :)
[17:52:43] <ottomata>	 elukey:  should I jsut try and see what happens without downtime?
[17:53:08] <ottomata>	 oh hm
[17:53:13] <ottomata>	 yarn stop doesn't kill containers
[17:53:14] <ottomata>	 i get it
[17:53:15] <ottomata>	 ya i'll do that
[17:53:38] <elukey>	 IIRC if the appmaster is somewhere running it might still get container's computations
[17:53:43] <elukey>	 but I am not sure :(
[17:53:53] <ottomata>	 aye
[17:54:38] <ottomata>	 !log stopping hadoop services on analytics1046 for jessie upgrade
[17:54:40] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:55:14] <ottomata>	 ok yeha, so lots of yarn procs still running
[17:55:19] <ottomata>	 so i'll wait til those disappear
[17:56:29] <elukey>	 ah one thing that I wanted to talk with you about
[17:56:39] <elukey>	 /var/lib/hadoop/data is not created automagically
[17:57:11] <elukey>	 at least, it is not when the disks have not all the datanode partitions that can be mounted
[17:57:18] <elukey>	 so a manual mkdir is needed
[17:58:03] <ottomata>	 ah right, makes sense, cause puppet creates the path
[17:58:15] <ottomata>	 you do that after it comes up elukey?
[17:58:42] <elukey>	 usually after the first puppet run that breaks when setting up the datanode partitions
[18:02:34] <elukey>	 !log an1039 back up again after thermal paste applied
[18:02:35] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:03:16] <wikibugs>	 10Analytics, 10Analytics-Cluster, 06Operations, 10ops-eqiad, 15User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3144546 (10elukey) Chris applied the thermal paste and the host is up and running again. Will watch mcelog during the next days to see if...
[18:05:43] <elukey>	 ouch mw-history-denorm spark job killed, joal will be mad :)
[18:05:49] <elukey>	 sorryyyyy
[18:06:13] <ottomata>	 uuuuh oh
[18:06:15] <ottomata>	 how'd it happen
[18:06:17] <ottomata>	 an39?
[18:09:02] <elukey>	 yeah probably, there were spark jobs running and I shutdown the host since I knew they would have taked a lot of time.. :(
[18:09:21] <elukey>	 1039 is the host showing up most of the temp alerts
[18:17:22] <ottomata>	 ook elukey 0 yarn procs on 1046 now
[18:18:22] <ottomata>	 running the reimage thang
[18:18:52] <wikibugs>	 06Analytics-Kanban, 06Operations, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3144577 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['analytics1046.eqiad.wmnet'] ``` The log can b...
[18:19:05] <elukey>	 nice :)
[18:19:30] <elukey>	 ok so now everything should be handled by the script
[18:19:38] <elukey>	 reimage, first puppet run, reboot
[18:19:46] <ottomata>	 amazing
[18:19:47] <elukey>	 and salt check etc..
[18:19:55] <ottomata>	 it'll verify the salt key?!
[18:20:26] <elukey>	 it will call the host via salt
[18:20:29] <elukey>	 after signing the key
[18:21:05] <elukey>	 so ideally if the script finishes without any issue you'll just ssh and fix datanode partitions
[18:23:28] <elukey>	 ottomata: brb later on if you need me!
[18:24:39] <ottomata>	 amazing
[18:24:40] <ottomata>	 ok cool
[18:25:22] <nuria>	 fdans: reportcard looks good!, some minor nits
[18:34:49] <ottomata>	 nuria: woudl you say that it is safe to no longer bother replicating any of the tables that we didn't rename?
[18:37:11] <nuria>	 ottomata: that would mean not replicating tables that get less than 1 event per day (which is how i compiled initial list of tables to rename) 
[18:37:25] <ottomata>	 hm
[18:37:44] <ottomata>	 nuria shouldn't we have just renamed all talbes then?  if there are tables that are 'active' at less than 1 event per day
[18:37:52] <ottomata>	 those tables will still have the short varchars
[18:37:57] <ottomata>	 and also no id field
[18:38:02] <ottomata>	 (well, maybe no id field)
[18:40:08] <nuria>	 ottomata: right, but I'd say that is a very uncommon use case of less than 1 event per day, we see that pattern in schemas that are being "replaced" by a new one and thus being phased out 
[18:42:06] <ottomata>	 hm oook
[18:42:43] <nuria>	 ottomata: i guess it is [possible that we have some tables with little little usage that still have columns of old length 
[18:44:00] <ottomata>	 ja nuria, i just noticed because replication script replicates all tables on master -> slave
[18:44:04] <ottomata>	 and if the table on master does not have id field
[18:44:08] <ottomata>	 script will fail
[18:44:11] <ottomata>	 if i try to replicate by ides
[18:44:13] <ottomata>	 ids
[18:44:16] <ottomata>	 so, i'm going to have to be smarter
[18:44:30] <ottomata>	 i think i can code to prefer ids, but fallback to timestamp
[18:44:54] <nuria>	 ottomata: let's see, I think is posible that some tables that have less than one event per day will have a column with old UA length
[18:45:18] <ottomata>	 ya, the old ua length isn't such a big deal (to me), as stuff will keep workign with that
[18:45:23] <ottomata>	 the data will just be truncated for those ones
[18:45:29] <ottomata>	 i'm trying to do the id based replication
[18:45:30] <nuria>	 ottomata: that did not seem a huge deal as the truncation of UA was not happening every time and things would on thsoe tables, now, auto increment
[18:45:36] <ottomata>	 and lots of tables don't have ids :/
[18:45:39] <nuria>	 ottomata: by definition those tables have to be real small
[18:45:51] <ottomata>	 nuria:  also
[18:45:55] <nuria>	 ottomata: if they receive such a small flow of events
[18:45:57] <ottomata>	 is data currently purged at all?
[18:45:59] <ottomata>	 on slaves?
[18:47:46] <nuria>	 I think purging has issues so maybe we could remove all data that is old before fixing replication?
[18:50:06] <ottomata>	 nuria:  yah, i just noticed that purging is done by this script, but it is commented out
[18:50:10] <ottomata>	 so i'm pretty sure it snot happening
[19:03:14] <elukey>	 ottomata: o/
[19:03:20] <elukey>	 I saw that the reimage went fine
[19:03:25] <elukey>	 https://phabricator.wikimedia.org/T160333#3144625
[19:03:56] <ottomata>	 ya!
[19:03:58] <ottomata>	 following things no
[19:03:59] <ottomata>	 w
[19:04:02] <ottomata>	 chowing...
[19:04:26] <elukey>	 goood! it will take a while :(
[19:04:52] <ottomata>	 oh ya lots!
[19:05:22] <ottomata>	 elukey:  that would go a lot faster if you added a & to background each letter search
[19:05:38] <elukey>	 ah this is a good point! 
[19:05:50] <elukey>	 didn't think about it!
[19:06:28] <elukey>	 will try next run then, thanks!
[19:07:02] <elukey>	 ottomata: all right I am logging off then!
[19:09:59] <ottomata>	 laters!
[19:33:04] <joal>	 elukey: I am indeed somehow unhappy, but it's for the good cause :) Jessie EVERYWHERE !
[19:34:44] <joal>	 elukey: Restarted :)
[19:50:55] <nuria>	 milimetric: super good work with those formatters on digraphs i was able to figure out what was wrong and change to "kmb" real fast 
[19:53:25] <milimetric>	 cool.  Vue has "filters" so like if you have {{value}} you can be like {{value | kmb}}
[19:53:55] <milimetric>	 pretty elegant, and you can have a function return the filter based on config
[19:58:42] <milimetric>	 nuria: I think the dashiki work would be so much more useful with actual documentation of the layouts.  But I don't love the layouts as they are, lots of bugs.  So I was thinking: one week to re-shape the configs, then two days to document them properly.  Let me know if you think that fits into our plan.  While I think it would be nice, we can get by fine
[19:58:42] <milimetric>	 with just me making dashboards for people whenever they need them
[19:59:58] <nuria>	 milimetric: mmmm.. I think they are pretty good, i do not think layouts need much doc besides readme meta config and nice code
[21:38:53] <Niharika>	 Hi! I finally logged into stat1003 to get some event logging data but I am not sure how to access it. I first checked grafana but https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?refresh=5m&orgId=1 does not list CookieBlock schema in the drop-down. 
[21:41:32] <Niharika>	 Oh, I found the sample query on https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#MariaDB. That's helpful. Thanks!