[01:20:57] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Explore importing geoeditors_daily data (agrehgated edits per namespace per country per wiki) into druid - https://phabricator.wikimedia.org/T234281 (10Nuria)
[01:28:36] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Explore importing geoeditors_daily data (aggregated edits per namespace per country per wiki) into druid - https://phabricator.wikimedia.org/T234281 (10Nuria)
[05:56:38] o/
[06:19:19] 10Analytics, 10User-Elukey: Port IRCRecentChanges to Kafka - https://phabricator.wikimedia.org/T232483 (10elukey) Yesterday I had a chat with @faidon about this project and this is what I gathered: * we currently run a patched ircd daemon on kraz (`role::mw_rc_irc` in site.pp) that serves `irc.wikimedia.org`...
[06:19:57] 10Analytics, 10User-Elukey: Architecture of recent changes on top of kafka. Produce Design Document. - https://phabricator.wikimedia.org/T234234 (10elukey)
[06:20:39] 10Analytics, 10User-Elukey: Port IRCRecentChanges to Kafka - https://phabricator.wikimedia.org/T232483 (10elukey) 05Open→03Stalled As a first step, we (Analytics) are going to create a quick design doc / one-pager about what the architecture should look like in T234234
[06:20:47] 10Analytics, 10Code-Stewardship-Reviews, 10Operations, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319 (10elukey)
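For context on the port discussed in T232483: a minimal sketch of what consuming the recent-changes stream from Kafka could look like, instead of connecting to irc.wikimedia.org. The broker and topic names below are assumptions for illustration, not taken from the task:

```bash
# Hypothetical: tail recent changes straight from Kafka.
# Broker host and topic name are assumed, not confirmed in the task.
kafkacat -C \
  -b kafka-jumbo1001.eqiad.wmnet:9092 \
  -t eqiad.mediawiki.recentchange \
  -o end \
  | jq -r '.meta.domain + " " + .title'
```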
[07:19:53] 10Analytics, 10Analytics-Kanban: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10elukey)
[07:21:04] 10Analytics, 10Analytics-Kanban: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10elukey) @MoritzMuehlenhoff @ArielGlenn Thoughts?
[07:21:19] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10elukey)
[07:23:12] 10Analytics, 10Fundraising-Backlog, 10Operations, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10MoritzMuehlenhoff) 05Resolved→03Open There are two issues with the patch merged for Erin Yener: (1) I...
[07:51:55] Hi team
[07:55:05] o/
[07:58:38] elukey: I'm being kicked off the chan almost every day recently - any idea?
[08:01:20] joal: nope, what error do you get in your client?
[08:01:44] No error - I just get kicked off the chan and sent to wikimedia-overflow
[08:09:26] strange
[08:09:29] what client do you use
[08:09:30] ?
[08:09:36] irssi
[08:09:50] same thing for me, but I am not kicked out
[08:10:01] mwarf :(
[08:10:19] maybe there is some setting that could prevent this? I set mine up a long time ago and I don't recall if I had to do anything
[08:10:28] but I can share the config if you want
[08:11:06] elukey: when I disconnect, do you see anything strange on the chan, like a split or something?
[08:15:26] joal: I don't have all the events logged, only when people join, but I can add all of them so we can monitor
[08:15:42] no bother elukey - I'll look at logs
[08:15:58] no bother at all :)
[08:17:27] Nothing logged :(
[08:19:02] I can see you logging out and back in elukey
[08:19:33] joal: yep just restarted irssi, I had some pending cleanups to do, also removed the ignores
[08:19:48] next time that happens I'll tell you what I see
[08:20:08] Thanks
[08:21:41] elukey - Do you think me asking ottomata to set up alluxio for a test with presto is a good idea?
[08:32:03] joal: I have no idea what alluxio is but +1
[08:32:14] :D
[08:32:50] ah nice https://www.alluxio.io/
[08:33:54] no idea how it would need to be deployed, is it a daemon?
[08:34:59] elukey: here is my thought process: one of the main advantages of big-data computation is data-locality optimization. Presto as it is set up loses this advantage (not part of HDFS) - IMO alluxio could be an interesting helper :)
[08:35:14] elukey: I think it's a system with master/workers
[08:35:24] elukey: like another layer on top of hdfs
[08:36:20] ah so it would need to be deployed on hadoop
[08:36:21] right?
[08:36:26] (I mean on hadoop workers)
[08:37:11] elukey: I don't think so - deployed on the machines where the cache happens (presto in our case), and configured to cache data flowing in from HDFS
[08:37:57] ah ok, clearer now
[08:38:15] I don't see why not! Maybe let's wait to see the first results from andrew
[08:38:19] elukey: My idea is that caching data at the presto-machine level could help
[08:38:27] makes sense yes
[08:38:46] does presto need zookeeper?
[08:39:00] IIRC it does
[08:39:37] that brings me to the next question!
[08:39:56] an-conf100[1-3] are ready, they have been serving the hadoop test cluster for the past few days
[08:39:57] Moar questionz, moar answerz (maybe)
[08:40:15] there are some caveats though
[08:40:24] the most important one is that they run Java 11
[08:40:35] I tried java 8, but it doesn't work with the zookeeper version on buster
[08:40:51] but, no issues so far from the hdfs zkfc logs etc..
[08:41:18] (like serialization troubles, etc..)
[08:41:34] so I'd be inclined to move the hadoop prod cluster to the new zookeeper nodes
[08:41:41] and presto/alluxio/etc..
[08:42:29] works for me elukey - Getting out of the prod-zookeeper is something we've been waiting for IIRC
[08:44:03] joal: the main "issue" is that we'll have to stop hdfs/yarn briefly on the master nodes
[08:44:17] otherwise we might end up in a split-brain scenario
[08:44:51] so something like - 1) cluster drain 2) hdfs safe mode 3) shutdown of the master daemons 4) merge/run-puppet with the new config
[08:44:55] yes - well, there are changes that need the machines to stop :)
[08:45:41] when would it be ok for you?
[08:45:46] (to schedule the change)
[08:46:41] whenever elukey - As usual I'll only be here watching you ;)
[08:47:00] ack, will prep the etherpad and possibly schedule it for tomorrow
[08:47:20] elukey: arf - tomorrow kids (sorry, should have mentioned)
[08:47:29] thursday?
[08:48:16] yessir
[08:50:28] ah yes right Thu is fine!
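A rough sketch of steps 2-4 from the sequence above, on the master node. The service unit names are assumed (stock CDH-style) and run-puppet-agent is the usual WMF wrapper; the real runbook was drafted separately in an etherpad, so treat this as illustrative only:

```bash
# 2) put HDFS in safe mode so no writes happen during the cutover
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -safemode get    # expect: "Safe mode is ON"

# 3) stop the master daemons (unit names assumed, CDH-style)
sudo systemctl stop hadoop-yarn-resourcemanager hadoop-hdfs-namenode

# 4) merge the puppet change pointing at the new an-conf100[1-3] zookeepers,
#    apply it, then bring the daemons back and leave safe mode
sudo run-puppet-agent
sudo systemctl start hadoop-hdfs-namenode hadoop-yarn-resourcemanager
sudo -u hdfs hdfs dfsadmin -safemode leave
```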
[08:55:11] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Develop a tool or integrate feature in existing one to visualize WMCS edits data - https://phabricator.wikimedia.org/T226663 (10JAllemandou) Hi @srishakatux - I'm sorry for not answering sooner, I've been sick the whole...
[09:04:41] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10ArielGlenn) Adding @Bstorm because the labstore servers are WMCS boxes. How long does data take to show up on the lab...
[09:08:51] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10elukey) The main problem is that a ton of data (like the recent mediawiki history dumps) needs to go through the fuse h...
[09:41:47] (03PS3) 10Joal: Add network-origin to the geoeditors-daily table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504)
[09:58:15] 10Quarry, 10User-revi, 10cloud-services-team (Kanban): Quarry struggling on the queue (2019-10-01) - https://phabricator.wikimedia.org/T234310 (10revi)
[09:58:58] elukey: found this today - https://blog.ippon.fr/content/images/2018/03/6ced44f31377835938ccb66275194e2b3ccea500967210c35ca8fb2343cbaf8d.jpg
[10:01:19] hahaahhaahahahh
[10:01:37] made my day
[10:04:37] kerberos replication between eqiad/codfw works
[10:04:40] what a nightmare
[10:04:50] you, man, should be named Hercules
[10:06:36] <#
[10:06:38] <3
[10:07:41] it is frustrating since the error messages are so brief and not explanatory
[10:08:00] need to write some docs now
[10:08:07] otherwise I'll forget the magic sequence
[10:13:16] all kerberos error messages are cryptic, it's one of the biggest problems krb5 has
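For the record, the "magic sequence" for MIT Kerberos master-to-replica propagation generally looks like the sketch below. The hostnames come from the chat (krb1001/krb2001); the dump file path and the kpropd service name are MIT/Debian conventions assumed here, not confirmed for these hosts:

```bash
# On the master KDC (krb1001): dump the principal database and push it.
# The dump file path is the conventional MIT one, assumed for illustration.
sudo kdb5_util dump /var/lib/krb5kdc/replica_datatrans
sudo kprop -f /var/lib/krb5kdc/replica_datatrans krb2001.codfw.wmnet

# On the replica (krb2001): kpropd must be listening to accept the dump
# (service name assumed from the Debian krb5 packaging).
sudo systemctl status krb5-kpropd
```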
[10:29:41] * elukey lunch!
[11:34:18] I don't know if the mysql issue is related, but I'm experiencing problems on stat1007 - I get an HDFS command error at regular intervals (more or less)
[11:56:37] 10Quarry, 10User-revi, 10cloud-services-team (Kanban): Quarry struggling on the queue (2019-10-01) - https://phabricator.wikimedia.org/T234310 (10zhuyifei1999) 05Open→03Resolved a:03zhuyifei1999 I ssh-ed in to both workers, no weird behavior. No logs either. It's as if the celery workers never received...
[12:07:44] joal: I think it is related, the rsync is causing 1) a ton of resources consumed by the fuse mount 2) a ton of sockets opened
[12:07:59] mwarf
[12:08:36] elukey: indeed the error is networking-related (Namenode for analytics-hadoop remains unresolved for ID an-master1001-eqiad-wmnet)
[12:12:35] 10Analytics, 10Fundraising-Backlog, 10Operations, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10EYener) Thank you! I also have access to Turnilo. I have two follow-up questions: 1. Can @jkumalah and...
[12:30:43] 10Analytics, 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic, and 2 others: No access to mysql from stat1007 - https://phabricator.wikimedia.org/T234160 (10elukey) We are almost sure that this issue is due to network/system overload caused by a big rsync that is currently running...
[12:47:59] 10Analytics, 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic, and 2 others: No access to mysql from stat1007 - https://phabricator.wikimedia.org/T234160 (10GoranSMilovanovic) @elukey Thank you. I was able to collect the data needed for T234036 from `stat1004`, so I will close thi...
[12:48:08] 10Analytics, 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10User-GoranSMilovanovic, and 2 others: No access to mysql from stat1007 - https://phabricator.wikimedia.org/T234160 (10GoranSMilovanovic) 05Open→03Resolved
[12:53:13] (03CR) 10Joal: "This patch needs to be rebased on top of its updated parent." (036 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) (owner: 10Fdans)
[13:44:38] joal o/ yt?
[13:44:49] hey ottomata - Here I am yes :)
[13:45:09] how may I help?
[13:45:53] i want to backfill rev score
[13:46:01] ok
[13:46:07] so i'm trying to construct a query to transform the array fields into our new map fields
[13:46:33] i've moved the old table and partitions to otto.mediawiki_revision_score_1
[13:47:04] am reading stuff, trying to figure out how...but thought you might have some tips :)
[13:47:39] ottomata: hm - I've not done that I think
[13:48:20] ottomata: I'd probably use RDDs :)
[13:49:08] oh that would work...right...
[13:49:08] hmmm
[13:50:10] ottomata: I have the problem you encountered with camus as well
[13:50:58] ottomata: a QUESTION sorry :)
[13:51:44] joal: ya?
[13:52:13] ottomata: Could it have been an issue with a wrong consumer-group value?
[13:53:10] joal: i don't think so...camus doesn't use consumer groups :)
[13:53:19] the offsets are in hdfs
[13:53:40] and the offset history files existed and were being read by camus
[13:53:53] and i'm pretty sure data was coming back from kafka at the requested offset
[13:56:26] ottomata: I was trying to understand the empty iterator thing - Ah right - no data stored in kafka-internals in camus...
[13:56:31] ok
[13:57:05] ottomata: if you'd have a minute I could use a quick chat on spark memory - I'm facing wonders
[13:57:30] ottomata: o/
[13:57:42] does presto use zookeeper by any chance? Didn't see any setting today
[13:57:57] but I was wondering that.. if so, it would be great to make it use the an-conf100X hosts
[13:58:14] I was chatting today with Joseph about migrating Hadoop Prod to the new zk nodes on thursday
[14:01:10] oh ottomata - Another topic: alluxio for presto !
[14:02:30] there are days like that - I could possibly call them the otto-days :)
[14:05:22] I am writing https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Kerberos to gather all the info about the kerberos cluster
[14:05:30] still wip but should contain it all
[14:22:29] joal: sorry for some reason my irc client isn't pinging me...
[14:22:37] alluxio ...?
[14:22:38] looking
[14:24:25] joal not sure I understand?
[14:24:29] elukey: nice!
[14:24:52] ottomata: currently all the data needs to be transferred to presto every time it is queried
[14:25:00] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Develop a tool or integrate feature in existing one to visualize WMCS edits data - https://phabricator.wikimedia.org/T226663 (10Milimetric) @srishakatux / @bd808: what would you like to see in your dashboard? Off the t...
[14:25:35] ottomata: what would the schema registry UI do?
[14:25:48] ottomata: With alluxio, if some data is queried over and over, it'll be cached locally on the presto nodes instead of being hdfs-copied to them at every query
[14:27:34] oh
[14:29:15] 10Analytics: Import siteinfo dumps onto HDFS - https://phabricator.wikimedia.org/T234333 (10JAllemandou)
[14:29:36] (03PS1) 10Joal: Add site-info dump type to importer [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540124 (https://phabricator.wikimedia.org/T234333)
[14:30:03] ottomata: would you have a minute to batcave on spark memory?
[14:30:34] milimetric: i was mostly searching for stuff to satisfy use cases like "
[14:30:34] As an analyst/product manager I want to be able to search through existing schemas to find which data is being collected and how the data is defined in the event system.
[14:30:34] "
[14:30:36] joal sure!
[14:30:59] actually joal gimme 4 mins
[14:31:03] np
[14:31:38] ottomata: I thought Apache Atlas was giving us that
[14:32:05] joal, ottomata - this is a draft of the procedure to swap the zk nodes https://etherpad.wikimedia.org/p/analytics-zk-migration
[14:32:13] let me know your thoughts when you have a minute
[14:32:23] should be simple enough, but better to triple check
[14:33:36] elukey: the only downside I can think of is the loss of the historical jobs for the time - but seems small
[14:33:57] milimetric: i dunno maybe...?
[14:34:05] yes that is the downside, unless we dump the status of zookeeper etc..
[14:34:10] schema / config / governance is a long way off and not prioritized i think
[14:34:12] anyway
[14:34:12] too much work
[14:34:19] i was just browsing around and wanted to note that thing
[14:34:36] the thing is great, we should deploy it and get over everyone's fear of .NET and be happy
[14:34:58] MS basically just threw $$ at developer sadness, it's great
[14:35:20] (03CR) 10Milimetric: Transition data rows to using time ranges instead of timestamps (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/531148 (https://phabricator.wikimedia.org/T230514) (owner: 10Fdans)
[14:35:34] (03CR) 10Milimetric: "check experimental" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/531148 (https://phabricator.wikimedia.org/T230514) (owner: 10Fdans)
[14:36:00] (03CR) 10Milimetric: "heh, maybe it doesn't work like that anymore? :)" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/531148 (https://phabricator.wikimedia.org/T230514) (owner: 10Fdans)
[15:04:02] \o/ phab inbox 0
[15:04:16] heya elukey yt?
[15:05:03] i think we accidentally skipped adding an-worker1088 to net_topology! :)
[15:05:04] https://gerrit.wikimedia.org/r/c/operations/puppet/+/474904/3/hieradata/common.yaml
[15:05:14] unless we did it on purpose
[15:05:15] do you know?
[15:06:00] ottomata: nono surely not on purpose
[15:06:07] ok going to add it
[15:06:15] won't bother restarting but just in case
[15:06:18] but is it in the default rack/whatever?
[15:06:20] just so we have it there next time
[15:06:21] yes
[15:06:31] that is bad, because we have an alarm
[15:06:39] so it doesn't work
[15:06:43] we do?
[15:06:47] can you wait a sec before fixing it?
[15:06:49] ya
[15:06:53] i'll just make the patch
[15:08:53] ah snap!
[15:08:54] Rack: /eqiad/default/rack
[15:08:54] 10.64.36.100:50010 (an-worker1088.eqiad.wmnet)
[15:09:02] we have /usr/local/lib/nagios/plugins/check_hdfs_topology
[15:09:14] but it does sudo -u hdfs hdfs dfsadmin -printTopology | grep -q 'Rack: default'
[15:09:32] this is why it doesn't work
[15:09:50] we could simply grep for default?
[15:10:03] grep -iq 'default'
[15:10:37] or maybe something more elaborate like 'Rack.*default.*'
[15:11:17] sudo -u hdfs hdfs dfsadmin -printTopology | egrep 'Rack:.*default.*'
[15:12:16] sending a code patch
[15:14:26] ottomata: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/540150/
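Putting the pieces together, the fix discussed above presumably boils down to something like this sketch of the check (Nagios-style exit codes; the exact plugin wording in the patch is assumed):

```bash
#!/bin/bash
# Sketch of the fixed check_hdfs_topology: alert if any DataNode is still
# registered under a default rack, whatever the full default-rack path is.
if sudo -u hdfs hdfs dfsadmin -printTopology | egrep -q 'Rack:.*default.*'; then
    echo "CRITICAL: datanode(s) found in a default rack"
    exit 2
fi
echo "OK: all datanodes have an explicit rack assigned"
exit 0
```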
[15:19:43] we can deploy your patch when we switch the zk clusters?
[15:25:29] ah!
[15:25:57] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) Summary of progress: * set up krb1001 in eqiad and krb2001 in codfw * set up basic replication between 1001 and 2001 via kprop/kpropd * documented ever...
[15:44:09] elukey: we can merge my patch anytime
[15:44:21] and wait for whatever next nm/rm restarts we do
[15:44:31] ....and nodemanagers?
[15:44:41] i think maybe just the nm and rm use net topology?
[15:45:49] +1
[15:58:41] (03PS1) 10Mforns: Migrate reports from MySQL EventLogging to Hive [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/540159 (https://phabricator.wikimedia.org/T223414)
[16:14:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move the Analytics Refinery to Python 3 - https://phabricator.wikimedia.org/T204735 (10elukey) Another weird use case happened: after running `apt-get autoremove` on an-worker1080, python2 dependencies got cleaned up but then druid_loader.py caused some fai...
[16:19:17] (03PS2) 10Mforns: Migrate reports from MySQL EventLogging to Hive [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/540159 (https://phabricator.wikimedia.org/T223414)
[16:30:33] (03CR) 10Milimetric: [C: 03+1] "I just did a sanity check on the queries, they look fine to me. I wouldn't worry too much since you can just rerun in case of a problem." [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/540159 (https://phabricator.wikimedia.org/T223414) (owner: 10Mforns)
[16:43:02] a-team: Asheville, NC too! That's a great option
[16:43:06] best ice cream I've ever had
[16:51:40] * elukey off!
[16:51:41] o/
[17:06:30] 10Analytics, 10MinervaNeue, 10Readers-Web-Backlog: MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (10Jdlrobson)
[18:57:52] joal: don't suppose you are around?
[19:10:27] 10Analytics, 10Fundraising-Backlog, 10Operations, 10SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10Nuria) Is @herron going to correct the patch per @MoritzMuehlenhoff's guidelines? @EYener: You should ha...
[19:50:25] hey ottomata - passing by now - how may I help?
[19:54:53] 10Analytics, 10Fundraising-Backlog, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (10herron) >>! In T233636#5536946, @MoritzMuehlenhoff wrote: > There are two issues wi...
[19:56:57] hm - gone again - we'll see tomorrow
[20:09:49] oh hey
[20:09:50] sorry!
[20:09:53] joal ok tomorrow is fine
[20:10:04] no worries, i'm deep in spark land
[20:10:09] maybe getting somewhere...not sure
[20:16:39] hey a-team: is there a process around de-whitelisting a schema? the Growth team is looking to stop storing Help Panel data. Didn't see this mentioned on wikitech, but maybe I was looking in the wrong place?
[20:26:41] Nettrom: i don't think anyone has ever asked to do that! :)
[20:27:04] I think the process would be the same as whitelisting tho, but with less review required?
[20:27:25] ottomata: I was suspecting that our request would be somewhat rare, yes ;)
[20:27:26] although, i don't think anything will automatically delete the old data
[20:27:33] it'll just start sanitizing the new stuff
[20:27:38] we'd have to manually sanitize the old stuff...i think
[20:29:08] what I'm thinking the process might look like is: 1) I create a phab task; 2) submit a patch to remove the schema from the whitelist; 3) back up relevant data we need to keep (part of a different experiment); 4) after the patch is deployed, ask someone to delete the table
[20:29:13] would that sound about right?
[20:29:58] sounds right to me!
[20:31:54] cool! I'll make sure a phab task is opened once we're ready to start moving
[20:31:58] and thanks!
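A hedged sketch of what steps 3 and 4 of that process could look like on the cluster side; every name below (backup database, table, partition filter) is hypothetical, since the actual cleanup task had not been filed yet:

```bash
# 3) back up the rows that need to be kept (all names here are hypothetical)
hive -e "CREATE TABLE backups.help_panel_experiment AS
         SELECT * FROM event.helppanel WHERE year = 2019;"
hdfs dfs -du -s -h /user/hive/warehouse/backups.db/help_panel_experiment  # sanity check

# 4) once the whitelist patch is deployed, an admin drops the old table
hive -e "DROP TABLE event.helppanel;"
```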
[20:47:22] 10Analytics, 10Fundraising-Backlog, 10Fundraising Sprint Sysadmin Kane, 10Fundraising Sprint T 2019: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (10Ejegg)
[22:00:14] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) The Hive table is looking good and new data is coming in. I moved the old data away and created...
[22:03:55] PROBLEM - HDFS Namenode RPC 8020 call queue length on an-master1001 is CRITICAL: 599 ge 20 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[22:06:51] RECOVERY - HDFS Namenode RPC 8020 call queue length on an-master1001 is OK: (C)20 ge (W)10 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[22:17:51] PROBLEM - HDFS Namenode RPC 8020 call queue length on an-master1001 is CRITICAL: 570 ge 20 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[22:33:14] (03CR) 10Nuria: [C: 03+1] "Looks good, if we have tested the job +2" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504) (owner: 10Joal)
[22:43:45] RECOVERY - HDFS Namenode RPC 8020 call queue length on an-master1001 is OK: (C)20 ge (W)10 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:04:47] PROBLEM - HDFS Namenode RPC 8020 call queue length on an-master1001 is CRITICAL: 787 ge 20 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:09:37] RECOVERY - HDFS Namenode RPC 8020 call queue length on an-master1001 is OK: (C)20 ge (W)10 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:18:18] PROBLEM - HDFS Namenode RPC 8020 call queue length on an-master1001 is CRITICAL: 610 ge 20 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:23:36] RECOVERY - HDFS Namenode RPC 8020 call queue length on an-master1001 is OK: (C)20 ge (W)10 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:33:38] PROBLEM - HDFS Namenode RPC 8020 call queue length on an-master1001 is CRITICAL: 620 ge 20 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:37:25] (03CR) 10Nuria: Migrate reports from MySQL EventLogging to Hive (032 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/540159 (https://phabricator.wikimedia.org/T223414) (owner: 10Mforns)
[23:38:00] RECOVERY - HDFS Namenode RPC 8020 call queue length on an-master1001 is OK: (C)20 ge (W)10 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=54&fullscreen
[23:39:36] (03CR) 10Nuria: Add oozie job to load top mediarequests data (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) (owner: 10Fdans)
[23:40:40] (03CR) 10Nuria: [C: 03+1] "Change looks fine" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540124 (https://phabricator.wikimedia.org/T234333) (owner: 10Joal)