[00:08:50] mforns: it did
[00:11:51] nuria, the last thing I can think about is.. maybe the since and until date format is wrong, the wikitech docs say YYYY-MM-DDTHH:00:00
[00:12:18] nuria, but the inline docs in HiveToDruid.scala say the format should be: YYYY-MM-DDTHH
[00:12:24] Maybe that's the problem
[00:12:46] mforns: i see, let me see one sec what the system timers do
[00:13:40] nuria, timers use relative values
[00:13:41] https://www.irccloud.com/pastebin/Qw5Od8FF/
[00:14:13] hmmm
[00:14:19] mforns: right, but they execute the date command to pass them as a string
[00:14:30] mforns: i mean puppet does that
[00:14:49] yes..
[00:15:04] mforns: ah, but no "'"
[00:15:14] hmmm
[00:15:16] right
[00:17:39] mforns: will try again and correct docs, looks a little unlikely to work
[00:18:09] nuria, tomorrow will have a look
[00:18:42] Analytics, Analytics-Cluster, User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (EBernhardson)
[00:19:08] mforns: k super thanks no worries
[00:19:09] Analytics, Analytics-Cluster, User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (EBernhardson)
[00:19:35] byeeeee!
[00:52:43] Analytics, Product-Analytics: Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (Tbayer)
[00:57:34] mforns: corrected docs, things need property files https://wikitech.wikimedia.org/wiki/Analytics/Systems/Hive_to_Druid_Ingestion_Pipeline#How_to_run_from_the_command_line
[00:57:44] mforns: or some escaping i cannot figure out
[00:57:58] Analytics, Readers-Web-Backlog: [Bug] Many ReadingDepth validation errors logged - https://phabricator.wikimedia.org/T216063 (Jdlrobson) The event appears valid for the schema: > "webHost": "www.google.com.hk", The webhost is invalid. It's likely that a proxy on www.google.com.hk is malforming the data...
[01:05:54] (PS1) HaeB: Update EventLogging whitelist with some fields that were recently added to ReadingDepth [analytics/refinery] - https://gerrit.wikimedia.org/r/490514 (https://phabricator.wikimedia.org/T216096)
[01:08:14] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (Tbayer) Found (and fixed) an oversight regarding ReadingDepth: T216096
[01:17:45] Analytics, Patch-For-Review: ReadingDepth schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209051 (Tbayer) It looks like we had forgotten to whitelist the actual `pageID` field in addition to the page title, probably because it was only introduced shortly after th...
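For reference, a minimal sketch (the helper name is hypothetical, not from the refinery code) of producing a since/until string in the "YYYY-MM-DDTHH" form that the inline docs in HiveToDruid.scala describe, next to the "YYYY-MM-DDTHH:00:00" form the wikitech page used:

```python
from datetime import datetime

def hive_to_druid_timestamp(dt: datetime) -> str:
    # "YYYY-MM-DDTHH" per the inline docs in HiveToDruid.scala;
    # the wikitech docs quoted above said "YYYY-MM-DDTHH:00:00" instead.
    return dt.strftime("%Y-%m-%dT%H")

print(hive_to_druid_timestamp(datetime(2019, 2, 13, 6)))  # -> 2019-02-13T06
```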
[01:18:41] mforns: ok, was able to index but cannot find the data source, let's touch base tomorrow
[01:33:43] mforns: i see it on the druid admin interface so it is there, i am going to call it success
[01:34:18] mforns: success
[02:43:38] Analytics, Product-Analytics: Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (Tbayer) PS: patch is at https://gerrit.wikimedia.org/r/490514 (seems @gerritbot is lagging a bit currently)
[04:10:42] Analytics, Product-Analytics: Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (Tbayer) NB: The names of these sample fields are spelled with underscores in Hive (e.g. `page_issues_b_sample`, see below) but with dashes in the [[https:/...
[05:02:52] Analytics, Analytics-Wikistats: Year total(2018) legend in the charts is misleading - https://phabricator.wikimedia.org/T216104 (Arjunaraoc)
[05:05:27] Analytics-Kanban: yearly labels in wikistats say 2017 - https://phabricator.wikimedia.org/T216105 (Nuria)
[05:05:40] Analytics, Analytics-Kanban: yearly labels in wikistats say 2017 - https://phabricator.wikimedia.org/T216105 (Nuria)
[05:06:17] Analytics, Analytics-Kanban, Patch-For-Review: Add new cluster to superset db config - https://phabricator.wikimedia.org/T215680 (Nuria) Open→Resolved
[05:06:29] Analytics, Analytics-Kanban: Refactor Sqoop, join actor and comment from analytics replicas - https://phabricator.wikimedia.org/T210522 (Nuria)
[05:06:31] Analytics, Analytics-Kanban, Patch-For-Review: Update datasets definitions and oozie jobs for dual-sqoop of comments and actors - https://phabricator.wikimedia.org/T210542 (Nuria) Open→Resolved
[05:06:50] Analytics, Analytics-Kanban, Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (Nuria) Open→Resolved
[05:07:09] Analytics, Analytics-Kanban: Check home leftovers of user imarlier (Ian Marlier) - https://phabricator.wikimedia.org/T213702 (Nuria) Open→Resolved
[05:07:24] Analytics, Analytics-Kanban: Refactor Sqoop, join actor and comment from analytics replicas - https://phabricator.wikimedia.org/T210522 (Nuria)
[05:07:28] Analytics, Analytics-Kanban, Patch-For-Review: Update sqoop to work with the new schema - https://phabricator.wikimedia.org/T210541 (Nuria) Open→Resolved
[05:07:53] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (Nuria)
[05:08:03] Analytics, Analytics-Kanban, DBA, Data-Services, and 2 others: Not able to scoop comment table in labs for mediawiki reconstruction process [EPIC] - https://phabricator.wikimedia.org/T209031 (Nuria) Open→Resolved
[05:08:49] Analytics, Analytics-Kanban: Refactor Sqoop, join actor and comment from analytics replicas - https://phabricator.wikimedia.org/T210522 (Nuria) Open→Declined
[05:08:51] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (Nuria)
[05:09:04] Analytics, Analytics-Kanban: Clean up home dirs for users jamesur and nithum - https://phabricator.wikimedia.org/T212127 (Nuria) Open→Resolved
[05:09:37] Analytics, User-Elukey: Kerberos service running in production - https://phabricator.wikimedia.org/T211836 (Nuria)
[05:09:40] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Set up an Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (Nuria) Open→Resolved
[05:10:34] Analytics-EventLogging, Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (Nuria)
[05:10:36] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: [EL sanitization] Make cron send alert emails if job fails before calling refine - https://phabricator.wikimedia.org/T202429 (Nuria) Open→Resolved
[05:17:47] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Nuria) Looks like we are using git fat here as a means to deploy from stats machines to prod and we need a better way to do that, the stats machines should be used...
[05:18:08] Analytics, Analytics-Kanban: Refactor Sqoop, join actor and comment from analytics replicas - https://phabricator.wikimedia.org/T210522 (Nuria)
[05:18:10] Analytics, Analytics-Kanban, Patch-For-Review: Update refinery-source jobs to join labsdb with actor and comment - https://phabricator.wikimedia.org/T210543 (Nuria) Open→Resolved
[05:18:39] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (Nuria) This is such an awesome task to close.
[05:18:44] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (Nuria) Open→Resolved
[05:19:08] Analytics-EventLogging, Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (Nuria)
[05:19:11] Analytics, Analytics-Kanban, Patch-For-Review: [EL sanitization] Write and productionize script to drop partitions older than 90 days in events database - https://phabricator.wikimedia.org/T199836 (Nuria) Open→Resolved
[05:19:43] Analytics, Analytics-Kanban, Patch-For-Review: Drop old mediawiki_history_reduced snapshots - https://phabricator.wikimedia.org/T197888 (Nuria) Ping @fdans to confirm that dropping is happening as intended.
[05:20:40] Analytics, Analytics-Kanban, Chinese-Sites, Patch-For-Review: Add Chinese Wikiversity edit-related metrics to Wikistats 2 - https://phabricator.wikimedia.org/T213290 (Nuria) Open→Resolved
[05:23:02] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Punjabi Wikisource WikiStats 2.0 - https://phabricator.wikimedia.org/T215082 (Nuria) This data should appear on the March snapshot, correct? cc @JAllemandou
[05:23:41] Analytics, Analytics-Kanban, Patch-For-Review: Add 'mediawiki_history_unchecked' dataset to oozie - https://phabricator.wikimedia.org/T213524 (Nuria) Open→Resolved
[05:26:15] Analytics, Analytics-Kanban, Patch-For-Review: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (Nuria) After our talk with ops and cloud, it looks like we are going to need to move this...
[05:28:08] Analytics, Analytics-Kanban, Patch-For-Review: Add new wikis to analytics - https://phabricator.wikimedia.org/T209822 (Nuria) Open→Resolved
[06:14:28] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (elukey) >>! In T215231#4951831, @Cmjohnson wrote: > @elukey is this a 1G or 10G rack? I see that labsdb1010 and 1011 have 1G, so I'd go for...
[07:43:08] (CR) HaeB: "Still checking the correct spelling of the field names (underscores or dashes), see https://phabricator.wikimedia.org/T216096#4953210" [analytics/refinery] - https://gerrit.wikimedia.org/r/490514 (https://phabricator.wikimedia.org/T216096) (owner: HaeB)
[08:29:19] http://yarn.wikimedia.org/ seems broken - was this deactivated?
[08:30:18] Hey HaeB, checking
[08:30:32] it redirects to http://an-master1002.eqiad.wmnet:8088/ after logging in
[08:32:41] Good morning elukey
[08:33:18] Hey joal, how are you feeling?
[08:33:28] HaeB: indeed, 1002 is the master, I am going to check what happened
[08:33:33] and then restore 1001 in a bit
[08:34:42] elukey: Not yet great, but on the way - a lot less pain in the back and Naé's fever is gone, so she has gone to the creche today
[08:35:30] joal: glad that something is fixed then :)
[08:35:34] also, i'm seeing odd behavior with this job right now http://an-master1001.eqiad.wmnet:8088/proxy/application_1547732626747_95814/ ...
[08:36:37] HaeB: what kind of odd behavior?
[08:36:39] ...the query seems to take way longer than one would expect, and has been stuck at the same cumulative cpu time for 45min now:
[08:36:51] https://www.irccloud.com/pastebin/JcHZth4k/
[08:37:28] hope you'll feel better soon, joal!
[08:37:36] Thanks HaeB :)
[08:37:37] HaeB: I may know what's happening, but I'd need you to kill/restart the job
[08:37:56] I think that the Hadoop testing cluster's weird Yarn state was part of the issue
[08:37:56] elukey: I'm interested to know :)
[08:38:21] elukey: no problem, or feel free to kill it yourself
[08:38:25] hm - a misfit between testing and prod clusters, elukey?
[08:38:44] while checking Yarn for the testing workflows, I noticed that there were jobs listed by yarn application -list on the testing cluster
[08:38:55] In theory there shouldn't be any overlap
[08:39:04] Nope
[08:39:13] but I have restarted yarn in the testing cluster
[08:39:27] I now don't see anything
[08:39:42] I am wondering if it was a past config still running by mistake
[08:39:58] (the kill command given in the hive messages after launching the query is wrong btw:
[08:40:02] "Starting Job = job_1547732626747_95814, Tracking URL = http://an-master1001.eqiad.wmnet:8088/proxy/application_1547732626747_95814/
[08:40:02] Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1547732626747_95814"
[08:40:28] https://www.irccloud.com/pastebin/YgcxAvRI/
[08:40:49] yep yep
[08:41:08] I am still seeing application_1547732626747_95814 in yarn application -list though
[08:43:12] not anymore for me elukey
[08:43:41] also elukey - app-numbers have restarted from scratch
[08:43:57] I just restarted an-master1002 as a test
[08:44:07] so I think that they are sharing zk config
[08:44:07] Ah - makes sense
[08:44:10] it is the only explanation
[08:44:39] I am seeing 'camus-webrequest' in an1028 logs
[08:45:01] just got an impressively long error msg after "tbayer@stat1004:~$ yarn application -kill application_1547732626747_95814" ....
[08:45:23] ending with "19/02/14 08:42:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to an-master1001-eqiad-wmnet
[08:45:23] Application application_1547732626747_95814 has already finished"
[08:45:38] yep lemme try to fix it first
[08:46:10] ok i'll let you folks do your thing and get out of your way ;)
[08:46:19] thanks for the report!
[08:46:29] so
[08:46:29] [zk: localhost:2181(CONNECTED) 2] ls /yarn-leader-election/analytics-hadoop
[08:46:32] np, thanks for looking into it!
[08:46:32] [ActiveBreadCrumb, ActiveStandbyElectorLock]
[08:46:34] these are separate
[08:46:37] [zk: localhost:2181(CONNECTED) 3] ls /yarn-leader-election/analytics-test-hadoop
[08:46:40] [ActiveBreadCrumb, ActiveStandbyElectorLock]
[08:46:56] elukey: please let me know if/how I can help
[08:50:11] joal: brainbounce in here is great :)
[08:50:19] so here's what I am seeing
[08:50:34] 1) tried to restart yarn on an-master1002 (was active)
[08:51:02] 2) state didn't change (1002 active 1001 standby)
[08:51:26] 3) analytics1028's yarn logs show some activity, namely that yarn has been fenced
[08:55:05] elukey: can you elaborate on "fenced"
[08:55:06] ?
[08:56:51] elukey: creation-times for camus webrequest camus-files are a bit odd for the last hour
[08:57:44] joal: now it should look better
[08:57:58] elukey: as if camus jobs had been stopped between ~07:45 and ~08:45
[08:58:02] I have stopped both yarns on the testing cluster and restarted the one on 1002
[08:58:07] now 1001 is the master correctly
[08:58:33] joal: I think that they were trying to run, for some obscure reason, on the testing cluster
[08:59:01] elukey: ok - weird !
[08:59:05] I can see a list of jobs in the testing yarn logs that are clearly production
[08:59:11] right
[08:59:25] thanks for making clusters agree on who's who ;)
[08:59:27] to answer your question - fencing should be used when more than one yarn RM tries to be master
[08:59:51] ok
[09:01:06] oh my
[09:01:52] Yarn stores application ids in ZK!
[09:02:11] under /rmstore
[09:02:16] that is not partitioned by cluster
[09:02:21] * elukey cries in a corner
[09:02:23] elukey: NO WAY !!!!
[09:02:27] :(
[09:02:41] this explains it!
[09:03:13] it does indeed - But man, the need to have multiple ZK clusters for multiple yarn instances sounds not right to me :(
[09:03:35] elukey: maybe we have a way to manually choose the ZK-folder, and therefore change it by conf?
[09:04:31] nono yarn.resourcemanager.zk-state-store.parent-path
[09:04:37] Full path of the ZooKeeper znode where RM state will be stored. This must be supplied when using org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore as the value for yarn.resourcemanager.store.class
[09:04:55] default: /rmstore
[09:05:08] what a lovely default! -.-
[09:05:25] elukey: :S
[09:06:10] Analytics, Product-Analytics: Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (Tbayer) @Jdrewniak points out that in https://github.com/wikimedia/mediawiki-skins-MinervaNeue/blob/f07985c6dee5106da8f381a47214e7349fcd147e/resources/skins.min...
[09:06:19] !log Re-run webrequest-load-wf-text-2019-2-14-6
[09:06:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:06:22] HaeB: found the issue, yarn should work fine now
[09:06:33] thanks!
[09:06:38] really sorry, we didn't know about this Yarn "feature"
[09:06:39] sigh
[09:07:01] https://yarn.wikimedia.org/cluster/scheduler looks good for me now
[09:07:32] elukey: Don't blame yourself - There are so many of those hidden settings that keeping on learning seems like the correct approach :)
[09:07:49] !log rerun mediawiki-history-wikitext-wf-2019-01
[09:07:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:08:20] joal:
[09:08:21] [zk: localhost:2181(CONNECTED) 8] stat /rmstore/ZKRMStateRoot/RMAppRoot
[09:08:26] numChildren = 10011
[09:08:32] hahahahaha
[09:08:40] this is where the application ids are kepy
[09:08:43] *kept
[09:08:47] insane
[09:09:40] elukey: shall we create a task about configuring this node-path through puppet?
[09:09:50] yes definitely
[09:10:04] probably keeping hadoop "production" under /rmstore
[09:10:19] but the testing cluster under /rmstore-test or similar
[09:11:07] LGTM :)
[09:11:40] PROBLEM - Hadoop NodeManager on analytics1032 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:02] PROBLEM - Hadoop NodeManager on analytics1037 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:12] PROBLEM - Hadoop NodeManager on analytics1036 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:13] PROBLEM - Hadoop NodeManager on analytics1033 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:22] PROBLEM - Hadoop NodeManager on analytics1035 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:53] PROBLEM - Hadoop NodeManager on analytics1040 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:12:58] PROBLEM - Hadoop NodeManager on analytics1038 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:18:32] yeeeeee
[09:18:46] this is normal, they don't have their parents anymore :P
[09:48:49] so joal it is crazy
[09:49:00] I did a get for /rmstore/ZKRMStateRoot/RMAppRoot/application_1547732626747_93102
[09:49:12] and in there there is the status of the application
[09:49:17] (it was a random one)
[09:49:20] classpaths etc..
[09:50:15] /o\
[09:50:21] why didn't they use a database for that!
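For context, the zkCli inspection above can be reproduced programmatically. A minimal sketch, assuming the kazoo client library and access to the ZooKeeper ensemble on localhost; the znode paths are the ones shown in the log:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="localhost:2181")
zk.start()

# One child znode per YARN application id; because both clusters used the
# default /rmstore parent path, their application state ended up mixed here.
apps = zk.get_children("/rmstore/ZKRMStateRoot/RMAppRoot")
print(len(apps))  # the stat above reported numChildren = 10011

zk.stop()
```

Setting yarn.resourcemanager.zk-state-store.parent-path per cluster (e.g. /rmstore vs /rmstore-test, as discussed above) is what keeps these trees separate.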
[10:04:09] elukey: I have no clue why they didn't use a DB :(
[10:15:08] joal: created https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/490575/
[10:15:15] I'll wait for andrew before merging
[10:15:22] but it should solve the issue
[10:15:24] Thanks a lot elukey
[10:15:45] basically I kept /rmstore for production and the cloud/presto one (they don't share zk)
[10:15:52] and added -clustername to testing
[10:16:07] also now by default we add -clustername
[10:16:24] so there is no risk of "oh snap I forgot bla"
[10:16:38] very interesting morning :D
[10:17:06] elukey: I'm assuming the diff for prod/test is in the CDH patch (can't find it in the puppet one)
[10:19:06] joal: this is the cdh patch https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/490572
[10:19:21] basically adds the parameter with default /rmstore-nameofthecluster
[10:19:35] Got it
[10:20:04] And you overwrite this in puppet for the prod cluster to use /rmstore only
[10:20:35] exactly
[10:20:48] great :)
[10:20:57] otherwise they'll start fresh at first restart
[10:21:02] ending up in a big mess
[10:21:04] :D
[10:21:10] I can forecast that indeed :)
[10:22:59] FIRE FIRE FIRE
[10:23:24] I was so surprised that we didn't get into this kind of weirdness spinning up a new cluster
[10:23:26] elukey: I'm the one shouting and running in circles, you're the one picking up the extinguisher :)
[10:23:49] elukey: So was I - but eh, just have to wait long enough :)
[10:24:17] and I checked by chance - I wanted to know if camus was importing data
[10:24:22] in the testing cluster
[10:29:07] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Punjabi Wikisource WikiStats 2.0 - https://phabricator.wikimedia.org/T215082 (JAllemandou) It should appear in February snapshot, generated in March, yes :)
[10:30:06] elukey: just saw the task about network-links for the DB-beast
[10:31:04] elukey: given the point of this machine is to send data to hadoop, shouldn't we push for higher throughput (namely 10g instead of 1g)?
[10:35:05] joal: in theory we could yes, saturating 1G is not that easy though, those hosts are already serving a lot of users without any problem
[10:35:54] maybe I can add a note in the task asking if we can revise the choice later on (if needed)
[10:36:04] basically getting a 10g port on the switch only if needed
[10:41:54] elukey: I have no clue of how much and how fast the host can gather from disk/ram, and therefore can't help forecasting the network need
[10:43:34] joal: how many mappers are used now and how many are we planning to use with the new host?
[10:43:46] (when we'll be the only user so no need to be gentle :P)
[10:44:19] elukey: currently we are limited by the number of allowed connections-per-user (10)
[10:44:52] joal: ah yes makes sense
[10:45:05] lemme update the task with the 1vs10g question
[10:45:11] elukey: At max there are 4 mappers in parallel currently, whether through multiple small jobs or a single big one
[10:45:58] elukey: If we have our own dedicated machine, we'll probably ask to have a dedicated user with a different limitation on number of connections, and therefore we should be able to query MOAT :)
[10:46:02] MOAR sorry
[10:46:41] Analytics, Readers-Web-Backlog: [Bug] Many ReadingDepth validation errors logged - https://phabricator.wikimedia.org/T216063 (phuedx) The "Extra data:" error is raised by `json.loads` as it encounters the `;` character at the end of the JSON string: `lang=python, name=[0] import json json.loads('{"even...
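phuedx's truncated snippet above is easy to reproduce. A short sketch of the "Extra data" failure mode (the payload here is made up for illustration):

```python
import json

try:
    # A trailing ';' after an otherwise valid JSON object, as described above.
    json.loads('{"event": {"webHost": "www.google.com.hk"}};')
except json.JSONDecodeError as e:
    print(e)  # "Extra data: line 1 column ..." pointing at the stray ';'
```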
[10:47:47] joal: of course that makes sense, even to cut down time to completion
[10:47:55] indeed elukey
[10:48:18] elukey: It'll also be very interesting to see how much the mysql box can handle
[10:57:35] Analytics, Analytics-Kanban, DBA, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui)
[11:01:38] yep agreed!
[11:05:17] Analytics, Product-Analytics, Readers-Web-Backlog (Tracking): Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (ovasileva)
[11:10:49] Analytics, RESTBase, Core Platform Team Backlog (Later), Services (blocked): Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (JAllemandou) Hey @Pchelolo - I think talking to the traffic team should be the way to go here. I ran a query to get result...
[11:18:34] Analytics: Coarse alarm on data quality for refined data based on entropy calculations - https://phabricator.wikimedia.org/T215863 (JAllemandou) I imagine we would add entropy-stats tables generated hourly (for hourly datasets). The entropy-generation code could (and should!) be generic and reusable, and th...
[11:22:59] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (JAllemandou) just to make sure I have the correct sequence of actions in mind: - Update `event.navigationtiming` hive table so that `deviceMemory...
[11:24:23] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Thanks @EBernhardson! Buried in the previous updates there is a solution to this mess, namely using python3.6 via snapshot.debian.org....
[11:36:36] ah wait
[11:36:39] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (MoritzMuehlenhoff) >>! In T148843#4953772, @elukey wrote: > 2) the null pointer is due to some code for the Hawaii GPU cards (like ours), so no...
[11:36:39] I just noticed yarn.resourcemanager.fs.state-store.uri
[11:40:03] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Yep definitely, I am all for it. I hoped to get this GPU working beforehand to have a vague idea about what card we needed (and what qu...
[11:43:19] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (MoritzMuehlenhoff) https://rocm.github.io/ROCmInstall.html#supported-gpus should serve as a useful enough base to select a new GPU I guess (we'...
[12:04:31] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) I'd stick with: ` GFX9 GPUs “Vega 10” chips, such as on the AMD Radeon RX Vega 64 and Radeon Instinct MI25 “Vega 7nm” chips ` The pri...
[12:06:35] gpus /o\
[12:06:54] joal: the last one from AMD (Instinct 7nm) is around 10k :D
[12:07:09] the other ones look less pricy though
[12:13:51] elukey: wow
[12:14:45] elukey: do you know if we are in read-only mode for wikitech currently?
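As an aside on the T215863 comment above (coarse data-quality alarms based on entropy calculations): a generic sketch of the kind of per-column entropy stat it describes. This is illustrative only, not the refinery implementation:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A refined hour whose column diversity suddenly collapses (e.g. everything
# becomes one value) would show a sharp entropy drop versus previous hours,
# which is the kind of coarse signal an hourly alarm could key off.
print(shannon_entropy(["FF", "Chrome", "Safari", "Chrome"]))  # 1.5 bits
print(shannon_entropy(["Chrome"] * 4))                        # 0 bits (alarm-worthy)
```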
[12:15:35] Analytics, Readers-Web-Backlog: [Bug] Many ReadingDepth validation errors logged - https://phabricator.wikimedia.org/T216063 (phuedx) >>! In T216063#4953716, @phuedx wrote: > AFAICT [[ https://github.com/wikimedia/eventlogging/blob/08a1dff0efb4559a7ac8cbcc0633d34ebb1c57b8/eventlogging/parse.py#L110-L114...
[12:16:17] joal: in theory no
[12:16:38] I just tried to hit "edit" and it seems to be working
[12:16:49] do you get specific errors?
[12:17:11] elukey: nope, I don't get edit mode
[12:17:23] strange
[12:18:59] elukey: logged out then back in, still no luck :(
[12:19:38] elukey: I tried to edit this page: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Systems/AQS
[12:19:57] I am in edit mode now :(
[12:20:02] any JS errors or similar?
[12:20:07] might be a visual editor issue
[12:20:11] try also edit source
[12:20:16] I tried source as well
[12:21:01] indeed elukey - JS errors
[12:21:45] there are always JS errors :D
[12:21:49] :D
[12:21:54] going to lunch!
[12:22:03] see ya
[12:57:11] (CR) Joal: [C: -1] "Comment inline, and another big one here: the success-email should be sent when denormalize-check succeeds (I'm sorry Francisco to ask you" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans)
[12:57:28] Heya fdans - Sorry, another bunch of comments here --^ :S
[13:02:48] yessssssir
[13:43:02] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (elukey) I accidentally removed the 24 files under /srv/home/paolotti on stat1007 while cleaning up old data, apologies. IIRC they were mostly configuration files but hopefully I didn't delete any imp...
[13:48:24] (CR) Elukey: Introduce analytics-mysql (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) (owner: Elukey)
[13:48:54] joal: forgot to tell you, mediawiki-config is deployed on notebooks/stats
[13:49:28] elukey: yessir, you told me two days ago ;)
[13:49:39] ah yes but it was a broken path
[13:49:44] nevermind :)
[13:49:47] Ah - ok :)
[13:49:50] lemme know if it works!
[13:54:21] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Ladsgroup) >>! In T214089#4953296, @Nuria wrote: > Looks like we are using git fat here as a means to deploy from stats machines to prod and we need a better way t...
[13:58:02] (PS4) Elukey: Introduce analytics-mysql [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386)
[14:14:22] (PS7) Fdans: Change email send workflow to notify of completed jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894)
[14:15:11] (CR) Fdans: "Moved the subworkflow to check_denormalize and added it upon completion of the reduced job." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans)
[14:17:58] Analytics, Readers-Web-Backlog: [Bug] Many ReadingDepth validation errors logged - https://phabricator.wikimedia.org/T216063 (phuedx) Following @Tbayer's advice, I did a little digging in `wmf.webrequest`: There were 253 events logged that didn't end with the expected terminating character sequence ("%7...
[14:27:49] (CR) Elukey: [C: -1] "Still WIP" [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) (owner: Elukey)
[14:35:10] (PS5) Elukey: Introduce analytics-mysql [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386)
[14:42:28] (CR) Alaa Sarhan: [C: +1] Add methods for new hosts and changing good_articles.php to use that (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/490105 (https://phabricator.wikimedia.org/T213894) (owner: Ladsgroup)
[14:44:32] (CR) Ladsgroup: Add methods for new hosts and changing good_articles.php to use that (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/490105 (https://phabricator.wikimedia.org/T213894) (owner: Ladsgroup)
[14:45:59] Amir1: o/
[14:46:14] hey there
[14:46:38] one question if you have time - we are planning to set dbstore1002's staging to read-only for some hours on Monday, to then enable it on dbstore1005
[14:46:47] would that timing be ok for you guys?
[14:49:57] elukey: I need to check
[14:54:05] (I sent an email the other day but nobody answered :P)
[14:54:30] elukey: I didn't receive it :/
[14:55:00] ah snap wait I think I sent it to your gmail account
[14:55:05] just realized it
[14:55:14] Still I should get it
[14:55:25] maybe I missed it through all the other emails
[14:55:34] subject is "Maintenance proposal for dbstore1002 - staging database migration to dbstore1005 on Monday 18th (EU Morning)"
[14:56:28] Okay, according to puppet we run four types of scripts (some of them write to the database). The first group is run at 03 UTC every day, the second group is run at 12 UTC every day. The third one runs at 1 UTC on the 7th day of the week.
[14:56:43] I don't think these three groups clash with your case
[14:56:57] but the fourth group actually runs every minute. I need to check those
[14:58:06] If it's going to be read-only permanently, we are going to have a problem, but for several hours (as long as it doesn't clash with these three) it should be fine.
[15:02:14] 1002 will be permanently read only, so scripts would have to point to 1005
[15:02:26] Amir1: ^
[15:02:37] (Regarding your email, I found it but I don't know why I missed it)
[15:03:06] milimetric: In the longer term, I know and I'm already building the code to use the new instances: https://gerrit.wikimedia.org/r/490105
[15:03:12] k
[15:03:40] Just need some weeks to get this ironed out and deployed
[15:12:31] Amir1: there will be some hours (like 16) for the read-only period + import on dbstore1005
[15:12:42] elukey: does that timeline work for you? I thought the staging migration was one-time only and permanent. Just trying to prevent a misunderstanding
[15:13:07] that's a lot, it would clash and cause things to fail
[15:13:17] milimetric: what do you mean?
[15:13:32] I wish I could speed up the process of moving this forward
[15:13:51] I think it's possible you two are talking past each other :) quick hangout to confirm?
[15:13:56] Amir1: just to summarize - green light from you for monday?
addshore is travelling atm and is OoO for the conference
[15:14:22] elukey: if it's 16, it would break things
[15:15:08] Amir1, elukey: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
[15:15:20] milimetric: I think IRC is fine
[15:15:48] ok, it sounds to me like you're expecting Amir to use 1005 after the switch, and it sounds to me like Amir is expecting to use 1002 for another few weeks
[15:16:42] 0/
[15:16:45] Let me talk to my managers about it, they might speed up getting this done
[15:16:59] Is this the DB switchover?
[15:17:12] hey there, addshore dbstore1002 is going read-only for 16 hours on Monday
[15:17:13] addshore: only the 'staging' part
[15:17:15] (at least)
[15:17:36] Amir1: sorry probably my bad, only staging on dbstore1002 will be read only starting from Monday
[15:17:38] I hope I summarized it correctly
[15:18:05] elukey: oh good to know but it's the same for us. The scripts make temp tables on dbstore1002
[15:18:10] after 16 hours (more or less, needed for dump + import) we'll have staging up and running on dbstore1005
[15:18:48] elukey: and critically, it will REMAIN read-only after that
[15:18:51] elukey: one thing: Is 1005 currently writeable?
[15:19:14] Amir1: in theory yes, but it does not have all the data that we have on dbstore1002
[15:19:27] nuria, "success" -> cool!
[15:19:29] the procedure is meant to move the last snapshot of data there
[15:19:38] that's fine. we need it for temp tables
[15:19:48] https://gerrit.wikimedia.org/r/c/operations/puppet/+/490085
[15:19:55] elukey: addshore ^ take a look
[15:20:15] I think we need to speed up the process of moving to the new nodes. I can give priority to the staging bit
[15:20:21] nuria, I was just checking it, and I managed to load one more day, there were a couple issues with the casing of the field names, and 'the space' after the backslash xD, but it seemed to work :]
[15:20:40] Amir1: so staging is used only for temp tables, and nothing more?
[15:20:49] you guys don't store anything on it that is valuable?
[15:21:07] Amir1: I'm paranoid now so I want to make sure you confirm: staging db on dbstore1002 will be read-only permanently, not just for 16 hours
[15:21:31] elukey: no AFAIK. addshore can confirm. I can double check the code though
[15:21:54] milimetric: Okay it's noted and it would break things :(
[15:22:49] Amir1: ok, good, glad I wasn't crazy and it was an actual misunderstanding. But can you just switch to writing to 1005 now? And drop the tables on 1002? That way when Luca does the migration, your tables won't get overwritten and you can continue to operate on 1005?
[15:24:31] sure thing but that requires some work: 1- merging the mentioned puppet patch 2- merging this patch https://gerrit.wikimedia.org/r/c/analytics/wmde/scripts/+/490105 and 3- making some more patches and merging them
[15:24:44] it should not be hard but someone should review, merge and test
[15:25:11] hopefully not in that order :)
[15:25:46] one thing that I still don't get is what kind of data/workflow we are talking about
[15:26:45] because I was convinced from a chat with addshore that it was some support script that could have tolerated even days without data
[15:27:00] now this is not what we want of course, but we assume that staging is a scratch pad
[15:27:16] not something that causes major pain if it goes down for some hours
[15:27:25] this needs to be ironed out well
[15:27:39] so I'd say that we are not ready for Monday
[15:27:45] there are too many question marks
[15:28:08] I'd skip to the week after, so hopefully we'll have enough time to do things without hurrying
[15:28:21] but we need to have a chat about those scripts
[15:28:27] Amir1, addshore --^
[15:28:55] (CR) Milimetric: Add methods for new hosts and changing good_articles.php to use that (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/490105 (https://phabricator.wikimedia.org/T213894) (owner: Ladsgroup)
[15:28:57] These run on a daily basis, it's possible that losing data in those cases doesn't matter but we need to tell this to PMs/EMs
[15:29:39] and let them decide
[15:29:40] We do store stuff in staging for the analytics/WMDE/scripts repo, but nothing is suuuppeeer valuable
[15:29:59] If they don't run for a few days that wouldn't be the worst thing
[15:30:16] my point is that there might be the assumption that the database is supported as critical, meanwhile for us (analytics) it is just a scratch pad
[15:30:47] Nope, if all of our data vanished there it wouldn't really matter
[15:30:51] it doesn't sound like we need to delay the migration, elukey
[15:31:31] For the scripts in analytics/WMDE/scripts
[15:31:39] elukey / Amir1: maybe the only thing to do is to pause the jobs and drop the tables on 1002?
[15:32:02] I wouldn't even bother pausing our jobs, just let them fail :)
[15:32:12] yeah
[15:32:18] They won't sound any alarms etc
[15:32:26] Just fail quietly
[15:32:29] just keep Raz/Leszek and Lydia in the loop imo
[15:32:51] I will write an email today
[15:33:00] Yarp, which day is this happening again? I can just send an email now
[15:33:08] Monday
[15:33:12] Cool
[15:33:24] just to clarify: on Monday early EU morning we'll do the following:
[15:33:29] We can write one then! :) Might even be able to fix them all Monday ;)
[15:33:36] 1) set staging as read only on dbstore1002
[15:33:49] 2) mysqldump the db (that takes hours since the host is slooooow)
[15:34:00] 3) import the dump on dbstore1005
[15:34:20] after 3) staging will be read/write only on dbstore1005
[15:34:27] 1 -> 3 takes ~16h
[15:34:47] so ideally we'll have dbstore1005 ready Tuesday early EU morning
[15:34:51] addshore, Amir1 ---^
[15:35:10] Analytics, Research, Article-Recommendation: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov) @Nuria thanks. I've gone through the [[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie | Oozie documentation ]], b...
[15:35:12] Sounds good
[15:35:19] let's do it
[15:35:34] ok so next steps IIUC:
[15:35:43] - check with PMs/etc.. that this is ok
[15:35:58] - prepare the code reviews to switch to dbstore1005 permanently (only for staging)
[15:36:15] if those are green by tomorrow we can proceed, otherwise we'll reschedule
[15:36:18] would that be ok?
[15:37:47] fine for me
[15:38:14] all right, so I'll wait for your ping later on or tomorrow
[15:39:54] addshore: Are you writing the email or should I?
[15:40:17] (CR) Milimetric: Introduce analytics-mysql (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) (owner: Elukey)
[15:40:54] (CR) Milimetric: [V: +2 C: +2] "never mind I'll just add that in my next patch, merging." [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) (owner: Elukey)
[15:41:13] (CR) Elukey: Introduce analytics-mysql (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) (owner: Elukey)
[15:41:22] ahahahah
[15:41:24] :)
[15:41:30] I was about to fix it!
[15:41:31] (CR) Ladsgroup: Add methods for new hosts and changing good_articles.php to use that (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/490105 (https://phabricator.wikimedia.org/T213894) (owner: Ladsgroup)
[15:41:40] elukey: no I couldn't do that to you, it's too nitpicky
[15:41:40] Amir1: I'll let you
[15:41:48] okay
[15:41:56] The internet here is "utter shite"
[15:42:13] haha, have fun :D
[15:43:38] hahahahahah
[15:43:56] I haven't heard "shite" in a long time (/me remembers ireland..)
[15:45:36] elukey: you lived in Ireland? :P
[15:45:43] I did! 3y :)
[15:45:50] Oooooh
[15:46:06] I know that you wouldn't even think about it from my horrible italian accent
[15:46:12] I blame the italian community in there
[15:46:15] too many expats :D
[15:52:30] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (Nuria) Giving my approval here to buy a new GPU card, need to consult with @elukey when it comes to budget but I think we could use part of the...
[15:58:03] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Nuria) @Ladsgroup sounds good, just mention the other ticket in this one so we can follow that work.
[16:13:13] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Halfak) Thank you! I've confirmed that this works. Sorry for the delay @elukey. The last couple weeks have been a bit unusual. Regarding the point that @nuri...
[16:32:19] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Need to triple check with somebody else but from the inventory stat1005 is a Dell PowerEdge 730, that should be equipped with an Intel X...
[16:41:36] ah wait, Andrew is off for today/tomorrow right?
[16:41:49] I was waiting before merging my hadoop patch :D
[16:42:30] joal: if you are on my side we can merge/test tomorrow morning
[16:49:00] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (EBernhardson) As long as it fits in the case, a high end consumer GPU from AMD should be just fine. The most important spec for choosing will p...
[16:51:22] elukey: we can do that (merge and test hadoop patch tomorrow)
[16:56:03] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) IIRC during the last procurement task Rob took care of all the aspects related to power consumption and space, so in theory we should b...
[17:01:42] ping ottomata
[17:02:33] ping fdans
[17:21:44] Analytics, DC-Ops, decommission, User-Elukey: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (RobH)
[17:22:59] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (EBernhardson) I look back over things and it looks like stat1005 is in an R470 case, they advertise compatibility with several full-size nvidia...
[17:24:28] Analytics, DC-Ops, decommission, User-Elukey: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (RobH)
[17:36:04] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Update git lfs on stat1006/7 - https://phabricator.wikimedia.org/T214089 (Nuria) Models used in production should be built in hadoop and pushed directly from there wherever they go, for this there is some work that our team needs to on h...
[17:45:28] Analytics, Analytics-Wikistats: Beta: Provide easier mapping between Wikistats1 metrics and Wikistats2 metrics (example: "active editors") - https://phabricator.wikimedia.org/T187806 (Nuria) a: fdans
[17:47:08] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (JAllemandou)
[17:48:41] Analytics, Analytics-Wikistats: Beta: Provide easier mapping between Wikistats1 metrics and Wikistats2 metrics (example: "active editors") - https://phabricator.wikimedia.org/T187806 (Nuria) Ping @fdans to describe the plan to have a better "bridge" between wikistats1 and wikistats2
[17:48:59] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (EBernhardson) Looks like one more option, a workstation card from AMD, the Vega Frontier has 16GB of memory with very similar compute to the Ve...
[17:51:20] (CR) Joal: "Comments inline :)" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans)
[17:51:24] Analytics, DC-Ops, decommission, User-Elukey: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (RobH)
[17:58:41] Analytics, DC-Ops, decommission, User-Elukey: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (RobH)
[17:59:14] Analytics, DC-Ops, Operations, decommission, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (RobH) a: RobH→Cmjohnson
[17:59:26] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Kanban (Doing), Services (doing): Add monolog adapters for Eventbus - https://phabricator.wikimedia.org/T216163 (Pchelolo) p: Triage→Normal
[17:59:33] Analytics, DC-Ops, Operations, decommission, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (RobH) ready for wipe and unracking steps
[18:27:25] * elukey off!
[18:53:05] Analytics, Analytics-Kanban, User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (Neil_P._Quinn_WMF) >>! In T215589#4946535, @elukey wrote: > Message to everybody: > > Analytics and the Data Persistence team are planning to schedule the official cut off date...
[19:31:25] Analytics, Product-Analytics, Readers-Web-Backlog (Tracking): Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (Tbayer) Blocked on code review and an answer to T216096#4953210 from someone familiar with the whole EL pipeline and the pur...
[19:47:10] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis, Patch-For-Review: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828 (kzimmerman) @mpopov to meet with Josh & Charlotte and figure out engineering steps to unblock this...
[20:07:16] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Smalyshev) I don't have a real opinion on this one. Generally for dump users the only concern is for the dump to be recent enough...
[20:08:34] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis, Patch-For-Review: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828 (Nuria) If we can track them using a variation of method described here: https://phabricator.wikime...
[20:09:52] Analytics, Product-Analytics, Readers-Web-Backlog (Tracking): Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (mforns) @Tbayer Thanks for spotting this! Even if the events arrive to HDFS with the page-issues-X_sample hyphen notation,...
[20:11:28] (PS2) Mforns: Update EventLogging whitelist with some fields that were recently added to ReadingDepth [analytics/refinery] - https://gerrit.wikimedia.org/r/490514 (https://phabricator.wikimedia.org/T216096) (owner: HaeB)
[20:12:05] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/490514 (https://phabricator.wikimedia.org/T216096) (owner: HaeB)
[20:34:49] Analytics, Product-Analytics, Readers-Web-Backlog (Tracking): Whitelist sample flags and page/rev ID fields for ReadingDepth schema - https://phabricator.wikimedia.org/T216096 (Tbayer) Great, thanks a lot! The sample fields were introduced in September, so no need to go further back. (CC @Groceryheist )
[20:48:27] Analytics, Wikimedia-Stream, Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (Pchelolo)
[20:50:58] Analytics, EventBus, Research, Wikidata, and 4 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (Pchelolo) Open→Resolved a: Pchelolo MW train has been deployed, so the events are available for all wikis. The final piece is adding the stream to the do...
[21:31:46] joal, yt?
[22:36:23] edsanders: would you be so kind as to confirm you have access to the cluster and such
[22:45:07] nuria: appears to be working now, thanks!
[22:46:45] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (EBernhardson) >>! In T148843#4954602, @elukey wrote: > Need to triple check with somebody else but from the inventory stat1005 is a Dell PowerE...
[22:53:59] edsanders: ok, let us know if we can help you with the information you were looking for initially
[22:55:53] mforns: i am going to delete testSerachsatisfaction and reindex cause i think something went bad
[22:56:03] mforns: what is the easiest way to delete
[22:56:05] nuria, ok
[22:56:43] nuria, what I do is to 1) use the coordinator console to disable the datasource
[22:57:28] 2) delete the datasource from deep storage by sending an http request to the overlord
[22:57:40] 3) restart turnilo to refresh
[22:57:55] nuria, i can do that, will take me 3 mins
[22:58:06] mforns: nono, i can do it
[22:58:34] k :]
[23:14:54] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (leila) @elukey I will look into this task in the coming weeks. Until then, please don't delete. ;)
[23:15:03] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (leila) a: leila
[23:24:34] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (kzimmerman)
[23:27:52] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (kzimmerman) a: MNeisler @MNeisler will take on the task of creating the schema with guidance from @Neil_P._Quinn_WMF.
[23:38:55] mforns: and... is there any magic so when i reindex the data appears again after having deleted it?
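A rough sketch of the three deletion steps mforns lists above. The endpoints follow the Druid coordinator/overlord HTTP APIs as commonly documented for this era, but the hostnames, port numbers and kill interval are placeholders to verify against the actual cluster:

```python
import requests

DATASOURCE = "testSerachsatisfaction"  # datasource name quoted in the log
COORDINATOR = "http://druid-coordinator.example.org:8081"  # placeholder host
OVERLORD = "http://druid-overlord.example.org:8090"        # placeholder host

# 1) Disable the datasource (what the coordinator console button does).
requests.delete(f"{COORDINATOR}/druid/coordinator/v1/datasources/{DATASOURCE}")

# 2) Submit a kill task to the overlord to remove the disabled segments
#    from deep storage; the interval must cover the ingested data.
kill_task = {
    "type": "kill",
    "dataSource": DATASOURCE,
    "interval": "2019-01-01/2019-03-01",  # placeholder interval
}
requests.post(f"{OVERLORD}/druid/indexer/v1/task", json=kill_task)

# 3) Restarting turnilo so it drops the cached datasource is done out of band.
```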