[03:19:26] <AndyRussG>	 Hi all :) mmm getting some 500 internal server error response codes from Druid, both when querying from Jupyter and also through Pivot...
[03:20:24] <AndyRussG>	 Oh hmm seems to work now!
[03:55:31] <AndyRussG>	 Hmmm only on some queries it seems, lemme see......
[08:58:47] <elukey>	 hello people!
[08:59:03] <elukey>	 rolling restart of the remaining yarn nodemanagers to pick up the new settings
[08:59:09] <elukey>	 1 every 2 minutes
[09:37:38] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Enable more accurate smaps based RSS tracking by yarn nodemanager - https://phabricator.wikimedia.org/T182276#3843988 (elukey) @EBernhardson the config should be now live everywhere, I keep checking metrics in https://grafana.wik...
[10:13:02] <wikibugs>	 Analytics-Kanban, DBA, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3844059 (elukey)
[10:30:58] <elukey>	 joal: o/
[10:31:20] <elukey>	 whenever you have time can we chat/brainstorm about what we'd need to test java 8 on hadoop?
[11:37:18] * elukey lunch + errand!
[12:11:48] <wikibugs>	 Analytics, EventBus, Reading-Infrastructure-Team-Backlog, Trending-Service, and 2 others: Trending Edit's worker offsets disappear from Kafka - https://phabricator.wikimedia.org/T181346#3844372 (mobrovac) Open>declined The service has been retired from production, so closing.
[12:12:04] <wikibugs>	 Analytics, EventBus, Reading-Infrastructure-Team-Backlog, Trending-Service, and 2 others: Trending Edit's worker offsets disappear from Kafka - https://phabricator.wikimedia.org/T181346#3844378 (mobrovac)
[13:02:30] <mforns>	 elukey, can you ping me when you're back :]
[13:05:24] <joal>	 Hi team - I have some time now that Naé is sleeping, but I'll take today off - She's sick :(
[13:05:40] <joal>	 elukey: if you're back we can brainstorm :)
[13:14:51] <elukey>	 joal: bit busy now, we can chat tomorrow, nothing urgent.. if I may add a container to your brain scheduler: try to think about what you need in labs to test hadoop/java8 and I'll try to make it happen :)
[13:15:03] <elukey>	 mforns: o/
[13:15:27] <joal>	 elukey: Starting my christmas wishlist ;)
[13:16:36] <elukey>	 :D
[13:19:16] <mforns>	 elukey, hey!
[13:20:03] <mforns>	 I'm trying to access puppet logs for hdfs ensures
[13:20:09] <mforns>	 but no permits
[13:20:44] <mforns>	 I think hdfs: /var/log/refinery is not successfully ensured
[13:23:34] <elukey>	 mforns: only root can do it
[13:23:42] <elukey>	 lemme check in puppet
[13:24:01] <mforns>	 elukey, am I right to think that /var/log/refinery should exist?
[13:24:28] <mforns>	 and that its absence is the cause of: /bin/sh: 1: cannot create /var/log/refinery/drop-mediawiki-history.log: Permission denied
[13:24:34] <mforns>	 ?
[13:24:44] <elukey>	 ah snap I forgot to follow up on that one!
[13:25:07] <elukey>	 did it realarm?
[13:25:28] <elukey>	 last one for me is the 15th of dec
[13:25:30] <mforns>	 dunno, I was looking into it because of the alarm from Friday
[13:25:31] <mforns>	 yes
[13:25:39] <mforns>	 ok, you already solved it? 
[13:25:58] <elukey>	 nono but it was far down in my todo list, lemme check it now
[13:26:00] <elukey>	 it seems important
[13:26:29] <elukey>	 so
[13:26:30] <elukey>	 elukey@analytics1003:/var/log$ ls -dl refinery/
[13:26:30] <elukey>	 drwxrwsr-x 2 hdfs analytics-admins 4096 Dec  4 06:25 refinery/
[13:26:38] <mforns>	 elukey, aaaaaaaa
[13:27:08] <mforns>	 I can not see that...
[13:27:25] <elukey>	 but
[13:27:26] <elukey>	 elukey@analytics1003:/var/log$ ls -l /var/log/refinery/drop-mediawiki-history.log
[13:27:29] <elukey>	 -rw-r--r-- 1 root analytics-admins 394 Nov 15 00:00 /var/log/refinery/drop-mediawiki-history.log
[13:27:37] <mforns>	 root!
[13:27:50] <elukey>	 and group has only read
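The `ls` output above explains the "Permission denied": the log file is owned by root with mode `-rw-r--r--`, so a job running as hdfs cannot append to it even though the directory itself is group-writable. A minimal sketch of the classic owner/group/other check follows. The numeric ids are hypothetical, hdfs being a member of analytics-admins is an assumption, and ACLs and root's bypass of permission checks are deliberately ignored.

```python
import stat


def can_write(mode, file_uid, file_gid, uid, gids):
    """Classic Unix write check for a non-root user (no ACLs, no capabilities)."""
    if uid == file_uid:
        # The owner is judged by the owner bits only.
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        # Group members are judged by the group bits only.
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical numeric ids, for illustration only.
ROOT_UID, HDFS_UID = 0, 995
ANALYTICS_ADMINS_GID = 1001

# -rw-r--r-- root:analytics-admins, as in the ls output above.
LOG_MODE = 0o644

# Assumption: hdfs belongs to analytics-admins; the group still has read only.
before_chown = can_write(LOG_MODE, ROOT_UID, ANALYTICS_ADMINS_GID,
                         HDFS_UID, {ANALYTICS_ADMINS_GID})
# Same mode, but with the file owned by hdfs.
after_chown = can_write(LOG_MODE, HDFS_UID, ANALYTICS_ADMINS_GID,
                        HDFS_UID, {ANALYTICS_ADMINS_GID})
print(before_chown, after_chown)
```

Under these assumptions the first check fails and the second succeeds, which is why changing the file's owner to hdfs is enough without touching the mode.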
[13:28:57] <mforns>	 elukey, the ensure command specifies user=hdfs and group=analytics-admins
[13:29:18] <mforns>	 (for refinery)
[13:29:35] <mforns>	 elukey, maybe........
[13:29:39] <elukey>	 mforns: we usually ensure only directories, not files
[13:29:59] <mforns>	 yes yes, I was talking about the directory
[13:30:00] <elukey>	 so each process is free to create them whenever they want
[13:30:07] <mforns>	 elukey, I think I know what happened
[13:30:45] <mforns>	 mediawiki-snapshot-cleaner was executing as root before, then we got a ton of errors, and we changed it to hdfs
[13:32:02] <elukey>	 this one is probably the issue
[13:32:03] <elukey>	 /var/log/refinery/*.log { size 100M rotate 4 missingok notifempty nocreate su root hdfs }
[13:32:09] <elukey>	 horrible paste
[13:35:43] <mforns>	 hey elukey, sorry I got an internet hiccup
[13:36:00] <mforns>	 I was saying that I think we can delete that log file, and then let the cleaner recreate it under hdfs
[13:37:06] <elukey>	 sure, I still have no idea why the logrotate rule for the dir is root/hdfs
[13:37:09] <elukey>	 probably wrong
[13:38:58] <wikibugs>	 (PS1) GoranSMilovanovic: minor [analytics/wmde/WDCM-Overview-Dashboard] - https://gerrit.wikimedia.org/r/398833
[13:39:01] <elukey>	 anyhow, I have chowned the file to hdfs
[13:39:17] <mforns>	 elukey, ok, thanks! this should do the trick
[13:39:21] <elukey>	 ah it runs only the 15th of the month!
[13:39:25] <mforns>	 yes
[13:39:31] <elukey>	 mforns: are you going to re-launch it?
[13:39:47] <mforns>	 no, no, it was not a stoppage I think
[13:39:51] <mforns>	 oh, wait
[13:39:52] <wikibugs>	 (PS1) GoranSMilovanovic: minor [analytics/wmde/WDCM-Usage-Dashboard] - https://gerrit.wikimedia.org/r/398834
[13:40:24] <wikibugs>	 (PS1) GoranSMilovanovic: minor [analytics/wmde/WDCM-Semantics-Dashboard] - https://gerrit.wikimedia.org/r/398835
[13:40:41] <wikibugs>	 (CR) GoranSMilovanovic: [C: 2] minor [analytics/wmde/WDCM-Overview-Dashboard] - https://gerrit.wikimedia.org/r/398833 (owner: GoranSMilovanovic)
[13:40:45] <wikibugs>	 Analytics, DBA, Patch-For-Review, User-Elukey: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#3844569 (jcrespo)
[13:40:48] <wikibugs>	 Analytics-Kanban, DBA, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3844568 (jcrespo)
[13:40:49] <wikibugs>	 (CR) GoranSMilovanovic: [C: 2] minor [analytics/wmde/WDCM-Usage-Dashboard] - https://gerrit.wikimedia.org/r/398834 (owner: GoranSMilovanovic)
[13:40:56] <wikibugs>	 Analytics-Kanban, DBA, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3844059 (jcrespo) a:jcrespo
[13:40:58] <wikibugs>	 (CR) GoranSMilovanovic: [C: 2] minor [analytics/wmde/WDCM-Semantics-Dashboard] - https://gerrit.wikimedia.org/r/398835 (owner: GoranSMilovanovic)
[13:41:04] <mforns>	 elukey, I'd say it was only a problem of writing the log, let me check
[13:41:33] <elukey>	 okok
[13:41:41] <elukey>	 mforns: totally unrelated thing - https://phabricator.wikimedia.org/T183123#
[13:41:50] <mforns>	 lookin
[13:43:22] <mforns>	 elukey, looks good to me, both options
[13:43:32] <elukey>	 paranoia mode on :D
[13:43:47] <mforns>	 elukey, wouldn't it be possible to restore data from slave in case of failure?
[13:44:03] <mforns>	 oh, you mean in case there's a bug in both hosts?
[13:44:09] <elukey>	 yeah
[13:44:12] <mforns>	 ok
[13:44:27] <mforns>	 yea, we can keep the dump for a couple of weeks
[13:46:34] <elukey>	 mforns: other thought - would it be more consistent if we kept the same purging scheme on both db1108/db1107 ?
[13:46:57] <mforns>	 elukey, hm
[13:47:22] <mforns>	 elukey, would it be possible performance-wise?
[13:47:33] <elukey>	 I'd say so yes
[13:47:48] <mforns>	 executing batch updates while inserting?
[13:48:18] <elukey>	 well it happens even for the slave since the eventlogging_sync periodically does those in batch
[13:48:37] <mforns>	 elukey, I see
[13:48:37] <elukey>	 and drops would also impact the mysql db performance wise
[13:49:44] <mforns>	 I'm not experienced at all in this, so yea, if it makes no difference to performance, I'd say yes, let's keep both master and slave the same :]
[13:49:46] <mforns>	 better
[13:51:28] <elukey>	 we can test it and see the performance impact
[13:51:36] <mforns>	 elukey, the mediawiki snapshot cleaner failed indeed, will relaunch it
[13:51:40] <elukey>	 super
[13:51:48] <elukey>	 remember to use a screen/tmux session
[13:51:54] <elukey>	 so you'll be able to detach it
[13:52:12] <mforns>	 sure
[14:01:16] <elukey>	 mforns: as FYI, jaime started a mysqldump on db1107 just now that may be a bit aggressive
[14:01:31] <elukey>	 can you help me watch el metrics to see if anything slows down too much?
[14:01:37] <mforns>	 elukey, sure
[14:01:52] <mforns>	 will look at the mysql_consumer logs
[14:02:20] <wikibugs>	 Analytics-Kanban, DBA, Patch-For-Review, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3844654 (jcrespo) Backup is ongoing on db1107:/srv/backups/export-20171218-135659  kill the myd...
[14:02:35] <wikibugs>	 Analytics-Kanban, DBA, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3844655 (jcrespo)
[14:04:26] <mforns>	 elukey, the log seems paralyzed
[14:04:46] <mforns>	 since 7 minutes ago
[14:05:40] <elukey>	 mforns: let's stop the mysql consumer then
[14:05:59] <mforns>	 elukey, yes, do you know how to stop it independently?
[14:06:51] <elukey>	 never done it.. eventloggingctl something stop? :D
[14:07:53] <elukey>	  stop eventlogging/consumer NAME=mysql-m4-master-00 CONFIG=/etc/eventlogging.d/consumers/mysql-m4-master-00 ?
[14:07:56] <mforns>	 elukey, I don't know how to stop the consumer only: sudo eventloggingctl stop stops everything, but I'm not sure how to partially stop it
[14:08:00] <mforns>	 lookin
[14:08:13] <mforns>	 yea, that makes sense
[14:08:32] <elukey>	 done
[14:09:38] <mforns>	 elukey, try: sudo eventloggingctl status
[14:09:51] <mforns>	 it says consumer     mysql-eventbus          start/running  7794
[14:10:07] <elukey>	 yeah, should it also list the other mysql as stopped?
[14:10:14] <mforns>	 yes?
[14:11:14] <icinga-wm>	 PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/mysql-m4-master-00 consumer/mysql-eventbus
[14:11:50] <elukey>	 nice :D
[14:12:01] <elukey>	 so eventloggingctl doesn't tell if something is stopped
[14:12:10] <elukey>	 it simply doesn't list it
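Since the status output only lists running jobs, a stopped job has to be inferred by comparing against the list of jobs that are supposed to exist, which is presumably what the Icinga check above does. A minimal sketch of that comparison, assuming both lists are already available as plain strings (how they are obtained is not shown, and `stopped_jobs` is a hypothetical helper name):

```python
def stopped_jobs(defined, running):
    """Return the defined jobs that are missing from the running list."""
    return sorted(set(defined) - set(running))


# Job names taken from the alert above; the running list mirrors the
# status output at 14:09, where only mysql-eventbus was reported.
defined = ["consumer/mysql-m4-master-00", "consumer/mysql-eventbus"]
running = ["consumer/mysql-eventbus"]

missing = stopped_jobs(defined, running)
print(missing)
```

An empty result means every defined job is running; anything else is the list to alert on.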
[14:13:06] <mforns>	 ok
[14:13:45] <ottomata>	 yall doin eventlogging mysql things?
[14:13:48] <mforns>	 elukey, did you also stop the mysql-eventbus?
[14:13:53] <mforns>	 yep
[14:14:05] <elukey>	 ottomata: yesss sorry
[14:14:18] <ottomata>	 s'ok just saw page and checkin :)
[14:15:07] <elukey>	 ottomata: yes sorry, jaime is taking a mysql backup of db1107 and we stopped the mysql consumers
[14:18:11] <wikibugs>	 (CR) Ottomata: [C: 1] Correct bug in mediawiki raw revision table [analytics/refinery] - https://gerrit.wikimedia.org/r/398552 (owner: Joal)
[14:18:29] <elukey>	 ottomata: whenever you have time I have a list of things that happened for kafka
[14:18:39] <elukey>	 didn't have time to write an email
[14:18:47] <elukey>	 but I can do it if you want to read it later on
[14:19:59] <ottomata>	 that happened?
[14:20:07] <milimetric>	 hey all
[14:21:18] <wikibugs>	 (CR) Milimetric: Correct bug in mediawiki raw revision table (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/398552 (owner: Joal)
[14:21:19] <elukey>	 ottomata: yeah, kafka1023 ready to be among the partition leaders again, vk on cp1008 pushing via TLS to jumbo, chat with brandon, etc..
[14:23:03] <mforns>	 heya
[14:27:05] <ottomata>	 OHH cool
[14:27:48] <wikibugs>	 (CR) Milimetric: Change addblocker text (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398598 (https://phabricator.wikimedia.org/T182958) (owner: Nuria)
[14:27:52] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Enable more accurate smaps based RSS tracking by yarn nodemanager - https://phabricator.wikimedia.org/T182276#3844755 (elukey)
[14:28:18] <wikibugs>	 Analytics-Kanban: Refresh SWAP hardware - https://phabricator.wikimedia.org/T183145#3844760 (Aklapper)
[14:33:15] <Shilad>	 General question to anybody who is on: Several hive tables have either a "project" (en.wikipedia) or a "db" (enwiki). Is there a good reason to use one or the other as a canonical identifier?
[14:37:18] <ottomata>	 Shilad:  i don't know the full details, but generally they will correlate, but I think there are some exceptions.
[14:37:30] <ottomata>	 db is explicitly what it is: the name of the mysql database that the wiki uses
[14:37:47] <ottomata>	 project is probably more appropriate, as it is explicitly normalized in some way
[14:38:10] <Shilad>	 Got it.
[14:39:17] <Shilad>	 ottomata: And I think I go back and forth using  mediawiki_project_namespace_map, but stripping ".org" from hostname to get project name.
[14:39:28] <Shilad>	 ottomata: Does that seem right to you?
[14:40:27] <ottomata>	 Shilad:  i actually don't know, mforns might know more? 
[14:41:17] <Shilad>	 Thanks! Context: I am trying to infer project for search queries and the page_info.project field in webrequest is sometimes unavailable, but host is always there. 
[14:51:55] <mforns>	 hi Shilad, mediawiki_project_namespace_map is the right way of translating from hostname to dbname and vice versa
[14:51:58] <mforns>	 :]
[14:52:18] <Shilad>	 Thanks, mforns. And I can just strip ".org" from hostname to get projectname?
[14:52:20] <mforns>	 regarding page_info.project in webrequest... not sure 
[14:52:31] <mforns>	 Shilad, yes
[14:52:44] <Shilad>	 mforns: Thanks!
[14:52:44] <mforns>	 that works
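The hostname-to-project rule confirmed above is small enough to sketch. This assumes the hostname has already been extracted from the request and only strips a trailing ".org", as discussed; hosts with extra labels (mobile subdomains, for example) are not normalized here, and `hostname_to_project` is a hypothetical helper name.

```python
def hostname_to_project(hostname):
    """Strip a trailing '.org' to get the project name, e.g. en.wikipedia."""
    suffix = ".org"
    if hostname.endswith(suffix):
        return hostname[:-len(suffix)]
    # Anything else is returned unchanged rather than guessed at.
    return hostname


print(hostname_to_project("en.wikipedia.org"))
```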
[14:52:51] <mforns>	 np :]
[15:01:43] <wikibugs>	 Analytics-Kanban, Analytics-Wikistats: [Wikistats2] Add link path to router-link - https://phabricator.wikimedia.org/T183149#3844925 (mforns)
[15:04:14] <wikibugs>	 (PS1) Mforns: Add link path to router-link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149)
[15:05:10] <elukey>	 mforns: lol https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=db1107&var-network=eth0&from=1513594906656&to=1513605706656
[15:05:30] <elukey>	 better: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=db1107&var-network=eth0&from=now-3h&to=now
[15:07:41] <elukey>	 also stopped eventlogging_sync on db1108 
[15:19:01] <wikibugs>	 (PS2) Mforns: Add link path to router-link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149)
[15:26:48] <wikibugs>	 (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [analytics/aggregator] - https://gerrit.wikimedia.org/r/387606 (owner: Hashar)
[15:36:19] <elukey>	 mforns: we are also going to upgrade mariadb + kernel on db1107
[15:36:19] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3845141 (Ottomata) >  Regarding 1 [...] for the large traffic cross-dc communication to go over verified protocols like TCP/Kafka, as opposed to UDP/Statsd itself.  Indeed, but s...
[15:36:21] <elukey>	 since now everything is stopped
[15:36:30] <mforns>	 elukey, ok
[15:37:21] <wikibugs>	 (PS3) Mforns: Add link path to router-link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149)
[15:40:14] <wikibugs>	 (PS4) Mforns: Add link path to router-link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149)
[15:42:02] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3845153 (Ottomata) Or!  We could do like we do for all the other replicated topics, and use DC prefixes, and keep the edge DC kafka cluster routing map.  This would be like the a...
[15:44:16] <mforns>	 elukey, you use a mac right?
[15:47:03] <elukey>	 yep
[15:57:10] <elukey>	 mforns: all done! We can re-enable
[15:57:19] <elukey>	 I am going to re-enable eventlogging_sync on db1108 first
[15:57:20] <mforns>	 elukey, awesome :]
[15:57:25] <mforns>	 k
[15:57:26] <elukey>	 then the mysql consumers
[16:07:40] <wikibugs>	 Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Review the alert message about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T182958#3839781 (Milimetric) @Trizek-WMF I suggested some alternative language here, I think mentioning scripts is likely confusing for users....
[16:17:07] <wikibugs>	 Analytics-Kanban, DBA, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3845287 (jcrespo) Ready to close when @elukey is ready
[16:20:41] <elukey>	 mforns: ready to re-enable el?
[16:20:48] <mforns>	 elukey, yes
[16:21:35] <icinga-wm>	 RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning.
[16:21:59] <mforns>	 elukey, seems to work fine!
[16:22:08] <elukey>	 super!
[16:22:09] <elukey>	 \o/
[16:22:21] <joal>	 Thanks guys :)
[16:22:39] <wikibugs>	 Analytics-Kanban, DBA, User-Elukey: Precautionary backup needed for the log database on db1107 before applying regular purging/sanitization - https://phabricator.wikimedia.org/T183123#3845300 (elukey) Open>Resolved Everything looks good, thank a lot!
[16:22:45] <wikibugs>	 Analytics, DBA, Patch-For-Review, User-Elukey: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#3845302 (elukey)
[16:23:30] <AndyRussG>	 hi all! just a heads-up, Druid seems to be stumbling on some groupby queries (Pivot splits) maybe from overload or something? For example pageviews hourly, latest 1 day, try grouping/splitting on project or Ua Device Family
[16:24:03] <AndyRussG>	 Getting 500 internal server errors in the background call from Pivot, and also an error when running the same queries via pydruid from Jupyter
[16:24:19] <joal>	 Hi AndyRussG - groupby is known to be slow in Druid
[16:24:20] <joal>	 An
[16:24:21] <AndyRussG>	 Not a blocker for me for anything, just thought I'd mention, in case it's not expected ;)
[16:24:34] <AndyRussG>	 joal: ah hmm ok
[16:24:39] <joal>	 AndyRussG: If you can, use topN and then select the few ones you're interested in
[16:24:45] <joal>	 It'll be fast
[16:24:49] <wikibugs>	 Analytics, Pageviews-API: Wikistats Bug : wrong data in Top viewed articles (about frwiki) - https://phabricator.wikimedia.org/T182954#3845317 (fdans)
[16:25:04] <elukey>	 very nice that https://grafana.wikimedia.org/dashboard/db/prometheus-druid?orgId=1 shows those slow queries joal !
[16:25:26] <AndyRussG>	 joal: ah ok gotcha... yeah these are columns that probably have a pretty big range.... thanks much!!! :)
[16:25:33] <joal>	 Hooooo :) Awesome elukey :)
[16:26:11] <joal>	 np AndyRussG - This method (topN + multiple timeseries requests) is the strategy used by piv :)
[16:26:17] <joal>	 pivot sorry
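The "topN + multiple timeseries" strategy can be sketched as Druid native query bodies: one topN to find the largest values of a dimension, then one filtered timeseries per value. The field names follow Druid's native query JSON; the datasource name comes from the broker log quoted later in this channel, while the metric name `view_count`, the interval and the example top values are assumptions for illustration. Sending the queries to a broker is left out.

```python
import json


def topn_query(datasource, dimension, metric, interval, threshold=10):
    """Build a native topN query: the `threshold` largest values of `dimension`."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        "granularity": "all",
        "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
        "intervals": [interval],
    }


def timeseries_query(datasource, dimension, value, metric, interval, granularity="hour"):
    """Build a native timeseries query restricted to one dimension value."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
        "intervals": [interval],
    }


# Assumed names and interval, for illustration only.
DATASOURCE = "pageviews-hourly"
METRIC = "view_count"
INTERVAL = "2017-12-17T00:00:00Z/2017-12-18T00:00:00Z"

top = topn_query(DATASOURCE, "project", METRIC, INTERVAL, threshold=2)
# In practice these values would come from the topN response.
top_values = ["en.wikipedia", "de.wikipedia"]
series = [timeseries_query(DATASOURCE, "project", v, METRIC, INTERVAL) for v in top_values]

print(json.dumps(top, indent=2))
```

The point of the split is that each query stays cheap: topN is approximate but fast, and every timeseries touches a single dimension value instead of grouping over all of them.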
[16:27:14] <AndyRussG>	 hmmm
[16:28:53] <AndyRussG>	 joal: you mean maybe to list the available values on the drop-downs in the Filter area? Just dragging a field into the split area for the 1-day period causes a "timeout" message in the UI, though in the network tab it's a 500
[16:29:39] <joal>	 AndyRussG: Can you send me a link with that query?
[16:29:44] <wikibugs>	 Analytics, Analytics-Wikistats: [Wikistats2] The detail page for tops metrics does not indicate time range - https://phabricator.wikimedia.org/T182990#3845327 (fdans)
[16:29:44] <AndyRussG>	 yurp!
[16:29:46] <wikibugs>	 Analytics, Pageviews-API: Wikistats Bug : wrong data in Top viewed articles (about frwiki) - https://phabricator.wikimedia.org/T182954#3845329 (fdans)
[16:31:38] <wikibugs>	 Analytics, Pageviews-API: Filter top pages by namespace/category - https://phabricator.wikimedia.org/T182975#3845335 (jberkel)
[16:31:53] <AndyRussG>	 joal: https://goo.gl/FxbyQb
[16:32:05] <AndyRussG>	 again, not a blocker for any stuff I'm working on just now...
[16:32:40] <wikibugs>	 Analytics, Analytics-Wikimetrics: Layout bug in Safari for detail page - https://phabricator.wikimedia.org/T182821#3845336 (fdans) Working good on Safari 11!
[16:32:57] <joal>	 AndyRussG: I think it's because the UA-Device-Family is big - that's weird though, this should work
[16:33:33] <wikibugs>	 Analytics, Analytics-Wikimetrics: Layout bug in Safari for detail page - https://phabricator.wikimedia.org/T182821#3845338 (fdans) Open>Resolved a:fdans
[16:35:15] <wikibugs>	 Analytics-Kanban, Analytics-Wikistats: wikistats rendering bug - https://phabricator.wikimedia.org/T182817#3845342 (fdans) a:Milimetric
[16:36:50] <wikibugs>	 Analytics: Whitelist analytics.wikimedia.org and stats.wikimedia.org in add blockers - https://phabricator.wikimedia.org/T182816#3835518 (fdans) We've tried this before, we'll try again!
[16:36:56] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3845346 (Krinkle) >>! In T179093#3845141, @Ottomata wrote: >> Regarding 1 [...] for the large traffic cross-dc communication to go over verified protocols like TCP/Kafka, as oppo...
[16:37:10] <AndyRussG>	 joal: yeah... Same happens on the project column btw, but they both work if you trim it down to a 1-hour window
[16:37:20] <wikibugs>	 Analytics, Analytics-Wikimetrics: selecting wikisource.org? - https://phabricator.wikimedia.org/T183154#3845347 (Nuria)
[16:37:53] <AndyRussG>	 tried only groupby in Jupyter, I'll check out the topn option :)
[16:37:53] <wikibugs>	 Analytics-Kanban: Remove request for font.googleapis.com from analytics.wikimedia.org - https://phabricator.wikimedia.org/T182804#3845360 (fdans) a:Milimetric
[16:38:18] <wikibugs>	 Analytics: Remove request for font.googleapis.com from analytics.wikimedia.org - https://phabricator.wikimedia.org/T182804#3845362 (Milimetric)
[16:38:44] <AndyRussG>	 yeah did seem like something that users might wish to do and expect to work, not too infrequently
[16:38:53] <wikibugs>	 Analytics, Analytics-Wikistats: Sort "Top Viewed Articles" by views, rather than alphabetically - https://phabricator.wikimedia.org/T182757#3845377 (fdans) Open>Resolved a:fdans
[16:40:42] <joal>	 Thanks for pinging us AndyRussG - It's unexpected and probably not "groupby" related, but we'll look into it !
[16:42:28] <joal>	 elukey: I think we have something with Druid
[16:42:58] <wikibugs>	 Analytics, Analytics-Wikimetrics: selecting wikisource.org? - https://phabricator.wikimedia.org/T183154#3845347 (fdans) You can type Sources -> wikisource.org, but you own't find it by default because there's a result limit.  Let's not have a limit on the wikiselector for families.
[16:43:49] <wikibugs>	 Analytics, Analytics-Wikistats: When searching for a project language, display a full list of languages - https://phabricator.wikimedia.org/T182960#3845388 (fdans)
[16:43:51] <wikibugs>	 Analytics, Analytics-Wikimetrics: selecting wikisource.org? - https://phabricator.wikimedia.org/T183154#3845390 (fdans)
[16:44:04] <elukey>	 joal: --verbose
[16:44:28] <joal>	 Druid is very slow and seems unable to answer queries that previously worked
[16:44:37] <wikibugs>	 Analytics, Analytics-Wikistats: When searching for a project language, display a full list of languages - https://phabricator.wikimedia.org/T182960#3839812 (fdans)
[16:45:07] <joal>	 elukey: --^
[16:45:14] <elukey>	 druid analytics right?
[16:45:33] <joal>	 correct elukey 
[16:45:37] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Enable more accurate smaps based RSS tracking by yarn nodemanager - https://phabricator.wikimedia.org/T182276#3845395 (EBernhardson) Thanks! I'll try this out this week and see how things go.
[16:46:52] <joal>	 elukey: Have we done anything on druid lately?
[16:47:03] <elukey>	 joal: one thing to bear in mind is that on analytics druid we have only one broker available for pivot (druid1001)
[16:47:06] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, hardware-requests: eqiad: (8) Hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3845408 (Milimetric)
[16:47:41] <joal>	 Funny thing elukey: hole in realtime processed-events between 14:00 and 14:15 :(
[16:47:52] * elukey cries in a corner
[16:48:48] <joal>	 elukey: I wonder if it wouldn't be related to some webrequest dashboard open and refreshing very often
[16:49:20] <joal>	 elukey: possibly related to broker as well, I don't know :(
[16:50:11] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3845424 (Ottomata) > Compared to (cross-dc write of kafka, only) Ah yeah, that is this:  > Aside from that, in this case, running active/active statsv varnishkafka producers & st...
[16:51:21] <elukey>	 joal: can we try the same query that appears to be slow from pivot on druid1002/3 ?
[16:51:25] <elukey>	 i am also checking https://grafana.wikimedia.org/dashboard/db/prometheus-druid?orgId=1&from=now-7d&to=now
[16:52:09] <wikibugs>	 Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Review the alert message about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T182958#3845427 (Trizek-WMF) >>! In T182958#3845262, @Milimetric wrote: > @Trizek-WMF I suggested some alternative language here, I think menti...
[16:52:10] <joal>	 elukey: I can try queries, just need to find them :)
[16:52:20] <joal>	 elukey: picot is bhorium, right?
[16:52:27] <joal>	 s/picot/pivot
[16:52:35] <elukey>	 nope, thorium
[16:52:38] <mforns>	 elukey, EL seems fine still, catching up on insertions! There are some duplicate entries... don't know exactly why: it can be because of small kafka offset differences
[16:52:40] <joal>	 Arf
[16:52:47] <mforns>	 this has happened before and I think we're fine
[16:53:53] <elukey>	 mforns: yeah :(
[16:54:39] <joal>	 elukey: Looks like pivot is scanning druid multiple times per minute :)
[16:54:49] <mforns>	 elukey, I guess when we stopped the consumer, because of extreme mysql conditions, the consumer blocked and was not able to sync offsets with kafka
[16:55:05] <joal>	 elukey: however, no druid query in pivot logs - looking for broker logs then
[16:55:12] <mforns>	 we might also have lost some buffered data? maybe? but very few I'd say
[16:55:46] <elukey>	 mforns: well I'd expect the consumer not to commit anything until it pushes it to mysql, no?
[16:56:13] <mforns>	 elukey, mhhh.... no, because there are biggish buffers
[16:56:23] <mforns>	 I think it commits every 2 seconds or so
[16:57:09] <elukey>	 mforns: at this point I guess that it uses an error topic or something similar?
[16:57:36] <mforns>	 elukey, when?
[16:59:37] <mforns>	 I don't think small differences in kafka offsets are a problem, we might have dropped a couple events because of that, but very few, like orders of magnitude below regular event input
[17:00:10] <mforns>	 I guess the only problem is EventLogging consumer buffers, when we stopped the consumer, they were lost I guess...
[17:00:37] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Operations, hardware-requests: eqiad: (8) Hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3845436 (faidon) p:Triage>High
[17:00:51] <mforns>	 I can check in the db for holes when the insertion rate has stabilized
[17:01:00] <elukey>	 mforns: ack!
[17:01:24] <elukey>	 mforns: I thought that events in the buffer not pushed to mysql would have gone to a kafka error topic
[17:01:47] <mforns>	 elukey, oh! that would be a good idea :] does EL do that?
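The question of whether a stopped consumer produces duplicates or losses comes down to the order of "write to the database" and "commit the offset". The toy model below is not the EventLogging consumer; it is a hedged sketch with hypothetical names (`ToyConsumer`, a list standing in for a Kafka partition, another for the MySQL table) showing that flushing before committing yields duplicates on restart but no loss.

```python
class ToyConsumer:
    """Toy buffered consumer: a list plays the Kafka partition, another the DB table."""

    def __init__(self, log, sink):
        self.log = log          # events in offset order
        self.sink = sink        # rows written to the database
        self.committed = 0      # last offset committed back to the broker
        self.position = 0       # next offset to read
        self.buffer = []

    def poll(self, n):
        """Read up to n events into the in-memory buffer."""
        batch = self.log[self.position:self.position + n]
        self.buffer.extend(batch)
        self.position += len(batch)

    def flush(self, commit=True):
        """Write the buffer to the sink, then (optionally) commit the offset."""
        self.sink.extend(self.buffer)
        self.buffer = []
        if commit:
            self.committed = self.position

    def restart(self):
        """Simulate a stop/start: the buffer is lost, reading resumes at the commit."""
        self.buffer = []
        self.position = self.committed


log = list(range(10))
sink = []
consumer = ToyConsumer(log, sink)

consumer.poll(4)
consumer.flush()               # rows 0-3 written, offset 4 committed
consumer.poll(3)
consumer.flush(commit=False)   # rows 4-6 written, but stopped before the commit
consumer.restart()
consumer.poll(6)
consumer.flush()               # rows 4-9 written again from the committed offset

duplicates = sorted(x for x in set(sink) if sink.count(x) > 1)
print(duplicates, set(sink) == set(log))
```

Reversing the order (commit first, write later) would turn the same interruption into lost rows instead of duplicates, which is the case worried about above for buffered events.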
[17:02:03] <AndyRussG>	 joal: thanks likewise :) o/
[17:03:02] <joal>	 elukey: I can't say what, but something has changed lately in Druid
[17:03:46] <joal>	 elukey: a Dashboard I was running with superset doesn't work anymore (timeouts)
[17:04:06] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3845440 (Krinkle) >>! In T179093#3845424, @Ottomata wrote: > More thoughts about ^:  in my conversation with @bblack, I had originally wanted to more simply use cache traffic rou...
[17:05:47] <joal>	 milimetric: another question: Do we cancel the meeting with Erik tonight, or is there anything I forgot that we should discuss?
[17:10:48] <elukey>	 joal: so it should be related to broker/historicals right?
[17:10:50] <joal>	 elukey: timeout errors in druid broker :(
[17:13:52] <elukey>	 so query=GroupByQuery{dataSource='pageviews-hourly', querySegmentSpec=LegacySegmentSpec{intervals=[2017-12-11T00:00:00.000Z/2017-12-18T17:03:08.000Z]}
[17:13:57] <elukey>	 triggered a timeout
[17:14:03] <joal>	 yup
[17:14:15] <joal>	 I'm surprised about the fact it's a groupby 
[17:14:34] <elukey>	 the fault seems to be org.jboss.netty.handler.timeout.ReadTimeoutException
[17:15:02] <elukey>	 and I guess, checking from the latency metrics, that we have 10s no
[17:15:04] <elukey>	 *now
[17:15:15] <joal>	 elukey: I looked at data-size (pageview-hourly segments for 2017-12-17+18): less than 4Gb - This should be super easy
[17:18:36] <elukey>	 well not if not in cache right ?
[17:19:21] <joal>	 elukey: druid would have to read the segments, but as said - 4G, and the disks are SSDs, timeout doesn't sound right
[17:19:50] <joal>	 elukey: I don't know how, but it seems druid swaps
[17:20:05] <joal>	 elukey: see mem on druid1001
[17:20:11] <joal>	 free sorry
[17:20:38] <elukey>	 joal: do you remember that I mentioned the use of a huge heap vs few cached memory for the os ?
[17:20:41] <elukey>	 :D
[17:20:45] <joal>	 :D
[17:21:16] <joal>	 elukey: swap used on both druid1001 and druid1003
[17:22:11] <joal>	 elukey: And given the cache-size used from the metrics you sent, we better release most of that memory to the system
[17:23:34] <elukey>	 joal: we'd probably need to make some calculations and/or review the Xms/Xmx settings (probably only the former)
[17:23:46] <joal>	 elukey: agreed
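The "calculations" mentioned above amount to a simple budget: whatever RAM the JVM heaps and direct memory claim is no longer available to the OS page cache, which Druid relies on for memory-mapped segments. A back-of-the-envelope sketch follows; every number in it is hypothetical and not taken from the druid100x hosts.

```python
def page_cache_budget_gb(total_ram_gb, heaps_gb, direct_memory_gb=0):
    """RAM left for the OS page cache after JVM heaps and direct memory."""
    return total_ram_gb - sum(heaps_gb.values()) - direct_memory_gb


# Hypothetical figures, for illustration only.
TOTAL_RAM_GB = 64
HEAPS_GB = {"broker": 24, "historical": 8, "middlemanager": 4}
DIRECT_MEMORY_GB = 12

left = page_cache_budget_gb(TOTAL_RAM_GB, HEAPS_GB, DIRECT_MEMORY_GB)
print(left)
```

If the result is small compared with the size of the segments queried regularly, reads fall through to disk (or the host starts swapping), so shrinking oversized heaps is the first knob to look at.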
[17:24:05] <joal>	 elukey: This still seems weird - things that were working yesterday don't today
[17:24:27] <joal>	 I wonder if there is new regular usage we don't know of
[17:25:26] <elukey>	 it could be yes
[17:27:25] <elukey>	 it is also weird that I can't find those 10s of read timeout anywhere
[17:30:33] <joal>	 elukey: Ok, I suppose we're gonna talk about that first thing tomorrow
[17:31:30] <elukey>	 yep!
[17:31:37] <joal>	 k
[17:31:52] <joal>	 Gone for dinner a-team, back after
[17:42:52] <wikibugs>	 (PS13) Fdans: Add pageview by country oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521)
[18:18:12] <AndyRussG>	 hey all, got another quick question here... Where could I see the code or read some canonical doc about the contents of the 'project' field in Pageview hourly? https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly
[18:18:25] <AndyRussG>	 Perhaps somewhere in the wmf config repo?
[18:19:10] <AndyRussG>	 That column seems to also be in the pageview_info map in Webrequest https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest
[18:20:27] <AndyRussG>	 Just wishing to find specifically the canonical place where we set up which sites would take a language prefix there and which wouldn't
[18:20:54] <AndyRussG>	 I guess from an analytics perspective it comes directly from the URL somehow?
[18:21:13] <AndyRussG>	 (and then for the real setup I should check with operations)?
[18:29:01] <joal>	 AndyRussG: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L271
[18:29:27] <joal>	 AndyRussG: We compute the project from URLs indeed
[18:30:02] <AndyRussG>	 joal: cool, thanks so much!!! :D
[18:30:24] <joal>	 AndyRussG: And the project field in pageview_hourly is the exact same as pageview_info['project'] in webrequest (see https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql#L34)
[18:31:08] <joal>	 AndyRussG: The thing to keep in mind when using this field in the webrequest table is that it is correctly populated only for rows having is_pageview = TRUE
[18:31:32] <joal>	 A value may exist when is_pageview = false, but it definitely could be wrong
[18:33:45] <AndyRussG>	 joal: yeah really looking at pageviews
[18:35:09] <joal>	 AndyRussG: If you're after project-normalization data in webrequest, look at normalized_host (see https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L245 and https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/load/refine_webrequest.hql#L103)
[18:39:18] <AndyRussG>	 joal: nice...! Yeah, webrequest.normalized_host != pageview_hourly.project
[18:39:26] <joal>	 correct AndyRussG :)
[18:41:47] * elukey off! 
[19:22:33] <AndyRussG>	 elukey: joal: yeah Pivot now borking on simple queries, here's Pageviews Hourly for the last week, no filters or splits: https://goo.gl/W3fNEZ
[19:23:29] <joal>	 AndyRussG: elukey is gone for tonight, we plan to take care of that tomorrow :)
[19:24:26] <AndyRussG>	 joal: K, thanks so much... Yea just thought I'd mention in case it's useful..... :)
[19:27:06] <joal>	 Thanks AndyRussG :)
[21:04:18] <wikibugs>	 Analytics, Analytics-Wikistats: Link to 'more info' doesn't always work - https://phabricator.wikimedia.org/T183188#3846199 (Erik_Zachte)
[21:05:49] <wikibugs>	 Analytics, Analytics-Wikistats: Display of radio buttons in Wikistats 2 is somewhat confusing - https://phabricator.wikimedia.org/T183185#3846212 (Erik_Zachte) a:Milimetric
[21:06:23] <wikibugs>	 Analytics, Analytics-Wikistats: Make the colors used the line charts in Wikistats 2 more easy to recognize. - https://phabricator.wikimedia.org/T183184#3846218 (Erik_Zachte) a:Milimetric
[21:06:55] <wikibugs>	 Analytics, Analytics-Wikistats: Present Wikistats 2 charts for the period selected by the user. - https://phabricator.wikimedia.org/T183183#3846222 (Erik_Zachte) a:Milimetric
[21:08:09] <wikibugs>	 Analytics, Analytics-Wikistats: Consistently preserve settings when a user switches to a new metric (especially on the same page). - https://phabricator.wikimedia.org/T183181#3846228 (Erik_Zachte) a:Milimetric
[21:09:01] <wikibugs>	 Analytics, Analytics-Wikistats: Consistently preserve settings when a user switches to a new metric (especially on the same page). - https://phabricator.wikimedia.org/T183181#3846232 (Erik_Zachte) @Catrope sorry, I added subscribers and project
[21:18:46] <wikibugs>	 Analytics, Analytics-Wikistats: Please add download option 'as csv file' to Wikistats 2 - https://phabricator.wikimedia.org/T183192#3846270 (Erik_Zachte)
[22:20:47] <wikibugs>	 (PS1) Ottomata: Add _REFINE_FAILED failure flag and skip refinement if it exists [analytics/refinery/source] - https://gerrit.wikimedia.org/r/399105
[22:22:02] <wikibugs>	 (PS2) Ottomata: Add _REFINE_FAILED failure flag and skip refinement if it exists [analytics/refinery/source] - https://gerrit.wikimedia.org/r/399105
[23:33:41] <wikibugs>	 Analytics, Analytics-Wikistats: roadmap of migration to Wikistats 2 - https://phabricator.wikimedia.org/T183180#3846772 (Aklapper) Assuming this is about #analytics-wikistats