[00:01:46] (PS1) Catrope: Add a graph tracking how many people have enabled the Flow beta feature [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/264698
[00:07:10] (PS1) Catrope: Add a graph tracking how many people have enabled the cross-wiki notifications beta feature [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/264700
[00:09:09] (PS2) Catrope: Add a graph tracking how many people have enabled the Flow beta feature [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/264698
[00:24:48] (PS3) Catrope: Add a graph tracking how many people have enabled the Flow beta feature [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/264698 (https://phabricator.wikimedia.org/T114111)
[00:25:13] (PS4) Catrope: Add a graph tracking how many people have enabled the Flow beta feature [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/264698 (https://phabricator.wikimedia.org/T114111)
[01:07:17] !log restarted eventlogging again. A single raw client side processor consumer seemed stuck (according to burrow). seeing offset commit errors in logs.
[01:07:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[01:23:18] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [30.0]
[01:39:59] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0]
[01:40:18] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0]
[06:57:53] halfak: just wanted to point out that your figure was standing at the presenter spot during the wikimedia 15 yrs birthday party
[06:58:35] * elukey woke up at 4 AM this morning, hello jet lag!
[07:05:11] elukey: awww
[07:05:32] * madhuvishy should be going to sleep
[07:18:03] madhuvishy: o/
[07:18:30] \o
[07:53:52] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1940617 (Lcanasdiaz) It is not working yet and it should be!. I'm having a look ..
[09:22:20] Hi jetlagged-elukey :)
[09:23:06] hello joal!
[09:23:38] I am doing https://phabricator.wikimedia.org/tag/engineering_onboarding_template/
[09:23:55] seems a very good initiative
[09:24:06] (it is missing some complete tasks)
[09:24:38] Cool elukey !
[09:24:57] I've not done it myself /*shameful*/
[09:25:25] it is super new, I am the pilot :D
[09:25:54] Ah, right :)
[09:25:59] * joal feels better
[13:04:50] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1940916 (jcrespo) "CREATE TEMPORARY TABLE SELECT" is not recommended (any kind of DDL + SELECTing/going over lots of rows) because it will create a huge amount o...
[14:49:45] !log restarting eventlogging to un-blacklist MobileWebSectionUsage
[14:49:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[14:54:28] (CR) Erik Zachte: "Patch no longer relevant due to rework of that part of the code" [analytics/wikistats] - https://gerrit.wikimedia.org/r/92056 (owner: Nemo bis)
[14:54:48] (Abandoned) Erik Zachte: Move download limiter to proper place and comment it as it's only a test [analytics/wikistats] - https://gerrit.wikimedia.org/r/92056 (owner: Nemo bis)
[14:58:59] Analytics-Tech-community-metrics: top-contributors.html displays comma as "Location" when a person has more than one affiliation - https://phabricator.wikimedia.org/T123926#1941104 (Aklapper) NEW
[15:09:08] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 27.27% of data under the critical threshold [10.0]
[15:21:39] Analytics-Tech-community-metrics: "Last 30 days" stats for specific mailing list display an account as one list item per username character - https://phabricator.wikimedia.org/T123927#1941124 (Aklapper) NEW
[15:22:22] Analytics-Tech-community-metrics: "Last 30 days" stats for specific mailing list display an account as one list item per username character - https://phabricator.wikimedia.org/T123927#1941124 (Aklapper) {F3239715}
[15:29:27] Analytics-Tech-community-metrics: Mailing lists recently added to korma do not have "Top senders" data created (JSON file is 404) - https://phabricator.wikimedia.org/T123929#1941149 (Aklapper) NEW
[15:34:43] Analytics, Analytics-Wikistats, DevRel-January-2016: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1941181 (ezachte) @Nemo_bis says this may have to wait till Feb in the meantime I can look further into '[Full dump analysis] Reduce edits_only and revert...
[15:37:22] Analytics, Analytics-Wikistats, DevRel-January-2016: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1941190 (ezachte) closed https://gerrit.wikimedia.org/r/#/c/92056/ -> 4/5 open
[15:50:48] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0]
[15:52:41] (PS3) Mforns: Add split-by-os argument to AppSessionMetrics job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615)
[15:52:59] (CR) Mforns: [C: -1] "Still WIP" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[15:55:08] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0]
[15:58:29] ottomata: Sorry to bother, are you restarting the hell (the EL) to solve that --^ ?
[15:59:38] hi!
[15:59:38] PROBLEM - Overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0]
[15:59:41] haha
[15:59:45] man, joal i'm not sure
[15:59:50] i'm not sure what the prob is with that
[15:59:58] the logs all look fine
[16:00:02] and events are rolling into mysql
[16:00:08] just spikey like
[16:00:17] but, my restart a while ago did seem to cause this.
[16:00:23] or at least, it started with that
[16:01:31] ottomata, joal, I also took a look (very quick) and everything seemed fine to me
[16:08:50] aye ok
[16:08:54] thanks mforns
[16:13:22] ok cool, thx guys :D
[16:15:05] mforns: I see you working on AppSessionMetric - since it's still flagged as WIP, let me know if you want me to review :)
[16:15:24] ottomata: I also see you working on the new hive server, thx mate !
[16:16:35] ja going to try it in labs, make sure i wont break things
[16:16:40] bout time i did that, thanks for your bump :)
[16:17:04] np, it'll prevent us to have to restart stuff :)
[16:17:32] ottomata: tomorrow is mobile stop day isn't it ?
[16:18:52] mobile stop day?
[16:18:53] oh!
[16:18:53] yes
[16:18:57] yes it is
[16:19:13] k
[16:19:26] Maybe not changing server tomorrow is better ;)
[16:19:33] joal, WIP still yes, I'll let you know, thz
[16:19:46] np mforns :)
[16:20:05] mforns: if you have a minute, I'd like to talk again about entropy after standup :)
[16:20:13] joal, of course :]
[16:34:09] joal, yarn is down?
[16:35:36] mforns: are you trying to look at yarn.wikimedia.org?
[16:35:43] ottomata, yes
[16:35:49] yeah, its been turned off
[16:35:55] sorta security issue
[16:35:56] oh! of course :P
[16:36:00] back to ssh tunnel :/
[16:36:00] I remember now
[16:36:03] ok
[16:36:13] thx
[16:51:54] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1934482 (Nuria) cc-ing @Neil_P._Quinn_WMF. Please take a look at @jcrespo (dba) recommendations regarding your selects
[16:57:27] Analytics-Tech-community-metrics, DevRel-January-2016, Easy, Patch-For-Review: Entered text in Typeahead search field nearly not visible in Firefox 42: Fix the CSS - https://phabricator.wikimedia.org/T121101#1941383 (Aklapper) Open>Resolved Works now also on korma. Resolving.
[17:01:15] ottomata: Stand up ?
[17:02:29] Analytics-Kanban: Quaterly review 2016/01/22 (slides due on 19th) - https://phabricator.wikimedia.org/T120844#1941420 (Nuria)
[17:02:31] Analytics-Kanban: Gather phabricator metrics for quaterly review [3 pts] - https://phabricator.wikimedia.org/T122333#1941419 (Nuria) Open>Resolved
[17:04:42] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1941426 (Nuria) @jcrespo: can we get some help here to see what is wrong with replication?
[17:06:15] ping ottomata standddupp?
[17:06:39] OO
[17:06:40] sorry
[17:24:04] Analytics-Kanban: Add luca to analytics e-mail alias [1] - https://phabricator.wikimedia.org/T123941#1941470 (Nuria) NEW
[17:24:16] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1941481 (jcrespo) If you mean s1-s7 replication, I found Faidon's analysis to be quite accurate, plus I already gave some recommendations regarding queries to be...
[17:25:00] Analytics-Kanban: Add luca to analytics e-mail alias [1] - https://phabricator.wikimedia.org/T123941#1941490 (Nuria) a:elukey
[17:26:04] Analytics-Kanban: Burrow should be restarted automatically when config changes - https://phabricator.wikimedia.org/T123942#1941494 (Nuria) NEW a:elukey
[17:29:20] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Create new Hive / Oozie server from old analytics Dell {hawk} - https://phabricator.wikimedia.org/T110090#1941506 (Ottomata) Testing out 'Stop Oozie, Hive, MySQL move to newly installed node, and puppetize.' in labs. Prod Steps are: 1. suspend a...
[17:29:47] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Create new Hive / Oozie server from old analytics Dell {hawk} - https://phabricator.wikimedia.org/T110090#1941509 (Ottomata)
[17:32:59] Analytics-Kanban: move vital signs to its own instance {kudu] - https://phabricator.wikimedia.org/T123944#1941520 (Nuria) NEW a:madhuvishy
[17:34:28] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1941535 (jcrespo) BTW, a small nitpick, we commonly use "outage" when we had unexpected loss of service; "scheduled mainte...
[17:40:55] Analytics-Kanban: move vital signs to its own instance {kudu] [5] - https://phabricator.wikimedia.org/T123944#1941555 (Nuria)
[17:54:05] Analytics-Kanban: Reorganize oozie jobs to not use mobile cache webrequest_source [13] - https://phabricator.wikimedia.org/T122651#1941606 (Nuria)
[17:55:13] Analytics-Kanban: Reorganize oozie jobs to not use mobile cache webrequest_source [13] - https://phabricator.wikimedia.org/T122651#1909893 (Nuria) Be aware that after this change we need to document how to proceed for users doing ad hoc selects here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webreq...
[17:55:54] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Create new Hive / Oozie server from old analytics Dell {hawk} [13 pts] - https://phabricator.wikimedia.org/T110090#1941620 (Ottomata)
[17:58:22] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1941634 (Nuria) Sorry, let's do this maintenance after we have taken a look at replication issues, right?
[17:58:30] joal: do we have alarms for the cassandra latencies or was it just you looking through the graphs?
[18:40:05] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1941772 (Ottomata) Oh! I didn't know of a check_eventlogging_lag.sh script. Cool! Thanks for looking into this Jaime.
[18:42:02] Analytics: Send burrow lag statistics to statsd/graphite {hawk} - https://phabricator.wikimedia.org/T120852#1941774 (madhuvishy) a:madhuvishy
[18:42:20] Analytics-Kanban: Reorganize oozie jobs to not use mobile cache webrequest_source {hawk} [13 pts] - https://phabricator.wikimedia.org/T122651#1941779 (madhuvishy)
[18:43:09] Analytics-Kanban: move vital signs to its own instance {crow} [5 pts] - https://phabricator.wikimedia.org/T123944#1941780 (madhuvishy)
[19:26:16] halfak: Are Schema:PageContentSaveComplete and Schema:NewEditorEdit still necessary, given we have Schema:Edit? It'd be nice to consolidate and lighten the load. However Schema:Edit isn't running on 100% of edits…
[19:29:15] Analytics-EventLogging: Replace EventLogging with Confluent Platform - https://phabricator.wikimedia.org/T102082#1941829 (Ottomata) Open>declined EventBus was done instead.
[19:37:42] Analytics: Remove comScore from reportcard - https://phabricator.wikimedia.org/T123952#1941851 (kevinator) NEW
[19:39:52] Analytics: Remove comScore from reportcard - https://phabricator.wikimedia.org/T123952#1941861 (kevinator) Open>Resolved a:kevinator this was already done... I'm logging this for record keeping. Here are the changes: https://github.com/wikimedia/analytics-reportcard-data/commit/9a9bf846f8771659303d...
[20:00:21] Analytics, Analytics-Cluster, EventBus: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1941882 (Ottomata) NEW a:Ottomata
[20:07:51] Analytics: Look through wikimetrics/scripts and clean them up {dove} - https://phabricator.wikimedia.org/T123956#1941904 (madhuvishy) NEW
[20:11:09] Analytics, Editing-Analysis, Editing-Department: Consider scrapping Schema:PageContentSaveComplete and Schema:NewEditorEdit, given we have Schema:Edit - https://phabricator.wikimedia.org/T123958#1941922 (Jdforrester-WMF) NEW
[20:23:50] Hey elukey, sorry I missed your previous message
[20:24:01] We do not have alarms, it was just me looking
[20:55:27] Analytics, Analytics-Cluster, EventBus: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1942022 (mobrovac) So we are basically aiming at a master-master Kafka set-up? Regardless of different queues for the two DCs, the tr...
[20:55:53] Analytics, Analytics-Cluster, EventBus, Services, operations: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1942024 (mobrovac)
[21:06:45] Analytics, Analytics-Cluster, EventBus, Services, operations: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1942098 (Ottomata) > RESTBase is stateless per se, but it relies on Cassandra, for which a cross-DC repli...
[21:09:14] Analytics, Analytics-Cluster, EventBus, Services, operations: Investigate proper set up for using Kafka MirrorMaker with new main Kafka clusters. - https://phabricator.wikimedia.org/T123954#1942101 (Ottomata) If Services doesn't need events from eqiad to show up in codfw (and vice versa), then...
[21:25:13] madhuvishy: , still there?
[21:26:25] joal: maybe?
[21:29:08] ottomata: yeah
[21:29:23] why does the AppSessionMetrics oozie job depend on webrequest_source mobile?
[21:29:31] I would think that App metrics aren't in mobile web data
[21:29:53] ottomata: ya they are in mobile app - hmmm that's in desktop? /me looks
[21:30:04] thought so, since app uses API endpoint, not mobile web
[21:30:07] buuut i dunno
[21:30:13] maybe the API endpoint is on mobile web?
[21:31:02] ottomata: "/webrequest_source=mobile" is what it's been using
[21:31:11] may be it's been all wrong so far? :/
[21:31:12] right
[21:31:16] but i dunno why
[21:31:19] i mean, i could be wrong
[21:31:21] but i would think so
[21:31:31] hmmmmm
[21:31:34] mobile web is mobile html served via caches
[21:31:44] like mobile browser accessing wikipedia
[21:31:57] right
[21:31:58] app is via API endpoints, not sure where they live, but I thought text
[21:32:09] i think so too
[21:32:33] i forget, is that running in prod oozie?
[21:33:14] why would it give numbers though
[21:33:17] yeah
[21:33:21] ja dunno
[21:34:04] oook welp, i guess either way it'll change to text now
[21:34:08] if it shouldn't be already
[21:34:40] ottomata: yeah - need to confirm though - if all our numbers until now have been wrong
[21:37:59] ottomata:
[21:38:04] https://www.irccloud.com/pastebin/tfAuti9N/
[21:38:32] so it looks like a small percentage of the requests go to text
[21:38:46] iiinteresting
[21:39:05] in an hour about 5%
[21:42:49] hm ok, so not too bad
[21:42:54] so we are missing 5% of those numbers
[21:43:03] ottomata: not necessarily
[21:43:14] so, what needs changed in the scala code for this?
[21:43:22] i don't see anything in AppSessionMetrics
[21:43:25] of those 102589 requests, only 5200 of them have x_analytics_map['wmfuuid'
[21:43:27] oh
[21:43:27] set
[21:43:27] val webrequestMobilePath = params.webrequestBasePath + "/webrequest_source=mobile"
[21:43:33] hmm ok
[21:43:38] in the text source
[21:44:07] so may be the app serves the pages from mobile cache, and only makes api requests - and our numbers so far have not been off?
[21:44:13] not sure
[21:44:17] but i hope so
[21:44:53] almost all requests in the mobile partition with access_method mobile app have the uuid set
[21:45:45] so we miss 0.003% i think
[21:46:02] ottomata: ^
[21:46:28] * madhuvishy is relieved
[21:47:37] ok phew
[21:47:37] :)
[21:47:42] i'm making this mobile change now
[21:47:45] there are quite a few places!
[21:47:50] yeah!
[21:48:00] you also have to drop the datasets definition right?
[21:48:25] yes
[21:52:57] Analytics-Kanban: Reorganize oozie jobs to not use mobile cache webrequest_source {hawk} [13 pts] - https://phabricator.wikimedia.org/T122651#1942222 (Ottomata) a:Ottomata
[21:53:07] (PS1) Ottomata: Use webrequest_source text for AppSessionMetrics, mobile is merging with text [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264868 (https://phabricator.wikimedia.org/T122651)
[21:56:06] (PS1) Ottomata: Remove 'mobile' webrequest_source, mobile is merging with text [analytics/refinery] - https://gerrit.wikimedia.org/r/264870 (https://phabricator.wikimedia.org/T122651)
[21:56:46] madhuvishy: ottomata the mobile app makes requests to '.m.' subdomains which hit the mobile varnishes afair
[21:56:56] (CR) Madhuvishy: "Until now, it looks like this job has been filtering for mobile app requests by using the x_analytics_map['wmfuuid'] is not null filter. W" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264868 (https://phabricator.wikimedia.org/T122651) (owner: Ottomata)
[21:57:11] this was an explicit decision made back in the day when zero rating depended on hitting the mobile subdomains
[21:57:11] YuviPanda: yeah that would make sense
[21:57:13] aye, makes sense YuviPanda, i just assumed it used an API endpoint on text
[21:57:15] ah ok
[21:57:16] cool
[21:57:19] it's an API endpoint on mobile
[21:57:23] however
[21:57:24] k cool
[21:57:27] if it encountered ssl errors
[21:57:30] on mobile
[21:57:34] it'll fallback to text
[21:57:48] which might explain the small amount of app requests you see on text
[21:57:54] I killed that fallback code yesterday...
[21:57:59] the logic for that was
[21:58:02] which explains the low but not zero mobile app request counts in the text partition
[21:58:11] China sometimes blocked .m. https
[21:58:14] but not text https
[21:58:20] so we'd fall back between the two
[21:58:23] to try catch one
[21:58:28] but now we're all https
[21:58:33] so this doesn't make sense anymore.
[21:58:36] Analytics-Kanban, Patch-For-Review: Reorganize oozie jobs to not use mobile cache webrequest_source {hawk} [13 pts] - https://phabricator.wikimedia.org/T122651#1942236 (Ottomata) @jallemandou, just pushed changes to refinery and refinery-source for removing webrequest_source mobile. Please review :) http...
[21:58:40] I submitted a patch to fix that yesterday
[21:58:55] ah, nice, good to know
[21:59:12] ok cool!
[22:02:40] :)
[22:43:07] (CR) Madhuvishy: "Might be missing other things but I noticed - https://github.com/wikimedia/analytics-refinery/blob/master/python/tests/test_refinery/test_" [analytics/refinery] - https://gerrit.wikimedia.org/r/264870 (https://phabricator.wikimedia.org/T122651) (owner: Ottomata)
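
The 21:29-21:58 exchange turns on one check: the job picks out app traffic by requiring x_analytics_map['wmfuuid'] to be set, and the open question was which webrequest_source partition those rows land in. A minimal Python sketch of that check follows. It is an illustration only, not the job's Scala code: the field names (webrequest_source, access_method, x_analytics_map, wmfuuid) come from the conversation, while the sample rows and the function names has_app_uuid, app_requests_by_source and share are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; only the field names are taken from the log.
SAMPLE_ROWS = [
    {"webrequest_source": "mobile", "access_method": "mobile app",
     "x_analytics_map": {"wmfuuid": "a1"}},
    {"webrequest_source": "mobile", "access_method": "mobile app",
     "x_analytics_map": {"wmfuuid": "b2"}},
    {"webrequest_source": "mobile", "access_method": "mobile web",
     "x_analytics_map": {}},
    {"webrequest_source": "text", "access_method": "mobile app",
     "x_analytics_map": {"wmfuuid": "c3"}},
    {"webrequest_source": "text", "access_method": "mobile app",
     "x_analytics_map": {}},
    {"webrequest_source": "text", "access_method": "desktop",
     "x_analytics_map": {}},
]


def has_app_uuid(row):
    """Return True when the row carries a non-empty wmfuuid."""
    return bool(row.get("x_analytics_map", {}).get("wmfuuid"))


def app_requests_by_source(rows):
    """Count rows with wmfuuid set, grouped by webrequest_source."""
    return Counter(r["webrequest_source"] for r in rows if has_app_uuid(r))


def share(part, total):
    """Return part/total as a fraction, or 0.0 when total is zero."""
    return part / total if total else 0.0


counts = app_requests_by_source(SAMPLE_ROWS)
print(dict(counts))

# Figures quoted at 21:43:25: 5200 of 102589 requests had wmfuuid set.
print(f"{share(5200, 102589):.2%}")
```

Grouping by partition before and after a change like the one in https://gerrit.wikimedia.org/r/264868 is one way to confirm that the rows carrying wmfuuid are all still counted once the mobile partition is gone.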