[00:40:07] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Neil_P._Quinn_WMF)
[02:16:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (10CCicalese_WMF)
[02:17:55] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Event Intake Service (TEC2)), and 2 others: Modern Event Platform: Stream Configuration Service - https://phabricator.wikimedia.org/T205319 (10CCicalese_WMF)
[02:26:54] 10Analytics: Return "available time range" custom header with AQS responses - https://phabricator.wikimedia.org/T205949 (10Milimetric)
[06:25:21] 10Analytics, 10Research: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (10Addshore)
[06:25:25] 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (10Addshore) 05stalled>03Resolved a:03Addshore Looks great! :) {F26272368}
[06:48:47] joal is hiding! :P
[07:01:37] Hi team - got disconnected yesterday
[07:03:07] Hi addshore :)
[07:03:26] D:
[07:03:30] :D
[07:03:31] HI
[07:03:35] Indeed I was hiding :-P
[07:04:00] how on earth did you know I said that though without being here :P silly irc logs
[07:04:19] :)
[07:07:07] I was going to ask if we could backfill the wikidata co-editors data even further
[07:07:27] I believe I should have the technical ability to do that, I just need to remember which tool to use
[07:09:00] We can do that, needs a modification in spark I think
[07:10:15] aah
[07:10:28] I thought before I could do it from some UI somewhere, but perhaps not
[07:10:55] addshore: Restarting an oozie job that has already run is feasible through the UI
[07:11:47] addshore: here, for backfilling, it's actually easier to use a single mediawiki-history snapshot and compute the metric for all time (instead of just the new month)
[07:13:50] o/
[07:14:45] hi elukey
[07:21:32] elukey: I'm investigating the error from sqoop
[07:21:40] elukey: This is super bizarre !
[07:25:21] elukey: log-file doesn't contain any error !
[07:26:19] Cron elukey: Ahhhh !! I know --> Same as with piwik yesterday - @hadoop-coordinator-2 - Not analytics 1003 - Everything is safe then
[07:26:32] right
[07:26:35] yeah :)
[07:26:52] will try to shut those off
[07:30:02] Thanks elukey - I'll be careful on who sends alerts as well
[07:33:36] I am sorry that we get those in labs, I am going to remove the mailto in puppet in case the realm is labs
[07:33:43] so we shouldn't get more false positives
[07:33:55] it is easy to confuse those emails
[07:34:03] elukey: no big deal - Thanks for removing :)
[07:37:28] addshore: Do we have a task about backfilling coeditors?
[07:38:05] joal: not yet
[07:38:23] addshore: I let you create one and work on solving it in the meantime ;)
[07:45:25] (03PS1) 10QChris: Add .gitreview [analytics/wmde/Wiktionary/WD_percentUsageDashboard] - 10https://gerrit.wikimedia.org/r/463907
[07:45:27] (03CR) 10QChris: [V: 032 C: 032] Add .gitreview [analytics/wmde/Wiktionary/WD_percentUsageDashboard] - 10https://gerrit.wikimedia.org/r/463907 (owner: 10QChris)
[08:04:19] currently trying to add prometheus metrics for the mariadb instances on analytics-meta (an1003) and matomo
[11:18:28] heyaaa
[11:21:32] Hi mforns
[11:21:38] hey joal :]
[11:53:17] * elukey hates piwik
[11:58:55] joal: https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fanalytics&var-server=analytics1003&var-port=13306
[11:59:16] wow
[11:59:48] and we also have matomo1001
[11:59:49] elukey: how the heck?
[12:00:25] what do you mean?
[12:00:41] DB traffic on tmp tables?
[12:01:15] * joal shouldn't be afraid of spikes by default
[12:01:19] sorry elukey
[12:01:42] :D
[12:02:40] PROBLEM - Number of segments reported as unavailable by the Druid Coordinators of the Analytics cluster on einsteinium is CRITICAL: 482 gt 200 https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&panelId=46&fullscreen&orgId=1&var-cluster=druid_analytics&var-druid_datasource=All
[12:03:50] mforns: I'm assuming you're doing indexations :)
[12:06:10] ReadingDepth
[12:06:25] maybe the alarm should be tuned for a longer period of time ?
[12:06:32] to avoid misfires like this one
[12:06:54] joal, :S
[12:06:59] ea
[12:07:12] yea, heap space problems
[12:07:24] mforns: on your machine or on druid?
[12:08:16] addshore: https://grafana.wikimedia.org/dashboard/db/wikidata-co-editors?orgId=1&from=now-7y&to=now
[12:08:54] addshore: Can you please send me a task so that I also document the script I used?
[12:09:20] RECOVERY - Number of segments reported as unavailable by the Druid Coordinators of the Analytics cluster on einsteinium is OK: (C)200 gt (W)180 gt 4 https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&panelId=46&fullscreen&orgId=1&var-cluster=druid_analytics&var-druid_datasource=All
[12:09:36] joal: yes! sorry, super backlogged today!
[12:09:39] not written it yet
[12:09:52] np addshore :)
[12:11:15] elukey, it was stat1005... I forgot the --master yarn AGAIN, sorry, was that the trigger of the alarm?
[12:12:05] mforns: nono the alarm triggers when there are segments to load, usually it means that either we are loading or a historical is having troubles
[12:12:24] I'll fix the alarm to be more tolerant to these kinds of loads
[12:12:25] I see
[13:17:17] 10Analytics-Kanban, 10User-Elukey: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642 (10elukey)
[13:17:20] 10Analytics, 10Patch-For-Review: Upgrade bohrium (piwik/matomo) to Debian Stretch - https://phabricator.wikimedia.org/T202962 (10elukey) 05stalled>03Open
[13:17:31] 10Analytics, 10Analytics-Kanban: Upgrade bohrium (piwik/matomo) to Debian Stretch - https://phabricator.wikimedia.org/T202962 (10elukey)
[13:18:08] 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, 10Operations, 10Patch-For-Review, and 2 others: Prometheus resources in deployment-prep to create grafana graphs of EventLogging - https://phabricator.wikimedia.org/T204088 (10Ottomata) > Is the hope that https://gerrit.wikimedia.org/r/#/c/operations/puppe...
[13:26:33] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to to stats, analytics-search-users, statistics-privatedata-users for Chelsy Xie - https://phabricator.wikimedia.org/T205736 (10herron) @chelsyx thanks for clarifying! I've updated the description to reflect this. Thi...
[13:27:35] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to to stats, analytics-search-users, statistics-privatedata-users for Chelsy Xie - https://phabricator.wikimedia.org/T205736 (10Ottomata) This doesn't really escalate any access to data, as Chelsey already has analytics...
[13:28:20] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to to stats, analytics-search-users, statistics-privatedata-users for Chelsy Xie - https://phabricator.wikimedia.org/T205736 (10elukey) Adding @Nuria to approve from our side :)
[13:29:15] 10Analytics, 10Analytics-Kanban: [EventLoggingToDruid] Allow ingestion of simple-type arrays by converting them to strings - https://phabricator.wikimedia.org/T201873 (10mforns) 05Open>03Invalid After discussing with the team, it turns out Druid does allow ingestion of array types! I tested that it works f...
[13:29:17] 10Analytics, 10Page-Issue-Warnings, 10Product-Analytics, 10Reading-analysis, 10Readers-Web-Backlog (Tracking): Ingest data from PageIssues EventLogging schema into Druid - https://phabricator.wikimedia.org/T202751 (10mforns)
[13:33:25] 10Analytics, 10Page-Issue-Warnings, 10Product-Analytics, 10Reading-analysis, 10Readers-Web-Backlog (Tracking): Ingest data from PageIssues EventLogging schema into Druid - https://phabricator.wikimedia.org/T202751 (10mforns) @Tbayer As, in the end, there was no change needed to ingest array types, PageI...
[13:34:39] 10Analytics, 10Analytics-Kanban, 10Page-Issue-Warnings, 10Product-Analytics, and 2 others: Ingest data from PageIssues EventLogging schema into Druid - https://phabricator.wikimedia.org/T202751 (10mforns) a:03mforns
[13:35:09] 10Analytics, 10Analytics-Kanban: Add ability to bucketize integers as part of event ingestion - https://phabricator.wikimedia.org/T205641 (10mforns) a:03mforns
[13:39:28] o/
[13:40:36] o/
[13:40:47] \o
[13:41:11] ottomata: https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fanalytics&var-server=analytics1003&var-port=13306
[13:41:12] (03CR) 10Ottomata: [C: 031] Update python/refinery/utils/HdfsUtils [analytics/refinery] - 10https://gerrit.wikimedia.org/r/459780 (https://phabricator.wikimedia.org/T202489) (owner: 10Joal)
[13:41:57] OH cool!
[13:46:18] \o/
[13:54:45] (03CR) 10Ottomata: Add MediawikiXMLDumpsConverter spark job (038 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463370 (https://phabricator.wikimedia.org/T202490) (owner: 10Joal)
[13:58:33] all right I think that we are ready to move piwik to matomo110
[13:58:36] *matomo1001
[13:58:49] need to announce it first since it will require a little bit of downtime
[14:00:28] (03CR) 10Ottomata: Add mediawiki-history-wikitext oozie job (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/463548 (https://phabricator.wikimedia.org/T202490) (owner: 10Joal)
[14:00:31] (03CR) 10Ottomata: [C: 031] "One q/nit but +1" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/463548 (https://phabricator.wikimedia.org/T202490) (owner: 10Joal)
[14:00:44] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Upgrade bohrium (piwik/matomo) to Debian Stretch - https://phabricator.wikimedia.org/T202962 (10elukey) matomo1001 is ready to take traffic! The switch procedure should be: 1) announce the downtime 2) put piwik on bohrium in read only mode (no inserts in...
[14:24:50] morning - my back is killing me today, taking it easy
[14:27:33] (03PS1) 10Fdans: Allow breakdown filtering in top metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/463964 (https://phabricator.wikimedia.org/T205725)
[14:29:44] milimetric mforns this change is fun with pageviews by country ^ :)
[14:31:13] fdans, aha, what are the available breakdowns? access_method?
[14:31:19] yep
[14:31:28] yea, sounds promising
[14:31:42] mforns: it's interesting to see which countries' pageviews don't really go down when you filter with mobile app only
[14:31:53] aha
[14:36:48] I'm going to review this change, btw, fdans
[14:38:05] heh, top anon editors is funny :)
[14:38:08] thank youuuuu
[14:39:27] milimetric: man, I had a beautiful function to deduplicate results in order to filter with checkboxes instead of radio buttons, but then I realized that, of course, values don't match exactly because we're only reported top n
[14:39:32] reporting *
[14:40:08] don't match exactly?
[14:40:11] what you mean?
[14:41:08] milimetric: the fact that an article doesn't show up in the top 1000 for mobile-app doesn't mean it has no pageviews on the mobile app
[14:41:28] so the sum of its values on the top 1000s doesn't necessarily match the total
[14:41:53] 10Analytics, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10elukey) @AndyRussG @Seddon ping :)
[14:44:35] fdans: right, but isn't that also true when filtering by one breakdown value at a time
[14:45:09] milimetric: but we're not giving false info when filtering by one at a time
[14:45:17] 10Analytics, 10Analytics-Kanban: Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (10mforns) @Tbayer @Nuria I loaded 1 month (Sept 2018) of ReadingDepth to Druid as a test. You can see it in Turnilo: https://turnilo.wikimedia.org/#event_ReadingDepth/3/N4IgbglgzgrghgG...
[14:45:34] milimetric: i can illustrate in the batcave if you want
[14:45:50] fdans: sok, lemme finish this and we can talk after the basic review
[14:47:49] 10Analytics, 10Analytics-Kanban: Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (10mforns) My take on this bucketization: Disadvantages: - buckets are less precise than percentiles - buckets are subject to event throughput variations and seasonality Advantages: - w...
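(A toy illustration, in editorial Python rather than the Wikistats 2 JavaScript, of the point fdans makes at 14:41:08-14:41:28: an article can rank in the top N for one access method but fall outside it for another, so summing its values across the per-breakdown top-N lists can undercount its real total. Names and numbers below are made up.)
```python
# Fabricated pageview counts per article and access method.
pageviews = {
    'Article_A': {'desktop': 900, 'mobile-app': 40},
    'Article_B': {'desktop': 100, 'mobile-app': 800},
}
N = 1  # keep only the single top article per breakdown, like a "top 1000" but tiny

tops = {}
for method in ('desktop', 'mobile-app'):
    ranked = sorted(pageviews, key=lambda a: pageviews[a][method], reverse=True)
    tops[method] = {a: pageviews[a][method] for a in ranked[:N]}

# Article_A really has 940 views, but only 900 are visible across the top-N lists,
# because its 40 mobile-app views fall outside that breakdown's top N.
visible = sum(tops[m].get('Article_A', 0) for m in tops)
print(visible, sum(pageviews['Article_A'].values()))  # 900 940
```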
[14:51:06] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to to stats, analytics-search-users, statistics-privatedata-users for Chelsy Xie - https://phabricator.wikimedia.org/T205736 (10Nuria) Approved.
[14:51:13] (03PS1) 10Mforns: Allow EventLoggingToDruid to bucketize integers into ranges [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463968 (https://phabricator.wikimedia.org/T205641)
[14:51:19] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install an-coord1001/wmf7621 - https://phabricator.wikimedia.org/T204970 (10elukey)
[14:57:31] (03CR) 10Nuria: Allow EventLoggingToDruid to bucketize integers into ranges (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463968 (https://phabricator.wikimedia.org/T205641) (owner: 10Mforns)
[15:00:57] ping milimetric ottomata
[15:01:57] standduppp ottomata
[15:02:05] eek
[15:02:06] coming
[15:04:57] (03CR) 10Milimetric: [C: 04-1] "Works nicely, 15 comments (jk)" (038 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/463964 (https://phabricator.wikimedia.org/T205725) (owner: 10Fdans)
[15:08:06] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10Wikimedia-Incident: ChangeProp logging KafkaConsumer is not connected - https://phabricator.wikimedia.org/T199444 (10mobrovac) The interesting part is that this doesn't happen for CP-JQ, only CP.
[15:10:07] (03CR) 10Nuria: [C: 04-1] Allow EventLoggingToDruid to bucketize integers into ranges [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463968 (https://phabricator.wikimedia.org/T205641) (owner: 10Mforns)
[15:18:29] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later), 10Wikimedia-Incident: ChangeProp logging KafkaConsumer is not connected - https://phabricator.wikimedia.org/T199444 (10Pchelolo) It does. Yesterday I've restarted JobQueue for that.
[15:54:36] 10Analytics: Update wikimedia-history revision data with deleted field (and find it a new name?) - https://phabricator.wikimedia.org/T178587 (10Milimetric) Thoughts: when processing wmf_raw.mediawiki_revision, process rev_deleted like this: * make a new field revision_parts_masked, which is an array * for each...
[16:03:44] * elukey off a bit earlier!
[16:04:42] 10Analytics, 10Analytics-Kanban: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (10Ottomata) What type of notebook are you using? Python?
[16:15:54] 10Analytics, 10Analytics-Kanban: Issues with page deleted dates on data lake - https://phabricator.wikimedia.org/T190434 (10Milimetric) discussed: approach: try to find the correct Id, if we don't, use artificial id, and test until delete dates make sense and join well with revisions of deleted pages.
[16:23:19] 10Analytics: Enhance mediawiki-history page reconstruction with best historical information possible - https://phabricator.wikimedia.org/T179692 (10Milimetric) For complex flows of events that we can't reconcile automatically with a somewhat high degree of certainty, we had an idea: describe the problem in a wa...
[17:21:44] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install an-coord1001/wmf7621 - https://phabricator.wikimedia.org/T204970 (10elukey) a:03Cmjohnson
[17:30:13] 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, 10Operations, 10Patch-For-Review, and 2 others: Prometheus resources in deployment-prep to create grafana graphs of EventLogging - https://phabricator.wikimedia.org/T204088 (10ovasileva)
[17:32:37] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to to stats, analytics-search-users, statistics-privatedata-users for Chelsy Xie - https://phabricator.wikimedia.org/T205736 (10JKatzWMF) Approved. Thanks!
[17:50:04] 10Analytics, 10Analytics-Kanban: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (10Tbayer) >>! In T168103#4634303, @Ottomata wrote: > What type of notebook are you using? Python? Yes - but as mentioned, the same error happens when I try the comman...
[17:50:56] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install an-coord1001/wmf7621 - https://phabricator.wikimedia.org/T204970 (10RobH) So i get the same error just attempting to boot into the BIOS. ``` iDRAC Settings: CBL0009: Backplane 1 connector A0 is not connected. CBL0009: Backplane...
[17:59:54] 10Analytics, 10Analytics-Kanban: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (10Ottomata) Yeah, still don't understand why that doesn't work. In the meantime, you could try sending using Python, this works for me in a Python notebook: ``` def s...
[18:01:27] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install an-coord1001/wmf7621 - https://phabricator.wikimedia.org/T204970 (10Cmjohnson) The disks are now being seen by the controller, this server was the spare we borrowed a cable from to work on cloudvirt1023. Re-connected the cable and n...
[18:03:09] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: setup/install an-coord1001/wmf7621 - https://phabricator.wikimedia.org/T204970 (10elukey) a:05Cmjohnson>03elukey
[18:03:26] (an-coord1001 finally ready for os! \o/)
[18:04:07] mforns: shall I merge your code change/
[18:04:08] ?
[18:04:15] elukey, the puppet one?
[18:04:22] yep!
[18:04:27] sure!
[18:04:32] :]
[18:04:34] \o/ elukey !
[18:04:43] elukey: what was it?
[18:05:07] joal: missing cable, Chris solved it, the hard disks were not showing up :D
[18:05:27] WAT ??
[18:05:48] in the cloud era we are the last ones to see these kind of problems :D
[18:05:54] * joal actually knows how it feels to have missing cables for hard-drive connection
[18:06:30] IIUC they used the cable for a test on another host because an-coord was a spare
[18:06:37] but forgot to put it back :D
[18:06:38] elukey: any big player handling bare-metal knows them as well :)
[18:07:03] mforns: deployed on analytics1003!
[18:07:14] elukey: I do that all the time -- Ah, what is that alert ? Oh, what was I doing ?
[18:07:25] elukey, :D thanks!
[18:07:30] joal: ah yes but only the ones working in the engine room, the others don't care!
[18:07:37] true elukey
[18:07:51] all right going afk again, have a nice evening!
[18:08:45] Bye elukey
[18:10:15] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: [EL sanitization] Store the old salt for 2 extra weeks - https://phabricator.wikimedia.org/T199900 (10mforns) @mpopov This task is done! From now on, the cryptographic salts used to hash EventLogging sensitive ids are going to b...
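(The notebook mail helper ottomata pastes at 17:59:54 is cut off in this log. Below is a minimal sketch of what such a helper could look like, assuming Python's standard smtplib and a local SMTP relay on the notebook host; the function name, parameters and address defaults follow Tbayer's later description on T168103 but are otherwise illustrative, not the actual code.)
```python
import getpass
import smtplib
import socket
from email.mime.text import MIMEText

def send_email(subject, body, to_email=None, from_email=None):
    # Default the addresses from the current user and host, as described in T168103.
    user, host = getpass.getuser(), socket.getfqdn()
    from_email = from_email or '{}@{}'.format(user, host)
    to_email = to_email or '{}@wikimedia.org'.format(user)  # assumed default
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = from_email
    msg['To'] = to_email
    # Assumes the host runs a local MTA/relay listening on port 25.
    with smtplib.SMTP('localhost') as smtp:
        smtp.sendmail(from_email, [to_email], msg.as_string())

# send_email('test from SWAP', 'hello from a notebook')
```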
[18:35:43] (03PS2) 10Mforns: Allow EventLoggingToDruid to bucketize integers into ranges [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463968 (https://phabricator.wikimedia.org/T205641)
[18:42:37] (03CR) 10Mforns: Allow EventLoggingToDruid to bucketize integers into ranges (035 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463968 (https://phabricator.wikimedia.org/T205641) (owner: 10Mforns)
[18:48:27] 10Analytics, 10Event Tools: Estimate the size of the "Event Organizer" community (and audience for Event Metrics) - https://phabricator.wikimedia.org/T206009 (10jmatazzoni)
[19:26:07] (03CR) 10Nuria: [C: 032] Allow EventLoggingToDruid to bucketize integers into ranges (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/463968 (https://phabricator.wikimedia.org/T205641) (owner: 10Mforns)
[19:35:34] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to to stats, analytics-search-users, statistics-privatedata-users for Chelsy Xie - https://phabricator.wikimedia.org/T205736 (10chelsyx) Thanks everyone! The patch above gave me access to analytics-search-users. Can I b...
[20:04:26] 10Analytics, 10Beta-Cluster-Infrastructure, 10EventBus, 10MediaWiki-JobQueue: GlobalRename stuck again at Beta - https://phabricator.wikimedia.org/T194376 (10Pchelolo) 05Open>03Resolved a:03Pchelolo I believe that's not an issue any more?
[20:10:50] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Event Intake Service (TEC2)), and 2 others: Modern Event Platform: Stream Configuration Service - https://phabricator.wikimedia.org/T205319 (10Pchelolo) Related: https://phabricator.wikimedia.org/T161027
[20:11:44] 10Analytics, 10Wikimedia-Stream, 10Patch-For-Review: Create /v2/schema/:schema_uri endpoint for eventstreams that proxies schemas from eventbus - https://phabricator.wikimedia.org/T160748 (10Pchelolo) So was this done or not after all?
[20:15:39] ottomata: yt? one short question if you are
[20:17:53] 10Analytics, 10EventBus, 10RESTBase, 10Services, 10RESTBase-release-1.0: RESTBase should honor wiki-wide deletion/suppression of users - https://phabricator.wikimedia.org/T120409 (10Pchelolo)
[20:17:56] 10Analytics, 10EventBus, 10RESTBase, 10RESTBase-release-1.0, 10Services (next): Strip old metadata from old Parsoid content : mw:TimeUuid, user, comment - https://phabricator.wikimedia.org/T128525 (10Pchelolo)
[20:18:02] nuria: ya
[20:18:04] here
[20:18:19] ottomata: I think /var/log/refinery/sqoop-mediawiki-private.log on analytics1003 needs a logrotate
[20:18:26] ottomata: as it has data since march
[20:18:35] ooo sounds like it!
[20:18:51] ottomata: is that .. ahem.. puppet plug and play
[20:18:52] ?
[20:19:11] 10Analytics: /var/log/refinery/sqoop-mediawiki-private.log does not rotate - https://phabricator.wikimedia.org/T206020 (10Nuria)
[20:20:04] nuria: i think so
[20:20:21] depends on how the logging is set up in python (i think?) but ya i think so
[20:21:37] ottomata: it is a crun dumping to /var/log
[20:21:41] *cron
[20:23:03] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add ability to bucketize integers as part of event ingestion - https://phabricator.wikimedia.org/T205641 (10mforns) I also updated https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines
[20:23:15] 10Analytics, 10EventBus, 10RESTBase, 10RESTBase-release-1.0, 10Services (next): Strip old metadata from old Parsoid content : mw:TimeUuid, user, comment - https://phabricator.wikimedia.org/T128525 (10Pchelolo) There's been 700 cases when the `If-Match` was not supplied over the last month and only...
[20:25:46] 10Analytics, 10Analytics-Kanban: Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (10mforns) @Tbayer After some discussion in Analytics stand-up meeting, we found the order-of-magnitude notation quite uncomfortable. I applied some changes to the EL2Druid job to improve...
[20:28:30] aye nuria then yeah
[20:28:51] ottomata: let me see if other scripts set up this way have logrotates
[20:32:02] ottomata: i think the only one with logrotate of the many logs we have on that dir is refine_eventlogging_analytics
[20:32:25] ottomata: and download-project-namespace-map.manual
[20:41:17] 10Analytics, 10EventBus, 10RESTBase, 10Services (next), 10goodfirstbug: Strip old metadata from old Parsoid content : mw:TimeUuid, user, comment - https://phabricator.wikimedia.org/T128525 (10Pchelolo) Tagging as a good onboarding bug as once the subtask is resolved, it will be easy to fix in code...
[20:45:35] ottomata: mmm.. no, me no compredou how the logrotate is set up, may be we can talk about this tomorrow
[20:50:46] mforns: yt?
[20:50:52] nuria, yep
[20:51:19] mforns: i think we might need to delete all the data from turnilo and re-reload: event_ReadingDepth
[20:51:37] nuria, is it broken?
[20:51:38] mforns: cause otherwise when doing filters in columns you will get old and new labels
[20:52:07] nuria, yes, I know... I loaded 15 days with new labels and left the other 15 with old labels
[20:52:13] so we're able to compare
[20:52:23] but I can override the second half of the month...
[20:52:51] mforns: I think it would be easier to see data with just one set of labels for the whole period loaded
[20:52:59] ok
[20:53:01] will do
[20:53:29] running
[20:54:32] mforns: superthanks
[20:57:36] mforns: let me know when it's done and I will take a look
[20:57:41] k
[20:57:49] will take a couple minutes
[21:05:19] 10Analytics, 10Analytics-Wikimetrics: Wikimetrics docker build/test environment is broken - https://phabricator.wikimedia.org/T193780 (10sbassett) {F26280482} Not sure exactly what issues this ticket meant to address, but I was able to get the docker env working for the current master branch of wikimetrics wi...
[21:10:13] 10Analytics, 10Analytics-Wikimetrics: Wikimetrics docker build/test environment is broken - https://phabricator.wikimedia.org/T193780 (10Nuria) Please do submit @sbassett
[21:20:51] nuria, there were some errors, second half of the month has big throughput... will repeat in parts
[21:20:58] mforns: k
[21:30:25] (03PS1) 10SBassett: Update docker-compose.yml and queue_config.yaml [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/464059 (https://phabricator.wikimedia.org/T193780)
[21:31:16] (03CR) 10jerkins-bot: [V: 04-1] Update docker-compose.yml and queue_config.yaml [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/464059 (https://phabricator.wikimedia.org/T193780) (owner: 10SBassett)
[21:45:31] 10Analytics, 10Analytics-Wikimetrics, 10Patch-For-Review: Wikimetrics docker build/test environment is broken - https://phabricator.wikimedia.org/T193780 (10sbassett) Hmm, a handful of flake8 fails unrelated to my patch: https://integration.wikimedia.org/ci/job/tox-docker/4084/console. I could add the check...
[21:58:37] 10Analytics, 10Analytics-Kanban: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (10Tbayer) Cool! This works great for me. I tweaked it a bit to make the `from_email` and `to_email` parameters optional, autogenerating them based on the server name an...
[22:01:48] nuria, it's loaded now
[22:09:37] 10Analytics, 10Operations, 10Traffic, 10Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (10Pchelolo) We did enable the feature after all by looking at requests reaching #RESTBase, but that's not very convenient. Technically this is no more required. How...
[22:12:50] 10Analytics, 10Analytics-Kanban: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (10Tbayer) PS: This solves my own use case and I think that of some other Python users too. Personally I wouldn't mind closing this task, although the problem as stated...
[22:15:27] 10Analytics, 10Analytics-Wikimetrics, 10Security-Reviews: security review of Wikimetrics {dove} - https://phabricator.wikimedia.org/T76782 (10sbassett) Docker fixes @ https://phabricator.wikimedia.org/T193780 (w/ https://gerrit.wikimedia.org/r/464059/), though flake8 tests failing, unrelated to my patch :/...
[22:27:44] mforns: checking
[22:28:54] mforns: filtering i think now makes more sense, now, another question, there are two measures: "count" and "event_count"
[22:29:23] mforns: which *I think* are the same one so we probably only need one
[22:29:30] nuria, yea, count is the aggregated row count, I think it's automatic from turnilo
[22:29:46] event count is the actual number of events, pre-aggregation
[22:30:05] event count is generated at ingestion time by eventloggingtodruid.scala
[22:31:27] mforns: i think "count" from turnilo is also the number of events per selected dimensions in this case, right?
[22:31:53] nuria, mmmm I can see different values for count and event count
[22:32:12] mforns: ah yes, wait..
[22:32:26] event count is bigger than count
[22:34:26] mforns: k, yeah, we can remove that with turnilo config
[22:34:26] nuria, but I don't remember how EventLoggingToDruid aggregates data...
[22:34:43] if count is the row count, we don't need that I think
[22:34:50] mforns: ya, agreed
[22:34:56] mforns: see histogram: https://bit.ly/2DPx9vB
[22:36:33] histogram looks good :]
[22:37:13] only thing I don't like is labels are not sortable alphabetically...
[22:37:42] mforns: ya, i noticed that too, also probably binnings for performance need to be 0-100ms 100-200ms ...1sec-4sec , 4sec-10 sec but no need to do that now
[22:38:13] mforns: data looks not so good
[22:38:34] mforns: the data itself not the loading
[22:38:44] what rationale would be behind binnings?
[22:38:56] why do you think the data looks not so good?
[22:39:58] (03CR) 10Nuria: [V: 032 C: 032] "If you have tested this locally let's just merge, we have not touched this code in a while and it looks like flake8 has new rules" [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/464059 (https://phabricator.wikimedia.org/T193780) (owner: 10SBassett)
[22:40:33] (03CR) 10jerkins-bot: [V: 04-1] Update docker-compose.yml and queue_config.yaml [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/464059 (https://phabricator.wikimedia.org/T193780) (owner: 10SBassett)
[22:40:37] (03CR) 10Nuria: [V: 032 C: 032] "Ah, wait it will not be merged until errors are corrected, i see." [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/464059 (https://phabricator.wikimedia.org/T193780) (owner: 10SBassett)
[22:41:11] mforns: for perf binnings?
[22:41:14] (03CR) 10jerkins-bot: [V: 04-1] Update docker-compose.yml and queue_config.yaml [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/464059 (https://phabricator.wikimedia.org/T193780) (owner: 10SBassett)
[22:41:21] nuria, yes
[22:41:30] mforns: because the 0-200ms range is normally something considered instantaneous
[22:42:17] mforns: after there is laggy (up to 500ms) , and later the main perceived impacts are after 1sec, and at more than 4 sec people just give up
[22:42:46] ok, so the buckets need to be smaller for perf, makes sense
[22:42:54] mforns: these are very common bounds and actually gilles is doing some hands-on research on these findings this quarter
[22:43:00] the 100ms-1sec is too imprecise
[22:43:19] I see
[22:43:33] mforns: note this is from 2001: https://www.nngroup.com/articles/website-response-times/
[22:43:50] mforns: so these standards for perf are as old as the times
[22:44:01] k
[22:46:10] mforns: but those will be easy enough to implement with your code , we can do so if someone requests it, see for example DOMInteractive event
[22:46:12] https://bit.ly/2NZ8Hwz
[22:46:33] mforns: the bulk of it (makes sense) is <1 sec but that does not tell you much
[22:46:47] yea
[22:48:00] mforns: but i do not think we need to do anything additional for now (let me know if you disagree), if we ingest navigation timing we will certainly need to provide more binning for data to be useful
[22:48:12] hmm
[22:50:16] mforns: yessir?
[22:50:37] just thinking of a bucketing split that fits all...
[22:50:54] maybe x10 is too big of a jump
[22:52:21] mforns: for perf yes but only on the first 2 sec, after that it's all good
[23:02:00] 10Analytics: Many client side errors on citation dat, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (10Nuria)
[23:02:18] 10Analytics: Many client side errors on citation dat, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (10Nuria) These errors are happening most likely due to urls being too large.
[23:06:12] nuria, how about https://pastebin.com/66j01QZr
[23:06:19] would that fit all cases?
[23:06:44] kind of the currency-split
[23:06:55] mforns: jajaja
[23:07:19] mforns: i think we can probably do with 0-50, 50-100....rest looks good
[23:10:30] 10Analytics: Many client side errors on citation dat, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (10bmansurov) For CitationUsagePageLoad we're getting about 450-800 events per second, which gives us 37,500 events per minute. At 200 errors per minute, we get one error every 187.5...
[23:10:35] nuria, good thing about that, is buckets without appearances will not show up in druid/turnilo
[23:11:00] 10Analytics: Many client side errors on citation data, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (10Nuria)
[23:11:15] so, when handling perf measurements, the bigger buckets will not show up, and vice versa
[23:23:12] mforns: mm., no they will, cause if there is 1 measure they will show up, right?
[23:23:24] nuria, yes
[23:24:09] 10Analytics: Many client side errors on citation data, significant percentages of data lost - https://phabricator.wikimedia.org/T206083 (10Nuria) I see that the percentage you do not think is significant for the research to be done, that is ultimately up to you; now there are other considerations about being good...
[23:25:12] mforns: then I imagine they will probably show up, given how widespread our browser user base is
[23:25:39] yea, you're right
[23:26:08] well, going to log off! see ya tomorrow!
[23:28:53] mforns: ciaoooo
[23:32:21] 10Analytics-Kanban, 10Beta-Cluster-Infrastructure, 10Operations, 10Patch-For-Review, and 2 others: Prometheus resources in deployment-prep to create grafana graphs of EventLogging - https://phabricator.wikimedia.org/T204088 (10Jdlrobson) @Ottomata still not seeing them.. does that mean https://gerrit.wikim...
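(An illustrative sketch of the bucketing nuria and mforns discuss above: at ingestion time, map an integer measure to a labeled range so the resulting Druid/Turnilo dimension stays low-cardinality, and buckets with no events simply never appear. The boundaries below mix the values floated in the conversation (0-50, 50-100, then coarser steps up to 4 and 10 seconds) and are assumptions; the real EventLoggingToDruid job is Scala, and its option names and defaults may differ.)
```python
# Hypothetical bucket boundaries, in milliseconds, based on the discussion above.
BOUNDARIES = [0, 50, 100, 250, 500, 1000, 2000, 4000, 10000]

def bucketize(value, boundaries=BOUNDARIES):
    """Return a label like '100-250' for the range containing value."""
    if value < boundaries[0]:
        return '<{}'.format(boundaries[0])
    for low, high in zip(boundaries, boundaries[1:]):
        if low <= value < high:
            return '{}-{}'.format(low, high)
    return '{}+'.format(boundaries[-1])

# bucketize(180) -> '100-250'; bucketize(15000) -> '10000+'
```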