[00:05:16] 10Analytics-Tech-community-metrics, 10Developer-Relations (Oct-Dec 2017): Find a way to publish DB index names, to allow anyone to construct more complex queries based on certain indices - https://phabricator.wikimedia.org/T179330#3728396 (10Aklapper)
[00:06:27] 10Analytics-Tech-community-metrics, 10Developer-Relations (Oct-Dec 2017): Find a way to publish DB field names, to allow anyone to construct more complex queries based on certain fields - https://phabricator.wikimedia.org/T179330#3720800 (10Aklapper)
[00:06:58] 10Analytics-Tech-community-metrics, 10Developer-Relations (Oct-Dec 2017): Find a way to publish DB field names, to allow anyone to construct more complex queries based on certain fields - https://phabricator.wikimedia.org/T179330#3720800 (10Aklapper) 05Open>03Resolved I realized that now non-admins can als...
[00:21:58] PROBLEM - Throughput of EventLogging EventError events on graphite1001 is CRITICAL: CRITICAL: 18.18% of data above the critical threshold [30.0]
[01:13:08] RECOVERY - Throughput of EventLogging EventError events on graphite1001 is OK: OK: Less than 15.00% above the threshold [20.0]
[08:21:13] morning!
[08:21:28] Heya elukey :)
[08:21:46] a-team: the traffic team would like to add more precise info to the X-cache header - https://gerrit.wikimedia.org/r/#/c/387817/ - is it fine for us? I'd say so but I'd like to triple check :)
[08:22:04] (X-cache-status sorry)
[08:22:20] hi a-team! o/
[08:22:25] Hi ema :)
[08:23:11] * joal reads
[08:26:09] we currently have X-Cache-Status in webrequest with the following possible values: hit,int,miss,pass,bug
[08:26:44] the idea is to add to hit and int some context info explaining at which layer of the CDN they occurred
[08:27:21] so the new values would be: hit-front,hit-local,hit-remote,int-front,int-local,int-remote,miss,pass,unknown
[08:28:19] the question for you all is whether webrequest.X-Cache-Status is already in use somewhere and our change would break things, or if we can go ahead
[08:28:26] ema: I thought we had the caching layer path?
[08:30:01] joal: yes in X-Cache we have the full path, but we need something describing in a less verbose manner what the request outcome was
[08:32:27] ema: I don't think we use it anywhere - Would be good to double check with discovery
[08:39:52] joal: "cache_status":"miss" "x_cache":"cp1063 pass, cp3045 miss, cp3047 miss" currently for example, the new format looks really, really nice :)
[08:40:56] afaik we don't really care about/check/use those values in refinery right? They are there only to be queried by the traffic masters :D
[08:54:54] !log relaunched failed pageview-druid-hourly jobs - Druid indexation check failures in the logs (01 Nov 2017 21:00:00 and 01 Nov 2017 19:00:00)
[08:54:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:13:48] gehel: hi! Are you guys using the X-Cache-Status header in any way at the moment? I'm planning on changing its values, see ^
[09:14:12] * gehel reading back...
[09:14:36] ema: nope, I don't think we use it in any way directly
[09:14:41] nice
[09:15:59] ema: there might be other people doing ad hoc queries but I have never heard anybody doing anything major with X-cache-status.. not sure if performance does any special queries
[09:21:47] elukey: I'm gonna ask! Do we have any query logs for webrequest BTW?
[09:22:33] ema: not that I know, but there might be something in hive logs?
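For illustration, a minimal sketch of how an existing query on webrequest's cache_status could be adapted to the proposed layered values, assuming a PySpark session with Hive support on the analytics cluster; the partition values and the aggregation are made up, only wmf.webrequest, cache_status and the hit-* values come from the discussion above.

```python
# Minimal sketch, not a recommended query: wmf.webrequest and cache_status
# are from the discussion above, the partition values are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# An exact match on 'hit' would stop matching once the values become
# hit-front / hit-local / hit-remote; a prefix match (or a regexp) keeps
# the query working with both the old and the new format.
hits_by_layer = spark.sql("""
    SELECT cache_status, COUNT(*) AS requests
    FROM wmf.webrequest
    WHERE webrequest_source = 'text'
      AND year = 2017 AND month = 11 AND day = 2 AND hour = 0
      AND cache_status LIKE 'hit%'   -- previously: cache_status = 'hit'
    GROUP BY cache_status
""")
hits_by_layer.show()
```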
[09:26:25] ema: We've never used it so far, but we have hive query logs :)
[09:26:44] ema: back to 2017-05
[09:29:18] I think that bearloga used that at some point for some ad-hoc investigation, but I don't think he has any recurrent use of them
[09:33:53] ema: The list of x_cache ad-hoc queries since 2017-05: https://gist.github.com/jobar/951902b8006eeacb11160b2f8cf3a16b
[09:37:13] joal: nice! X-Cache-Status gets rewritten by varnishkafka into 'cache_status'. Could you search for that instead of x_cache please?
[09:37:27] Just did ema
[09:39:12] ema: https://gist.github.com/jobar/3cb56bff151abbbd6fbce2f67375a4c2
[09:39:37] ema: There are some requests that I don't have (multiline ones) - they represent about 10 requests
[10:19:36] joal: cool, it doesn't look like it's used often
[10:20:02] and when it is, changing the query to like or regexp seems easy enough :)
[10:22:53] great ema :)
[11:42:48] 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), 10Patch-For-Review, 10User-Elukey: PageContentSaveComplete. Stop collecting - https://phabricator.wikimedia.org/T177101#3729129 (10elukey) Sanity check ``` MariaDB [log]> select count(*) from Pa...
[11:48:01] 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), 10Patch-For-Review, 10User-Elukey: PageContentSaveComplete. Stop collecting - https://phabricator.wikimedia.org/T177101#3729164 (10elukey) 05stalled>03Open
[11:57:33] * elukey lunch!
[12:41:46] qq - would it make sense to create a CNAME to db1108 called 'eventlogging-replica.eqiad.wmnet' ?
[12:42:08] analytics-slave is currently a bit confusing and it is not mnemonic
[12:42:28] elukey: I don't have an opinion as I seldom use those machines/names
[12:43:02] but we use analytics-slave a lot in our codebase, so maybe it is not a good idea for the moment
[12:43:10] * elukey auto-answers himself :(
[12:43:23] Those self-answers are usually the correct ones :)
[12:46:31] 10Analytics, 10DBA, 10Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3729337 (10elukey) All right, we are ready to outline the next steps with some deadlines (tentative): * November 6th: the analytics-slave CNAME moves from db1047 to db1108 * N...
[12:47:58] HaeB: o/ - db1108 is a brand new database host on which we keep only the log database (plus the custom replication script). If you want to test it and let me know your thoughts I'd be grateful :)
[12:48:20] it should be a lot quicker than db1047 and dbstore1002
[12:55:11] 10Analytics, 10DBA, 10Operations, 10User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3729344 (10elukey)
[13:02:53] 10Analytics, 10DBA, 10Operations, 10User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3729360 (10Marostegui) >>! In T156844#3729337, @elukey wrote: > All right, we are ready to outline the next steps with some deadlines (tentative): > > * Novem...
[13:55:40] elukey: you should be on break today, no? Also, I like the eventlogging-replica name better, and the mediawiki-db-replica name if we start doing that. Then we can deprecate the old CNAMEs eventually.
[13:57:03] milimetric: o/ - nope, yesterday was a bank holiday for me, today is a regular workday
[13:57:16] ah!
ok :)
[13:57:26] yay, I feel much safer when you're around
[13:58:12] :)
[13:59:18] +1 to those names! :)
[13:59:48] fyi though, i know most folks are not confused by this, but eventlogging generally refers to a bunch of software, not data! if you are talking about data, i usually refer to this as the eventlogging analytics data
[14:00:18] eventlogging is also used for production data that is not in these databases
[14:00:53] i am actually digging into some eventlogging code now
[14:01:01] holy crap, it does a really really weird thing i had no idea about!
[14:01:07] relevant to https://phabricator.wikimedia.org/T179540#3728047
[14:01:23] from clients, the data is sent in our favorite dt format:
[14:01:29] 2017-11-02T13:55:37
[14:01:36] but, eventlogging CONVERTS this to an int
[14:01:56] https://github.com/wikimedia/eventlogging/blob/master/eventlogging/parse.py#L82
[14:01:58] and THEN
[14:02:01] when being saved in MySQL
[14:02:05] it is converted to something else
[14:02:15] https://github.com/wikimedia/eventlogging/blob/master/eventlogging/jrm.py#L34
[14:02:18] AAHHHHHH
[14:12:30] 10Analytics, 10Analytics-EventLogging: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3728047 (10Ottomata) I just looked into what EventLogging is doing to these timestamps. It's kinda crazy! The timestamp comes in from clie...
[14:13:57] yall jsonrefine is trucking along very nicely, all puppetized etc.! we didn't get any alert emails overnight, which means all the schemas it is refining succeeded!
[14:14:36] \o/
[14:15:06] this means that now eventlogging refined data is in hdfs right? I mean, we started publishing data in there too
[14:15:34] yup! exactly
[14:15:46] i think there is still a little more work to do before we call it done and advertise it to people
[14:15:47] but ya
[14:15:48] check it out elukey
[14:15:51] event database in hive
[14:15:56] hive --database event
[14:15:58] show tables;
[14:16:01] describe popups;
[14:16:05] show partitions popups;
[14:16:12] or also
[14:16:18] show partitions mediawiki_revision_create;
[14:17:58] oo i just realized that camus is whitelisting specific eventbus topics!
[14:18:04] gonna make it more broad to get more goodies
[14:18:11] this is great ottomata, congrats :)
[14:18:11] :D
[14:27:59] * elukey coffee!
[14:33:59] running home, back in a sec
[14:38:55] hey a-team, I need to run a quick errand now, I will be back during stand-up, but won't make it to the start!
[15:01:27] ping elukey milimetric
[15:01:33] omg
[15:06:50] ping joal
[15:34:40] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3729853 (10fdans)
[15:35:35] 10Analytics, 10Analytics-Wikistats: Wikistats Bug – Put view settings in URL so it can be shared - https://phabricator.wikimedia.org/T179444#3724954 (10fdans) We're aiming to have this for our beta release. This data will not be serialised in URLs in the alpha release.
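A rough sketch of the double conversion being described above, purely as illustration — this is not the actual eventlogging implementation; see the parse.py and jrm.py links for the real code, including whether the intermediate integer is seconds or milliseconds.

```python
# Rough illustration of the two-step conversion described above; the real
# code lives in eventlogging's parse.py and jrm.py (linked above).
from datetime import datetime, timezone

def client_dt_to_int(dt_string):
    """parse.py step: ISO 8601 client dt -> integer epoch timestamp
    (seconds here for illustration; the real unit is defined in parse.py)."""
    dt = datetime.strptime(dt_string, '%Y-%m-%dT%H:%M:%S')
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

def int_to_mw_timestamp(epoch_seconds):
    """jrm.py step: integer epoch timestamp -> MediaWiki-style
    YYYYMMDDHHMMSS string, as stored in the MySQL log database."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime('%Y%m%d%H%M%S')

print(int_to_mw_timestamp(client_dt_to_int('2017-11-02T13:55:37')))  # 20171102135537
```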
[15:36:42] 10Analytics, 10Analytics-Wikistats: Wikistats Bug – Don't display time range difference over all time - https://phabricator.wikimedia.org/T179443#3724926 (10fdans) Resolve as duplicate of T179424
[15:37:01] 10Analytics, 10Analytics-Wikistats: Wikistats Bug – Don't display time range difference over all time - https://phabricator.wikimedia.org/T179443#3729862 (10fdans) 05Open>03Resolved a:03fdans
[15:37:54] 10Analytics, 10Analytics-Wikistats: Wikistats Bug – Don't display time range difference over all time - https://phabricator.wikimedia.org/T179443#3729866 (10Jdforrester-WMF)
[15:37:56] 10Analytics, 10Analytics-Wikistats: Deal with zero divisions in "small metrics" - https://phabricator.wikimedia.org/T179424#3729868 (10Jdforrester-WMF)
[15:38:50] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats Bug – In tabular data view, format displayed values - https://phabricator.wikimedia.org/T179441#3729869 (10fdans)
[15:38:55] 10Analytics, 10Analytics-Wikistats: Deal with zero divisions in "small metrics" - https://phabricator.wikimedia.org/T179424#3724156 (10Jdforrester-WMF) The merged task was that if the data starts from zero and ends at non-zero (say, all-time number of editors), it reports "+Infinity%".
[15:39:41] 10Analytics-Kanban, 10Analytics-Wikistats: Deal with zero divisions in "small metrics" - https://phabricator.wikimedia.org/T179424#3729872 (10fdans)
[15:42:52] 10Analytics-Kanban: Update Mediawiki Table manuals on wiki - https://phabricator.wikimedia.org/T179407#3729873 (10fdans)
[15:48:34] 10Analytics: Send burrow lag statistics to prometheus {hawk} - https://phabricator.wikimedia.org/T120852#3729890 (10fdans)
[15:48:49] 10Analytics, 10User-Elukey: Send burrow lag statistics to prometheus - https://phabricator.wikimedia.org/T120852#1862859 (10fdans)
[15:53:12] 10Analytics, 10Spike: Spike: Evaluate alternatives to varnishkafka: varnishevents - https://phabricator.wikimedia.org/T138426#3729923 (10fdans) 05Open>03declined We're going to stick to varnishkafka for the time being ( :( )
[15:54:40] 10Analytics: Upgrade Kafka on main cluster with security features - https://phabricator.wikimedia.org/T167039#3315367 (10fdans) Since 1.0 is out we can add TLS
[16:07:49] 10Analytics-Kanban, 10Patch-For-Review: Upgrade AQS restbase-modules - https://phabricator.wikimedia.org/T178312#3729975 (10Nuria)
[16:08:10] 10Analytics-Kanban, 10Patch-For-Review: Rename datasources and fields in Druid to use underscores instead of hyphens - https://phabricator.wikimedia.org/T175162#3729976 (10Nuria) 05Open>03Resolved
[16:08:21] 10Analytics-Kanban, 10Patch-For-Review: Upgrade AQS restbase-modules - https://phabricator.wikimedia.org/T178312#3687878 (10Nuria) 05Open>03Resolved
[16:08:33] 10Analytics-Kanban: Locate data from /srv on stat1003 - https://phabricator.wikimedia.org/T179189#3729979 (10Nuria) 05Open>03Resolved
[16:08:46] 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.31-release-notes (WMF-deploy-2017-10-10 (1.31.0-wmf.3)), 10Patch-For-Review, 10User-Elukey: PageContentSaveComplete. Stop collecting - https://phabricator.wikimedia.org/T177101#3729980 (10Nuria) 05Open>03Resolved
[16:09:06] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest is wonderful
[16:14:23] :)
[16:22:27] ema: say that again!
[16:23:02] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest is wonderful !
[16:52:25] clearly recentchange is crazy! gonna blacklist that one
[16:52:26] awww man
[17:42:34] dcausse: o/
[17:42:44] elukey: hi!
[17:42:55] dcausse: hi!!
Do you have a min for a newbie question about cirrusSearchLinksUpdatePrioritized?
[17:43:04] elukey: sure
[17:44:03] can it trigger refreshlink jobs?
[17:44:40] I am working on https://phabricator.wikimedia.org/T173710 and I can see a ton of those jobs on jobrunners, just want to make sure that they are not related
[17:44:49] the most affected wiki seems to be commons
[17:45:20] these jobs can trigger other jobs on failures but not refreshlinks
[17:46:50] elukey: yup, it basically goes the other way. refresh links triggers cirrus links update
[17:46:58] ebernhardson: o/
[17:47:29] super, I was seeing stuff like " addedLinks=["Template:Welcome/yue"] removedLinks=[]" in the jobrunner logs and I was wondering what it was
[17:47:33] basically if something increases processing of refresh links, that creates more cirrus links update jobs, which is plausibly what's happening
[17:48:24] the other job that I see the most is wikibase-addUsagesForPage
[17:48:33] but I can't correlate it with refreshlinks atm
[17:48:48] maybe it was just a template change that keeps causing recursion
[17:49:26] i also like the new patch shipping this week to add causes to those jobs, being able to track back what is happening seems like a big win
[17:50:12] didn't we add a way to attach the context of the job creation? e.g. jobY created while doing jobX
[17:50:45] the only thing that I saw was https://gerrit.wikimedia.org/r/#/c/385248
[17:51:08] that, from https://www.mediawiki.org/wiki/MediaWiki_1.31/Roadmap, seems to have been deployed for commons
[17:51:30] but I can only see causeAction=unknown causeAgent=unknown
[17:51:37] :/
[17:51:38] (in the jobrunner logs)
[17:51:52] dcausse: we have a unique token that progresses through the chain, so all jobs created by a request, even recursively, have the same reqId in the logs. But that depends on the original request logging something
[17:52:25] :S
[17:54:27] i suppose it might be that those are existing recursive jobs rather than newly created things
[17:54:34] so they don't have a cause
[17:57:20] yeah this is my understanding too
[17:58:28] ebernhardson: correct me if I am wrong, but technically all the recursive jobs have the same requestId right? So finding the first occurrence of that requestId in the logs on mwlog1001 could lead to the start of all this mess?
[17:58:41] (not saying it is easy, just wondering if possible :)
[17:59:56] elukey: right
[18:00:17] (03PS9) 10Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414)
[18:00:38] would be nice if we could get those logs into the logstash cluster ... but they were too much volume :)(
[18:00:42] s/)//
[18:03:41] ebernhardson: on mwlog1001:/srv/mw-log we have runJobs.log, which collects all the jobrunner logs
[18:04:03] 10G of compressed files per day, so a lot, but it might lead to some result
[18:04:27] ottomata, question about refinery-source: when jenkins compiles the code it has no problems, but when I compile it in stat1004 with mvn package it fails with:
[18:04:44] object DataFrameToDruid is not a member of package org.wikimedia.analytics.refinery.core
[18:05:55] do you know why mvn does not recognize the class DataFrameToDruid as belonging to the package? the package line in the file is correct!
[18:06:41] mforns: maybe you need to rebase?
[18:06:57] i had a similar problem because refinery-core/pom.xml was not configured to compile scala!
[18:07:06] aaaaa...
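A minimal sketch of the "trace a reqId back through runJobs.log" idea above; the gzip rotation and the plain reqId token in each line are assumptions made for illustration, only the mwlog1001:/srv/mw-log/runJobs.log location comes from the conversation.

```python
# Minimal sketch: scan a (possibly gzip-rotated) runJobs.log for a given reqId.
# The file layout and the reqId token format are assumptions for illustration;
# only the log location comes from the conversation above.
import gzip
import sys

def lines_with_req_id(path, req_id):
    """Yield matching lines in order; the earliest one should point at the
    request that started the recursive job chain."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', errors='replace') as log:
        for line in log:
            if req_id in line:
                yield line.rstrip('\n')

if __name__ == '__main__':
    log_path, req_id = sys.argv[1], sys.argv[2]
    for match in lines_with_req_id(log_path, req_id):
        print(match)
```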
ok will try
[18:07:11] i don't know why I didn't have that problem before
[18:07:16] (03PS10) 10Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414)
[18:07:23] but yesterday as I was deploying I did, and had to make a change to get the pom to build scala
[18:07:53] https://gerrit.wikimedia.org/r/#/c/387845/
[18:09:02] ottomata, ok! I will rebase then, manually, in my folder on stat1004
[18:09:28] oh, but with a Gerrit rebase it looks like it's working..
[18:09:34] ok gr8
[18:10:30] it worked! :D
[18:11:22] * elukey off!
[19:02:47] * HaeB reads https://phabricator.wikimedia.org/T179540#3728047 , blinks repeatedly
[19:02:54] interesting findings ottomata ;)
[19:06:47] hah yeah
[19:06:57] i'm poking around in that code right now
[19:09:17] 10Analytics, 10Analytics-EventLogging: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3730599 (10Tbayer) Interesting findings! Food for thought... we should probably reach out to other users of this data to get more input on t...
[19:10:10] 10Analytics, 10Analytics-EventLogging: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3730600 (10Ottomata) Ah, you are correct, it is from the varnishes.
[19:25:03] 10Analytics-Kanban, 10Discovery, 10Operations, 10Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#3730632 (10mpopov)
[19:36:16] hey a-team, anybody available to brain bounce about https://phabricator.wikimedia.org/T179540#3730599 ?
[19:38:10] mforns: ^?
[19:46:33] ottomata: in my modest opinion it seems that propagating the ISO format is best
[19:47:12] ya but it's complicated with mysql
[19:53:02] ottomata, reading now
[19:53:10] you still wanna brainbounce?
[19:56:44] ya
[19:56:50] bc
[19:56:52] mforns:
[19:56:57] ok
[20:02:19] fdans: let me know what you find out about EL alarms
[20:09:12] 10Analytics-EventLogging, 10Analytics-Kanban: Refine should parse user agent field as it is done on refinery pipeline - https://phabricator.wikimedia.org/T178440#3730723 (10Nuria) Doesn't seem that json-tuple is that friendly on a where clause, right? I just cannot get it to work in something like: select json...
[20:19:58] 10Analytics-EventLogging, 10Analytics-Kanban: Refine should parse user agent field as it is done on refinery pipeline - https://phabricator.wikimedia.org/T178440#3730789 (10Ottomata) Yeah, a little nasty. It'd be better to do this in EventLogging processor or something though. We're already parsing the user a...
[20:20:37] 10Analytics, 10Analytics-EventLogging, 10Operations, 10Ops-Access-Requests: Requesting Sharvani Haran to be added to researchers group - https://phabricator.wikimedia.org/T179611#3730792 (10Fjalapeno)
[20:25:59] what theeeee
[20:26:09] tilman changed the EventCapsule schema?
[20:26:10] https://meta.wikimedia.org/w/index.php?title=Schema:EventCapsule&diff=prev&oldid=16479585
[20:26:14] this is very incorrect
[20:26:22] luckily eventlogging code doesn't use this schema version
[20:26:25] else all would break!
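On the json_tuple question above, a hedged sketch of the usual Hive workaround, run through PySpark to keep it self-contained: json_tuple is a table-generating function, so it cannot sit directly in a WHERE clause, but the columns it produces through a LATERAL VIEW can be filtered on. The event.popups table and a useragent field holding browser_family/os_family keys are assumptions for illustration.

```python
# Hedged sketch of the usual workaround: json_tuple is a UDTF, so it can't
# be called directly inside WHERE, but columns produced through a LATERAL
# VIEW can be filtered on. Table, field and key names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

by_browser = spark.sql("""
    SELECT ua.browser_family, COUNT(*) AS events
    FROM event.popups
    LATERAL VIEW json_tuple(useragent, 'browser_family', 'os_family')
        ua AS browser_family, os_family
    WHERE year = 2017 AND month = 11 AND day = 2
      AND ua.browser_family = 'Chrome'
    GROUP BY ua.browser_family
""")
by_browser.show()
```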
[20:32:34] ottomata: ya, i do not think he knows that schema is literally used in the code
[20:33:13] 10Analytics, 10Analytics-EventLogging: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3730832 (10Ottomata) > As documented in https://meta.wikimedia.org/wiki/Schema:EventCapsule , EventLogging tables are currently using timest...
[20:33:15] ?????
[20:33:54] mforns: that's why i was confused when i saw format "YYYYMMDDHHMMSS" in the schema
[20:34:02] i was like "how the heck would this ever work!?"
[20:34:09] but you said it was initially ISO no?
[20:34:16] it's ISO from varnish
[20:34:30] but the schema was millis before tilman
[20:34:31] no?
[20:34:36] eventlogging processor converts it to utc-millisec,
[20:34:36] it should have been millis
[20:35:03] then jrm.py stuff converts it to a MW timestamp (after validation, etc.)
[20:35:10] if we had updated the capsule version now
[20:35:15] all validation of events would fail
[20:35:23] emmm, so EL code already has context on the capsule schema
[20:35:28] oh yeah
[20:35:31] ya
[20:35:49] well sort of
[20:35:53] well, yes it def does
[20:36:09] the processor code is kind of generic because it uses a format specifier
[20:36:14] aha
[20:36:19] but ya, most of the eventlogging code base expects schemas to be encapsulated
[20:36:23] although it is optional
[20:36:26] which is how eventbus works
[20:36:53] capsule sucks though, it means you have to use the eventlogging codebase to work with data before it is in Kafka
[20:37:04] since it stitches the capsule and the event together
[20:37:18] aha
[20:45:27] Hi milimetric, mforns - Would you have some time for me?
[20:45:39] yessssir
[20:47:00] omw to the cave joal
[20:58:45] 10Analytics-EventLogging, 10Analytics-Kanban: Refine should parse user agent field as it is done on refinery pipeline - https://phabricator.wikimedia.org/T178440#3730879 (10Nuria) I agree, doing it in EL makes most sense, totally; the field should not be coupled to the ability of the storage to handle / not to...
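To illustrate the "capsule stitched together with the event" point above, a hedged sketch of roughly what an encapsulated EventLogging record looks like; all values are made up, and the exact field set is defined by the EventCapsule schema on Meta.

```python
# Made-up values; the real field set is defined by the EventCapsule schema.
encapsulated_event = {
    "schema": "Popups",                  # capsule: which schema the event uses
    "revision": 16364296,                # capsule: schema revision id (made up)
    "wiki": "enwiki",
    "webHost": "en.wikipedia.org",
    "timestamp": 1509630937,             # capsule: converted from the client's ISO 8601 dt
    "uuid": "0123456789abcdef0123456789abcdef",
    "event": {                           # the schema-specific payload itself
        "linkInteractionToken": "example-token",
        "hovercardsSuppressedByGadget": False,
    },
}
```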
[22:34:49] 10Analytics, 10Analytics-EventLogging: Resolve EventCapsule / MySQL schema discrepancies - https://phabricator.wikimedia.org/T179625#3731162 (10Ottomata)
[22:42:03] 10Analytics-EventLogging, 10Analytics-Kanban: Refine should parse user agent field as it is done on refinery pipeline - https://phabricator.wikimedia.org/T178440#3731196 (10Ottomata)
[22:42:05] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resolve EventCapsule / MySQL schema discrepancies - https://phabricator.wikimedia.org/T179625#3731195 (10Ottomata)
[22:43:20] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625#3731162 (10Ottomata)
[22:44:28] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625#3731205 (10Ottomata)
[22:45:07] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625#3731162 (10Ottomata)
[23:48:05] (03PS1) 10Joal: Rename latest/historical fields in mw-history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/388265
[23:48:13] (03PS1) 10Joal: Rename latest/historical fields in mw-history [analytics/refinery] - 10https://gerrit.wikimedia.org/r/388266
[23:48:35] (03PS1) 10Joal: Update mediawiki-history page reconstruction [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/388267