[05:10:19] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#3623566 (10Mattflaschen-WMF) Replied at https://gerrit.wikimedia.org/r/#/c/238623/ . This is still a v... [05:15:48] 10Analytics, 10Analytics-EventLogging, 10Patch-For-Review: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#444885 (10Mattflaschen-WMF) >>! In T44815#3623566, @Mattflaschen-WMF wrote: > This is still a valid iss... [06:53:30] 10Analytics, 10Analytics-EventLogging, 10AbuseFilter, 10CirrusSearch, and 30 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3541977 (10Paladox) I think they fixed most of the false positives now. Could someone re run the linter and update the description pl... [07:55:27] joal: o/ [08:00:51] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3623734 (10elukey) >>! In T175922#3622156, @Ottomata wrote: > HM, why are we making an 'analytics' prometheus instance for this?... 
[08:03:11] joal: I had a chat with ops this morning about the druid proxy and there were some concerns about it not being enough
[08:04:07] for example if $attacker finds a way to query-inject $horrible-thing and bypass the AQS limits to query only public data
[08:04:31] I think that we'll get an email in response, so if you could prepare a response that would be great :)
[08:05:25] my thoughts are that the POST will have hardcoded parameters in AQS, so $attacker would need to be able to change AQS's code to inject his query
[08:05:38] buuuut I am ignorant on this part so I'll defer to you :)
[08:28:50] Hi elukey
[08:29:28] This is indeed the case: the datasource is hardcoded in restbase for queries
[08:30:02] addshore: o/ - do you know anything about https://tools.wmflabs.org/openstack-browser/server/phragile-pro.phragile.eqiad.wmflabs ?
[08:31:41] joal: the only remote concern could be then if $attacker is able to call AQS with some $parameters able to override those hardcoded values
[08:32:13] (I am trying to come up with all possible use cases, at some point we'll draw the line)
[08:33:10] elukey: we should double check with services if restbase-query-parameters are escaped for potential javascript attacks (because that's the only thing I can think of)
[08:33:31] let's summon the master of Restbase mobrovac :D
[08:33:47] yes, servants?
[08:33:49] :D
[08:33:51] ahahahaha
[08:33:54] ciao Marko
[08:33:59] sorry to bother you
[08:34:01] ciao
[08:34:03] :D
[08:34:08] np, what's up?
[08:34:31] we are exploring all the possible attack scenarios for AQS when it will be able to contact druid
[08:34:45] (there is an email with ops for this)
[08:34:51] mobrovac: Would you give me a sock
[08:34:52] ?
[08:35:42] context question: will aqs just proxy requests to druid directly or will you be filling cass like you do for other data?
[08:36:04] joal: i'm missing a sock myself, so i can give you that extra one, it'd save me from trying to find the other one
[08:36:09] i have a left sock :)
[08:37:36] mobrovac: give it to me Master !!!!
[08:37:40] for the query params, we relay them as they are, because we almost don't use them
[08:37:40] mobrovac: anyway :)
[08:38:07] that is, we usually construct our own query params
[08:38:10] mobrovac: We relay queries to druid straight away
[08:38:20] but query param sanitisation is a pretty good point
[08:38:43] mobrovac: The only query param that doesn't get parsed as of now is project, but it's easy to actually check it
[08:39:01] the problem with doing it generically is that legitimate values in one place might pose threats in another
[08:39:19] oh right right, you have those checked
[08:39:29] others need to be converters, so we know they are sane, but project is not double checked - Will make sure it is
[08:39:39] +1
[08:39:57] great, thanks mobrovac :) elukey, makes sense for you as well ?
[08:40:21] it might make sense to put all the available projects in the spec
[08:40:42] that way they can be auto-checked and provide a list of options on the docs page
[08:41:39] mobrovac: ~800 projects ... That's a lot
[08:41:50] mobrovac: Is it worth it?
[08:42:19] yeah, i know
[08:42:36] and you have the project in every end point, that would get crazy long
[08:42:41] yeah, not worth it probably
[08:43:04] you can use yaml refs, though, so you would need to write them only once
[08:43:13] but, it's a minor detail, so up to you really
[08:43:24] s/once/in one place/
[08:43:37] mobrovac: Having a regex for whitelist seems easier to maintain as well
[08:45:36] yup sure
[08:46:02] it's going to be a gigantic one, so make sure to comment it properly
[08:47:09] makes sense to me as well
[08:47:39] mobrovac: I'll actually make it small - It's no big deal to have non-existent projects, they'll get a no-result answer, but I want to make sure the project field is not of injection type
[08:47:57] yup good point
[08:48:20] As for comments mobrovac, you know me, I don't comme :-P
[08:48:33] hahaha
[08:50:20] elukey: I know some things! Why what's up?
[08:52:18] addshore: there you go! https://gerrit.wikimedia.org/r/#/c/379499/
[08:52:39] joal: please add me to the change as a reviewer when you put it together
[08:53:53] elukey: ack!
[08:58:12] addshore: the instance should be phragile-pro.phragile.eqiad.wmflabs but I can't access :(
[08:59:49] mobrovac: just added you - The change is huge so far (a lot of code for druid, but please feel free to have a look)
[09:00:26] mobrovac: I also plan to split huge files into smaller ones
[09:00:39] joal: is it correct to say that druid does not offer multi-tenancy right ?
[09:01:03] correct elukey
[09:01:29] elukey: Druid says user management should be done client-side (which is a middleware in our case)
[09:02:47] I am not going to tell this to ops otherwise I'll get killed
[09:02:48] ahahhaah
[09:02:55] elukey: :)
[09:03:00] I mean, this is a big WTF
[09:03:07] in my opinion
[09:03:09] elukey: Druid's job is to compute fast - It does so :)
[09:03:21] I'd have preferred them to say "we have no plan for that"
[09:03:23] elukey: Druid is backend only, that is
[09:03:27] kk joal, thnx
[09:03:41] elukey: okay! Can't look at it this second but will try to today!
[09:03:50] i agree with elukey, not having any kind of auth(n|z) is a big wtf for any type of service
[09:03:54] joal: it doesn't matter in my opinion, authentication is orthogonal to being a backend or not
[09:04:14] addshore: sure sure whenever you have time! Low priority
[09:04:34] I hear the concern mobrovac and elukey - I however have no answer :)
[09:13:41] joal: another question for you - are the code changes big in AQS to be able to make this work? IIUC no, but let me know otherwise
[09:17:25] and will Restbase be able to contact druid in the future? (meaning, people reusing the code that we are writing)
[09:17:36] elukey: Not really no, it's just a function checking validity of one parameter against a regex
[09:17:58] elukey: I have no idea if people will reuse our code :)
[09:18:10] elukey: some of it is reusable, some of it isn't :)
[09:18:27] got it
[09:23:47] elukey: Do you think we'll manage to have ops approval despite the 'non-auth' thing?
[09:25:07] joal: it seems so, but not sure if we'll get complete agreement
[09:26:01] elukey: no agreement from ops means no go ... Should we think of an alternative solution?
[09:26:04] joal: next question ... the new code to contact druid will be in the AQS codebase right? So totally owned by us?
[09:27:01] correct elukey
[09:30:25] joal: the main concern seems still to be the absence of multi-tenancy, but we'll probably need to ack it and take responsibility for this technical choice. The only real alternative is to have two separate clusters for public/private
[09:30:43] elukey: right
[09:31:12] elukey: we could go for that, just means more hardware, and still a punch in the network
[09:44:49] (CR) Mobrovac: "Gave just a quick look at the code, looks ok, some comments in-lined." (5 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[09:47:14] Thanks mobrovac for comments !
[09:47:25] yw!
[09:47:52] mobrovac: we will add LVS before druid indeed, so I'll remove the host-list and only use one without RR
[09:48:54] cool :)
[09:48:57] About the scheme mobrovac, allowing to have it not set is mandatory for our test strategy (using a fake restbase endpoint)
[09:49:39] ah ok, didn't catch that, but that was more of a stylish suggestion than anything else
[09:50:00] (i'm assuming you are addressing the "move the schema to a file" comment)
[09:50:14] makes sense mobrovac - we actually create a new internal endpoint for tests that fakes a druid
[09:50:22] mobrovac: I will :)
[09:50:46] mobrovac: I need to work some more on this today (renames, files split etc) - will let you know when better :)
[09:51:11] kk cool joal
[10:55:58] * fdans lunch!
[10:59:28] joal: good news, it seems that the worst that can happen is that we'll need to implement some sort of basic validation on the druid nodes for the POST data
[11:00:00] elukey: that doesn't strike me as 'good news', but ok :)
[11:00:40] joal: are you going to be more pessimistic than me today? :D
[11:03:37] huhuh :D
[11:33:37] * elukey lunch + errand!
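(Note on the project-parameter check discussed above: a minimal sketch, in Python for brevity, of "a function checking validity of one parameter against a regex" combined with a hardcoded datasource. The pattern, the datasource name, and the function names are all hypothetical and are not taken from the AQS change; the query body only loosely follows the shape of a Druid native query.)

```python
import re

# Hypothetical whitelist pattern: lowercase letters, digits, dots and hyphens.
# It is deliberately small; a non-existent project simply yields no results,
# the goal is only to reject values that could carry an injection payload.
PROJECT_RE = re.compile(r"[a-z0-9][a-z0-9.\-]{0,98}[a-z0-9]")

# Placeholder name; the point is that the caller can never override it.
HARDCODED_DATASOURCE = "public-datasource"


def is_valid_project(project):
    """Return True only if the whole value matches the whitelist pattern."""
    return isinstance(project, str) and PROJECT_RE.fullmatch(project) is not None


def build_query(project):
    """Build a Druid-style query body with a fixed datasource.

    The only user-controlled value is `project`, and it is validated
    before being placed in the filter.
    """
    if not is_valid_project(project):
        raise ValueError("invalid project parameter")
    return {
        "queryType": "timeseries",
        "dataSource": HARDCODED_DATASOURCE,
        "filter": {"type": "selector", "dimension": "project", "value": project},
    }
```

(With this sketch, build_query("en.wikipedia.org") returns a query against the fixed datasource, while a value containing quotes or braces raises ValueError before any request is built.)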
[11:58:08] FYI, I'm instaling apache security updates on thorium [12:00:35] taking a break :) [12:13:59] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624115 (10phuedx) a:03phuedx I'm going to take the time to pin down the root cause… [13:23:48] 10Analytics, 10Analytics-Cluster, 10Operations, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3624319 (10Ottomata) Should be AMD FirePro S9150 according to quote. [13:31:28] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624352 (10phuedx) I can reproduce this in my local development environment very easily. I suspect that th... [13:31:44] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624353 (10phuedx) I believe this is an instance of the late subscriber problem. If all of the page's reso... [13:48:57] elukey: i started looking into the prometheus alerts yesterday [13:49:03] but got stuck because I didn't know what the names would be :/ [13:49:11] so that's why i got all up in yo biznasssss [13:49:11] :) [13:51:17] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog, and 2 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624400 (10phuedx) [13:51:53] ottomata: hello! You were quite right, but I didn't get the time today to check the cluster naming :( [13:52:00] we can do it together if you want [13:52:10] I just created the first draft of the tls proxy [13:52:11] no problem! 
you got that tls stuff [13:52:11] yeah [13:52:53] lemme know if i can help at all [13:53:08] end of day yesterday i started working on cergen tests [13:53:24] nice! [13:53:26] milimetric: can I grab you for a bit later today and run through the strata talk with you? [13:53:34] or this morning i guess? [13:56:28] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog, and 2 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624406 (10phuedx) @ovasileva: FYI I believe that I've figured out the problem here and have... [13:58:48] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog, and 3 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624412 (10phuedx) Large 1 or a small 2, mostly because of the risk involved in changing the... [14:03:02] ottomata: yes! now or later is good [14:03:36] milimetric: gimme 15ish mins then let's batcave, maybe we can talk a bit first about acoupel of slides, then i can run through [14:03:43] k, cool [14:04:27] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Readers-Web-Backlog, and 3 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3624441 (10ovasileva) good news! [14:36:07] milimetric: bc? [14:47:28] ottomata: (not urgent) today I checked the current druid puppet code and I am wondering if some stuff could be moved from the module to profiles, like for example the ferm rules [14:48:01] it seems strange to mention ANALYTICS_NETWORK in the module's classes no? [14:59:34] (03CR) 10Mforns: [C: 031] "Awesome work!! I added a couple comments. For my better understanding rather than things to change. Cheers!" 
(035 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: 10Joal) [15:01:37] 10Analytics, 10Analytics-Cluster, 10Operations, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3624619 (10dr0ptp4kt) Thanks, @ottomata. Any chance you could take a look at the GPU? I’d like to watch to learn something about the setup of this in Debian. We r... [15:05:47] 10Analytics, 10Analytics-Cluster, 10Operations, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3624627 (10Ottomata) Haha, I guess I can? But you know as much as I do! [15:19:13] 10Analytics, 10Analytics-Cluster, 10Operations, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3624653 (10dr0ptp4kt) Ha! That’s what they all say :P Around the next couple weeks? I could set up a time to watch and try to read up on the manuals. I don’t hav... 
[15:42:34] elukey: +1 to moving ferm out of module classes [15:42:55] super, will work on that tomorrow morning :) [16:04:53] 10Analytics-EventLogging, 10Analytics-Kanban: Implement purging scheme for eL data on top of eventlogging refine - https://phabricator.wikimedia.org/T176426#3624787 (10Nuria) [16:05:36] 10Analytics, 10Analytics-EventLogging: Implement purging scheme for eventlogging data on top of eventlogging refine - https://phabricator.wikimedia.org/T176426#3624809 (10Nuria) [16:10:10] 10Analytics, 10Analytics-EventLogging: Implement purging scheme for eventlogging data on top of eventlogging refine - https://phabricator.wikimedia.org/T176426#3624825 (10fdans) a:05Ottomata>03None [16:11:23] 10Analytics, 10Analytics-Wikistats: Improve Wikistats UI for mobile - https://phabricator.wikimedia.org/T176143#3614935 (10fdans) [16:13:31] 10Analytics: Give +w permission for users in /srv folder in SWAP Machines - https://phabricator.wikimedia.org/T176093#3613257 (10fdans) We'll solve this problem when we get new hardware. [16:21:04] 10Analytics: Mount dumps on SWAP machines (notebook1001.eqiad.wmnet / notebook1002.eqiad.wmnet) - https://phabricator.wikimedia.org/T176091#3613231 (10fdans) p:05Triage>03Low [16:21:11] 10Analytics: Give +w permission for users in /srv folder in SWAP Machines - https://phabricator.wikimedia.org/T176093#3613257 (10elukey) We are expecting to get new hardware during the next quarter as part of the scheduled hardware refresh :) [16:22:53] 10Analytics, 10Analytics-Wikistats, 10Easy, 10I18n, 10Patch-For-Review: WikiReportsLocalizations.pm still fetches language names from SVN - https://phabricator.wikimedia.org/T64570#670885 (10fdans) We're hoping to launch the alpha of Wikistats 2.0 by the end of this quarter. Moving this to deprioritised. 
[16:23:02] 10Analytics, 10Analytics-Cluster, 10Operations, 10Patch-For-Review, 10User-Elukey: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3624895 (10Cmjohnson) [16:24:24] 10Analytics-EventLogging, 10Analytics-Kanban, 10Page-Previews, 10Readers-Web-Backlog, and 3 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3607883 (10fdans) [16:29:57] 10Analytics: Correct pageview_hourly and derived data for T141506 - https://phabricator.wikimedia.org/T175870#3624912 (10fdans) [16:29:59] 10Analytics, 10Research: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3624913 (10fdans) [16:37:20] 10Analytics-Kanban: Add "PhantomJS" to the list of bots in webrequest definition. - https://phabricator.wikimedia.org/T175707#3624934 (10fdans) [16:37:21] 10Analytics: Correct pageview_hourly and derived data for T141506 - https://phabricator.wikimedia.org/T175870#3606092 (10Milimetric) While we do want to fix data when we have infrastructure problems, we want to approach this type of issue as a "Pageview definition" problem. So we are adding this to the broader... [16:37:25] 10Analytics-Kanban: Add "PhantomJS" to the list of bots in webrequest definition. - https://phabricator.wikimedia.org/T175707#3600811 (10fdans) [16:38:23] 10Analytics-Kanban: Order hardware labs storage for mediawiki history analytics friendly DB - https://phabricator.wikimedia.org/T175604#3624945 (10fdans) [16:39:10] 10Analytics-Kanban: Procure hardware to refresh jupyter notebooks - https://phabricator.wikimedia.org/T175603#3624947 (10fdans) [16:40:22] 10Analytics, 10Analytics-Wikistats: Line graph-related Wikistats changes - https://phabricator.wikimedia.org/T175582#3624949 (10fdans) 05Open>03declined [16:41:16] 10Analytics, 10Analytics-Cluster: CamusPartitionChecker does not work when topic names have '.' or '-' in them. 
- https://phabricator.wikimedia.org/T171099#3624952 (10fdans) [16:42:04] 10Analytics, 10EventBus, 10Wikimedia-Stream: Hits from private AbuseFilters aren't in the stream - https://phabricator.wikimedia.org/T175438#3593335 (10fdans) Hey @Nirmos can you provide a bit more context on this? [16:43:59] 10Analytics-Kanban, 10Analytics-Wikistats: Punch hole so AQS can access druid hosts - https://phabricator.wikimedia.org/T175299#3624975 (10fdans) [16:44:36] 10Analytics-Kanban, 10Analytics-Wikistats: Punch hole so AQS can access druid hosts - https://phabricator.wikimedia.org/T175299#3589434 (10fdans) 05Open>03declined [16:44:38] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Add edits endpoint to AQS using druid as a backend - https://phabricator.wikimedia.org/T174174#3624978 (10fdans) [16:50:32] * elukey off! [16:57:16] dr0ptp4kt: haha, what I mean by that is I have no idea [16:57:22] but, we can maybe figure it out together? [16:57:27] hangout sometime and google/poke around? [17:03:57] yeah ottomata will set it up [17:37:14] milimetric: come on in! [17:37:15] bc [17:37:21] or a-team whoever wants [17:37:23] bc [17:38:10] ping mforns milimetric joal , we were going to go over presentation, come if you are interested and ok to say no [17:38:16] cc fdans sorry [17:39:44] sorry, off right now to meet a friend :( [18:22:30] ottomata: Mwarf, missed that :( [18:23:51] np! [18:23:57] might do again monday or tuesday :) [18:24:09] ottomata: I'd have liked to hear it, is all :) [18:24:16] Ah ! great :) [18:59:43] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3625337 (10Tbayer) >>! In T174815#3592763, @Nuria wrot... 
[19:12:56] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3625355 (10Nuria) @Tbayer: we can put all data in hado... [20:01:25] (03PS3) 10Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - 10https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) [20:02:02] milimetric, mforns, nuria_ - Any of you has some time to discuss this CR ? --^ [20:02:16] joal, I do [20:02:37] same, omw [20:03:09] omw [20:28:29] (03PS4) 10Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - 10https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) [20:28:43] Gone to bed a-team, see you tomorrow [20:28:48] byeeee [20:29:57] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3625570 (10Tbayer) >>! In T174815#3625355, @Nuria wrot... [20:32:29] (03CR) 10Joal: "Thanks @mobrovac and @nuria - code updated." 
(6 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[20:35:36] (PS5) Joal: Update mediawiki-history-reduced oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174)
[20:38:44] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 4 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3625588 (Krinkle)
[21:48:16] Analytics, Discovery, Discovery-Analysis, RfC: RFC: Requirements for analytics stats processor - https://phabricator.wikimedia.org/T150028#3625761 (debt) Open>declined This work was for grafana and graphite updates for the mapping service; we probably won't ever get around to doing this.
[22:06:32] How interesting are reports of inconsistencies in the mediawiki history table? E.g. pages that are listed as “created at X, deleted at Y”, but the logging table indicates it should instead be “created at X, deleted at W, (re)created at Z, deleted at Y”
[22:07:10] Or am I reading the data wrong?
[22:12:04] Nettrom: please file tickets and we can look into it, did you look at: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits#Limitations_of_the_historical_datasets
[22:12:27] Nettrom: there are issues such as page-ids getting "reused" that we will not be able to "fix"
[22:14:22] nuria_: Yeah, I know about that page, which is why I thought I’d ask first. Looks like this one is a reused page ID, so “wontfix” then. And I’m happy to work around it.
[22:15:21] Nettrom: ya, puff there are several of those, please, as always, edit docs so this is also clear to other users
[22:16:51] nuria_: I’ll try to add an example to the limitations page
[22:17:09] Nettrom: many thanks
[22:44:17] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3625907 (Nuria) We are talking about one Popup table...