[08:30:11] elukey: o/
[08:30:14] addshore: o/
[08:30:38] hey joal
[08:30:58] elukey: I said yesterday I'd teach you to load the test dataset in qs ;)
[08:32:11] elukey: let me know when is a good time :)
[08:32:39] joal: you are supposed to be on vacation right? :P
[08:32:49] elukey: tomorrow only ;)
[08:32:57] elukey: well, tomorrow and Friday :)
[08:33:01] plus there is a big problem ongoing with the mw appservers :(
[08:33:04] elukey: Today I work (a bit)
[08:33:21] elukey: you tell me when :)
[08:33:24] joal: o/
[08:33:30] Hey addshore
[08:33:41] I'm about to deploy the oozie part of your patch
[08:33:49] :D
[08:33:52] addshore: The scala part was released yesterday
[08:59:44] !log deploying refinery from tin
[09:01:28] (PS1) Addshore: Accept statsd and graphite hosts with ports [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298703 (https://phabricator.wikimedia.org/T140081)
[09:01:59] (PS1) Addshore: Accept statsd and graphite hosts with ports [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298704 (https://phabricator.wikimedia.org/T140081)
[09:02:30] (CR) Addshore: [C: 2] Accept statsd and graphite hosts with ports [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298703 (https://phabricator.wikimedia.org/T140081) (owner: Addshore)
[09:02:33] (CR) Addshore: [C: 2] Accept statsd and graphite hosts with ports [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298704 (https://phabricator.wikimedia.org/T140081) (owner: Addshore)
[09:05:53] !log Deploying refinery to HDFS
[09:07:23] :D
[09:16:15] addshore: Shall I start your job in production with the real graphite namespace?
[09:16:34] addshore: Also, when in time should it start?
[09:21:46] how far back does the pageview table go, joal?
[09:22:23] addshore: I prefer to return the question the other direction: how much historical data do you REALLY need?
[09:22:29] I mean, it would be awesome if we could go back to when the extension in question was initially deployed!
[09:24:12] Which would be since the 11 May!
[09:24:28] addshore: this year?
[09:24:33] yup
[09:24:40] addshore: Sounds correct :)
[09:24:59] addshore: meaning, sounds feasible and not too bad (better than correct)
[09:25:30] addshore: not too bad in terms of data size to rework (man, I'm having problems making complete sentences this morning)
[09:25:40] :D need a coffee? ;)
[09:25:49] addshore: Already drank some ;)
[09:25:57] addshore: might need more :)
[09:27:01] addshore: I'll launch your job in prod, using starting date 2016-05-01
[09:27:04] addshore: Ok?
[09:27:08] Yup!!!
[09:27:40] and I should be able to see them running on https://hue.wikimedia.org/oozie/list_oozie_workflows/ ?
[09:28:22] addshore: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0020538-160630131625562-oozie-oozi-C/
[09:28:59] :)
[09:29:36] addshore: I'll check everything looks good (a couple of successful runs, and data available) before calling it done ;)
[09:32:13] great!
[09:46:22] addshore: looks like we're having a problem :(
[09:46:31] addshore: no data showing up in graphite
[09:46:50] yeh :/ I know graphite can be slow sometimes, but often not this slow!
[09:51:39] addshore: maybe since there is no data, no value is sent :)
[09:51:45] addshore: let's wait a bit
[09:51:58] yeh, and this thing was only deployed to 4 small wikis to begin with!
[09:52:28] But, its done 11 says now and nothing! :/
[09:52:58] *days
[09:53:04] addshore: let's wait some again
[09:56:03] joal: some just appeared :)
[09:56:13] looks like it is working! :D
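The job launched here rolls ArticlePlaceholder pageview counts out of Hive and into graphite. A minimal sketch of that kind of aggregation, assuming the wmf.pageview_hourly columns project, page_title, view_count and agent_type, with a placeholder date; the real query lives in the refinery's articleplaceholder_metrics job and may differ:

    -- Daily roll-up of Special:AboutTopic views per wiki, one run per backfilled day
    SELECT
        project,
        SUM(view_count) AS views
    FROM wmf.pageview_hourly
    WHERE year = 2016 AND month = 5 AND day = 11   -- partition predicates keep the scan small
        AND agent_type = 'user'                    -- people, not spiders
        AND page_title LIKE 'Special:AboutTopic%'  -- the English-only match that proves too naive below
    GROUP BY project;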
[09:56:39] addshore: being patient is sometimes a skill I lack, so I work on it on a daily basis ;)
[09:58:05] addshore: Looks like we can call this done :)
[10:11:40] oh wait, joal, it totally makes sense it only just started appearing, as the thing has only just got to the 11th of May!
[10:11:56] And yes, it wouldn't have any values to send until then!
[10:11:58] addshore: it was true a few minutes ago ;)
[10:12:21] addshore: congrats, your first analytics-cluster job is in production :D
[10:12:26] woo!
[10:12:29] * joal claps for addshore :)
[10:12:50] I did notice there was 1 bug with it, there is a rogue '.user.' in the metric name, but I'm not going to worry about that
[10:13:14] Right, now https://grafana.wikimedia.org/dashboard/db/article-placeholder should slowly populate!
[10:14:11] addshore: cool :)
[10:46:37] Analytics, TCB-Team, WMDE-Analytics-Engineering, TCB-Team-Sprint-2016-07-14: Enable basic tracking for beta features - https://phabricator.wikimedia.org/T140226#2457121 (Lea_WMDE)
[10:56:24] Analytics, TCB-Team, WMDE-Analytics-Engineering, TCB-Team-Sprint-2016-07-14: Enable basic tracking for beta features - https://phabricator.wikimedia.org/T140226#2457121 (Addshore) > Show the total number of page visits Should this be per wiki or aggregated? Beta features are enabled on the page...
[11:21:37] joal: are namespaces normalized to English in the pageview table?
[11:22:44] addshore: for sure not :)
[11:22:54] ;_;
[11:23:06] okay, then the query for the articleplaceholder thing has a bad assumption!
[11:23:24] addshore: you probably don't recall I told you to be careful with namespaces and languages :)
[11:23:40] I don't recall, ;( damn!
[11:23:41] https://github.com/wikimedia/analytics-refinery-source/commit/b4d91d6ff770b0b81ade450018179b902c7e7499#diff-baedbe39a5b6ad4f03ff533937d11fb0R88
[11:27:46] joal: I guess I have to change the query to look for %:AboutTopic% instead? But I guess even that could be translated..
[11:30:27] addshore: I actually don't know how namespaces are internationalized, so I can't really help on that
[11:30:39] addshore: sorry :(
[11:30:50] addshore: I assume we should stop the oozie job?
[11:31:05] yup
[11:32:26] addshore: the only normalization we do with page titles is url decoding, and spaces replaced by dashes
[11:33:54] joal: mediawiki does stuff in https://github.com/wikimedia/mediawiki/blob/master/languages/messages/MessagesLzz.php#L21
[11:33:58] for example
[11:35:33] addshore: right
[11:42:58] Right, so joal, some ideas of how to solve this issue in my head right now... 1) do something icky to normalize all namespaces / add a field with the normalized namespace (probably lots of work etc) 2) write a query that checks all namespace aliases too? (probably not very efficient) 3) Ignore the namespace and look for %:AboutTopic% (what if AboutTopic gets translated?) 4) Add something to x_analytics header for the views of the special page? 5) crawl into a corner
[11:44:23] oooooh, wait
[11:44:37] I think the namespace id is already in x_analytics *checks*
[11:46:29] okay, it looks like it is for regular pages but not for special pages! but this might be a solution!
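A sketch of the idea addshore is floating, assuming webrequest's x_analytics_map field and the 'ns' and 'special' keys that the WikimediaEvents patches discussed below would add (both keys hypothetical at this point in the conversation):

    -- Match special-page views via the X-Analytics header instead of localized titles
    SELECT COUNT(*) AS views
    FROM wmf.webrequest
    WHERE year = 2016 AND month = 7 AND day = 13 AND hour = 11  -- placeholder partition
        AND is_pageview = TRUE
        AND x_analytics_map['ns'] = '-1'                 -- NS_SPECIAL in MediaWiki
        AND x_analytics_map['special'] = 'AboutTopic';   -- canonical, untranslated name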
[11:50:36] addshore: hm
[11:50:43] addshore: This wouldn't work for mobile
[11:50:52] addshore: but apart from that, sounds viable
[11:50:58] I could make it always add the ns to the header, and also add the code of the special page in the case of it being a special page
[11:51:16] that would mean I wouldn't be able to get any past data, but I would be able to reliably get future data
[11:51:25] addshore: I think we'd want the namespace in any page type
[11:52:21] addshore: if any, of course
[11:53:06] yup, right now the namespace is only added if the page has an id, which thus excludes special pages
[11:53:43] (PS4) Amire80: Add a script for post-processing interlanguage links stats [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/291358 (https://phabricator.wikimedia.org/T139327)
[11:53:48] addshore: I think it would be interesting to know if the 'AboutTopic' is translated or not
[11:53:52] (PS5) Amire80: Add a script for post-processing interlanguage links stats [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/291358 (https://phabricator.wikimedia.org/T139327)
[11:54:13] (CR) Amire80: [C: -1] "(I'll change a bunch of things before merging)" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/291358 (https://phabricator.wikimedia.org/T139327) (owner: Amire80)
[11:54:24] new varnishkafka installed on cp3008.esams
[11:54:41] I am checking kafka logs from stat1002 and it seems good
[11:59:08] elukey: do you know if hadoop servers got restarted yesterday night?
[11:59:18] joal: it is not translated, but probably will be in the future
[11:59:44] addshore: Then you have a way to build up your historical data
[12:00:22] addshore: here is what I suggest: Modify the job to work with the non-translated AboutTopic, and have it running soon
[12:00:45] addshore: Then build up sending ns for special (and maybe other?) special pages
[12:01:18] joal: specific daemons or the hosts?
[12:01:33] elukey: any, actually
[12:01:52] elukey: The regular cassandra loading job got killed yesterday, and I wonder why
[12:02:01] joal, awesome! I'll be back after lunch!
[12:03:24] addshore: enjoy lunch
[12:04:43] elukey: I'm monitoring a month of loading using the regular CQL loader on the new AQS, and really it works better
[12:05:31] super
[12:06:06] joal: node managers could have been restarted yesterday at 13:00 UTC, I can see some variations in GC metrics
[12:06:09] possible?
[12:06:35] elukey: job failed around 2am this morning
[12:06:51] elukey: the cassandra thing is rather counterintuitive ...
[12:09:08] ah no I don't think so then
[12:09:15] elukey: ok:)
[12:09:25] joal: --expand "the cassandra thing is rather counterintuitive"
[12:09:27] :)
[12:09:32] huhu
[12:09:50] elukey: I would have expected cassandra to better deal with compacting bulk loaded data
[12:10:10] ah yes
[12:10:25] I had a lot of expectations too :(
[12:10:45] but maybe it is a super new thing that needs a lot of knowhow before getting it right
[12:10:49] Well I mean, from a loading perspective, the thing is really better!
[12:16:36] elukey: by the way, after wiping the cassandra cluster: cqlsh -u cassandra -p cassandra aqs1004-a -f /srv/deployment/analytics/aqs/deploy/scripts/insert_monitoring_fake_data.cql
[12:19:29] ahhhhh
[12:19:31] thanksssss
[12:19:35] adding it to the AQS docs
[12:21:17] https://wikitech.wikimedia.org/wiki/Analytics/AQS#Add_fake_data_to_Cassandra_after_wiping_the_cluster
[12:25:11] I take a break a-team, see you later
[12:42:14] Analytics-Kanban: Multimedia health metrics stalled since Feb 2016 - https://phabricator.wikimedia.org/T140137#2457492 (mforns) The dashboard looks good again. https://edit-analysis.wmflabs.org/multimedia-health/#projects=commonswiki/metrics=Uploaders
[12:59:54] Analytics, DBA, Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2457599 (mforns) @jcrespo Please, see above the patch I created. When the file has a permanent location in puppet, I will add some documentation on EventLogging's Wikitech page. Th...
[13:08:35] team I am upgrading varnishkafka to 1.0.11-1 in all cache maps
[13:08:45] tomorrow I'll do cache misc if everything goes fine
[13:09:08] and I'll also add the -T option value to the configuration
[13:09:12] via puppet
[13:09:18] that should finally solve all our oozie problems
[13:14:13] !log varnishkafka upgraded from 1.0.10-1 to 1.0.11-1 manually on cp3008.esams (misc) and via apt for the whole cache maps cluster
[13:14:37] elukey, joal: FYI: https://docs.datastax.com/en/cassandra/2.2/cassandra/tools/toolsSSTableOfflineRelevel.html
[13:15:37] urandom: thanks! but I thought it wasn't needed :/
[13:16:49] elukey: no sure what you mean, i was just reminded of it, and thought i'd mention it given the troubles you've had with LCS dragging behind the import
[13:16:59] s/no sure/not sure/
[13:20:04] urandom: probably it is me being ignorant :) I thought that the L0 could not get too big since cassandra should automagically rebalance SSTables between levels
[13:20:38] (PS1) Addshore: Use x_analytics header to match special ns for Special:AboutTopic [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298724 (https://phabricator.wikimedia.org/T138500)
[13:20:38] This is often the case when atypical write load is experienced (eg. bulk import of data, node bootstrapping).
[13:20:40] (PS1) Addshore: Use x_analytics header to match special page name [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298725 (https://phabricator.wikimedia.org/T138500)
[13:20:42] ah yes
[13:20:51] so it must be used only in exceptional cases
[13:20:53] got it thanks :)
[13:20:58] didn't read it carefully
[13:21:12] elukey: with the exception of l0, there is no concurrency within levels
[13:21:56] l0 is an exception because it does size-tiered within level l0
[13:23:02] so you can't merge tables from l0 into l1, until l1 is below the threshold, and that will bottleneck on what can be done by one thread
[13:23:42] ah this bit is not mentioned in the datastax doc that I've read!
[13:24:06] because it mentions that l0 sstables get compacted into l1
[13:24:16] but no mention about the threshold
[13:24:22] ok now it makes sense
[13:25:27] elukey: when i looked the other day, i still had the feeling something else was wrong
[13:26:19] it does size-tiered in l0, and it seemed to me that it should have been doing more to get the count down in l0
[13:26:19] there is surely something that we didn't do in the proper way, we have the same feeling.. but at the same time we need to put a boundary on the amount of time spent in investigating bulk loading :(
[13:26:24] (PS1) Addshore: Use webrequest in wikidata/articleplaceholder_metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/298726 (https://phabricator.wikimedia.org/T138500)
[13:26:36] elukey: oh, yes, i totally understand
[13:26:49] joal started to bulk load in the "old" way, and we should be done in ~2 months if I got it correctly
[13:27:20] joal: I think I have made all of the patches needed for the 'plan'
[13:27:23] but if we load data backwards (like more recent months first) we could in theory put a node or more in production early
[13:30:17] i wish there had been more time to test time-window compaction strategy
[13:30:31] i wish i had maybe pushed that harder early on
[13:30:46] FWIW, it is now in mainline, and deprectes date-tiered
[13:31:04] deprecates, (sheesh can't type)
[13:45:21] !log restarting hadoop nodemanagers to apply log aggregation retention check interval change
[13:45:27] (will check oozie jobs after :p )
[13:52:29] urandom: yeah I know, time constraints :(
[13:52:37] ottomata: o/
[13:53:43] hiii
[14:12:58] addshore: applied
[14:12:59] https://gist.github.com/ottomata/ebc0798bc93333745adda131e197068c
[14:13:06] those are the crons that should be present, ja? ^
[14:13:26] yup, looks good!
[14:14:00] cool
[14:14:19] And I just merged everything in my scripts repo :)
[14:14:33] but afaik that should be the last puppet change needed really!
[14:14:50] So many thanks! :) I must buy you a beer!
[14:15:23] ottomata: I have two questions about scap and permissions for you if you have time
[14:16:10] oh ottomata how often does puppet run? could you run it once more to pull the things I just merged?
[14:17:11] addshore: more or less every 20/30 mins
[14:22:47] elukey: fo sho!
[14:22:49] addshore: ja can do
[14:22:56] awesome! :)
[14:22:57] elukey: wassup?
[14:23:02] Hi urandom and ottomata
[14:23:56] ottomata: so two questions - 1) /srv/deployment/analytics/refinery/* is owned by trebuchet:wikidev and the same on analytics1027 by root:root
[14:24:14] meanwhile in https://wikitech.wikimedia.org/wiki/Scap3/Migration_Guide it seems that we should use deploy-service
[14:24:46] (hi joal!)
[14:25:26] elukey: hmm, using deploy-service is optional, we can use whatever user we want, would have to make a new user
[14:25:30] which might be good for refinery
[14:25:32] so 1) do we need to keep these permissions? 2) Do we need to assume that we'll wipe everything before starting or do we want to use a different dir?
[14:25:49] elukey: i think if we deploy to the same dir, we might want to wipe everything, or at least move aside
[14:26:04] yeah, maybe something-backup
[14:26:06] scap keeps the actual deployment in a cache dir, and moves symlinks around
[14:26:12] yep yep
[14:26:45] i'm not sure about this, but i think probably we should create a new deployment group ssh key for refinery, or analytics in general
[14:26:48] there is one for eventlogging
[14:26:50] you can follow its example
[14:26:58] maybe just make a 'deploy-analytics' or something
[14:27:16] that'll be the trickiest part of the scap move i think
[14:34:24] mmmm
[14:35:25] so you're saying a new ssh user + keys to add to the trusted store
[14:36:59] ja, or, at least new keys
[14:37:05] maybe we can reuse a user...
[14:37:06] hm
[14:37:12] not sure which though
[14:37:14] not really a good one
[14:37:21] stats? naw.
[14:37:49] i think anyway
[14:50:34] (CR) Nuria: [C: -1] Use x_analytics header to match special ns for Special:AboutTopic (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298724 (https://phabricator.wikimedia.org/T138500) (owner: Addshore)
[14:51:27] (CR) Nuria: "Query needs to be adjusted, see comments on other changesets." [analytics/refinery] - https://gerrit.wikimedia.org/r/298726 (https://phabricator.wikimedia.org/T138500) (owner: Addshore)
[14:52:53] I am wondering if the ssh keys are handled transparently by scap
[14:53:03] I don't think that we need to mess with the trusted store etc..
[14:53:20] but we should only care about the users on the analytics1027 host..
[14:53:22] no?
[14:53:31] maybe I can follow up with releng
[14:58:15] Analytics, Collaboration-Team-Interested, Community-Tech, Editing-Analysis, and 6 others: statistics about edit conflicts according to page type - https://phabricator.wikimedia.org/T139019#2416933 (Zache) Hmmph, in context of current edithaton it seems that it would be very nice that in case of e...
[15:02:19] nuria_: x_analytics_map.ns may not always be populated. Does that need special handling?
[15:05:07] elukey: no, they are not transparently handled
[15:05:17] 20after4 attempted to puppetize that
[15:05:22] but it didn't go through with ops
[15:05:26] and he abandoned the attempt
[15:05:37] elukey: check out all the eventlogging scap stuff
[15:05:42] it should work just like that i think
[15:06:01] adding a new key to the key store is a manual process though
[15:06:03] but you only have to do it once
[15:06:15] but, ja, ask releng, they might have advice
[15:06:19] joal: yt?
[15:06:26] Hi ottomata
[15:06:36] ottomata: wasup?
[15:06:51] so, i'm looking into the overwrite redirect thing
[15:07:18] afaict, revisions are not orphaned
[15:07:37] they are properly moved into the archive table when the previously existing redirect page is deleted
[15:08:05] would it be useful to have the latest revision of the deleted redirect page, as well as its page id?
[15:08:15] we could do something similar for that as we did for the other page states
[15:08:19] and save it in a similar subobject
[15:08:21] hah
[15:08:23] page move is so crazy
[15:08:38] so many page ids and revisions touched
[15:09:14] 1. target page state altered, so old and new state.
[15:09:14] 2. new redirect page created
[15:09:14] 3. possible redirect page at new title deleted and all revisions archived
[15:09:31] but ja, could do something like
[15:10:01] overwritten_redirect_page_state: (i'm changing _info to _state)
[15:10:01] page_id
[15:10:01] page_title
[15:10:01] page_namespace
[15:10:01] rev_id
[15:10:03] just like we do for the others
[15:10:20] even though title and namespace are redundant (again)
[15:10:23] thoughts?
[15:10:40] ottomata: why not
[15:10:57] hahah
[15:11:06] ottomata: I don't like the _state too much (makes me think of events vs stateful). but it's a detail
[15:11:20] joal: this event is recording a change in state
[15:11:21] (CR) Addshore: Use x_analytics header to match special ns for Special:AboutTopic (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298724 (https://phabricator.wikimedia.org/T138500) (owner: Addshore)
[15:11:24] no?
[15:11:29] ottomata: What I like with your suggestions is that it makes more things explicit
[15:11:58] normal events don't also include the old state, but we kinda have to with mw db
[15:12:04] since things are just moved all over the place
[15:12:33] ottomata: correct, it's a state change - And the names we use for other state changes were _change
[15:12:33] ok joal, i'll do this and put up a patch and you can see it and we can work on it then
[15:12:41] oh it's still change
[15:12:51] just the subobjects i'm calling _state instead of info
[15:12:52] so
[15:12:53] ottomata: Like revision_visibility_change
[15:12:55] ja
[15:13:02] that would have
[15:13:09] old_visibility_state, new_visibility_state
[15:13:12] That would mean page_move becomes page_change?
[15:13:16] naw
[15:13:18] :)
[15:13:25] haha
[15:13:26] excapt
[15:13:27] ha
[15:13:28] i mean
[15:13:34] i guess it sorta is, right? not page change
[15:13:34] but
[15:13:36] page_title_change
[15:13:37] :p
[15:13:45] hmmm
[15:13:51] i don't think we should change it, but that is more correct
[15:14:06] ottomata: we used the _change over other events because there was _old and _new --> state changed
[15:14:08] ottomata: could you +2 this one https://gerrit.wikimedia.org/r/#/c/298743/ (missed in the last change.... again...) *facepalm*
[15:14:22] joal: a page_delete is emitted during a redirect overwrite
[15:14:24] which makes sense
[15:14:31] in here, we are exactly in the same pattern :)
[15:14:34] ok
[15:15:02] ja
[15:15:24] addshore: done.
[15:15:27] joal: yeahhhh hm.
[15:15:28] thanks!
[15:15:30] i dunno, what do you think
[15:15:31] haha
[15:15:34] we are making SO many changes
[15:15:38] page_title_change?
[15:15:38] haha
[15:15:48] ottomata: I think we can keep move
[15:15:52] ja i think so too
[15:15:55] ottomata: it's clearer
[15:16:01] ha, i mean, we are adding all this extra info into page_move
[15:16:08] but, that info does exist in other events
[15:16:13] a page move emits
[15:16:33] page_move, 2 revision_creates (if leaving behind a redirect), and a page_delete (if overwriting a redirect)
[15:16:42] ottomata: I wouldn't mind that redundancy
[15:16:54] yea, me neither, easier to work with more info than less
[15:17:06] ottomata: I also actually think it would be great to have a page_create event, even if for now it's also a revision_create one
[15:17:12] ottomata: agreed
[15:17:14] ja
[15:17:19] ottomata: +1 for your suggestion :)
[15:17:29] ok, will send patch in a bit
[15:17:32] this requires a mw core change too
[15:17:50] man this is a LOT to keep in your head
[15:17:51] 3 repos
[15:17:55] haha
[15:17:57] so many changes
[15:18:14] dan and marcel's brains must be bursting
[15:18:28] what what?
[15:19:03] hehe
[15:21:49] joal: any news about these page_creation_dts?
[15:21:51] do we need them?
[15:22:44] ottomata: Arf, forgot to ask milimetric
[15:22:57] hi
[15:23:12] ottomata: I think we don't, mforns agreed yesterday, but we'd rather have milimetric's opinion :)
[15:23:17] milimetric: batcave?
[15:23:36] k
[15:23:51] ottomata, hi, can you quickly look at https://gerrit.wikimedia.org/r/#/c/298605/ please? :]
[15:25:11] mforns: don't know nuthin about it, but happy to merge, shall I?
[15:25:36] ottomata, the multimedia dashboard is stalled because of that, the output directory was wrong
[15:25:43] it should be multimedia-health
[15:26:38] ottomata, so basically since we refactored the puppet code for reportupdater, those reports weren't updating in the right folder, this change fixes that
[15:26:55] so if it makes sense to you and you can merge it, would be awesome :]
[15:27:13] k
[15:27:29] thanks!
[15:27:48] ottomata: just read the messages, thanks for the pointers :)
[15:28:06] :)
[15:28:33] addshore: I think it will help to run your queries against the webrequest table; as they are now they will take a long time, and there are other issues besides the ns
[15:28:59] addshore: like the view_count field does not exist
[15:29:23] ottomata: Agreed with milimetric: Let's remove them!
[15:29:27] great
[15:29:36] ottomata: Same as with sha1 (if needed, we'll add them later)
[15:29:39] yup, but view count can be resolved by instead doing a count(*)
[15:29:51] joal: aaah, sha1 nooooo!
[15:29:52] :)
[15:29:55] sha1 for what? revisions?
[15:30:06] I just tested the query (although there is no data for it to select) and it ran in 321.374 seconds
[15:30:11] huhuhu milimetric, no, for revision_visibility
[15:30:28] milimetric: a boolean if sha1 is visible
[15:30:31] not an actual sha1
[15:30:36] milimetric: in revision_visibility_change, sha1 visibility is equivalent to text visibility
[15:30:43] So we decided to keep only one
[15:31:12] oh!
[15:31:15] ok, that's fine
[15:31:22] I thought you meant get rid of rev_sha1
[15:31:28] which is hugely important :)
[15:32:59] (PS1) Addshore: Alter where config is read from [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298776
[15:33:14] (PS1) Addshore: Alter where config is read from [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298777
[15:33:21] (CR) Addshore: [C: 2] Alter where config is read from [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298777 (owner: Addshore)
[15:33:31] (CR) Addshore: [C: 2] Alter where config is read from [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298776 (owner: Addshore)
[15:34:27] joal: do you prefer
[15:34:41] 'overwritten_redirect_page_state', or just 'overwritten_page_state'
[15:34:49] it is true that the overwritten page will always be a redirect
[15:34:59] but maybe that is not important to have in the name, and we can just put it in the comments
[15:35:09] ?
[15:35:17] (PS2) Addshore: Use x_analytics header to match special ns for Special:AboutTopic [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298724 (https://phabricator.wikimedia.org/T138500)
[15:37:05] (PS2) Addshore: Use x_analytics header to match special page name [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298725 (https://phabricator.wikimedia.org/T138500)
[15:40:14] (Merged) jenkins-bot: Alter where config is read from [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298777 (owner: Addshore)
[15:40:17] (Merged) jenkins-bot: Alter where config is read from [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298776 (owner: Addshore)
[15:44:17] Analytics-Kanban: Notify all schema owners that the auto-purging is about to start {tick} - https://phabricator.wikimedia.org/T135191#2291279 (mforns)
[15:50:45] Analytics-Kanban: Multimedia health metrics stalled since Feb 2016 - https://phabricator.wikimedia.org/T140137#2458211 (Jdforrester-WMF) Open>Resolved Thank you everyone!
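nuria_'s point above about view_count: wmf.pageview_hourly is pre-aggregated, so views are summed from view_count, while wmf.webrequest holds one row per request, so the same metric becomes a COUNT(*). A sketch with a placeholder date:

    -- pageview_hourly (pre-aggregated):  SUM(view_count)
    -- webrequest (one row per request):  COUNT(*) over pageview rows
    SELECT COUNT(*) AS views
    FROM wmf.webrequest
    WHERE year = 2016 AND month = 5 AND day = 11
        AND is_pageview = TRUE;  -- keep only requests classified as pageviews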
[15:56:43] wikimedia/mediawiki-extensions-EventLogging#568 (wmf/1.28.0-wmf.10 - 4660fab : Chad Horohoe): The build has errored.
[15:56:43] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/4660fab4d35b
[15:56:43] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/144496968
[16:09:54] nuria_: so, other things I could add to the query on webrequest: for now at least the thing I am checking is only deployed on wikipedias, so normalized_host.project_class = 'wikipedia' could help
[16:10:20] Also none of the rows that I want will have a page_id so I guess either page_id = 0 or page_id = NULL ?
[16:10:46] addshore: your select as you have it would not work, do try it on the hive command line and see errors. there are a few that i see
[16:11:04] addshore: but more filtering will be good
[16:11:05] hmm, in PS2 there are no errors!
[16:12:20] page_id (which is an int) says "This may not always be set, even if the page is actually a pageview." does that mean there will be Nulls or 0s? :)
[16:17:26] addshore: do take a look at the table, likely is null
[16:18:16] What is the most efficient way to test that?
[16:25:45] addshore: look at records of the webrequest table, you are going to find many nulls
[16:26:16] So, for example I just did "select * from webrequest where year = 2016 AND month = 7 AND day = 12 AND hour = 2 AND page_id IS NULL limit 1;" but is there a better way?
[16:26:17] addshore: you can do a query, limit your data to 1 hour and inspect what you get for possible gotchas.
[16:26:24] ahh, okay! good good!
[16:26:56] addshore: the difference between using webrequest and pageview_hourly is that you are looking (on webrequest) at many requests that are not pageviews
[16:27:14] yup :/
[16:27:22] addshore: and thus, they have defaults for fields that are populated in a regular pageview
[16:27:59] The best long term solution might be to carry over the namespace ID into the pageview table? and for special pages the normalized special page name? I'm not sure how easy / hard that is
[16:36:30] addshore: not likely to happen in the near term
[16:36:44] addshore: but do file a ticket for it if you feel it is a must
[16:37:14] well, not a must, but then these queries must be against webrequest not pageview :)
[17:18:23] milimetric:, mforns, might need a little brain bouncing with schema stuff in a bit, first gotta get lunch though
[17:18:29] milimetric: will you be around for a bit?
[17:18:29] ok
[17:18:32] yes
[17:18:33] all day
[17:18:38] also, mforns we can talk about public datasets or whatever it was
[17:18:41] ottomata, ok ping :]
[17:18:42] ottomata: we should just hang out also
[17:18:43] if you are still around when i get back
[17:18:45] ja
[17:18:49] ok ottomata
[17:18:56] yall wanted a mw core / eventbus modification tutorial too, ja?
[17:19:06] yes
[17:19:14] k will ping yall in a bit...
[17:19:17] ottomata, I'll be here for 30 mins, then will be away for one hour and then back
[17:19:18] bon appétit
[17:19:29] mforns: you busy for those 30?
[17:19:34] nope
[17:19:38] cave!
[17:19:41] k
[17:20:51] going offline people! byyyee o/
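On addshore's "is there a better way?" above for probing NULL page_ids: rather than eyeballing single rows with select * ... limit 1, one hour of data can be profiled in a single aggregation. A sketch using the same partition he queried:

    -- How often is page_id missing in one hour of webrequest?
    SELECT
        IF(page_id IS NULL, 'null', 'set') AS page_id_state,
        COUNT(*) AS requests
    FROM wmf.webrequest
    WHERE year = 2016 AND month = 7 AND day = 12 AND hour = 2
    GROUP BY IF(page_id IS NULL, 'null', 'set');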
[17:29:35] (PS3) Addshore: Use x_analytics header to match special ns for Special:AboutTopic [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298724 (https://phabricator.wikimedia.org/T138500)
[17:29:51] (PS3) Addshore: Use x_analytics header to match special page name [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298725 (https://phabricator.wikimedia.org/T138500)
[17:44:45] (PS2) Addshore: Ignore namespace when matching Special:AboutTopic [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298723 (https://phabricator.wikimedia.org/T138500)
[17:45:38] (PS4) Addshore: Use x_analytics header to match special ns for Special:AboutTopic [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298724 (https://phabricator.wikimedia.org/T138500)
[17:45:45] (PS4) Addshore: Use x_analytics header to match special page name [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298725 (https://phabricator.wikimedia.org/T138500)
[17:51:14] mforns: you froze I think
[17:51:27] milimetric, I can't hear you
[17:51:30] ok will rejoin
[17:51:39] reportcard very slow http://www.webpagetest.org/result/160713_B4_165X/1/details/
[18:00:28] (PS1) Addshore: Stop generating data for wikidata-api-wbgetclaims dash [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298806 (https://phabricator.wikimedia.org/T140231)
[18:00:59] (PS1) Addshore: Stop generating data for wikidata-api-wbgetclaims dash [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298807 (https://phabricator.wikimedia.org/T140231)
[18:01:05] (CR) Addshore: [C: 2] Stop generating data for wikidata-api-wbgetclaims dash [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298807 (https://phabricator.wikimedia.org/T140231) (owner: Addshore)
[18:01:15] (CR) Addshore: [C: 2] Stop generating data for wikidata-api-wbgetclaims dash [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298806 (https://phabricator.wikimedia.org/T140231) (owner: Addshore)
[18:01:31] (Merged) jenkins-bot: Stop generating data for wikidata-api-wbgetclaims dash [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298807 (https://phabricator.wikimedia.org/T140231) (owner: Addshore)
[18:01:42] (Merged) jenkins-bot: Stop generating data for wikidata-api-wbgetclaims dash [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298806 (https://phabricator.wikimedia.org/T140231) (owner: Addshore)
[18:12:07] Nemo_bis: sorry but we do not support the reportcard anymore, we know we need to replace it but it will not happen in the near term
[18:12:39] Nemo_bis: it has been slow since day 1 as limn loads every page of js known to man
[18:17:20] *every piece
[18:18:51] nuria_: no, it got slower
[18:19:19] Nemo_bis: I believe you, but given that it has gotten no maintenance it is not surprising
[18:19:29] And I'm just letting you know, not saying any action is needed
[18:20:50] Nemo_bis: ok, thank you. I wanted to call out that we have not done any work on it and, to be honest, my preference would be to do away with it entirely
[18:22:16] * Nemo_bis has never used that website so will not complain
[18:24:20] Nemo_bis: all right!
[18:24:33] addshore: who is populating the fields that you are putting on x-analytics?
[18:25:04] nuria_: they are added by the WikimediaEvents extension. You should see the patch that each is added by in the Depends-On field in the commit message
[18:25:32] (CR) Nuria: [C: -1] "I am kind of loosing track a bit with so many CRs but none of these X-analytics headers are documented here: https://wikitech.wikimedia.or" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298725 (https://phabricator.wikimedia.org/T138500) (owner: Addshore)
[18:25:42] addshore: add to where?
[18:25:50] The changes are merged in master for the extension and also in the branch being deployed in 30 mins :)
[18:26:01] nuria_: added to the x-analytics header
[18:26:14] as I thought, clear CPU overload https://tools.wmflabs.org/nagf/?project=analytics#h_limn1
[18:26:43] addshore: Added to the x-analytics header on the request?
[18:27:07] addshore: as in "appended to the http header?"
[18:27:10] ns => https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/WikimediaEventsHooks.php#L38 special => https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/WikimediaEventsHooks.php#L45
[18:28:08] addshore: and (asking from ignorance) why are we using that extension instead of the x-analytics one?
[18:28:43] Now that I don't know, *looks around quickly*
[18:29:37] Okay, as far as I can tell the XAnalytics extension is only there to provide a means to add an x-analytics header, although by default it would be empty
[18:30:19] right, it should manage entries on that header: https://phabricator.wikimedia.org/rEWMV017f9d845c45697c1e84b50d8a16e65f60b68fee
[18:30:35] ^ addshore
[18:30:46] other extensions then hook in to actually add things to the header, WikimediaEvents is where the Wikimedia-specific header things are!
[18:32:29] addshore: seems a bit convoluted but ok, regardless, values added to x-analytics need to be documented here: https://wikitech.wikimedia.org/wiki/X-Analytics
[18:33:06] Ahh, I will update the docs! (I may also add a link to that doc page from the code so I don't miss it again)
[18:34:00] addshore: how are things set on the header for requests that are cached?
[18:34:23] I imagine the header would be the same, as the response would be the same.
[18:34:58] addshore: mmm.. x-analytics are not sent on the response
[18:35:25] addshore: are you sure values are being dumped into the webrequest table?
[18:36:02] They are sent on the response! https://usercontent.irccloud-cdn.com/file/j0IZbpxQ/
[18:36:37] And yes I am 100% sure they will make it to the webrequest table, I did the same with 'loggedIn' not too long ago!
[18:37:29] addshore: sent on the response for any one request?
[18:37:55] addshore: ok, having cleared that the values are in webrequest, we just need to document them
[18:38:01] yes, well any 1 request to mediawiki :)
[18:38:19] I'll make a patch adding a link to the docs page in the code now, then add them to the docs!
[18:38:54] oh, except gerrit is currently down for me, so I can't push the patch :D
[18:47:08] nuria_: https://wikitech.wikimedia.org/w/index.php?title=X-Analytics&diff=746636&oldid=710297 all updated
[18:50:01] Also added a note to the code https://gerrit.wikimedia.org/r/#/c/298815/
[18:53:13] addshore: thank you
[18:53:45] addshore: do run your selects and fine-tune them a bit, you do not need to screen every record, you can filter by content-type
[19:00:16] is having more filters ever a bad thing?
[19:01:00] as I understand it, is_pageview = TRUE is a pretty solid filter and should make the content-type filter redundant
[19:01:44] Also, does the order of the conditions matter at all? ie. should the filters that exclude more things be toward the start of the query?
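For what it's worth, in Hive the order mostly does not matter: partition predicates (year/month/day/hour here) prune whole partitions at compile time wherever they appear in the WHERE clause, and the remaining conjuncts are handled by the query optimizer, so ordering them is chiefly a readability concern. A sketch with placeholder values:

    -- Partition pruning applies even with the partition columns listed last
    SELECT COUNT(*) AS views
    FROM wmf.webrequest
    WHERE is_pageview = TRUE                              -- the "pretty solid" filter
        AND normalized_host.project_class = 'wikipedia'   -- extension only deployed on Wikipedias
        AND year = 2016 AND month = 7 AND day = 13;       -- still pruned, despite its position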
[19:05:40] nuria_: can you send invite to the other quarterly review - I might attend some of it
[19:05:52] madhuvishy: yes, let me see
[19:06:01] no hurry
[19:09:51] milimetric: got more page move qs
[19:09:53] am very confused
[19:10:23] ok, sure, cave!
[19:34:53] milimetric, ottomata, back
[19:36:51] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2459359 (Cmjohnson) Open>Resolved
[20:03:49] back too, was in 1:1
[20:03:55] ok batcave mforns? milimetric?
[20:04:59] omw
[20:25:23] Analytics-Tech-community-metrics: GrimoireLib sometimes displays different names for same user ID; link does not display (existing) contributor data - https://phabricator.wikimedia.org/T140299#2459564 (Aklapper)
[20:25:32] Analytics-Tech-community-metrics: GrimoireLib sometimes displays different names for same user ID; link does not display (existing) contributor data - https://phabricator.wikimedia.org/T140299#2459579 (Aklapper) p:Triage>Low
[20:25:47] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun-2016): Make GrimoireLib display *one* consistent name for one user - https://phabricator.wikimedia.org/T118169#1793275 (Aklapper) Issue in last comment split into dedicated T140299.