[08:25:43] Analytics, Developer-Relations, MediaWiki-API, Reading-Admin, and 3 others: Is User-Agent data PII when associated with Action API requests? - https://phabricator.wikimedia.org/T154912#3090359 (Qgil) Sorry for being late to this party. From a Developer Relations point of view, being able to extra...
[08:29:15] (PS23) Joal: Add mediawiki history spark jobs to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T144717)
[08:31:27] (PS5) Joal: Add oozie jobs for mw history denormalized [analytics/refinery] - https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074)
[08:35:09] Analytics-Kanban: Productionise standard metrics from mediawiki denormalized history - https://phabricator.wikimedia.org/T160151#3090367 (JAllemandou)
[08:35:53] (PS11) Joal: [WIP] Port standard metrics to reconstructed history [analytics/refinery] - https://gerrit.wikimedia.org/r/322103 (https://phabricator.wikimedia.org/T160151) (owner: Milimetric)
[08:36:20] (PS6) Joal: Add oozie jobs for mw history denormalized [analytics/refinery] - https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074)
[08:38:11] Analytics-Kanban: Update sqoop job to add infra and version partition - https://phabricator.wikimedia.org/T160152#3090385 (JAllemandou)
[08:38:35] (PS2) Joal: Update sqoop and namespace_map scripts for versioning [analytics/refinery] - https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152)
[08:40:30] Analytics-Kanban: Provide 2 static files to differentiate prod and labs projects to sqoop in - https://phabricator.wikimedia.org/T160153#3090404 (JAllemandou)
[08:41:57] Analytics-Kanban: Synchronise changes for productionisation of mediawiki history jobs - https://phabricator.wikimedia.org/T160154#3090422 (JAllemandou)
[08:43:20] Analytics-Kanban: Synchronise changes for productionisation of mediawiki history jobs - https://phabricator.wikimedia.org/T160154#3090422 (JAllemandou)
[08:45:00] Analytics-Kanban: Create hive tables and queries for standard metrics computation out of mediawiki denormalized history - https://phabricator.wikimedia.org/T160155#3090446 (JAllemandou)
[08:45:25] (PS12) Joal: [WIP] Port standard metrics to reconstructed history [analytics/refinery] - https://gerrit.wikimedia.org/r/322103 (https://phabricator.wikimedia.org/T160155) (owner: Milimetric)
[08:49:27] Analytics-Kanban: Synchronise changes for productionisation of mediawiki history jobs - https://phabricator.wikimedia.org/T160154#3090468 (JAllemandou)
[08:49:54] (PS7) Joal: Add oozie jobs for mw history denormalized [analytics/refinery] - https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074)
[08:49:57] Yay !
[08:50:08] Finally my stuff on history is organised :)
[08:56:20] \o/
[09:50:59] for those of you using R: http://www.talosintelligence.com/reports/TALOS-2016-0227/
[09:58:53] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Change userAgent field to user_agent_map in EventCapsule - https://phabricator.wikimedia.org/T153207#3090604 (Nemo_bis) Thanks for the note.
[10:40:51] helloooo
[10:47:42] o/
[10:54:21] !log executed set global innodb_flush_log_at_trx_commit=2; on bohrium as test
[10:54:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:24:07] * elukey lunch!
[11:51:38] joal ok to run the job against production v2? :)
[11:58:18] fdans: yes
[11:58:30] fdans: yesterday job took ~çhours
[11:58:34] 9 sorry
[11:58:40] niiice
[11:58:42] fdans: Let's double a parameter
[11:58:53] joal: double?
[11:58:59] double check
[11:59:08] I have problems with my typing today :)
[11:59:08] sure, which one
[12:00:31] cassandra_parallel_loaders fdans
[12:00:39] let's seeeee
[12:00:55] joal: it's set to 1
[12:02:10] fdans: let's put it to 12
[12:02:22] yeeeeeahhhh turbo
[12:02:25] anything else joal?
[12:02:35] fdans: Let's also put cassandra_nodes to 12 instead of 6
[12:02:48] done
[12:03:41] fdans: I'm a bit wondering about data quantity: on aqs1004-a, old projectview are 424M, new are 357M
[12:03:48] elukey: any idea? --^
[12:04:42] and it's not possible to do a row, right?
[12:04:48] row count*
[12:06:13] changing keyspace name to local_group_default_T_pageviews_per_project_v2
[12:06:19] fdans:
[12:06:32] fdans: It might be due to the number of sstables being smaller
[12:06:46] right
[12:06:52] There are 6 tables in old, 4 in new
[12:07:10] So data should be better compressed in new
[12:07:24] so, proceed right joal ?
[12:08:14] no fdans, waiting for elukey
[12:08:22] gotcha
[12:08:39] in the meantime, uploading to hdfs...
[12:23:55] (PS1) Joal: [WIP] Add oozie job for standard metrics computation [analytics/refinery] - https://gerrit.wikimedia.org/r/342197 (https://phabricator.wikimedia.org/T160151)
[12:24:41] * elukey back!
[12:25:27] joal: it makes sense to me that new data could be smaller due to fewer sstables, but.. maybe checking row counts wouldn't be super bad!
[12:32:45] elukey: row count is hard !
[12:32:59] elukey: I used nodetool-a cfstats, and there are diffs :(
[12:35:21] ah yes it is hard
[12:35:25] of course
[12:37:55] elukey: in cfstats, old says 9603 keys while new says 6912
[12:38:09] what's the name of the new one?
[12:38:46] also those are the keys on the instance?
[12:39:10] if so we could sum up the estimates for a cassandra rack
[12:39:18] and see if they change a lot
[12:39:26] but not sure if it is reliable enough
[12:43:27] elukey: I know diffs can be big (for instance cassandra can say 128 keys while there actually have been only 2 inserts)
[12:43:50] elukey: process seems correct, but I don't know what to think :S
[12:45:06] joal: so what's the status? You guys have loaded the new keyspace and you are waiting to make the switch?
[12:45:15] (sorry I didn't follow closely :( )
[12:46:07] elukey: fdans has tested loading: works fine
[12:46:17] elukey: he tested on a fake keyspace
[12:46:32] * fdans nods
[12:46:55] elukey: now I'm wondering about the diffs in size / keys between the old keyspace and the new (test) one
[12:47:22] elukey: we want to start loading on the real v2 keyspace today, but I wanted to try to have confirmation on data before
[12:47:33] elukey: looks like it's not really feasible
[12:47:57] yeah it seems a bit weird though that simple things like counting keys are so difficult
[12:48:00] :/
[12:48:06] I blame fdans
[12:48:07] :D
[12:49:04] yeah not being able to count keys, totally my bad :P
[12:49:21] is there a way to make tests like verifying data at borders, randomly in the middle of the key range, etc.. just to check that we have the expected results? (you might already have done it or it could be useless)
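A minimal sketch of the spot-check elukey proposes above, using the Python cassandra-driver: sample a few dimension combinations (borders plus random points in the key range) and compare row counts between the old and new keyspaces. The keyspace names come from the discussion; the contact point, the `_domain` value, the `data` table layout and the sampled dimensions are assumptions based on the usual AQS schema, not verified against prod.

```python
from cassandra.cluster import Cluster

OLD_KS = 'local_group_default_T_pageviews_per_project'
NEW_KS = 'local_group_default_T_pageviews_per_project_v2'

SAMPLES = [
    # (project, access, agent, granularity, timestamp) - illustrative values
    ('en.wikipedia', 'all-access', 'all-agents', 'daily', '2017010100'),
    ('de.wikipedia', 'desktop', 'user', 'hourly', '2016070112'),
]

CQL = ('SELECT count(*) FROM "{ks}".data '
       'WHERE "_domain" = %s AND project = %s AND access = %s '
       'AND agent = %s AND granularity = %s AND "timestamp" = %s')

# Assumed contact point and _domain value; adjust for the real cluster.
session = Cluster(['aqs1004-a.eqiad.wmnet']).connect()

for sample in SAMPLES:
    counts = []
    for ks in (OLD_KS, NEW_KS):
        row = session.execute(CQL.format(ks=ks),
                              ('analytics.wikimedia.org',) + sample).one()
        counts.append(row.count)
    # Matching counts for the same dimensions in both keyspaces is a good
    # sign even when cfstats key estimates and on-disk sizes disagree.
    print('OK  ' if counts[0] == counts[1] else 'DIFF', sample, counts)
```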
[12:49:37] it is not like checking the raw number of keys
[12:49:42] but it could give a good indication
[12:49:58] size in Mb is not good, number of keys is kinda weird to check
[12:50:03] elukey: completely feasible
[12:50:10] could write an acceptance test that queries every timestamp to check for holes in data? (this is ignorant me talking)
[12:51:34] fdans: feasible
[12:52:10] fdans: I'd go for elukey's way though: take a few random projects, random other dimension values, and check for some timestamp
[12:53:28] like, checking that the number of results for a range is consistent with production?
[12:54:55] (joal ^)
[12:55:33] fdans: that would do
[12:55:42] fdans: currently trying to do it :)
[12:56:08] joal: I'd try, but as you know I have no access to aqs prod
[13:02:36] elukey, fdans: Looked at all-projects, all-access, all-user, 3 granularities - everything looks correct (didn't check the full timestamp range though)
[13:03:02] fdans: However there are some problems with the imported data: hive query needs to be rerun with some limits
[13:03:59] elukey: So, I don't know if the thing is related to single import vs multi-import + cluster resized, but the data space taken by the new data is quite smaller :)
[13:08:03] elukey, fdans: did some checks with counting rows for a single key -- seems fine
[13:08:13] fdans: quick baticueva for hive limits?
[13:08:22] joal: on my way!
[13:09:15] * joal dances La baticueva - tcha tcha tcha
[13:20:02] Analytics-Kanban, Operations, Traffic, Wikipedia-iOS-App-Backlog, and 2 others: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3090867 (elukey) I followed https://piwik.org/docs/optimize-how-to/ and applied `set global innodb_flush_log_at_trx_commit=2;` as root...
[13:36:45] (PS1) Fdans: Change keyspace name in per project cassandra oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/342205 (https://phabricator.wikimedia.org/T156312)
[14:06:17] taking a break a-team :)
[14:06:27] k joal :]
[14:50:16] fdans, helloooo
[14:50:38] do you have 10 minutes for legacy pageviews synchup before standup?
[15:03:25] mforns: gimme 10min and I'll be there!
[15:03:32] Finishing my 4pm lunch
[15:03:51] 🇪🇸
[15:08:04] mforns: cuando quieras
[15:21:47] Analytics-Kanban: Wikistats 2.0 prototype: Dashboard page - https://phabricator.wikimedia.org/T160176#3091141 (Milimetric)
[15:24:22] fdans, hellooouu
[15:38:22] milimetric: https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=bohrium&panelId=18&fullscreen&from=now-3h&to=now
[15:38:48] \o/
[15:39:23] you're excited that there was a big spike in disk usage?
[15:39:50] !log applied innodb_buffer_pool_size = 512M and restarted mysql on bohrium
[15:39:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:39:58] milimetric: --^ :)
[15:40:06] check the last week's metrics
[15:41:18] well let's see over the next couple of days if things stay low
[15:41:30] looks like the stock market to me :)
[15:41:34] there are some basic mysql tunables that might give us a lot of relief
[15:41:46] hahaahah yes I realized it, sorry, I thought it was clearer
[15:41:55] but basically the spike dropped right after the restart
[15:42:16] ok, cool. Looks like it was low a couple days earlier this week, but maybe it's just low usage
[15:42:22] (on piwik in general)
[15:42:56] https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=bohrium&panelId=7&fullscreen&from=now-24h&to=now might be better
[15:42:57] yeah, makes sense, you might be right. I mean, maybe piwik is a lot better than even the piwik developers think and they never had someone like you to tune it.
[15:43:13] ooooh, aaaaaah
[15:43:15] at this time of the day CPU patterns (IOWait above all) rise a lot
[15:43:34] but we'd need a couple of days of data
[15:43:52] I also made the binlog not fsync after every transaction
[15:43:59] earlier on during the day
[15:44:54] sync_binlog is 0 now, and from what I gathered that is not that good :D
[15:45:04] so we could set it maybe to something like 1000
[15:45:06] or more
[15:45:22] very relaxed constraints for the moment
[15:45:38] the other question that I have is.. do we have backups?
[15:45:46] hm.. where does it sync the binlog, nothing replicates from here...
[15:46:11] no backups, no
[15:46:30] I think we specified that and people said it was fine early on
[15:46:34] not sure how they feel now
[15:47:06] milimetric: well to the disk no? Or I misunderstood?
[15:47:17] let's add backups :D
[15:47:29] sure, backups are good
[15:48:04] * milimetric reads what sync_binlog is
[15:49:26] oh ok, nothing to do with replication
[15:49:27] Analytics-Kanban, Operations, Traffic, Wikipedia-iOS-App-Backlog, and 2 others: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3091217 (elukey) Added `innodb_buffer_pool_size = 512M` and `innodb_flush_log_at_trx_commit 2` to `/etc/mysql/my.cnf` restarted mysql...
[15:49:35] elukey: so 0 is "whenever the OS feels like it"
[15:49:44] and 1000 is after every 1000 writes, wouldn't 1000 be slower?
[15:49:58] or do you think 1000 would smooth it out and be less spikey
[15:50:08] not sure what "whenever the OS feels like it" means in real life
[15:51:35] milimetric: it depends on tons of factors, but we can use a value that guarantees a bit of resiliency
[15:51:50] 1000 is a lot, and shouldn't affect bohrium that much..
[15:51:53] but we can tune it
[15:52:07] for the moment I'd leave the current settings, and check out on monday
[15:53:20] cool
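The bohrium tuning above, gathered into the my.cnf stanza it implies. The two innodb values are the ones elukey logged as applied; `sync_binlog = 1000` is only the compromise value floated in the conversation, not something that was actually set.

```ini
[mysqld]
# Flush and sync the InnoDB redo log once per second instead of at every
# commit: far fewer fsyncs, at the cost of up to ~1s of transactions on crash.
innodb_flush_log_at_trx_commit = 2

# Bigger buffer pool so the hot piwik working set stays in memory.
innodb_buffer_pool_size = 512M

# 0 = sync the binlog whenever the OS feels like it (fast, least safe);
# N = fsync after every N binlog writes. 1000 was the compromise discussed.
sync_binlog = 1000
```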
[15:54:29] * milimetric has complex thoughts on the aesthetics of tuning
[16:02:13] * urandom waves at elukey
[16:04:16] * elukey waves at urandom remembering about the meeting invite not sent
[16:04:34] :)
[16:05:43] urandom: would you be free in ~1 hour?
[16:08:34] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Change userAgent field to user_agent_map in EventCapsule - https://phabricator.wikimedia.org/T153207#3091241 (Nuria) a: fdans>Nuria
[16:08:34] elukey: sure
[16:09:13] urandom: thanks!
[16:09:26] urandom: I saved all the logs in /home/elukey on aqs*
[16:09:46] and the timeline is https://wikitech.wikimedia.org/wiki/Incident_documentation/20170223-AQS
[16:11:43] hangout with the andreescu twins
[16:11:50] https://usercontent.irccloud-cdn.com/file/DQMSjo81/Screen%20Shot%202017-03-10%20at%2017.11.09.png
[16:12:05] 10+ to the shot
[16:16:08] fdans: if you want to be part of aqs-users let's create the task :)
[16:16:13] Hey milimetric: I know you're deep in view
[16:16:26] lol
[16:16:29] hey joal
[16:16:30] milimetric: would you have a quick look at some patches next week?
[16:16:30] elukey: gimme a few minutes, I'm with marcel :)
[16:16:38] fdans: even on Monday :)
[16:16:40] joal: I can look now
[16:16:50] am I a reviewer on them?
[16:17:05] I think you are, let me check
[16:18:20] the two from this morning, yes?
[16:18:26] 9:38?
[16:18:34] milimetric: I'm thinking of: https://gerrit.wikimedia.org/r/#/c/341586/ , https://gerrit.wikimedia.org/r/#/c/341030/, and https://gerrit.wikimedia.org/r/#/c/325312/
[16:18:45] milimetric: others are from next week
[16:18:49] FOR
[16:18:52] sorry
[16:19:06] ottomata: agreed blogpost is ready for comms, you did try the code sample on plain chrome no extension, right?
[16:19:10] mforns: I'd be interested if you could have a look at the big scala one (checking for recent modifs)
[16:19:28] joal, sure will do on monday :]
[16:20:12] Thanks mforns - 2 things are to be checked: hack to add infra/version partitions, and hack not to use hiveContext
[16:20:25] ya
[16:20:28] it worked
[16:20:48] milimetric: 1 of these patches is small, 1 is big, and one is huge - No rush :)
[16:22:46] Analytics, Developer-Relations, MediaWiki-API, Reading-Admin, and 3 others: Is User-Agent data PII when associated with Action API requests? - https://phabricator.wikimedia.org/T154912#3091247 (Nuria) >Sorry for being late to this party. From a Developer Relations point of view, being able to ext...
[16:30:08] ottomata: k, let's see if comms answers the thread
[16:38:29] (CR) Milimetric: "just style nits, feel free to merge whether or not you fix that stuff." (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: Joal)
[16:38:45] I gotta make lunch and will look at the others later jo
[16:41:48] nuria: i'm trying to figure out which of krinkles' patches need merged
[16:41:49] i think they are
[16:41:55] https://gerrit.wikimedia.org/r/#/c/341724/3 and https://gerrit.wikimedia.org/r/#/c/341723/3
[16:42:04] but, the first one there has krinkle's -1 on it
[16:42:06] ottomata: can talk in a bit
[16:42:13] ok
[17:02:54] urandom: I am free now if you are!
[17:03:02] otherwise we can do it next week
[17:03:04] no rush
[17:06:35] elukey: sure
[17:07:44] elukey: should i ring you, or...?
[17:09:13] urandom: sure!
[17:29:41] Analytics-Kanban, ChangeProp, Operations, Reading-Web-Trending-Service, Services (watching): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3091385 (Ottomata) FYI, I've installed librdkafka on cp1042, cp1058 (cache misc) and cp1052 (cache text) serv...
[17:40:13] joal: in T160083, you listed two download-project-namespace-map jobs
[17:40:13] T160083: Create cron job in puppet sqooping prod and labs DBs - https://phabricator.wikimedia.org/T160083
[17:40:17] one for prod and one for labs
[17:40:29] but, the -x arg is the same
[17:40:31] will that work?
[17:51:44] * elukey afk!
[17:51:47] byyeeee
[18:20:57] (CR) Ottomata: "Not sure I fully understand this --version flag. It seems to me to be just a free-form partition that allows us to do full sqoop imports " [analytics/refinery] - https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: Joal)
[18:21:31] joal: disregard previous questions, i understand after reviewing ^
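To make Ottomata's reading of the --version flag concrete, a hypothetical sketch (names invented here, not refinery's actual sqoop CLI) of how a free-form version partition lets full imports land side by side instead of overwriting each other:

```python
from datetime import date

def sqoop_target(base, table, infra, version=None):
    """Build an HDFS target dir partitioned by infra and import version.

    Illustrative only: the path layout and argument names are assumptions.
    """
    version = version or date.today().strftime('%Y-%m')
    return '{0}/{1}/infra={2}/version={3}'.format(base, table, infra, version)

# Each full import gets its own version partition, so a re-run never
# clobbers the previous snapshot and Hive can expose both side by side.
print(sqoop_target('/wmf/data/raw/mediawiki', 'revision', 'prod'))
print(sqoop_target('/wmf/data/raw/mediawiki', 'revision', 'labs', '2017-03'))
```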
[18:33:48] Quarry: Quarry runs thousands of times slower in the last months - https://phabricator.wikimedia.org/T160188#3091603 (IKhitron)
[18:40:27] ottomata: Yeah, I was unsure about the navtiming conversion because it filters 2 schemas instead of one.
[18:40:42] ottomata: Wasn't sure if it's discourages to use without calling .filter(), or if it's essentially the same as before
[18:40:45] discouraged*
[18:40:59] Perhaps there's a way to get keys() of handles and pass both to filter() somehow?
[18:41:15] I also considered calling filter() twice, but not sure what that will do internally
[18:42:01] Krinkle: if you used Kafka, you could just subscribe to the two schema topics you are interested in
[18:42:09] but, i don't know the navtiming well
[18:43:09] ottomata: Yeah. Makes sense. But for now, would you say it's effectively the same as before with zmq? Not less efficient? Not "bad" to call eventlogging without filter()?
[18:45:03] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2017): Go through default Kibana widgets; decide which ones are not relevant for us and remove them - https://phabricator.wikimedia.org/T147001#3091653 (Aklapper) Naming scheme doc for custom items released: https://github.com/grimoirelab/pan...
[18:47:50] bye team, have a nice weekend!
[18:51:39] hm Krinkle i don't think I understand
[18:51:46] in your code I don't see a removed 'filter' call
[18:52:01] ottomata: That's because I wasn't using eventlogging
[18:52:18] compare to ve.py, or the doc example of eventlogging
[18:52:45] ohhhhhh
[18:52:46] gotcha
[18:52:53] yeah, Krinkle i don't think eventlogging filter is doing anything special
[18:53:03] so doing filtering on your own would be equivalent
[19:00:19] Krinkle: let me know what you want merged, i'm ready :)
[19:24:59] ottomata: OK. 10min
[19:28:07] Analytics-Tech-community-metrics: Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io? - https://phabricator.wikimedia.org/T157898#3091831 (Lcanasdiaz) We are still having some issues with Gerrit (sigh) ... we're working on this until it is fixed, hopefully early next week
[20:41:32] * milimetric going out for a bit, will craft some more on the dashboard later
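A sketch of ottomata's Kafka suggestion from the 18:42 exchange above, assuming the usual eventlogging_<SchemaName> topic naming convention. The broker address, the second schema name and the handler body are illustrative assumptions, not navtiming's actual code.

```python
import json

from kafka import KafkaConsumer  # kafka-python

def handle(event):
    """Placeholder for per-event processing (whatever navtiming does)."""
    print(event.get('schema'), event.get('timestamp'))

# Subscribing to per-schema topics removes the need for client-side
# filter() calls: the topic already guarantees the schema of every message.
consumer = KafkaConsumer(
    'eventlogging_NavigationTiming',
    'eventlogging_SaveTiming',                         # assumed second schema
    bootstrap_servers=['kafka1012.eqiad.wmnet:9092'],  # assumed broker
    value_deserializer=lambda raw: json.loads(raw.decode('utf-8')),
)

for message in consumer:
    handle(message.value)
```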