[00:10:24] (Draft1) Paladox: Update svn.wikimedia.org links to phabricator [analytics/wikistats] - https://gerrit.wikimedia.org/r/316289 (https://phabricator.wikimedia.org/T64570)
[00:10:26] (Draft2) Paladox: Update svn.wikimedia.org links to phabricator [analytics/wikistats] - https://gerrit.wikimedia.org/r/316289 (https://phabricator.wikimedia.org/T64570)
[00:11:37] (CR) Paladox: [C: 1] Replace Bugzilla links by Phabricator links [analytics/wikistats] - https://gerrit.wikimedia.org/r/315417 (owner: Aklapper)
[06:13:17] * elukey restarts oozie jobs
[06:13:50] from what I can see the problem is always a store overflow timeout :/
[06:14:01] so today I am going to double the limit that we have
[06:15:38] !log created oozie coordinator 0034143-160922102909979-oozie-oozi-C to restart webrequest-load-check_sequence_statistics-wf-upload-2016-10-17-1
[06:16:45] !log created oozie coordinator 0034149-160922102909979-oozie-oozi-C to restart webrequest-load-check_sequence_statistics-wf-upload-2016-10-17-2
[06:17:42] !log created oozie coordinator 0034153-160922102909979-oozie-oozi-C to restart webrequest-load-check_sequence_statistics-wf-upload-2016-10-17-3
[06:18:30] !log created oozie coordinator 0034161-160922102909979-oozie-oozi-C to restart webrequest-load-check_sequence_statistics-wf-upload-2016-10-17-4
[06:23:48] ok the good news is that I have some examples of Miley Cyrus crawled links but without End timestamp and VSL timeouts..
[06:27:46] all right, brb in a bit, not a wonderful start of the week so far :P
[07:03:48] hi all, is there someone awake? :) I am trying to program something Java/Hadoop related but I can't import any org.apache.hadoop packages on stat
[07:33:07] AlexK: hi! can you give us more details about the problem? I am probably not able to help but I am sure it will be useful for whoever is reading :)
[07:35:35] Well, I want to perform some analysis over files in HDFS, therefore I want to write a Java program which uses some Hadoop features.
Thus I need to import packages like org.apache.hadoop.mapreduce.* which cannot be found by javac ._.
[07:35:53] !log created oozie coordinator 0034240-160922102909979-oozie-oozi-C to restart webrequest-load-check_sequence_statistics-wf-upload-2016-10-17-6
[07:38:32] I mean, is there any jar locally available, or shall I just put my necessary sources there myself? I was assuming that there may be some pre-installed packages available via the path
[07:41:13] so on stat1004 I can see a lot of hadoop jars under /usr/lib
[07:41:34] what stat host are you using? And also what Java path?
[07:41:54] (sorry, these are probably trivial questions but I'd need to know a bit more to help)
[07:42:03] I am on stat1002
[07:43:02] aye, I just made a beginner's mistake I guess... I kind of forgot putting my include dir... sorry for bothering
[07:43:05] /usr/lib/hadoop-0.20-mapreduce is there
[07:43:21] AlexK: no bother, I am happy to help :)
[08:24:57] !log upgraded nodejs on aqs100[56] (already done on aqs1004)
[17:16:56] milimetric, mforns_brb: I am going to put my service worker changes on standby again because latency-wise in Chrome they hurt. Not a huge lot, but I see an increase in how long it takes to pass the data to the application. Also, tooling-wise this is still too green: devtools works better than the last time I tried these changes, but not well enough.
[17:18:38] joal we good!
[17:18:47] ulimit nofiles 65536
[17:18:47] Ohhhyeah :)
[17:19:13] (PS9) Nuria: Service Worker to cache locally AQS data [analytics/dashiki] - https://gerrit.wikimedia.org/r/302755 (https://phabricator.wikimedia.org/T138647)
[17:19:39] Thanks a lot ottomata !
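[Editor's note] For anyone hitting the same wall AlexK describes above, the usual fix is just the compile-time classpath. A minimal sketch, assuming the /usr/lib layout mentioned in the conversation and a hypothetical source file `MyJob.java`:

```shell
# Point javac at the locally installed Hadoop jars instead of copying
# sources around. These wildcard paths are the ones mentioned for the
# stat hosts; adjust to whatever your machine actually has under /usr/lib.
HADOOP_CP='/usr/lib/hadoop/*:/usr/lib/hadoop-0.20-mapreduce/*'
javac -cp "$HADOOP_CP" MyJob.java

# If the hadoop CLI is installed, it can build the full classpath for you:
javac -cp "$(hadoop classpath)" MyJob.java
```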
[17:19:59] milimetric, mforns_brb: the changes are tiny though, less than 100 lines of code with comments: https://gerrit.wikimedia.org/r/#/c/302755/9/src/layouts/metrics-by-project/service-worker.js
[17:27:55] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2723307 (MusikAnimal) With the new awesome AQS cluster, performance is not as much of an issue now, correct? I will simply say that with #tool-labs-tools-pageviews, an "all-time" query, July...
[17:28:10] ottomata: so... do all ulimit changes require a restart of the process?
[17:29:30] not necessarily, but if you are changing a hard ulimit, then yes
[17:29:34] a-team: sorry if I missed grooming but I tried to help ops with the outage :(
[17:29:58] in this case I wouldn't know how to change a soft ulimit for the nodemanager though
[17:30:08] only root can change hard limits
[17:30:16] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2723309 (Nuria) We will test limits as to responses and times and update this ticket when we have the data. If we are aiming for a percentile 99 of < 500 ms response (example) we want to al...
[17:30:20] and root spawns this process as the yarn user
[17:30:36] so the ulimit needs to be changed before the nodemanager is started
[18:00:02] * elukey afk!
[18:10:16] nuria, do you want us to review and merge the service worker changes, then? or standby?
[18:11:03] mforns: nah, nothing to do for now. From what I see, latency-wise they will not help; they would still help in case there are network issues, but it cannot be at the cost of hurting perf in any other instance
[18:11:18] nuria, ok
[18:14:20] joal, yt still?
[18:16:05] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2723457 (MusikAnimal) >>!
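[Editor's note] The soft/hard distinction ottomata explains above can be checked from any shell; a quick sketch (the nofiles value 65536 quoted earlier is just an example target):

```shell
# Soft limit on open file descriptors: the process itself may raise it,
# but never above the hard limit, and no restart is needed.
ulimit -Sn

# Hard limit: only root may raise it; a non-root process can only lower it.
ulimit -Hn

# This is why the nodemanager case above needs the limit configured
# *before* the daemon starts (e.g. via /etc/security/limits.conf or the
# init script): a running process spawned as the yarn user cannot raise
# its own hard limit.
```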
In T134524#2723309, @Nuria wrote: > We will test limits as to responses and times and update this ticket when we have the data. If we are aiming for a percentile 99...
[18:38:16] oh man, I lost the sync-up with recruiting meeting :[ sorry
[19:03:15] hey mforns
[19:03:20] hi joal :]
[19:03:23] mforns: was away for dinner :)
[19:03:27] What's up?
[19:04:07] I was looking at the oozie alerts (ops week), I just wanted to know if you already did some reruns or sth?
[19:04:33] mforns: I think elukey re-ran everything today
[19:04:42] oh ok
[19:04:48] ;)
[19:04:58] thx joal
[19:05:10] mforns: He logs coord starts in the chan I think, if you want to double check :)
[19:05:16] np mforns
[19:05:22] ok
[19:07:59] joal, yes, the alerts stop after elukey re-runs the jobs
[19:08:01] thx!
[19:08:06] no prob
[19:08:20] mforns: If you're interested I can tell you how to react
[19:08:29] joal, I am :]
[19:08:33] mforns: But it can wait for a potential real-case situation :)
[19:08:37] ok mforns :)
[19:08:58] whatever you prefer
[19:09:01] so when a DATA LOSS ERROR email is sent, it means the load/refine process has not completed
[19:09:12] (whatever the reason)
[19:09:38] aha
[19:09:42] actually I'm not precise enough
[19:10:11] so when a DATA LOSS ERROR email is sent, it means the load/refine process has not completed because of a problem with sequence numbers in the raw webrequest files
[19:10:34] I see
[19:10:37] So whether the reason is real data loss or fake data loss (currently it's fake), the process has stopped and dependent jobs are not run
[19:11:24] aha
[19:11:59] In our case, since the data loss is fake (due to wrong ordering of rows, and therefore of sequence numbers), we launch a new job with a different threshold, in order for it not to fail
[19:12:45] Regular jobs are managed with a bundle, and here there's only one coord of the bundle to relaunch, so we launch dedicated coordinators for that
[19:13:02] aha, which threshold is that?
[19:13:46] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/load/bundle.properties#L75
[19:14:04] ok ok
[19:14:05] mforns: But this is the bundle - we only want to restart one of the coords of the bundle
[19:14:13] aha
[19:14:19] mforns: So I have a coord.properties file on stat1004
[19:15:39] mforns: stat1004:/home/joal/code/oozie_props/coord_load_webrequest_upload.properties
[19:15:50] looking
[19:16:26] mforns: this properties file launches a coordinator for the upload webrequest_source only, and you can override the error threshold with -D
[19:16:32] ok, there are 2, one for upload and one for misc
[19:16:45] ok
[19:17:21] I also have typical commands to reload the job in case: sudo -u hdfs oozie job --oozie $OOZIE_URL -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2016* | tail -n 1 | awk '{print $NF}') -Dqueue_name=production -Doozie_launcher_queue_name=production -Doozie_launcher_memory=256 -Dstart_time=2016-09-29T13:00Z -Dstop_time=2016-09-29T13:59Z -config /home/joal/c
[19:17:21] mforns: I follow https://wikitech.wikimedia.org/wiki/User:Elukey/Analytics/Oozie :)
[19:17:27] ode/oozie_props/coord_load_webrequest_upload.properties -run
[19:17:32] that is basically --^
[19:17:43] elukey: You are a better documenter than I am ;)
[19:17:49] Thanks for that
[19:18:13] awesome, thanks a lot
[19:18:21] elukey why did you rerun 4 jobs?
[19:18:46] ignorant question
[19:18:51] because I was ignorant :D
[19:19:07] it was early in the morning and I went on auto-pilot
[19:19:10] xD, so just one is enough
[19:19:15] ?
[19:19:24] oh yes, if the dates are contiguous
[19:19:29] but I didn't think about it
[19:19:31] I see
[19:20:04] Analytics: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2723684 (Zebulon84)
[19:20:31] thanks joal and elukey :]
[19:20:37] np mforns :)
[19:20:47] Will leave you for today I think, a-team :)
[19:20:52] See y'all tomorrow !
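[Editor's note] joal's rerun command above is split across two IRC messages (the `-config` path wraps mid-word); joined back together and wrapped for readability, it reads as below. The paths and dates are his originals; the error-threshold override he mentions would be one more `-D` flag, whose exact property name is the one defined around bundle.properties#L75 (not reproduced here):

```shell
# Relaunch the upload load/refine coordinator for a given hour window,
# picking the most recent deployed refinery directory on HDFS.
sudo -u hdfs oozie job --oozie $OOZIE_URL \
  -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2016* | tail -n 1 | awk '{print $NF}') \
  -Dqueue_name=production \
  -Doozie_launcher_queue_name=production \
  -Doozie_launcher_memory=256 \
  -Dstart_time=2016-09-29T13:00Z \
  -Dstop_time=2016-09-29T13:59Z \
  -config /home/joal/code/oozie_props/coord_load_webrequest_upload.properties \
  -run
```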
[19:21:06] mforns: I also opened a phab task about the issue, it is in my last email.. let me know if you want more info!
[19:21:13] bye joal!
[19:21:22] bye joal !
[19:21:36] elukey, ok will do
[19:23:29] going afk, byeee o/
[19:23:51] (CR) Bearloga: [C: 1] [search] Add support for generator api requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/315503 (owner: DCausse)
[19:37:20] Analytics: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2723684 (Nuria) @Zebulon84: Percentage of pageviews.
[19:41:00] Analytics: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2723684 (mforns) This may happen, because newer versions of IE still run in compatibility mode when detecting old* html syntax. In this mode the user agent sent in the headers may be the one of IE...
[20:06:21] Analytics-Visualization: Re-evaluate pages system in Wikimedia Report Card - https://phabricator.wikimedia.org/T41122#2723759 (mforns) Open>declined The analytics team is porting all Limn Dashboards to Dashiki, and does not plan to work on this as a side effect.
[20:37:46] Analytics, GitHub-Mirrors, Documentation, Easy: Mark documentation about limn as deprecated - https://phabricator.wikimedia.org/T148058#2723850 (greg)
[22:05:54] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2723986 (Milimetric) @jcrespo: we met up with labs and I have a better understanding of the problem. We came up with a draft solution that I'll detail here, and we're interested in your thoughts. Motivation. For...
[22:12:36] Analytics, Analytics-Dashiki: Just an idea: poly-graph - https://phabricator.wikimedia.org/T148469#2723992 (Milimetric)
[22:17:25] milimetric, ported redactron logic to python?
[22:19:47] Krenair: yeah, chase was saying you did that
[22:20:02] 'cause the original's perl and was hard to read?
[22:21:03] among other things
[22:21:06] but yeah it was the script that sets up and maintains the views
[22:21:11] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching): Kafka SSE Prototype - https://phabricator.wikimedia.org/T148470#2724018 (Ottomata)
[22:21:21] I don't think it counts as part of redactron
[22:22:51] Krenair: my bad, will edit
[22:22:58] it might!
[22:23:01] I just thought it didn't
[22:23:12] but then I don't know much about the other parts of the sanitisation process
[22:23:57] yeah, it seems like a convoluted snake :) It's all good, we'll untangle it over the next few quarters and get this all cleaned up
[22:24:49] cool