[02:21:56] Analytics, Research: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (leila) @Nuria a few more thoughts: * You should make a decision if you want to label human activity which is bot-like as bot or not. For example, if I'm playing a Wikipedia game such as h...
[06:15:57] morning :)
[06:16:04] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445654/ - ready to merge?
[07:25:36] yes elukey :)
[07:25:39] morning
[07:32:19] Could not load config from '/etc/turnilo/config.yaml': bad indentation of a sequence entry at line 458, column 8
[07:32:59] amazing
[07:33:10] sending a fix now, patched manually
[07:33:45] elukey: Aouch :(
[07:33:52] elukey: sorry didn't notice :(
[07:35:33] me too :)
[07:37:41] all right fixed
[07:38:05] Thanks elukey :)
[07:44:48] Analytics, Operations, procurement, User-Elukey: eqiad | (14 + 6) hadoop hardware refresh and expansion - https://phabricator.wikimedia.org/T199673 (elukey) a:elukey>None
[07:45:01] Analytics, Operations, procurement, User-Elukey: eqiad | (3) Labs Data Lake hardware - https://phabricator.wikimedia.org/T199674 (elukey) a:elukey>None
[08:30:28] Thanks luca for helping on the SWAP/LDAP task :)
[08:31:59] :)
[08:32:10] I mean, three Italians eager to work on data
[08:32:21] :D
[08:34:16] :)
[08:37:22] this year I'd really love to find time to learn spark
[08:37:32] those notebooks seem an awesome way to do it :)
[08:39:56] joal: for the next LDAP ticket: if you tag the task "LDAP-Access-Requests", they'll show up on the SRE Clinic Duty workboard and will usually be processed quickly as well
[08:40:40] Noted moritzm, thanks :)
[08:41:11] elukey: notebooks can be nice, yes, however I think a good IDE is even better
[08:41:40] elukey: notebooks will be great if you want to, in addition to processing data, show nice graphs
[08:42:06] elukey: if it's more about processing data, IDEs might be easier (for typing, autocompletion etc)
[08:42:36] elukey: And, I know a guy who knows another guy who knows again someone else that maybe could help
[08:47:53] :)
[08:49:59] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) Excellent! Thanks, @Nuria. I'll try that out now.
[08:50:03] nuria_ could you have a look at my patch, please
[08:50:04] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/443409/
[10:40:26] * elukey lunch! bb in ~2h
[11:24:34] (PS4) Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600)
[11:34:20] (CR) Fdans: Adds empty dir removal to hive partition dropping jobs (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[13:02:43] hi fdans :)
[13:02:50] Just reading your last patch
[13:03:35] I wonder about the two comments in the table_parent_path (to be renamed) function
[13:04:54] fdans: let me know when you have a minute to talk :)
[13:33:50] joal: o/
[13:34:04] \o
[13:34:37] fdans: Wanted to be sure of the behavior of that function :)
[13:35:12] joal: OH DAMN i just realised something
[13:35:20] ?
[13:35:33] joal: the function is cool, but I'm messing up the way I delete stuff
[13:35:47] ah?
[13:35:51] Didn't realize
[13:36:21] joal: I'm not checking whether the parent folder is empty after each child deletion
[13:36:32] which means only the deepest dir would be deleted
[13:36:58] like...
[13:37:49] https://www.irccloud.com/pastebin/SdlWSGsb/
[13:39:00] joal: if the contents of dir 1.2 were deleted, this would only delete dir 1.1, but not dir 1, because it checks for size once, at the beginning
[13:39:13] I did that to minimize hdfs calls, but I need to change it
[13:43:29] fdans: size is not 0 if the dir contains folders?
[13:44:41] joal: not with hdfs dfs -du -s right?
[13:45:33] hm
[13:45:37] I need to test fdans
[13:47:07] fdans: hdfs dfs -du -s /wmf/data/wmf/webrequest/webrequest_source=te/year=2015
[13:47:14] fdans: hdfs dfs -du -s /wmf/data/wmf/webrequest/webrequest_source=text/year=2015
[13:47:17] sorry
[13:48:23] that should return size 0
[13:50:15] yessir
[13:51:12] joal: but I need to do "check size - delete - check size -delete" instead of "check all sizes - delete all that apply"
[13:54:36] fdans: given parent folders size are 0 when children size is 0, we actually could delete only once by picking the upmost parent with size 0
[13:56:17] joal: yea that's what I was thinking, sorting the paths by string length asc, deleting only once per "waterfall"
[13:56:38] joal: ok changing it :)
[13:56:51] fdans: great :)
[14:16:29] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) I captured just over an hours worth of data between 12:22 and 13:29 UTC u...
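The "delete only once per waterfall" idea from the 13:54 to 13:56 exchange can be sketched as below. This is a minimal illustration and not the code in the refinery patch: it assumes the directory sizes were already collected (for example one `hdfs dfs -du -s` result per path) into a plain dict, and the function name `topmost_empty_dirs` and the example paths are hypothetical. One caveat worth keeping in mind: a reported size of 0 means no data bytes, so a directory holding only zero-length files would also qualify.

```python
from typing import Dict, List


def topmost_empty_dirs(sizes: Dict[str, int]) -> List[str]:
    """Return the highest-level directories whose reported size is 0.

    `sizes` maps a directory path to its total size in bytes
    (assumed to be gathered beforehand, one entry per candidate path).
    """
    selected: List[str] = []
    # A parent path is always a shorter string than any of its children,
    # so sorting by length visits every parent before its descendants.
    for path in sorted(sizes, key=lambda p: (len(p), p)):
        if sizes[path] != 0:
            continue
        # Skip paths that sit below a directory already chosen for deletion.
        covered = any(
            path.startswith(parent.rstrip("/") + "/") for parent in selected
        )
        if not covered:
            selected.append(path)
    return selected


if __name__ == "__main__":
    # Hypothetical layout: dir1 is fully empty, dir2 still holds data.
    example = {
        "/t/dir1": 0,
        "/t/dir1/dir1.1": 0,
        "/t/dir1/dir1.2": 0,
        "/t/dir2": 10,
        "/t/dir2/dir2.1": 0,
        "/t/dir2/dir2.2": 10,
    }
    print(topmost_empty_dirs(example))  # ['/t/dir1', '/t/dir2/dir2.1']
```

Each returned path would then need a single recursive delete, which keeps the number of HDFS calls low while still removing empty parents.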
[14:18:25] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) a:Ottomata>phuedx
[14:22:36] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx)
[14:31:42] joal: restarted https://wikitech.wikimedia.org/wiki/Incident_documentation/20180711-kafka-eqiad#Kafka_considerations
[14:32:08] this time is more generic, after testing in labs creating topics seems not to be the culprit
[14:41:28] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx)
[15:05:54] ping elukey
[15:05:58] ping milimetric
[15:08:48] ouch sorry!
[15:51:56] Analytics, MinervaNeue, Readers-Web-Backlog, Design: [SPIKE] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[15:52:10] Analytics, MinervaNeue, Readers-Web-Backlog, Design: [Spike] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[15:52:14] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (ayounsi) Note that before switching to a default reject+log, I added terms to permit traffic to text-lb, misc-lb, and lists on port 443...
[15:54:01] Analytics, MinervaNeue, Readers-Web-Backlog, Design: [Spike] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[16:07:53] Analytics, EventBus, Operations, SCB, Services (blocked): EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac) p:Triage>Unbreak!
[16:24:28] Analytics, EventBus, Operations, Wikimedia-Stream, and 2 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac)
[16:43:48] (PS1) Nuria: Preparing for release 2.3.3 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/446364
[16:45:16] mforns: about to deploy wikistats ok?
[16:45:23] ok!
[16:45:28] will check
[16:45:29] https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/446364/
[16:49:29] Analytics, MinervaNeue, Readers-Web-Backlog, Design: [Spike] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (phuedx) > We need to decide whether to send multiple events per page or follow an approach similar to https://meta.wikimedia.org/wiki/Schema:MobileWikiAppPageScr...
[16:49:59] (CR) Nuria: [V: 2 C: 2] Preparing for release 2.3.3 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/446364 (owner: Nuria)
[16:51:45] (PS1) Nuria: Release 2.3.3 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/446368
[16:52:33] mforns: this is the release changeset https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/446368/
[16:53:04] nuria_, looks ok!
[16:53:19] (CR) Nuria: [V: 2 C: 2] Release 2.3.3 [analytics/wikistats2] (release) - https://gerrit.wikimedia.org/r/446368 (owner: Nuria)
[16:53:31] mforns: submitted, will move kanban items
[16:53:58] Analytics: Problems with external referrals? - https://phabricator.wikimedia.org/T195880 (Nuria)
[16:54:36] Analytics, MinervaNeue, Readers-Web-Backlog, Design: [Spike] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[16:54:38] Analytics, Readers-Web-Backlog: Problems with external referrals? - https://phabricator.wikimedia.org/T195880 (Nuria)
[16:54:51] Analytics, Readers-Web-Backlog: Problems with external referrals? - https://phabricator.wikimedia.org/T195880 (Nuria) p:High>Triage
[17:00:11] Analytics, MinervaNeue, Readers-Web-Backlog, Design: [Spike 8hrs] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[17:01:14] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, and 2 others: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) Open>Resolved The error rate of ~0.00188% is borne out by another two hours of data...
[17:18:19] Analytics, Analytics-Kanban, User-Elukey: Q1 2018/19 Analytics procurement - https://phabricator.wikimedia.org/T198694 (RobH)
[17:59:46] a-team: I just updated https://wikitech.wikimedia.org/w/index.php?title=Incident_documentation/20180711-kafka-eqiad again, I found something interesting
[18:00:31] the OOM issue and the max open files issue were not completely separated as I thought, since only kafka1002 had the fix to raise the maximum limit
[18:00:57] kafka1001 was showing the problem as well, but it wasn't restarted (it got an hour later)
[18:01:33] so this changes everything, I'll try to do more tests during the next days :)
[18:30:52] * elukey off!
[19:38:30] (PS1) Reedy: Rename foundationwiki [analytics/refinery] - https://gerrit.wikimedia.org/r/446399 (https://phabricator.wikimedia.org/T188776)
[19:39:12] (PS2) Reedy: Rename foundationwiki [analytics/refinery] - https://gerrit.wikimedia.org/r/446399 (https://phabricator.wikimedia.org/T188776)
[19:46:02] (CR) Reedy: "Not sure if this needs some handholding to rename the old data" [analytics/refinery] - https://gerrit.wikimedia.org/r/446399 (https://phabricator.wikimedia.org/T188776) (owner: Reedy)
[19:59:41] Trying to query druid with curl as described in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Access_via_command_line `curl -v -X POST "http://druid1001.eqiad.wmnet:8082/druid/v2/?pretty" -H "content-type: application/json" -d @druid_query_test.json` and the following query (exported from Superset) but I get "Connection to 10.64.5.101 failed." and "Connection timed out"
[19:59:41] https://www.irccloud.com/pastebin/fWmlLri2/druid_query_test.json
[20:00:08] tried from stat1005 and stat1006
[20:02:01] hi bearloga - having a look
[20:05:08] joal: thanks! 👍
[20:06:39] bearloga: this one --> https://gist.github.com/jobar/106cec2604d8f593b7558438c42455cb has worked for me from both stat1004 and 5
[20:07:09] I don't know if the problem comes from parameter ordering or file reading, but at least you have a working solution :)
[20:20:09] joal: thank you for the working example! Problem was I had http_proxy env set in my .bashrc
[20:30:00] bearloga: Wouldn't have guessed :)
[20:30:10] Gone for tonight team :)
[20:30:40] joal: thanks again! and have a lovely rest of the day :)
[20:31:23] Hi! I'm trying to validate events on beta cluster, but both `/srv/log/eventlogging/all-events.log` and `/srv/log/eventlogging/client-side-events.log` are empty...
[20:38:35] Analytics-Kanban: Hadoop Sanitization. Drop older than 90 days partitions in events database - https://phabricator.wikimedia.org/T199836 (Nuria)
[20:40:26] Analytics-Kanban: Hadoop Sanitization. Drop partitions older than 90 days in events database - https://phabricator.wikimedia.org/T199836 (Nuria)
[20:52:21] chelsyx: system probably needs a restart, processes on beta cluster seem to die all the time, let me see
[20:54:08] chelsyx: rebooting
[21:01:33] nuria_: I shared with chelsyx how to check if it needs rebooting and how to reboot
[21:02:13] bearloga: sounds good, just ping on channel so we know it happened
[21:07:09] Thanks nuria_ and bearloga !
[23:18:19] Analytics, EventBus, Operations, Wikimedia-Stream, and 3 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (Liuxinyu970226)
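On the Druid query thread (19:59 to 20:20), the root cause turned out to be an `http_proxy` variable exported from `.bashrc`. A rough sketch of sending the same kind of POST while ignoring any proxy settings from the environment is shown below. It only uses the Python standard library; the helper names `build_direct_opener`, `build_druid_request` and `run_query` are hypothetical, the URL is the one quoted in the chat, and the query body is whatever JSON the caller supplies (the pastebin content is not reproduced here).

```python
import json
import urllib.request

# URL quoted in the chat; reachable only from inside the cluster network.
DRUID_URL = "http://druid1001.eqiad.wmnet:8082/druid/v2/?pretty"


def build_direct_opener() -> urllib.request.OpenerDirector:
    """Build an opener that ignores http_proxy/https_proxy from the environment."""
    # Passing an empty mapping to ProxyHandler disables proxy use entirely.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


def build_druid_request(url: str, query: dict) -> urllib.request.Request:
    """Prepare a JSON POST request carrying the Druid query."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query_path: str, url: str = DRUID_URL, timeout: float = 30.0):
    """Load a query from a JSON file and send it without going through a proxy."""
    with open(query_path, encoding="utf-8") as handle:
        query = json.load(handle)
    opener = build_direct_opener()
    with opener.open(build_druid_request(url, query), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With curl itself, unsetting `http_proxy` for the session or passing `--noproxy '*'` should have a similar effect.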