[03:37:19] <yurik>	 anyone alive around?  looking for some geo guidance on wmf.request table )
[05:34:48] <wikibugs>	 Analytics: Junk in wmf.webrequest.uri_host field - https://phabricator.wikimedia.org/T95836#1201517 (Yurik) NEW
[06:39:51] <wikibugs>	 Analytics: Junk in wmf.webrequest.uri_host field - https://phabricator.wikimedia.org/T95836#1201543 (Yurik) I did some stats to see the most frequent cases, those that might actually cause wrong results (if filtered/processed incorrectly), rather than just annoyance (like a random web sites). The most common...
[08:35:09] <grrrit-wm>	 (PS1) Yurik: CountryCounts.hql implementation, lower(uri_host) [analytics/zero-sms] - https://gerrit.wikimedia.org/r/203650
[08:35:53] <grrrit-wm>	 (CR) Yurik: [C: 2 V: 2] CountryCounts.hql implementation, lower(uri_host) [analytics/zero-sms] - https://gerrit.wikimedia.org/r/203650 (owner: Yurik)
[10:53:43] <wikibugs>	 Analytics-EventLogging: agent_type field does not work for anything except last few hours - https://phabricator.wikimedia.org/T95806#1201679 (Ironholds) Open>Invalid a:Ironholds That's not a bug. The complexity of regenerating ~60 days of data, where a day is 24*60*125000 rows, is extreme, and addin...
[15:14:35] <wikibugs>	 Analytics-Tech-community-metrics, ECT-April-2015: Instructions to update user data in korma - https://phabricator.wikimedia.org/T88277#1201801 (Dicortazar) I've added information about SortingHat in the Contributions section [1]. I may add extra details if needed. Hope this is useful!  Some old text was r...
[23:20:36] <yurik>	 cluster is having issues and needs more nodes :)
[23:45:16] <yurik>	 killed application_1424966181866_83018 - rsrvdmem 350208
[23:51:24] <Ironholds>	 yurik, more precisely?
[23:51:34] <yurik>	 Ironholds, ??
[23:51:42] <yurik>	 the cluster was hanging
[23:51:45] <Ironholds>	 was that the error you got, orr?
[23:51:45] <yurik>	 totally flat
[23:51:53] <yurik>	 http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&hreg[]=analytics1012.eqiad.wmnet|analytics1018.eqiad.wmnet|analytics1021.eqiad.wmnet|analytics1022.eqiad.wmnet&mreg[]=kafka.server.BrokerTopicMetrics.%2B-BytesOutPerSec.OneMinuteRate&gtype=stack&title=kafka.server.BrokerTopicMetrics.%2B-BytesOutPerSec.OneMinuteRate&aggregate=1
[23:52:25] <yurik>	 running mapred job showed identical list
[23:52:48] <yurik>	 so i killed one of my jobs - didn't help
[23:52:59] <Ironholds>	 uhm
[23:53:02] <yurik>	 killed the one with the highest rsrvdmem - and it worked
[23:53:15] <yurik>	 just warning in case you might need to re-run it
[23:53:25] <Ironholds>	 run hadoop job -list on stat1002
[23:53:41] <Ironholds>	 and tell me who you think is responsible for there being no memory
[23:53:44] <Ironholds>	 wait, you killed someone else's job?
[23:53:51] <yurik>	 correct
[23:53:54] <Ironholds>	 who owned it?
[23:54:07] <yurik>	 hfds
[23:54:14] <Ironholds>	 ...
[23:54:17] <Ironholds>	 and what are your jobs doing?
[23:54:48] <yurik>	 calculating per country per domain usage
[23:54:51] <yurik>	 for one day
[23:55:01] <Ironholds>	 got a link to the script?
[23:55:15] <yurik>	 sec, need to check it in
[23:55:18] <Ironholds>	 and, what for?
[23:55:38] <yurik>	 zero
[23:55:56] <yurik>	 Ironholds, could you document the hadoop job https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
[23:55:59] <Ironholds>	 okay. more specifically, for a zero production-level task or?
[23:56:08] <yurik>	 i only see the mapred
[23:56:11] <Ironholds>	 what's the question you're answering?
[23:56:32] <yurik>	 it is for the portal
[23:56:43] <Ironholds>	 Imagine for a second that I do not know what the portal is
[23:56:45] <yurik>	 how much traffic / requests / pagveiews a country generates
[23:56:57] <yurik>	 we need to plot this for our partner
[23:56:59] <yurik>	 hold on
[23:57:01] <yurik>	 typing :)))
[23:57:23] <yurik>	 partners need to see their own usage graphs + compare it with their entire country
[23:57:31] <Ironholds>	  gotcha
[23:57:50] <Ironholds>	 are the jobs checked/CRd by analytics or just introduced at your end?
[23:57:56] <yurik>	 we have pure zero (except that it doesn't include uploads/desktop, even though we need that too later on)
[23:58:14] <yurik>	 not checked yet
[23:58:25] <Ironholds>	 so, to summarise; you decided to run two jobs using 485376M and 497664M of memory respectively, on a weekend, to calculate a dataset for partners, without checking in the jobs for code review.
[23:58:26] <yurik>	 hold on, let me check it in
[23:58:29] <Ironholds>	 When these didn't run fast enough, you killed one of the regularly scheduled, production-level maintenance and ETL tasks, without telling anybody in advance or asking permission.
[23:58:44] <yurik>	 not exactly right :)
[23:58:51] <yurik>	 i run jobs all the time
[23:58:58] <yurik>	 i have tons of the schedlued, they update zero portal site
[23:59:00] <yurik>	 on schedule
[23:59:13] <yurik>	 (under my cron)
[23:59:16] <Ironholds>	 okay, but this job was not CRd, not logged, not checked?
[23:59:29] <Ironholds>	 and, you realise that the scheduling of the task is not my biggest bugbear here?
[23:59:38] <yurik>	 the cron part was written with otomata, but he never checked the job itself
[23:59:43] <yurik>	 this one was very similar to it
[23:59:57] <yurik>	 yes, but noone has ever mentioned any review process for the jobs