[00:53:17] Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Performance: Convert WikimediaEvents statsv.js to use sendBeacon - https://phabricator.wikimedia.org/T112843#1647748 (Krinkle) NEW
[01:00:15] ebernhardson: which column in which db? I wanna check that out
[01:46:08] bearND|afk: ebernhardson db, table test. https://phabricator.wikimedia.org/T112295#1647481
[01:46:42] * ebernhardson oops
[01:47:08] should be bearloga, but hes not here no more :)
[05:35:51] Analytics-EventLogging, Continuous-Integration-Config: Set up jsduck test job for EventLogging - https://phabricator.wikimedia.org/T88343#1648046 (Krinkle)
[08:57:07] Analytics-Tech-community-metrics, Developer-Relations, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1648370 (Qgil) Adding #Possible-Tech-Projects in an attempt to get more community attention. Removing #Developer-R...
[08:57:12] Analytics-Tech-community-metrics, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1648372 (Qgil)
[09:21:13] (CR) Joal: [C: 2 V: 2] Update changelog for version v0.0.19 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/238814 (owner: Joal)
[09:21:59] !log deploying refinery-source in archiva
[09:47:39] (PS2) Joal: Bump core and hive jar versions [analytics/refinery] - https://gerrit.wikimedia.org/r/237419
[09:48:12] (CR) Joal: [C: 2] "Self merging for deploy." [analytics/refinery] - https://gerrit.wikimedia.org/r/237419 (owner: Joal)
[09:48:29] (CR) Joal: [V: 2] "Self merging for deploy."
[analytics/refinery] - https://gerrit.wikimedia.org/r/237419 (owner: Joal)
[09:48:45] !log deploying refinery source
[09:48:57] !log deploying refinery
[09:50:20] git up
[09:50:23] oops
[10:14:46] !log issue deploying refinery
[10:20:44] !log camus broken since 2015-09-16T22:00:00
[11:26:41] !disabled puppet agent on analytics1027
[11:27:05] !log commented camus cron on analytics1027
[11:27:32] !log launched manual camus using wmf5 version of camus jar on analytics 1027
[11:28:55] !log suspsended refine-load oozie bundle before all that :)
[11:38:21] !log manual camus job finished successfully
[11:39:00] !log puppet agent was actually not disabled --> changed the cron for camus back, so camus should restart by itself
[11:56:03] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1648715 (Dicortazar) @Qgil, if we want to measure 'Median age of open changesets authored by volunteers in the last three...
[11:58:08] !log automatic trun of camus worked fine
[12:02:20] !log Camus catching back, will take some time
[12:28:01] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1648792 (Qgil) We are requesting four metrics. The two first are "Number of open changesets waiting for review authored by...
[12:37:32] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1648794 (Dicortazar) Ok, I'll update the task description to add the 'waiting for review' requirement. It is now clear to...
[12:38:10] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1648795 (Dicortazar)
[13:06:24] Analytics-Kanban: Add a 'Guard' job for pageviews {hawk} [13 pts] - https://phabricator.wikimedia.org/T109739#1648865 (JAllemandou) a:JAllemandou
[13:38:23] Hey ottomata
[13:38:32] Let me know when available for debrief :)
[13:38:32] morning
[13:38:34] just saw your email
[13:38:36] whatsup?
[13:38:52] same as last time: some offset files not written
[13:38:59] !
[13:39:14] Launched a manual camus with previsou jar version --> worked
[13:39:19] So now it catches up
[13:39:35] load is paused
[13:39:41] and needs some rerun
[13:40:14] wow
[13:40:22] so same thing we did last time fixed it too
[13:40:32] yup
[13:40:42] a chance !
[13:41:10] Plus another issue: refinery deployment broken on stat1002 because of git-fat :(
[13:41:15] Man, that's a day
[13:42:54] ottomata: Brand new breakage now !
[13:42:57] camus
[13:43:00] !
[13:43:40] https://gist.github.com/jobar/39660498e2880da4306f
[13:44:39] (PS1) Ottomata: Remove non-existant broker names from list of brokers that camus uses [analytics/refinery] - https://gerrit.wikimedia.org/r/239100
[13:44:47] joal: that is not relevant ^98% sure
[13:44:54] but i should have done that a while ago
[13:45:12] yeah, joal that is for bootstrapping
[13:45:21] it is trying some non existant brokers on startup
[13:45:27] but it eventually uses a real one
[13:45:40] well, so far, no mapreduce logging :(
[13:45:45] (CR) Ottomata: [C: 2 V: 2] Remove non-existant broker names from list of brokers that camus uses [analytics/refinery] - https://gerrit.wikimedia.org/r/239100 (owner: Ottomata)
[13:45:50] So, broken I think :(
[13:45:58] eh?
[13:46:15] yup, log has no map-red log as usual
[13:47:12] Ahhhhh, my bad, started now :)
[13:47:27] i think this is the original error
[13:47:28] task error: Task KILL is received.
Killing attempt!
[13:47:28] task error: Error: java.io.IOException: Failed to move from hdfs://analytics-hadoop/wmf/camus/webrequest/2015-09-17-10-50-09/_temporary/1/_temporary/attempt_1441303822549_30085_m_000055_0/data.webrequest_maps.12.11.1442437200000-m-00055 to hdfs://analytics-hadoop/wmf/data/raw/webrequest/webrequest_maps/hourly/2015/09/16/21/webrequest_maps.12.11.254.328651.1442437200000
[13:47:28] at com.linkedin.camus.etl.kafka.mapred.EtlMultiOutputCommitter.commitFile(EtlMultiOutputCommitter.java:155)
[13:47:28] at com.linkedin.camus.etl.kafka.mapred.EtlMultiOutputCommitter.commitTask(EtlMultiOutputCommitter.java:115)
[13:47:28] at org.apache.hadoop.mapred.Task.commit(Task.java:1165)
[13:47:28] at org.apache.hadoop.mapred.Task.done(Task.java:1027)
[13:47:29] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
[13:47:29] at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
[13:47:30] at java.security.AccessController.doPrivileged(Native Method)
[13:47:30] at javax.security.auth.Subject.doAs(Subject.java:415)
[13:47:31] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
[13:47:31] at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
[13:47:50] Yeah, but why ?????
[13:48:08] mannnn...
[13:48:18] why indeed!
[13:48:27] hulo nuria
[13:48:44] A chance I didn't tried to deploy yesterday night :)
[13:51:39] joseph, batcav?
[13:51:44] sure ottomata
[13:55:02] !log
[14:12:00] (PS1) Ottomata: Update refinery-core and refiner-hive with proper 0.0.19 jars [analytics/refinery] - https://gerrit.wikimedia.org/r/239105
[14:12:18] (CR) Ottomata: [C: 2 V: 2] Update refinery-core and refiner-hive with proper 0.0.19 jars [analytics/refinery] - https://gerrit.wikimedia.org/r/239105 (owner: Ottomata)
[14:16:15] joal|away: should be fixed, i've seen this happen before too.
often I download the jars from archiva and add those to refinery, rather than using the ones my laptop built
[14:34:01] ottomata: If I just need a bog standard mysql table accessible on the analytics cluster can taht be done on analytics-store? Or should I just use hive?
[14:34:08] *database no table....
[14:34:24] ?
[14:37:47] So I want to store and query some stuff, can I just create a database on the analytics-store host called "addshore" or something for that, or should I just create a scratch db in hive?
[14:38:05] infact, I'm not even sure creating dbs on analytics-store is possible... just checking before I even try :)
[14:38:12] addshore: you are welcome to create a scratch db in hive, but i'm not sure about analytics-store
[14:38:15] Ironholds: maybe knows?
[14:38:18] addshore: is this for your reports we were talking about the other day?
[14:38:28] what did I do?
[14:38:40] nuria: potentially, still basically getting an overview of everything right now
[14:38:43] addshore: there is the staging db you can use to store tables
[14:38:46] addshore, staging, yeah
[14:38:56] also, you're with the WMF now? YAY! What are ya doing?
[14:38:57] ahh, not got to that part of the docs yet ;)
[14:39:01] addshore: take a look, you cannot create dbs but can create tables on that one
[14:39:12] ahh, there is a staging db in analytics-store?
[14:39:29] epic!
[14:39:40] Ironholds: not WMF, back with WMDE again ;)
[14:39:44] aha!
[14:39:49] then yeah, use staging :)
[14:40:02] awesome :)
[14:40:10] (what are you working on?)
[14:40:58] mainly development, I'm part of the "TCB" team over there, a community focused team.
[14:41:18] and a bit of Wikidata again or course
[14:41:49] But got access to the analytics cluster last week to also do some analysis of wikidata things, mainly api related.
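The advice above — use the shared staging database on analytics-store rather than creating your own database — comes down to creating namespaced tables inside staging. A minimal sketch of that pattern, using Python's sqlite3 in memory as a local stand-in for the MySQL host (on analytics-store you would connect with a MySQL client instead; the table name and columns here are hypothetical):

```python
import sqlite3

# In-memory stand-in for the shared "staging" database on analytics-store;
# in practice you would connect to the MySQL host with a MySQL client.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Per the advice in the log: no CREATE DATABASE -- just create a
# namespaced table in staging (the "addshore_test" name is made up).
cur.execute("""
    CREATE TABLE addshore_test (
        wiki TEXT,
        api_calls INTEGER
    )
""")
cur.execute("INSERT INTO addshore_test VALUES ('wikidatawiki', 42)")
conn.commit()

rows = cur.execute("SELECT wiki, api_calls FROM addshore_test").fetchall()
print(rows)
```

The same CREATE TABLE / INSERT / SELECT statements work against the real staging database once connected.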
Though we are also trying to setup a lovely wikidata centric dashboard ;)
[14:43:47] Thanks ottomata for fixing
[14:43:53] I think I get the point :)
[14:44:28] ottomata: where are in deployment phase ?
[14:52:31] ottomata:
[14:52:35] https://gist.github.com/jobar/2548f2a46c5732fece20
[14:53:43] Analytics-Kanban: Document work so far on Last access uniques and validating the numbers {bear} [8 pts] - https://phabricator.wikimedia.org/T112010#1649216 (kevinator)
[14:53:44] Analytics-Cluster, Analytics-Kanban, Epic: {bear} Last Access Counts - https://phabricator.wikimedia.org/T88647#1649215 (kevinator)
[14:54:10] Analytics-Kanban: Document work so far on Last access uniques and validating the numbers {bear} [8 pts] - https://phabricator.wikimedia.org/T112010#1649217 (kevinator) Open>Resolved
[14:54:12] Analytics-Cluster, Analytics-Kanban, Epic: {bear} Last Access Counts - https://phabricator.wikimedia.org/T88647#1016919 (kevinator)
[14:56:54] joal: when you have time we can talk about https://phabricator.wikimedia.org/T108843
[14:58:59] Analytics, Wikidata: analytics stat1002 xmldatadumps wikidata entities symlink incorrect - https://phabricator.wikimedia.org/T112892#1649256 (Addshore) NEW
[15:27:46] I will be <5 minutes late to standup. Getting to office.
[15:30:51] madhuvishy, ottomata : standuppppp
[15:30:59] ah sorry madhu
[15:45:43] Analytics-Kanban: Mark schemas as "containing user-inputed textual data" and add publish section to the docs {tick} - https://phabricator.wikimedia.org/T112271#1649516 (mforns)
[15:45:52] Analytics-Kanban: Mark schemas as "containing user-inputed textual data" and add publish section to the docs {tick} - https://phabricator.wikimedia.org/T112271#1649517 (mforns) a:mforns
[15:58:46] o/ joal
[15:58:51] hey halfak
[15:58:56] I ran a job last night on the altiscale cluster.
[15:59:09] The goal was to recompress the diff datasets we have as bz2
[15:59:15] k
[15:59:31] They were snappy and my work with ottomata yesterday showed that snappy data is useless outside of hadoop.
[15:59:40] yeah, saw that :(
[16:00:05] Anyway, in order to do the re-compression, I couldn't figure out a good way to make sure that the sorting and partitioning didn't get lost without sorting and partitioning again!
[16:00:16] So, I did that.
[16:00:17] Rifht :)
[16:00:39] It took ~2 hours to re-sort and partition and recompress a few TB of data.
[16:00:46] I wanted to provide an example this morning, and then got feeded to the big camus god :(
[16:01:03] Yeah. I just saw that email. So I'll keep my question brief.
[16:01:33] Do you think that sorting/partitioning could have possibly happened in that 2 hours or did hadoop somehow reap the benefit of having the input files already sorted and partitioned?
[16:02:02] I don't think so
[16:02:12] Which one?
[16:02:16] IFor recompression only, you don't need reduce phase
[16:02:34] joal, indeed, but I did have one.
[16:02:37] Normally you'd only map: read uncompressing, then write compressed
[16:02:49] Yeah. That's what I wanted, but didn't know how to do.
[16:02:50] hm, then I don't know what happened isnside :)
[16:02:57] OK. :)
[16:03:01] Sorry :)
[16:03:10] I'll queue my thoughts up for next time we discuss this processing work.
[16:03:21] I'll have lots of updates next Tuesday :)
[16:03:22] I'll try to provide an exmplae either tonight or tomorrow
[16:03:28] cool !
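joal's point in the exchange above — recompression needs only a map phase, because each record can be decompressed and rewritten in its existing order with no shuffle or sort — can be illustrated outside Hadoop. A sketch in Python, with gzip standing in for snappy (python-snappy is not in the standard library) and in-memory byte buffers standing in for HDFS files; this is the per-record pass a map-only job would do, not halfak's actual Altiscale job:

```python
import bz2
import gzip
import io


def recompress(src_bytes):
    """Map-style recompression: stream records out of one codec and
    into another, preserving their existing order -- no sort, no
    shuffle, so no reduce phase is needed."""
    with gzip.open(io.BytesIO(src_bytes), "rb") as src:
        lines = src.readlines()  # records pass straight through, in order
    return bz2.compress(b"".join(lines))


# Tiny demonstration: the record order survives because nothing
# re-partitions or re-sorts the data on the way through.
records = b"first\nsecond\nthird\n"
gz_buf = io.BytesIO()
with gzip.open(gz_buf, "wb") as f:
    f.write(records)

bz = recompress(gz_buf.getvalue())
print(bz2.decompress(bz))
```

If a reduce phase is configured anyway (as in the job discussed above), Hadoop shuffles and sorts the map output regardless of whether the input was already sorted, which is consistent with the ~2 hours observed.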
[16:03:36] Good luck with camus
[16:03:45] Almost done, backfilling now :)
[16:27:04] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 - https://phabricator.wikimedia.org/T112911#1649727 (JAllemandou)
[16:30:32] Analytics-Backlog: Update stats.wikimedia.org's pipeline to use new pageview definition - https://phabricator.wikimedia.org/T112913#1649740 (kevinator) NEW
[16:50:15] Analytics-Backlog: Doc cleanup day 2.0 - https://phabricator.wikimedia.org/T112024#1649855 (ggellerman) Delete documents as well
[17:09:07] Analytics-Kanban, Privacy: Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest - https://phabricator.wikimedia.org/T108843#1649913 (madhuvishy)
[17:13:07] Analytics-Backlog: Update stats.wikimedia.org's pipeline to use new pageview definition - https://phabricator.wikimedia.org/T112913#1649918 (madhuvishy) p:Triage>High
[17:15:35] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 - https://phabricator.wikimedia.org/T112911#1649927 (madhuvishy) p:Triage>Normal
[17:16:36] Analytics-Backlog: Install snzip on stat1002 and stat1003 {hawk} - https://phabricator.wikimedia.org/T112770#1649930 (madhuvishy) a:Ottomata
[17:17:49] hey a-team, need a second opinion on a bash submission
[17:18:22] ottomata, cool, we're in backlog grooming, but I can have a look afterwards if you want
[17:18:27] k
[17:18:29] danke
[17:18:47] Analytics-Backlog: Write in-depth dashiki documentation {crow} - https://phabricator.wikimedia.org/T112685#1649935 (madhuvishy) p:Triage>Normal a:Nuria
[17:19:25] Analytics-Kanban: Write in-depth dashiki documentation {crow} - https://phabricator.wikimedia.org/T112685#1642517 (madhuvishy)
[17:20:14] Analytics-Backlog, Analytics-Wikimetrics, Puppet: Cleanup Wikimetrics puppet module so it can run puppet continuously without own puppetmaster {dove} - https://phabricator.wikimedia.org/T101763#1649942
(madhuvishy) p:Normal>Low
[17:22:39] ottomata: shall I resume load ?
[17:23:18] camus is 1-2h late, and load about 18h
[17:23:28] I think camus will have caught up before load :)
[17:24:20] Analytics-Backlog: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284#1649974 (madhuvishy)
[17:24:21] Analytics-Backlog: Define a first set of metrics to be worked for wikistats 2.0 - https://phabricator.wikimedia.org/T112911#1649975 (madhuvishy)
[17:24:30] Analytics-Backlog: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284#1649978 (madhuvishy) p:Triage>Normal
[17:25:35] sure!
[17:28:03] !log load jobs restarted
[17:59:02] mforns: ottomata https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/Architecture should we mention here that the processors use pykafka's BalancedConsumer and can be parallelized for better throughput?
[17:59:30] madhuvishy, sure!
[17:59:38] yes!
[17:59:44] madhuvishy: edit away! :)
[17:59:49] yup adding :)
[18:00:06] oh thanks madhuvishy!
[18:02:34] nuria: Hulo !
[18:02:43] Talk about privaacy stuff ?
[18:05:19] ottomata: mforns edited
[18:06:25] cool madhuvishy! sorry for having forgotten that
[18:06:44] no problem mforns! You wrote a lot of stuff :)
[18:14:31] joal: yes
[18:14:39] joal: batcave?
[18:14:49] ottomata: which vagrant role should i enable if i wanted to test, say camus? analytics? or hadoop
[18:15:58] nuria: sure
[18:17:09] madhuvishy: you should just enable analytics
[18:17:12] you will also need kafka
[18:17:15] not sure if that is vagrantified atm
[18:17:22] it should be easier now that there are good wmf trusty packages
[18:17:25] that's fine, i have kafka
[18:17:28] ok cool
[18:17:33] i mean, hadoop will do then i think
[18:17:34] nuria, cave ?
[18:17:37] but its easier to just get all of analytlics
[18:17:47] okay cool
[18:18:45] joal: yes
[18:19:40] who haven't i bugged for a task grade today!
[18:19:42] madhuvishy: !
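The wiki edit discussed at 17:59 above — EventLogging processors use pykafka's BalancedConsumer, so several processor instances can split a topic's partitions and raise total throughput — rests on a simple idea: each consumer in a group owns a disjoint subset of partitions. A toy illustration of that assignment (not pykafka's actual rebalancing code, which coordinates through ZooKeeper; the processor names are made up):

```python
def assign_partitions(partitions, consumers):
    """Toy balanced assignment: deal partitions out round-robin so each
    consumer in the group owns a disjoint subset. Adding a consumer
    shrinks every subset, spreading the read load -- which is why
    running more processor instances raises throughput."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Two hypothetical EventLogging processors sharing an 8-partition topic.
result = assign_partitions(list(range(8)), ["processor-0", "processor-1"])
print(result)
```

In pykafka itself the equivalent is obtained with `topic.get_balanced_consumer(consumer_group=..., zookeeper_connect=...)`, and the rebalance happens automatically when group members join or leave.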
[18:19:47] will you grade a python task?
[18:19:51] i just want a second opinion
[18:19:56] you just have to chat me your thoughts after reading it
[18:20:01] ottomata: sure
[18:40:05] Lads, I'm off for tonight
[18:40:19] madhu, I'm sorry I missed you again :(
[18:40:26] Tmorrow ?
[18:41:07] * joal|night hopes madhu is not angry after me
[18:42:50] joal|night, good night!
[18:44:20] joal|night: :) of course not!
[18:44:31] good night :) let's talk tomorrow
[18:49:59] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, Labs, and 3 others: Betawiki EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650382 (awight) NEW
[18:53:12] ottomata: this was fixed yesterday right? ^
[18:54:36] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, Labs, and 3 others: Betawiki EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650405 (Ottomata) eventlogging and database have been moved to deployment-eventlogging03 instance. Needed to upgrade from Precise...
[18:54:38] madhuvishy: thanks, commented
[19:13:01] Analytics-Kanban, Privacy: Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest - https://phabricator.wikimedia.org/T108843#1650550 (Nuria) The goal of this ticket is to establish how easy is to link records in pagev...
[19:25:43] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650603 (greg)
[19:28:03] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650613 (AndyRussG) Many thanks @Ottomata!! K I see it now in /var/log/eventlogging/all-events.l...
[19:28:29] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650620 (greg) >>! In T112926#1650405, @Ottomata wrote: > What is the proper channel? * Absolut...
[19:30:56] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650640 (Ottomata) Ok, thanks. I think I migrated this service very hastily, since we were chan...
[19:33:42] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 2 others: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1650654 (greg) I think it let's anyone email it. I clean the backlog daily, fwiw (thanks to `lis...
[19:43:20] Analytics-EventLogging, Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#1650731 (awight) NEW
[19:45:31] Analytics-EventLogging, Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#1650741 (awight)
[19:46:52] nuria: still there?
[19:57:26] ottomata: should i be able to git clone in stat1002?
[19:57:34] or wget something
[19:57:42] yes, but you need proxy env var
[19:57:48] aah
[19:57:49] https://wikitech.wikimedia.org/wiki/http_proxy
[19:59:33] ottomata: yesssir
[19:59:56] madhuvishy: anonymous http clone should work
[20:00:08] ottomata: lemme know if you need anything
[20:00:16] hmmm setting the proxy doesnt help too
[20:00:33] madhuvishy: can't you do ...
[20:00:51] nuria: was going to ping you about replace=True
[20:00:54] it seems to work for me
[20:01:05] what did you have problem with?
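The fix for the clone problem on stat1002 above is the proxy environment variables documented on the wikitech page ottomata links at 19:57:49. A sketch of setting them for a git subprocess in Python — the proxy URL shown is illustrative only; take the real host and port from that wikitech page:

```python
import os
import subprocess

# Illustrative proxy URL -- the real value is documented at
# https://wikitech.wikimedia.org/wiki/Http_proxy
proxy = "http://webproxy.example:8080"

# Copy the current environment and add the proxy variables,
# the same effect as `export http_proxy=... https_proxy=...` in a shell.
env = dict(os.environ, http_proxy=proxy, https_proxy=proxy)

# Anonymous https clone, as suggested in the log.
cmd = ["git", "clone", "https://gerrit.wikimedia.org/r/analytics/refinery"]
# On a host behind the proxy you would now run:
# subprocess.run(cmd, env=env, check=True)
print(env["http_proxy"])
```

Setting the variables only in your current shell (or passing them per command, e.g. `env=env` here) avoids leaking the proxy settings into unrelated processes.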
[20:01:19] madhuvishy: git fetch https://gerrit.wikimedia.org/r/analytics/refinery refs/changes/some
[20:01:27] ottomata: the problem was
[20:01:49] that replace=true was not being passed correctly to the thread that is instantiated
[20:02:30] ottomata: to do the sql inserts (as they are not done on the main process)
[20:03:24] ottomata: but from comments on your changeset it looks like that is no longer an issue?
[20:06:44] it looks like it is being passed, yea
[20:06:59] PeriodicThread(interval=DB_FLUSH_INTERVAL,
[20:06:59] target=store_sql_events,
[20:06:59] args=(meta, events_batch),
[20:07:00] kwargs={'replace': replace,
[20:07:25] then in store_sql_events in thread target
[20:07:26] insert(table, events, replace)
[20:10:36] nuria: ottomata thanks, figured the proxy stuff out
[20:49:04] Analytics-Kanban, Privacy: Identify possible user identity reconstruction using location and user_agent_map pageview aggregated fields to try to link to IPs in webrequest - https://phabricator.wikimedia.org/T108843#1651168 (Nuria) More info from IRC conversation: 1:38 PM nuria: So my specific e...
[20:54:12] Analytics-EventLogging, Fundraising-Backlog: Nested EventLogging data doesn't get copied to MySQL - https://phabricator.wikimedia.org/T112947#1651209 (Ottomata) FYI, although not 100% announced yet, EventLogging data is going into HDFS now, so you should be able to query them using Hive there. There are...
[21:22:38] Analytics-Backlog, Team-Practices: Get regular traffic reports on TPG pages - https://phabricator.wikimedia.org/T99815#1651371 (Awjrichards)
[21:43:21] joal|night: thanks for your note at https://phabricator.wikimedia.org/T108925 - i would be available now but i guess it's late on your side ;) ...
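The bug nuria describes at 20:01 — replace=True not reaching the thread that does the SQL inserts — and the PeriodicThread call ottomata quotes at 20:06 both come down to the standard args/kwargs plumbing of Python threads. A stripped-down sketch, with a plain threading.Thread standing in for EventLogging's PeriodicThread and a dummy store_sql_events that just records what it received:

```python
import threading

calls = []


def store_sql_events(meta, events, replace=False):
    """Dummy stand-in for EventLogging's SQL writer: records its
    arguments so we can check the replace flag made it through."""
    calls.append((meta, list(events), replace))


# Minimal stand-in for the PeriodicThread call quoted in the log:
# the fix under discussion is that the kwargs dict must carry
# 'replace' through to the target function running on the thread.
t = threading.Thread(
    target=store_sql_events,
    args=("meta", [1, 2]),
    kwargs={"replace": True},
)
t.start()
t.join()
print(calls)
```

If the kwargs dict is omitted (or built before the flag is known), the target silently falls back to its default replace=False, which matches the symptom described.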
[21:44:02] ...will try to catch you tomorrow, either around noon PDT or earlier before 9:30am (16:30 utc)
[22:18:04] Analytics-Backlog, Research-and-Data: Double check Article Title normalization - https://phabricator.wikimedia.org/T108867#1651710 (Halfak) I think that someone should run an analysis where we compare titles in the log to titles in the page table and review those that don't match. Doing this, we can get a...
[22:25:10] Analytics, Analytics-Backlog, Research consulting, Research-and-Data: Too few page views for June/July 2015 - https://phabricator.wikimedia.org/T106034#1651732 (ggellerman) @kevinator Should this task be assigned to you when it is in the Radar column?
[22:26:08] Analytics-Backlog, Research consulting, Research-and-Data: Workshop to teach analysts, etc about Quarry, Hive, Wikimetrics and EL {flea} - https://phabricator.wikimedia.org/T105544#1651733 (Halfak) Maybe we can do this at the allstaff? It seems that the recent survey suggests this might be a thing.
[22:28:15] Analytics-Backlog, Research consulting, Research-and-Data: Workshop to teach analysts, etc about Quarry, Hive, Wikimetrics and EL {flea} - https://phabricator.wikimedia.org/T105544#1651738 (Deskana) @Ironholds may be interested in helping out here.