<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>
<rss version="2.0">
<channel>
<title>Computational Medicine Center FAQ - Open questions</title>
<description>Frequently Asked Questions about the Computational Medicine Center</description>
<link>http://www.computationalmedicine.org/faq</link>
	<item>
		<title><![CDATA[Am I eligible for admission to a computational medicine ... (sikdar masood)]]></title>
		<description><![CDATA[Am I eligible for admission to a computational medicine course as a foreign medical graduate?

Please give me the names of institutions in the USA that offer a computational medicine degree.]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_25</link>
		<pubDate>Fri, 01 Feb 2008 05:30:48 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[The challenge is over, what is the follow-up? How ... (Martijn Schuemie)]]></title>
		<description><![CDATA[The challenge is over, what is the follow-up? How can we find out what other people did (to achieve higher scores)? Are there any plans for publishing the outcome of this contest?]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_24</link>
		<pubDate>Thu, 24 May 2007 13:44:40 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Hi, My team (login : ravi@wintricate.com) participated in the ... (Vishvas Vasuki)]]></title>
		<description><![CDATA[Hi, 

My team (login : ravi@wintricate.com) participated in the Medical NLP challenge. But, we are a little confused by the results. 

What is our team id?

Sincerely,
Vishvas]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_23</link>
		<pubDate>Thu, 05 Apr 2007 18:17:05 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Hello, Now that the challenge is closed, is there ... (Thierry Delbecque)]]></title>
		<description><![CDATA[Hello,

Now that the challenge is closed, is there a way to get the test data with its associated ICD codes for each record, so that some further experiments may still be done ?

Thanks.]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_22</link>
		<pubDate>Sun, 25 Mar 2007 10:04:28 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[We have been studying the behaviour of the data ... (Jon Patrick)]]></title>
		<description><![CDATA[We have been studying the behaviour of the data and have computed the Precision, Recall, and F-value for each company. Could you provide us with the cost scores for each company?

Company   P     R     F     No. of Labels
c1        78.3  89.8  83.7  1397
c2        82.6  95.2  88.5  1404
c3        90.4  75.0  82.0  1011

These scores also indicate a number of issues with respect to the quality of the companies' annotations, but that can wait for discussion until another time.
Thanks,
Jon]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_21</link>
		<pubDate>Mon, 19 Mar 2007 05:43:31 GMT</pubDate>
	</item>
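	<!--
	Editorial sketch, not part of the original question: the F column in the
	question above is consistent with the balanced F-measure, i.e. the harmonic
	mean of precision and recall. That the poster used exactly this formula is an
	assumption; the function name f_measure and the dictionary layout are
	hypothetical and chosen only for illustration.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean) of precision and recall, in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall per company, copied from the question above.
scores = {"c1": (78.3, 89.8), "c2": (82.6, 95.2), "c3": (90.4, 75.0)}

# Round to one decimal place to match the precision used in the question.
f_values = {name: round(f_measure(p, r), 1) for name, (p, r) in scores.items()}
print(f_values)
```

	Rounded to one decimal place, the computed values can be compared directly
	with the F column reported in the question.
	-->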
	<item>
		<title><![CDATA[We are used to working in document classification tasks, ... (Ana)]]></title>
		<description><![CDATA[We are used to working in document classification tasks, but document collections, and the documents themselves, were larger than the ones given for the CMC'07 challenge. Moreover, there are reported results for those document collections, and we could compare the results obtained by our system with the published results. We would like to know whether a challenge similar to the CMC'07 has been organized in the past, in order to have an idea of the quality of our results. Thank you in advance.]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_20</link>
		<pubDate>Tue, 13 Mar 2007 10:37:29 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[We tried your website submission software to see if ... (Yutaka Yasui)]]></title>
		<description><![CDATA[We tried your website submission software to see if our XML format is correct (using the training data).  We got the following results.  Could you advise if we are doing something wrong, given that we got 255% and 4%?  In our assessment, the training set is predicted fairly well, and we cannot make sense of 255% and 4%.

Thank you very much.

FILE CONFORMS TO RELAX-NG SCHEMA: Passed
ALL CODES HAVE SAME ORIGIN: Passed
DOC IDS MATCH TRAINING/TESTING DATASET: Passed

You have some uncategorized documents

Main ranking measure: 255%
Cost sensitive accuracy: 4%

]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_19</link>
		<pubDate>Mon, 12 Mar 2007 22:48:32 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Hi again, One thing. I wrote in my last ... (David Woolls)]]></title>
		<description><![CDATA[Hi again,

One thing.  I wrote in my last mail about the test data 'Those of us who have been trying to do it properly', which I realise implies that some were not, which was not what I meant to say.  I was just rather cross at all of us having wasted our time.

I've read Bernhard's question now and I really think you may have blown all your hard work and ours, which will be as big a disappointment to you as the rest of us.  I did think that you could allocate another set of ids then send them out in random order, but of course any of us can now map the entire set of history and impressions to the codes we know about.  It seems to me the integrity of the challenge is totally compromised.

You'll have saved $1700 at least.  Perhaps you should send us all the rest of the codes and get us to submit our own reports on what we did and how we got on, since we are all in the perfect position to judge it now.  Not what you had in mind, but probably more useful, given the variability in coding, as has already been pointed out by my analysis and by others, as you mentioned in the earlier posting about variation.

I had already decided to go for consistency over scoring accuracy, since it seemed to me more important to be able to explain results than to win.

Regards and good luck with your deliberations

David

]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_18</link>
		<pubDate>Mon, 12 Mar 2007 22:41:51 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Hi, Last week I sent the question below (from ... (Bernhard Pfahringer)]]></title>
		<description><![CDATA[Hi,

Last week I sent the question below (from my GMail account, not my University one), but so far I have not had any response. I just wondered whether you actually received it.

cheers, Bernhard

original message:

Dear Challenge Organizers,

I think I have discovered a serious problem with the test data.

From my preliminary examination it seems that the test data
has cases grouped by unique codes in exactly the same order
as the training data. Therefore, together with the information that
the data has been split 50:50 between training and test (you even
list these numbers for the 20-odd most common unique codes
in the description), it is possible to determine test codes with
almost perfect F1 (there might be a few errors due to uneven
numbers of cases for some codes), all without actually
examining the clinical history or impression.

I am not sure anything can be done about that now; others
might discover it as well, and it may influence what they
do with their classifiers. I initially picked it up when examining
test data that coded PPD explicitly as purified protein derivative
and then noted that nearby cases used "TB test" and "TB converter"
(which is not present in the training data), and that all these cases
together formed one block of 16 cases in the test data. Looking at the
more common codes seems to clearly confirm this finding.
Of course I am tempted to map e.g. "TB test" to the same
"history_tb_related_test" feature that I use for "PPD", which will
make it easier for a learning algorithm to pick up the relationship
of this version to unique code 795.5 in this case.

Please "don't shoot the messenger".

cheers, Bernhard]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_17</link>
		<pubDate>Sun, 11 Mar 2007 21:37:27 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[1) Does each document in the Testing Set have ... (adi andrei)]]></title>
		<description><![CDATA[1) Does each document in the Testing Set have one and only one label, or do they have multiple labels sometimes, like in the training set (where documents have multiple majority codes)?

2) In this case, do we have to identify all majority labels for a document to get the best F-score ?]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_16</link>
		<pubDate>Fri, 09 Mar 2007 18:01:30 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[The answer given to FAQ ID #1034 includes the ... (Ira Goldstein)]]></title>
		<description><![CDATA[The answer given to FAQ ID #1034 includes the statement "You should submit one and only one code for each document."  Various other documents, including the Evaluation section of the Detail pdf, imply just the opposite.  

Can you clarify?

Thanks
]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_15</link>
		<pubDate>Thu, 08 Mar 2007 20:11:05 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Change of policy from Challenge Description? I wrote in ... (David Woolls)]]></title>
		<description><![CDATA[Change of policy from Challenge Description?

I wrote in Add Content 3/7/07 after reading your response to Thierry's How many Codes question. It hasn't appeared, so I'm including it as a question this time :-)

This is what the Description says on page 6
"Both the gold standard and the participant submissions may (and usually will) assign more than  one code to each record, so this is a multi-label classification task. This is somewhat unusual in a machine learning setting." 
and the illustration of assessment on page 7 also treats the task as a multi-code submission exercise.  

Can you clarify the requirement, please?

Thanks 

David]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_14</link>
		<pubDate>Thu, 08 Mar 2007 12:50:35 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[It is still not clear if we should provide ... (adi andrei)]]></title>
		<description><![CDATA[It is still not clear if we should provide one single code for each document.
The training data has 2 or even 3 majority codes. Which one of them should we evaluate our algorithms against?
Should our algorithm generate multiple codes in order to match all the majority codes for each document, or just one code?
Thanks,
Adi]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_13</link>
		<pubDate>Wed, 07 Mar 2007 23:26:03 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[According to the description of the data collection (page ... (Cyril Goutte)]]></title>
		<description><![CDATA[According to the description of the data collection (page 6), the comprehensive set contained 25 categories and 22 labels.  The challenge data is a subset of this comprehensive dataset, yet it contains 45 labels and 94 categories.

Can you clarify this ?
Where do the additional labels/categories come from ?

Thanks.
]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_12</link>
		<pubDate>Tue, 06 Mar 2007 22:02:42 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[The Challenge Description pdf says that "Participants are obliged ... (Ira Goldstein)]]></title>
		<description><![CDATA[The Challenge Description pdf says that "Participants are obliged to provide an explicit answer for every document in the test set. You may choose to provide no codes for a document, but every document that is in the submission must have a counterpart in the test set."  and that "This data format allows a document to contain any number of codes, including zero, and applies equally to the training set, the test set and the participant submissions."  

When we try to validate our XML file we get the following error if the  is left out:
submit.xml:2022: element codes: Relax-NG validity error : Expecting an element, got nothing

When we try to validate our XML file we get the following errors if the  is included, but no code is provided:
submit.xml:2023: element code: Relax-NG validity error : Type NMTOKEN doesn't allow value ''
submit.xml:2023: element code: Relax-NG validity error : Error validating datatype NMTOKEN
submit.xml:2023: element code: Relax-NG validity error : Element code failed to validate content

How should we format the document if we choose not to provide a code?

Also, have you updated the sample-data.ng file to address the e-mail address problem?

Thanks  in advance for any clarification that you can provide.]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_10</link>
		<pubDate>Tue, 06 Mar 2007 02:31:35 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[The Challenge Details document states that "You may choose ... (Martijn Schuemie)]]></title>
		<description><![CDATA[The Challenge Details document states that "You may choose to provide no codes for a document".  However, it is unclear how this should be specified in XML. 
Removing the "" section, or leaving it empty will result in a "FILE CONFORMS TO RELAX-NG SCHEMA: failed" error. Completely removing those documents for which no codes were found results in a "DOC IDS MATCH TRAINING/TESTING DATASET: Failed" error. How can you specify that you found no codes for a document?]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_9</link>
		<pubDate>Mon, 05 Mar 2007 11:04:26 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Hello, I have a very simple question, but I ... (Thierry Delbecque)]]></title>
		<description><![CDATA[Hello,

I have a very simple question, but I just don't want to misunderstand the directives.

In the README file that comes with the test data set, I read "Please submit only one type of origin codes".

Does it mean that for each document we must propose only one ICD-9 code, or does it rather mean that we can propose several ICD-9 codes for the same document (as in the training data set), but always with the same origin code?

In other words: of the following examples, which one is right and which one is wrong? (I guess the first is OK, as obviously the second cannot be ...)

example 1:
-----------

    
	
      
	786.2
      
	591

	
      
		...
		...

        
  




example 2:
-----------

    
	
      
	786.2
      
	591

	
      
		...
		...

        
  



Furthermore, do the  fragments have to appear in the response file ?

Thanks in advance.

Regards, Thierry Delbecque
]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_8</link>
		<pubDate>Sun, 04 Mar 2007 23:36:10 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[I don't see anywhere in the challenge description how ... (Aaron Cohen)]]></title>
		<description><![CDATA[I don't see anywhere in the challenge description how many submissions each group is allowed. Is there a limit to the number of runs that we can submit?]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_7</link>
		<pubDate>Sat, 03 Mar 2007 00:22:29 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[The Readme file that came with the Challenge Documents ... (Lois Childs)]]></title>
		<description><![CDATA[The Readme file that came with the Challenge Documents states:  "Results must be sent to jpestian@cchmc.org."  However, the validation website says that the last file uploaded "will be considered your official submission for the challenge."  Our submission bounced from the aforementioned email address.  Can you confirm that website upload is sufficient?   ]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=open#openq_6</link>
		<pubDate>Fri, 02 Mar 2007 14:08:02 GMT</pubDate>
	</item>
</channel>
</rss>