<?xml version="1.0" encoding="iso-8859-1" standalone="yes" ?>
<rss version="2.0">
<channel>
<title>Computational Medicine Center FAQ - The five questions posted most recently:</title>
<description>Frequently Asked Questions about the Computational Medicine Center</description>
<link>http://www.computationalmedicine.org/faq</link>
	<item>
		<title><![CDATA[We are used to working on document classification tasks, but with document collections, and documents, larger than those provided for the CMC'07 challenge. Moreover, published results exist for those collections, so we could compare our system's results against them. We would like to know whether a challenge similar to CMC'07 has been organized in the past, so that we can gauge the quality of our results. Thank you in advance.]]></title>
		<description><![CDATA[There have been other challenges. TREC (http://trec.nist.gov/) and BioCreative (http://biocreative.sourceforge.net/) come to mind. We don't know, however, how those results will compare to this Challenge. We do know that we will post all the data in mid-to-late April so you can compare your results with the original data sets.]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=artikel&amp;cat=1&amp;id=46&amp;artlang=en</link>
		<pubDate>Fri, 16 Mar 2007 13:23:13 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[We are exploring several approaches and would like to compute the F-score for each one. Are there any scripts that can help?]]></title>
		<description><![CDATA[
<p>
Yes. First, we've updated the site. When you log in, you now have the option to upload a file to have its XML validated and its F-score computed. You can use this option to make sure your data meet the following requirements: (1) the XML conforms to the RELAX-NG schema, (2) all codes have the same origin, and (3) all DOC IDs match the training or testing dataset. If your file passes these three tests, you will receive a report with the results of two additional tests: the main ranking measure (F-measure) and cost-sensitive accuracy.<br />
<br />
Second, we have made the evaluation script and necessary files available at the following URL: <a href="http://www.computationalmedicine.org/challenge/EvalScript.zip">http://www.computationalmedicine.org/challenge/EvalScript.zip</a>.<br />
<br />
Clicking this link downloads a ZIP file that contains the evaluation script (main_ranking) and data (Fscoreeval.xml). The script is written in Perl and requires the XML::Simple module to be installed. Fscoreeval.xml is identical to the training data, except that it contains only the majority codes, not the codes from the three other coding groups.</p><p>03/14/07</p><p>NOTE: The most recent version of the software is the web-based version. Preference should be given to it.</p><p />]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=artikel&amp;cat=1&amp;id=11&amp;artlang=en</link>
		<pubDate>Wed, 14 Mar 2007 13:49:28 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[Submission Problem.

We tried your website submission software to check whether our XML format is correct (using the training data), and got the results below. Could you advise whether we are doing something wrong, given that we got 255% and 4%? By our own assessment the training set is predicted fairly well, and we cannot make sense of 255% and 4%.

Thank you very much.

FILE CONFORMS TO RELAX-NG SCHEMA: Passed
ALL CODES HAVE SAME ORIGIN: Passed
DOC IDS MATCH TRAINING/TESTING DATASET: Passed

You have some uncategorized documents

Main ranking measure: 255%
Cost sensitive accuracy: 4%

]]></title>
		<description><![CDATA[<p>This occurs when a code is followed by trailing spaces. We have updated the submission software, but you may still want to make sure there are no trailing spaces in your submission.</p><p>Related questions</p><p><a href="index.php?action=artikel&cat=1&id=10&artlang=en">I've noticed some unexpected content in the XML data. ...</a></p><p></p>]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=artikel&amp;cat=1&amp;id=47&amp;artlang=en</link>
		<pubDate>Wed, 14 Mar 2007 13:40:28 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[We have received a number of inquiries related to the order of the test data.  These inquiries have expressed concern that participants could use this order to game the system and report artificially high scores.  These are fair questions.]]></title>
		<description><![CDATA[This may be true, but we have indicated from the outset that the Challenge is based on scientific integrity and personal honor. Little effort and budget have been put into security or policing measures. Rather, our efforts focused on creating an anonymous but real data set and an environment for the Challenge. To that end, we are asking you to focus on the Challenge of assigning codes to the text and to pay no attention to the order of the test data.<br /><br />One may still ask: how can the results be validated if the possibility of gaming is present? This, too, is a fair question. To address it, we will ask for clarification from any contestant whose results we believe to be suspiciously high. We reserve the right to exclude from the Challenge any results that we judge to have been obtained by means that go against the spirit of co-operative scientific endeavor. We will not do this lightly, but we will do so should the need arise.<br /><br />
]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=artikel&amp;cat=1&amp;id=45&amp;artlang=en</link>
		<pubDate>Tue, 13 Mar 2007 16:17:08 GMT</pubDate>
	</item>
	<item>
		<title><![CDATA[It is still not clear if we should provide one single code for each document.
The training data has 2 or even 3 majority codes. Which of them should we evaluate our algorithms against?
Should our algorithm generate multiple codes in order to match all the majority codes for each document, or just one code?
Thanks,
Adi]]></title>
		<description><![CDATA[<p>To clarify this, there should be one and only one originator for the
entire submission, but multiple codes are acceptable for each record. 
For example, Cincinnati might be the originator for a submission.  A
record in that submission may be a chest x-ray that has the codes
786.2, 780.6 and 786.07.  Remember, though, that the F-score is affected by both false positive and false negative codes.</p><p><a href="index.php?action=artikel&cat=1&id=41&artlang=en">Change of rules re codes per document</a></p><p></p>]]></description>
		<link>http://www.computationalmedicine.org/faq/index.php?action=artikel&amp;cat=1&amp;id=43&amp;artlang=en</link>
		<pubDate>Thu, 08 Mar 2007 20:56:03 GMT</pubDate>
	</item>
</channel>
</rss>