Welcome to the CIRCA help page. Here you will find all the information
you need for a successful search.
There are three modes of querying CircaDB:
By gene symbol,
In all three modes, the wildcard character "*" can be used.
This is the default. Simply enter a set of terms and the engine will query for
each term separately, then join the results from all of the terms together
into the final result. In other words, a query for kinase inhibitor
will return all entries with the word kinase together with any entry
with the word inhibitor in their respective annotations.
This is usually not what one would want returned from such a query. This is where
the advanced query mode comes in.
Once the "advanced query mode" check-box has been selected, the query strings
are no longer simply split into individual queries. As the saying goes "with
great power comes great responsibility". In a nutshell, you must now use the
underlying Sphinx search engine's
extended syntax query mode.
Briefly, in this mode the query kinase inhibitor brings back only the
entries that match both kinase AND inhibitor. Instead of an implicit
OR between terms, there is now an implicit AND between terms.
To bring back the OR behavior, you would need to separate terms with
a | (a vertical bar, or "pipe") character. E.g.
kinase | inhibitor.
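The difference between the two modes can be illustrated with a small Python sketch. This is only a toy model of the behavior described above, not how CircaDB or Sphinx is implemented; the entries, the `matches` helper, and the function names are all made up for illustration.

```python
# Toy annotation index: entry ID -> annotation text (made-up data).
ANNOTATIONS = {
    "entry_1": "cyclin dependent kinase inhibitor",
    "entry_2": "protein kinase",
    "entry_3": "protease inhibitor",
    "entry_4": "ribosomal protein",
}


def matches(term):
    """Return the IDs of entries whose annotation contains the term as a word."""
    return {eid for eid, text in ANNOTATIONS.items() if term in text.split()}


def default_mode(query):
    """Implicit OR: query each term separately, then join (union) the results."""
    result = set()
    for term in query.split():
        result |= matches(term)
    return result


def advanced_mode(query):
    """Implicit AND: keep only the entries that match every term."""
    terms = query.split()
    result = matches(terms[0]) if terms else set()
    for term in terms[1:]:
        result &= matches(term)
    return result


print(sorted(default_mode("kinase inhibitor")))
print(sorted(advanced_mode("kinase inhibitor")))
```

In this toy data the default mode returns three entries, while the advanced mode returns only the one entry that mentions both words.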
To bring back the specific phrase "kinase inhibitor", you would need to
surround both words with quotes. E.g. "kinase inhibitor".
Query for entries that contain the exact phrase "kinase inhibitor" and also mention "mitochondrial":
"kinase inhibitor" mitochondrial
Query for kinases, but not any entry with "inhibitor" in the record
Even more examples
Sphinx is a large and powerful system. You can review all of the various
ways to use the extended syntax at the
Sphinx extended query syntax help page.
CircaDB Query Fields
Having read the above, you may be wondering what fields you can use to query CircaDB. The annotation is taken directly from the Affymetrix annotation files. We have pulled certain columns out of those files and indexed them as follows:
||The Affymetrix probeset name
||The transcript ID
||NCBI gene symbol
||Full gene title
||The Entrez gene ID
||The Swiss-Prot accession
||The RefSeq protein accession
||The RefSeq NA sequence accession
||The description from the Affymetrix annotation file
The probability filter can be used to narrow the output to the most significant results. The database can be filtered on the probability values
produced by the different algorithms. The given value is used as the upper cutoff.
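As a rough illustration of an upper cutoff, the sketch below keeps only the rows whose value is at or below the given threshold. The row layout, the `p_value` field name, the gene labels, and the choice of an inclusive comparison are all assumptions made for this example; the text above does not specify them.

```python
# Made-up result rows; field names are hypothetical.
ROWS = [
    {"gene": "geneA", "p_value": 0.001},
    {"gene": "geneB", "p_value": 0.049},
    {"gene": "geneC", "p_value": 0.200},
]


def apply_probability_filter(rows, cutoff, field="p_value"):
    """Keep rows whose value is at or below the cutoff (inclusive by assumption)."""
    return [row for row in rows if row[field] <= cutoff]


kept = apply_probability_filter(ROWS, cutoff=0.05)
print([row["gene"] for row in kept])
```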
We have applied CYCLOPS (Anafi et al., PNAS 2017), an algorithm designed to reconstruct sample order
in the absence of time-of-day information, to the public GTEx collection (GTEx Consortium, Nat. Genet. 2013)
of 632 human donors contributing 4,292 RNA-seq samples from 13 distinct human tissue types.
Additional sample information can be found at GTEx’s documentation page:
For each tissue that was CYCLOPS ordered, cosinor regression (modified, Anafi et al., PNAS 2017)
was used to test if individual genes are rhythmic or not. We only looked for rhythms with a period
of 24 hours. Gene-level expression data was filtered to exclude any gene with a read count of zero
(TPM = 0) in any sample. Following this, only the top 15,000 expressed genes by median TPM were
considered for each tissue.
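The two filtering steps described above can be sketched as follows. This is a minimal illustration, assuming the expression data for one tissue is held as a dictionary that maps each gene to its list of TPM values (one per sample); the function name and the data layout are hypothetical and are not taken from the actual pipeline.

```python
from statistics import median


def filter_genes(tpm_by_gene, top_n=15000):
    """Return gene names that pass both filters, ranked by median TPM.

    tpm_by_gene: dict mapping gene name -> list of TPM values, one per sample.
    """
    # Step 1: drop any gene with TPM = 0 in any sample.
    expressed = {
        gene: values
        for gene, values in tpm_by_gene.items()
        if all(value > 0 for value in values)
    }
    # Step 2: rank the remaining genes by median TPM and keep the top N.
    ranked = sorted(expressed, key=lambda gene: median(expressed[gene]), reverse=True)
    return ranked[:top_n]


# Tiny made-up example with top_n=2 instead of 15,000.
example = {
    "g1": [5, 6, 7],
    "g2": [0, 10, 20],  # removed: zero TPM in one sample
    "g3": [1, 2, 3],
    "g4": [9, 9, 9],
}
print(filter_genes(example, top_n=2))
```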
Each point represents a single human donor. The x-axis is the CYCLOPS-estimated
sample phase in radians (from 0 to 2π). The y-axis is the expression level (TPM)
from RNA-seq. Phase has been adjusted so that time π represents E-box phase (i.e. time
of peak expression of E-box target genes NR1D1, NR1D2, and PER3).
The probability of seeing data at least this extreme if the null hypothesis (that the dataset is not rhythmic) is true.
FDR is the ratio of false positives to total genes discovered at a particular threshold.
This is an adjustment to the P-value that controls the number of false discoveries when
simultaneously testing multiple hypotheses (Benjamini & Hochberg, J. R. Stat. Soc. 1995).
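For readers who want to see how such an adjustment works, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. It is a textbook version written for illustration and is not necessarily the exact code used to produce the values in CircaDB; the function name is made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw P-value is scaled by the number of tests divided by its rank, so the smallest P-values receive the largest correction factor.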
Coefficient of determination from cosinor regression. This is a measure of how well
the data fit a 24-hour sine wave. A value of 1 is perfect: all of the points line up on the curve.
A measure of the magnitude of oscillation (i.e., effect size).
How often the cycle repeats. We tested only for patterns that repeat every 24 hours.
Identifies the time of peak expression.
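To make these quantities concrete, the sketch below fits a plain single-component cosinor model, y = mesor + a·cos(θ) + b·sin(θ), by ordinary least squares and derives the coefficient of determination, amplitude, and phase from it. Note that this is the basic, unmodified cosinor fit, not the modified regression of Anafi et al.; the function names and the exact definition of amplitude used here are assumptions for illustration. The input phases are in radians, so one full cycle (2π) corresponds to the 24-hour period.

```python
import math


def solve_linear(a, b):
    """Solve a small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def cosinor_fit(phases, values):
    """Least-squares fit of values against sample phases given in radians."""
    design = [[1.0, math.cos(t), math.sin(t)] for t in phases]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, values)) for i in range(3)]
    mesor, a, b = solve_linear(xtx, xty)
    fitted = [mesor + a * math.cos(t) + b * math.sin(t) for t in phases]
    mean_y = sum(values) / len(values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in values)  # zero for constant data
    return {
        "rsq": 1.0 - ss_res / ss_tot,
        "amplitude": math.hypot(a, b),
        "phase": math.atan2(b, a) % (2 * math.pi),  # phase of peak expression
    }


# Synthetic, noise-free data that peaks at phase pi.
phases = [2 * math.pi * k / 12 for k in range(12)]
values = [10 + 3 * math.cos(t - math.pi) for t in phases]
print(cosinor_fit(phases, values))
```

With noise-free input the fit recovers a coefficient of determination of about 1, an amplitude of about 3, and a peak phase of about π; real expression data will give lower values of the coefficient of determination.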
There are three different algorithms to statistically test if a dataset
is cyclic or not. More information about these algorithms can be found
in the About tab.
For all of the algorithms we tested cyclic behavior over a range of different periods, except for DeLichtenberg, where we only looked for a period of 24 hours.
As a result you will see a plot, where the x-axis is the time in
hours and the y-axis reflects the expression levels measured
on the individual gene chips. The background shaded areas represent the cycle of light and dark conditions based on CT times.
This statistic is the estimated probability of seeing data at least this extreme if the null hypothesis is true. Here the null hypothesis is that the dataset is not cyclic. The lower the p-value, the more significant the test result.
The q-value of a test is the minimum false discovery rate at which the gene would be called cyclic.
This value gives information on how often the cycle is repeated. If the period
is 24, the pattern recurs every 24 hours.
The phase reflects the timing in the individual tissues. The DeLichtenberg algorithm does not provide this value in its output.
|How do I select more than one experiment?
||"Ctrl+Right Click" lets you select multiple experiments ("Command+Right Click" for Mac users).
|How do I request a data set be added to CircaDB?
Researchers can request that a particular data set be added by submitting an issue at our project page hosted at