This online tool generates a regular expression from a nucleotide sequence that may contain IUPAC ambiguity codes. The resulting pattern can be used with any string or pattern search program (e.g. the Linux command-line tool grep) to extract a given consensus sequence from a large file, for example a FASTA/FASTQ file obtained from a next-generation sequencing experiment.
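The conversion itself can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `iupac_to_regex` and the lookup table `IUPAC_TO_CLASS` are hypothetical, and the example input `GCNATAACTMTGTHC` is an assumption chosen so that it yields the pattern used in the grep example on this page. The table follows the standard IUPAC nucleotide ambiguity codes.

```python
# Hypothetical helper: maps each IUPAC ambiguity code to a regex character class.
IUPAC_TO_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(sequence):
    """Return a regular expression in which every ambiguity code is resolved."""
    parts = []
    for base in sequence.upper():
        if base not in IUPAC_TO_CLASS:
            # Fail loudly instead of silently passing unknown characters through.
            raise ValueError(f"not a supported IUPAC nucleotide code: {base!r}")
        parts.append(IUPAC_TO_CLASS[base])
    return "".join(parts)


# Assumed example input: N, M and H are the ambiguous positions.
pattern = iupac_to_regex("GCNATAACTMTGTHC")
print(pattern)
```

Because the output uses only literal bases and bracketed character classes, it should work unchanged in grep and in most other regular expression engines.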
Consensus nucleotide sequence with IUPAC codes, as extracted from the genome browser:
Regular expression with ambiguous IUPAC characters resolved:
Finding the sequence in a FASTQ file on the command line:
grep "GC[ACGT]ATAACT[AC]TGT[ACT]C" SAMPLE_1.fastq
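A plain grep scans every line of the file, so in principle it could also report a hit in a header or quality line. If that matters, the search can be restricted to the sequence lines. The following Python sketch shows one way to do this; it assumes the common four-line FASTQ layout (header, sequence, `+`, quality) with unwrapped sequences, and the function name `matching_reads` is hypothetical.

```python
import re
from itertools import islice


def matching_reads(fastq_lines, pattern):
    """Yield (header, sequence) for each read whose sequence matches pattern.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    regex = re.compile(pattern)
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input, or an incomplete trailing record
        header = record[0].rstrip("\n")
        sequence = record[1].rstrip("\n")
        if regex.search(sequence):
            yield header, sequence


# Small in-memory example with made-up reads; only the first one contains the motif.
example = [
    "@read1\n", "TTGCAATAACTCTGTACAA\n", "+\n", "IIIIIIIIIIIIIIIIIII\n",
    "@read2\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
]
hits = list(matching_reads(example, "GC[ACGT]ATAACT[AC]TGT[ACT]C"))
print(hits)
```

For a real file, the same function can be given an open file handle, for example `matching_reads(open("SAMPLE_1.fastq"), pattern)`, since a text file handle iterates over its lines.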
Last updated on August 07, 2016
ecSeq is a bioinformatics solution provider with solid expertise in the analysis of high-throughput sequencing data. We organize public workshops and conduct on-site trainings on NGS data analysis.