MetaSnek API

fastq_finder.py

convert_to_dictionary(paired_reads, unpaired_reads)

Converts paired and unpaired reads to a single dictionary.

Parameters:
  • paired_reads (set) –

    A collection of tuples, each holding the sample name, the R1 file, and the R2 file (if available).

  • unpaired_reads (set) –

    A collection of tuples, each holding the sample name and the R1 file (for unpaired reads).

Returns:
  • dict
    • sample name (dict):
      • R1 (str): filepath of R1 reads file
      • R2 (str): filepath of R2 reads file or None for unpaired
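
The merge described above can be illustrated with a minimal sketch. The function name sketch_convert_to_dictionary and the file paths are hypothetical, and the body is an assumption about the behaviour inferred from the documented inputs and outputs, not MetaSnek's actual code.

```python
def sketch_convert_to_dictionary(paired_reads, unpaired_reads):
    """Illustrative re-implementation; not the library's own code."""
    samples = {}
    # Paired entries carry both mates.
    for sample, r1, r2 in paired_reads:
        samples[sample] = {"R1": r1, "R2": r2}
    # Unpaired entries have no R2, so it is recorded as None.
    for sample, r1 in unpaired_reads:
        samples[sample] = {"R1": r1, "R2": None}
    return samples


# Made-up inputs in the documented tuple shapes.
paired = {("sampleA", "reads/sampleA_R1.fastq.gz", "reads/sampleA_R2.fastq.gz")}
unpaired = {("sampleB", "reads/sampleB_R1.fastq.gz")}
samples = sketch_convert_to_dictionary(paired, unpaired)
```

In this sketch a sample name that appears in both inputs would keep only its unpaired entry; how the real function resolves such a clash is not stated here.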

parse_directory(file_list, r1_flags=['_R1.', '_R1_', '.R1.', '.R1_', '_1_', '_1.', '.1.', '.1_'], r2_flags=['_R2.', '_R2_', '.R2.', '.R2_', '_2_', '_2.', '.2.', '.2_'], ext_pattern='.(fasta|fastq|fq)(.gz)?$')

Pairs samples from a list of files.

Parameters:
  • file_list (list) –

    A list of file paths

  • r1_flags (list, default: ['_R1.', '_R1_', '.R1.', '.R1_', '_1_', '_1.', '.1.', '.1_'] ) –

    list of string patterns of allowable R1 flags

  • r2_flags (list, default: ['_R2.', '_R2_', '.R2.', '.R2_', '_2_', '_2.', '.2.', '.2_'] ) –

    list of string patterns of allowable R2 flags

  • ext_pattern (str, default: '.(fasta|fastq|fq)(.gz)?$' ) –

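    A (raw) string containing the regular expression used to match the file extension, e.g. ext_pattern=r".(fasta|fastq|fq)(.gz)?$". Note that an unescaped . in a regular expression matches any character, not only a literal dot.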

Returns:
  • tuple

    A tuple containing two sets:
      • paired_files: A set of tuples with the sample name, the first file path (R1), and its paired file path (R2).
      • unpaired_files: A set of tuples with the sample name and the unpaired file path.
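
One way such flag-based pairing could work is sketched below. This is an assumption-laden illustration, not MetaSnek's implementation: the function name sketch_parse_directory, the rule that the sample name is the text before the R1 flag, and the handling of files without a mate are all hypothetical. Only the default flag lists and the extension pattern are taken from the signature above.

```python
import os
import re

R1_FLAGS = ['_R1.', '_R1_', '.R1.', '.R1_', '_1_', '_1.', '.1.', '.1_']
R2_FLAGS = ['_R2.', '_R2_', '.R2.', '.R2_', '_2_', '_2.', '.2.', '.2_']
EXT_PATTERN = r'.(fasta|fastq|fq)(.gz)?$'


def sketch_parse_directory(file_list, r1_flags=R1_FLAGS, r2_flags=R2_FLAGS,
                           ext_pattern=EXT_PATTERN):
    """Illustrative pairing of read files; not the library's own code."""
    # Keep only files whose names end in a recognised extension.
    files = {f for f in file_list if re.search(ext_pattern, f)}
    paired, unpaired, used = set(), set(), set()
    for path in sorted(files):
        name = os.path.basename(path)
        prefix = path[: len(path) - len(name)]  # directory part, if any
        for r1_flag, r2_flag in zip(r1_flags, r2_flags):
            if r1_flag not in name:
                continue
            # Assumption: the sample name is everything before the R1 flag.
            sample = name.split(r1_flag)[0]
            mate = prefix + name.replace(r1_flag, r2_flag, 1)
            if mate in files:
                paired.add((sample, path, mate))
                used.update({path, mate})
            break
    # Anything not consumed by a pair is reported as unpaired.
    for path in sorted(files - used):
        sample = re.sub(ext_pattern, "", os.path.basename(path))
        unpaired.add((sample, path))
    return paired, unpaired


# Made-up file names for demonstration.
paired, unpaired = sketch_parse_directory([
    "reads/sampleA_R1.fastq.gz",
    "reads/sampleA_R2.fastq.gz",
    "reads/sampleB.fq",
    "reads/notes.txt",
])
```

Here notes.txt is ignored because it fails the extension pattern, sampleA is paired through the _R1. and _R2. flags, and sampleB.fq is reported as unpaired.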

parse_samples(input_file_or_directory)

Determines whether the given path is a file or a directory and runs the appropriate parser.

Parameters:
  • input_file_or_directory (str) –

    filepath for TSV file or directory of reads

Returns:
  • tuple

    A tuple containing two lists:
      • paired_reads: A list of tuples with the sample name, R1 file, and R2 file (if available).
      • unpaired_reads: A list of tuples with the sample name and R1 file (for unpaired reads).
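
The file-or-directory dispatch can be sketched as follows. The function sketch_parse_samples and the two stub parsers are hypothetical stand-ins so that the example runs on its own; the real function presumably calls parse_tsv_file and parse_directory, but the exact wiring is an assumption.

```python
import os
import tempfile


def sketch_parse_samples(path, parse_tsv, parse_dir):
    """Dispatch to a TSV parser or a directory parser (illustrative only)."""
    if os.path.isdir(path):
        # Directory: collect the contained file paths and hand them over.
        files = sorted(os.path.join(path, name) for name in os.listdir(path))
        return parse_dir(files)
    if os.path.isfile(path):
        # Regular file: treat it as a samples TSV.
        return parse_tsv(path)
    raise FileNotFoundError(f"No such file or directory: {path}")


# Stub parsers that only report which branch was taken.
def stub_parse_tsv(file_path):
    return ("tsv", file_path)


def stub_parse_dir(file_list):
    return ("dir", len(file_list))


with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, "samples.tsv")
    with open(tsv_path, "w") as handle:
        handle.write("sampleA\ta_R1.fq\ta_R2.fq\n")
    from_dir = sketch_parse_samples(tmp, stub_parse_tsv, stub_parse_dir)
    from_tsv = sketch_parse_samples(tsv_path, stub_parse_tsv, stub_parse_dir)
    missing = os.path.join(tmp, "absent")
```

Raising FileNotFoundError for a path that is neither a file nor a directory is a design choice of this sketch; the behaviour of the real function in that case is not documented here.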

parse_samples_to_dictionary(input_file_or_directory)

Convenience function that parses a samples directory or TSV file and returns the samples dictionary.

Parameters:
  • input_file_or_directory (str) –

    filepath of samples TSV or directory

Returns:
  • dict
    • sample name (dict):
      • R1 (str): filepath of R1 reads file
      • R2 (str): filepath of R2 reads file or None for unpaired
      • S (str): filepath of singleton reads file or None
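
To make the returned structure concrete, the following sketch builds a dictionary in the documented shape by hand and shows how a caller might consume it. The sample names, file paths, and the helper read_files are hypothetical; nothing here calls MetaSnek itself.

```python
# Hand-written dictionary in the documented shape; all paths are made up.
samples = {
    "sampleA": {
        "R1": "reads/sampleA_R1.fastq.gz",
        "R2": "reads/sampleA_R2.fastq.gz",
        "S": "reads/sampleA_S.fastq.gz",
    },
    "sampleB": {
        "R1": "reads/sampleB_R1.fastq.gz",
        "R2": "reads/sampleB_R2.fastq.gz",
        "S": None,
    },
    "sampleC": {"R1": "reads/sampleC.fastq.gz", "R2": None, "S": None},
}


def read_files(entry):
    """Return the read files that are actually present for one sample."""
    return [entry[key] for key in ("R1", "R2", "S") if entry.get(key) is not None]


# A sample is treated as paired when its R2 entry is not None.
paired_samples = sorted(n for n, e in samples.items() if e["R2"] is not None)
single_samples = sorted(n for n, e in samples.items() if e["R2"] is None)
```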

parse_tsv_file(file_path)

Parses a TSV file with two to four columns holding sample names and sequencing read files (columns 3 and 4 are optional).

Parameters:
  • file_path (str) –

    Path to the TSV file.

Returns:
  • tuple

    A tuple containing two lists:
      • paired_reads: A list of tuples with the sample name, R1 file, R2 file, and singleton file.
      • unpaired_reads: A list of tuples with the sample name and R1 file (for unpaired reads).
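
A minimal sketch of parsing such a TSV is shown below. It is an illustration under stated assumptions, not the library's code: the function sketch_parse_tsv is hypothetical, it reads from an open handle rather than a file path so that the example is self-contained, it assumes there is no header row, and it assumes a missing fourth column is recorded as None.

```python
import csv
import io


def sketch_parse_tsv(handle):
    """Split TSV rows into paired and unpaired reads (illustrative only)."""
    paired_reads, unpaired_reads = [], []
    for row in csv.reader(handle, delimiter="\t"):
        row = [field.strip() for field in row]
        if not row or not row[0]:
            continue  # skip blank lines
        if len(row) == 2:
            # sample name + R1 only
            unpaired_reads.append((row[0], row[1]))
        elif len(row) in (3, 4):
            # sample name + R1 + R2, with an optional singleton file
            singleton = row[3] if len(row) == 4 and row[3] else None
            paired_reads.append((row[0], row[1], row[2], singleton))
        else:
            raise ValueError(f"Expected 2-4 columns, got {len(row)}: {row}")
    return paired_reads, unpaired_reads


# Made-up TSV content covering the 4-, 3-, and 2-column cases.
tsv_text = (
    "sampleA\ta_R1.fq\ta_R2.fq\ta_S.fq\n"
    "sampleB\tb_R1.fq\tb_R2.fq\n"
    "sampleC\tc.fq\n"
)
paired_reads, unpaired_reads = sketch_parse_tsv(io.StringIO(tsv_text))
```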

write_samples_tsv(dictionary, output_file)

Writes the samples dictionary to a TSV file.

Parameters:
  • dictionary (dict) –

    The samples dictionary:
    • sample name (dict):
      • R1 (str): filepath of R1 reads file
      • R2 (str): filepath of R2 reads file or None
      • S (str): filepath of singleton reads file or None

  • output_file (str) –

    filepath of output file for writing
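
Writing the dictionary back out could look like the sketch below. The function sketch_write_samples_tsv is hypothetical and writes to an open handle instead of a file path so that the example runs without touching disk. The column order (sample name, R1, R2, S) and the choice to drop trailing columns whose value is None are assumptions, chosen to match the two-to-four-column layout that parse_tsv_file accepts.

```python
import csv
import io


def sketch_write_samples_tsv(dictionary, handle):
    """Write one TSV row per sample (illustrative only)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample, reads in dictionary.items():
        row = [sample, reads["R1"]]
        # Only add R2 (and then S) when they are present.
        if reads.get("R2") is not None:
            row.append(reads["R2"])
            if reads.get("S") is not None:
                row.append(reads["S"])
        writer.writerow(row)


# Made-up samples dictionary in the documented shape.
example = {
    "sampleA": {"R1": "a_R1.fq", "R2": "a_R2.fq", "S": "a_S.fq"},
    "sampleB": {"R1": "b.fq", "R2": None, "S": None},
}
buffer = io.StringIO()
sketch_write_samples_tsv(example, buffer)
tsv_output = buffer.getvalue()
```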