Clunky formatting when reading picks.csv with pandas or csv #9

@lennijusten

The picks.csv output from PhaseNet stores the itp, tp_prob, its, and ts_prob columns in a clunky format, so the user has to perform several string manipulations before the values are usable.

I will show an example of reading the csv with pandas, although reading it with the csv package runs into the same formatting issues. I will also share the function I had to write to correctly format the entries.

Pandas

import pandas as pd
df = pd.read_csv('output/picks.csv')

The result is a dataframe whose itp, tp_prob, its, and ts_prob columns contain strings rather than lists of numbers.

print(df['itp'][0])
>>>  '[   1 6620 8114]'

print(df['ts_prob'][0])
>>>  '[ 0.11291095  0.31720835  0.06021817]'

The values are not separated by a uniform delimiter either: the number of spaces between them varies, so splitting on a single space with str.split(' ') does not convert the string into a clean list. Ideally, the csv would contain a uniform, comma-separated list of values. Another solution would be to also save a pickle file to the output directory that contains the lists in object form.
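As a rough sketch of what the comma-separated suggestion could look like, the snippet below serializes each list as JSON before writing and parses it back on read. This is only an illustration and not PhaseNet's actual output code: the two-column frame, the in-memory buffer, and the names rows, out, and df_json are assumptions made for the example, and the values are the samples printed above.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the picks table, using the sample values shown above.
rows = pd.DataFrame({
    "itp": [[1, 6620, 8114]],
    "ts_prob": [[0.11291095, 0.31720835, 0.06021817]],
})

# Writer side: store each list as a JSON string, e.g. "[1, 6620, 8114]",
# so every cell uses the same comma-separated layout.
out = rows.copy()
for col in ["itp", "ts_prob"]:
    out[col] = out[col].apply(json.dumps)

buf = io.StringIO()  # in-memory file used here instead of a real picks.csv
out.to_csv(buf, index=False)

# Reader side: json.loads turns each cell back into a Python list.
buf.seek(0)
df_json = pd.read_csv(buf, converters={"itp": json.loads, "ts_prob": json.loads})
```

With a layout like this, a reader would need no custom string handling beyond passing json.loads as a converter.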

To fix the formatting with the current picks.csv, I made the following function:

import shlex
import pandas as pd

df = pd.read_csv('output/picks.csv')

def pickConverter(df):
    """Convert the bracketed string columns of picks.csv into Python lists."""
    # Sample indices are cast to int, probabilities to float.
    columns = [('itp', int), ('its', int), ('tp_prob', float), ('ts_prob', float)]
    for col, cast in columns:
        entry_list = []
        for x in range(len(df)):
            try:
                # Drop the surrounding brackets, then split on whitespace.
                entry_list.append(list(map(cast, shlex.split(df[col][x].strip('[]')))))
            except AttributeError:
                # Non-string entries (e.g. a missing value read as NaN) have no .strip().
                entry_list.append([])
        df[col] = entry_list
    return df
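A possibly simpler variant is to do the conversion while reading, via the converters argument of pd.read_csv. The sketch below assumes the values are separated only by whitespace, as in the samples above, and relies on str.split() with no arguments splitting on any run of whitespace. The helper parse_list, the name picks, and the inline sample data (including its tp_prob values) are hypothetical and only stand in for a real picks.csv.

```python
import io

import pandas as pd


def parse_list(cast):
    """Build a converter that turns a cell like '[   1 6620 8114]' into a list."""
    def convert(cell):
        # Remove the brackets; split() with no arguments handles repeated spaces.
        return [cast(value) for value in cell.strip("[]").split()]
    return convert


converters = {
    "itp": parse_list(int),
    "its": parse_list(int),
    "tp_prob": parse_list(float),
    "ts_prob": parse_list(float),
}

# Hypothetical stand-in for output/picks.csv with made-up probabilities.
sample = io.StringIO(
    "itp,tp_prob,its,ts_prob\n"
    "[   1 6620 8114],[ 0.5  0.25  0.75],[],[]\n"
)
picks = pd.read_csv(sample, converters=converters)
```

For the real file, the same dictionary would be passed as pd.read_csv('output/picks.csv', converters=converters), which avoids a second pass over the dataframe.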
