Internal Functions
FMDData.states_dict Constant
states_dict
A dictionary of the States/UTs that can appear in the data set. The keys are the names returned by the cleaning steps, and the values are the names that can be matched in the underlying datasets.
FMDData._calculate_state_counts Method
_calculate_state_counts(table, original_df)
An internal function to handle the calculation of the state/serotype counts based upon the provided state/serotype seroprevalence values and total state counts. Because DataFrames handles tables as named tuples, we can extract information about the columns being passed from the regex selection and then use substitution strings to collect a view of the correct column of total state counts.
You probably want to use the user-facing function calculate_state_counts() instead.
FMDData._calculate_state_seroprevalence Method
_calculate_state_seroprevalence(table, original_df)
An internal function to handle the calculation of the state/serotype seroprevalence values based upon the provided state/serotype counts and total state counts. Because DataFrames handles tables as named tuples, we can extract information about the columns being passed from the regex selection and then use substitution strings to collect a view of the correct column of total state counts.
You probably want to use the user-facing function calculate_state_seroprevalence() instead.
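The relationship between counts, seroprevalence values, and state totals used by the two functions above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper names are hypothetical, and it assumes seroprevalence is expressed as a percentage (0 to 100) and that counts are rounded to the nearest integer.

```julia
# Hypothetical helpers for illustration only; they are not part of FMDData.
# Assumption: seroprevalence is a percentage between 0 and 100.

# Infer a count from a seroprevalence value and the total number sampled.
seroprevalence_to_count(prevalence_pct::Real, total::Integer) =
    round(Int, prevalence_pct / 100 * total)

# Infer a seroprevalence value from a count and the total number sampled.
# A total of zero is mapped to 0.0 here to avoid dividing by zero (an assumption).
count_to_seroprevalence(n_positive::Integer, total::Integer) =
    total == 0 ? 0.0 : 100 * n_positive / total

n_positive = seroprevalence_to_count(12.5, 200)
pct = count_to_seroprevalence(n_positive, 200)
```

Because inferred counts are rounded, converting a seroprevalence value to a count and back may not reproduce the original value exactly.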
FMDData._calculate_string_occurences Method
_calculate_string_occurences(
    vals::Vector{S},
    unique_vals::Vector{S} = unique(vals)
) where {S <: AbstractString}
Internal function to calculate how many times each unique string value occurs in a vector of strings.
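One way such a tally might be computed is sketched below. The function name is a hypothetical stand-in, and returning a Dict is an assumption, since the return type of the internal function is not documented here.

```julia
# Hypothetical stand-in for illustration; not the FMDData implementation.
# Assumption: the tally is returned as a Dict mapping each unique string to its count.
function count_string_occurrences(
        vals::Vector{S},
        unique_vals::Vector{S} = unique(vals)
    ) where {S <: AbstractString}
    # For every unique value, count how many entries of `vals` equal it.
    return Dict(u => count(==(u), vals) for u in unique_vals)
end

serotypes = ["O", "A", "O", "Asia1", "O"]
occurrences = count_string_occurrences(serotypes)
```

Passing `unique_vals` explicitly restricts the tally to the values of interest; values absent from `vals` receive a count of zero.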
FMDData._calculate_totals! Method
_calculate_totals!(
    totals_dict::OrderedDict,
    col::Vector{T},
    colname::String,
) where {T <: Union{<:Union{<:Missing, <:Integer}, <:Integer}}
Internal function to calculate the serotype total.
FMDData._check_all_required_serotypes Method
_check_all_required_serotypes(
    all_matched_serotypes::T,
    allowed_serotypes::T = default_allowed_serotypes,
) where {T <: AbstractVector{<:AbstractString}}
Internal function to check that all required serotypes are provided in the data.
FMDData._check_identical_column_names Method
_check_identical_column_names(df::DataFrame)
Check if the provided data has any duplicate column names.
Should be run BEFORE _check_similar_column_names(), as the push!() call in _check_similar_column_names will overwrite the previous Dict entry key (of similar column names) if there are exact matches.
FMDData._check_no_disallowed_serotypes Method
_check_no_disallowed_serotypes(
    all_matched_serotypes::T,
    allowed_serotypes::T = default_allowed_serotypes,
) where {T <: AbstractVector{<:AbstractString}}
Internal function to check that there are no disallowed serotypes provided in the data.
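Both serotype checks (required serotypes present, no disallowed serotypes) can be thought of as set differences between the matched and allowed serotypes. The sketch below uses hypothetical values; FMDData defines its own `default_allowed_serotypes`, whose contents are not shown here, and the real functions may report their results differently.

```julia
# Hypothetical values for illustration only.
allowed = ["o", "a", "asia1"]
matched = ["o", "a", "c"]

# Allowed serotypes that were not found in the data (required but absent).
missing_required = setdiff(allowed, matched)

# Serotypes found in the data that are not in the allowed list.
disallowed = setdiff(matched, allowed)

all_required_present = isempty(missing_required)
no_disallowed_present = isempty(disallowed)
```

In this toy case both checks fail: "asia1" is missing from the matched serotypes, and "c" is not an allowed serotype.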
FMDData._check_similar_column_names Method
_check_similar_column_names(df::DataFrame)
Check if any columns have similar names. Determines whether any column name is a substring of another column name.
Should be run AFTER _check_identical_column_names(), as the push!() call will overwrite the previous Dict entry key if there are exact matches.
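The two column-name checks can be illustrated on a plain vector of names. This is a sketch under stated assumptions: the helper names are hypothetical, the real functions take a DataFrame, and collecting results in a Dict keyed by column name is inferred from the note about Dict entries above.

```julia
# Hypothetical helpers for illustration; the FMDData functions operate on a DataFrame.

# Names that appear more than once.
function duplicate_names(colnames::Vector{String})
    return unique(filter(n -> count(==(n), colnames) > 1, colnames))
end

# For each name, the other names that contain it as a substring.
function similar_names(colnames::Vector{String})
    found = Dict{String, Vector{String}}()
    for name in colnames
        matches = filter(other -> other != name && occursin(name, other), colnames)
        # Assigning by key replaces any earlier entry for the same name, which is
        # why exact duplicates are best ruled out before running this check.
        isempty(matches) || (found[name] = matches)
    end
    return found
end

cols = ["serotype_o", "serotype_o_pct", "total"]
dups = duplicate_names(cols)
sims = similar_names(cols)
```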
FMDData._collect_totals_check_args Method
_collect_totals_check_args(
    col::Vector{T},
    colname::String,
    _...
) where {T <: Union{Union{<:Missing, <:Integer}, <:Integer}}
Collect the necessary arguments to provide to the _calculate_totals!() function for count-based columns. Uses _... varargs to denote that additional arguments (relevant for seroprevalence calculations in other methods of this function) might be passed but are not used in this specific method for integer/count columns.
Arguments
col::Vector{T}: The column vector of counts.
colname::String: The name of the column.
_...: Varargs for unused parameters in this method.
Returns a Try.Ok containing a tuple (col, colname) to be unpacked and passed to _calculate_totals!.
FMDData._combine_error_messages Method
_combine_error_messages(arr_of_errs::AbstractVector{T}; filter_ok = false) where {T <: Try.InternalPrelude.AbstractResult}
Internal function. Combines error messages from a vector of Try results into a single string.
This is useful for aggregating multiple errors into a single, more informative error message.
Arguments
arr_of_errs: A vector of Try.Ok or Try.Err objects.
filter_ok: If true, Try.Ok results are filtered out before combining messages. Defaults to false.
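The aggregation pattern described above can be sketched without depending on Try.jl. Everything in this block is a stand-in: the `Result`, `Ok`, and `Err` types mimic Try.Ok and Try.Err only loosely, the function names are hypothetical, and joining messages with a newline is an assumption about the separator.

```julia
# Stand-in result types so the sketch runs on its own; FMDData uses Try.Ok / Try.Err.
abstract type Result end
struct Ok{T} <: Result
    value::T
end
struct Err{E} <: Result
    value::E
end

# An Err yields its message; an Ok yields an empty string.
err_or_empty(res::Err) = string(res.value)
err_or_empty(::Ok) = ""

# Join the messages of all results, optionally dropping Ok results first.
function combine_messages(results::AbstractVector{<:Result}; filter_ok = false)
    kept = filter_ok ? filter(r -> r isa Err, results) : results
    return join(map(err_or_empty, kept), "\n")  # separator is an assumption
end

results = Result[Ok(1), Err("missing column"), Err("bad serotype")]
combined = combine_messages(results; filter_ok = true)
```

Without filtering, each Ok contributes an empty string to the joined output, which is why filtering can produce a cleaner message.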
FMDData._correct_serotype_counts! Method
_correct_serotype_counts!(
    df::DataFrame;
    statename_column = :states_ut,
    allowed_serotypes = default_allowed_serotypes,
    reg::Regex
)
Correct any serotype counts that were miscalculated during the inference steps. These errors arise from rounding in the provided seroprevalence values, which are translated into counts and then differenced between the initial and later dataframes. If the pre or post count for all serotypes is 0, then every serotype-specific count must be 0 as well, so those counts are corrected.
FMDData._log_try_error Function
_log_try_error(res, type::Symbol = :Error; unwrap_ok = true)
Internal function. Checks a Try result. If it's an Err, it logs the error message and returns the unwrapped error. If it's an Ok, it returns the unwrapped value by default.
This function helps manage control flow by logging non-critical errors without halting execution, while still allowing critical errors to be propagated.
Arguments
res: The Try.Ok or Try.Err object to check.
type::Symbol: The logging level to use if res is an Err. Can be :Error, :Warn, or :Info. Defaults to :Error.
unwrap_ok::Bool: If true, returns the unwrapped value of an Ok result. If false, returns the Try.Ok object itself. Defaults to true.
FMDData._totals_row_selectors Function
_totals_row_selectors(
    df::DataFrame,
    column::Symbol = :states_ut,
    totals_key = "total";
    allowed_serotypes = vcat("all", default_allowed_serotypes),
    reg::Regex
)
Internal function to extract the totals row and the subset of dataframe rows that match the regex.
FMDData._unwrap_err_or_empty_str Method
_unwrap_err_or_empty_str(res)
Internal function. Unwraps a Try.Err to get its error message, or returns an empty string for a Try.Ok.
This function is a helper for _combine_error_messages, ensuring that only error messages are included in the final combined string.
FMDData.collect_all_present_serotypes Function
collect_all_present_serotypes(df::DataFrame, reg::Regex)
Return a vector of all column names that contain serotype information specified in the regex.
FMDData.correct_state_name Method
correct_state_name(
    input_name::String,
    states_dict::Dict = FMDData.states_dict
)
Check if a state name is correctly spelled, or previously characterized and matched with a correct name. Returns the correct name if possible, or errors.
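The lookup behaviour described above might look roughly like the following. This is a sketch under loud assumptions: the toy dictionary, its entries, and the function name are all hypothetical, and it assumes keys hold the corrected names while values hold a spelling found in the data. The real FMDData.states_dict is not shown here and may be structured differently.

```julia
# Toy dictionary with made-up entries, for illustration only.
# Assumption: key = corrected name, value = spelling seen in the raw data.
toy_states = Dict(
    "Tamil Nadu" => "Tamilnadu",
    "Andhra Pradesh" => "Andhra pradesh",
)

# Hypothetical stand-in for correct_state_name.
function lookup_state_name(input_name::String, states::Dict = toy_states)
    # Already a corrected name: return it unchanged.
    haskey(states, input_name) && return input_name
    # Otherwise look for a known spelling and return its corrected name.
    for (correct, variant) in states
        variant == input_name && return correct
    end
    # No match at all: raise an error.
    error("State name `$input_name` was not recognised")
end

fixed = lookup_state_name("Tamilnadu")
```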