log_parser

Danil Negrienko 2024-03-10 04:09:28 -04:00
parent def44c9d46
commit bb65cd30cc
11 changed files with 563 additions and 0 deletions


@ -0,0 +1,21 @@
{
  "authors": [
    "angelikatyborska"
  ],
  "files": {
    "solution": [
      "lib/log_parser.ex"
    ],
    "test": [
      "test/log_parser_test.exs"
    ],
    "exemplar": [
      ".meta/exemplar.ex"
    ]
  },
  "language_versions": ">=1.10",
  "forked_from": [
    "go/parsing-log-files"
  ],
  "blurb": "Learn about regular expressions by parsing logs."
}


@ -0,0 +1 @@
{"track":"elixir","exercise":"log-parser","id":"9e120b56068248ce8a9964de674001e5","url":"https://exercism.org/tracks/elixir/exercises/log-parser","handle":"negrienko","is_requester":true,"auto_approve":false}


@ -0,0 +1,4 @@
# Used by "mix format"
[
inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]

24
elixir/log-parser/.gitignore vendored Normal file

@ -0,0 +1,24 @@
# The directory Mix will write compiled artifacts to.
/_build/
# If you run "mix test --cover", coverage assets end up here.
/cover/
# The directory Mix downloads your dependencies sources to.
/deps/
# Where third-party dependencies like ExDoc output generated docs.
/doc/
# Ignore .fetch files in case you like to edit your project deps locally.
/.fetch
# If the VM crashes, it generates a dump, let's ignore it too.
erl_crash.dump
# Also ignore archive artifacts (built via "mix archive.build").
*.ez
# Ignore package tarball (built via "mix hex.build").
log-parser-*.tar

75
elixir/log-parser/HELP.md Normal file

@ -0,0 +1,75 @@
# Help
## Running the tests
From the terminal, change to the base directory of the exercise then execute the tests with:
```bash
$ mix test
```
This will execute the test file found in the `test` subfolder -- a file ending in `_test.exs`.
Documentation:
* [`mix test` - Elixir's test execution tool](https://hexdocs.pm/mix/Mix.Tasks.Test.html)
* [`ExUnit` - Elixir's unit test library](https://hexdocs.pm/ex_unit/ExUnit.html)
## Pending tests
In test suites of practice exercises, all but the first test have been tagged to be skipped.
Once you get a test passing, you can unskip the next one by commenting out the relevant `@tag :pending` with a `#` symbol.
For example:
```elixir
# @tag :pending
test "shouting" do
  assert Bob.hey("WATCH OUT!") == "Whoa, chill out!"
end
```
If you wish to run all tests at once, you can include all skipped tests by using the `--include` flag on the `mix test` command:
```bash
$ mix test --include pending
```
Or, you can enable all the tests by commenting out the `ExUnit.configure` line in the file `test/test_helper.exs`.
```elixir
# ExUnit.configure(exclude: :pending, trace: true)
```
## Useful `mix test` options
* `test/<FILE>.exs:LINENUM` - runs only a single test, the test from `<FILE>.exs` whose definition is on line `LINENUM`
* `--failed` - runs only tests that failed the last time they ran
* `--max-failures` - the suite stops evaluating tests when this number of test failures is reached
* `--seed 0` - disables randomization so the tests in a single file will always be run in the same order they were defined in
## Submitting your solution
You can submit your solution using the `exercism submit lib/log_parser.ex` command.
This command will upload your solution to the Exercism website and print the solution page's URL.
It's possible to submit an incomplete solution which allows you to:
- See how others have completed the exercise
- Request help from a mentor
## Need to get help?
If you'd like help solving the exercise, check the following pages:
- The [Elixir track's documentation](https://exercism.org/docs/tracks/elixir)
- The [Elixir track's programming category on the forum](https://forum.exercism.org/c/programming/elixir)
- [Exercism's programming category on the forum](https://forum.exercism.org/c/programming/5)
- The [Frequently Asked Questions](https://exercism.org/docs/using/faqs)
Should those resources not suffice, you could submit your (incomplete) solution to request mentoring.
If you're stuck on something, it may help to look at some of the [available resources](https://exercism.org/docs/tracks/elixir/resources) out there where answers might be found.


@ -0,0 +1,50 @@
# Hints
## General
- Review regular expression patterns from the introduction. Remember, when creating the pattern as a string, you must escape some characters.
- Read about the [`Regex` module][regex-docs] in the documentation.
- Read about the [regular expression sigil][sigils-regex] in the Getting Started guide.
- Check out this website about regular expressions: [Regular-Expressions.info][website-regex-info].
- Check out this website about regular expressions: [Rex Egg - The world's most tyrannosauical regex tutorial][website-rexegg].
- Check out this website about regular expressions: [RegexOne - Learn Regular Expressions with simple, interactive exercises][website-regexone].
- Check out this website about regular expressions: [Regular Expressions 101 - an online regex sandbox][website-regex-101].
- Check out this website about regular expressions: [RegExr - an online regex sandbox][website-regexr].
## 1. Identify garbled log lines
- Use the [`r` sigil][sigil-r] to create a regular expression.
- There is [an operator][match-operator] that can be used to check a string against a regular expression. There is also a [`Regex` function][regex-match] and a [`String` function][string-match] that can do the same.
- Don't forget to escape characters that have special meaning in regular expressions.
## 2. Split the log line
- There is a [`Regex` function][regex-split] as well as a [`String` function][string-split] that can split a string into a list of strings based on a regular expression.
- Don't forget to escape characters that have special meaning in regular expressions.
## 3. Remove artifacts from log
- There is a [`Regex` function][regex-replace] as well as a [`String` function][string-replace] that can change a part of a string that matches a given regular expression to a different string.
- There is a [modifier][regex-modifiers] that can make the whole regular expression case-insensitive.
## 4. Tag lines with user names
- There is a [`Regex` function][regex-run] that runs a regular expression against a string and returns all captures.
[regex-docs]: https://hexdocs.pm/elixir/Regex.html
[sigils-regex]: https://hexdocs.pm/elixir/sigils.html#regular-expressions
[website-regex-info]: https://www.regular-expressions.info
[website-rexegg]: https://www.rexegg.com/
[website-regexone]: https://regexone.com/
[website-regex-101]: https://regex101.com/
[website-regexr]: https://regexr.com/
[sigil-r]: https://hexdocs.pm/elixir/Kernel.html#sigil_r/2
[match-operator]: https://hexdocs.pm/elixir/Kernel.html#=~/2
[regex-match]: https://hexdocs.pm/elixir/Regex.html#match?/2
[string-match]: https://hexdocs.pm/elixir/String.html#match?/2
[regex-split]: https://hexdocs.pm/elixir/Regex.html#split/3
[string-split]: https://hexdocs.pm/elixir/String.html#split/3
[regex-replace]: https://hexdocs.pm/elixir/Regex.html#replace/4
[string-replace]: https://hexdocs.pm/elixir/String.html#replace/4
[regex-modifiers]: https://hexdocs.pm/elixir/Regex.html#module-modifiers
[regex-run]: https://hexdocs.pm/elixir/Regex.html#run/3

146
elixir/log-parser/README.md Normal file

@ -0,0 +1,146 @@
# Log Parser
Welcome to Log Parser on Exercism's Elixir Track.
If you need help running the tests or submitting your code, check out `HELP.md`.
If you get stuck on the exercise, check out `HINTS.md`, but try and solve it without using those first :)
## Introduction
## Regular Expressions
Regular expressions in Elixir follow the **PCRE** specification (**P**erl **C**ompatible **R**egular **E**xpressions), so their syntax is similar to that of other popular languages like Java, JavaScript, or Ruby.
The `Regex` module offers functions for working with regular expressions. Some of the `String` module functions accept regular expressions as arguments as well.
~~~~exercism/note
This exercise assumes that you already know regular expression syntax, including character classes, quantifiers, groups, and captures.
If you need to refresh your regular expression knowledge, check out one of these sources: [Regular-Expressions.info](https://www.regular-expressions.info), [Rex Egg](https://www.rexegg.com/), [RegexOne](https://regexone.com/), [Regular Expressions 101](https://regex101.com/), [RegExr](https://regexr.com/).
~~~~
### Sigils
The most common way to create regular expressions is using the `~r` sigil.
```elixir
~r/test/
```
Note that all Elixir sigils support [different kinds of delimiters][sigils], not only `/`.
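As a small illustration of why other delimiters can be handy, the sketch below writes the same pattern twice: once with `/`, where every slash in the pattern has to be escaped, and once with `{}`. The path and the variable names are made up for this example.

```elixir
# Both regexes match the same path prefix; only the delimiter differs.
with_slashes = ~r/^\/usr\/local\//
with_braces = ~r{^/usr/local/}

IO.inspect("/usr/local/bin" =~ with_slashes)
# => true
IO.inspect("/usr/local/bin" =~ with_braces)
# => true
```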
### Matching
The `=~/2` operator can be used to perform a regex match that returns a boolean result. Alternatively, there are also `match?/2` functions in the `Regex` module as well as the `String` module.
```elixir
"this is a test" =~ ~r/test/
# => true
String.match?("Alice has 7 apples", ~r/\d{2}/)
# => false
```
### Capturing
If a simple boolean check is not enough, use the `Regex.run/3` function to get a list of all captures (or `nil` if there was no match). The first element in the returned list is always a match for the whole regular expression, and the following elements are matched groups.
```elixir
Regex.run(~r/(\d) apples/, "Alice has 7 apples")
# => ["7 apples", "7"]
```
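Because `Regex.run/3` returns `nil` when nothing matches, it usually makes sense to handle both outcomes explicitly. The following is a minimal sketch; `count_apples` is a hypothetical helper invented for this example, and `:none` is an arbitrary choice of fallback value.

```elixir
# Hypothetical helper: returns the captured digit, or :none when nothing matches.
count_apples = fn text ->
  case Regex.run(~r/(\d) apples/, text) do
    [_whole_match, count] -> count
    nil -> :none
  end
end

IO.inspect(count_apples.("Alice has 7 apples"))
# => "7"
IO.inspect(count_apples.("Alice has no apples"))
# => :none
```

Pattern matching on the returned list keeps the whole match and the captured group apart without indexing into the list.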
### Modifiers
The behavior of a regular expression can be modified by appending special flags. When using a sigil to create a regular expression, add the modifiers after the second delimiter.
Common modifiers are:
- `i` - makes the match case-insensitive.
- `u` - enables Unicode-specific patterns like `\p` and causes character classes like `\w`, `\s` etc. to also match Unicode.
```elixir
"this is a TEST" =~ ~r/test/i
# => true
```
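The effect of the `u` modifier is easiest to see with a non-ASCII letter. The sketch below uses the POSIX class `[[:lower:]]`, which is not mentioned above but behaves like the other character classes in this respect; the sample string is only an illustration.

```elixir
# Without `u`, the accented letter is not recognized as a lowercase letter.
IO.inspect(String.match?("josé", ~r/^[[:lower:]]+$/))
# => false

# With `u`, character classes also cover Unicode letters.
IO.inspect(String.match?("josé", ~r/^[[:lower:]]+$/u))
# => true
```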
[sigils]: https://hexdocs.pm/elixir/syntax-reference.html#sigils
## Instructions
After a recent security review you have been asked to clean up the organization's archived log files.
## 1. Identify garbled log lines
You need some idea of how many log lines in your archive do not comply with current standards.
You believe that a simple test reveals whether a log line is valid.
To be considered valid a line should begin with one of the following strings:
- [DEBUG]
- [INFO]
- [WARNING]
- [ERROR]
Implement the `valid_line?/1` function to return `true` if the log line is valid.
```elixir
LogParser.valid_line?("[ERROR] Network Failure")
# => true
LogParser.valid_line?("Network Failure")
# => false
```
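When a pattern has to start with one of several alternatives, keep in mind that `|` has the lowest precedence in a regular expression, so an anchor such as `^` applies only to the alternative it is written next to. The sketch below shows the difference with unrelated sample words; it illustrates the pitfall and is not a solution to this task.

```elixir
# Reads as "^cat" OR "dog" anywhere in the string.
IO.inspect("hotdog" =~ ~r/^cat|dog/)
# => true

# Grouping makes the anchor apply to both alternatives.
IO.inspect("hotdog" =~ ~r/^(cat|dog)/)
# => false
IO.inspect("dogma" =~ ~r/^(cat|dog)/)
# => true
```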
## 2. Split the log line
Shortly after starting the log parsing project, you realize that one application's logs aren't split into lines like the others. In this project, what should have been separate lines is instead on a single line, connected by fancy arrows such as `<--->` or `<*~*~>`.
In fact, any string that has a first character of `<`, a last character of `>`, and any combination of the following characters `~`, `*`, `=`, and `-` in between can be used as a separator in this project's logs.
Implement the `split_line/1` function that takes a line and returns a list of strings.
```elixir
LogParser.split_line("[INFO] Start.<*>[INFO] Processing...<~~~>[INFO] Success.")
# => ["[INFO] Start.", "[INFO] Processing...", "[INFO] Success."]
```
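Splitting on a pattern rather than on a fixed string works the same way for any separator. The following sketch uses an unrelated separator (a comma or semicolon followed by optional whitespace) purely to show the mechanics of `String.split/2` and `Regex.split/2` with a regular expression.

```elixir
separator = ~r/[,;]\s*/

IO.inspect(String.split("one, two;three", separator))
# => ["one", "two", "three"]

# Regex.split/2 takes the same arguments in the opposite order.
IO.inspect(Regex.split(separator, "one, two;three"))
# => ["one", "two", "three"]
```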
## 3. Remove artifacts from log
You have found that some upstream processing of the logs has been scattering the text "end-of-line" followed by a line number (without an intervening space) throughout the logs.
Implement the `remove_artifacts/1` function to take a string, remove all occurrences of the end-of-line text (case-insensitive), and return a clean log line.
Lines not containing end-of-line text should be returned unmodified.
Just remove the end-of-line string; there's no need to adjust the whitespace.
```elixir
LogParser.remove_artifacts("[WARNING] end-of-line23033 Network Failure end-of-line27")
# => "[WARNING] Network Failure "
```
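As a rough illustration of replacing every case-insensitive match of a pattern, the sketch below uses `String.replace/3` with the `i` modifier on made-up sample text. By default `String.replace/3` replaces all matches, not only the first one.

```elixir
IO.inspect(String.replace("Page7 of page12", ~r/page\d+/i, "#"))
# => "# of #"

# Text without a match comes back unchanged.
IO.inspect(String.replace("no numbers here", ~r/page\d+/i, "#"))
# => "no numbers here"
```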
## 4. Tag lines with user names
You have noticed that some of the log lines include sentences that refer to users.
These sentences always contain the string `"User"`, followed by one or more whitespace characters, and then a user name.
You decide to tag such lines.
Implement a function `tag_with_user_name/1` that processes log lines:
- Lines that do not contain the string `"User"` remain unchanged.
- For lines that contain the string `"User"`, prefix the line with `[USER]` followed by the user name.
```elixir
LogParser.tag_with_user_name("[INFO] User Alice created a new project")
# => "[USER] Alice [INFO] User Alice created a new project"
```
You can assume that:
- Each occurrence of the string `"User"` is followed by one or more whitespace characters and the user name.
- There is at most one occurrence of the string `"User"` on each line.
- User names are non-empty strings that do not contain whitespace.
## Source
### Created by
- @angelikatyborska


@ -0,0 +1,23 @@
defmodule LogParser do
  def valid_line?(line) do
    # Group the alternatives so the anchor and both brackets apply to every level.
    line =~ ~r/^\[(DEBUG|INFO|WARNING|ERROR)\]/
  end

  # A separator is "<", any mix of ~, *, = and -, then ">".
  def split_line(line), do: String.split(line, ~r/<[~*=-]*>/)

  def remove_artifacts(line), do: String.replace(line, ~r/end-of-line\d+/i, "")

  def tag_with_user_name(line) do
    # The user name is the first run of non-whitespace after "User" and whitespace.
    case Regex.run(~r/User\s+(\S+)/u, line) do
      [_, user] -> "[USER] #{user} #{line}"
      nil -> line
    end
  end
end

28
elixir/log-parser/mix.exs Normal file

@ -0,0 +1,28 @@
defmodule LogParser.MixProject do
  use Mix.Project

  def project do
    [
      app: :log_parser,
      version: "0.1.0",
      # elixir: "~> 1.10",
      start_permanent: Mix.env() == :prod,
      deps: deps()
    ]
  end

  # Run "mix help compile.app" to learn about applications.
  def application do
    [
      extra_applications: [:logger]
    ]
  end

  # Run "mix help deps" to learn about dependencies.
  defp deps do
    [
      # {:dep_from_hexpm, "~> 0.3.0"},
      # {:dep_from_git, git: "https://github.com/elixir-lang/my_dep.git", tag: "0.1.0"}
    ]
  end
end


@ -0,0 +1,189 @@
defmodule LogParserTest do
use ExUnit.Case
describe "valid_line?/1" do
@tag task_id: 1
test "valid DEBUG message" do
assert LogParser.valid_line?("[DEBUG] response time 3ms") == true
end
@tag task_id: 1
test "valid INFO message" do
assert LogParser.valid_line?("[INFO] the latest information") == true
end
@tag task_id: 1
test "valid WARNING message" do
assert LogParser.valid_line?("[WARNING] something might be wrong") == true
end
@tag task_id: 1
test "valid ERROR message" do
assert LogParser.valid_line?("[ERROR] something really bad happened") == true
end
@tag task_id: 1
test "unknown level" do
assert LogParser.valid_line?("[BOB] something really bad happened") == false
end
@tag task_id: 1
test "line must start with level" do
assert LogParser.valid_line?("bad start [DEBUG] ") == false
end
@tag task_id: 1
test "level must be wrapped in square brackets" do
assert LogParser.valid_line?("ERROR something really bad happened") == false
end
@tag task_id: 1
test "level must be uppercase" do
assert LogParser.valid_line?("[warning] something might be wrong") == false
end
end
describe "split_line/1" do
@tag task_id: 2
test "splits into three sections" do
assert LogParser.split_line("[INFO] Start.<*>[INFO] Processing...<~~~>[INFO] Success.") == [
"[INFO] Start.",
"[INFO] Processing...",
"[INFO] Success."
]
end
@tag task_id: 2
test "symbols =, ~, *, and - can be freely mixed" do
assert LogParser.split_line(
"[DEBUG] Attempt nr 2<=>[DEBUG] Attempt nr 3<-*~*->[ERROR] Failed to send SMS."
) == [
"[DEBUG] Attempt nr 2",
"[DEBUG] Attempt nr 3",
"[ERROR] Failed to send SMS."
]
end
@tag task_id: 2
test "symbols other than =, ~, *, or - do not split" do
assert LogParser.split_line(
"[INFO] Attempt nr 1<=!>[INFO] Attempt nr 2< >[INFO] Attempt nr 3"
) == [
"[INFO] Attempt nr 1<=!>[INFO] Attempt nr 2< >[INFO] Attempt nr 3"
]
end
@tag task_id: 2
test "symbols between angular brackets aren't required" do
assert LogParser.split_line("[INFO] Attempt nr 1<>[INFO] Attempt nr 2") == [
"[INFO] Attempt nr 1",
"[INFO] Attempt nr 2"
]
end
@tag task_id: 2
test "angular brackets are required" do
assert LogParser.split_line("[ERROR] Failed to send SMS**[ERROR] Invalid API key.") == [
"[ERROR] Failed to send SMS**[ERROR] Invalid API key."
]
end
@tag task_id: 2
test "angular brackets must be closed" do
assert LogParser.split_line("[ERROR] Failed to send SMS<**[ERROR] Invalid API key.") == [
"[ERROR] Failed to send SMS<**[ERROR] Invalid API key."
]
end
end
describe "remove_artifacts/1" do
@tag task_id: 3
test "removes a single 'end-of-line' followed by a line number" do
assert LogParser.remove_artifacts("[WARNING] Network Failure end-of-line27") ==
"[WARNING] Network Failure "
end
@tag task_id: 3
test "leaves other lines unchanged" do
assert LogParser.remove_artifacts("[DEBUG] Process started") ==
"[DEBUG] Process started"
end
@tag task_id: 3
test "removes multiple 'end-of-line's followed by line numbers" do
assert LogParser.remove_artifacts(
"[WARNING] end-of-line23033 Network Failure end-of-line27"
) == "[WARNING] Network Failure "
end
@tag task_id: 3
test "removes 'end-of-line' and line numbers even when not separated from the rest of the log by a space" do
assert LogParser.remove_artifacts("[WARNING]end-of-line23033Network Failureend-of-line27") ==
"[WARNING]Network Failure"
end
@tag task_id: 3
test "does not remove 'end-of-line' if not followed by a line number" do
assert LogParser.remove_artifacts("[INFO] end-of-line User disconnected end-of-lineXYZ") ==
"[INFO] end-of-line User disconnected end-of-lineXYZ"
end
@tag task_id: 3
test "does not remove 'end-of-line' if a number is separated by a space" do
assert LogParser.remove_artifacts("[DEBUG] Query runtime:end-of-line 6ms") ==
"[DEBUG] Query runtime:end-of-line 6ms"
end
@tag task_id: 3
test "is case-insensitive" do
assert LogParser.remove_artifacts("[DEBUG] END-of-LINE77 Process started End-Of-Line09") ==
"[DEBUG] Process started "
end
end
describe "tag_with_user_name/1" do
@tag task_id: 4
test "extracts user name and prepends it to the line" do
assert LogParser.tag_with_user_name("[WARN] User James123 has exceeded storage space") ==
"[USER] James123 [WARN] User James123 has exceeded storage space"
end
@tag task_id: 4
test "leaves other lines unchanged" do
assert LogParser.tag_with_user_name("[DEBUG] Process started") ==
"[DEBUG] Process started"
end
@tag task_id: 4
test "multiple spaces can appear after the word 'User'" do
assert LogParser.tag_with_user_name("[INFO] User Bob9 reported post fxa3qa") ==
"[USER] Bob9 [INFO] User Bob9 reported post fxa3qa"
end
@tag task_id: 4
test "user name can be delimited by tabs" do
assert LogParser.tag_with_user_name(
"[ERROR] User\t!!!\tdoes not have a valid payment method"
) ==
"[USER] !!! [ERROR] User\t!!!\tdoes not have a valid payment method"
end
@tag task_id: 4
test "user name can be delimited by new lines" do
assert LogParser.tag_with_user_name("[DEBUG] Created User\nAlice908101\nat 14:02") ==
"[USER] Alice908101 [DEBUG] Created User\nAlice908101\nat 14:02"
end
@tag task_id: 4
test "user name can end with the end of the line" do
assert LogParser.tag_with_user_name("[INFO] New log in for User __JOHNNY__") ==
"[USER] __JOHNNY__ [INFO] New log in for User __JOHNNY__"
end
@tag task_id: 4
test "works for Ukrainian user names with emoji" do
assert LogParser.tag_with_user_name("[INFO] Promoted User АНАСТАСІЯ_🙂 to admin") ==
"[USER] АНАСТАСІЯ_🙂 [INFO] Promoted User АНАСТАСІЯ_🙂 to admin"
end
end
end


@ -0,0 +1,2 @@
ExUnit.start()
ExUnit.configure(exclude: :pending, trace: true, seed: 0)